Graph Mining and Learning @ NeurIPS
The Graph Mining team at Google is excited to be presenting at the 2020 NeurIPS Conference. Please join us on Sunday, December 6th, at 1PM EST. The Expo information page can be found here. This page will be updated with video links after the workshop.
To read more about the Graph Mining team, check out our research page.
Check out the master slide deck here.
In this talk, Vahab Mirrokni, the founder of the Graph Mining team, introduces Graph Mining and Learning at a high level. This talk touches on what graphs are, why they are important, and where they appear in the world of big data. The talk then dives into the core tools that make up the Graph Mining and Learning toolbox, and lays out several canonical use cases. It also touches upon discussion of how to combine algorithms, systems, and machine learning to build a scalable graph-based learning system in different distributed environments. Finally, it provides a brief history of graph mining and learning projects at Google. The talk sets up the rest of the Expo by introducing common terms and themes that will appear in the following talks.
Modeling COVID with Graph Neural Networks
In this talk, Amol Kapoor discusses how GNNs can be used to predict the change in COVID caseloads in US counties when supplemented with spatio-temporal mobility information. This presentation covers the recent paper: Examining COVID-19 Forecasting using Spatio-Temporal Graph Neural Networks (arxiv).
Using Graph Mining for Privacy
In this talk, Alessandro Epasto reviews applications of graph mining to privacy. First, we will see an application of graph-based clustering to the privacy-preserving Google effort of Federated Learning of Cohort (FLOC). Second, we will discuss research on how to ensure privacy on graph-based applications using on-device computations. For more on FLOC, take a look at the public announcement. This presentation covers the following papers: On-Device Algorithms for Public-Private Data with Absolute Privacy (pdf); and Efficient Algorithms for Private-Public Graphs (pdf).
In this short talk, we look at how clustering can be used to run better randomized experiments. Randomized experiments allow us to estimate causal effects, but such estimation suffers when the units of interest are not independent. Clustering is used to mitigate this problem by avoiding interactions between groups of units with different treatment assignments. We also take a specific look at market experiments where different clustering techniques may be more suitable. This talk covers the following papers: Variance Reduction in Bipartite Experiments through Correlation Clustering (pdf); and Randomized Experimental Design via Geographic Clustering (pdf).
Graph Mining At Scale
Grale: Learning Graphs
In this talk, Jonathan Halcrow discusses the Grale graph building framework, a highly scalable tool for generating learned similarity graphs from arbitrary data. This talk covers the recent paper: Grale: Designing Networks for Graph Learning (arxiv).
In this talk, Alessandro Epasto addresses the following question: how can we measure the similarity of two nodes in a graph? Similarity rankings have important applications ranging from recommender systems, link prediction and anomaly detection. We will review standard techniques in unsupervised graph similarity ranking with a focus on scalable algorithms. We will also show some recent applications of similarity ranking. This presentation covers the following papers: Ego-net Community Mining Applied to Friend Suggestion (pdf); and Reduce and Aggregate: Similarity Rankings in Multi-Categorical Bipartite Graphs (pdf).
Clustering at Scale
In this talk, Vahab Mirrokni provides an overview of clustering at scale. The talk starts with affinity hierarchical clustering (pdf), which provides the backbone of other clustering algorithms. The talk then covers a scalable distributed balanced partitioning algorithm (arxiv), and highlights an application of balanced partitioning in cache-aware load balancing to save 32% flash bandwidth (pdf). Finally, it discusses the techniques of distributed composable core-set and sketching and how they are applied to develop distributed algorithms for k-clustering and k-cover (pdf, pdf).
Jakub “Kuba” Łącki
In this talk, Jakub Łącki presents graph clustering techniques that can be used to find communities in a social network. In addition to reviewing some well-known techniques, the talk introduces a new method for detecting and evaluating communities. The new method exhibits competitive empirical performance and good theoretical properties.
In this talk, Allan Heydon describes one of Google’s systems for doing large-scale semi-supervised learning via label propagation. The algorithm requires only a small fraction of the input data instances to be labeled, and works by iteratively propagating labels along the edges of a similarity graph. Because it is implemented as a massively parallel computation, it scales to graphs with XT edges, XXXB nodes, and potentially millions of distinct labels. Due to the generality of the data model, it can be applied to a wide variety of problems, including spam/abuse detection, image/video labeling, natural language processing, noisy label cleaning, and label augmentation for downstream supervised model training. More can be found on the Google Research Blog. This presentation covers the paper: Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation (arxiv).
Graph Neural Networks
GNNs and Graph Embeddings
In this talk, Bryan Perozzi presents an overview of Graph Embeddings and Graph Convolutions. The talk begins with a high level discussion of graph embeddings – how they are created and why they are useful. The talk then shifts to talk about Graph Convolutions. It covers how graph embeddings are used in graph convolutions, and why graph convolutional networks provide a flexible and powerful way of incorporating node context in a single unifying deep ml framework. Finally, the talk closes out with a brief discussion of some of the challenges of graph learning. This talk covers the following papers: DeepWalk: Online Learning of Social Representations (pdf); Semi-Supervised Classification with Graph Convolutional Networks (arxiv); Neural Message Passing for Quantum Chemistry (arxiv); N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification (arxiv); and MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing (arxiv).
PPRGo: GNNs at Scale
In this talk, Amol Kapoor talks about some of the challenges with running GNNs at scale, and presents a solution called PPRGo. This presentation covers the recent paper: Scaling Graph Neural Networks with Approximate PageRank (arxiv).
In this talk, John Palowitch talks about a training-time projection for debiasing graph representations learned from unsupervised algorithms. This presentation covers the recent paper: Debiasing Graph Embeddings via the Metadata-Orthogonal Training Unit (arxiv).
Learning Multiple Embeddings
In this talk, Alessandro Epasto presents recent advances in learning graph embeddings. We will show a novel methodology to learning multiple embeddings per node that allows us to understand better the community structure of the graph and obtain improved results in downstream ML tasks such as link prediction. Our method is based on the Persona Graph method, a novel framework for graph analysis that identifies clusters in complex networks through the use of ego-network analysis. This presentation covers the following papers: Is a Single Embedding Enough? Learning Node Representations that Capture Multiple Social Contexts (arxiv); and Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters (pdf).
Algorithms, Systems, and Scalability
Graph Neural Networks in TensorFlow
In this talk, Martin Blais discusses the infrastructure required to train graph neural network models at Google scale. We introduce a soon-to-be open-sourced TensorFlow library (“TF GNN”, or “graph tensors”) and a set of associated tools to prepare, represent and stream irregular graph-shaped data concurrently as TensorFlow tensors and build GNN models on top of it. We also discuss our approach to scalability and details about the implementation.
Distributed Graph Algorithms
Jakub “Kuba” Łącki
In this talk, Jakub Łącki describes the challenges and techniques for processing trillion-edge graphs. The talk discusses how practical aspects of running a distributed computation are captured in the theoretical computation models, as well as how modelling and algorithmic advancements result in better empirical running times. This talk covers material from multiple research papers, including: Connected Components at Scale via Local Contractions (arxiv), Massively Parallel Computation via Remote Memory Access (arxiv), and Parallel Graph Algorithms in Constant Adaptive Rounds: Theory meets Practice (arxiv).
Multi-core Parallel Graph Clustering
Jakub “Kuba” Łącki
In this talk, Jakub Łącki presents single-machine parallel clustering algorithms that can cluster graphs of billions of edges in a few minutes.
The presentations above are based on the work of many folks in Graph Mining. Without their hard work and dedication, this expo could not happen. Special thanks to:
Aaron Archer, André Linhares, Andrew Tomkins, Arjun Gopalan, Ashkan Fard, CJ Carey, David Eisenstat, Dustin Zelle, Filipe Almeida, Hossein Esfandiari, Kevin Aydin, Jason Lee, Matthew Fahrbach, MohammadHossein Bateni, Nikos Parotsidis, Reah Miyara, Sam Ruth, Silvio Lattanzi, Warren Schudy, and many collaborators.