Machine Learning on Graphs
Joseph Gonzalez
Co-Founder, GraphLab Inc.
Postdoc, UC Berkeley AMPLab
May 11, 2015
Big Graph Data
More Signal
More Noise
Graphs encode relationships between:
• People, Facts, Products, Interests, Ideas
Domains: Social Media, Advertising, Science, Web
Big: billions of vertices and edges & rich metadata
• Facebook (10/2012): 1B users, 144B friendships
• Twitter (2011): 15B follower edges
Graphs are Essential to Data Mining and Machine Learning
• Identify influential people and information
• Find communities
• Understand people’s shared interests
• Model complex data dependencies
Predicting User Behavior
[Figure: a social network of users and their posts; some users are labeled Liberal or Conservative, and the labels of the remaining users (?) are to be predicted]
Conditional Random Field, Belief Propagation
Finding Communities
Count triangles passing through each vertex:
• Measures “cohesiveness” of local community
• More triangles: stronger community
• Fewer triangles: weaker community
[Figure: example graph with vertices 1–4]
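Per-vertex triangle counting can be sketched on a single machine as follows. The sketch assumes an undirected edge list. This brute-force version tests every pair of a vertex's neighbors for adjacency, so its cost is quadratic in degree. The edges are an invented example on vertices 1–4, not data from the slide.

```python
from itertools import combinations


def triangles_per_vertex(edges):
    """Count the triangles passing through each vertex of an undirected graph."""
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops never close a triangle
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    counts = {}
    for v, vs in nbrs.items():
        # Every pair of v's neighbors that are adjacent to each other closes a triangle through v.
        counts[v] = sum(1 for a, b in combinations(vs, 2) if b in nbrs[a])
    return counts


# Hypothetical edges on vertices 1-4: the triangle 1-2-3 plus a pendant vertex 4.
print(triangles_per_vertex([(1, 2), (2, 3), (1, 3), (3, 4)]))  # {1: 1, 2: 1, 3: 1, 4: 0}
```

Vertex 4 sits on no triangle, so by this measure it belongs to the least cohesive local community.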
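Recommending Products
[Figure: bipartite graph of Users and Items connected by Ratings]
Low-Rank Matrix Factorization:
[Figure: the Netflix Users × Movies rating matrix ≈ User Factors (U) × Movie Factors (M); each vertex carries a factor vector, and the observed ratings r13, r14, r24, r25 are edges between user factors f(1), f(2) and movie factors f(3), f(4), f(5)]
Iterate:
$$ f[i] = \arg\min_{w \in \mathbb{R}^d} \sum_{j \in \mathrm{Nbrs}(i)} \left( r_{ij} - w^T f[j] \right)^2 + \lambda \|w\|_2^2 $$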
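Each step of the iteration above is a per-vertex ridge regression. Alternating it between the two sides of the bipartite graph gives Alternating Least Squares. Below is a dependency-free sketch under stated assumptions:
• factors have dimension d = 2 and λ = 0.1;
• a tiny Gaussian-elimination solver stands in for a linear-algebra library;
• the rating values are made up.

It is an illustration, not GraphLab's toolkit code.

```python
import random


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def update_factor(i, ratings, f, d, lam):
    """Ridge regression: argmin_w sum_{j in Nbrs(i)} (r_ij - w.f[j])^2 + lam * ||w||^2."""
    # Normal equations: (lam * I + sum_j f[j] f[j]^T) w = sum_j r_ij f[j]
    A = [[lam if a == b else 0.0 for b in range(d)] for a in range(d)]
    rhs = [0.0] * d
    for j, r_ij in ratings[i].items():
        for a in range(d):
            rhs[a] += r_ij * f[j][a]
            for b in range(d):
                A[a][b] += f[j][a] * f[j][b]
    return solve(A, rhs)


def als(ratings, users, movies, d=2, lam=0.1, iters=20, seed=0):
    """Alternate exact factor updates: all users given movies, then all movies given users."""
    rng = random.Random(seed)
    f = {v: [rng.random() for _ in range(d)] for v in list(users) + list(movies)}
    for _ in range(iters):
        for i in users:
            f[i] = update_factor(i, ratings, f, d, lam)
        for j in movies:
            f[j] = update_factor(j, ratings, f, d, lam)
    return f


def loss(ratings, users, f, lam):
    """Squared error over every rating edge plus the ridge penalty on every factor."""
    err = sum((r - dot(f[u], f[m])) ** 2 for u in users for m, r in ratings[u].items())
    return err + lam * sum(dot(v, v) for v in f.values())


# Rating edges r13, r14, r24, r25 from the figure; the rating values are invented.
edges = {(1, 3): 5.0, (1, 4): 3.0, (2, 4): 4.0, (2, 5): 1.0}
ratings = {}
for (u, m), r in edges.items():
    ratings.setdefault(u, {})[m] = r  # store each rating on both endpoints
    ratings.setdefault(m, {})[u] = r
f = als(ratings, users=[1, 2], movies=[3, 4, 5])
print(round(dot(f[1], f[3]), 2))  # the model's estimate of r13
```

Each update minimizes the overall objective exactly in one factor while holding the others fixed. As a result, the squared-error-plus-penalty loss never increases from one sweep to the next.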
Identifying Leaders
$$ R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j] $$
Rank of user i = 0.15 + weighted sum of neighbors’ ranks
• Everyone starts with equal ranks
• Update ranks in parallel
• Iterate until convergence
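Below is a synchronous version of this update. It assumes the common choice w_ji = 0.85 / outdeg(j), since the slide leaves the weights unspecified, and uses a hypothetical three-user graph.

```python
def pagerank(out_links, iters=100, tol=1e-10):
    """Synchronous PageRank: R[i] = 0.15 + sum_{j -> i} w_ji R[j] with w_ji = 0.85 / outdeg(j)."""
    vertices = set(out_links) | {v for targets in out_links.values() for v in targets}
    R = {v: 1.0 for v in vertices}  # everyone starts with equal ranks
    for _ in range(iters):
        new = {v: 0.15 for v in vertices}
        for j, targets in out_links.items():
            for i in targets:
                new[i] += 0.85 / len(targets) * R[j]  # every vertex reads the old ranks
        delta = max(abs(new[v] - R[v]) for v in vertices)
        R = new  # all ranks switch to the new values at once
        if delta < tol:
            break
    return R


# Hypothetical follower graph: users 2 and 3 both point at user 1.
ranks = pagerank({2: [1], 3: [1]})
print({v: round(r, 3) for v, r in sorted(ranks.items())})  # {1: 0.405, 2: 0.15, 3: 0.15}
```

In this sketch, the rank held by a vertex with no out-links (vertex 1 here) is not passed on to anyone.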
Graph-Parallel Algorithms
Model / Alg. State
Computation depends only on the neighbors
Many More Graph Algorithms
• Collaborative Filtering
  – Alternating Least Squares
  – Stochastic Gradient Descent
  – Tensor Factorization
  – SVD
• Structured Prediction
  – Loopy Belief Propagation
  – Max-Product Linear Programs
  – Gibbs Sampling
• Semi-supervised ML
  – Graph SSL
  – CoEM
• Graph Analytics
  – PageRank
  – Shortest Path
  – Triangle-Counting
  – Graph Coloring
  – K-core Decomposition
  – Personalized PageRank
• Classification
  – Neural Networks
  – Lasso
  …
How should we program graph-parallel algorithms?
Structure of Computation
[Figure: Data-Parallel computation maps independent table rows to a result; Graph-Parallel computation runs over a dependency graph]
How should we program graph-parallel algorithms?
“Think like a Vertex.” - Pregel [SIGMOD’10]
The Graph-Parallel Abstraction
• A user-defined Vertex-Program runs on each vertex
• The graph constrains interaction along edges:
  – Using messages (e.g., Pregel [PODC’09, SIGMOD’10])
  – Through shared state (e.g., GraphLab [UAI’10, VLDB’12])
• Parallelism: run multiple vertex programs simultaneously
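To make the message-passing style concrete, here is a toy, single-process imitation of a Pregel-style synchronous engine. It runs single-source shortest paths, one of the graph analytics listed earlier. The `pregel_sssp` name, the engine loop, and the dict-based graph format are assumptions for illustration, not Pregel's actual API.

```python
import math


def pregel_sssp(edges, source):
    """Toy synchronous (Pregel-style) message passing for single-source shortest paths.

    edges maps u -> list of (v, weight). In each superstep, every vertex that received
    messages keeps the smallest proposed distance; if that improves its value, it sends
    dist + weight along each out-edge. The loop stops when no messages are in flight.
    """
    vertices = set(edges) | {v for out in edges.values() for v, _ in out}
    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}
    supersteps = 0
    while inbox:
        outbox = {}
        for v, msgs in inbox.items():  # the vertex program runs only where messages arrived
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                for u, w in edges.get(v, []):  # interaction is restricted to graph edges
                    outbox.setdefault(u, []).append(best + w)
        inbox = outbox  # barrier: messages are delivered in the next superstep
        supersteps += 1
    return dist, supersteps


graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)]}
print(pregel_sssp(graph, "a"))
```

Vertex "c" first receives the distance 4.0 directly from "a", then improves to 3.0 one superstep later via "b". This shows how values propagate only along edges, one hop per superstep.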
The GraphLab Vertex Program
Vertex Programs directly access adjacent vertices and edges
GraphLab_PageRank(i)
  // Compute sum over neighbors
  total = 0
  foreach (j in neighbors(i)):
    total = total + R[j] * wji
  // Update the PageRank
  R[i] = 0.15 + total
  // Trigger neighbors to run again
  if R[i] not converged then
    signal neighbors(i) to be recomputed
[Figure: vertex 1 gathers R[4] * w41 and the weighted ranks of its other neighbors among vertices 1–4]
Signaled vertices are recomputed eventually.
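The signaling behavior can be imitated on one machine with a FIFO work queue. A vertex is recomputed only if it was signaled. It signals the vertices that read its rank only when its own value moved by more than a tolerance. This is a sketch under assumptions: the tolerance, the FIFO order, and w_ji = 0.85 / outdeg(j) are choices made here, not GraphLab's scheduler.

```python
from collections import deque


def dynamic_pagerank(out_links, tol=1e-6):
    """Asynchronous PageRank driven by a FIFO queue of signaled vertices."""
    vertices = set(out_links) | {v for targets in out_links.values() for v in targets}
    in_nbrs = {v: [] for v in vertices}
    for j, targets in out_links.items():
        for i in targets:
            in_nbrs[i].append(j)
    R = {v: 1.0 for v in vertices}
    updates = {v: 0 for v in vertices}
    queue = deque(vertices)  # initially every vertex is signaled
    queued = set(vertices)
    while queue:
        i = queue.popleft()
        queued.discard(i)
        new = 0.15 + sum(0.85 / len(out_links[j]) * R[j] for j in in_nbrs[i])
        updates[i] += 1
        changed = abs(new - R[i]) > tol
        R[i] = new  # written in place: later updates see it immediately
        if changed:
            # Only vertices whose rank reads R[i] (its out-neighbors) need recomputing.
            for k in out_links.get(i, []):
                if k not in queued:
                    queue.append(k)
                    queued.add(k)
    return R, updates


ranks, updates = dynamic_pagerank({2: [1], 3: [1]})
print(updates)
```

Vertices 2 and 3 have no in-links, so each is computed once and never signaled again. Only vertex 1 may be recomputed after they change. The same mechanism lets many vertices of a large graph stop after very few updates.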
Convergence of Dynamic PageRank
[Plot: number of vertices (log scale, 1 to 10^8) vs. number of updates (0 to 70)]
51% of vertices updated only once!
Adaptive Belief Propagation
Challenge = Boundaries
[Figure: noisy “Sunset” image, its graphical model, and the cumulative vertex updates under Splash, showing regions with many updates and regions with few updates]
The algorithm identifies and focuses on hidden sequential structure.
Graph-Parallel Abstractions
Better for Machine Learning
• Shared State: Dynamic, Asynchronous
• Messaging: Synchronous
Natural Graphs
Graphs derived from natural phenomena
Properties of Natural Graphs
[Figure: regular mesh vs. natural graph]
Power-Law Degree Distribution
“Star Like” Motif
President Obama Followers
Challenges of High-Degree Vertices
• Touches a large fraction of the graph
• Edges are processed sequentially
• Provably difficult to partition
[Figure: a high-degree vertex spread across CPU 1 and CPU 2, and across Machine 1 and Machine 2]
Random Partitioning
• GraphLab resorts to random (hashed) partitioning on natural graphs
3"2"
1"
D
A"
C"
B" 2"3"
C"
D
B"A"
1"
D
A"
C"C"
B"
(a) Edge-Cut
B"A" 1"
C" D3"
C" B"2"
C" D
B"A" 1"
3"
(b) Vertex-Cut
Figure 4: (a) An edge-cut and (b) vertex-cut of a graph intothree parts. Shaded vertices are ghosts and mirrors respectively.
5 Distributed Graph Placement
The PowerGraph abstraction relies on the distributed data-graph to store the computation state and encode the interaction between vertex programs. The placement of the data-graph structure and data plays a central role in minimizing communication and ensuring work balance.
A common approach to placing a graph on a cluster of $p$ machines is to construct a balanced $p$-way edge-cut (e.g., Fig. 4a) in which vertices are evenly assigned to machines and the number of edges spanning machines is minimized. Unfortunately, the tools [21, 31] for constructing balanced edge-cuts perform poorly [1, 26, 23] or even fail on power-law graphs. When the graph is difficult to partition, both GraphLab and Pregel resort to hashed (random) vertex placement. While fast and easy to implement, hashed vertex placement cuts most of the edges:
Theorem 5.1. If vertices are randomly assigned to $p$ machines then the expected fraction of edges cut is:
$$ \mathbb{E}\left[ \frac{|\text{Edges Cut}|}{|E|} \right] = 1 - \frac{1}{p} \qquad (5.1) $$
For example, if just two machines are used, half of the edges will be cut, requiring order $|E|/2$ communication.
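Theorem 5.1 is easy to check empirically. An edge stays uncut only when both endpoints hash to the same machine, which happens with probability 1/p. The sketch below assumes a synthetic random graph (the generator is hypothetical, not one of the paper's datasets) and compares the measured cut fraction with 1 - 1/p.

```python
import random


def fraction_edges_cut(edges, p, seed=0):
    """Hash every vertex to one of p machines and measure the fraction of edges cut."""
    rng = random.Random(seed)
    vertices = {v for edge in edges for v in edge}
    machine = {v: rng.randrange(p) for v in vertices}  # random (hashed) vertex placement
    cut = sum(1 for u, v in edges if machine[u] != machine[v])
    return cut / len(edges)


def random_graph(n, m, seed=1):
    """Hypothetical test graph: m random edges between n vertices, without self-loops."""
    rng = random.Random(seed)
    edges = []
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.append((u, v))
    return edges


graph = random_graph(2000, 20000)
for p in (2, 10, 100):
    print(p, round(fraction_edges_cut(graph, p), 3), "vs. 1 - 1/p =", 1 - 1 / p)
```

With 10 and 100 machines the measured fractions land near 90% and 99%, which matches the figures quoted on the slide that follows this section.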
5.1 Balanced p-way Vertex-Cut
The PowerGraph abstraction enables a single vertex program to span multiple machines. Hence, we can ensure work balance by evenly assigning edges to machines. Communication is minimized by limiting the number of machines a single vertex spans. A balanced $p$-way vertex-cut formalizes this objective by assigning each edge $e \in E$ to a machine $A(e) \in \{1, \ldots, p\}$. Each vertex then spans the set of machines $A(v) \subseteq \{1, \ldots, p\}$ that contain its adjacent edges. We define the balanced vertex-cut objective:
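$$ \min_{A} \frac{1}{|V|} \sum_{v \in V} |A(v)| \qquad (5.2) $$
$$ \text{s.t.} \quad \max_{m} \left| \{ e \in E \mid A(e) = m \} \right| < \lambda \frac{|E|}{p} \qquad (5.3) $$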
where the imbalance factor $\lambda \geq 1$ is a small constant. We use the term replicas of a vertex $v$ to denote the $|A(v)|$ copies of the vertex $v$: each machine in $A(v)$ has a replica of $v$. The objective term (Eq. 5.2) therefore minimizes the average number of replicas in the graph and, as a consequence, the total storage and communication requirements of the PowerGraph engine.
Vertex-cuts address many of the major issues associated with edge-cuts in power-law graphs. Percolation theory [3] suggests that power-law graphs have good vertex-cuts. Intuitively, by cutting a small fraction of the very high-degree vertices we can quickly shatter a graph. Furthermore, because the balance constraint (Eq. 5.3) ensures that edges are uniformly distributed over machines, we naturally achieve improved work balance even in the presence of very high-degree vertices.
The simplest method to construct a vertex cut is to randomly assign edges to machines. Random (hashed) edge placement is fully data-parallel, achieves nearly perfect balance on large graphs, and can be applied in the streaming setting. In the following we relate the expected normalized replication factor (Eq. 5.2) to the number of machines and the power-law constant $\alpha$.
Theorem 5.2 (Randomized Vertex Cuts). Let $D[v]$ denote the degree of vertex $v$. A uniform random edge placement on $p$ machines has an expected replication factor
E"
1|V | Â
v2V|A(v)|
#=
p|V | Â
v2V
1�✓
1� 1p
◆D[v]!. (5.4)
For a graph with power-law constant $\alpha$ we obtain:
E"
1|V | Â
v2V|A(v)|
#= p� pLia
✓p�1
p
◆/z (a) (5.5)
where $\mathrm{Li}_{\alpha}(x)$ is the transcendental polylog function and $\zeta(\alpha)$ is the Riemann Zeta function (plotted in Fig. 5a).
Higher $\alpha$ values imply a lower replication factor, confirming our earlier intuition. A random 2-way edge-cut requires order $|E|/2$ communication. In contrast, a random 2-way vertex-cut on an $\alpha = 2$ power-law graph requires only order $0.3|V|$ communication. This is a substantial savings on natural graphs, where $|E|$ can be an order of magnitude larger than $|V|$ (see Tab. 1a).
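Eq. (5.4) can be checked numerically. The sketch below compares the formula with a Monte Carlo simulation of random edge placement on a synthetic skewed graph. The `heavy_tailed_edges` generator is a hypothetical stand-in, not an exact power-law model or one of the paper's datasets. The check rests on one observation: a vertex with D[v] edges, each placed uniformly on p machines, touches p(1 - (1 - 1/p)^D[v]) distinct machines in expectation.

```python
import random
from itertools import accumulate


def expected_replication(degrees, p):
    """Eq. (5.4): (p / |V|) * sum_v (1 - (1 - 1/p) ** D[v])."""
    return p / len(degrees) * sum(1 - (1 - 1 / p) ** d for d in degrees.values())


def simulated_replication(edges, p, rng):
    """Place each edge on a uniformly random machine and average |A(v)| over vertices."""
    spans = {}
    for u, v in edges:
        m = rng.randrange(p)
        spans.setdefault(u, set()).add(m)
        spans.setdefault(v, set()).add(m)
    return sum(len(s) for s in spans.values()) / len(spans)


def heavy_tailed_edges(n, m, alpha=1.5, seed=1):
    """Hypothetical skewed graph: endpoint k is drawn with weight 1 / (k + 1) ** alpha."""
    rng = random.Random(seed)
    cum = list(accumulate(1 / (k + 1) ** alpha for k in range(n)))
    edges = []
    while len(edges) < m:
        u, v = rng.choices(range(n), cum_weights=cum, k=2)
        if u != v:
            edges.append((u, v))
    return edges


edges = heavy_tailed_edges(2000, 10000)
degrees = {}
for u, v in edges:
    degrees[u] = degrees.get(u, 0) + 1
    degrees[v] = degrees.get(v, 0) + 1
rng = random.Random(0)
p = 8
trials = [simulated_replication(edges, p, rng) for _ in range(20)]
print(round(expected_replication(degrees, p), 3), round(sum(trials) / len(trials), 3))
```

High-degree hubs end up replicated on nearly all p machines, while degree-1 vertices always have exactly one replica. The average therefore stays well below p.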
5.2 Greedy Vertex-Cuts
We can improve upon the randomly constructed vertex-cut by de-randomizing the edge-placement process. The resulting algorithm is a sequential greedy heuristic which places the next edge on the machine that minimizes the conditional expected replication factor. To construct the de-randomization we consider the task of placing the $(i+1)$-th edge after having placed the previous $i$ edges. Using the conditional expectation we define the objective:
$$ \arg\min_{k} \mathbb{E}\left[ \sum_{v \in V} |A(v)| \;\middle|\; A_i, A(e_{i+1}) = k \right] \qquad (5.6) $$
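Below is a simplified sketch of a sequential greedy placement. It scores each machine only by the replicas the new edge would create, [k ∉ A(u)] + [k ∉ A(v)], rather than by the full conditional expectation in Eq. (5.6). It enforces a hard load cap of ⌈λ|E|/p⌉ in the spirit of the balance constraint (5.3). The function name and the tie-breaking rule are assumptions for illustration.

```python
import math


def greedy_vertex_cut(edges, p, lam=1.0):
    """Sequential greedy edge placement (a simplified sketch, not the paper's exact rules).

    Each edge (u, v) goes to the machine k that creates the fewest new replicas,
    [k not in A(u)] + [k not in A(v)], among machines still under the load cap
    ceil(lam * |E| / p); ties go to the least-loaded machine.
    """
    cap = math.ceil(lam * len(edges) / p)
    spans = {}  # A(v): machines holding a replica of v
    load = [0] * p
    placement = []
    for u, v in edges:
        Au = spans.setdefault(u, set())
        Av = spans.setdefault(v, set())
        eligible = [m for m in range(p) if load[m] < cap]  # never empty since p * cap >= |E|
        k = min(eligible, key=lambda m: ((m not in Au) + (m not in Av), load[m]))
        Au.add(k)
        Av.add(k)
        load[k] += 1
        placement.append(k)
    return placement, spans, load


def replication_factor(spans):
    return sum(len(s) for s in spans.values()) / len(spans)


chain = [(i, i + 1) for i in range(1, 101)]  # hypothetical path graph with 100 edges
_, spans, load = greedy_vertex_cut(chain, p=4)
print(load, round(replication_factor(spans), 3))  # [25, 25, 25, 25] 1.03
```

On this path graph the greedy placement keeps the edges in four contiguous runs, so only the three boundary vertices are replicated. For comparison, Eq. (5.4) gives an expected 4(1 - (3/4)^2) = 1.75 replicas for an interior path vertex under random placement on 4 machines.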
10 Machines → 90% of edges cut
100 Machines → 99% of edges cut!
[Figure: “Program For This” (the original graph) vs. “Run on This” (the graph with a high-degree vertex split across Machine 1 and Machine 2)]
• Split high-degree vertices
• New abstraction → equivalence on split vertices
A Common Pattern in Vertex Programs
• Gather information about neighborhood
• Update vertex
• Signal neighbors & modify edge data
GraphLab_PageRank(i)
  // Gather: compute sum over neighbors
  total = 0
  foreach (j in neighbors(i)):
    total = total + R[j] * wji
  // Update the PageRank, remembering the previous value
  oldR[i] = R[i]
  R[i] = 0.15 + total
  // Signal: trigger neighbors to run again
  priority = |R[i] - oldR[i]|
  if R[i] not converged then
    signal neighbors(i) with priority
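Below is a sketch of prioritized scheduling. It assumes a max-priority queue built on Python's `heapq`: entries store negated priorities, and superseded entries are skipped lazily. The priority of a signal is |R[i] - oldR[i]|, as in the pseudocode. The tolerance and w_ji = 0.85 / outdeg(j) are assumptions.

```python
import heapq
import itertools
import math


def prioritized_pagerank(out_links, tol=1e-6):
    """PageRank that always recomputes the signaled vertex with the largest pending priority."""
    vertices = set(out_links) | {v for targets in out_links.values() for v in targets}
    in_nbrs = {v: [] for v in vertices}
    for j, targets in out_links.items():
        for i in targets:
            in_nbrs[i].append(j)
    R = {v: 1.0 for v in vertices}
    pending = {v: math.inf for v in vertices}  # vertex -> highest priority it was signaled with
    tie = itertools.count()  # tie-breaker so the heap never compares vertex ids
    heap = [(-math.inf, next(tie), v) for v in vertices]
    heapq.heapify(heap)
    updates = 0
    while heap:
        neg_priority, _, i = heapq.heappop(heap)
        if pending.get(i) != -neg_priority:
            continue  # stale entry: superseded by a higher priority or already processed
        del pending[i]
        old = R[i]
        R[i] = 0.15 + sum(0.85 / len(out_links[j]) * R[j] for j in in_nbrs[i])
        updates += 1
        priority = abs(R[i] - old)  # |R[i] - oldR[i]|, as in the pseudocode
        if priority > tol:  # "not converged": signal the vertices that read R[i]
            for k in out_links.get(i, []):
                if priority > pending.get(k, 0.0):
                    pending[k] = priority
                    heapq.heappush(heap, (-priority, next(tie), k))
    return R, updates


ranks, n_updates = prioritized_pagerank({1: [2, 3], 2: [3], 3: [1]})
print({v: round(r, 4) for v, r in sorted(ranks.items())}, n_updates)
```

Large changes are propagated first, and small ones wait. Vertices whose inputs barely moved may never be recomputed at all.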
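GAS Decomposition
[Figure: vertex Y is split across Machines 1–4. Gather: each machine computes a partial sum Σ1–Σ4 over its local edges. Apply: the partials are added into Σ to compute the new value Y’. Y’ is then sent to every copy of Y before Scatter.]
Gather → Apply → Scatter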
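Below is a toy, single-process sketch of the GAS decomposition for PageRank. It assumes edges have already been assigned to machines by a vertex-cut. The `PageRankProgram` class and `gas_step` function are hypothetical illustrations, not PowerGraph's interface. Scatter is only indicated in a comment.

```python
class PageRankProgram:
    """Hypothetical GAS vertex program for PageRank (illustrative names, not PowerGraph's API)."""

    def gather(self, R, out_degree, src):
        return 0.85 / out_degree[src] * R[src]  # contribution of one in-edge src -> dst

    def sum(self, a, b):
        return a + b  # commutative and associative, so partials can merge in any order

    def apply(self, acc):
        return 0.15 + acc


def gas_step(program, edges, edge_machine, R, p):
    """One synchronous Gather-Sum-Apply pass over a graph whose edges are split across p machines."""
    out_degree = {}
    for src, _ in edges:
        out_degree[src] = out_degree.get(src, 0) + 1
    # Gather: each machine accumulates over its local edges only (partials Σ1..Σp).
    partial = [{} for _ in range(p)]
    for (src, dst), m in zip(edges, edge_machine):
        g = program.gather(R, out_degree, src)
        partial[m][dst] = program.sum(partial[m][dst], g) if dst in partial[m] else g
    # Sum: each vertex's master merges one partial per machine that holds its in-edges.
    total, partials_sent = {}, 0
    for local in partial:
        for v, acc in local.items():
            total[v] = program.sum(total[v], acc) if v in total else acc
            partials_sent += 1
    # Apply on the master; Scatter would then push the new value back to every mirror.
    new_R = {v: program.apply(total.get(v, 0.0)) for v in R}
    return new_R, partials_sent


edges = [(2, 1), (3, 1), (4, 1), (1, 2)]
R = {v: 1.0 for v in (1, 2, 3, 4)}
split, msgs_split = gas_step(PageRankProgram(), edges, [0, 1, 2, 0], R, p=3)
whole, msgs_whole = gas_step(PageRankProgram(), edges, [0, 0, 0, 0], R, p=3)
print(msgs_split, msgs_whole)  # 4 2
```

Splitting vertex 1's in-edges over three machines gives the same result as keeping them together. The cost is one partial accumulator per machine the vertex spans, which is why communication grows with the number of machines a vertex spans.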
Minimizing Communication in PowerGraph
[Figure: vertex Y with one master and three mirrors on different machines]
• Communication is linear in the number of machines each vertex spans
• A vertex-cut minimizes the number of machines each vertex spans
• Percolation theory suggests that power-law graphs have good vertex cuts. [Albert et al. 2000]
GraphLab2 System
Machine Learning and Data-Mining Toolkits: Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering
[Stack diagram: the toolkits run on the GraphLab2 System, which runs on MPI/TCP-IP, PThreads, and HDFS over EC2 HPC Nodes]
Apache 2 License http://graphlab.org
PageRank on Twitter Follower Graph
Natural graph with 40M users, 1.4 billion links
[Bar chart: runtime per iteration (axis 0–200) for Hadoop, GraphLab, Twister, Piccolo, and PowerGraph]
PowerGraph gains an order of magnitude by exploiting properties of natural graphs
Hadoop results from [Kang et al. '11]; Twister (in-memory MapReduce) [Ekanayake et al. ‘10]
GraphLab2 is Scalable
Yahoo Altavista Web Graph (2002):
• One of the largest publicly available web graphs
• 1.4 billion webpages, 6.6 billion links
• 64 HPC nodes, 1024 cores (2048 HT)
• 7 seconds per iteration: 1B links processed per second
• 30 lines of user code
Topic Modeling
• English-language Wikipedia
  – 2.6M documents, 8.3M words, 500M tokens
  – Computationally intensive algorithm
[Bar chart: million tokens per second (axis 0–160) for Smola et al. and PowerGraph]
• Smola et al.: 100 Yahoo! machines, specifically engineered for this task
• PowerGraph: 64 cc2.8xlarge EC2 nodes, 200 lines of code & 4 human hours
Triangle Counting on Twitter
40M users, 1.4 billion links
Counted: 34.8 billion triangles
• 64 machines: 15 seconds
• Hadoop [WWW’11]: 1536 machines, 423 minutes
1000x faster
S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
By exploiting common patterns in graph data and computation:
• New ways to represent real-world graphs
• New ways to execute graph algorithms
• Orders of magnitude improvements over existing systems
Possibility
Scalability
Usability
Exciting Time to Work in ML
“With ML, I will cure cancer!!!”
“With ML, I will find true love.”
“Why won’t ML read my mind???”
☹ Building scalable learning systems requires experts …
☺ Unique opportunities to change the world!!
ML key to any new service we want to build
But…
• Even basics of scalable ML can be challenging
• >6 months from prototype to production
• State-of-the-art ML algorithms trapped in research papers
Goal of GraphLab 3: Make large-scale machine learning accessible to all! ☺
Adding a Python Layer
[Stack diagram: a Python API on top of the GraphLab2 System stack (toolkits: Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering; MPI/TCP-IP, PThreads, HDFS; EC2 HPC Nodes)]
Learning ML with GraphLab Notebook
https://beta.graphlab.com/examples
Prototype to Production with Python GraphLab:
• Easily install & prototype locally
• Deploy to the cluster in one step
Learn: GraphLab Notebook
Prototype: pip install graphlab → local prototyping
Production: Same code scales to EC2 cluster
GraphLab Toolkits
Highly scalable, state-of-the-art machine learning straight from Python
• Graph Analytics
• Graphical Models
• Computer Vision
• Clustering
• Topic Modeling
• Collaborative Filtering
Joseph Gonzalez Co-Founder, GraphLab Inc.
NIPS Workshop on Big Learning: biglearn.org, Lake Tahoe, December 9th
Machine Learning on Graphs