Joey Gonzalez, GraphLab, MLconf 2013

Joseph Gonzalez Co-Founder, GraphLab Inc.

[email protected] Postdoc, UC Berkeley AMPLab

[email protected]

Machine Learning on Graphs


Big Graphs Data

More Signal

More Noise

Social Media, Advertising, Science, Web

Graphs encode relationships between: People, Facts, Products, Interests, Ideas

Big: billions of vertices and edges & rich metadata
Facebook (10/2012): 1B users, 144B friendships
Twitter (2011): 15B follower edges

Graphs are Essential to Data Mining and Machine Learning

•  Identify influential people and information
•  Find communities
•  Understand people’s shared interests
•  Model complex data dependencies

Predicting User Behavior

[Figure: a graph of users and posts. A few users are labeled Liberal or Conservative; the remaining users are marked "?" and must be predicted.]

Conditional Random Field, Belief Propagation

Finding Communities

Count triangles passing through each vertex:

Measures “cohesiveness” of local community

More Triangles: Stronger Community

Fewer Triangles: Weaker Community
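The per-vertex triangle count can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the GraphLab implementation; the function name and the example edge list are made up for this sketch.

```python
from itertools import combinations

def triangles_per_vertex(edges):
    """Count the triangles passing through each vertex of an undirected graph."""
    # Build an adjacency set per vertex, ignoring self-loops and duplicate edges.
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    counts = {}
    for v, adj in nbrs.items():
        # A triangle through v is a pair of v's neighbors that are adjacent to each other.
        counts[v] = sum(1 for a, b in combinations(adj, 2) if b in nbrs[a])
    return counts

# Hypothetical example: vertices 1, 2, 3 form a triangle; vertex 4 hangs off vertex 3.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
counts = triangles_per_vertex(example_edges)
print(counts)
```

Each triangle is counted once at each of its three corners, so the number of distinct triangles in the graph is the sum of the counts divided by three.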

Recommending Products

Users, Ratings, Items

Low-Rank Matrix Factorization:

Netflix (Users x Movies) ≈ User Factors (U) x Movie Factors (M)

[Figure: vertices with factors f(1), ..., f(5) joined by rating edges r13, r14, r24, r25; f(i) and f(j) are the factors at the two ends of a rating edge.]

Iterate:

$$ f[i] = \arg\min_{w \in \mathbb{R}^d} \sum_{j \in \mathrm{Nbrs}(i)} \left( r_{ij} - w^T f[j] \right)^2 + \lambda \|w\|_2^2 $$
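The "Iterate" step is a small ridge-regression problem per vertex: setting the gradient to zero gives (sum_j f[j] f[j]^T + lambda I) w = sum_j r_ij f[j]. The sketch below solves that system in plain Python. The helper names, the dictionary layout, and the example ratings are assumptions made for illustration, not GraphLab code.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]

def als_update(i, nbrs, ratings, f, lam):
    """Return the new factor f[i] given the current factors of i's neighbors."""
    d = len(f[i])
    # Start from lam * I, then accumulate f[j] f[j]^T and r_ij * f[j] over neighbors.
    a = [[lam if r == c else 0.0 for c in range(d)] for r in range(d)]
    b = [0.0] * d
    for j in nbrs[i]:
        for r in range(d):
            b[r] += ratings[(i, j)] * f[j][r]
            for c in range(d):
                a[r][c] += f[j][r] * f[j][c]
    return solve(a, b)

# Hypothetical data: user 1 rated movie 3 with r13 = 4 and movie 4 with r14 = 2.
factors = {1: [0.0, 0.0], 3: [1.0, 0.0], 4: [0.0, 1.0]}
nbrs = {1: [3, 4]}
ratings = {(1, 3): 4.0, (1, 4): 2.0}
factors[1] = als_update(1, nbrs, ratings, factors, lam=1.0)
print(factors[1])
```

The update for vertex i reads only the factors of its neighbors, which is what makes the computation graph-parallel. With lam > 0 the system is positive definite, so the solve cannot hit a zero pivot.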

Identifying Leaders

Everyone starts with equal ranks
Update ranks in parallel
Iterate until convergence

Rank of user i = 0.15 + weighted sum of neighbors’ ranks:

$$ R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j] $$
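The three steps (equal starting ranks, parallel update, iterate until convergence) can be sketched as a synchronous loop. This is an illustration only: the talk does not define the edge weights, so the example assumes w_ji = 0.85 / out-degree(j), and the graph and function name are hypothetical.

```python
def pagerank(in_nbrs, weights, tol=1e-10, max_iters=1000):
    """Iterate R[i] = 0.15 + sum_j w_ji * R[j] until no rank moves by more than tol."""
    ranks = {i: 1.0 for i in in_nbrs}  # everyone starts with equal ranks
    for _ in range(max_iters):
        # "In parallel": every new rank is computed from the previous ranks only.
        new = {
            i: 0.15 + sum(weights[(j, i)] * ranks[j] for j in in_nbrs[i])
            for i in in_nbrs
        }
        change = max(abs(new[i] - ranks[i]) for i in in_nbrs)
        ranks = new
        if change < tol:
            break
    return ranks

# Hypothetical graph: 1 -> 2, 2 -> 1, 3 -> 1. Every vertex has out-degree 1,
# so under the assumed weighting every edge has weight 0.85.
in_nbrs = {1: [2, 3], 2: [1], 3: []}
weights = {(2, 1): 0.85, (3, 1): 0.85, (1, 2): 0.85}
ranks = pagerank(in_nbrs, weights)
print(ranks)
```

Vertex 3 has no in-neighbors, so its rank settles at 0.15 after one update, while vertices 1 and 2 keep changing for many iterations.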

Graph-Parallel Algorithms


Model / Alg. State

Computation depends only on the neighbors

Many More Graph Algorithms

•  Collaborative Filtering
   –  Alternating Least Squares
   –  Stochastic Gradient Descent
   –  Tensor Factorization
   –  SVD
•  Structured Prediction
   –  Loopy Belief Propagation
   –  Max-Product Linear Programs
   –  Gibbs Sampling
•  Semi-supervised ML
   –  Graph SSL
   –  CoEM
•  Graph Analytics
   –  PageRank
   –  Shortest Path
   –  Triangle-Counting
   –  Graph Coloring
   –  K-core Decomposition
   –  Personalized PageRank
•  Classification
   –  Neural Networks
   –  Lasso
   …

How should we program graph-parallel algorithms?

Structure of Computation

Data-Parallel: Table (Row, Row, Row, Row) → Result

Graph-Parallel: Dependency Graph

How should we program graph-parallel algorithms?

“Think like a Vertex.” - Pregel [SIGMOD’10]

The Graph-Parallel Abstraction

A user-defined Vertex-Program runs on each vertex

Graph constrains interaction along edges
–  Using messages (e.g., Pregel [PODC’09, SIGMOD’10])
–  Through shared state (e.g., GraphLab [UAI’10, VLDB’12])

Parallelism: run multiple vertex programs simultaneously

The GraphLab Vertex Program

Vertex Programs directly access adjacent vertices and edges

GraphLab_PageRank(i)
  // Compute sum over neighbors
  total = 0
  foreach (j in neighbors(i)):
    total = total + R[j] * wji
  // Update the PageRank
  R[i] = 0.15 + total
  // Trigger neighbors to run again
  if R[i] not converged then
    signal nbrsOf(i) to be recomputed

[Figure: vertices 1 to 4; vertex 1 sums terms such as R[4] * w41 from its neighbors.]

Signaled vertices are recomputed eventually.
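A single-threaded Python sketch of this signaling behavior is shown below. It is not the GraphLab API: the FIFO queue stands in for GraphLab's scheduler, "signal" is read as "enqueue the vertices whose sum includes R[i]", and the graph, weights, and tolerance are assumptions for illustration.

```python
from collections import deque

def dynamic_pagerank(in_nbrs, out_nbrs, weights, tol=1e-9):
    """Run the vertex program only on signaled vertices until nothing is signaled."""
    ranks = {i: 1.0 for i in in_nbrs}
    updates = {i: 0 for i in in_nbrs}
    queue = deque(in_nbrs)   # every vertex is signaled once at the start
    queued = set(in_nbrs)
    while queue:
        i = queue.popleft()
        queued.discard(i)
        # Compute sum over neighbors, then update the PageRank.
        total = sum(ranks[j] * weights[(j, i)] for j in in_nbrs[i])
        new_rank = 0.15 + total
        changed = abs(new_rank - ranks[i]) > tol
        ranks[i] = new_rank
        updates[i] += 1
        # Trigger neighbors to run again only if R[i] has not converged.
        if changed:
            for k in out_nbrs[i]:
                if k not in queued:
                    queue.append(k)
                    queued.add(k)
    return ranks, updates

# Hypothetical graph: 1 -> 2, 2 -> 1, 3 -> 1, all edge weights assumed to be 0.85.
in_nbrs = {1: [2, 3], 2: [1], 3: []}
out_nbrs = {1: [2], 2: [1], 3: [1]}
weights = {(2, 1): 0.85, (3, 1): 0.85, (1, 2): 0.85}
ranks, updates = dynamic_pagerank(in_nbrs, out_nbrs, weights)
print(updates)
```

In this example vertex 3 is never signaled by anyone, so it is updated exactly once, while vertices 1 and 2 are recomputed repeatedly. That is the effect the dynamic schedule exploits.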

Convergence of Dynamic PageRank

[Plot: number of vertices (log scale, 1 to 100,000,000) against number of updates (0 to 70); fewer updates is better.]

51% updated only once!

Adaptive Belief Propagation

[Figure panels: Noisy “Sunset” Image; Graphical Model; Cumulative Vertex Updates (from Few Updates to Many Updates)]

Challenge = Boundaries

Splash

Algorithm identifies and focuses on hidden sequential structure

Graph-parallel Abstractions

Better for Machine Learning

Shared State: Dynamic, Asynchronous

Messaging: Synchronous

Natural Graphs

Graphs derived from natural phenomena

Properties of Natural Graphs

Power-Law Degree Distribution

[Figure panels: Regular Mesh; Natural Graph]

“Star Like” Motif

President Obama Followers

Challenges of High-Degree Vertices

Touches a large fraction of graph

Sequentially process edges

CPU 1 CPU 2

Provably Difficult to Partition

Machine 1 Machine 2

Random Partitioning

•  GraphLab resorts to random (hashed) partitioning on natural graphs

Figure 4: (a) An edge-cut and (b) vertex-cut of a graph into three parts. Shaded vertices are ghosts and mirrors respectively.

5 Distributed Graph Placement

The PowerGraph abstraction relies on the distributed data-graph to store the computation state and encode the interaction between vertex programs. The placement of the data-graph structure and data plays a central role in minimizing communication and ensuring work balance.

A common approach to placing a graph on a cluster of p machines is to construct a balanced p-way edge-cut (e.g., Fig. 4a) in which vertices are evenly assigned to machines and the number of edges spanning machines is minimized. Unfortunately, the tools [21, 31] for constructing balanced edge-cuts perform poorly [1, 26, 23] or even fail on power-law graphs. When the graph is difficult to partition, both GraphLab and Pregel resort to hashed (random) vertex placement. While fast and easy to implement, hashed vertex placement cuts most of the edges:

Theorem 5.1. If vertices are randomly assigned to p machines then the expected fraction of edges cut is:

$$ \mathbb{E}\left[ \frac{|\text{Edges Cut}|}{|E|} \right] = 1 - \frac{1}{p} \qquad (5.1) $$

For example, if just two machines are used, half of the edges will be cut, requiring order |E|/2 communication.
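Theorem 5.1 can be checked exactly on a tiny graph by enumerating every possible placement of vertices onto machines and averaging the cut fraction. The sketch below does this with exact fractions; the four-vertex graph and the function name are hypothetical, and the enumeration is only practical for very small inputs.

```python
from fractions import Fraction
from itertools import product

def expected_cut_fraction(num_vertices, edges, p):
    """Exact expected fraction of edges cut under uniform random vertex placement."""
    cut = 0
    # Enumerate all p ** num_vertices equally likely placements.
    for placement in product(range(p), repeat=num_vertices):
        # An edge is cut when its endpoints land on different machines.
        cut += sum(1 for u, v in edges if placement[u] != placement[v])
    return Fraction(cut, len(edges) * p ** num_vertices)

# Hypothetical graph on vertices 0..3 (no self-loops).
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
for p in (2, 3, 10):
    print(p, expected_cut_fraction(4, edges, p))
```

For any edge, the two endpoints land on the same machine with probability 1/p, so the result is 1 - 1/p regardless of the graph's shape: 1/2 for two machines and 9/10 for ten.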

5.1 Balanced p-way Vertex-Cut

The PowerGraph abstraction enables a single vertex program to span multiple machines. Hence, we can ensure work balance by evenly assigning edges to machines. Communication is minimized by limiting the number of machines a single vertex spans. A balanced p-way vertex-cut formalizes this objective by assigning each edge $e \in E$ to a machine $A(e) \in \{1, \ldots, p\}$. Each vertex then spans the set of machines $A(v) \subseteq \{1, \ldots, p\}$ that contain its adjacent edges. We define the balanced vertex-cut objective:

$$ \min_{A} \; \frac{1}{|V|} \sum_{v \in V} |A(v)| \qquad (5.2) $$

$$ \text{s.t.} \quad \max_{m} \; |\{ e \in E \mid A(e) = m \}| < \lambda \frac{|E|}{p} \qquad (5.3) $$

where the imbalance factor $\lambda \geq 1$ is a small constant. We use the term replicas of a vertex v to denote the |A(v)| copies of the vertex v: each machine in A(v) has a replica of v. The objective term (Eq. 5.2) therefore minimizes the average number of replicas in the graph and as a consequence the total storage and communication requirements of the PowerGraph engine.
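To make the objective concrete, the sketch below evaluates Eq. 5.2 and checks the constraint of Eq. 5.3 for a given edge assignment. It is an illustration under stated assumptions: machines are numbered 0 to p-1 rather than 1 to p, and the star graph, the assignment, and the function names are made up for the example.

```python
def vertex_cut_stats(edges, assignment, p):
    """Return (average replicas per vertex, largest number of edges on one machine)."""
    spans = {}        # vertex -> set of machines A(v) holding one of its edges
    load = [0] * p    # number of edges assigned to each machine
    for (u, v), m in zip(edges, assignment):
        load[m] += 1
        spans.setdefault(u, set()).add(m)
        spans.setdefault(v, set()).add(m)
    replication = sum(len(s) for s in spans.values()) / len(spans)  # Eq. 5.2
    return replication, max(load)

def is_balanced(edges, assignment, p, lam):
    """Check the balance constraint of Eq. 5.3 for imbalance factor lam."""
    _, max_load = vertex_cut_stats(edges, assignment, p)
    return max_load < lam * len(edges) / p

# Hypothetical star graph: center 0 with leaves 1..4, edges split over 2 machines.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
assignment = [0, 0, 1, 1]
replication, max_load = vertex_cut_stats(edges, assignment, p=2)
print(replication, max_load, is_balanced(edges, assignment, p=2, lam=1.1))
```

Only the high-degree center is replicated (it spans both machines), so the average is 6 replicas over 5 vertices, while each machine holds exactly half of the edges.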

Vertex-cuts address many of the major issues associated with edge-cuts in power-law graphs. Percolation theory [3] suggests that power-law graphs have good vertex-cuts. Intuitively, by cutting a small fraction of the very high degree vertices we can quickly shatter a graph. Furthermore, because the balance constraint (Eq. 5.3) ensures that edges are uniformly distributed over machines, we naturally achieve improved work balance even in the presence of very high-degree vertices.

The simplest method to construct a vertex cut is to randomly assign edges to machines. Random (hashed) edge placement is fully data-parallel, achieves nearly perfect balance on large graphs, and can be applied in the streaming setting. In the following we relate the expected normalized replication factor (Eq. 5.2) to the number of machines and the power-law constant $\alpha$.

Theorem 5.2 (Randomized Vertex Cuts). Let D[v] denote the degree of vertex v. A uniform random edge placement on p machines has an expected replication factor

$$ \mathbb{E}\left[ \frac{1}{|V|} \sum_{v \in V} |A(v)| \right] = \frac{p}{|V|} \sum_{v \in V} \left( 1 - \left( 1 - \frac{1}{p} \right)^{D[v]} \right). \qquad (5.4) $$

For a graph with power-law constant $\alpha$ we obtain:

$$ \mathbb{E}\left[ \frac{1}{|V|} \sum_{v \in V} |A(v)| \right] = p - p \, \mathrm{Li}_{\alpha}\!\left( \frac{p-1}{p} \right) / \zeta(\alpha) \qquad (5.5) $$

where $\mathrm{Li}_{\alpha}(x)$ is the transcendental polylog function and $\zeta(\alpha)$ is the Riemann Zeta function (plotted in Fig. 5a).

Higher $\alpha$ values imply a lower replication factor, confirming our earlier intuition. In contrast to a random 2-way edge-cut which requires order |E|/2 communication, a random 2-way vertex-cut on an $\alpha = 2$ power-law graph requires only order 0.3 |V| communication, a substantial savings on natural graphs where E can be an order of magnitude larger than V (see Tab. 1a).
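Both formulas are easy to evaluate numerically. The sketch below computes Eq. 5.4 for an explicit degree list and Eq. 5.5 using truncated series for the polylogarithm and the zeta function. The truncation length and function names are assumptions of this sketch, and the series converge slowly when the power-law constant is close to 1.

```python
def expected_replication(degrees, p):
    """Eq. 5.4: expected replication factor of a uniform random edge placement."""
    miss = 1.0 - 1.0 / p  # probability that one given edge avoids one given machine
    return p / len(degrees) * sum(1.0 - miss ** d for d in degrees)

def power_law_replication(alpha, p, terms=200000):
    """Eq. 5.5, with Li_alpha and zeta approximated by their first `terms` terms."""
    x = (p - 1.0) / p
    polylog = sum(x ** k / k ** alpha for k in range(1, terms + 1))
    zeta = sum(1.0 / k ** alpha for k in range(1, terms + 1))
    return p - p * polylog / zeta

# A degree-1 vertex always has exactly one replica; a degree-2 vertex on two
# machines has 1.5 replicas in expectation.
print(expected_replication([1, 1, 1], p=5))
print(expected_replication([2], p=2))
# Random 2-way vertex-cut on a power-law graph with constant 2.
print(power_law_replication(2.0, p=2))
```

For the constant 2 and p = 2 the replication factor comes out close to 1.29, that is, roughly 0.3 extra replicas per vertex, which matches the "order 0.3 |V|" communication figure quoted in the text.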

5.2 Greedy Vertex-Cuts

We can improve upon the randomly constructed vertex-cut by de-randomizing the edge-placement process. The resulting algorithm is a sequential greedy heuristic which places the next edge on the machine that minimizes the conditional expected replication factor. To construct the de-randomization we consider the task of placing the i+1 edge after having placed the previous i edges. Using the conditional expectation we define the objective:

$$ \arg\min_{k} \; \mathbb{E}\left[ \sum_{v \in V} |A(v)| \;\middle|\; A_i, A(e_{i+1}) = k \right] \qquad (5.6) $$

10 Machines → 90% of edges cut
100 Machines → 99% of edges cut!

Machine 1 Machine 2

•  Split High-Degree vertices
•  New Abstraction → Equivalence on Split Vertices

Program For This

Run on This

A Common Pattern in Vertex Programs

Gather Information About Neighborhood
Update Vertex
Signal Neighbors & Modify Edge Data

GraphLab_PageRank(i)
  // Compute sum over neighbors
  total = 0
  foreach (j in neighbors(i)):
    total = total + R[j] * wji
  // Update the PageRank
  R[i] = 0.15 + total
  // Trigger neighbors to run again
  priority = |R[i] - oldR[i]|
  if R[i] not converged then
    signal neighbors(i) with priority

GAS Decomposition

[Figure: vertex Y spans Machines 1 to 4 as one Master and three Mirrors. Gather: the partial sums Σ1 + Σ2 + Σ3 + Σ4 are combined into Σ. Apply: Y is updated to Y’. Scatter: Y’ is sent to every copy of the vertex.]
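The Gather, Apply, Scatter pattern can be sketched for PageRank as follows. This is a single-process illustration, not the PowerGraph API: each "machine" is just a list of the edges it stores, and the function names, edge layout, weights, and tolerance are assumptions.

```python
def gather(rank_j, w_ji):
    """Gather: contribution of one in-edge (j, i) to the sum for vertex i."""
    return rank_j * w_ji

def apply_update(total):
    """Apply: new value of the vertex computed from the combined sum."""
    return 0.15 + total

def gas_pagerank_step(i, machines, ranks, weights, out_nbrs, tol=1e-9):
    """Run one Gather-Apply-Scatter step for vertex i across all machines."""
    # Gather runs on every machine over the in-edges of i that it stores locally.
    partial_sums = [
        sum(gather(ranks[j], weights[(j, dst)]) for (j, dst) in local_edges if dst == i)
        for local_edges in machines
    ]
    # Only one partial sum per machine has to be combined, not one value per edge.
    total = sum(partial_sums)
    old = ranks[i]
    ranks[i] = apply_update(total)
    # Scatter: signal the vertices that read R[i] if the value changed.
    signaled = list(out_nbrs[i]) if abs(ranks[i] - old) > tol else []
    return partial_sums, signaled

# Hypothetical layout: edges 2 -> 1, 3 -> 1, 1 -> 2 split across two machines.
machines = [[(2, 1)], [(3, 1), (1, 2)]]
ranks = {1: 1.0, 2: 1.0, 3: 1.0}
weights = {(2, 1): 0.85, (3, 1): 0.85, (1, 2): 0.85}
out_nbrs = {1: [2], 2: [1], 3: [1]}
partial_sums, signaled = gas_pagerank_step(1, machines, ranks, weights, out_nbrs)
print(partial_sums, ranks[1], signaled)
```

Because the gather result is an associative, commutative sum, a vertex split across machines needs to exchange only one number per machine, which is why the decomposition works on split high-degree vertices.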

Minimizing Communication in PowerGraph

Communication is linear in the number of machines each vertex spans

A vertex-cut minimizes machines each vertex spans

Percolation theory suggests that power law graphs have good vertex cuts. [Albert et al. 2000]

GraphLab2 System

Machine Learning and Data-Mining Toolkits: Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering

MPI/TCP-IP, PThreads, HDFS

EC2 HPC Nodes

Apache 2 License
http://graphlab.org

PageRank on Twitter Follower Graph

Natural Graph with 40M Users, 1.4 Billion Links

[Bar chart: Runtime Per Iteration (axis 0 to 200) for Hadoop, GraphLab, Twister, Piccolo, and PowerGraph]

Order of magnitude by exploiting properties of Natural Graphs

Hadoop results from [Kang et al. '11]
Twister (in-memory MapReduce) [Ekanayake et al. ‘10]

GraphLab2 is Scalable

Yahoo Altavista Web Graph (2002): One of the largest publicly available web graphs

1.4 Billion Webpages, 6.6 Billion Links

64 HPC Nodes, 1024 Cores (2048 HT)

7 Seconds per Iter.
1B links processed per second
30 lines of user code

Topic Modeling

English language Wikipedia
–  2.6M Documents, 8.3M Words, 500M Tokens
–  Computationally intensive algorithm

[Bar chart: Million Tokens Per Second (axis 0 to 160) for Smola et al. and PowerGraph]

Smola et al.: 100 Yahoo! Machines, specifically engineered for this task

PowerGraph: 64 cc2.8xlarge EC2 Nodes, 200 lines of code & 4 human hours

Triangle Counting on Twitter

40M Users, 1.4 Billion Links

Counted: 34.8 Billion Triangles

64 Machines, 15 Seconds

Hadoop [WWW’11]: 1536 Machines, 423 Minutes

1000 x Faster

S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11

By exploiting common patterns in graph data and computation:

New ways to represent real-world graphs

New ways to execute graph algorithms

Orders of magnitude improvements over existing systems

Possibility

Scalability

Usability

Exciting Time to Work in ML

With ML, I will cure cancer!!!

With ML, I will find true love.

Why won’t ML read my mind???

☹ Building scalable learning systems requires experts …

☺ Unique opportunities to change the world!!

ML key to any new service we want to build

But…

Even basics of scalable ML can be challenging

>6 months from prototype to production

State-of-art ML algorithms trapped in research papers

Goal of GraphLab 3: Make large-scale machine learning accessible to all! ☺

Adding a Python Layer

Python API

Toolkits: Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering

GraphLab2 System

MPI/TCP-IP, PThreads, HDFS

EC2 HPC Nodes

Learning ML with GraphLab Notebook

https://beta.graphlab.com/examples

Prototype to Production with Python GraphLab:

Easily install & prototype locally

Deploy to the cluster in one step

Learn: GraphLab Notebook

Prototype: pip install graphlab → local prototyping

Production: Same code scales to EC2 cluster

GraphLab Toolkits

Highly scalable, state-of-the-art machine learning straight from Python

Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering

Joseph Gonzalez Co-Founder, GraphLab Inc.

[email protected]

[email protected]

NIPS Workshop on Big Learning: biglearn.org
Lake Tahoe, December 9th

Machine Learning on Graphs

