© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Large-Scale Machine Learning and Graphs Carlos Guestrin November 15, 2013

GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Jan 15, 2015


GraphLab is like Hadoop for graphs in that it enables users to easily express and execute machine learning algorithms on massive graphs. In this session, we illustrate how GraphLab leverages Amazon EC2 and advances in graph representation, asynchronous communication, and scheduling to achieve orders-of-magnitude performance gains over systems like Hadoop on real-world data.
Transcript
Page 1: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Large-Scale Machine Learning and Graphs

Carlos Guestrin

November 15, 2013

Page 2: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

PHASE 1: POSSIBILITY

Page 3: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
Page 4: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

PHASE 2: SCALABILITY

Page 5: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
Page 6: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

PHASE 3: USABILITY

Page 7: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
Page 8: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Three Phases in Technological Development

1. Possibility
2. Scalability
3. Usability: wide adoption beyond experts & enthusiasts

Page 9: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Machine Learning PHASE 1: POSSIBILITY

Page 10: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Rosenblatt 1957

Page 11: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
Page 12: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Machine Learning PHASE 2: SCALABILITY

Page 13: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Needless to Say, We Need Machine Learning for Big Data

• YouTube: 72 hours a minute
• Wikipedia: 28 million pages
• Facebook: 1 billion users
• Flickr: 6 billion photos

“… data a new class of economic asset, like currency or gold.”

Page 14: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Big Learning: How will we design and implement parallel learning systems?

Page 15: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

MapReduce for Data-Parallel ML

Excellent for large data-parallel tasks!

Data-Parallel (MapReduce):
• Cross Validation
• Feature Extraction
• Computing Sufficient Statistics

Graph-Parallel: Is there more to Machine Learning?

Page 16: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
Page 17: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
Page 18: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
Page 19: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

The Power of Dependencies: where the value is!

Page 20: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Flashback to 1998

Why?

First Google advantage: a Graph Algorithm & a System to Support it!

Page 21: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

It’s all about the graphs…

Page 22: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Graphs: Social Media, Advertising, Science, Web

• Graphs encode the relationships between: People, Facts, Products, Interests, Ideas
• Big: 100s of billions of vertices and edges, plus rich metadata
– Facebook (10/2012): 1B users, 144B friendships
– Twitter (2011): 15B follower edges

Page 23: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Examples of Graphs in Machine Learning

Page 24: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Label a Face and Propagate

Page 25: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Pairwise similarity not enough…

Not similar enough to be sure

Page 26: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Propagate Similarities & Co-occurrences for Accurate Predictions

• similarity edges
• co-occurring faces: further evidence

Probabilistic Graphical Models

Page 27: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Collaborative Filtering: Exploiting Dependencies

• City of God
• Wild Strawberries
• The Celebration
• La Dolce Vita
• Women on the Verge of a Nervous Breakdown

What do I recommend???

Latent Factor Models

Non-negative Matrix Factorization

Page 28: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Estimate Political Bias

[Figure: graph of posts; a few are labeled Liberal or Conservative, the rest are unlabeled (?)]

Semi-Supervised & Transductive Learning

Page 29: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Topic Modeling

[Figure: example words: Cat, Apple, Growth, Hat, Plant]

LDA and co.

Page 30: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Machine Learning Pipeline

Data → Extract Features → Graph Formation → Structured Machine Learning Algorithm → Value from Data

• Data: images, docs, movie ratings, social activity
• Value from Data: face labels, doc topics, movie recommendations, sentiment analysis

Page 31: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

ML Tasks Beyond Data-Parallelism

Data-Parallel (MapReduce):
• Cross Validation
• Feature Extraction
• Computing Sufficient Statistics

Graph-Parallel:
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Semi-Supervised Learning: Label Propagation, CoEM
• Graph Analysis: PageRank, Triangle Counting
• Collaborative Filtering: Tensor Factorization

Page 32: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Example of a Graph-Parallel Algorithm

Page 33: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

PageRank

What’s the rank of this user?

• Depends on rank of who follows her
• Depends on rank of who follows them…

Loops in graph: Must iterate!

Page 34: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

PageRank Iteration

Iterate until convergence: “My rank is weighted average of my friends’ ranks”

$R[i] \leftarrow \alpha + (1-\alpha) \sum_{j \in N[i]} w_{ji} \times R[j]$

– α is the random reset probability
– $w_{ji}$ is the probability of transitioning (similarity) from j to i
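The iteration on this slide can be sketched in a few lines of Python. This is a minimal illustration, not GraphLab code: the function name `pagerank`, the `in_edges` layout, the values of `alpha` and `tol`, and the toy graph are all assumptions made for the example, and each weight is taken to be 1 / out_degree(j).

```python
def pagerank(in_edges, alpha=0.15, tol=1e-10, max_iters=1000):
    """Repeat R[i] <- alpha + (1 - alpha) * sum_j w_ji * R[j] until ranks stop moving.

    in_edges maps each vertex i to a list of (j, w_ji) pairs (hypothetical layout).
    """
    rank = {i: 1.0 for i in in_edges}
    for _ in range(max_iters):
        # Synchronous sweep: every vertex reads the previous iteration's ranks.
        new_rank = {
            i: alpha + (1 - alpha) * sum(w * rank[j] for j, w in nbrs)
            for i, nbrs in in_edges.items()
        }
        delta = max(abs(new_rank[i] - rank[i]) for i in rank)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph (assumed): a -> b, a -> c, b -> c, c -> a, with w_ji = 1 / out_degree(j).
in_edges = {
    "a": [("c", 1.0)],
    "b": [("a", 0.5)],
    "c": [("a", 0.5), ("b", 1.0)],
}
ranks = pagerank(in_edges)
```

Because the graph has a loop (a → c → a), no single pass gives the answer; the sweep has to be repeated until the ranks settle.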

Page 35: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Properties of Graph Parallel Algorithms

• Dependency Graph
• Local Updates
• Iterative Computation (My Rank ← Friends’ Rank)

Page 36: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

The Need for a New Abstraction

Data-Parallel (MapReduce):
• Cross Validation
• Feature Extraction
• Computing Sufficient Statistics

Graph-Parallel:
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Semi-Supervised Learning: Label Propagation, CoEM
• Data-Mining: PageRank, Triangle Counting
• Collaborative Filtering: Tensor Factorization

• Need: Asynchronous, Dynamic Parallel Computations

Page 37: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

The GraphLab Goals

Know how to solve ML problem on 1 machine → Efficient parallel predictions

Page 38: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

POSSIBILITY

Page 39: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Data Graph: Data associated with vertices and edges

Graph:
• Social Network

Vertex Data:
• User profile text
• Current interests estimates

Edge Data:
• Similarity weights
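One way to picture the data graph is as plain Python containers. This is a sketch under assumptions: the class and field names (`VertexData`, `DataGraph`, `profile_text`, `interests`) are hypothetical and only mirror the bullets on this slide; they are not GraphLab's API.

```python
from dataclasses import dataclass, field


@dataclass
class VertexData:
    profile_text: str                              # user profile text
    interests: dict = field(default_factory=dict)  # current interest estimates


@dataclass
class DataGraph:
    vertices: dict = field(default_factory=dict)   # vertex id -> VertexData
    edges: dict = field(default_factory=dict)      # (src, dst) -> similarity weight

    def add_vertex(self, vid, data):
        self.vertices[vid] = data

    def add_edge(self, src, dst, weight):
        self.edges[(src, dst)] = weight

    def in_neighbors(self, vid):
        """Return (source id, edge weight) for every edge pointing at vid."""
        return [(s, w) for (s, d), w in self.edges.items() if d == vid]


# Tiny social network (made-up data for illustration).
g = DataGraph()
g.add_vertex("alice", VertexData("likes hiking", {"outdoors": 0.9}))
g.add_vertex("bob", VertexData("likes films"))
g.add_edge("alice", "bob", 0.8)
```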

Page 40: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

How do we program graph computation?

“Think like a Vertex.” -Malewicz et al. [SIGMOD’10]

Page 41: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Update Functions

User-defined program: applied to a vertex, transforms data in the scope of the vertex

pagerank(i, scope){
  // Get Neighborhood data
  (R[i], wij, R[j]) ← scope;

  // Update the vertex data
  $R[i] \leftarrow \alpha + (1-\alpha) \sum_{j \in N[i]} w_{ji} \times R[j]$;

  // Reschedule Neighbors if needed (dynamic computation)
  if R[i] changes then
    reschedule_neighbors_of(i);
}

Update function applied (asynchronously) in parallel until convergence

Many schedulers available to prioritize computation
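The dynamic part of this slide, rescheduling neighbors only when a vertex changes, can be sketched with a simple FIFO scheduler. The sketch runs sequentially, so it only imitates asynchronous execution: each update immediately sees the latest neighbor values. The names `dynamic_pagerank`, `in_edges`, `out_nbrs`, the tolerance, and the toy graph are assumptions for illustration, not GraphLab's scheduler API.

```python
from collections import deque


def dynamic_pagerank(in_edges, out_nbrs, alpha=0.15, tol=1e-9, max_updates=1_000_000):
    rank = {v: 1.0 for v in in_edges}
    queue = deque(in_edges)        # FIFO scheduler: every vertex starts scheduled
    scheduled = set(in_edges)
    updates = 0
    while queue and updates < max_updates:
        i = queue.popleft()
        scheduled.discard(i)
        # Update function: reads the scope of i (its in-neighbors) and rewrites R[i].
        new = alpha + (1 - alpha) * sum(w * rank[j] for j, w in in_edges[i])
        changed = abs(new - rank[i]) > tol
        rank[i] = new
        updates += 1
        if changed:
            # Reschedule neighbors whose input just changed.
            for k in out_nbrs[i]:
                if k not in scheduled:
                    scheduled.add(k)
                    queue.append(k)
    return rank, updates


# Toy graph (assumed): a -> b, a -> c, b -> c, c -> a, with w_ji = 1 / out_degree(j).
in_edges = {"a": [("c", 1.0)], "b": [("a", 0.5)], "c": [("a", 0.5), ("b", 1.0)]}
out_nbrs = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank, updates = dynamic_pagerank(in_edges, out_nbrs)
```

Swapping the `deque` for a priority queue keyed on the size of the last change would give a prioritized scheduler in the same spirit.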

Page 42: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

The GraphLab Framework

• Graph Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model

Page 43: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Bayesian Tensor Factorization, Gibbs Sampling, Dynamic Block Gibbs Sampling, Matrix Factorization, Lasso, SVM, Belief Propagation, PageRank, CoEM, K-Means, SVD, LDA, Linear Solvers, Splash Sampler, Alternating Least Squares, …Many others…

Page 44: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Never Ending Learner Project (CoEM)

• Hadoop: 95 Cores, 7.5 hrs
• Distributed GraphLab: 32 EC2 machines, 80 secs (0.3% of Hadoop time)

2 orders of mag faster
2 orders of mag cheaper
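As a quick check of the timing figures on this slide (wall-clock time only; the two runs used different hardware, and the cost claim cannot be derived from the numbers shown):

```latex
\[
\frac{80\ \text{s}}{7.5\ \text{h} \times 3600\ \text{s/h}}
  = \frac{80}{27000} \approx 0.003 = 0.3\%,
\qquad
\frac{27000}{80} \approx 338\times \quad (\text{between } 10^{2} \text{ and } 10^{3})
\]
```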

Page 45: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

– ML algorithms as vertex programs
– Asynchronous execution and consistency models

Page 46: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Thus far… GraphLab 1 provided exciting scaling performance

But… We couldn’t scale up to Altavista Webgraph 2002: 1.4B vertices, 6.7B edges

Page 47: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Natural Graphs

[Image from WikiCommons]

Page 48: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Problem: Existing distributed graph computation systems perform poorly on Natural Graphs

Page 49: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Achilles Heel: Idealized Graph Assumption

Assumed…
• Small degree
• Easy to partition

But, Natural Graphs…
• Many high degree vertices (power-law degree distribution)
• Very hard to partition

Page 50: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Power-Law Degree Distribution

[Figure: log-log plot of number of vertices (count) against degree for the AltaVista WebGraph, 1.4B Vertices, 6.6B Edges]

High-Degree Vertices: 1% of vertices adjacent to 50% of edges

Page 51: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

High Degree Vertices are Common

[Figure: three example graphs. Netflix: Users and Movies, with Popular Movies as high-degree vertices. Social graph: “Social” People (Obama). LDA: Docs and Words with hyper parameters α and β, with Common Words as high-degree vertices]

Page 52: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Power-Law Degree Distribution

“Star Like” Motif: President Obama and Followers

Page 53: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Problem: High Degree Vertices → High Communication for Distributed Updates

Data transmitted across network: O(# cut edges)

• Natural graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
• Popular partitioning tools (Metis, Chaco,…) perform poorly [Abou-Rjeili et al. 06]: extremely slow and require substantial memory

Page 54: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Random Partitioning

• GraphLab 1, Pregel, Twitter, Facebook,… all rely on random (hashed) partitioning for natural graphs

For p machines, the expected fraction of edges cut is $1 - 1/p$:
– 10 machines: 90% of edges cut
– 100 machines: 99% of edges cut!

All data is communicated… Little advantage over MapReduce
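The 90% and 99% figures follow from a short argument: under hashed placement each endpoint of an edge lands on one of p machines uniformly at random, so both endpoints share a machine with probability 1/p and the edge is cut with probability 1 - 1/p. The sketch below checks this against a small simulation; the function names and the random test graph are assumptions for illustration.

```python
import random


def expected_cut_fraction(p):
    """Probability that a random edge spans two machines under hashed placement."""
    return 1.0 - 1.0 / p


def simulated_cut_fraction(edges, p, seed=0):
    """Place every vertex on a random machine and count the edges that are cut."""
    rng = random.Random(seed)
    machine = {}

    def place(v):
        if v not in machine:
            machine[v] = rng.randrange(p)
        return machine[v]

    cut = sum(1 for u, v in edges if place(u) != place(v))
    return cut / len(edges)


# Random test graph (assumed): 50,000 edges over 5,000 vertices, no self-loops.
rng = random.Random(1)
edges = []
while len(edges) < 50_000:
    u, v = rng.randrange(5_000), rng.randrange(5_000)
    if u != v:
        edges.append((u, v))

simulated = simulated_cut_fraction(edges, p=10)
```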

Page 55: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

In Summary

GraphLab 1 and Pregel are not well suited for natural graphs

• Poor performance on high-degree vertices
• Low Quality Partitioning

Page 56: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

PowerGraph

SCALABILITY

2

Page 57: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Common Pattern for Update Fncs.

GraphLab_PageRank(i)
  // Gather Information About Neighborhood:
  // compute sum over neighbors
  total = 0
  foreach (j in in_neighbors(i)):
    total = total + R[j] * wji

  // Apply Update to Vertex:
  // update the PageRank
  R[i] = 0.1 + total

  // Scatter Signal to Neighbors & Modify Edge Data:
  // trigger neighbors to run again
  if R[i] not converged then
    foreach (j in out_neighbors(i))
      signal vertex-program on j

Page 58: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

GAS Decomposition

• Gather (Reduce): Accumulate information about neighborhood (parallel “sum”: Y + … + Y → Σ)
• Apply: Apply the accumulated value to center vertex (Y, Σ → Y’)
• Scatter: Update adjacent edges and vertices.
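The three phases can be sketched as four small functions plus a driver loop. This is a single-machine illustration under assumptions, not PowerGraph's engine: the names `gather`, `combine`, `apply`, `scatter`, and `run_gas`, the value of `ALPHA`, and the toy graph are all hypothetical. The point is that the gather results are merged with a commutative, associative sum, which is what lets an engine compute them in parallel.

```python
from functools import reduce

ALPHA = 0.15  # reset probability (assumed value)


def gather(src_rank, weight):
    """Gather: contribution of one in-neighbor."""
    return src_rank * weight


def combine(a, b):
    """Parallel 'sum': commutative and associative, so partial sums can be merged."""
    return a + b


def apply(old_rank, total):
    """Apply: fold the accumulated value into the center vertex."""
    return ALPHA + (1 - ALPHA) * total


def scatter(old_rank, new_rank, nbrs, tol):
    """Scatter: signal out-neighbors only if the center vertex changed."""
    return set(nbrs) if abs(new_rank - old_rank) > tol else set()


def run_gas(in_edges, out_nbrs, tol=1e-9, max_rounds=10_000):
    rank = {v: 1.0 for v in in_edges}
    active = set(in_edges)
    for _ in range(max_rounds):
        if not active:
            break
        new_rank, signaled = dict(rank), set()
        for v in active:
            total = reduce(combine, (gather(rank[j], w) for j, w in in_edges[v]), 0.0)
            new_rank[v] = apply(rank[v], total)
            signaled |= scatter(rank[v], new_rank[v], out_nbrs[v], tol)
        rank, active = new_rank, signaled
    return rank


# Toy graph (assumed): a -> b, a -> c, b -> c, c -> a, with w_ji = 1 / out_degree(j).
in_edges = {"a": [("c", 1.0)], "b": [("a", 0.5)], "c": [("a", 0.5), ("b", 1.0)]}
out_nbrs = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = run_gas(in_edges, out_nbrs)
```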

Page 59: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Many ML Algorithms fit into GAS Model: graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, …

Page 60: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Minimizing Communication in GL2 PowerGraph: Vertex Cuts

• Communication linear in # spanned machines
• A vertex-cut minimizes # machines per vertex
• Percolation theory suggests Power Law graphs can be split by removing only a small set of vertices [Albert et al. 2000]: small vertex cuts possible!
• GL2 PowerGraph includes novel vertex cut algorithms
• Provides order of magnitude gains in performance
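To see why the number of spanned machines is the quantity that matters, the sketch below assigns each edge (rather than each vertex) to a machine and measures the average number of machines a vertex spans. It uses plain random edge placement, which is an assumption made for simplicity and not one of the vertex-cut algorithms the slide refers to; the name `replication_factor` and the star graph are likewise illustrative.

```python
import random
from collections import defaultdict


def replication_factor(edges, p, seed=0):
    """Average number of machines spanned per vertex under random edge placement."""
    rng = random.Random(seed)
    spans = defaultdict(set)
    for u, v in edges:
        m = rng.randrange(p)   # the whole edge lives on machine m
        spans[u].add(m)
        spans[v].add(m)
    return sum(len(s) for s in spans.values()) / len(spans)


# Star graph (assumed): one hub with 1000 followers, split over 8 machines.
p = 8
star = [(0, leaf) for leaf in range(1, 1001)]
rf = replication_factor(star, p)
```

In this example every follower lives on exactly one machine and the hub is replicated on at most `p` machines, so the hub's communication is bounded by the machine count rather than by its 1000 edges.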

Page 61: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

From the Abstraction

to a System

2

Page 62: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Triangle Counting on Twitter Graph: 34.8 Billion Triangles

• Hadoop [WWW’11]: 1636 Machines, 423 Minutes
• GraphLab: 64 Machines, 15 Seconds

Why? Wrong Abstraction: Broadcast O(degree²) messages per Vertex

S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
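For readers unfamiliar with the task, triangle counting can be sketched on one machine by intersecting the neighbor sets at the two ends of every edge. This is a generic textbook approach shown under assumptions (the function name and edge-list input are hypothetical); the slides do not show the implementation behind the reported timings.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as a list of (u, v) pairs."""
    nbrs = {}
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
        unique_edges.add((min(u, v), max(u, v)))
    # Each common neighbor of an edge's endpoints closes one triangle,
    # and every triangle is found once per each of its three edges.
    total = sum(len(nbrs[u] & nbrs[v]) for u, v in unique_edges)
    return total // 3


k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]   # complete graph on 4 vertices
tailed = [(1, 2), (2, 3), (1, 3), (3, 4)]               # one triangle plus a tail
```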

Page 63: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Topic Modeling (LDA)

• English language Wikipedia
– 2.6M Documents, 8.3M Words, 500M Tokens
– Computationally intensive algorithm

64 cc2.8xlarge EC2 Nodes

Specifically engineered for this task

200 lines of code & 4 human hours

Page 64: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

How well does GraphLab scale?

Yahoo Altavista Web Graph (2002): One of the largest publicly available webgraphs

• 1.4B Webpages, 6.7 Billion Links
• 64 HPC Nodes: 1024 Cores (2048 HT), 4.4 TB RAM
• 7 seconds per iter.
• 1B links processed per second
• 30 lines of user code

Page 65: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

GraphChi: Going small with GraphLab

Solve huge problems on small or embedded devices?

Key: Exploit non-volatile memory (starting with SSDs and HDs)

Page 66: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

GraphChi – disk-based GraphLab

Challenge: Random Accesses

Novel GraphChi solution: Parallel sliding windows method minimizes number of random accesses

Page 67: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Triangle Counting on Twitter Graph

40M Users, 1.2B Edges; Total: 34.8 Billion Triangles

• Hadoop: 1636 Machines, 423 Minutes (Hadoop results from [Suri & Vassilvitskii '11])
• GraphLab: 64 Machines, 1024 Cores, 15 Seconds
• GraphChi: 1 Mac Mini, 59 Minutes!

Page 68: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

– ML algorithms as vertex programs
– Asynchronous execution and consistency models

GL2 PowerGraph:
– Natural graphs change the nature of computation
– Vertex cuts and gather/apply/scatter model

Page 69: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

GL2 PowerGraph focused on Scalability at the loss of Usability

Page 70: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

GraphLab 1: Explicitly described operations

PageRank(i, scope){
  acc = 0
  for (j in InNeighbors) {
    acc += pr[j] * edge[j].weight
  }
  pr[i] = 0.15 + 0.85 * acc
}

Code is intuitive

Page 71: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

GraphLab 1: Explicitly described operations. Code is intuitive.

PageRank(i, scope){
  acc = 0
  for (j in InNeighbors) {
    acc += pr[j] * edge[j].weight
  }
  pr[i] = 0.15 + 0.85 * acc
}

GL2 PowerGraph: Implicit operation, implicit aggregation. Need to understand engine to understand code.

gather(edge) {
  return edge.source.value * edge.weight
}

merge(acc1, acc2) {
  return acc1 + acc2
}

apply(v, accum) {
  v.pr = 0.15 + 0.85 * accum
}

Page 72: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

What now?

• GraphLab 1: great flexibility, but hit scalability wall
• GL2 PowerGraph: scalability, but very rigid abstraction (many contortions needed to implement SVD++, Restricted Boltzmann Machines)

Page 73: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

WarpGraph

USABILITY

3

Page 74: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

GL3 WarpGraph Goals

• Program Like GraphLab 1
• Run Like GraphLab 2

Page 75: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Fine-Grained Primitives

Expose Neighborhood Operations through Parallelizable Iterators

PageRankUpdateFunction(Y) {
  Y.pagerank = 0.15 + 0.85 *
    MapReduceNeighbors(
      lambda nbr: nbr.pagerank*nbr.weight,
      lambda (a,b): a + b        // aggregate sum over neighbors
    )
}
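A runnable Python analogue of this update function is sketched below. It is a sequential stand-in, not WarpGraph's API: `map_reduce_neighbors`, the `Vertex` and `Neighbor` classes, and the sample values are assumptions for illustration. A real engine could evaluate the map calls in parallel because the reduce function is an associative sum.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Neighbor:
    pagerank: float
    weight: float     # weight of the edge from this neighbor to the center vertex


@dataclass
class Vertex:
    pagerank: float = 1.0
    in_neighbors: list = field(default_factory=list)


def map_reduce_neighbors(vertex, map_fn, reduce_fn, initial=0.0):
    """Map over the in-neighbors, then fold the mapped values with reduce_fn."""
    return reduce(reduce_fn, (map_fn(n) for n in vertex.in_neighbors), initial)


def pagerank_update(y):
    y.pagerank = 0.15 + 0.85 * map_reduce_neighbors(
        y,
        lambda nbr: nbr.pagerank * nbr.weight,   # map: one neighbor's contribution
        lambda a, b: a + b,                      # reduce: aggregate sum over neighbors
    )


y = Vertex(in_neighbors=[Neighbor(1.0, 0.5), Neighbor(2.0, 0.25)])
pagerank_update(y)
```

Unlike the gather/merge/apply split, the whole update reads as one ordinary function, which is the usability argument this slide is making.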

Page 76: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Expressive, Extensible Neighborhood API

• MapReduce over Neighbors: parallel sum (Y + Y + … + Y)
• Parallel Transform Adjacent Edges: modify adjacent edges
• Broadcast: schedule a selected subset of adjacent vertices

Page 77: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Can express every GL2 PowerGraph program (more easily) in GL3 WarpGraph

But GL3 is more expressive:
• Multiple gathers
• Scatter before gather
• Conditional execution

UpdateFunction(v) {
  if (v.data == 1)
    accum = MapReduceNeighs(g,m)
  else ...
}

Page 78: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Graph Coloring

Twitter Graph: 41M Vertices, 1.4B Edges

32 Nodes x 16 Cores (EC2 HPC cc2.8x)

• GL2 PowerGraph: 227 seconds
• GL3 WarpGraph: 89 seconds (2.5x faster)

WarpGraph outperforms PowerGraph with simpler code

Page 79: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

– ML algorithms as vertex programs
– Asynchronous execution and consistency models

GL2 PowerGraph:
– Natural graphs change the nature of computation
– Vertex cuts and gather/apply/scatter model

GL3 WarpGraph:
– Usability is key
– Access neighborhood through parallelizable iterators and latency hiding

Page 80: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Usability

Consensus that WarpGraph is much easier to use than PowerGraph

“User study” group biased… :-)

RECENT RELEASE: GRAPHLAB 2.2, INCLUDING WARPGRAPH ENGINE

And support for streaming/dynamic graphs!

Page 81: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Usability for Whom???

… GL3

WarpGraph GL2

PowerGraph

Page 82: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Machine Learning

PHASE 3

USABILITY

Page 83: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Exciting Time to Work in ML

“With Big Data, I’ll take over the world!!!”

“We met because of Big Data”

“Why won’t Big Data read my mind???”

Unique opportunities to change the world!!

But, every deployed system is a one-off solution, and requires PhDs to make work…

Page 84: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

ML key to any new service we want to build

But…
• Even basics of scalable ML can be challenging
• 6 months from R/Matlab to production, at best
• State-of-art ML algorithms trapped in research papers

Goal of GraphLab 3: Make huge-scale machine learning accessible to all!

Page 85: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Step 1: Learning ML in Practice with GraphLab Notebook

Page 86: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Step 2: GraphLab+Python: ML Prototype to Production

Page 87: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

• Learn: GraphLab Notebook
• Prototype: pip install graphlab (local prototyping)
• Production: Same code scales, execute on EC2 cluster

Page 88: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Step 3: GraphLab Toolkits: Integrated State-of-the-Art ML in Production

Page 89: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

GraphLab Toolkits

Highly scalable, state-of-the-art machine learning straight from Python

• Graph Analytics
• Graphical Models
• Computer Vision
• Clustering
• Topic Modeling
• Collaborative Filtering

Page 90: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Now with GraphLab: Learn/Prototype/Deploy

• Even basics of scalable ML can be challenging → Learn ML with GraphLab Notebook
• 6 months from R/Matlab to production, at best → pip install graphlab, then deploy on EC2
• State-of-art ML algorithms trapped in research papers → Fully integrated via GraphLab Toolkits

Page 91: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

We’re selecting strategic partners

• Help define our strategy & priorities
• And, get the value of GraphLab in your company

[email protected]

Page 92: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Possibility

Scalability

Usability

GraphLab 2.2 available now: graphlab.com

Define our future: [email protected]

Needless to say: [email protected]

Page 93: GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

BDT204