SGD ON HADOOP FOR BIG DATA & HUGE MODELS Alex Beutel Based on work done with Abhimanu Kumar, Vagelis Papalexakis, Partha Talukdar, Qirong Ho, Christos Faloutsos, and Eric Xing

Jan 12, 2016

Page 1:

SGD ON HADOOP FOR BIG DATA & HUGE MODELS

Alex Beutel

Based on work done with Abhimanu Kumar, Vagelis Papalexakis, Partha Talukdar, Qirong Ho, Christos Faloutsos, and Eric Xing

Page 2:

Outline

1. When to use SGD for distributed learning
2. Optimization
   • Review of DSGD
   • SGD for Tensors
   • SGD for ML models – topic modeling, dictionary learning, MMSB
3. Hadoop
   1. General algorithm
   2. Setting up the MapReduce body
   3. Reducer communication
   4. Distributed normalization
   5. “Always-On SGD” – How to deal with the straggler problem
4. Experiments

Page 3:

When distributed SGD is useful

• Collaborative Filtering: Predict movie preferences
• Topic Modeling: What are the topics of webpages, tweets, or status updates
• Dictionary Learning: Remove noise or missing pixels from images
• Tensor Decomposition: Find communities in temporal graphs

300 million photos uploaded to Facebook per day!

1 billion users on Facebook

400 million tweets per day

Page 4:

Gradient Descent

Page 5:

Stochastic Gradient Descent (SGD)

Page 6:

Stochastic Gradient Descent (SGD)
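
The update rules on these slides did not survive in the transcript. As a rough illustration of the difference between the two methods, the sketch below fits a single weight by least squares: gradient descent averages the gradient over every point before each step, while SGD updates after each individual point. The toy data, the loss, the step sizes, and all function names are assumptions made for illustration and are not taken from the talk.

```python
import random

def gradient_descent_step(w, data, lr):
    """One gradient-descent step: average the gradient over every point."""
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def sgd_epoch(w, data, lr, rng):
    """One SGD pass: update after each single point, in random order."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for i in order:
        x, y = data[i]
        w -= lr * 2.0 * (w * x - y) * x  # gradient of one point's squared error
    return w

rng = random.Random(0)
# Toy data (assumption): y = 3x exactly, so both methods should approach w = 3.
data = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(100))]

w_gd, w_sgd = 0.0, 0.0
for _ in range(100):
    w_gd = gradient_descent_step(w_gd, data, lr=0.5)
    w_sgd = sgd_epoch(w_sgd, data, lr=0.05, rng=rng)
```

Each SGD update touches one data point, which is what makes it possible to split the points across machines later in the talk.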

Page 7:

DSGD for Matrices (Gemulla, 2011)

X ≈ U V

[Figure: matrix X factored into U and V; labels: Users, Movies, Genres]

Page 8:

DSGD for Matrices (Gemulla, 2011)

X ≈ U V

[Figure: X, U, and V, annotated “Independent!”]

Page 9:

DSGD for Matrices (Gemulla, 2011)

Independent Blocks

Page 10:

DSGD for Matrices (Gemulla, 2011)

Partition your data & model into d × d blocks

Results in d=3 strata

Process strata sequentially, process blocks in each stratum in parallel
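
A minimal sketch of how such strata could be generated, assuming the blocks in a stratum are picked by cyclically shifting the column-block index (the slides do not spell out the construction, and `matrix_strata` is a hypothetical name). Blocks within one stratum share no row block and no column block, so they touch disjoint parts of U and V.

```python
def matrix_strata(d):
    """Return d strata; stratum s holds d blocks (row_block, col_block)
    that share no row block and no column block."""
    return [[(i, (i + s) % d) for i in range(d)] for s in range(d)]

strata = matrix_strata(3)
for s, stratum in enumerate(strata):
    print(f"stratum {s}: {stratum}")
# stratum 0: [(0, 0), (1, 1), (2, 2)]
# stratum 1: [(0, 1), (1, 2), (2, 0)]
# stratum 2: [(0, 2), (1, 0), (2, 1)]
```

Together the d strata cover all d × d blocks exactly once, so one pass over every stratum is one pass over the data.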

Page 11:

TENSORS

Page 12:

What is a tensor?

• Tensors are used for structured data with more than 2 dimensions
• Think of it as a 3D matrix

For example: Derek Jeter plays baseball

[Figure: 3D tensor with axes Subject, Verb, Object]
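
To make the subject-verb-object picture concrete, here is a small sketch that stores such triples as a sparse 3-mode tensor. The dictionary-of-coordinates storage, the function name `build_tensor`, and every triple other than the slide's own example are assumptions for illustration.

```python
def build_tensor(triples):
    """Index each mode separately and count how often each triple occurs."""
    index = {"subject": {}, "verb": {}, "object": {}}
    tensor = {}  # sparse storage: (subject_id, verb_id, object_id) -> count
    for subj, verb, obj in triples:
        i = index["subject"].setdefault(subj, len(index["subject"]))
        j = index["verb"].setdefault(verb, len(index["verb"]))
        k = index["object"].setdefault(obj, len(index["object"]))
        tensor[(i, j, k)] = tensor.get((i, j, k), 0) + 1
    return tensor, index

triples = [
    ("Derek Jeter", "plays", "baseball"),   # example from the slide
    ("Derek Jeter", "plays", "shortstop"),  # hypothetical extra triple
    ("Babe Ruth", "plays", "baseball"),     # hypothetical extra triple
]
tensor, index = build_tensor(triples)
```

Only the nonzero entries are stored, which matters because tensors built from text are almost entirely empty.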

Page 13:

Tensor Decomposition

[Figure: tensor X ≈ factor matrices U, V, W]

Page 14:

Tensor Decomposition

[Figure: tensor X ≈ factor matrices U, V, W]

Page 15:

Tensor Decomposition

[Figure: tensor X ≈ factor matrices U, V, W; blocks labeled “Independent” and “Not Independent”]

Page 16:

Tensor Decomposition

Page 17:

For d = 3 blocks per stratum, we require d² = 9 strata
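
The count can be checked with a short sketch. It assumes a stratum is formed by shifting the second and third block indices cyclically by a pair of offsets (a, b); this layout is an assumption, although it agrees with the factor groupings shown on the later “High Level Algorithm” slides. `tensor_strata` is a hypothetical name.

```python
from itertools import product

def tensor_strata(d):
    """Return d*d strata of d blocks each; blocks in a stratum share no
    block index in any of the three modes."""
    strata = []
    for a, b in product(range(d), repeat=2):
        strata.append([(i, (i + a) % d, (i + b) % d) for i in range(d)])
    return strata

strata = tensor_strata(3)
print(len(strata))   # 9
print(strata[0])     # [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
```

A 3-mode tensor has d³ blocks and each stratum can hold only d independent ones, which is why d² strata are needed.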

Page 18:

Coupled Matrix + Tensor Decomposition

[Figure: tensor X and matrix Y; axis labels: Subject, Verb, Object, Document]

Page 19:

Coupled Matrix + Tensor Decomposition

[Figure: X and Y approximated with factors U, V, W, and A]

Page 20:

Coupled Matrix + Tensor Decomposition

Page 21:

CONSTRAINTS & PROJECTIONS

Page 22:

Example: Topic Modeling

[Figure: matrix factorization with labels Documents, Words, Topics]

Page 23:

Constraints

Sometimes we want to restrict the response:

• Non-negative
• Sparsity
• Simplex (so vectors become probabilities)
• Keep inside unit ball

Page 24:

How to enforce? Projections

• Example: Non-negative

Page 25:

More projections

• Sparsity (soft thresholding)
• Simplex
• Unit ball
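
The formulas for these projections are missing from the transcript. The sketch below uses the standard textbook forms, which may differ from what the slides showed: clipping at zero, soft thresholding with a threshold `lam`, the sort-based Euclidean projection onto the probability simplex, and rescaling onto the unit ball. All function names and the parameter `lam` are assumptions.

```python
import math

def project_nonnegative(x):
    """Clip negative entries to zero."""
    return [max(v, 0.0) for v in x]

def soft_threshold(x, lam):
    """Shrink every entry toward zero by lam; small entries become zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]

def project_unit_ball(x):
    """Rescale x onto the unit ball if its Euclidean norm exceeds 1."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= 1.0 else [v / norm for v in x]

def project_simplex(x):
    """Euclidean projection onto {p : p >= 0, sum(p) = 1} (sort-based method)."""
    u = sorted(x, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:       # holds for a prefix of j; keep the last one
            theta = t
    return [max(v - theta, 0.0) for v in x]

print(project_nonnegative([-1.0, 2.0]))   # [0.0, 2.0]
print(project_simplex([2.0, 0.0]))        # [1.0, 0.0]
```

In projected SGD, one of these functions would be applied to a parameter vector right after each gradient update.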

Page 26:

Dictionary Learning

• Learn a dictionary of concepts and a sparse reconstruction
• Useful for fixing noise and missing pixels of images

[Equation labels: Sparse encoding, Within unit ball]

Page 27:

Mixed Membership Network Decomp.

• Used for modeling communities in graphs (e.g. a social network)

[Equation labels: Simplex, Non-negative]

Page 28:

IMPLEMENTING ON HADOOP

Page 29:

High level algorithm

for Epoch e = 1 … T do
    for Subepoch s = 1 … d² do
        Take the set of blocks in stratum s
        for block b = 1 … d in parallel do
            Run SGD on all points in block b
        end
    end
end

[Figure: Stratum 1, Stratum 2, Stratum 3, …]
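
The loop structure above can be sketched on a single machine, with a thread pool standing in for the parallel workers. The stratum layout (cyclic shifts of the block indices) and the names `run_dsgd` and `sgd_on_block` are assumptions; the callback here only records which block it was given instead of running real SGD.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_dsgd(num_epochs, d, sgd_on_block):
    """Process strata one after another; blocks inside a stratum run in parallel."""
    with ThreadPoolExecutor(max_workers=d) as pool:
        for _epoch in range(num_epochs):
            for a, b in product(range(d), repeat=2):   # d^2 subepochs
                # Assumed stratum layout: shift the 2nd and 3rd block index cyclically.
                stratum = [(i, (i + a) % d, (i + b) % d) for i in range(d)]
                # list() waits for every block, acting as the barrier between strata.
                list(pool.map(sgd_on_block, stratum))

visited = []  # list.append is atomic in CPython, so the threads can share it
run_dsgd(num_epochs=2, d=3, sgd_on_block=visited.append)
visits = Counter(visited)
```

With T = 2 and d = 3, each of the 27 blocks is visited twice, once per epoch.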

Page 30:

Bad Hadoop Algorithm: Subepoch 1

[Figure: Mappers send points to three Reducers; each reducer runs SGD on one block and updates its factors]

Updates: (U2, V1, W3), (U3, V2, W1), (U1, V3, W2)

Page 31:

Bad Hadoop Algorithm: Subepoch 2

[Figure: Mappers send points to three Reducers; each reducer runs SGD on one block and updates its factors]

Updates: (U2, V1, W2), (U3, V2, W3), (U1, V3, W1)

Page 32:

Hadoop Challenges

• MapReduce is typically very bad for iterative algorithms
• T × d² iterations
• Sizable overhead per Hadoop job
• Little flexibility

Page 33:

High Level Algorithm

[Figure: tensor partitioned into blocks, with factor blocks U1 U2 U3, V1 V2 V3, W1 W2 W3]

(U1, V1, W1), (U2, V2, W2), (U3, V3, W3)

Page 34:

High Level Algorithm

[Figure: tensor partitioned into blocks, with factor blocks U1 U2 U3, V1 V2 V3, W1 W2 W3]

(U1, V1, W1), (U2, V2, W2), (U3, V3, W3)

Page 35:

High Level Algorithm

[Figure: tensor partitioned into blocks, with factor blocks U1 U2 U3, V1 V2 V3, W1 W2 W3]

(U1, V1, W3), (U2, V2, W1), (U3, V3, W2)

Page 36:

High Level Algorithm

[Figure: tensor partitioned into blocks, with factor blocks U1 U2 U3, V1 V2 V3, W1 W2 W3]

(U1, V1, W2), (U2, V2, W3), (U3, V3, W1)

Page 37:

Hadoop Algorithm

• Mappers process points: map each point to its block with necessary info to order
• Partition & Sort
• Reducers

Use: Partitioner, KeyComparator, GroupingComparator
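
A plain-Python simulation of this step, not Hadoop code: the mapper tags each point with a composite key (reducer, subepoch), a stand-in partitioner routes by reducer, a stand-in sort orders by the full key, and a stand-in grouping step hands each reducer its points one subepoch at a time. The key layout, the block-to-reducer rule, and all names are assumptions about what “necessary info to order” might contain.

```python
from itertools import groupby, product

def map_point(point, dims, d):
    """Mapper: emit ((reducer, subepoch), point) for one tensor entry."""
    (i, j, k), _value = point
    bi, bj, bk = i * d // dims[0], j * d // dims[1], k * d // dims[2]
    # Assumed rule: reducer = first block index; subepoch encodes the two shifts.
    subepoch = ((bj - bi) % d) * d + (bk - bi) % d
    return (bi, subepoch), point

def shuffle(pairs, d):
    """Stand-in for Hadoop's partition, sort, and grouping phases."""
    partitions = [[] for _ in range(d)]
    for key, point in pairs:
        partitions[key[0]].append((key, point))        # partition by reducer id
    schedule = []
    for part in partitions:
        part.sort(key=lambda kv: kv[0])                # sort by (reducer, subepoch)
        schedule.append([(key[1], [p for _, p in group])   # group per subepoch
                         for key, group in groupby(part, key=lambda kv: kv[0])])
    return schedule

dims, d = (6, 6, 6), 3
points = [((i, j, k), 1.0) for i, j, k in product(range(6), repeat=3)]
schedule = shuffle([map_point(p, dims, d) for p in points], d)
```

`schedule[r]` lists, in processing order, the points reducer r needs for each subepoch, so a single job can feed every subepoch.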

Page 38:

Hadoop Algorithm

• Mappers process points: map each point to its block with necessary info to order
• Partition & Sort
• Reducers

Page 39:

Hadoop Algorithm

• Mappers process points: map each point to its block with necessary info to order
• Partition & Sort
• Reducers: each runs SGD on its block and updates its factors: (U1, V1, W1), (U2, V2, W2), (U3, V3, W3)

Page 40:

Hadoop Algorithm

• Mappers process points: map each point to its block with necessary info to order
• Partition & Sort
• Reducers: each runs SGD on its block and updates its factors: (U1, V1, W1), (U2, V2, W2), (U3, V3, W3)

Page 41:

Hadoop Algorithm

• Mappers process points: map each point to its block with necessary info to order
• Partition & Sort
• Reducers: each runs SGD on its block and updates its factors: (U1, V1), (U2, V2), (U3, V3)
• HDFS (shown on both sides of the reducers): W2, W1, W3

Page 42:

Hadoop Summary

1. Use mappers to send data points to the correct reducers in order

2. Use reducers as machines in a normal cluster

3. Use HDFS as the communication channel between reducers
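
Point 3 can be illustrated with a local stand-in: threads play the reducers and a temporary directory plays HDFS. Each reducer publishes the W block it just updated and then polls until the block it needs next appears. The file naming, the polling interval, and the write-then-rename trick are assumptions for this sketch and say nothing about how the original system names or checks its files.

```python
import json
import os
import tempfile
import threading
import time

def publish(shared_dir, name, values):
    """Write to a temp file, then rename, so readers never see partial data."""
    tmp_path = os.path.join(shared_dir, name + ".tmp")
    with open(tmp_path, "w") as f:
        json.dump(values, f)
    os.replace(tmp_path, os.path.join(shared_dir, name))

def fetch(shared_dir, name, poll_seconds=0.01):
    """Poll until another reducer has published the file, then load it."""
    path = os.path.join(shared_dir, name)
    while not os.path.exists(path):
        time.sleep(poll_seconds)
    with open(path) as f:
        return json.load(f)

def reducer(r, d, shared_dir, received):
    w_block = [float(r)] * 2   # stand-in for the W block reducer r just updated
    publish(shared_dir, f"W_from_reducer_{r}.json", w_block)
    # For the next subepoch, pick up the block a neighbouring reducer finished.
    received[r] = fetch(shared_dir, f"W_from_reducer_{(r + 1) % d}.json")

d = 3
received = {}
with tempfile.TemporaryDirectory() as shared_dir:
    threads = [threading.Thread(target=reducer, args=(r, d, shared_dir, received))
               for r in range(d)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

No reducer talks to another directly; the shared file system is the only channel, which is what lets the scheme run on stock Hadoop.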

Page 43:

Distributed Normalization

[Figure: matrix factorization with labels Documents, Words, Topics; parameter blocks (π1, β1), (π2, β2), (π3, β3)]

Page 44:

Distributed Normalization

[Figure: parameter blocks (π1, β1), (π2, β2), (π3, β3) with local sums σ(1), σ(2), σ(3); every σ(b) is copied to every machine]

σ(b) is a k-dimensional vector, summing the terms of βb

Transfer σ(b) to all machines

Each machine calculates σ:

Normalize:
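
The two formulas are missing from the transcript. A plausible reading, offered here as an assumption, is that σ is the sum of all the σ(b) vectors and that each machine divides its own β block by σ, topic by topic. The sketch treats each β block as a list of rows with k entries per row; that layout and the function names are hypothetical.

```python
def local_sums(beta_b):
    """sigma(b): for each of the k topics, sum this machine's beta entries."""
    k = len(beta_b[0])
    return [sum(row[t] for row in beta_b) for t in range(k)]

def normalize_block(beta_b, all_sigmas):
    """Add every machine's sigma(b) into sigma, then divide the local block by it."""
    k = len(beta_b[0])
    sigma = [sum(s[t] for s in all_sigmas) for t in range(k)]
    return [[row[t] / sigma[t] for t in range(k)] for row in beta_b]

# Three machines, k = 2 topics; the numbers are made up.
blocks = [
    [[1.0, 2.0], [3.0, 2.0]],
    [[4.0, 1.0]],
    [[2.0, 5.0]],
]
sigmas = [local_sums(b) for b in blocks]          # each machine's sigma(b)
normalized = [normalize_block(b, sigmas) for b in blocks]
column_totals = [sum(row[t] for b in normalized for row in b) for t in range(2)]
```

Only the small k-dimensional vectors travel between machines; the β blocks themselves never move.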

Page 45:

Barriers & Stragglers

• Mappers process points: map each point to its block with necessary info to order
• Partition & Sort
• Reducers: each runs SGD on its block and updates its factors: (U1, V1), (U2, V2), (U3, V3)
• HDFS (shown on both sides of the reducers): W2, W1, W3

Wasting time waiting!

Page 46:

Solution: “Always-On SGD”

For each reducer:

1. Run SGD on all points in current block Z
2. Check if other reducers are ready to sync
3. If not ready to sync: instead of waiting, shuffle points in Z, decrease step size, run SGD on points in Z again, and return to step 2
4. If ready to sync: sync parameters and get new block Z
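
A small threaded sketch of this reducer loop. The readiness check (a shared counter of reducers that finished their first pass), the multiplicative step-size decay, and all names are assumptions; the slides only say to shuffle, decrease the step size, and run again. In the demo the slow reducer is held back until the fast one has completed three passes, so the outcome does not depend on timing.

```python
import random
import threading

class SyncPoint:
    """Tracks how many reducers have finished the first pass over their block."""
    def __init__(self, num_reducers):
        self.num_reducers = num_reducers
        self.finished = 0
        self.lock = threading.Lock()

    def mark_finished(self):
        with self.lock:
            self.finished += 1

    def ready(self):
        with self.lock:
            return self.finished == self.num_reducers

def always_on_reducer(points, sgd_pass, step, sync, decay=0.5, seed=0):
    """Return the number of extra passes made while waiting for the others."""
    rng = random.Random(seed)
    sgd_pass(points, step)            # first SGD pass of block Z
    sync.mark_finished()
    extra_passes = 0
    while not sync.ready():           # others not ready: keep working
        rng.shuffle(points)
        step *= decay                 # smaller steps for repeated points
        sgd_pass(points, step)
        extra_passes += 1
    return extra_passes               # caller would now sync and load a new block

fast_steps = []
release_slow = threading.Event()

def fast_pass(points, step):
    fast_steps.append(step)
    if len(fast_steps) >= 3:
        release_slow.set()

def slow_pass(points, step):
    release_slow.wait(timeout=5.0)    # simulated straggler

sync = SyncPoint(num_reducers=2)
extra = {}

def run(name, sgd_pass):
    extra[name] = always_on_reducer([1, 2, 3], sgd_pass, 0.1, sync)

threads = [threading.Thread(target=run, args=("fast", fast_pass)),
           threading.Thread(target=run, args=("slow", slow_pass))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The fast reducer spends its idle time on extra updates with a shrinking step size, while the straggler does exactly one pass.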

Page 47:

“Always-On SGD”

• Process points: map each point to its block with necessary info to order
• Partition & Sort
• Reducers: each runs SGD on its block and updates its factors: (U1, V1), (U2, V2), (U3, V3)
• HDFS (shown on both sides of the reducers): W2, W1, W3

Run SGD on old points again!

Page 48:

“Always-On SGD”

[Figure: timeline for Reducer 1, Reducer 2, Reducer 3, Reducer 4; legend: First SGD pass of block Z, Extra SGD Updates, Read Parameters from HDFS, Write Parameters to HDFS]

Page 49:

EXPERIMENTS

Page 50:

FlexiFaCT (Tensor Decomposition)

Convergence

Page 51:

FlexiFaCT (Tensor Decomposition)

Scalability in Data Size

Page 52:

FlexiFaCT (Tensor Decomposition)

Scalability in Tensor Dimension

Handles up to 2 billion parameters!

Page 53:

FlexiFaCT (Tensor Decomposition)

Scalability in Rank of Decomposition

Handles up to 4 billion parameters!

Page 54:

FlexiFaCT (Tensor Decomposition)

Scalability in Number of Machines

Page 55:

Fugue (Using “Always-On SGD”)

Dictionary Learning: Convergence

Page 56:

Fugue (Using “Always-On SGD”)

Community Detection: Convergence

Page 57:

Fugue (Using “Always-On SGD”)

Topic Modeling: Convergence

Page 58:

Fugue (Using “Always-On SGD”)

Topic Modeling: Scalability in Data Size

Page 59:

Fugue (Using “Always-On SGD”)

Topic Modeling: Scalability in Rank

Page 60:

Fugue (Using “Always-On SGD”)

Topic Modeling: Scalability over Machines

Page 61:

Fugue (Using “Always-On SGD”)

Topic Modeling: Number of Machines

Page 62:

Fugue (Using “Always-On SGD”)

Page 63:

Key Points

• Flexible method for tensors & ML models
• Can use stock Hadoop by using HDFS for communication
• When waiting for slower machines, run updates on old data again

Page 64:

Questions?

Alex Beutel
[email protected]
alexbeutel.com