Page 1

Carnegie Mellon

Distributed Parallel Inference on Large Factor Graphs

Joseph E. Gonzalez, Yucheng Low, Carlos Guestrin, David O’Hallaron

Page 2

Exponential Parallelism

[Figure: Processor speed (GHz, log scale 0.01–10) vs. release date (1988–2010). Sequential performance increased exponentially until the mid-2000s, then went flat (constant sequential performance), while parallel performance continues to increase exponentially.]

Page 3

Distributed Parallel Setting

Opportunities:
Access to larger systems: 8 CPUs → 1000 CPUs
Linear increase in RAM, cache capacity, and memory bandwidth

Challenges:
Distributed state, communication, and load balancing

[Figure: Nodes, each with CPU, cache, bus, and memory, connected by a fast reliable network.]

Page 4

Graphical Models and Parallelism

Graphical models provide a common language for general-purpose parallel algorithms in machine learning.

A parallel inference algorithm would improve:
Protein structure prediction
Movie recommendation
Computer vision

Inference is a key step in learning graphical models.

Page 5

Belief Propagation (BP)

A message passing algorithm, and a naturally parallel algorithm.
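For reference, the sum-product updates that BP passes along the edges of a factor graph (the standard form from the BP literature; the slide itself shows only the animation) are:

```latex
% Variable-to-factor and factor-to-variable messages (standard sum-product BP).
m_{v \to f}(x_v) \;\propto\; \prod_{g \in N(v) \setminus \{f\}} m_{g \to v}(x_v)
\qquad
m_{f \to v}(x_v) \;\propto\; \sum_{\mathbf{x}_{N(f) \setminus \{v\}}} \psi_f\!\left(\mathbf{x}_{N(f)}\right) \prod_{u \in N(f) \setminus \{v\}} m_{u \to f}(x_u)
```

Each new message depends only on other messages, which is what makes the algorithm naturally parallel.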

Page 6

Parallel Synchronous BP

Given the old messages, all new messages can be computed in parallel:

[Figure: CPUs 1 through n each read the fixed old messages and compute a disjoint slice of the new messages.]

Map-Reduce Ready!
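A minimal sketch of one synchronous sweep, written for a pairwise model rather than a full factor graph for brevity; `pairwise`, `unary`, and the message-table layout are illustrative assumptions, not the talk's implementation:

```python
import numpy as np

def compute_message(pairwise, unary, old_msgs, edge):
    """Sum-product message m_{u->v}: combine u's unary potential with all
    old messages into u except the one from v, then marginalize through
    the edge potential psi(x_u, x_v)."""
    u, v = edge
    incoming = unary[u].copy()
    for (s, t), m in old_msgs.items():
        if t == u and s != v:
            incoming *= m
    msg = pairwise[(u, v)].T @ incoming  # sum over x_u
    return msg / msg.sum()

def synchronous_iteration(pairwise, unary, old_msgs):
    """One synchronous BP sweep: every new message reads only the fixed
    old message table, so all edge updates are independent -- exactly
    the map step of a Map-Reduce round."""
    return {edge: compute_message(pairwise, unary, old_msgs, edge)
            for edge in old_msgs}
```

Because no update reads another update's output, the dictionary comprehension could be replaced by any parallel map without changing the result.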

Page 7

Hidden Sequential Structure

Page 8

Hidden Sequential Structure

Page 9

Hidden Sequential Structure

Running Time: (time for a single parallel iteration) × (number of iterations)

[Figure: A chain with evidence at both ends; information from each end must propagate across the whole chain.]

Page 10

Optimal Sequential Algorithm

[Figure: Running time comparison on a chain of n vertices.]
Forward-Backward (p = 1): running time 2n
Naturally Parallel (synchronous): running time 2n²/p, for p ≤ 2n
Optimal Parallel (p = 2): running time n

There is a large gap between the naturally parallel and the optimal parallel running times.
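Reading the slide's numbers on a chain of n vertices (the intermediate step is my gloss, not spelled out on the slide):

```latex
T_{\text{sync}} \;\approx\;
\underbrace{n}_{\text{iterations for evidence to cross the chain}}
\times
\underbrace{\tfrac{2n}{p}}_{\text{directed messages per iteration, split over } p}
\;=\; \frac{2n^2}{p} \quad (p \le 2n),
\qquad
T_{\text{fwd-bwd}} = 2n \;\; (p = 1),
\qquad
T_{\text{optimal}} = n \;\; (p = 2).
```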

Page 11

Parallelism by Approximation

τε represents the minimal sequential structure.

[Figure: True messages along a chain of vertices 1–10 vs. a τε-approximation, in which each message depends only on vertices within distance τε.]

Page 12

Optimal Parallel Scheduling

In [AIStats 09] we demonstrated that this algorithm is optimal:

[Figure: Synchronous schedule vs. optimal schedule across Processors 1–3, showing each block's parallel component, its sequential component, and the remaining gap.]

Page 13

The Splash Operation

Generalize the optimal chain algorithm to arbitrary cyclic graphs:

1) Grow a BFS spanning tree of fixed size
2) Forward pass, computing all messages at each vertex
3) Backward pass, computing all messages at each vertex
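A minimal sketch of the three steps, assuming an illustrative `graph.neighbors` interface and a hypothetical `send_all_messages(graph, v)` helper that recomputes every outbound message of v; I take the forward pass to run from the leaves toward the root, as in tree BP:

```python
from collections import deque

def splash(graph, root, max_size):
    """Sketch of the Splash operation: bounded BFS tree, then a
    forward and a backward message pass over it."""
    # 1) Grow a breadth-first spanning tree of bounded size around root.
    order, seen = [root], {root}
    frontier = deque([root])
    while frontier and len(order) < max_size:
        v = frontier.popleft()
        for u in graph.neighbors(v):
            if u not in seen and len(order) < max_size:
                seen.add(u)
                order.append(u)
                frontier.append(u)
    # 2) Forward pass: update all messages, leaves toward the root.
    for v in reversed(order):
        send_all_messages(graph, v)  # assumed helper
    # 3) Backward pass: update all messages, root back out to the leaves.
    for v in order:
        send_all_messages(graph, v)
```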

Page 14

Running Parallel Splashes

Partition the graph
Schedule Splashes locally
Transmit the messages along the boundary of the partition

[Figure: Three CPUs, each with its own local state, running Splashes on their own partitions.]

Key Challenges:
1) How do we schedule Splashes?
2) How do we partition the graph?

Page 15

Where do we Splash?

Assign priorities and use a scheduling queue to select roots:

[Figure: CPU 1 pops the highest-priority roots from its local scheduling queue and runs Splashes at them.]

How do we assign priorities?

Page 16

Message Scheduling

Residual Belief Propagation [Elidan et al., UAI 06]: assign priorities based on the change in inbound messages.

[Figure: Two vertices receiving inbound messages; one sees only small changes in its messages, the other a large change.]

Small change: expensive no-op.
Large change: informative update.
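The residual priority takes the standard form from the residual BP literature; the particular norm below is my choice for concreteness:

```latex
% Priority of a pending update: how far the newly computed message is
% from the one last sent.
r_{u \to v} \;=\; \bigl\lVert m_{u \to v}^{\text{new}} - m_{u \to v}^{\text{old}} \bigr\rVert_{1}
```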

Page 17

Problem with Message Scheduling

Small changes in messages do not imply small changes in belief:

[Figure: A vertex whose inbound messages each change only a little, yet whose belief changes a lot.]

Page 18

Problem with Message Scheduling

Large changes in a single message do not imply large changes in belief:

[Figure: A vertex with one inbound message changing a lot, yet whose belief changes only a little.]

Page 19

Belief Residual Scheduling

Assign priorities based on the cumulative change in belief:

[Figure: Each inbound message change adds to the vertex residual rv.]

A vertex whose belief has changed substantially since last being updated will likely produce informative new messages.
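My reading of the slide's accumulation rule (the norm and the reset convention are illustrative assumptions; the slide shows only the running sum):

```latex
% Each message arrival perturbs the belief at v; the perturbations
% accumulate into r_v until v itself is updated.
r_v \;\leftarrow\; r_v + \bigl\lVert b_v^{\text{new}} - b_v^{\text{old}} \bigr\rVert_{1},
\qquad
r_v \leftarrow 0 \;\text{ when } v \text{ is updated.}
```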

Page 20

Message vs. Belief Scheduling

Belief scheduling improves accuracy more quickly.
Belief scheduling improves convergence.

[Figure, left: percentage of runs converged within 4 hours (0–100%), belief residuals vs. message residuals; higher is better.]
[Figure, right: L1 error in beliefs vs. time (0–100 seconds); belief scheduling reaches lower error faster than message scheduling.]

Page 21

Splash Pruning

Belief residuals can be used to dynamically reshape and resize Splashes:

[Figure: Vertices with low belief residuals are pruned from the Splash.]
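Under the same illustrative interfaces as the Splash sketch above, pruning amounts to a filter in the tree-growing step: neighbors whose accumulated belief residual falls below a threshold are simply never added.

```python
from collections import deque

def grow_pruned_tree(graph, root, max_size, belief_residual, threshold):
    """Grow the Splash BFS tree, skipping (pruning) neighbors whose
    belief residual is below `threshold` -- they are unlikely to
    produce informative new messages. All names are illustrative."""
    order, seen = [root], {root}
    frontier = deque([root])
    while frontier and len(order) < max_size:
        v = frontier.popleft()
        for u in graph.neighbors(v):
            if (u not in seen
                    and belief_residual[u] >= threshold
                    and len(order) < max_size):
                seen.add(u)
                order.append(u)
                frontier.append(u)
    return order
```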

Page 22

Splash Size

Using Splash pruning, our algorithm is able to dynamically select the optimal Splash size.

[Figure: Running time (0–350 seconds) vs. Splash size (0–60 messages), with and without pruning; lower is better.]

Page 23

Example

[Figure: A synthetic noisy image, its factor graph, and the resulting vertex update counts, with many updates in some regions and few in others.]

The algorithm identifies and focuses on hidden sequential structure.

Page 24

Distributed Belief Residual Splash Algorithm

Partition factor graph over processors
Schedule Splashes locally using belief residuals
Transmit messages on boundary

[Figure: Three CPUs, each with local state and a scheduling queue, running Splashes and exchanging boundary messages over a fast reliable network.]

Theorem: Given a uniform partitioning of the chain graphical model, DBRSplash will run in time [formula not recovered], retaining optimality.

Page 25

Partitioning Objective

The partitioning of the factor graph determines storage, computation, and communication.

Goal: balance computation and minimize communication.

[Figure: A factor graph cut between CPU 1 and CPU 2; the cut must ensure balance while keeping the communication cost low.]

Page 26

The Partitioning Problem

Objective: minimize communication while ensuring balance.

Depends on:
Work: the update counts of the vertices in each partition
Comm: the messages crossing the cut

The problem is NP-Hard, so we use the fast METIS partitioning heuristic.

Update counts are not known!
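One standard way to write this objective (my formalization; the slide's exact formulas did not survive extraction). With unknown update counts u_v and blocks B_1, ..., B_p:

```latex
\min_{B_1, \dots, B_p}\;
\underbrace{\bigl|\{ (u, v) \in E \;:\; u, v \text{ in different blocks} \}\bigr|}_{\text{communication}}
\quad \text{s.t.} \quad
\underbrace{\sum_{v \in B_i} u_v \;\le\; \frac{\gamma}{p} \sum_{v \in V} u_v}_{\text{balance, slack } \gamma \ge 1}
\;\; \forall i.
```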

Page 27

Unknown Update Counts

Determined by belief scheduling
Depends on: graph structure, factors, …
Little correlation between past and future update counts

[Figure: Update counts on the noisy image alongside an uninformed cut.]

Page 28

Uninformed Cuts

Uninformed cuts yield greater work imbalance but lower communication cost:

[Figure, left: work imbalance (0–4) on Denoise and UW-Systems, uninformed cut vs. optimal cut; the uninformed cut leaves some partitions with too much work and others with too little.]
[Figure, right: communication cost (0.5–1.1) on the same problems; the uninformed cut communicates less.]

Page 29

Over-Partitioning

Over-cut the graph into k*p partitions and randomly assign them to CPUs:
Increases balance
Increases communication cost (more boundary)

[Figure: Without over-partitioning, CPUs 1 and 2 each own a single large block; with k = 6, many small blocks are dealt randomly to the two CPUs.]
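The assignment step is simple enough to sketch; a minimal version, assuming the k*p blocks come from an earlier partitioner such as METIS (names are illustrative):

```python
import random

def assign_over_partitions(blocks, p, seed=0):
    """Randomly deal k*p graph blocks to p CPUs. With many small blocks
    per CPU, the unknown per-block update counts average out, improving
    work balance at the price of more boundary edges."""
    rng = random.Random(seed)
    shuffled = list(blocks)
    rng.shuffle(shuffled)
    # Round-robin over the shuffled list: equal block counts per CPU,
    # random block-to-CPU mapping.
    return {cpu: shuffled[cpu::p] for cpu in range(p)}
```

For example, with p = 2 and k = 6, the twelve blocks are dealt six per CPU in a random order, as in the slide's k = 6 picture.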

Page 30

Over-Partitioning Results

Over-partitioning provides a simple way to trade work balance against communication cost.

[Figure, left: communication cost (1–4) vs. partition factor k (1–15); cost rises with k.]
[Figure, right: work imbalance (1.5–3.5) vs. partition factor k; imbalance falls as k grows.]

Page 31

CPU Utilization

Over-partitioning improves CPU utilization:

[Figure, left: active CPUs (0–70) vs. time (0–250 seconds) on the UW-Systems MLN, no over-partitioning vs. 10x over-partitioning.]
[Figure, right: active CPUs vs. time (0–50 seconds) on Denoise, same comparison.]

Page 32

DBRSplash Algorithm

Over-partition the factor graph
Randomly assign pieces to processors
Schedule Splashes locally using belief residuals
Transmit messages on boundary

[Figure: The same three-CPU architecture as before: local state and a scheduling queue per CPU, Splashes running locally, boundary messages over a fast reliable network.]

Page 33

Experiments

Implemented in C++ using MPICH2 as the message passing API.

Ran on the Intel OpenCirrus cluster: 120 processors
15 nodes with 2 x quad-core Intel Xeon processors
Gigabit Ethernet switch

Tested on Markov Logic Networks obtained from Alchemy [Domingos et al., SSPR 08].

We present results on the largest (UW-Systems) and smallest (UW-Languages) MLNs.

Page 34

Parallel Performance (Large Graph)

UW-Systems: 8K variables, 406K factors
Single-processor running time: 1 hour
Linear to super-linear speedup up to 120 CPUs, with cache efficiency driving the super-linear regime

[Figure: Speedup vs. number of CPUs (0–120) for no over-partitioning and 5x over-partitioning, plotted against the linear-speedup line; higher is better.]

Page 35

Parallel Performance (Small Graph)

UW-Languages: 1K variables, 27K factors
Single-processor running time: 1.5 minutes
Linear to super-linear speedup up to 30 CPUs, beyond which network costs quickly dominate the short running time

[Figure: Speedup vs. number of CPUs (0–120) for no over-partitioning and 5x over-partitioning, plotted against the linear-speedup line; higher is better.]

Page 36

Summary

Splash Operation: a generalization of the optimal parallel schedule on chain graphs
Belief-based scheduling: addresses the issues with message scheduling, and improves accuracy and convergence
DBRSplash: an efficient distributed parallel inference algorithm, with over-partitioning to improve work balance
Experimental results on large factor graphs: linear to super-linear speed-up using up to 120 processors

Page 37

Carnegie Mellon

Thank You

Acknowledgements:
Intel Research Pittsburgh: OpenCirrus Cluster
AT&T Labs
DARPA

Page 38

Exponential Parallelism

From Saman Amarasinghe:

[Figure: Number of cores (1–512, log scale) vs. release year (1970–2005 and "20??"). Single- and dual-core processors (4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon, PA-8800, Power4, Power6, Opteron, Opteron 4P, Xeon MP) sit at 1–2 cores, while newer designs (Niagara, Yonah, PExtreme, Tanglewood, Cell, Xbox360, Raw, Cavium Octeon, Raza XLR, Broadcom 1480, Cisco CSR-1, Intel Tflops, Picochip PC102, Ambric AM2045) climb toward hundreds of cores.]

Page 39

AIStats09 Speedup

[Figure: Speedup results from the AIStats09 paper on 3D video prediction and protein side-chain prediction.]