Distributed Parallel Inference on Large Factor Graphs


Joseph E. Gonzalez, Yucheng Low, Carlos Guestrin, David O'Hallaron

Carnegie Mellon University

Exponential Parallelism

[Plot: processor speed (GHz, log scale 0.01-10) vs. release date (1988-2010). Sequential performance, once exponentially increasing, is now constant, while parallel performance is exponentially increasing.]

Distributed Parallel Setting

Opportunities: access to larger systems (8 CPUs to 1000 CPUs) and a linear increase in RAM, cache capacity, and memory bandwidth.

Challenges: distributed state, communication, and load balancing.

[Diagram: nodes, each with a CPU, bus, memory, and cache, connected by a fast reliable network.]


Graphical Models and Parallelism

Graphical models provide a common language for general purpose parallel algorithms in machine learning

A parallel inference algorithm would improve:

Protein Structure Prediction

Movie Recommendation

Computer Vision

Inference is a key step in Learning Graphical Models

Belief Propagation (BP)

A message passing algorithm.

A naturally parallel algorithm.

Parallel Synchronous BP

Given the old messages, all new messages can be computed in parallel:

[Diagram: each CPU (CPU 1 ... CPU n) reads old messages and computes new messages.]

Map-Reduce Ready!
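To make the data dependence concrete, here is a minimal sketch (not the authors' implementation) of one synchronous round on a pairwise model with D-state variables. The PairwiseModel layout, the uniform state count, and the unnormalized messages are simplifications introduced for illustration; the point is that every new message reads only the old messages, so the edge loop can run as one map task per edge.

    // One synchronous BP round on a pairwise model (illustrative sketch).
    #include <map>
    #include <utility>
    #include <vector>

    using Message = std::vector<double>;                       // length D
    using MessageMap = std::map<std::pair<int, int>, Message>; // (src,dst) -> message

    struct PairwiseModel {
      int D;                                        // states per variable
      std::vector<std::vector<double>> node_pot;    // node_pot[v][x]
      std::vector<std::vector<int>> nbrs;           // adjacency list
      // edge_pot[{u,v}][xu * D + xv], stored once per undirected edge with u < v
      std::map<std::pair<int, int>, std::vector<double>> edge_pot;

      double psi(int u, int v, int xu, int xv) const {
        return (u < v) ? edge_pot.at({u, v})[xu * D + xv]
                       : edge_pot.at({v, u})[xv * D + xu];
      }
    };

    // m_{u->v}(xv) = sum_xu phi_u(xu) psi_{uv}(xu,xv) prod_{w in N(u)\v} m_{w->u}(xu)
    Message compute_message(const PairwiseModel& m, int u, int v,
                            const MessageMap& old_msgs) {
      Message out(m.D, 0.0);
      for (int xv = 0; xv < m.D; ++xv)
        for (int xu = 0; xu < m.D; ++xu) {
          double p = m.node_pot[u][xu] * m.psi(u, v, xu, xv);
          for (int w : m.nbrs[u])
            if (w != v) p *= old_msgs.at({w, u})[xu];
          out[xv] += p;
        }
      return out;                                   // normalization omitted
    }

    // One synchronous iteration: reads only old_msgs and writes only new_msgs,
    // so every (u,v) message could be computed by a different CPU or map task.
    MessageMap synchronous_round(const PairwiseModel& m, const MessageMap& old_msgs) {
      MessageMap new_msgs;
      for (int u = 0; u < (int)m.nbrs.size(); ++u)
        for (int v : m.nbrs[u])
          new_msgs[{u, v}] = compute_message(m, u, v, old_msgs);
      return new_msgs;
    }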


Hidden Sequential Structure

[Diagram: a chain graphical model with evidence at both ends.]

Running Time:

Running time = (time for a single parallel iteration) x (number of iterations).

Naturally Parallel (synchronous): 2n^2/p, using p <= 2n processors.
Optimal Sequential Algorithm (Forward-Backward): 2n, using p = 1 processor.
Optimal Parallel: n, using p = 2 processors.

There is a gap between the naturally parallel running time and the optimal running times.
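For reference, here is a minimal sketch (not the authors' code) of the optimal sequential forward-backward schedule on a chain of n D-state variables. The ChainModel layout is an assumption for the sketch; the point is that one left-to-right sweep plus one right-to-left sweep computes every message exactly once, roughly 2n message updates.

    // Forward-backward message passing on a chain (illustrative sketch).
    #include <vector>

    using Vec = std::vector<double>;

    struct ChainModel {
      int n, D;
      std::vector<Vec> node_pot;   // node_pot[i][x]
      std::vector<Vec> edge_pot;   // edge_pot[i][x*D + y] for the edge (i, i+1)
    };

    // fwd[i] = message from vertex i to vertex i+1;  bwd[i] = message from i+1 to i.
    void forward_backward(const ChainModel& m, std::vector<Vec>& fwd, std::vector<Vec>& bwd) {
      fwd.assign(m.n - 1, Vec(m.D, 0.0));
      bwd.assign(m.n - 1, Vec(m.D, 0.0));

      // Forward pass: left to right (n-1 message computations).
      for (int i = 0; i + 1 < m.n; ++i)
        for (int y = 0; y < m.D; ++y)
          for (int x = 0; x < m.D; ++x) {
            double in = (i == 0) ? 1.0 : fwd[i - 1][x];      // message from the left
            fwd[i][y] += m.node_pot[i][x] * m.edge_pot[i][x * m.D + y] * in;
          }

      // Backward pass: right to left (another n-1 message computations).
      for (int i = m.n - 2; i >= 0; --i)
        for (int x = 0; x < m.D; ++x)
          for (int y = 0; y < m.D; ++y) {
            double in = (i + 2 == m.n) ? 1.0 : bwd[i + 1][y]; // message from the right
            bwd[i][x] += m.node_pot[i + 1][y] * m.edge_pot[i][x * m.D + y] * in;
          }
    }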

Parallelism by Approximation

tau-epsilon represents the minimal sequential structure.

[Diagram: true messages along a chain of vertices 1-10 compared with a tau-epsilon approximation.]

Optimal Parallel Scheduling

In [AIStats 09] we demonstrated that this algorithm is optimal:

[Diagram: the synchronous schedule vs. the optimal schedule across Processor 1, Processor 2, and Processor 3; the optimal running time decomposes into a parallel component and a sequential component, with a gap to the synchronous schedule.]

The Splash Operation

Generalize the optimal chain algorithm to arbitrary cyclic graphs (sketched below):

1) Grow a BFS spanning tree of fixed size
2) Forward pass computing all messages at each vertex
3) Backward pass computing all messages at each vertex
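A minimal sketch of a single Splash, rooted at one vertex (illustrative; the real operation works on factor graphs and bounds the Splash by work, details omitted here). The update_vertex callback is a stand-in for recomputing all outbound messages of a vertex.

    // One Splash: grow a bounded BFS tree from the root, then update every
    // vertex in a forward sweep and a backward (reverse) sweep.
    #include <functional>
    #include <queue>
    #include <vector>

    void splash(int root,
                const std::vector<std::vector<int>>& nbrs,   // graph adjacency
                std::size_t max_size,                        // fixed splash size
                const std::function<void(int)>& update_vertex) {
      // 1) Grow a breadth-first spanning tree of at most max_size vertices.
      std::vector<int> order;                 // BFS visitation order (root first)
      std::vector<bool> visited(nbrs.size(), false);
      std::queue<int> frontier;
      frontier.push(root);
      visited[root] = true;
      while (!frontier.empty() && order.size() < max_size) {
        int v = frontier.front();
        frontier.pop();
        order.push_back(v);
        for (int u : nbrs[v])
          if (!visited[u]) { visited[u] = true; frontier.push(u); }
      }

      // 2) Forward pass: compute all messages at each vertex, one sweep of the tree.
      for (int v : order) update_vertex(v);

      // 3) Backward pass: compute all messages at each vertex, the reverse sweep.
      for (auto it = order.rbegin(); it != order.rend(); ++it) update_vertex(*it);
    }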

Running Parallel Splashes

Partition the graph. Schedule Splashes locally. Transmit the messages along the boundary of the partition.

[Diagram: CPU 1, CPU 2, and CPU 3 each hold local state and run Splashes on their own partition.]

Key Challenges:
1) How do we schedule Splashes?
2) How do we partition the graph?

Where do we Splash?

Assign priorities and use a scheduling queue to select roots:

[Diagram: CPU 1's scheduling queue selects the roots of successive Splashes.]

How do we assign priorities?

Message Scheduling

Residual Belief Propagation [Elidan et al., UAI 06]: assign priorities based on the change in inbound messages.

[Diagram: two vertices and their inbound messages. A small change is an expensive no-op; a large change is an informative update.]
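A minimal sketch of residual-based prioritization (illustrative): a recomputed message is queued with priority equal to how much it differs from the message last sent, and the scheduler always applies the largest-residual update next. The L1 residual and the ResidualScheduler/PendingUpdate names are choices made for this sketch, not necessarily those of [Elidan et al., UAI 06].

    // Residual message scheduling (illustrative sketch).
    #include <cmath>
    #include <cstddef>
    #include <queue>
    #include <vector>

    using Message = std::vector<double>;

    // L1 residual between the previously sent message and its recomputed value.
    double message_residual(const Message& sent, const Message& recomputed) {
      double r = 0.0;
      for (std::size_t i = 0; i < sent.size(); ++i)
        r += std::fabs(recomputed[i] - sent[i]);
      return r;
    }

    struct PendingUpdate {
      double priority;  // residual of the message
      int edge_id;      // which directed edge (message) to send next
      bool operator<(const PendingUpdate& o) const { return priority < o.priority; }
    };

    struct ResidualScheduler {
      std::priority_queue<PendingUpdate> queue;

      // Called whenever a message is recomputed but not yet sent.
      void promote(int edge_id, const Message& sent, const Message& recomputed) {
        queue.push({message_residual(sent, recomputed), edge_id});
      }

      // Main loop (sketch): send the top message, recompute the messages it
      // influences, promote them, and repeat until converged().  Small-residual
      // updates (expensive no-ops) sink to the bottom and may never be applied.
      bool converged(double tol) const {
        return queue.empty() || queue.top().priority < tol;
      }
    };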

Problem with Message Scheduling

Small changes in messages do not imply small changes in belief:

[Diagram: small changes in all inbound messages, yet a large change in the belief.]

Large changes in a single message do not imply large changes in belief:

[Diagram: a large change in a single inbound message, yet a small change in the belief.]

Belief Residual Scheduling

Assign priorities based on the cumulative change in belief:

    r_v <- r_v + (change in the belief of v caused by each new inbound message)

A vertex whose belief has changed substantially since last being updated will likely produce informative new messages.
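A minimal sketch of the belief residual bookkeeping (illustrative): each time an inbound message changes the belief of vertex v, the size of that change is added to an accumulator r_v, and r_v is cleared when v is updated. The L1 norm and the BeliefResiduals layout are assumptions of the sketch.

    // Belief residual scheduling state (illustrative sketch).
    #include <cmath>
    #include <cstddef>
    #include <vector>

    using Belief = std::vector<double>;

    double l1_change(const Belief& a, const Belief& b) {
      double d = 0.0;
      for (std::size_t i = 0; i < a.size(); ++i) d += std::fabs(a[i] - b[i]);
      return d;
    }

    struct BeliefResiduals {
      std::vector<double> r;        // r[v]: cumulative change in the belief of v
      std::vector<Belief> belief;   // current belief estimate of each vertex

      // Called whenever a new inbound message changes the belief of v:
      //   r_v <- r_v + || new belief - old belief ||_1
      void on_new_inbound_message(int v, const Belief& new_belief) {
        r[v] += l1_change(belief[v], new_belief);
        belief[v] = new_belief;
      }

      // Called after v has been updated (e.g. inside a Splash).
      void on_vertex_updated(int v) { r[v] = 0.0; }

      // Root selection: the vertex whose belief has changed the most since it
      // was last updated will likely produce the most informative new messages.
      int best_root() const {
        int best = 0;
        for (std::size_t v = 1; v < r.size(); ++v)
          if (r[v] > r[best]) best = (int)v;
        return best;
      }
    };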

Message vs. Belief Scheduling

Belief scheduling improves accuracy more quickly. Belief scheduling improves convergence.

[Plots: L1 error in beliefs vs. time (seconds) for message scheduling and belief scheduling; percentage of runs converged within 4 hours for message residuals vs. belief residuals.]

Splash Pruning

Belief residuals can be used to dynamically reshape and resize Splashes:

[Diagram: vertices with low belief residuals are excluded when growing a Splash.]
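A minimal sketch of how pruning could hook into the BFS growth step of the Splash above (illustrative): vertices whose accumulated belief residual falls below a threshold are simply not added to the spanning tree, so the Splash reshapes itself around the vertices that still need work. The threshold parameter is an assumption of the sketch.

    // Splash pruning during BFS tree growth (illustrative sketch).
    #include <cstddef>
    #include <queue>
    #include <vector>

    std::vector<int> grow_pruned_splash(int root,
                                        const std::vector<std::vector<int>>& nbrs,
                                        const std::vector<double>& belief_residual,
                                        double min_residual,
                                        std::size_t max_size) {
      std::vector<int> tree;                       // vertices included in the Splash
      std::vector<bool> visited(nbrs.size(), false);
      std::queue<int> frontier;
      frontier.push(root);
      visited[root] = true;
      while (!frontier.empty() && tree.size() < max_size) {
        int v = frontier.front();
        frontier.pop();
        tree.push_back(v);
        for (int u : nbrs[v]) {
          if (visited[u]) continue;
          visited[u] = true;
          // Pruning: only expand into vertices whose belief is still changing.
          if (belief_residual[u] >= min_residual) frontier.push(u);
        }
      }
      return tree;                                 // forward/backward passes follow
    }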

Splash Size

Using Splash Pruning, our algorithm is able to dynamically select the optimal Splash size.

[Plot: running time (seconds) vs. Splash size (messages), with and without pruning.]

Example

[Figure: a synthetic noisy image, its factor graph, and a map of vertex updates showing regions with many updates and regions with few updates.]

The algorithm identifies and focuses on hidden sequential structure.

Distributed Belief Residual Splash (DBRSplash) Algorithm

Partition the factor graph over processors.
Schedule Splashes locally using belief residuals.
Transmit messages on the boundary.

[Diagram: CPU 1, CPU 2, and CPU 3, each with local state and a scheduling queue, run Splashes on their partitions and exchange boundary messages over a fast reliable network.]

Theorem: Given a uniform partitioning of the chain graphical model, DBRSplash will run in time [bound omitted], retaining optimality.

Partitioning Objective

The partitioning of the factor graph determines storage, computation, and communication.

Goal: balance computation and minimize communication.

[Diagram: a factor graph cut between CPU 1 and CPU 2; the partition must ensure balance while keeping the communication cost of the cut low.]

The Partitioning Problem

Objective: minimize communication while ensuring work balance.

Both the work and the communication terms depend on per-vertex update counts, but the update counts are not known!

The problem is NP-Hard; we use the fast METIS partitioning heuristic (a sketch of the METIS call follows).
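As an illustration of handing the problem to METIS, here is a hedged sketch assuming the METIS 5 C API (METIS_PartGraphKway) and a CSR representation of the factor graph; the per-vertex work weights in vwgt are a guess, since the true update counts are unknown in advance.

    // Partition the factor graph with METIS (sketch, assuming the METIS 5 API).
    #include <metis.h>
    #include <vector>

    // Returns part[v] = processor assigned to vertex v, or an empty vector on error.
    std::vector<idx_t> partition_graph(std::vector<idx_t>& xadj,    // CSR row pointers
                                       std::vector<idx_t>& adjncy,  // CSR adjacency
                                       std::vector<idx_t>& vwgt,    // estimated work per vertex
                                       idx_t nparts) {
      idx_t nvtxs = (idx_t)xadj.size() - 1;
      idx_t ncon = 1;                       // one balance constraint: work
      idx_t objval = 0;                     // edge cut reported by METIS
      std::vector<idx_t> part(nvtxs, 0);

      int status = METIS_PartGraphKway(
          &nvtxs, &ncon, xadj.data(), adjncy.data(),
          vwgt.data(), /*vsize=*/nullptr, /*adjwgt=*/nullptr,
          &nparts, /*tpwgts=*/nullptr, /*ubvec=*/nullptr,
          /*options=*/nullptr, &objval, part.data());

      if (status != METIS_OK) part.clear();
      return part;                          // minimizes edge cut subject to balance
    }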

Unknown Update Counts

Update counts are determined by belief scheduling and depend on the graph structure, factors, … There is little correlation between past and future update counts.

[Figure: update counts and an uninformed cut on the noisy-image model.]

Uninformed Cuts

Uninformed cuts (made without knowing the update counts) have greater work imbalance and lower communication cost.

[Plots: work imbalance and communication cost of uninformed vs. optimal cuts on the Denoise and UW-Systems problems; under an uninformed cut some processors get too much work and others too little.]

Over-Partitioning

Over-cut the graph into k*p partitions and randomly assign them to CPUs:

Increases balance.
Increases communication cost (more boundary).

[Diagram: without over-partitioning, CPU 1 and CPU 2 each own one large block; with k = 6, the graph is cut into many small pieces assigned at random between CPU 1 and CPU 2.]
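A minimal sketch of the over-partitioning step (illustrative): cut the graph into k*p pieces (e.g. with the METIS call above) and then hand the pieces to processors in a random order, so each CPU ends up with roughly k pieces. The function name and the exactly-k-pieces-per-CPU choice are assumptions of the sketch.

    // Over-partitioning: k*p pieces, randomly assigned to p processors (sketch).
    #include <algorithm>
    #include <cstddef>
    #include <random>
    #include <vector>

    // piece[v] in [0, k*p): which of the k*p pieces vertex v belongs to.
    // Returns cpu[v] in [0, p): which processor owns vertex v.
    std::vector<int> over_partition_assign(const std::vector<int>& piece,
                                           int p, int k, unsigned seed = 0) {
      // Each CPU gets exactly k pieces, but which pieces is decided at random.
      std::vector<int> piece_owner(k * p);
      for (int i = 0; i < k * p; ++i) piece_owner[i] = i % p;
      std::mt19937 rng(seed);
      std::shuffle(piece_owner.begin(), piece_owner.end(), rng);

      std::vector<int> cpu(piece.size());
      for (std::size_t v = 0; v < piece.size(); ++v) cpu[v] = piece_owner[piece[v]];
      return cpu;
    }

Larger k improves the expected work balance at the price of more boundary edges, which is exactly the trade-off shown in the results below.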

Over-Partitioning Results

Over-partitioning provides a simple method to trade between work balance and communication cost.

[Plots: communication cost and work imbalance as a function of the partition factor k.]

CPU Utilization

Over-partitioning improves CPU utilization:

[Plots: active CPUs over time (seconds) on the UW-Systems MLN and Denoise problems, with no over-partitioning vs. 10x over-partitioning.]

DBRSplash Algorithm

Over-partition the factor graph and randomly assign the pieces to processors.
Schedule Splashes locally using belief residuals.
Transmit messages on the boundary.

[Diagram: CPU 1, CPU 2, and CPU 3, each with local state and a scheduling queue, run Splashes and exchange boundary messages over a fast reliable network.]
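A hedged sketch of the "transmit messages on the boundary" step using plain MPI non-blocking point-to-point calls. The slides do not describe the actual wire format or batching of the MPICH2-based implementation; the flat double buffers and the exchange_boundary name here are assumptions made for illustration.

    // Exchange boundary messages with neighboring processors (illustrative sketch).
    #include <cstddef>
    #include <mpi.h>
    #include <vector>

    void exchange_boundary(const std::vector<int>& neighbor_ranks,
                           std::vector<std::vector<double>>& send_buf,   // per neighbor
                           std::vector<std::vector<double>>& recv_buf) { // pre-sized
      const int tag = 0;
      std::vector<MPI_Request> reqs;
      reqs.reserve(2 * neighbor_ranks.size());

      for (std::size_t i = 0; i < neighbor_ranks.size(); ++i) {
        MPI_Request r;
        MPI_Irecv(recv_buf[i].data(), (int)recv_buf[i].size(), MPI_DOUBLE,
                  neighbor_ranks[i], tag, MPI_COMM_WORLD, &r);
        reqs.push_back(r);
        MPI_Isend(send_buf[i].data(), (int)send_buf[i].size(), MPI_DOUBLE,
                  neighbor_ranks[i], tag, MPI_COMM_WORLD, &r);
        reqs.push_back(r);
      }
      // Local Splashes can keep running here before waiting on the exchange.
      MPI_Waitall((int)reqs.size(), reqs.data(), MPI_STATUSES_IGNORE);
    }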

Experiments

Implemented in C++ using MPICH2 as the message passing API.

Ran on the Intel OpenCirrus cluster: 120 processors (15 nodes with 2x quad-core Intel Xeon processors and a Gigabit Ethernet switch).

Tested on Markov Logic Networks obtained from Alchemy [Domingos et al., SSPR 08]. We present results on the largest (UW-Systems) and smallest (UW-Languages) MLNs.

Parallel Performance (Large Graph)

UW-Systems: 8K variables, 406K factors. Single-processor running time: 1 hour.

Linear to super-linear speedup up to 120 CPUs, the super-linear gains coming from cache efficiency.

[Plot: speedup vs. number of CPUs for no over-partitioning and 5x over-partitioning, compared against the linear-speedup line.]

Parallel Performance (Small Graph)

UW-Languages: 1K variables, 27K factors. Single-processor running time: 1.5 minutes.

Linear to super-linear speedup up to 30 CPUs; network costs quickly dominate the short running time.

[Plot: speedup vs. number of CPUs for no over-partitioning and 5x over-partitioning, compared against the linear-speedup line.]

Summary

Splash Operation: a generalization of the optimal parallel schedule on chain graphs.

Belief-based scheduling: addresses the message scheduling issues and improves accuracy and convergence.

DBRSplash: an efficient distributed parallel inference algorithm, with over-partitioning to improve work balance.

Experimental results on large factor graphs: linear to super-linear speedup using up to 120 processors.

Thank You

Acknowledgements: Intel Research Pittsburgh (OpenCirrus cluster), AT&T Labs, DARPA.

Exponential Parallelism (from Saman Amarasinghe)

[Chart: number of cores per processor (1 to 512) vs. release year (1970-2005 and beyond), covering chips from the 4004, 8008, 8080, 8086, 286, 386, 486, and Pentium/P2/P3/P4 line through Itanium, Itanium 2, Athlon, Opteron, Opteron 4P, Xeon MP, PA-8800, Power4, Power6, Niagara, Yonah, Pentium Extreme, Tanglewood, Cell, Xbox360, Intel Tflops, Raw, Raza XLR, Cavium Octeon, Broadcom 1480, Cisco CSR-1, Picochip PC102, and Ambric AM2045.]

AIStats09 Speedup

[Plots: speedup on 3D video prediction and protein side-chain prediction.]