8/13/2019 Brice Lecture
Communication Cost of Distributed Computing
Abbas El Gamal
Stanford University
Brice Colloquium, 2009
A. El Gamal (Stanford University) Comm. Cost of Dist. Computing Brice Colloquium, 2009 1 / 41
Motivation
Performance of distributed information processing systems often limited by communication:
Digital VLSI
Multi-processors
Data centers
Peer-to-peer networks
Sensor networks
Networked mobile agents
Purpose of communication is to make decisions, compute functions, and coordinate actions based on distributed data
How much communication is needed to perform such a task?
Wireless Video Camera Network
Today's surveillance systems: analog; costly; human-operated
Future systems: digital; networked; self-configuring; automated detection, e.g., suspicious activity, localization, tracking, ...
Sending all video data requires large communication bandwidth and energy
Distributed Computing Approach (YSEG 04, EEG 07)
[Figure: each camera performs local processing (background subtraction) and sends scan-lines to a cluster-head]
Experimental Setup
[Figure: view of the setup; view from a camera; top view of room; scan-lines from 16 cameras]
Problem Setup
Network with m nodes; node j has data x_j
Node j wishes to estimate g_j(x_1, x_2, ..., x_m), j = 1, 2, ..., m
Nodes communicate and perform local computing
[Figure: network connecting nodes holding x_1, x_2, ..., x_m]
What is the minimum amount of communication needed?
Communication Cost of Distributed Computing
Problem formulated and studied in several fields:
Computer science: communication complexity; gossip
Information theory: coding for computing; CEO problem
Control: distributed consensus
Formulations differ in:
Data model
Type of communication/local-computing protocol
Estimation criterion
Metric for communication cost
Outline
1 Communication Complexity
2 Information Theory: Lossless Computing
3 Information Theory: Lossy Computing
4 Distributed Consensus
5 Distributed Lossy Averaging
6 Conclusion
Communication Complexity
Communication Complexity (Yao 79)
Two nodes, two-way communication
Node j = 1, 2 has a binary k-vector x_j
Both nodes wish to compute g(x_1, x_2)
Use a round-robin communication and local computing protocol
What is the minimum number of bits of communication needed (the communication complexity, C(g))?
Example (Equality Function)
g(x_1, x_2) = 1 if x_1 = x_2, and 0 otherwise
Upper bound: C(g) ≤ k + 1 bits; for k = 2, C(g) ≤ 3 bits

x_1 \ x_2   00  01  10  11
00           1   0   0   0
01           0   1   0   0
10           0   0   1   0
11           0   0   0   1

Lower bound: every communication protocol partitions {0, 1}^k × {0, 1}^k into g-monochromatic rectangles, each having a distinct code
If r(g) is the minimum number of such rectangles, then C(g) ≥ log r(g)
For k = 2, r(g) = 8, so C(g) ≥ 3 bits
In general, C(g) ≥ k + 1, so C(g) = k + 1 bits
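As a sanity check, the trivial upper-bound protocol is easy to simulate; in the sketch below (function name is illustrative), node 1 sends its whole k-bit vector and node 2 replies with one bit, for k + 1 bits total:

```python
import itertools

def equality_protocol(x1, x2):
    """Trivial protocol achieving C(g) <= k + 1: node 1 sends its
    k-bit vector, node 2 replies with one bit indicating equality."""
    k = len(x1)
    answer = int(tuple(x1) == tuple(x2))  # node 2's 1-bit reply
    return answer, k + 1                  # (g(x1, x2), bits exchanged)

# Exhaustive check for k = 2: correct on all 16 input pairs, 3 bits each
for x1 in itertools.product((0, 1), repeat=2):
    for x2 in itertools.product((0, 1), repeat=2):
        assert equality_protocol(x1, x2) == (int(x1 == x2), 3)
```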
Number-on-Forehead Game
m players. Player j has a binary k-vector x_j on her forehead
Player j can see all vectors except her own, x_j
Every player wants to know whether x_1 = x_2 = ... = x_m
Players take turns broadcasting to all other players
Recall for m = 2, C (g ) = k + 1
What is C (g ) for m 3?
Consider the following protocol:
Player 1 broadcasts A = 1 if x_2 = x_3 = ... = x_m, A = 0 otherwise
If A = 0, then all players know the values are not equal
If A = 1, only player 1 does not know whether all values are equal;
Player 2 broadcasts B = 1 if x_1 = x_3, B = 0 otherwise (when A = 1, x_3 = x_2, so B settles the question)
C(g) = 2 bits suffice, independent of m and k!
Overlap in information can significantly reduce communication
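The two-bit protocol above can be simulated directly; this sketch (for m ≥ 3, with illustrative names) returns the equality verdict and the number of bits broadcast:

```python
def forehead_equality(x):
    """Two-bit protocol for the number-on-forehead equality game.
    x[j] is on player j's forehead: player j sees every vector but x[j].
    Requires m >= 3.  Returns (all-equal indicator, bits broadcast)."""
    # Player 1 sees x[1..m-1] and broadcasts A = 1 iff they all agree
    A = int(all(v == x[1] for v in x[1:]))
    if A == 0:
        return 0, 1          # everyone already knows the answer is "no"
    # Player 2 sees x[0] and x[2..m-1]; since A = 1, x[2] equals x[1],
    # so comparing x[0] with x[2] settles whether all m vectors agree
    B = int(x[0] == x[2])
    return B, 2

# All equal: verdict 1 after 2 bits, regardless of m and k
assert forehead_equality([(1, 0, 1)] * 5) == (1, 2)
```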
More on Communication Complexity
Other definitions of communication complexity:
Average: uniform distribution over node values
Randomized: nodes use coin flips in communication
Non-deterministic: non-zero probability of computing error
Some references:
A. Yao, Some complexity questions related to distributive computing, STOC, 1979
A. Orlitsky, A. El Gamal, Average and randomized communication complexity, IEEE Trans. on Info. Th., 1990
R. Gallager, Finding parity in a simple broadcast network, IEEE Trans. on Info. Th., 1988
E. Kushilevitz and N. Nisan, Communication Complexity, Cambridge University Press, 2006
Information Theory: Lossless Computing
m-node network. Node j observes source X_j
X_j generates an i.i.d. process {X_{j1}, X_{j2}, ...}, with X_{ji} ∈ X, a finite set
Sources may be correlated: (X_1, X_2, ..., X_m) ~ p(x_1, x_2, ..., x_m)
Node j wishes to find an estimate ĝ_{ji} of g_j(X_{1i}, X_{2i}, ..., X_{mi}) for each sample i = 1, 2, ...
Block coding: nodes communicate and perform local computing over blocks of n samples
Lossless criterion: ĝ_j^n = g_j^n for all j ∈ {1, 2, ..., m} with high probability (whp)
What is the minimum sum rate in bits/sample-tuple?
Problem is in general open, even for 3 nodes
Lossless Compression
Two nodes, one-way communication
Sender node observes i.i.d. source X ~ p(x)
Receiver node wishes to form a lossless estimate X̂^n of X^n
Theorem (Shannon 1948)
The minimum lossless compression rate is the entropy of X:
H(X) := E(−log p(X)) bits/sample
Example (Binary Source)
X ~ Bern(p): H(p) := −p log p − (1 − p) log(1 − p) ∈ [0, 1]
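The binary entropy function is easy to evaluate numerically; a small sketch:

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), in bits per sample."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))            # 1.0: a fair coin is incompressible
print(round(binary_entropy(0.1), 3))  # 0.469: a biased source compresses well
```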
Lossless Compression with Side Information
Two nodes, one-way communication
Sender node has i.i.d. source X; receiver node has i.i.d. source Y, with (X, Y) ~ p(x, y)
Receiver node wishes to estimate X^n losslessly
H(X) is sufficient; can we do better?
If the sender also knew Y^n, the minimum rate would be the conditional entropy
H(X|Y) := E_{X,Y}(−log p(X|Y)) ≤ H(X)
Theorem (Slepian-Wolf 1973)
The minimum rate with side information at the receiver only is still H(X|Y)
Proof uses random binning
Example (Doubly Symmetric Binary Sources)
Let Y ~ Bern(1/2) and Z ~ Bern(p) be independent, X = Y ⊕ Z
Receiver has Y, wishes to losslessly estimate X
Minimum rate: H(X|Y) = H(Y ⊕ Z|Y) = H(Z) = H(p)
For example, if Z ~ Bern(0.01), H(p) ≈ 0.08
Versus H(X) = H(1/2) = 1 if the correlation is ignored
Remarks
Shannon's theorem holds for error-free compression (e.g., Lempel-Ziv)
Slepian-Wolf does not hold in general for error-free compression
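The claim H(X|Y) = H(p) can be verified numerically from the joint pmf; a small sketch (helper name is illustrative):

```python
import math

def cond_entropy(joint):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y); joint given as {(x, y): prob}."""
    py = {}
    for (x, y), q in joint.items():
        py[y] = py.get(y, 0.0) + q
    return -sum(q * math.log2(q / py[y])
                for (x, y), q in joint.items() if q > 0)

# Doubly symmetric binary source: Y ~ Bern(1/2), Z ~ Bern(p), X = Y xor Z
p = 0.01
joint = {(y ^ z, y): 0.5 * (p if z else 1 - p) for y in (0, 1) for z in (0, 1)}
print(round(cond_entropy(joint), 3))   # H(X|Y) = H(p) ~= 0.081
```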
Lossless Computing with Side Information
Receiver node, which has side information Y^n, wishes to compute g(X, Y)
Let R_g be the minimum rate
Upper and lower bounds:
R_g ≤ H(X|Y)
R_g ≥ H(g(X, Y)|Y)
Bounds sometimes coincide
Example (Mod-2 Sum)
Y ~ Bern(1/2), Z ~ Bern(p) independent, X = Y ⊕ Z
Function: g(X, Y) = X ⊕ Y = Z
Lower bound: H(X ⊕ Y|Y) = H(Z|Y) = H(p)
Upper bound: H(X|Y) = H(p)
⇒ R_g = H(p)
Bounds do not coincide in general
Example
X = (V_1, V_2, ..., V_10), V_j i.i.d. Bern(1/2), Y ~ Unif{1, 2, ..., 10}
Function: g(X, Y) = V_Y
Lower bound: H(V_Y|Y) = 1 bit
Upper bound: H(X|Y) = 10 bits
Can show that R_g = 10 bits
Theorem (Orlitsky, Roche 2001)
R_g = H_G(X|Y)
where H_G is the conditional entropy of the characteristic graph of X, Y, and g
Generalizations:
Two-way, two rounds: A. Orlitsky, J. Roche, Coding for computing, IEEE Trans. on Info. Th., 2001
Infinite number of rounds: N. Ma, P. Ishwar, Two-terminal distributed source coding with alternating messages for function computation, ISIT, 2008
Lossless Computing: m = 3
2-sender, 1-receiver network, one-way communication
Receiver wishes to estimate a function g(X, Y) losslessly
What is the minimum sum rate in bits/sample-pair?
[Figure: X^n and Y^n encoded separately; decoder outputs ĝ^n]
Problem is in general open
Distributed Lossless Compression
Let g (X , Y ) = ( X , Y )
Theorem (Slepian-Wolf)
The minimum achievable sum rate is the joint entropy
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
Same as if X, Y were jointly encoded!
Achievability uses random binning
Theorem can be generalized to m nodes
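For the doubly symmetric binary source, the Slepian-Wolf sum rate H(X, Y) = H(X) + H(Y|X) = 1 + H(p) can be checked numerically; a small sketch:

```python
import math

# Doubly symmetric binary source: X ~ Bern(1/2), Z ~ Bern(p), Y = X xor Z
p = 0.1
joint = {(x, x ^ z): 0.5 * (p if z else 1 - p) for x in (0, 1) for z in (0, 1)}

# Joint entropy H(X, Y): the minimum distributed sum rate
h_joint = -sum(q * math.log2(q) for q in joint.values())

# Decomposition check: H(X, Y) = H(X) + H(Y|X) = 1 + H(p)
h_p = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
assert abs(h_joint - (1 + h_p)) < 1e-9
print(round(h_joint, 3))   # 1.469, well below the 2 bits of ignoring correlation
```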
Distributed Mod-2 Sum Computing
Let X ~ Bern(1/2) and Z ~ Bern(p) be independent, Y = X ⊕ Z
Receiver wishes to compute g(X, Y) = X ⊕ Y = Z
Slepian-Wolf gives a sum rate of H(X, Y) = 1 + H(p)
Can we do better?
Theorem (Körner, Marton 1979)
Minimum sum rate is 2H(p)
Each node's rate can be ≈ H(p): both send the syndrome of their block under the same random binary matrix A with ≈ nH(p) rows, and A X^n ⊕ A Y^n = A Z^n uniquely determines Z^n whp
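The linear-coding idea behind the Körner-Marton scheme can be illustrated in a few lines; this sketch checks only the linearity identity A X^n ⊕ A Y^n = A Z^n (the syndrome decoder that recovers Z^n whp is omitted, and all parameters are illustrative):

```python
import random

random.seed(1)
n, k = 24, 12   # blocklength and number of syndrome bits per node

# One random binary matrix A shared by BOTH encoders;
# each node transmits only its k-bit syndrome A x^n or A y^n
A = [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]

def syndrome(A, v):
    """Matrix-vector product over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, v)) % 2 for row in A)

x = [random.randint(0, 1) for _ in range(n)]
z = [int(random.random() < 0.1) for _ in range(n)]   # sparse noise Z ~ Bern(0.1)
y = [xi ^ zi for xi, zi in zip(x, z)]

# Receiver XORs the two syndromes; by linearity this equals the syndrome of Z,
# from which a sparse Z^n is recoverable whp when k is slightly above n*H(p)
sz = tuple(a ^ b for a, b in zip(syndrome(A, x), syndrome(A, y)))
assert sz == syndrome(A, z)
```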
Information Theory: Lossy Computing
Node j observes i.i.d. source X_j
Every node wishes to estimate g(X_{1i}, X_{2i}, ..., X_{mi}) for i = 1, 2, ...
Nodes communicate and perform local computing
As in the lossless case, we use block codes
MSE distortion criterion:
(1/mn) Σ_{j=1}^m Σ_{i=1}^n E(ĝ_{ji} − g_i)² ≤ D
What is the minimum sum rate R(D) in bits/sample-tuple?
Problem is in general open (even for 2 nodes)
Lossy Compression
Two nodes, one-way communication
Sender node observes i.i.d. Gaussian source X ~ N(0, P)
Receiver node wishes to estimate X to prescribed MSE distortion D
What is the minimum required rate R(D)?
Theorem (Shannon 1949)
Let d := D/P be the normalized distortion. The rate-distortion function is
R(D) = (1/2) log(1/d) for d ≤ 1, and R(D) = 0 for d > 1
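The quadratic Gaussian rate-distortion function is a one-liner; a sketch:

```python
import math

def gaussian_rate_distortion(D, P):
    """R(D) for an i.i.d. N(0, P) source under MSE distortion:
    (1/2) log2(P/D) bits/sample when D <= P, else 0."""
    d = D / P
    return 0.0 if d >= 1 else 0.5 * math.log2(1 / d)

print(gaussian_rate_distortion(0.25, 1.0))  # 1.0: each extra bit quarters D
print(gaussian_rate_distortion(2.0, 1.0))   # 0.0: no rate needed once D >= P
```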
Lossy Compression with Side Information
Two nodes, one-way communication
(X, Y) jointly Gaussian; zero mean; average power P; correlation coefficient ρ
Receiver wishes to estimate X to MSE distortion D
Theorem (Wyner-Ziv 1976)
R(D) = (1/2) log((1 − ρ²)/d) for d < 1 − ρ², and R(D) = 0 otherwise, where d = D/P
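A sketch of the Wyner-Ziv rate, showing how strong side information reduces the required rate:

```python
import math

def wyner_ziv_rate(D, P, rho):
    """Wyner-Ziv R(D) for jointly Gaussian (X, Y), power P, correlation rho:
    (1/2) log2((1 - rho^2)/d) when d = D/P < 1 - rho^2, else 0."""
    d = D / P
    return 0.0 if d >= 1 - rho ** 2 else 0.5 * math.log2((1 - rho ** 2) / d)

# Strong side information at the receiver slashes the required rate
print(round(wyner_ziv_rate(0.01, 1.0, 0.0), 2))    # no side information: 3.32
print(round(wyner_ziv_rate(0.01, 1.0, 0.99), 2))   # rho = 0.99: 0.5
```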
Lossy Averaging with Side Information
Receiver node now wishes to estimate (X + Y)/2 to MSE distortion D
Proposition
R(D) = (1/2) log((1 − ρ²)/d) for d < 1 − ρ², and R(D) = 0 otherwise, where d = 4D/P
Same rate as sending X/2 with MSE distortion D
For two-way communication, where both nodes wish to estimate the average with MSE distortion D:
Proposition (Su, El Gamal 2009)
R(D) = log((1 − ρ²)/d) for d < 1 − ρ², and R(D) = 0 otherwise
Same as two independent one-way rounds
Distributed Consensus
Distributed Consensus: Motivation
Distributed coordination and information aggregation in distributed systems:
Flocking and schooling behaviors in nature
Spread of rumors, epidemics
Coordination of mobile autonomous vehicles
Load balancing in parallel computers
Finding file sizes, free memory size in peer-to-peer networks
Information aggregation in sensor networks
In many of these scenarios, communication is asynchronous, subject to linkand node failures and topology changes
Distributed Averaging
m-node network. Node j observes a real-valued scalar x_j
Every node wishes to estimate the average (1/m) Σ_{j=1}^m x_j
Communication performed in rounds; for example:
2 nodes selected in each round
They exchange information and perform local computing
Distributed protocol: communication and local computing do not depend on node identity
Synchronous (deterministic): node-pair selection predetermined
Asynchronous (gossip): node pairs selected at random
How many rounds are needed to estimate the average (time-to-consensus)?
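Asynchronous pairwise gossip is easy to simulate; this sketch (parameters illustrative) counts rounds until every node is within 10⁻⁶ of the true average:

```python
import random

random.seed(0)
m = 16
x = [random.gauss(0, 1) for _ in range(m)]
avg = sum(x) / m
state = x[:]

rounds = 0
while max(abs(s - avg) for s in state) > 1e-6 and rounds < 100_000:
    j, k = random.sample(range(m), 2)                  # random node pair
    state[j] = state[k] = (state[j] + state[k]) / 2    # exchange and average
    rounds += 1

# Pairwise averaging preserves the sum, so every node converges to avg
print(rounds)
```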
Recent Related Work
D. Kempe, A. Dobra, J. Gehrke, Gossip-based computation of aggregate information, FOCS, 2003
L. Xiao, S. Boyd, Fast linear iterations for distributed averaging, CDC, 2003
S. Boyd, A. Ghosh, B. Prabhakar, D. Shah, Gossip algorithms: design, analysis and applications, INFOCOM, 2005 (gossip; convergence time as a function of graph connectivity)
Example: Centralized Protocol
Node 1 acts as cluster-head
Rounds 1–4: cluster-head receives values from the other nodes
The cluster-head computes the average
Rounds 5–8: cluster-head sends the average to the other nodes
Number of rounds for an m-node network: 2m − 2
Computation is perfect
[Figure: 5-node example with values x_1 = 7.1, x_2 = 4.3, x_3 = 8.2, x_4 = 5.5, x_5 = 2.9; after 8 rounds every node holds the average 5.6]
Issues
Communicating/computing with infinite precision is not realistic
Number of rounds is not a good measure of communication cost
Quantized consensus:
L. Xiao, S. Boyd, S.-J. Kim, Distributed average consensus with least-mean-square deviation, J. Parallel Distrib. Comput., 2007
M. Yildiz, A. Scaglione, Coding with side information for rate constrained consensus, IEEE Trans. on Sig. Proc., 2008
O. Ayaso, D. Shah, M. Dahleh, Information theoretic bounds on distributed computation, preprint, 2008
We recently studied a lossy computing formulation:
H. Su, A. El Gamal, Distributed lossy averaging, ISIT, 2009
Distributed Lossy Averaging
Distributed Lossy Averaging (Su, El Gamal 2009)
Network with m nodes
Node j observes Gaussian i.i.d. source X_j ~ N(0, 1)
Assume sources are independent
Each node wishes to estimate the average g^n := (1/m) Σ_{j=1}^m X_j^n
Rounds of two-way, node-pair communication; block codes used
Network rate-distortion function R(D) is the minimum sum rate such that
(1/mn) Σ_{j=1}^m Σ_{i=1}^n E(ĝ_{ji} − g_i)² ≤ D
R (D ) known only for m = 2
Cutset Lower Bound on R (D )
[Figure: cut separating node j from a super-node formed by the remaining m − 1 nodes, whose average has power P = (m − 1)/m²]
Theorem (Cutset Lower Bound (Su, El Gamal 2009))
R(D) ≥ (m/2) log((m − 1)/(m²D)) for D < (m − 1)/m²
Bound is tight for m = 2
Can achieve within a factor of 2 of the bound using a centralized protocol for large m
Gossip Protocol
Two nodes are selected at random in each round
Use a weighted-average-and-compress scheme
Before communication, node j sets g_j^n(0) = X_j^n
At round t + 1, suppose nodes j, k are selected:
Nodes exchange lossy estimates ĝ_j^n(t), ĝ_k^n(t) of their states g_j^n(t), g_k^n(t) at normalized distortion d
Nodes update their estimates:
g_j^n(t + 1) = (1/2) g_j^n(t) + (1/(2(1 − d))) ĝ_k^n(t)
g_k^n(t + 1) = (1/2) g_k^n(t) + (1/(2(1 − d))) ĝ_j^n(t)
Final estimate for node j is g_j^n(T)
Rate at each round is 2 × (1/2) log(1/d)
Update equations reduce to standard gossip when d = 0, with rate → ∞
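A rough simulation of weighted-average-and-compress gossip; as an assumption, compression at normalized distortion d is modeled by the Gaussian forward test channel rather than an actual quantizer, and all parameters are illustrative:

```python
import math
import random

random.seed(2)
m, n, d = 8, 500, 0.05     # nodes, blocklength, normalized distortion

state = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
avg = [sum(s[i] for s in state) / m for i in range(n)]

def lossy_estimate(block, d):
    """Model rate-(1/2)log2(1/d) compression by the Gaussian forward test
    channel: x -> (1-d)x + noise of variance d(1-d)*power (an assumption
    standing in for an actual quantizer)."""
    power = sum(v * v for v in block) / len(block)
    sigma = math.sqrt(d * (1 - d) * power)
    return [(1 - d) * v + random.gauss(0, sigma) for v in block]

mse0 = sum((state[j][i] - avg[i]) ** 2
           for j in range(m) for i in range(n)) / (m * n)

total_rate = 0.0
for t in range(200):
    j, k = random.sample(range(m), 2)
    hj, hk = lossy_estimate(state[j], d), lossy_estimate(state[k], d)
    # weighted-average-and-compress: the 1/(1-d) factor unbiases the estimate
    state[j] = [0.5 * a + 0.5 / (1 - d) * b for a, b in zip(state[j], hk)]
    state[k] = [0.5 * a + 0.5 / (1 - d) * b for a, b in zip(state[k], hj)]
    total_rate += math.log2(1 / d)    # two descriptions, (1/2)log2(1/d) each

mse = sum((state[j][i] - avg[i]) ** 2
          for j in range(m) for i in range(n)) / (m * n)
assert mse < mse0   # estimates move toward the average despite compression
```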
Gossip Protocol
Let E(R(D)) be the expected sum rate over node-pair selections; R(D) ≤ E(R(D))
Can show that E(R(D)) is larger than the cutset bound by roughly a factor of log m
Cutset bound achievable within a factor of 2 using a centralized protocol
⇒ Price of gossiping is roughly log m
Protocol does not exploit the build-up in correlation to reduce rate
Effect of Using Correlation (m = 50)
[Figure: E(R(D)) versus D ∈ [0, 0.02] for m = 50; exploiting the build-up in correlation gives a lower expected sum rate than ignoring it]
Conclusion
Summary
Communication cost of distributed computing studied in several fields

Model             Source/Computing model   Estimation criterion   Communication cost
Comm. complexity  discrete, scalar         zero error             bits
Info. theory      random, block            lossless/lossy         bits/sample-tuple
Dist. consensus   continuous, scalar       MSE                    rounds
Many cool results, but much remains to be done
Thank You