8/13/2019 Brice Lecture
Communication Cost of Distributed Computing
Abbas El Gamal
Stanford University
Brice Colloquium, 2009
A. El Gamal (Stanford University) Comm. Cost of Dist. Computing Brice Colloquium, 2009 1 / 41
Motivation
Performance of distributed information processing systems often limited by communication:
Digital VLSI
Multi-processors
Data centers
Peer-to-peer networks
Sensor networks
Networked mobile agents
Purpose of communication is to make decisions, compute functions, and coordinate actions based on distributed data
How much communication is needed to perform such a task?
Wireless Video Camera Network
Today's surveillance systems: analog; costly; human-operated
Future systems: digital; networked; self-configuring; automated detection, e.g., suspicious activity, localization, tracking, ...
Sending all video data requires large communication bandwidth and energy
Distributed Computing Approach (YSEG 04, EEG 07)
[Figure: each camera performs local processing (background subtraction) and sends scan-lines to a cluster-head]
Experimental Setup
[Figure: view of the setup; view from a camera; top view of room; scan-lines from 16 cameras]
Problem Setup
Network with m nodes; node j has data x_j
Node j wishes to estimate g_j(x_1, x_2, ..., x_m), j = 1, 2, ..., m
Nodes communicate and perform local computing
[Figure: network connecting nodes holding x_1, x_2, ..., x_m]
What is the minimum amount of communication needed?
Communication Cost of Distributed Computing
Problem formulated and studied in several fields:
Computer science: communication complexity; gossip
Information theory: coding for computing; CEO problem
Control: distributed consensus
Formulations differ in:
Data model
Type of communication/local-computing protocol
Estimation criterion
Metric for communication cost
Outline
1 Communication Complexity
2 Information Theory: Lossless Computing
3 Information Theory: Lossy Computing
4 Distributed Consensus
5 Distributed Lossy Averaging
6 Conclusion
Communication Complexity
Communication Complexity (Yao 79)
Two nodes, two-way communication
Node j = 1, 2 has a binary k-vector x_j
Both nodes wish to compute g(x_1, x_2)
Use a round-robin communication and local computing protocol
What is the minimum number of bits of communication needed (the communication complexity, C(g))?
Example (Equality Function)
g(x_1, x_2) = 1 if x_1 = x_2, and 0 otherwise
Upper bound: C(g) ≤ k + 1 bits; for k = 2, C(g) ≤ 3 bits

x_1 \ x_2   00  01  10  11
00           1   0   0   0
01           0   1   0   0
10           0   0   1   0
11           0   0   0   1

Lower bound: every communication protocol partitions {0, 1}^k × {0, 1}^k into g-monochromatic rectangles, each having a distinct code
If r(g) is the minimum number of such rectangles, then C(g) ≥ log r(g)
For k = 2, r(g) = 8, so C(g) ≥ 3 bits
In general, C(g) ≥ k + 1, so C(g) = k + 1 bits
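As a sanity check, the trivial upper-bound protocol is easy to simulate; in the sketch below (function name is illustrative), node 1 sends its whole k-bit vector and node 2 replies with one bit, for k + 1 bits total:

```python
import itertools

def equality_protocol(x1, x2):
    """Trivial protocol achieving C(g) <= k + 1: node 1 sends its
    k-bit vector, node 2 replies with one bit indicating equality."""
    k = len(x1)
    answer = int(tuple(x1) == tuple(x2))  # node 2's 1-bit reply
    return answer, k + 1                  # (g(x1, x2), bits exchanged)

# Exhaustive check for k = 2: correct on all 16 input pairs, 3 bits each
for x1 in itertools.product((0, 1), repeat=2):
    for x2 in itertools.product((0, 1), repeat=2):
        assert equality_protocol(x1, x2) == (int(x1 == x2), 3)
```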
Number-on-Forehead Game
m players. Player j has a binary k-vector x_j on her forehead
Player j can see all vectors except her own, x_j
Every player wants to know whether x_1 = x_2 = ... = x_m
Players take turns broadcasting to all other players
Recall for m = 2, C (g ) = k + 1
What is C (g ) for m 3?
Consider the following protocol:
Player 1 broadcasts A = 1 if x_2 = x_3 = ... = x_m, A = 0 otherwise
If A = 0, then all players know the values are not equal
If A = 1, only player 1 does not know whether all values are equal;
Player 2 broadcasts B = 1 if x_1 = x_3, B = 0 otherwise (when A = 1, x_3 = x_2, so B settles the question)
C(g) = 2 bits suffice, independent of m and k!
Overlap in information can significantly reduce communication
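The two-bit protocol above can be simulated directly; this sketch (for m ≥ 3, with illustrative names) returns the equality verdict and the number of bits broadcast:

```python
def forehead_equality(x):
    """Two-bit protocol for the number-on-forehead equality game.
    x[j] is on player j's forehead: player j sees every vector but x[j].
    Requires m >= 3.  Returns (all-equal indicator, bits broadcast)."""
    # Player 1 sees x[1..m-1] and broadcasts A = 1 iff they all agree
    A = int(all(v == x[1] for v in x[1:]))
    if A == 0:
        return 0, 1          # everyone already knows the answer is "no"
    # Player 2 sees x[0] and x[2..m-1]; since A = 1, x[2] equals x[1],
    # so comparing x[0] with x[2] settles whether all m vectors agree
    B = int(x[0] == x[2])
    return B, 2

# All equal: verdict 1 after 2 bits, regardless of m and k
assert forehead_equality([(1, 0, 1)] * 5) == (1, 2)
```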
More on Communication Complexity
Other definitions of communication complexity:
Average: uniform distribution over node values
Randomized: nodes use coin flips in communication
Non-deterministic: non-zero probability of computing error
Some references:
A. Yao, Some complexity questions related to distributive computing, STOC, 1979
A. Orlitsky, A. El Gamal, Average and randomized communication complexity, IEEE Trans. on Info. Th., 1990
R. Gallager, Finding parity in a simple broadcast network, IEEE Trans. on Info. Th., 1988
E. Kushilevitz and N. Nisan, Communication Complexity, Cambridge University Press, 2006
Information Theory: Lossless Computing
m-node network. Node j observes source X_j
X_j generates an i.i.d. process {X_{j1}, X_{j2}, ...}, with X_{ji} ∈ X, a finite set
Sources may be correlated: (X_1, X_2, ..., X_m) ~ p(x_1, x_2, ..., x_m)
Node j wishes to find an estimate ĝ_{ji} of g_j(X_{1i}, X_{2i}, ..., X_{mi}) for each sample i = 1, 2, ...
Block coding: nodes communicate and perform local computing over blocks of n samples
Lossless criterion: ĝ_j^n = g_j^n for all j ∈ {1, 2, ..., m} with high probability (whp)
What is the minimum sum rate in bits/sample-tuple?
Problem is in general open, even for 3 nodes
Lossless Compression
Two nodes, one-way communication
Sender node observes i.i.d. source X ~ p(x)
Receiver node wishes to form a lossless estimate X̂^n of X^n
Theorem (Shannon 1948)
The minimum lossless compression rate is the entropy of X:
H(X) := E(−log p(X)) bits/sample
Example (Binary Source)
X ~ Bern(p): H(p) := −p log p − (1 − p) log(1 − p) ∈ [0, 1]
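The binary entropy function is easy to evaluate numerically; a small sketch:

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), in bits per sample."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))            # 1.0: a fair coin is incompressible
print(round(binary_entropy(0.1), 3))  # 0.469: a biased source compresses well
```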
Lossless Compression with Side Information
Two nodes, one-way communication
Sender node has i.i.d. source X; receiver node has i.i.d. source Y, with (X, Y) ~ p(x, y)
Receiver node wishes to estimate X^n losslessly
H(X) is sufficient; can we do better?
If the sender also knew Y^n, the minimum rate would be the conditional entropy
H(X|Y) := E_{X,Y}(−log p(X|Y)) ≤ H(X)
Theorem (Slepian-Wolf 1973)
The minimum rate with side information at the receiver only is still H(X|Y)
Proof uses random binning
Example (Doubly Symmetric Binary Sources)
Let Y ~ Bern(1/2) and Z ~ Bern(p) be independent, X = Y ⊕ Z
Receiver has Y, wishes to losslessly estimate X
Minimum rate: H(X|Y) = H(Y ⊕ Z|Y) = H(Z) = H(p)
For example, if Z ~ Bern(0.01), H(p) ≈ 0.08
Versus H(X) = H(1/2) = 1 if the correlation is ignored
Remarks
Shannon's theorem holds for error-free compression (e.g., Lempel-Ziv)
Slepian-Wolf does not hold in general for error-free compression
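The claim H(X|Y) = H(p) can be verified numerically from the joint pmf; a small sketch (helper name is illustrative):

```python
import math

def cond_entropy(joint):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y); joint given as {(x, y): prob}."""
    py = {}
    for (x, y), q in joint.items():
        py[y] = py.get(y, 0.0) + q
    return -sum(q * math.log2(q / py[y])
                for (x, y), q in joint.items() if q > 0)

# Doubly symmetric binary source: Y ~ Bern(1/2), Z ~ Bern(p), X = Y xor Z
p = 0.01
joint = {(y ^ z, y): 0.5 * (p if z else 1 - p) for y in (0, 1) for z in (0, 1)}
print(round(cond_entropy(joint), 3))   # H(X|Y) = H(p) ~= 0.081
```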
Lossless Computing with Side Information
Receiver node, which has side information Y^n, wishes to compute g(X, Y)
Let R_g be the minimum rate
Upper and lower bounds:
R_g ≤ H(X|Y)
R_g ≥ H(g(X, Y)|Y)
Bounds sometimes coincide
Example (Mod-2 Sum)
Y ~ Bern(1/2), Z ~ Bern(p) independent, X = Y ⊕ Z
Function: g(X, Y) = X ⊕ Y = Z
Lower bound: H(X ⊕ Y|Y) = H(Z|Y) = H(p)
Upper bound: H(X|Y) = H(p)
⇒ R_g = H(p)
Bounds do not coincide in general
Example
X = (V_1, V_2, ..., V_10), V_j i.i.d. Bern(1/2), Y ~ Unif{1, 2, ..., 10}
Function: g(X, Y) = V_Y
Lower bound: H(V_Y|Y) = 1 bit
Upper bound: H(X|Y) = 10 bits
Can show that R_g = 10 bits
Theorem (Orlitsky, Roche 2001)
R_g = H_G(X|Y)
where H_G is the conditional entropy of the characteristic graph of X, Y, and g
Generalizations:
Two-way, two rounds: A. Orlitsky, J. Roche, Coding for computing, IEEE Trans. on Info. Th., 2001
Infinite number of rounds: N. Ma, P. Ishwar, Two-terminal distributed source coding with alternating messages for function computation, ISIT, 2008
Lossless Computing: m = 3
2-sender, 1-receiver network, one-way communication
Receiver wishes to estimate a function g(X, Y) losslessly
What is the minimum sum rate in bits/sample-pair?
[Figure: X^n and Y^n encoded separately; decoder outputs ĝ^n]
Problem is in general open
Distributed Lossless Compression
Let g (X , Y ) = ( X , Y )
Theorem (Slepian-Wolf)
The minimum achievable sum rate is the joint entropy
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
Same as if X, Y were jointly encoded!
Achievability uses random binning
Theorem can be generalized to m nodes
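For the doubly symmetric binary source, the Slepian-Wolf sum rate H(X, Y) = H(X) + H(Y|X) = 1 + H(p) can be checked numerically; a small sketch:

```python
import math

# Doubly symmetric binary source: X ~ Bern(1/2), Z ~ Bern(p), Y = X xor Z
p = 0.1
joint = {(x, x ^ z): 0.5 * (p if z else 1 - p) for x in (0, 1) for z in (0, 1)}

# Joint entropy H(X, Y): the minimum distributed sum rate
h_joint = -sum(q * math.log2(q) for q in joint.values())

# Decomposition check: H(X, Y) = H(X) + H(Y|X) = 1 + H(p)
h_p = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
assert abs(h_joint - (1 + h_p)) < 1e-9
print(round(h_joint, 3))   # 1.469, well below the 2 bits of ignoring correlation
```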
Distributed Mod-2 Sum Computing
Let X ~ Bern(1/2) and Z ~ Bern(p) be independent, Y = X ⊕ Z
Receiver wishes to compute g(X, Y) = X ⊕ Y = Z
Slepian-Wolf gives a sum rate of H(X, Y) = 1 + H(p)
Can we do better?
Theorem (Körner, Marton 1979)
Minimum sum rate is 2H(p)
Each node's rate can be ≈ H(p): both send the syndrome of their block under the same random binary matrix A with ≈ nH(p) rows, and A X^n ⊕ A Y^n = A Z^n uniquely determines Z^n whp
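The linear-coding idea behind the Körner-Marton scheme can be illustrated in a few lines; this sketch checks only the linearity identity A X^n ⊕ A Y^n = A Z^n (the syndrome decoder that recovers Z^n whp is omitted, and all parameters are illustrative):

```python
import random

random.seed(1)
n, k = 24, 12   # blocklength and number of syndrome bits per node

# One random binary matrix A shared by BOTH encoders;
# each node transmits only its k-bit syndrome A x^n or A y^n
A = [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]

def syndrome(A, v):
    """Matrix-vector product over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, v)) % 2 for row in A)

x = [random.randint(0, 1) for _ in range(n)]
z = [int(random.random() < 0.1) for _ in range(n)]   # sparse noise Z ~ Bern(0.1)
y = [xi ^ zi for xi, zi in zip(x, z)]

# Receiver XORs the two syndromes; by linearity this equals the syndrome of Z,
# from which a sparse Z^n is recoverable whp when k is slightly above n*H(p)
sz = tuple(a ^ b for a, b in zip(syndrome(A, x), syndrome(A, y)))
assert sz == syndrome(A, z)
```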
Information Theory: Lossy Computing
Node j observes i.i.d. source X_j
Every node wishes to estimate g(X_{1i}, X_{2i}, ..., X_{mi}) for i = 1, 2, ...
Nodes communicate and perform local computing
As in the lossless case, we use block codes
MSE distortion criterion:
(1/mn) Σ_{j=1}^m Σ_{i=1}^n E(ĝ_{ji} − g_i)² ≤ D
What is the minimum sum rate R(D) in bits/sample-tuple?
Problem is in general open (even for 2 nodes)
Lossy Compression
Two nodes, one-way communication
Sender node observes i.i.d. Gaussian source X ~ N(0, P)
Receiver node wishes to estimate X to prescribed MSE distortion D
What is the minimum required rate R(D)?
Theorem (Shannon 1949)
Let d := D/P be the normalized distortion. The rate-distortion function is
R(D) = (1/2) log(1/d) for d ≤ 1, and R(D) = 0 for d > 1
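The quadratic Gaussian rate-distortion function is a one-liner; a sketch:

```python
import math

def gaussian_rate_distortion(D, P):
    """R(D) for an i.i.d. N(0, P) source under MSE distortion:
    (1/2) log2(P/D) bits/sample when D <= P, else 0."""
    d = D / P
    return 0.0 if d >= 1 else 0.5 * math.log2(1 / d)

print(gaussian_rate_distortion(0.25, 1.0))  # 1.0: each extra bit quarters D
print(gaussian_rate_distortion(2.0, 1.0))   # 0.0: no rate needed once D >= P
```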
Lossy Compression with Side Information
Two nodes, one-way communication
(X, Y) jointly Gaussian; zero mean; average power P; correlation coefficient ρ
Receiver wishes to estimate X to MSE distortion D
Theorem (Wyner-Ziv 1976)
R(D) = (1/2) log((1 − ρ²)/d) for d < 1 − ρ², and R(D) = 0 otherwise, where d = D/P
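A sketch of the Wyner-Ziv rate, showing how strong side information reduces the required rate:

```python
import math

def wyner_ziv_rate(D, P, rho):
    """Wyner-Ziv R(D) for jointly Gaussian (X, Y), power P, correlation rho:
    (1/2) log2((1 - rho^2)/d) when d = D/P < 1 - rho^2, else 0."""
    d = D / P
    return 0.0 if d >= 1 - rho ** 2 else 0.5 * math.log2((1 - rho ** 2) / d)

# Strong side information at the receiver slashes the required rate
print(round(wyner_ziv_rate(0.01, 1.0, 0.0), 2))    # no side information: 3.32
print(round(wyner_ziv_rate(0.01, 1.0, 0.99), 2))   # rho = 0.99: 0.5
```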
Lossy Averaging with Side Information
Receiver node now wishes to estimate (X + Y)/2 to MSE distortion D
Proposition
R(D) = (1/2) log((1 − ρ²)/d) for d < 1 − ρ², and R(D) = 0 otherwise, where d = 4D/P
Same rate as sending X/2 with MSE distortion D
For two-way communication, where both nodes wish to estimate the average with MSE distortion D:
Proposition (Su, El Gamal 2009)
R(D) = log((1 − ρ²)/d) for d < 1 − ρ², and R(D) = 0 otherwise
Same as two independent one-way rounds
Distributed Consensus
Distributed Consensus: Motivation
Distributed coordination and information aggregation in distributed systems:
Flocking and schooling behaviors in nature
Spread of rumors, epidemics
Coordination of mobile autonomous vehicles
Load balancing in parallel computers
Finding file sizes, free memory size in peer-to-peer networks
Information aggregation in sensor networks
In many of these scenarios, communication is asynchronous, subject to linkand node failures and topology changes
Distributed Averaging
m-node network. Node j observes a real-valued scalar x_j
Every node wishes to estimate the average (1/m) Σ_{j=1}^m x_j
Communication performed in rounds; for example:
2 nodes selected in each round
They exchange information and perform local computing
Distributed protocol: communication and local computing do not depend on node identity
Synchronous (deterministic): node-pair selection predetermined
Asynchronous (gossip): node pairs selected at random
How many rounds are needed to estimate the average (time-to-consensus)?
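Asynchronous pairwise gossip is easy to simulate; this sketch (parameters illustrative) counts rounds until every node is within 10⁻⁶ of the true average:

```python
import random

random.seed(0)
m = 16
x = [random.gauss(0, 1) for _ in range(m)]
avg = sum(x) / m
state = x[:]

rounds = 0
while max(abs(s - avg) for s in state) > 1e-6 and rounds < 100_000:
    j, k = random.sample(range(m), 2)                  # random node pair
    state[j] = state[k] = (state[j] + state[k]) / 2    # exchange and average
    rounds += 1

# Pairwise averaging preserves the sum, so every node converges to avg
print(rounds)
```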
Recent Related Work
D. Kempe, A. Dobra, J. Gehrke, Gossip-based computation of aggregate information, FOCS, 2003
L. Xiao, S. Boyd, Fast linear iterations for distributed averaging, CDC, 2003
S. Boyd, A. Ghosh, B. Prabhakar, D. Shah, Gossip algorithms: design, analysis and applications, INFOCOM, 2005 (gossip; convergence time as a function of graph connectivity)
Example: Centralized Protocol
Node 1 acts as cluster-head
Rounds 1–4: cluster-head receives values from the other nodes
The cluster-head computes the average
Rounds 5–8: cluster-head sends the average to the other nodes
Number of rounds for an m-node network: 2m − 2
Computation is perfect
[Figure: 5-node example with values x_1 = 7.1, x_2 = 4.3, x_3 = 8.2, x_4 = 5.5, x_5 = 2.9; after 8 rounds every node holds the average 5.6]
Issues
Communicating/computing with infinite precision is not realistic
Number of rounds is not a good measure of communication cost
Quantized consensus:
L. Xiao, S. Boyd, S.-J. Kim, Distributed average consensus with least-mean-square deviation, J. Parallel Distrib. Comput., 2007
M. Yildiz, A. Scaglione, Coding with side information for rate constrained consensus, IEEE Trans. on Sig. Proc., 2008
O. Ayaso, D. Shah, M. Dahleh, Information theoretic bounds on distributed computation, preprint, 2008
We recently studied a lossy computing formulation:
H. Su, A. El Gamal, Distributed lossy averaging, ISIT, 2009
Distributed Lossy Averaging
Distributed Lossy Averaging (Su, El Gamal 2009)
Network with m nodes
Node j observes Gaussian i.i.d. source X_j ~ N(0, 1)
Assume sources are independent
Each node wishes to estimate the average g^n := (1/m) Σ_{j=1}^m X_j^n
Rounds of two-way, node-pair communication; block codes used
Network rate-distortion function R(D) is the minimum sum rate such that
(1/mn) Σ_{j=1}^m Σ_{i=1}^n E(ĝ_{ji} − g_i)² ≤ D
R (D ) known only for m = 2
Cutset Lower Bound on R (D )
[Figure: cut separating node j from a super-node formed by the remaining m − 1 nodes, whose average has power P = (m − 1)/m²]
Theorem (Cutset Lower Bound (Su, El Gamal 2009))
R(D) ≥ (m/2) log((m − 1)/(m²D)) for D < (m − 1)/m²
Bound is tight for m = 2
Can achieve within a factor of 2 of the bound using a centralized protocol for large m
Gossip Protocol
Two nodes are selected at random in each round
Use a weighted-average-and-compress scheme
Before communication, node j sets g_j^n(0) = X_j^n
At round t + 1, suppose nodes j, k are selected:
Nodes exchange lossy estimates ĝ_j^n(t), ĝ_k^n(t) of their states g_j^n(t), g_k^n(t) at normalized distortion d
Nodes update their estimates:
g_j^n(t + 1) = (1/2) g_j^n(t) + (1/(2(1 − d))) ĝ_k^n(t)
g_k^n(t + 1) = (1/2) g_k^n(t) + (1/(2(1 − d))) ĝ_j^n(t)
Final estimate for node j is g_j^n(T)
Rate at each round is 2 × (1/2) log(1/d)
Update equations reduce to standard gossip when d = 0, with rate → ∞
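A rough simulation of weighted-average-and-compress gossip; as an assumption, compression at normalized distortion d is modeled by the Gaussian forward test channel rather than an actual quantizer, and all parameters are illustrative:

```python
import math
import random

random.seed(2)
m, n, d = 8, 500, 0.05     # nodes, blocklength, normalized distortion

state = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
avg = [sum(s[i] for s in state) / m for i in range(n)]

def lossy_estimate(block, d):
    """Model rate-(1/2)log2(1/d) compression by the Gaussian forward test
    channel: x -> (1-d)x + noise of variance d(1-d)*power (an assumption
    standing in for an actual quantizer)."""
    power = sum(v * v for v in block) / len(block)
    sigma = math.sqrt(d * (1 - d) * power)
    return [(1 - d) * v + random.gauss(0, sigma) for v in block]

mse0 = sum((state[j][i] - avg[i]) ** 2
           for j in range(m) for i in range(n)) / (m * n)

total_rate = 0.0
for t in range(200):
    j, k = random.sample(range(m), 2)
    hj, hk = lossy_estimate(state[j], d), lossy_estimate(state[k], d)
    # weighted-average-and-compress: the 1/(1-d) factor unbiases the estimate
    state[j] = [0.5 * a + 0.5 / (1 - d) * b for a, b in zip(state[j], hk)]
    state[k] = [0.5 * a + 0.5 / (1 - d) * b for a, b in zip(state[k], hj)]
    total_rate += math.log2(1 / d)    # two descriptions, (1/2)log2(1/d) each

mse = sum((state[j][i] - avg[i]) ** 2
          for j in range(m) for i in range(n)) / (m * n)
assert mse < mse0   # estimates move toward the average despite compression
```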
Gossip Protocol
Let E(R(D)) be the expected sum rate over node-pair selections; R(D) ≤ E(R(D))
Can show that E(R(D)) is larger than the cutset bound by roughly a factor of log m
Cutset bound achievable within a factor of 2 using a centralized protocol
⇒ Price of gossiping is roughly log m
Protocol does not exploit the build-up in correlation to reduce rate
Effect of Using Correlation (m = 50)
[Figure: E(R(D)) versus D ∈ [0, 0.02] for m = 50; exploiting the build-up in correlation gives a lower expected sum rate than ignoring it]
Conclusion
Summary
Communication cost of distributed computing studied in several fields

Model             Source/Computing model   Estimation criterion   Communication cost
Comm. complexity  discrete, scalar         zero error             bits
Info. theory      random, block            lossless/lossy         bits/sample-tuple
Dist. consensus   continuous, scalar       MSE                    rounds
Many cool results, but much remains to be done
Thank You