
Gossip Algorithms: Design, Analysis and Applications

Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, Devavrat Shah*
Information Systems Laboratory, Stanford University, Stanford, CA 94305-9510

Abstract: Motivated by applications to sensor, peer-to-peer and ad hoc networks, we study distributed asynchronous algorithms, also known as gossip algorithms, for computation and information exchange in an arbitrarily connected network of nodes. Nodes in such networks operate under limited computational, communication and energy resources. These constraints naturally give rise to gossip algorithms: schemes which distribute the computational burden and in which a node communicates with a randomly chosen neighbor.

We analyze the averaging problem under the gossip constraint for an arbitrary network. We find that the averaging time of a gossip algorithm depends on the second largest eigenvalue of a doubly stochastic matrix characterizing the algorithm. Using recent results of Boyd, Diaconis and Xiao (2003), we show that minimizing this quantity to design the fastest averaging algorithm on the network is a semidefinite program (SDP). In general, SDPs cannot be solved distributedly. However, by exploiting problem structure, we propose a subgradient method that distributedly solves the optimization problem over the network. The relation of averaging time to the second largest eigenvalue naturally relates it to the mixing time of a random walk with transition probabilities derived from the gossip algorithm. We use this connection to study the performance of gossip algorithms on two popular networks: Wireless Sensor Networks, which are modeled as Geometric Random Graphs, and the Internet graph under the so-called Preferential Connectivity Model.

I. INTRODUCTION

The advent of sensor, wireless ad hoc and peer-to-peer networks has necessitated the design of asynchronous, distributed and fault-tolerant computation and information exchange algorithms. This is mainly because such networks are constrained by the following operational characteristics:

(i) they may not have a centralized entity for facilitating computation, communication and time-synchronization;
(ii) the network topology may not be completely known to the nodes of the network;
(iii) nodes may join or leave the network (even expire), so that the network topology itself may change;
(iv) in the case of sensor networks, the computational power and energy resources may be very limited.

These constraints motivate the design of simple asynchronous decentralized algorithms for computation, where each node exchanges information with only a few of its immediate neighbors in a time instance (or, a round). The goal in this setting is to design algorithms so that the desired computation and communication is done as quickly and efficiently as possible.

*Author names appear in alphabetical order. This work is supported in part by a Stanford Graduate Fellowship, and by C2S2, the MARCO Focus Center for Circuit and System Solution, under MARCO contract 2003-CT-888.

{…, devavrat}@stanford.edu

We study the problem of averaging as an instance of the distributed computation problem. A toy example that motivates the averaging problem is sensing the temperature of some small region of space by a network of sensors. For example, in Figure 1, sensors are deployed to measure the temperature T of a source. Sensor i, i = 1, ..., 4, measures $T_i = T + v_i$, where the $v_i$ are IID zero-mean Gaussian sensor noise variables. The unbiased, minimum mean squared error (MMSE) estimate is the average $\hat{T} = \frac{1}{4}\sum_{i=1}^{4} T_i$. Thus, to combat minor fluctuations in the ambient temperature and the noise in sensor readings, the nodes need to average their readings.

Fig. 1. Sensor nodes deployed to measure ambient temperature.

0-7803-8968-9/05/$20.00 ©2005 IEEE 1653

Distributed averaging arises in many applications, such as coordination of autonomous agents, estimation and distributed data fusion on ad-hoc networks, and decentralized optimization.¹ Fast distributed averaging algorithms are also important in other contexts; see Kempe et al. [KDG03], for example. For an extensive body of related work, see [KK02], [KKD01], [HHL88], [GvRB01], [KEW02], [MFHH02], [vR00], [EGHK99], [IEGH02], [KSSV00a], [SMK+01], [RFH+01].

This paper undertakes an in-depth study of the design and analysis of gossip algorithms for averaging in an arbitrarily connected network of nodes. (By gossip algorithm, we mean specifically an algorithm in which each node communicates with no more than one neighbour in each time slot.) Thus, given a graph G, we determine the averaging time, $T_{ave}$, which is the time taken for the value at each node to be close to the average value (a more precise definition is given later). We find that the averaging time depends on the second largest eigenvalue of a doubly stochastic matrix characterizing the averaging algorithm: the smaller this eigenvalue, the faster the averaging algorithm. The fastest averaging algorithm is obtained by minimizing this eigenvalue over the set of allowed gossip algorithms on the graph. We show that this minimization is a semidefinite program. A semidefinite program is a convex problem, and can therefore be solved efficiently to obtain the global optimum.

The averaging time, $T_{ave}$, is closely related to the mixing time, $T_{mix}$, of the random walk defined by the matrix that characterizes the algorithm. This means we can also study averaging algorithms by studying the mixing time of the corresponding random walk on the graph. The recent work of Boyd et al. [BDX03] shows that the ratio of the mixing time of the natural random walk to that of the fastest-mixing random walk can grow without bound as the number of nodes increases. Correspondingly, the optimal averaging algorithm can perform arbitrarily better than the one based on the natural random walk. Thus, computing the optimal averaging algorithm is important. However, this involves solving a semidefinite program, which requires knowledge of the complete topology. Surprisingly, we find that we can exploit the problem structure to devise a distributed subgradient method that solves the semidefinite program and obtains a near-optimal averaging algorithm.

Finally, we study the performance of gossip algorithms on two network graphs which are very important in practice. The first is Geometric Random Graphs, which are used to model wireless sensor networks; the second is the Internet graph under the preferential connectivity model. We find that for geometric random graphs, the averaging time of the natural averaging algorithm is of the same order as that of the optimal averaging algorithm. As remarked earlier, this need not be the case in a general graph. We shall state our main results after setting out some notation and definitions in the next section.

¹The theoretical framework developed in this paper is not restricted to averaging algorithms. It easily extends to the computation of other functions which can be computed via pairwise operations, e.g., the maximum, minimum or product functions. It can also be extended to analyze information exchange algorithms, although this extension is not as direct. For concreteness, and to state our results as precisely as possible, we shall consider averaging algorithms in the rest of the paper.

A. Problem Formulation and Definitions

Consider a connected graph G = (V, E), where the vertex set V contains n nodes and E is the edge set. The i-th component of the vector $x(0) = [x_1(0), \ldots, x_n(0)]^T$ represents the initial value at node i. Let $x_{ave} = \frac{1}{n}\sum_{i=1}^{n} x_i(0)$ be the average of the entries of x(0). The goal is to compute $x_{ave}$ in a distributed and asynchronous manner.

Asynchronous time model: Each node has a clock which ticks at the times of a rate 1 Poisson process. Thus, the inter-tick times at each node are rate 1 exponentials, independent across nodes and over time. Equivalently, this corresponds to a single clock ticking according to a rate n Poisson process at times $Z_k$, k ≥ 1, where the $\{Z_{k+1} - Z_k\}$ are IID exponentials of rate n. Let $I_k \in \{1, \ldots, n\}$ denote the node whose clock ticked at time $Z_k$. Clearly, the $I_k$ are IID variables distributed uniformly over {1, ..., n}. We discretize time according to clock ticks, since these are the only times at which the value of x(·) changes. Therefore, the interval $[Z_k, Z_{k+1})$ denotes the k-th time-slot and, on average, there are n clock ticks per unit of absolute time. Lemma 1 states a precise translation of clock ticks into absolute time.

Synchronous time model: In the synchronous time model, time is assumed to be slotted commonly across nodes. In each time slot, each node contacts one of its neighbors independently and (not necessarily uniformly) at random. Note that in this model all nodes communicate simultaneously. In contrast, in the asynchronous model only one node communicates at a given time. In both models, however, each node contacts only one other node at a time.

This paper uses the asynchronous time model whereas previous work, notably that of [KSSV00b],


[KDG03], considers the synchronous time model. The qualitative and quantitative conclusions are unaffected by the type of model; we choose the asynchronous time model for convenience.

Algorithm A(P): We consider a class of algorithms, denoted by $\mathcal{A}$. An algorithm in this class is characterized by an n × n matrix $P = [P_{ij}]$ of non-negative entries, with the condition that $P_{ij} > 0$ only if $(i, j) \in E$. For technical reasons, we assume that P is a stochastic matrix with its largest eigenvalue equal to 1 and all the remaining n − 1 eigenvalues strictly less than 1 in magnitude. (Such a matrix can always be found if the underlying graph G is connected and non-bipartite. We will assume that the network graph G satisfies these conditions for the remainder of the paper.) The algorithm associated with P, denoted by A(P), is described as follows. In the k-th time-slot, let node i's clock tick, and let it contact some neighboring node j with probability $P_{ij}$. At this time both nodes set their values equal to the average of their current values. Formally, let x(k) denote the vector of values at the end of time-slot k. Then,

$$x(k) = W(k)\,x(k-1), \qquad (1)$$
where, with probability $\frac{1}{n}P_{ij}$ ($\frac{1}{n}$ is the probability that the i-th node's clock ticked and $P_{ij}$ is the chance that it contacted node j), the random matrix W(k) is
$$W_{ij} = I - \frac{(e_i - e_j)(e_i - e_j)^T}{2}, \qquad (2)$$
where $e_i = [0 \cdots 0\ 1\ 0 \cdots 0]^T$ is an n × 1 unit vector with the i-th component equal to 1.

Quantity of Interest: Our interest is in determining the time (number of clock ticks) it takes for x(k) to converge to $x_{ave}\mathbf{1}$, where $\mathbf{1}$ is the vector of all ones.

Definition 1: For any 0 < ε < 1, the ε-averaging time of an algorithm A(P) is denoted by $T_{ave}(\epsilon, P)$ and equals
$$T_{ave}(\epsilon, P) = \sup_{x(0)} \inf\left\{ k : \Pr\left( \frac{\|x(k) - x_{ave}\mathbf{1}\|}{\|x(0)\|} \ge \epsilon \right) \le \epsilon \right\}, \qquad (3)$$
where $\|v\|$ denotes the $\ell_2$ norm of the vector v. Thus the ε-averaging time is the smallest number of clock ticks it takes for x(·) to get within ε of $x_{ave}\mathbf{1}$ with high probability, regardless of the initial value x(0).

The following lemma relates the number of clock ticks to absolute time.

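Lemma 1: For any k ≥ 1, $E[Z_k] = k/n$. Further, for any δ > 0,
$$\Pr\left( \left| Z_k - \frac{k}{n} \right| \ge \frac{\delta k}{n} \right) \le 2\exp\big(-k(\delta - \log(1+\delta))\big). \qquad (4)$$

Proof: By definition (with $Z_0 = 0$), $E[Z_k] = \sum_{j=1}^{k} E[Z_j - Z_{j-1}] = \sum_{j=1}^{k} 1/n = k/n$. Equation (4) follows directly from Cramér's Theorem (see [DZ99], pp. 30–35).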
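The asynchronous time model and algorithm A(P) can be simulated directly. The sketch below is a minimal illustration that assumes only Python's standard `random` module. It uses the equivalent single rate-n clock, with the ticking node drawn uniformly at random, so the last tick time $Z_k$ should concentrate around $E[Z_k] = k/n$ as Lemma 1 states. The function names `simulate_clock` and `run_gossip`, the 0-based node labels, and the 9-node ring used for illustration are our own hypothetical choices.

```python
import random

def simulate_clock(n, k, rng):
    """First k ticks of the global rate-n Poisson clock.

    Returns (Z, I): Z[t] is the absolute time of tick t+1 and I[t] is the
    0-based index of the node whose clock produced that tick.
    """
    Z, I = [], []
    t = 0.0
    for _ in range(k):
        t += rng.expovariate(n)      # inter-tick times: IID exponential of rate n
        Z.append(t)
        I.append(rng.randrange(n))   # ticking node: uniform over the n nodes
    return Z, I

def run_gossip(P, x0, ticks, rng):
    """Algorithm A(P): at each tick the ticking node i picks j with
    probability P[i][j], and both nodes replace their values by the average."""
    n = len(x0)
    x = list(x0)
    _, I = simulate_clock(n, ticks, rng)
    for i in I:
        j = rng.choices(range(n), weights=P[i])[0]
        x[i] = x[j] = (x[i] + x[j]) / 2.0   # j == i leaves x unchanged
    return x

# Usage: natural random walk on a 9-node ring (odd cycle, so non-bipartite).
m = 9
P = [[0.0] * m for _ in range(m)]
for i in range(m):
    P[i][(i - 1) % m] = P[i][(i + 1) % m] = 0.5
x = run_gossip(P, [float(i) for i in range(m)], 20_000, random.Random(1))
print(x)
```

Every pairwise average leaves the sum of the values unchanged, so the iterates can only approach $x_{ave}$, which is 4.0 in this usage.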

As a consequence of Lemma 1, for k ≥ n, $Z_k = \Theta(k/n)$ with high probability (i.e., with probability at least $1 - O(1/n^2)$). In this paper, all the results about ε-averaging times are at least n. Hence, dividing the quantities measured in number of clock ticks by n gives the corresponding quantities measured in absolute time (for an example, see Corollary 2).

B. Previous Results

A general lower bound for any graph G and any averaging algorithm was obtained in [KSSV00a] in the synchronous setting. Their result is:

Theorem 1: For any gossip algorithm on any graph G and for 0 < ε < 0.5, the ε-averaging time (in synchronous steps) is lower bounded by Ω(log n).

For a complete graph and a synchronous averaging algorithm, [KDG03] obtain the following result.

Theorem 2: For a complete graph, there exists a gossip algorithm such that the 1/n-averaging time of the algorithm is O(log n).

The problem of (synchronous) fast distributed averaging on an arbitrary graph without the gossip constraint is studied in [XB03]; there, W(t) = W for all t, i.e., the system is completely deterministic. Distributed averaging has also been studied in the context of distributed load balancing ([RSW98]). There, an analysis based on Markov chains is used to bound the time required to achieve averaging to a given accuracy (up to the integer constraint). However, in that setting each iteration is governed either by a constant stochastic matrix or by a fixed sequence of matchings. Some other results on distributed averaging can be found in [BS03], [Mur03], [LBF04], [OSM04], [JLS03].


Not much is known about good randomized gossip algorithms for averaging on arbitrary graphs. The algorithm of [KDG03] depends heavily on the underlying graph being a complete graph, and the general result of [KSSV00a] is a non-constructive lower bound.

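C. Our Results

In this paper, we design and characterize the performance of averaging algorithms for arbitrary graphs. Our main result is the following theorem. Later, in Section IV, we apply it to specific types of graphs that are of interest in applications.

Theorem 3: The averaging time, $T_{ave}(\epsilon, P)$, of the algorithm A(P) is bounded as follows:
$$T_{ave}(\epsilon, P) \le \frac{3\log\epsilon^{-1}}{\log\lambda_2(\overline{W})^{-1}}, \qquad (5)$$
$$T_{ave}(\epsilon, P) \ge \frac{0.5\log\epsilon^{-1}}{\log\lambda_2(\overline{W})^{-1}}, \qquad (6)$$
where
$$\overline{W} = I - \frac{1}{2n}D + \frac{P + P^T}{2n}, \qquad (7)$$
and D is the diagonal matrix with entries
$$D_i = \sum_{j=1}^{n} [P_{ij} + P_{ji}]. \qquad (8)$$

Theorem 3 is proved in Section II.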
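The following NumPy sketch is a numerical companion to Theorem 3. It forms $\overline{W} = I - D/(2n) + (P + P^T)/(2n)$ with $D_i = \sum_j (P_{ij} + P_{ji})$, extracts $\lambda_2(\overline{W})$, and evaluates the two bounds $0.5\log\epsilon^{-1}/\log\lambda_2(\overline{W})^{-1}$ and $3\log\epsilon^{-1}/\log\lambda_2(\overline{W})^{-1}$ in clock ticks. The helper names are hypothetical. The sketch also shows two structural facts:

- For a symmetric P, $\overline{W}$ reduces to $(1 - 1/n)I + P/n$, so $\lambda_2(\overline{W}) = 1 - (1 - \lambda_2(P))/n$.
- $\overline{W}$ depends on P only through $P + P^T$.

```python
import numpy as np

def expected_W(P):
    """W-bar = I - D/(2n) + (P + P^T)/(2n), with D_i = sum_j (P_ij + P_ji)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    S = P + P.T
    return np.eye(n) - np.diag(S.sum(axis=1)) / (2 * n) + S / (2 * n)

def lambda2(M):
    """Second largest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(M)[-2]

def averaging_time_bounds(P, eps):
    """(lower, upper) bounds on T_ave(eps, P) in clock ticks, per Theorem 3."""
    log_inv_lam = np.log(1.0 / lambda2(expected_W(P)))
    log_inv_eps = np.log(1.0 / eps)
    return 0.5 * log_inv_eps / log_inv_lam, 3.0 * log_inv_eps / log_inv_lam

# Usage: natural random walk on a 9-node ring (odd, so non-bipartite).
n = 9
P = np.zeros((n, n))
for i in range(n):
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5
print(averaging_time_bounds(P, 1e-3))
```

The two bounds always differ by the constant factor 6. Their common scale, $\log\epsilon^{-1}/\log\lambda_2(\overline{W})^{-1}$, is what changes with the choice of P.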

In Section III we show that the problem of finding the fastest averaging algorithm can be formulated as a semidefinite program (SDP). In general, it is not possible to solve a semidefinite program in a distributed fashion. However, we exploit the structure of the problem to propose a completely distributed algorithm that solves the optimization problem on the network, based on a subgradient method. The description of the algorithm and the proof of convergence are found in Section III-A.

Section IV relates the averaging time of an algorithm on a graph G to the mixing time of an associated random walk on G. It uses this result to study applications of our results to two networks of practical interest: wireless networks, and the Internet.

II. PROOF OF THEOREM 3

We prove bounds (5) and (6) through Lemmas 2 and 3. These lemmas bound the number of discrete times (or equivalently, clock ticks) required to get within ε of $x_{ave}\mathbf{1}$, in forms analogous to (5) and (6).

A. Upper Bound

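Lemma 2: For algorithm A(P), for any initial vector x(0), for $k \ge K^*(\epsilon)$,
$$\Pr\left( \frac{\|x(k) - x_{ave}\mathbf{1}\|}{\|x(0)\|} \ge \epsilon \right) \le \epsilon,$$
where
$$K^*(\epsilon) = \frac{3\log\epsilon^{-1}}{\log\lambda_2(\overline{W})^{-1}}.$$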

Proof: Recall that under algorithm A(P), from (1) and (2),
$$x(k+1) = W(k+1)\,x(k), \qquad (9)$$
where, with probability $\frac{1}{n}P_{ij}$, the random matrix W(k) is
$$W(k) = W_{ij} = I - \frac{(e_i - e_j)(e_i - e_j)^T}{2}. \qquad (10)$$

First note that the matrices $W_{ij}$ are doubly stochastic for all (i, j). For doubly stochastic matrices, the vector $\mathbf{1}$ is the eigenvector corresponding to the largest eigenvalue, 1. Using this observation and our assumptions on P, one can show that $x(k) \to x_{ave}\mathbf{1}$. Our interest is in how fast it converges. In particular, we would like to obtain bounds on the random error vector y(k),
$$y(k) = x(k) - x_{ave}\mathbf{1}. \qquad (11)$$
Note that $y(k) \perp \mathbf{1}$, since $\mathbf{1}^T y(k) = 0$. This holds because every W(k) preserves the sum of the values, so $\mathbf{1}^T x(k) = \mathbf{1}^T x(0) = n\,x_{ave}$.

Consider the evolution of y(·):
$$y(k+1) = x(k+1) - x_{ave}\mathbf{1} \overset{(a)}{=} W(k+1)\,x(k) - x_{ave}W(k+1)\mathbf{1} = W(k+1)\big(x(k) - x_{ave}\mathbf{1}\big) = W(k+1)\,y(k). \qquad (12)$$
Here (a) follows from (9) and the fact that $\mathbf{1}$ is an eigenvector, with eigenvalue 1, of every W(k+1). Thus y(·) evolves according to the same linear system as x(·). To obtain probabilistic bounds on y(k), we first compute the second moment of y(k) and then apply Markov's inequality, as below.

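Computing $\overline{W}$: Let
$$\overline{W} = E[W(0)] = E[W(k)] \quad \text{for all } k. \qquad (13)$$
Averaging (10) over the choice of (i, j) gives
$$\overline{W} = \sum_{i,j}\frac{1}{n}P_{ij}W_{ij} = I - \frac{1}{2n}D + \frac{P + P^T}{2n}, \qquad (14)$$
where D is the diagonal matrix with entries
$$D_i = \sum_{j=1}^{n}[P_{ij} + P_{ji}]. \qquad (15)$$
Note that if $P = P^T$, then P is doubly stochastic. This implies that $D_i = 2$, which in turn implies that
$$\overline{W} = I\left(1 - \frac{1}{n}\right) + \frac{P}{n}. \qquad (16)$$

Computing the Second Moment $E[y(k)^T y(k)]$: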

For each k, $W(k) = W_{ij}$ with probability $\frac{1}{n}P_{ij}$. Each $W_{ij}$ is a symmetric projection matrix ($W_{ij}^2 = W_{ij}$), so
$$W(k)^T W(k) = W(k). \qquad (17)$$
Since this is true for each instance of the random matrix W(k),
$$E[W(0)^T W(0)] = E[W(0)] = \overline{W}. \qquad (18)$$
Now, from (12),
$$E[y(k+1)^T y(k+1)] = E[y(k)^T W(k+1)^T W(k+1)\,y(k)] = E\big[y(k)^T\, E[W(k+1)^T W(k+1)]\, y(k)\big] = E[y(k)^T \overline{W} y(k)]. \qquad (19)$$
This uses (18) and the fact that the W(k+1) are IID and independent of y(k).

The matrix $\overline{W}$ is symmetric² and positive-semidefinite (since $\overline{W} = E[W(0)^T W(0)]$). Hence it has non-negative real eigenvalues. As stated earlier, $y(k) \perp \mathbf{1}$, and $\mathbf{1}$ is the eigenvector corresponding to the largest eigenvalue $\lambda_1 = 1$ of $\overline{W}$. So, from the variational characterization of the second eigenvalue, we have
$$y(k)^T \overline{W} y(k) \le \lambda_2(\overline{W})\, y(k)^T y(k). \qquad (20)$$
From (19) and (20),
$$E[y(k+1)^T y(k+1)] \le \lambda_2(\overline{W})\, E[y(k)^T y(k)]. \qquad (21)$$

²The symmetry of $\overline{W}$ does not depend on P being symmetric.

Recursive application of (21) yields
$$E[y(k)^T y(k)] \le \lambda_2(\overline{W})^k\, y(0)^T y(0). \qquad (22)$$

Now,
$$y(0)^T y(0) = x(0)^T x(0) - n\,x_{ave}^2 \le x(0)^T x(0). \qquad (23)$$

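Application of Markov's Inequality: From (22), (23) and an application of Markov's inequality, we have
$$\Pr\left(\frac{\|y(k)\|}{\|x(0)\|} \ge \epsilon\right) = \Pr\left(\frac{y(k)^T y(k)}{x(0)^T x(0)} \ge \epsilon^2\right) \le \epsilon^{-2}\,\frac{E[y(k)^T y(k)]}{x(0)^T x(0)} \le \epsilon^{-2}\lambda_2(\overline{W})^k. \qquad (24)$$
For $k \ge K^*(\epsilon)$ we have $\lambda_2(\overline{W})^k \le \epsilon^3$, so it follows from (24) that
$$\Pr\left(\frac{\|x(k) - x_{ave}\mathbf{1}\|}{\|x(0)\|} \ge \epsilon\right) \le \epsilon.$$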

This proves the lemma, and gives us an upper bound on the ε-averaging time.
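The upper-bound argument rests on two exact facts. First, each $W_{ij}$ is a projection, so $W(k)^T W(k) = W(k)$. Second, averaging the $W_{ij}$ with weights $P_{ij}/n$ gives $\overline{W} = I - D/(2n) + (P + P^T)/(2n)$. Both can be checked by brute-force enumeration. The minimal NumPy sketch below does this; the helper names and the random test matrix are illustrative assumptions. It also evaluates $K^*(\epsilon) = 3\log\epsilon^{-1}/\log\lambda_2(\overline{W})^{-1}$, the point at which the Markov bound $\epsilon^{-2}\lambda_2(\overline{W})^k$ drops to ε.

```python
import numpy as np

def W_pair(n, i, j):
    """W_ij = I - (e_i - e_j)(e_i - e_j)^T / 2; for i == j this is I."""
    v = np.zeros(n)
    v[i] += 1.0
    v[j] -= 1.0          # for i == j this leaves v = 0
    return np.eye(n) - np.outer(v, v) / 2.0

def expected_W_by_enumeration(P):
    """E[W(k)] = sum over all (i, j) of (P_ij / n) * W_ij."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    return sum(P[i, j] / n * W_pair(n, i, j) for i in range(n) for j in range(n))

def K_upper(eps, lam2):
    """K*(eps) = 3 log(1/eps) / log(1/lambda_2(W-bar))."""
    return 3.0 * np.log(1.0 / eps) / np.log(1.0 / lam2)

# Usage: a random (asymmetric) stochastic matrix on the complete graph.
rng = np.random.default_rng(0)
n = 6
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
Wb = expected_W_by_enumeration(P)
lam2 = np.linalg.eigvalsh(Wb)[-2]    # second largest eigenvalue of W-bar
print(lam2, K_upper(1e-2, lam2))
```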

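B. Lower Bound

Lemma 3: For algorithm A(P), there exists an initial vector x(0) such that for $k < K_*(\epsilon)$,
$$\Pr\left(\frac{\|x(k) - x_{ave}\mathbf{1}\|}{\|x(0)\|} \ge \epsilon\right) > \epsilon,$$
where
$$K_*(\epsilon) = \frac{\log(3\epsilon)^{-1}}{2\log\lambda_2(\overline{W})^{-1}}.$$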

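Proof: From (12), (13) and the independence of the W(k), we obtain
$$E[y(k)] = \overline{W}^k y(0). \qquad (27)$$
We have shown that $\overline{W}$ is a symmetric positive-semidefinite doubly stochastic matrix. Its (non-negative real) eigenvalues are
$$1 = \lambda_1(\overline{W}) \ge \lambda_2(\overline{W}) \ge \cdots \ge \lambda_n(\overline{W}) \ge 0,$$
with corresponding orthonormal eigenvectors
$$u_1 = \frac{1}{\sqrt{n}}\mathbf{1},\ u_2,\ u_3,\ \ldots,\ u_n.$$
Choose
$$x(0) = \frac{1}{\sqrt{2}}(u_1 + u_2).$$
For this choice of x(0), $\|x(0)\| = 1$ and $y(0) = x(0) - x_{ave}\mathbf{1} = u_2/\sqrt{2}$. Now from (27),
$$E[y(k)] = \frac{1}{\sqrt{2}}\lambda_2(\overline{W})^k u_2. \qquad (28)$$
For this particular choice of x(0), we lower bound the ε-averaging time by lower bounding $E[\|y(k)\|^2]$ and using Lemma 4, stated below.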

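By Jensen's inequality and (28),
$$E[\|y(k)\|^2] \ge \|E[y(k)]\|^2 = \frac{1}{2}\lambda_2(\overline{W})^{2k}. \qquad (29)$$

Lemma 4: Let X be a random variable such that $0 \le X \le B$. Then, for any $0 < \epsilon < B$,
$$\Pr(X \ge \epsilon) \ge \frac{E[X] - \epsilon}{B - \epsilon}.$$
Proof:
$$E[X] \le \epsilon\Pr(X < \epsilon) + B\Pr(X \ge \epsilon) = \Pr(X \ge \epsilon)(B - \epsilon) + \epsilon.$$
Rearranging terms gives us the lemma.

From (12), since every W(k) is a projection, $\|y(k)\|^2 \le \|y(0)\|^2 = 1/2$. We apply Lemma 4 to $X = \|y(k)\|^2$, with $B = 1/2$ and threshold $\epsilon^2$. For $k < K_*(\epsilon)$ we have $\lambda_2(\overline{W})^{2k} > 3\epsilon$, so Lemma 4 and (29) give
$$\Pr(\|y(k)\| \ge \epsilon) \ge \frac{\frac{1}{2}\lambda_2(\overline{W})^{2k} - \epsilon^2}{\frac{1}{2} - \epsilon^2} \ge \lambda_2(\overline{W})^{2k} - 2\epsilon^2 > 3\epsilon - 2\epsilon^2 > \epsilon. \qquad (30)$$
Since $\|x(0)\| = 1$, this completes the proof of Lemma 3.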
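The extremal initial vector used in this proof is easy to construct numerically. The NumPy sketch below builds $x(0) = (u_1 + u_2)/\sqrt{2}$ for a given symmetric $\overline{W}$; the helper names and the 9-node ring example are hypothetical. With it one can confirm that $y(0) = u_2/\sqrt{2}$ and that $\overline{W}^k y(0) = \lambda_2(\overline{W})^k u_2/\sqrt{2}$. The sketch also evaluates the threshold $K_*(\epsilon) = \log(3\epsilon)^{-1}/(2\log\lambda_2(\overline{W})^{-1})$, below which $\lambda_2(\overline{W})^{2k} - 2\epsilon^2 > \epsilon$.

```python
import numpy as np

def worst_case_start(Wb):
    """x(0) = (u_1 + u_2)/sqrt(2), with u_1 = ones/sqrt(n) and u_2 a unit
    eigenvector of the symmetric matrix Wb for its second largest eigenvalue.
    Returns (x0, lambda_2, u_2)."""
    n = Wb.shape[0]
    vals, vecs = np.linalg.eigh(Wb)      # eigenvalues in ascending order
    u1 = np.ones(n) / np.sqrt(n)
    u2 = vecs[:, -2]                     # orthogonal to the top eigenvector
    return (u1 + u2) / np.sqrt(2.0), vals[-2], u2

def K_lower(eps, lam2):
    """K_*(eps) = log(1/(3 eps)) / (2 log(1/lambda_2))."""
    return np.log(1.0 / (3.0 * eps)) / (2.0 * np.log(1.0 / lam2))

# Usage: W-bar = (1 - 1/n) I + P/n for the natural walk on a 9-node ring.
n = 9
P = np.zeros((n, n))
for i in range(n):
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5
Wb = (1 - 1 / n) * np.eye(n) + P / n
x0, lam2, u2 = worst_case_start(Wb)
y0 = x0 - x0.mean()                      # y(0) = x(0) - x_ave * 1
print(np.allclose(y0, u2 / np.sqrt(2.0)))
```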

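The following corollaries are immediate.

Corollary 1: For large n and symmetric P, $T_{ave}(\epsilon, P)$ is bounded as follows:
$$\frac{0.5\, n\log\epsilon^{-1}}{1 - \lambda_2(P)} \le T_{ave}(\epsilon, P) \le \frac{3\, n\log\epsilon^{-1}}{1 - \lambda_2(P)}. \qquad (31)$$
Proof: By (16), $\lambda_2(\overline{W}) = 1 - \frac{1}{n}(1 - \lambda_2(P))$. For large n, $\frac{1}{n}(1 - \lambda_2(P))$ is very small, and hence
$$\log\left(1 - \frac{1}{n}\big(1 - \lambda_2(P)\big)\right) \approx -\frac{1}{n}\big(1 - \lambda_2(P)\big).$$
This, along with Theorem 3, completes the proof.

Corollary 2: For a symmetric P, the absolute time $Z_{T^*(\epsilon,P)}$ it takes for $T^*(\epsilon, P)$ clock ticks to happen is, with high probability,
$$Z_{T^*(\epsilon,P)} = \Theta\left(\frac{\log\epsilon^{-1}}{1 - \lambda_2(P)}\right).$$
Proof: For δ = 2(1 − λ_2(P)) and k = T*(ε, P), and using (31), the right hand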



an edge. So entry i of $v_k$ can be found using only the values of $v_{k-1}$ at the neighbours of node i, i.e., the computation is distributed. The orthogonalize and scale steps can be carried out distributedly, either with the gossip algorithm outlined in this paper or by distributed averaging as described in [XB03] and used in [KM04]. Observe that the matrix W itself can be used for the distributed averaging step, since it is also a probability matrix. We state the following result from [KM04] (specialized to our case), which says that the eigenvector can be computed to arbitrary accuracy:

Lemma 5: If DecentralOI is run for $\Omega(t_{mix}\log(16/\epsilon))$ iterations, producing orthogonal vector u, then
$$\|u - u_r\|_2 \le \epsilon,$$
where $\|u - u_r\|$ is the L2 distance between u and the eigenspace of $\lambda_2$, and $u_r$ is the vector in the eigenspace achieving this distance.

It is therefore clear that an approximate eigenvector, and hence an approximate subgradient, can be computed distributedly.

3) Convergence analysis: It remains to show that the subgradient method converges despite errors in the computation of the eigenvector, which spill over into the computation of the subgradient. To show this, we use a result from [Kiw04] on the convergence of approximate subgradient methods.

Given an optimization problem with objective function f and feasible set S, the approximate subgradient method generates a sequence $\{x_k\}_{k=1}^{\infty} \subset S$ such that
$$x_{k+1} = P_S(x_k - \nu_k g_k), \qquad g_k \in \partial_{\epsilon_k} f(x_k), \qquad (47)$$
where $P_S$ is a projection onto the feasible set, $\nu_k > 0$ is a stepsize, and
$$\partial_{\epsilon_k} f(x_k) = \{g : f(z) \ge f(x_k) + g^T(z - x_k) - \epsilon_k,\ \forall z \in S\} \qquad (48)$$
is the $\epsilon_k$-subdifferential of the objective function f at $x_k$.

Let $\tau_k = \frac{1}{2}\|g_k\|^2\nu_k$, and $\delta_k = \epsilon_k + \tau_k$. Then we have the following theorem from [Kiw04]:

Lemma 6: If $\sum_k \nu_k = \infty$, then
$$\liminf_{k\to\infty} f(x_k) \le f^* + \delta,$$
where $\delta = \limsup_{k\to\infty}\delta_k$, and $f^*$ is the optimal value of the objective function.
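The plain-Python sketch below makes the iteration $x_{k+1} = P_S(x_k - \nu_k g_k)$ and Lemma 6 concrete, using a projected approximate subgradient method with diverging step sizes $\nu_k = 1/k$. The toy objective and the error model are illustrative assumptions, not the eigenvalue problem of this section:

- the objective is $f(x) = |x - 0.3|$ on S = [0, 1];
- a deterministic ±0.05 error is added to its exact subgradient.

Because S has diameter 1, the perturbed vector is an $\epsilon_k$-subgradient with $\epsilon_k \le 0.05$, and $\tau_k \to 0$. Lemma 6 therefore predicts $\liminf f(x_k) \le f^* + 0.05$, with $f^* = 0$. The function names are hypothetical.

```python
def approx_projected_subgradient(subgrad, project, x0, iters):
    """x_{k+1} = P_S(x_k - nu_k * g_k) with nu_k = 1/k.

    subgrad(x, k) may return an approximate (eps_k-) subgradient.
    Returns the list of iterates x_0, ..., x_iters.
    """
    xs = [x0]
    x = x0
    for k in range(1, iters + 1):
        x = project(x - subgrad(x, k) / k)
        xs.append(x)
    return xs

def f(x):
    # toy objective (illustrative assumption), minimized at x = 0.3
    return abs(x - 0.3)

def noisy_subgrad(x, k):
    # exact subgradient of f (the sign of x - 0.3) plus a bounded error of +/-0.05
    exact = (x > 0.3) - (x < 0.3)
    return exact + 0.05 * (-1) ** k

def project_unit_interval(x):
    # Euclidean projection onto S = [0, 1]
    return min(1.0, max(0.0, x))

xs = approx_projected_subgradient(noisy_subgrad, project_unit_interval, 1.0, 2000)
print(min(f(x) for x in xs[1000:]))
```

With steps 1/k the iterates oscillate around the minimizer with an amplitude of order 1/k. The bounded subgradient error therefore does not prevent convergence here.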

Consider the k-th iteration of the subgradient method, with current iterate p(k). Let ε be the error in the (approximate) eigenvector u corresponding to $\lambda_2(W(p(k)))$. (By error in the eigenvector, we mean the L2 distance between u and the actual eigenspace corresponding to $\lambda_2$.) Again, denote by $u_r$ the vector in the eigenspace minimizing the distance to u, and denote by $g_r$ the exact subgradient computed from $u_r$. We have $\|u - u_r\|_2 \le \epsilon$. First we find $\epsilon_k$ in terms of ε as follows:

This implies
$$\epsilon_k = \sup_y\, (g - g_r)^T\big(y - p(k)\big) \le c\,\|g - g_r\|_2,$$

where c is a scaling constant.

Next, we find $\|g - g_r\|_2$ in terms of ε as follows:

Now, the l-th component of $g - g_r$ is

We combine two facts: $|u_i - u_{r,i}| \le \epsilon$ for all i, and, since $\|u\| = 1$, $|u_i - u_j| \le 2$ for all i, j. Together they give the following

Summing over all m edges gives $\|g - g_r\| \le 8m\epsilon/n^2$, i.e., $\epsilon_k \le 8cm\epsilon/n^2 \le 8c\epsilon$, since $m \le n^2$ for all graphs. Now choose $\nu_k = 1/k$. From (39), $\|g_k\|_2$ is bounded above by …/n, and so $\tau_k$ in Lemma 6 converges to 0. Suppose the eigenvector in each iteration i is computed to within an error of $\epsilon_i$, and let $\epsilon = \limsup_i \epsilon_i$. Then we have the following result:

Theorem 4: The distributed subgradient method described above converges to a distribution p for which $\lambda_2(W(p))$ is within $8cm\epsilon/n^2$ ($\le 8c\epsilon$) of the globally optimal value $\lambda_2(W(p^*))$.
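The eigenvector step that feeds the subgradient can be prototyped centrally before it is distributed. The NumPy sketch below is a plain power iteration on the subspace orthogonal to $\mathbf{1}$: multiply, orthogonalize, scale. It is a single-vector, centralized stand-in for the decentralized orthogonal iteration of [KM04], not that algorithm itself. It assumes a symmetric positive-semidefinite W with $W\mathbf{1} = \mathbf{1}$, such as $\overline{W}$, so that $\lambda_2$ is the dominant eigenvalue on that subspace. The function name, the seed and the iteration count are illustrative choices.

```python
import numpy as np

def second_eigvec(W, iters=2000, seed=0):
    """Estimate (lambda_2, u) for a symmetric PSD W with W @ 1 = 1 by power
    iteration on the orthogonal complement of the all-ones vector."""
    n = W.shape[0]
    v = np.random.default_rng(seed).standard_normal(n)
    for _ in range(iters):
        v = W @ v                 # multiply: entry i needs only i's neighbours
        v -= v.mean()             # orthogonalize against u_1 = 1/sqrt(n)
        v /= np.linalg.norm(v)    # scale to unit length
    return float(v @ W @ v), v    # Rayleigh quotient and eigenvector estimate

# Usage: W-bar = (1 - 1/n) I + P/n for the natural walk on a 9-node ring.
n = 9
P = np.zeros((n, n))
for i in range(n):
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5
Wb = (1 - 1 / n) * np.eye(n) + P / n
lam2_est, u = second_eigvec(Wb)
print(lam2_est, np.linalg.eigvalsh(Wb)[-2])
```

The error decays like $(\lambda_3/\lambda_2)^{\text{iters}}$. If $\lambda_2$ is repeated, as on this ring, the iterate converges to some vector in the $\lambda_2$-eigenspace. That vector plays the role of $u_r$ above.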


IV. APPLICATIONS

In this section, we briefly discuss applications of our results to wireless ad-hoc networks and the Internet. We examine how the performance of averaging algorithms scales with the size (number of nodes) of the network.

Before we study this, we need the following result. It relates the averaging time of an algorithm A(P) to the mixing time of the Markov chain on G that evolves according to $\overline{W} = \overline{W}(P)$. (Since $\overline{W}$ is a positive-semidefinite doubly stochastic matrix, the Markov chain with transition matrix $\overline{W}$ has the uniform equilibrium distribution.) Recall that the mixing time is defined as follows:

Definition 2 (Mixing Time): For a Markov chain with symmetric transition matrix W, let $\Delta_i(t) = \frac{1}{2}\sum_{j=1}^{n}\left|W^t_{ij} - \frac{1}{n}\right|$. Then, the ε-mixing time is defined as
$$T_{mix}(\epsilon) = \sup_i \inf\{t : \Delta_i(t') \le \epsilon,\ \forall\, t' \ge t\}. \qquad (49)$$
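The mixing-time definition translates directly into code. The NumPy sketch below computes $T_{mix}(\epsilon)$ for a symmetric doubly stochastic W, with $\Delta_i(t) = \frac{1}{2}\sum_j |W^t_{ij} - 1/n|$. The function name and the cap `t_max` are illustrative assumptions. The distance to the uniform distribution never increases along a chain whose equilibrium is uniform. Hence the first t at which $\max_i \Delta_i(t) \le \epsilon$ already satisfies the condition for all $t' \ge t$.

```python
import numpy as np

def mixing_time(W, eps, t_max=100_000):
    """eps-mixing time of a doubly stochastic W (uniform equilibrium),
    with Delta_i(t) = (1/2) * sum_j |W^t[i, j] - 1/n|."""
    n = W.shape[0]
    Wt = np.eye(n)                                      # W^0
    for t in range(t_max + 1):
        delta = 0.5 * np.abs(Wt - 1.0 / n).sum(axis=1)  # Delta_i(t) for all i
        if delta.max() <= eps:                          # sup over starting nodes i
            return t
        Wt = Wt @ W
    raise ValueError("not mixed within t_max steps")

# Usage: W-bar for the natural walk on a 9-node ring versus the one-step
# chain J/n, which jumps to the uniform distribution immediately.
n = 9
P = np.zeros((n, n))
for i in range(n):
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5
Wb = (1 - 1 / n) * np.eye(n) + P / n
print(mixing_time(Wb, 0.01), mixing_time(np.full((n, n), 1.0 / n), 0.01))
```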

We have the following relation between mixing times and averaging times; its proof can be found in [BGPS04].

Theorem 5: For a symmetric matrix P, the ε-averaging time (in absolute time) of the gossip algorithm A(P) is related to the mixing time of the Markov chain with transition matrix P as
$$T_{ave}(\epsilon, P) = \Theta\big(\log n + T_{mix}(\epsilon)\big).$$

Figure 2 is a pictorial description of Theorem 5. The x-axis denotes mixing time and the y-axis denotes averaging time; both axes are in order notation. As shown in the figure, for P such that $T_{mix}(P) = o(\log n)$, $T_{ave}(\epsilon, P) = \Theta(\log n)$. For P such that $T_{mix}(P) = \Omega(\log n)$, $T_{ave}(\epsilon, P) = \Theta(T_{mix}(P))$. That is, knowing the mixing behavior of the random walk essentially characterizes the averaging time in the order sense.

A. Wireless Network

The Geometric Random Graph, introduced by Gupta and Kumar [GK00], has been used successfully to model ad-hoc wireless networks. A d-dimensional Geometric Random Graph on n nodes models a wireless ad-hoc network of n nodes with wireless transmission radius r. It is denoted $G^d(n, r)$ and is obtained as follows: place n nodes on a d-dimensional unit cube uniformly at random, and connect any two nodes that are within distance r of each other. An example of a two-dimensional graph, $G^2(n, r)$, is shown in Figure 3.
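Sampling $G^d(n, r)$ and forming the natural random walk on it takes only a few lines. The NumPy sketch below is illustrative, and several details are assumptions: the function names, the seed, and the convention that an isolated node keeps $P_{ii} = 1$ so that P stays stochastic. The usage picks r from the connectivity condition $n r^d \ge 2\log n$ of Lemma 7, stated below in this subsection.

```python
import numpy as np

def geometric_random_graph(n, r, d=2, seed=0):
    """Sample G^d(n, r): n points uniform in [0, 1)^d, with an edge between
    every pair at Euclidean distance <= r. Returns (points, adjacency)."""
    pts = np.random.default_rng(seed).random((n, d))
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    A = dist <= r
    np.fill_diagonal(A, False)          # no self-loops
    return pts, A

def natural_random_walk(A):
    """P[i, j] = 1/deg(i) for each neighbour j of i (uniform neighbour choice).
    Assumed convention: an isolated node gets P[i, i] = 1 so P stays stochastic."""
    Af = A.astype(float)
    deg = Af.sum(axis=1)
    P = np.zeros_like(Af)
    for i, di in enumerate(deg):
        if di > 0:
            P[i] = Af[i] / di
        else:
            P[i, i] = 1.0
    return P

# Usage: radius at the connectivity scale of Lemma 7 (d = 2: n r^2 = 2 log n).
n = 200
r = np.sqrt(2 * np.log(n) / n)
pts, A = geometric_random_graph(n, r)
P = natural_random_walk(A)
print(A.sum() // 2, "edges; mean degree", A.sum(axis=1).mean())
```

The resulting P is in general not symmetric, because node degrees differ.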

Fig. 2. Graphical interpretation of Theorem 5 (x-axis: $T_{mix}$; y-axis: $T_{ave}$; both in order notation, with log n marked).

The following is a well-known result about the connectivity of $G^d(n, r)$ (for a proof, see [GK00], [GMPS04], [Pen03]):

Lemma 7: For $n r^d \ge 2\log n$, $G^d(n, r)$ is connected with probability at least $1 - 1/n^2$.

Theorem 6: On the Geometric Random Graph $G^d(n, r)$, the absolute $1/n^\alpha$-averaging time, α > 0, of the optimal averaging algorithm is $\Theta(\log n / r^2)$.

Proof: In [BGPS05], the authors show that for $\epsilon = 1/n^\alpha$, α > 0, the ε-mixing time of the fastest-mixing random walk on the geometric random graph $G^d(n, r)$ is of order $\Theta(\log n / r^2)$. Using this together with Corollaries 1 and 2 gives the theorem.

Thus, in wireless sensor networks with a small radius of communication, distributed computing is necessarily slow, since the fastest averaging algorithm is itself slow. However, consider the natural averaging algorithm, which is based on the natural random walk: each node, when it becomes active, chooses one of its neighbors uniformly at random and averages its value with the chosen neighbor. As noted before, in general such an algorithm can perform far worse than the optimal algorithm. Interestingly, in the case of $G^d(n, r)$, the natural averaging algorithm and the optimal averaging algorithm perform comparably (i.e., they have averaging times of the same order). We state the following theorem. It is obtained in exactly the same way as Theorem 6, using a result on $T_{mix}$ for the natural random walk from [BGPS05]:

Theorem 7: On the Geometric Random Graph $G^d(n, r)$, the absolute $1/n^\alpha$-averaging time, α > 0, of the natural averaging algorithm is of the same order as that of the optimal averaging algorithm, i.e., $\Theta(\log n / r^2)$.

Implication. In a wireless sensor network, Theorem 6 suggests that for a small radius of transmission, even the fastest averaging algorithm converges slowly. The good news is that the natural averaging algorithm, based only on local information, scales just as well as the fastest averaging algorithm. Thus, at least in the order sense, it is not necessary to optimize for the fastest averaging algorithm in a wireless sensor network.

B. Internet

The Preferential Connectivity (PC) model [MPS03] is one of the popular models for the Internet. In [MPS03], it is shown that the Internet is an expander under the preferential connectivity model. This means that there exists a positive constant δ > 0, independent of the size of the graph, with the following property. Let P be the transition matrix corresponding to the natural random walk. Then
$$\delta \le 1 - \lambda_{max}(P) \le 1, \qquad (50)$$
where $\lambda_{max}(P)$ is the second largest eigenvalue of P in magnitude; i.e., the spectral gap is bounded away from zero by a constant. Let $P^*$ be the transition matrix corresponding to the fastest mixing random walk on the Internet graph under the PC model. The random walk corresponding to $P^*$ must mix at least as fast as the natural one, and therefore
$$\delta \le 1 - \lambda_{max}(P^*) \le 1. \qquad (51)$$
It is easy to argue that there exists a symmetric optimal $P^*$. Given any optimal $P^0$, the matrix $\frac{1}{2}(P^0 + P^{0T})$ is symmetric and leads to the same $E[W]$ as $P^0$. Therefore, from (50), (51), Theorem 3 and Corollary 2, we obtain the following theorem.

Theorem 8: Under the PC model, the optimal averaging algorithm on the Internet has an absolute ε-averaging time $T_{ave}(\epsilon) = \Theta(\log\epsilon^{-1})$.

Implication. The absolute time for distributed computation on the Internet is independent of the size of the network, and depends only on the desired accuracy of the computation.³ One implication is that exchanging information on the Internet via a peer-to-peer network built on top of it is extremely fast.

³The P matrix of the natural random walk on the Internet is asymmetric, which prevents us from exactly quantifying its averaging time. We nevertheless believe that averaging will be fast even under the natural random walk, since the spectral gap of this random walk is bounded away from zero by a constant.

Fig. 3. An example of a Geometric Random Graph in two dimensions. A node is connected to all other nodes that are within distance r of itself.

V. CONCLUSION

We presented a framework for the design and analysis of a randomized asynchronous distributed averaging algorithm on an arbitrary connected network. We characterized the performance of the algorithm precisely in terms of the second largest eigenvalue of an appropriate doubly stochastic matrix. This allowed us to find the fastest averaging algorithm in this class, by showing that the corresponding optimization problem is convex. We established a tight relation between the averaging time of the algorithm and the mixing time of an associated random walk. We used this connection to design fast averaging algorithms for two popular and well-studied networks: Wireless Sensor Networks (modeled as Geometric Random Graphs), and the Internet graph (under the so-called Preferential Connectivity Model). In these models, we find that the natural algorithm is as fast as the optimal algorithm.

In general, solving semidefinite programs in a distributed manner is not possible. However, we used the structure of the problem to solve the semidefinite program corresponding to the optimal averaging algorithm in a distributed fashion, using the subgradient method. This allows for self-tuning weights. The network can start out with some arbitrary averaging matrix, say one derived from the natural random walk. It can then converge locally, without any central coordination, to the optimal weights corresponding to the fastest averaging algorithm.

The framework developed in this paper is general and can be used to design and analyze distributed algorithms in many other settings.

ACKNOWLEDGMENT

D. Shah thanks Bob Gallager for his suggestions.

REFERENCES

[BDX03] S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing Markov chain on a graph. Submitted to SIAM Review, problems and techniques section, February 2003. Available at www.stanford.edu/~boyd/fmmc.html.
[BGPS04] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Analysis and optimization of randomized gossip algorithms. In Proc. 2004 IEEE CDC, 2004.
[BGPS05] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Mixing times of random walks on geometric random graphs. Proc. SIAM ANALCO, 2005.
[BL00] J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer-Verlag, New York, 2000.
[BS03] R. W. Beard and V. Stepanyan. Synchronization of … in distributed multiple vehicle coordinated control. In Proceedings of IEEE Conference on Decision and Control, December 2003.
[BV05] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2003. Available at http://www.stanford.edu/~boyd/cvxbook.html.
[Cla90] F. H. Clarke. Optimization and Nonsmooth Analysis. Canadian Mathematical Society Books in Mathematics. SIAM, Philadelphia, 1990.
[DZ99] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Springer, 1999.
[EGHK99] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar. Next century challenges: Scalable coordination in sensor networks. In Proc. 5th Intl. Conf. on Mobile Computing and Networking, 1999.
[GK00] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions on Information Theory, 46(2):388-404, March 2000.
[GMPS04] A. El Gamal, J. Mammen, B. Prabhakar, and D. Shah. Throughput-delay trade-off in wireless networks. In Proc. 2004 INFOCOM, 2004.
[GvRB01] I. Gupta, R. van Renesse, and K. Birman. Scalable fault-tolerant aggregation in large process groups. In Proc. Conf. on Dependable Systems and Networks, 2001.
[HHL88] S. Hedetniemi, S. Hedetniemi, and A. Liestman. A survey of gossiping and broadcasting in communication networks. Networks, 18:319-349, 1988.
[HUL93] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms. Springer-Verlag, Berlin, 1993.
[IEGH02] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidemann. Impact of network density on data aggregation in wireless sensor networks. In Proc. Intl. Conf. on Distributed Computing Systems, 2002.
[JLS03] A. Jadbabaie, J. Lin, and A. S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control, 48(6):988-1001, June 2003.
[KDG03] D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In Proc. Conference on Foundations of Computer Science. IEEE, 2003.
[KEW02] B. Krishnamachari, D. Estrin, and S. Wicker. The impact of data aggregation in wireless sensor networks. In Intl. Workshop of Distributed Event Based Systems, 2002.
[Kiw04] K. Kiwiel. Convergence of approximate and incremental subgradient methods for convex optimization. SIAM Journal on Optimization, 14(3):807-840, 2004.
[KK02] D. Kempe and J. Kleinberg. Protocols and impossibility results for gossip-based communication mechanisms. In Proc. 43rd IEEE Symp. on Foundations of Computer Science, 2002.
[KKD01] D. Kempe, J. Kleinberg, and A. Demers. Spatial gossip and resource location protocols. In Proc. 33rd ACM Symp. on Theory of Computing, 2001.
[KM04] D. Kempe and F. McSherry. A decentralized algorithm for spectral analysis. In Symposium on Theory of Computing. ACM, 2004.
[KSSV00a] R. Karp, C. Schindelhauer, S. Shenker, and B. Vöcking. Randomized rumor spreading. In Proc. 41st IEEE Symp. on Foundations of Computer Science, 2000.
[KSSV00b] R. Karp, C. Schindelhauer, S. Shenker, and B. Vöcking. Randomized rumor spreading. In Proc. Symposium on Foundations of Computer Science. IEEE, 2000.
[LBF04] Z. Lin, M. Broucke, and B. Francis. Local control strategies for groups of mobile autonomous agents. IEEE Transactions on Automatic Control, 49(4):622-629, April 2004.
[Lew96] A. S. Lewis. Convex analysis on the Hermitian matrices. SIAM Journal on Optimization, 6:164-177, 1996.
[Lew99] A. S. Lewis. Nonsmooth analysis of eigenvalues. Mathematical Programming, 84:1-24, 1999.
[LO96] A. S. Lewis and M. L. Overton. Eigenvalue optimization. Acta Numerica, 5:149-190, 1996.
[MFHH02] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. TAG: A tiny aggregation service for ad-hoc sensor networks. In Proc. 5th Symp. on Operating Systems Design and Implementation, 2002.
[MPS03] M. Mihail, C. Papadimitriou, and A. Saberi. Internet is an expander. In Proc. Conf. on Foundations of Computer Science, 2003.
[Mor03] L. Moreau. Leaderless coordination via bidirectional and unidirectional time-dependent communication. In Proceedings of IEEE Conference on Decision and Control, December 2003.
[OSM04] R. Olfati-Saber and R. M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520-1533, September 2004.
[Ove92] M. L. Overton. Large-scale optimization of eigenvalues. SIAM Journal on Optimization, 2:88-120, 1992.
[OW93] M. L. Overton and R. S. Womersley. Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Mathematical Programming, 62:321-357, 1993.
[Pen03] M. Penrose. Random Geometric Graphs. Oxford Studies in Probability. Oxford University Press, Oxford, 2003.
[RFH+01] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proc. of the ACM SIGCOMM Conference, 2001.
[RSW98] Y. Rabani, A. Sinclair, and R. Wanka. Local divergence of Markov chains and the analysis of iterative load-balancing schemes. In Proc. Conference on Foundations of Computer Science. IEEE, 1998.
[SMK+01] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proc. of the ACM SIGCOMM Conference, 2001.
[vR00] R. van Renesse. Scalable and secure resource location. In 33rd Hawaii Intl. Conf. on System Sciences, 2000.
[WSV00] H. Wolkowicz, R. Saigal, and L. Vandenberghe. Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. Kluwer Academic Publishers, 2000.
[XB03] L. Xiao and S. Boyd. Fast linear iterations for distributed averaging. In Proc. 2003 Conference on Decision and Control, December 2003.
