IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 4, APRIL 2010

Convergence Rates of Distributed Average Consensus With Stochastic Link Failures

    Stacy Patterson, Bassam Bamieh, Fellow, IEEE, and Amr El Abbadi, Senior Member, IEEE

Abstract—We consider a distributed average consensus algorithm over a network in which communication links fail with independent probability. In such stochastic networks, convergence is defined in terms of the variance of deviation from average. We first show how the problem can be recast as a linear system with multiplicative random inputs which model link failures. We then use our formulation to derive recursion equations for the second order statistics of the deviation from average in networks with and without additive noise. We give expressions for the convergence behavior in the asymptotic limits of small failure probability and large networks. We also present simulation-free methods for computing the second order statistics in each network model and use these methods to study the behavior of various network examples as a function of link failure probability.

Index Terms—Distributed systems, gossip protocols, multiplicative noise, packet loss, randomized consensus.

WE study the distributed average consensus problem over a network with stochastic link failures. Each node has some initial value and the goal is for all nodes to reach consensus at the average of all values using only communication between neighbors in the network graph. Distributed average consensus is an important problem that has been studied in contexts such as vehicle formations [1]–[3], aggregation in sensor networks and peer-to-peer networks [4], load balancing in parallel processors [5], [6], and gossip algorithms [7], [8].

Distributed consensus algorithms have been widely investigated in networks with static topologies, where it has been shown that the convergence rate depends on the second smallest eigenvalue of the Laplacian of the communication graph [9], [10]. However, the assumption that a network topology is static, i.e., that communication links are fixed and reliable throughout the execution of the algorithm, is not always realistic. In mobile networks, the network topology changes as the agents change position, and therefore the set of nodes with which each node can communicate may be time-varying. In sensor networks and mobile ad-hoc networks, messages may be lost due to interference, and in wired networks, networks may suffer from packet

Manuscript received December 19, 2008; revised June 18, 2009. First published February 02, 2010; current version published April 02, 2010. This work was supported in part by NSF Grant IIS 02-23022 and NSF Grant ECCS-0802008. Recommended by Associate Editor M. Egerstedt.

S. Patterson and B. Bamieh are with the Department of Mechanical Engineering, University of California, Santa Barbara, CA 93106 USA (e-mail: [email protected]; [email protected]).

A. El Abbadi is with the Department of Computer Science, University of California, Santa Barbara, CA 93106 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TAC.2010.2041998

loss and buffer overflow. In scenarios such as these, it is desirable to quantify the effects that topology changes and communication failures have upon the performance of the averaging algorithm.

In this work, we consider a network with an underlying topology that is an arbitrary, connected, undirected graph where links fail with independent but not necessarily identical probability. In such stochastic networks, we define convergence in terms of the variance of deviation from average. We show that the averaging problem can be formulated as a linear system with multiplicative noise and use our formulation to derive a recursion equation for the second order statistics of the deviation from average. We also give expressions for the mean square convergence rate in the asymptotic limits of small failure probability and large networks.

Additionally, we consider the scenario where node values are perturbed by additive noise. This formulation can be used to model load balancing algorithms in peer-to-peer networks or parallel processing systems, where the additive perturbations represent file insertions and deletions or job creations and completions, with the goal of equilibrating the load amongst the participants. A measure of the performance of the averaging algorithm in this scenario is not how quickly node values converge to the average, but rather how close the node values remain to each other, and therefore to the average of all values as this average changes over time. This problem has been previously studied in networks without communication failures [10], [11]; however, we are unaware of any existing work that addresses this problem in networks with communication failures. We show how our formulation for static-valued networks can be extended to incorporate the additive perturbations and give an expression for the steady-state deviation from average. Finally, for both problem formulations, we present simulation-free methods for computing the second order statistics of the variance of the deviation from average, and we use these methods to study the behavior of various network examples as a function of link failure probability.

Although there has been work that gives conditions for convergence with communication failures, to our knowledge, this is the first work that quantifies the effects of stochastic communication failures on the performance of the distributed average consensus algorithm. We briefly review some of the related work below.

Related Work: The distributed consensus problem has been studied in switching networks, where convergence is defined in a deterministic sense. The works by Jadbabaie et al. [1] and Xiao and Boyd [12] show that in undirected, switching communication networks, convergence is guaranteed if there is an infinitely occurring, contiguous sequence of bounded time intervals

in which the network is jointly connected. The same condition also guarantees convergence in directed networks, as shown by Olfati-Saber and Murray [2] and Moreau [3]. Cao et al. [13] identify a similar convergence condition for consensus in directed networks based on an infinitely occurring sequence of jointly rooted graphs. Recent works have also studied the convergence rates of averaging algorithms in switching networks. In [14], Olshevsky and Tsitsiklis give upper and lower bounds on the convergence rate in a directed network in terms of the length of the bounded time interval of joint connectivity, and in [13], Cao et al. give bounds on the convergence rate in terms of the length of the interval of connectivity of the rooted graph.

Convergence conditions for the distributed averaging algorithm have also been investigated in stochastic networks. In [15], Hatano and Mesbahi study the Erdős-Rényi random graph model where each edge fails with identical probability. The authors use analysis of the expected Laplacian to prove that nodes converge to consensus with probability 1. The work by Wu [16] considers a more general directed random graph model where edge failure probabilities are not necessarily identical and proves convergence in probability. In [17], Porfiri and Stilwell study a similar model, a random directed graph where each edge fails with independent non-uniform probability, but additionally where edges are weighted. The authors also use analysis based on the expected Laplacian to show that, in the case where edge weights are non-negative, if the expected graph is strongly connected, the system converges asymptotically to consensus almost surely. For arbitrary weights, the authors show that asymptotic almost-sure convergence is guaranteed if the network topology changes “sufficiently fast enough”. Tahbaz-Salehi and Jadbabaie [18] consider directed networks where the weight matrices are stochastic and i.i.d. and give a necessary and sufficient condition for almost sure convergence based on the second largest eigenvalue of the expected weight matrix. In [19], Kar and Moura give sufficient conditions for mean square convergence in undirected networks with non-uniform link failure probabilities based on the second largest eigenvalue of the expected weight matrix. Additionally, our recent work [20] also gives sufficient conditions for mean square convergence in undirected networks where links fail with uniform probability. The analysis depends on reformulating the problem as a structured stochastic uncertainty problem and deriving conditions for convergence based on the nominal component. We also note that in [21], Kar and Moura study averaging algorithms over a network with stochastic communication failures where communication links are also corrupted by additive noise. In order to achieve consensus in such a model, the weight of each edge is decreased as the algorithm executes. This problem is similar to the averaging algorithm with additive noise that is described in this paper. However, in this work, we consider additive perturbations at the nodes as opposed to the communication channels, and we consider algorithms where the edge weights remain constant.

The remainder of this paper is organized as follows. In Section I, we formally define our system model and the distributed consensus algorithm. Section II gives our convergence results for systems with no additive noise, and in Section III, we give an extension of the model and convergence results for networks with additive noise. In Section IV, we describe our computational methods, and in Section V, we present computational results for different network scenarios. Finally, we conclude in Section VI.

    I. PROBLEM FORMULATION

We model the network as a connected, undirected graph G = (V, E), where V is the set of nodes, with |V| = n, and E is the set of communication links between them, with |E| = m. We assume that each link (i, j) ∈ E has an independent, but not necessarily identical, probability p_{ij} of failing in each round. If a link fails, no communication takes place across the link in either direction in that round. A link that does not fail in round k is active. The neighbor set of node i, denoted by N_i(k) for round k, is the set of nodes with which node i has active communication links in round k.

We consider the following simple distributed consensus algorithm. Every node i has an initial value x_i(0). The objective of the algorithm is to converge to an equilibrium where x_i = (1/n) Σ_{j∈V} x_j(0) for all i. In each round, each node sends a fraction β of its current value to each neighbor with which it has an active communication link. Each node's value is updated according to the following rule:

x_i(k+1) = x_i(k) + β Σ_{j ∈ N_i(k)} ( x_j(k) − x_i(k) ),

where β is the parameter that defines an instance of the algorithm. This algorithm can be implemented without any a priori knowledge of link failures.
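
To make the update concrete, the following is a minimal NumPy sketch of the per-round rule with independent link failures. The graph, β, and failure probability are illustrative choices, and the helper name consensus_round is ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def consensus_round(x, edges, beta, p):
    """One round: each active link (i, j) exchanges a beta-fraction of values."""
    x_new = x.copy()
    for i, j in edges:
        if rng.random() >= p:          # link is active with probability 1 - p
            x_new[i] += beta * (x[j] - x[i])
            x_new[j] += beta * (x[i] - x[j])
    return x_new

# 8-node ring, uniform failure probability 0.1
n, beta, p = 8, 0.25, 0.1
edges = [(i, (i + 1) % n) for i in range(n)]
x = rng.uniform(0, 100, size=n)
avg = x.mean()                          # the average is preserved in every round
for _ in range(200):
    x = consensus_round(x, edges, beta, p)
print(avg, x)                           # node values cluster around the initial average
```

Note that because each active exchange is symmetric, the sum of the node values, and hence the average, is invariant in every round regardless of which links fail.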

In a network with no communication failures, this algorithm can be expressed using an n × n matrix W = I − βL, where L is the Laplacian matrix¹ of the graph G. The evolution of the system is described by the following recursion equation:

x(k+1) = W x(k).   (1)

It is a well known result that the system converges to consensus at the average of all node values if and only if the magnitude of the second largest eigenvalue of W, which we denote by μ, is strictly less than 1, and that if the graph is connected, it is always possible to choose a β that guarantees convergence [1], [9], [10], [22], [23]. In this work, we place no restriction on the choice of β other than that the resulting matrix W is such that μ < 1. The diagonal entries of W may be negative.

We now demonstrate how (1) can be extended to include stochastic communication failures. We note that a similar model for communication failures in directed graphs is given in [24]. Let e_{ij} be the n-vector with the i'th entry equal to 1, the j'th entry equal to −1, and all other entries equal to 0. E_{ij} is defined as

E_{ij} = e_{ij} e_{ij}^T.   (2)

¹Let A be the adjacency matrix of G and D be the diagonal matrix with the diagonal entry in row i equal to the degree of node i. Then the Laplacian of a graph G is defined as L = D − A.
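
The construction of W from the graph can be sketched as follows; the graph, β, and the helper name laplacian are our own illustrative choices, and the check simply verifies the eigenvalue condition stated above.

```python
import numpy as np

def laplacian(n, edges):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A     # L = D - A

n = 8
edges = [(i, (i + 1) % n) for i in range(n)]   # ring as an example topology
beta = 0.25
L = laplacian(n, edges)
W = np.eye(n) - beta * L

mags = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
mu = mags[1]                                   # second largest eigenvalue magnitude
print("mu =", mu, "convergence condition satisfied:", mu < 1.0)
```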

The system can then be described by the following recursion equation:

x(k+1) = ( W + β Σ_{(i,j)∈E} Δ_{ij}(k) E_{ij} ) x(k),   (3)

where Δ_{ij}(k) is a Bernoulli random variable with

Δ_{ij}(k) = 1 with probability p_{ij}, and Δ_{ij}(k) = 0 with probability 1 − p_{ij}.

When Δ_{ij}(k) = 1, the edge (i, j) has failed in round k. One can interpret (3) as first performing the algorithm on the full underlying network graph and then simulating the failed edges by undoing the effects of communication over those edges. In essence, each matrix E_{ij} returns the values sent across edge (i, j), yielding the state in which edge (i, j) was not active.
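
This "undo the failed edges" reading of (3) can be checked numerically. The sketch below (our construction, with illustrative parameters) forms the E_{ij} matrices, draws one set of Bernoulli failures, and confirms that the matrix form agrees with the node-level update restricted to active links.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, p = 6, 0.3, 0.2
edges = [(i, (i + 1) % n) for i in range(n)]

def E(i, j, n):
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    return np.outer(e, e)                        # E_ij = e_ij e_ij^T

L = sum(E(i, j, n) for i, j in edges)            # the E_ij sum to the Laplacian
W = np.eye(n) - beta * L

x = rng.uniform(0, 10, size=n)
failed = {(i, j): rng.random() < p for i, j in edges}   # Delta_ij(k) = 1 means failure

# matrix form (3): full update, then undo the failed edges
M = W + beta * sum(E(i, j, n) for i, j in edges if failed[(i, j)])
x_matrix = M @ x

# node-level update over the active edges only
x_nodes = x.copy()
for i, j in edges:
    if not failed[(i, j)]:
        x_nodes[i] += beta * (x[j] - x[i])
        x_nodes[j] += beta * (x[i] - x[j])

print(np.allclose(x_matrix, x_nodes))            # True: the two forms agree
```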

We rewrite (3) in a form that is more convenient for our analysis using zero-mean random variables. Let

γ_{ij}(k) = Δ_{ij}(k) − p_{ij}

and observe that they are zero mean. The dynamics can now be rewritten as

x(k+1) = ( W̄ + β Σ_{(i,j)∈E} γ_{ij}(k) E_{ij} ) x(k),   (4)

where W̄ = W + β Σ_{(i,j)∈E} p_{ij} E_{ij}.

We measure how far the current state of the system is from the average of all states using the deviation from average vector d(k), whose components are

d_i(k) = x_i(k) − (1/n) Σ_{j∈V} x_j(k).

The entire vector can be written as the projection

d(k) = P x(k)

with P = I − (1/n) 1 1^T, where 1 is the n-vector with all entries equal to 1.

We are primarily interested in characterizing the convergence rate of d(k) to zero. Since the dynamics of x(k) and d(k) are stochastic, we use the decay rate of the worst-case variance of deviation from average of each node i, E[d_i(k)²], as an indicator of the rate of convergence.

Problem Statement 1: Consider a distributed consensus algorithm over a connected, undirected graph where each link fails with independent probability as modeled by the system with multiplicative noise (3). For a given set of link failure probabilities, determine the worst-case rate (over all initial conditions, over all nodes) at which the deviation from average of each node, E[d_i(k)²], converges to 0 as k → ∞.

The key to addressing this problem is to study the equations governing the second order statistics of the states of (4). To this end, we define the autocorrelation matrices of x(k) and d(k) by

M(k) = E[ x(k) x(k)^T ],   D(k) = E[ d(k) d(k)^T ],

and note that they are related by the projection

D(k) = P M(k) P.

The variance of the deviation from average of each node i, E[d_i(k)²], is given by the i'th diagonal entry of D(k), and the total deviation from average is given by the trace of D(k), E[d(k)^T d(k)] = tr(D(k)).

It is well known that the autocorrelation matrix of a system in the form of (4) with zero-mean multiplicative noise [25] obeys the following recursion equation:

M(k+1) = W̄ M(k) W̄ + β² Σ_{(i,j)∈E} σ_{ij}² E_{ij} M(k) E_{ij},   (5)

where σ_{ij}² = p_{ij}(1 − p_{ij}) is the variance of γ_{ij}(k). This is a discrete-time Lyapunov-like matrix difference equation. However, the additional terms multiplying M(k) make this a nonstandard Lyapunov recursion. The matrix D(k) satisfies a similar recursion relation which we derive in the next section and then study its convergence properties.

    II. CHARACTERIZING CONVERGENCE

In this section, we first derive a recursion equation for D(k), the autocorrelation of the deviation from average d(k), which has the variance of deviation from average of each node as its diagonal entries. We then characterize the decay rate of these variances in terms of the eigenvalues of a Lyapunov-like matrix-valued operator. An exact computational procedure for these eigenvalues is given in Section IV, while in this section, we give expressions for the asymptotic cases of small, uniform link failure probability p and large network size n.

Lemma 2.1: The matrices D(k) satisfy the recursion

D(k+1) = W̄ D(k) W̄ + β² Σ_{(i,j)∈E} σ_{ij}² E_{ij} D(k) E_{ij}.   (6)

Proof: First, we note that the following equalities hold for the action of P on any of the matrices E_{ij}:

P E_{ij} = E_{ij} P = E_{ij},

where the second equality follows from e_{ij}^T 1 = 0 for any edge (i, j). Similarly, P E_{ij} P = E_{ij}. We also note that W, and consequently W̄, commutes with the projection P. This follows from the fact that 1 is both a left and a right eigenvector of W. Using these facts and noting that d(k) = P x(k), (6) follows

from multiplying both sides of (5) by P as follows:

D(k+1) = P M(k+1) P = P W̄ M(k) W̄ P + β² Σ_{(i,j)∈E} σ_{ij}² P E_{ij} M(k) E_{ij} P = W̄ ( P M(k) P ) W̄ + β² Σ_{(i,j)∈E} σ_{ij}² E_{ij} ( P M(k) P ) E_{ij} = W̄ D(k) W̄ + β² Σ_{(i,j)∈E} σ_{ij}² E_{ij} D(k) E_{ij}.

If all edges have an equal probability p of failure in each round, we can derive a simpler form of the recursion equation for D(k).

Corollary 2.2: If each edge fails with uniform probability p, the matrices D(k) satisfy the recursion

D(k+1) = W̄ D(k) W̄ + β² σ² Σ_{(i,j)∈E} E_{ij} D(k) E_{ij},   (7)

where σ² = p(1 − p).

Proof: Note that from the definitions of the matrices E_{ij}, their sum is proportional to the graph's Laplacian, i.e., Σ_{(i,j)∈E} E_{ij} = L. W̄ is then simply

W̄ = W + β p Σ_{(i,j)∈E} E_{ij} = W + β p L = I − β(1 − p)L.

Additionally, note that σ_{ij}² = p(1 − p) = σ² for all (i, j) ∈ E. Therefore, for uniform failure probability p, (6) simplifies to the form given in (7).

    A. The Decay Rate

To study the decay or growth properties of the matrix sequence D(k), we define the Lyapunov-like operator

A(X) := W̄ X W̄ + β² σ² Σ_{(i,j)∈E} E_{ij} X E_{ij}.   (8)

The linear matrix recursion (7) can now be written as

D(k+1) = A( D(k) ).   (9)

Since this is a linear matrix equation, the condition for asymptotic decay of each entry of D(k) is ρ(A) < 1, where ρ(A) is the spectral radius of A, which we call the decay factor of the algorithm instance. Since each entry of D(k) has the asymptotic bound of a constant times ρ(A)^k, then so does its trace and consequently E[d(k)^T d(k)]. And, in fact, it can be shown that this upper bound on the decay rate is tight.
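
Under our reading of (7)-(9), the recursion can be iterated directly to estimate the decay factor, as in the following sketch; the ring topology and parameters are illustrative choices of ours.

```python
import numpy as np

n, beta, p = 8, 0.25, 0.1
sigma2 = p * (1 - p)
edges = [(i, (i + 1) % n) for i in range(n)]     # ring as an example

def E(i, j, n):
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    return np.outer(e, e)

Es = [E(i, j, n) for i, j in edges]
L = sum(Es)
Wbar = np.eye(n) - beta * (1 - p) * L            # expected weight matrix

def A(X):
    """Lyapunov-like operator of (8)."""
    return Wbar @ X @ Wbar + beta**2 * sigma2 * sum(Eij @ X @ Eij for Eij in Es)

P = np.eye(n) - np.ones((n, n)) / n
x0 = np.random.default_rng(2).uniform(0, 100, size=n)
D = P @ np.outer(x0, x0) @ P                     # D(0) for a deterministic x(0)

for _ in range(300):
    D = A(D)
print("estimated decay factor:", np.trace(A(D)) / np.trace(D))
```

The ratio of successive traces approaches the dominant eigenvalue of the operator, i.e., the decay factor, provided the initial D(0) is not orthogonal to the dominant eigenmatrix.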

We summarize these results in the following theorem.

Theorem 2.3: Consider a distributed consensus algorithm where links fail with independent probability as modeled by the system with multiplicative noise

x(k+1) = ( W + β Σ_{(i,j)∈E} Δ_{ij}(k) E_{ij} ) x(k),

where the Δ_{ij}(k) are Bernoulli random variables with

Δ_{ij}(k) = 1 with probability p_{ij}, and Δ_{ij}(k) = 0 with probability 1 − p_{ij}.

1) The total deviation from average E[d(k)^T d(k)] converges to 0 as k → ∞ if and only if

ρ(A) < 1,

where A is the matrix-valued operator defined in (8).

2) The worst-case asymptotic growth (over all initial conditions) of any E[d_i(k)²] is given by

E[d_i(k)²] ≤ c ρ(A)^k,

where c is a constant. This upper bound is tight.

Proof: As (9) is a matrix-valued linear recursion, it is well known that the decay rate of each entry of D(k) is proportional to the spectral radius of A, and this is true for all initial conditions D(0). What remains to be shown is that this worst-case decay rate holds when D(0) is restricted to be a covariance matrix, or equivalently, when D(0) is positive semidefinite. The proof of this is given in the Appendix.

Note that in the case that links do not fail, i.e., when p_{ij} = 0 for all (i, j) ∈ E, we have

A(X) = W X W,

and ρ(A) is precisely μ², the square of the eigenvalue of W with the second largest modulus, as is well known. However, when failures occur with non-zero probability, the additional terms in the operator A play a role. The operator is no longer a pure Lyapunov operator of the form X ↦ W X W but rather a sum of such terms. Thus, one does not expect a simple

relationship between the eigenvalues of A and those of the constitutive matrices as in the pure Lyapunov operator case.

    B. Perturbation Analysis

One important asymptotic case is that of small, uniform link failure probability p. We can analyze this case by doing a first order eigenvalue perturbation analysis of the operator A in (8) as a function of the parameter p. We first recall the basic setup from analytic perturbation theory for eigenvalues of symmetric operators [26].

Consider a symmetric, matrix-valued function A(ε) of a real parameter ε and matrix X of the form

A(ε)(X) = A_0(X) + ε A_1(X) + ε² A_2(X) + ⋯.

Let λ(ε) and Y(ε) be an eigenvalue-eigenmatrix pair of A(ε) as ε varies, i.e.

A(ε)( Y(ε) ) = λ(ε) Y(ε).

It is a standard result of spectral perturbation theory that for isolated eigenvalues of A_0, the functions λ(ε) and Y(ε) are well defined and analytic in some neighborhood of ε = 0. The power series expansion of λ(ε) is

λ(ε) = λ_0 + ε λ_1 + O(ε²),

where λ_0 is an eigenvalue of A_0. The calculation of the coefficient λ_1 involves the corresponding eigenmatrix Y_0 of A_0 and is given by

λ_1 = ⟨ Y_0, A_1(Y_0) ⟩ / ⟨ Y_0, Y_0 ⟩.   (10)

Note that we are dealing with matrix-valued operators on matrices, and the inner product on matrices is given by ⟨X, Y⟩ = tr(X^T Y).

In order to apply this procedure to the operator A in (8), we

first note that, when all links have uniform failure probability p, σ_{ij}² = σ² = p(1 − p) for all (i, j) ∈ E. A can then be written as

A(X) = A_0(X) + p A_1(X) + O(p²),

where

A_0(X) = W X W   and   A_1(X) = β ( L X W + W X L ) + β² Σ_{(i,j)∈E} E_{ij} X E_{ij}.

To investigate the first order behavior of the largest eigenvalue, we observe that the eigenmatrix corresponding to the largest eigenvalue of A_0 is Y_0 = v v^T, where v is an eigenvector, with v^T v = 1, corresponding to the second smallest eigenvalue λ_2 of the Laplacian L, also called the Fiedler vector. v is also an eigenvector corresponding to the largest eigenvalue of P W P (equivalently, the second largest eigenvalue of W).

Applying formula (10) to this expression for A_1 yields the first order term in the expansion of the largest eigenvalue of A to be

λ_1 = ⟨ v v^T, A_1(v v^T) ⟩ / ⟨ v v^T, v v^T ⟩.

The denominator can be simplified as follows:

⟨ v v^T, v v^T ⟩ = (v^T v)² = 1,

and therefore λ_1 is equivalent to

λ_1 = ⟨ v v^T, β ( L v v^T W + W v v^T L ) + β² Σ_{(i,j)∈E} E_{ij} v v^T E_{ij} ⟩.   (11)

Since v is an eigenvector of W, the following equality holds

W v = μ v,   (12)

where μ denotes the second largest eigenvalue of W. v is also an eigenvector of L. Therefore the following equality also holds

L v = λ_2 v,   (13)

where λ_2 denotes the second smallest eigenvalue of L. Noting that W = I − βL, it follows that [27], [28], for β sufficiently small relative to the maximum node degree of the graph, we have the following relationship between μ and λ_2:

μ = 1 − β λ_2.

This equality allows us to rewrite (13) as

L v = ( (1 − μ)/β ) v.   (14)

Using (12) and (14), (11) can be further simplified as follows:

λ_1 = 2 μ (1 − μ) + β² Σ_{(i,j)∈E} ( v_i − v_j )⁴.   (15)

Applying this identity and noting that λ_0 = ρ(A_0) = μ², we arrive at the following expression for ρ(A), which is valid up to first order in p:

ρ(A) ≈ μ² + p ( 2 μ (1 − μ) + β² Σ_{(i,j)∈E} ( v_i − v_j )⁴ ).   (16)

In the special case of a torus network, μ can be computed analytically [27], [28]. For completeness, we state this result here.

Theorem 2.4: In a d-dimensional torus or d-lattice with n nodes, the asymptotic expression for the second largest eigenvalue of the weight matrix W (equivalently, μ) is given by

μ ≈ 1 − 4π²β / n^{2/d}.

Proof: The proof is given in the Appendix.

With this result, we are able to derive an analytic form for the decay factor in tori networks.

Theorem 2.5: For a d-dimensional torus with n nodes, the first order expansion (in p) of the decay factor is given by

(17)

Proof: We first note that, by substituting the value for μ given by Theorem 2.4 into (16), we arrive at the following expression for ρ(A):

(18)

We now prove the theorem by showing that the term containing the summation of E_{ij} matrices is of higher order in 1/n.

Recall that each matrix E_{ij} is of the form E_{ij} = e_{ij} e_{ij}^T, where e_{ij} is a vector of all zeros, excepting the i and j components, which are equal to 1 and −1, respectively. Therefore, the following equivalence holds for the summation:

Σ_{(i,j)∈E} ⟨ v v^T, E_{ij} v v^T E_{ij} ⟩ = Σ_{(i,j)∈E} ( v_i − v_j )⁴,   (19)

where v_i and v_j are the i and j components of v, and v is the eigenvector corresponding to the second largest eigenvalue of W, or equivalently, the eigenvector corresponding to the largest eigenvalue of P W P. In the case of a d-dimensional torus, there is an analytical expression for the eigenvectors of W. Let N be such that N^d = n.

Each eigenvector of W is associated with a multi-dimensional index (k_1, …, k_d), with 0 ≤ k_j ≤ N − 1 for j = 1, …, d. The components of such an eigenvector are sinusoids in the node indices (i_1, …, i_d) with spatial frequencies determined by (k_1, …, k_d).

The eigenvector corresponding to the largest eigenvalue of W occurs when k_1 = ⋯ = k_d = 0. The second largest eigenvalue is repeated, with independent eigenvectors in which one k_j is equal to 1 and all other k_j's are equal to 0. We compute the asymptotic expression for (19) for the eigenvector with k_1 = 1 and k_2 = ⋯ = k_d = 0. The computation for the other eigenvectors is similar.

Let v be the eigenvector with multi-index (1, 0, …, 0); its components are given by

v_{(i_1, …, i_d)} = sqrt(2/n) cos( 2π i_1 / N )

for 0 ≤ i_j ≤ N − 1. Substituting this expression for the i and j components of v in (19), we obtain

Σ_{(i,j)∈E} ( v_i − v_j )⁴ = (2/n)² Σ_{(i,j)∈E} ( cos(2π i_1 / N) − cos(2π j_1 / N) )⁴.

Since (i, j) is an edge in the torus, we know that if nodes i and j share an edge in the first dimension then their first coordinates differ by 1 (mod N). Otherwise, their first coordinates are equal and v_i − v_j = 0. Therefore, for all (i, j) ∈ E, we have

( v_i − v_j )² ≤ (2/n) (2π/N)².

Applying this bound to (19) and using the fact that in a d-dimensional torus with n nodes, there are n edges in each dimension, we get the following bound on the summation term:

β² Σ_{(i,j)∈E} ( v_i − v_j )⁴ ≤ β² n (2/n)² (2π/N)⁴ = 64 π⁴ β² / ( n N⁴ ).

Therefore, the summation term of E_{ij} matrices is of higher order in 1/n, which gives the result in (17).

It is interesting to note that for large n, the leading order behavior of the decay factor is

ρ(A) ≈ 1 − 8π²β(1 − p) / n^{2/d}.

Recall that β is the fraction that is sent across each link. Therefore, for large n, the failure of links with probability p has the same effect on the convergence rate as decreasing β by a factor of (1 − p).
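
A rough numerical check of this remark, under our Kronecker-product construction of the operator (see Section IV), compares the decay factor with failure probability p to the failure-free decay factor obtained with the reduced weight β(1 − p). The ring example and parameters are our own illustrative choices.

```python
import numpy as np

def rho(n, edges, beta, p):
    """Decay factor via the Kronecker matricization, restricted to the disagreement subspace."""
    Es = []
    for i, j in edges:
        e = np.zeros(n); e[i], e[j] = 1.0, -1.0
        Es.append(np.outer(e, e))
    L = sum(Es)
    Wbar = np.eye(n) - beta * (1 - p) * L
    P = np.eye(n) - np.ones((n, n)) / n
    Ahat = np.kron(Wbar @ P, Wbar @ P) \
        + beta**2 * p * (1 - p) * sum(np.kron(E, E) for E in Es)
    return np.max(np.abs(np.linalg.eigvals(Ahat)))

n = 30
edges = [(i, (i + 1) % n) for i in range(n)]     # ring (1-D torus)
beta, p = 0.25, 0.1
print(rho(n, edges, beta, p))                    # decay factor with link failures
print(rho(n, edges, beta * (1 - p), 0.0))        # failure-free, weight reduced to beta*(1-p)
```

For this example the two values agree to within the higher-order terms discussed above.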

    C. Simulations

In this section, we demonstrate through simulations that the relationship between network size and dimensionality and link failure probability in tori networks stated in Theorem 2.4 appears to hold even for smaller networks and a larger probability of link failure. Specifically, we demonstrate that, for a fixed failure probability, the leading order of the decay factor is related to the network size and dimension as follows:

1 − ρ(A) ≈ c / n^{2/d},   (20)

where c is a constant that depends on β and p.

In order to evaluate whether this relationship holds for different network sizes, we simulate the algorithm in one-dimensional

tori (ring) networks with sizes ranging from 10 to 1000 nodes and in two-dimensional tori networks with sizes ranging from 36 to 1764 nodes. For all simulations we let links fail with a uniform probability of 0.1. In tori networks, the variance of deviation from average is the same at each node, and therefore, by Property 1 of Theorem 2.3,

E[ d_i(k)² ] = (1/n) E[ d(k)^T d(k) ] ≤ c ρ(A)^k,

or equivalently,

log E[ d_i(k)² ] ≤ log c + k log ρ(A).

To estimate the decay factor ρ(A) for each network size, we run the algorithm and record the logarithm of the per node variance as a function of time. In order to guarantee that the simulations exhibit the worst case decay behavior, the initial matrix D(0) must be such that it is not orthogonal to the eigenmatrix associated with the largest eigenvalue of A, or equivalently, we must have ⟨D(0), Y⟩ ≠ 0, where Y is that eigenmatrix. Since Y is positive semidefinite and Y ≠ 0 (see the proof of Theorem 2.3), any covariance matrix D(0) will satisfy this property so long as x_i(0) ≠ x_j(0) for all i ≠ j. We achieve this by choosing each x_i(0) uniformly at random from the interval [0, 100].

We run each simulation until the plot of log E[d_i(k)²] is linear, indicating that the largest eigenvalue of the operator A dominates the decay rate. We then find the slope of this linear plot, which gives us an estimate of log ρ(A). If the relationship between the decay rate, the network dimension, and the number of nodes as described in (20) holds, then a plot of −log(1 − ρ(A)) as a function of log(n) should have a slope of 2 for the ring networks and 1 for the 2-dimensional torus networks. Fig. 1 shows −log(1 − ρ(A)) versus log(n) using estimates of ρ(A) generated by the procedure described above. For each type of network, the slope of the linear fit is very close to what is predicted by (20): 1.9792 for the 1-dimensional networks and 1.0011 for the 2-D networks. These results indicate that the relationship in (20) holds even for smaller network sizes.
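
A sketch of this estimation procedure is given below; the network size, number of trials, and the use of a least-squares slope fit over the later rounds are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta, p, trials, rounds = 20, 0.25, 0.1, 500, 60
edges = [(i, (i + 1) % n) for i in range(n)]     # ring (1-D torus)

dev_sq = np.zeros((rounds, n))
for _ in range(trials):
    x = rng.uniform(0, 100, size=n)
    for k in range(rounds):
        d = x - x.mean()
        dev_sq[k] += d * d                       # accumulate squared deviations
        x_new = x.copy()
        for i, j in edges:
            if rng.random() >= p:                # active link
                x_new[i] += beta * (x[j] - x[i])
                x_new[j] += beta * (x[i] - x[j])
        x = x_new

log_var = np.log(dev_sq[:, 0] / trials)          # per-node variance at node 0
k = np.arange(rounds)
slope = np.polyfit(k[rounds // 2:], log_var[rounds // 2:], 1)[0]
print("rough estimate of the decay factor:", np.exp(slope))
```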

    III. INCORPORATING ADDITIVE NOISE

In this section, we extend our analysis to a network model where node values are perturbed by a zero-mean additive noise in each round. Let w(k) be a zero-mean stochastic process with the autocorrelation matrix defined by

E[ w(k) w(l)^T ] = Q δ_{kl}.

We assume that the additive noise processes are not correlated with the state nor with the stochastic processes governing communication failures. This type of noise can be used to model random insertions and deletions from the participating nodes in a distributed file system or data center.

The dynamics of this system are governed by an extension of the recursion equation in (3) that includes both multiplicative

Fig. 1. −log(1 − ρ(A)) as a function of the logarithm of the network size.

and additive noise:

x(k+1) = ( W + β Σ_{(i,j)∈E} Δ_{ij}(k) E_{ij} ) x(k) + w(k).   (21)

As in the first problem formulation, we are interested in the second order statistics of the deviation from average, d(k) = x(k) − x̄(k) 1. However, in a system with additive noise, the average of all node values at time k, x̄(k) = (1/n) Σ_i x_i(k), drifts in a random walk about the average of the initial values x̄(0) = (1/n) Σ_i x_i(0). Additionally, since node values are perturbed in each round, one can no longer expect the nodes to converge to consensus at the current average, or equivalently, each d_i(k) does not converge to 0. In this extended model with additive noise, we do not measure the algorithm performance in terms of the convergence rate. Instead, performance is measured using the steady-state total variance of the deviation from average

d_ss := lim_{k→∞} E[ d(k)^T d(k) ],

which is the sum of the variances of the deviation from the current average at each node. We are interested in the network conditions under which d_ss is bounded as well as in quantifying that bound.

Problem Statement 2: Consider a distributed consensus algorithm on a network G = (V, E) where each link fails with independent probability p_{ij} and where node values are perturbed by a zero-mean stochastic process, as modeled by the system with additive and multiplicative noise (21). For a given input noise covariance Q, determine the steady-state total variance of the deviation from average, d_ss.

Again, we study d_ss by analyzing the recursion equation for the matrices D(k), noting that E[ d(k)^T d(k) ] is related to D(k) as follows: E[ d(k)^T d(k) ] = tr( D(k) ).

Using the same approach by which we derived the recursion (7), we can derive a recursion equation for the system with additive noise.

Lemma 3.1: The matrices D(k) for the system (21) satisfy the recursion

D(k+1) = A( D(k) ) + P Q P,

where A is the matrix-valued operator defined in (8).

If the operator A is asymptotically stable, this recursion has a limit

D_ss := lim_{k→∞} D(k),

and the limit satisfies the following Lyapunov-like equation:

D_ss = A( D_ss ) + P Q P.

These facts lead to the following theorem relating to the second order statistics of the system (21).

Theorem 3.2: Consider the distributed consensus algorithm with random link failures as modeled by the system with multiplicative and additive noise (21).

1) The total variance of the deviation from average E[ d(k)^T d(k) ] has a steady-state limit d_ss if and only if

ρ(A) < 1.

2) This limit is equal to the trace of D_ss, d_ss = tr( D_ss ), where D_ss satisfies the equation

D_ss = A( D_ss ) + P Q P.

This theorem implies that if the consensus algorithm results in convergence to the average in a network with random link failures, then the same algorithm executed on the same network with link failures and additive noise has a finite steady-state limit for the total deviation from average.

    IV. COMPUTATIONAL PROCEDURES

We present computational methods for calculating the exact second order statistics of the deviation from average for systems with random communication failures. For the static-valued system model, the procedure involves computing the largest eigenvalue of a matrix-valued operator. For systems with additive noise, one must compute the trace of a solution of a Lyapunov-like equation.

    A. Computing the Decay Factor

The decay factor of the static-valued system (3) is the spectral radius of the linear operator A defined in (8). Therefore, it is not necessary to perform Monte Carlo simulations of the original system (4) to compute decay factors. However, A is not in a form to which standard eigenvalue computation routines can be immediately applied. We present a simple procedure to obtain a matrix representation of A which can then be readily used in eigenvalue computation routines.

Recall that the Kronecker product of an m_1 × n_1 matrix F and an m_2 × n_2 matrix G is the m_1 m_2 × n_1 n_2 matrix

F ⊗ G = [ f_{11} G   ⋯   f_{1 n_1} G
           ⋮          ⋱   ⋮
          f_{m_1 1} G ⋯   f_{m_1 n_1} G ].

Let vec(X) denote the "vectorization" of any n × n matrix X, constructed by stacking the matrix columns on top of one another to form an n²-vector. It then follows that a matrix equation of the form Y = F X G can be rewritten using matrix-vector products as

vec(Y) = ( G^T ⊗ F ) vec(X).

Thus, using Kronecker products, A in (8) has a matrix representation of the form

Â = W̄ ⊗ W̄ + β² σ² Σ_{(i,j)∈E} E_{ij} ⊗ E_{ij}.

For a graph with n nodes, Â is an n² × n² matrix. This matrix representation can be used to find ρ(A) via readily available eigenvalue routines in MATLAB. Due to the structure of Â, it is also possible to compute the eigenvalues in a more efficient manner. We briefly outline this procedure here. For large values of n, one can use an Arnoldi eigensolver to determine the eigenvalues of Â in a constant number of matrix-vector multiplications that depends on the structure of Â. Since Â is the sum of m + 1 terms, this matrix-vector multiplication can also be computed by multiplying each of the terms by the vector and summing the result. The product of W̄ ⊗ W̄ and an n²-vector can be computed in O(n³). Each E_{ij} ⊗ E_{ij} contains exactly 16 non-zero elements, and thus the product of each E_{ij} ⊗ E_{ij} times an n²-vector can be computed in constant time. Therefore, it is possible to find the eigenvalues of Â in O(n³ + m) time per matrix-vector multiplication.
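
A sketch of this matricization in NumPy/SciPy follows. Restricting the operator to the disagreement subspace by composing W̄ with the projection P is an implementation choice of ours, so that the consensus mode does not contribute a spurious eigenvalue of 1; the 5 × 5 torus and parameters are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import eigs

def torus2d_edges(side):
    """Edges of a side x side 2-D torus (each undirected edge listed once)."""
    idx = lambda r, c: r * side + c
    edges = []
    for r in range(side):
        for c in range(side):
            edges.append((idx(r, c), idx(r, (c + 1) % side)))   # wrap within the row
            edges.append((idx(r, c), idx((r + 1) % side, c)))   # wrap within the column
    return edges

def decay_factor(n, edges, beta, p):
    """Spectral radius of A via its n^2 x n^2 matrix representation."""
    Es = []
    for i, j in edges:
        e = np.zeros(n); e[i], e[j] = 1.0, -1.0
        Es.append(np.outer(e, e))               # E_ij = e_ij e_ij^T (4 non-zeros)
    L = sum(Es)
    Wbar = np.eye(n) - beta * (1 - p) * L
    P = np.eye(n) - np.ones((n, n)) / n
    Ahat = np.kron(Wbar @ P, Wbar @ P) \
        + beta**2 * p * (1 - p) * sum(np.kron(E, E) for E in Es)
    lam = eigs(Ahat, k=1, which='LM', return_eigenvectors=False)   # Arnoldi solver
    return abs(lam[0])

edges = torus2d_edges(5)                        # 25-node 2-D torus, as in Fig. 3
print(decay_factor(25, edges, beta=0.2321, p=0.1))
```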

    B. Computing the Steady-State Total Variance

Recall that the steady-state total variance of the deviation from average is the trace of D_ss, where D_ss satisfies the Lyapunov-like equation

D_ss = A( D_ss ) + P Q P,

where Q is the covariance matrix of the additive noise process.

We again use Kronecker products to find an expression for vec( D_ss ):

( I − Â ) vec( D_ss ) = vec( P Q P ).

Using this expression, vec( D_ss ) can be computed directly for any given algorithm instance and covariance matrix Q. One can then reassemble D_ss from vec( D_ss ) and find its trace.
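
Under our reading of the steady-state equation D_ss = A(D_ss) + P Q P, the computation can be sketched as follows; the 9-node ring, the value of β, and the noise covariance are illustrative, and projecting W̄ by P is the same implementation choice as above.

```python
import numpy as np

def steady_state_total_variance(n, edges, beta, p, Q):
    """tr(D_ss), where D_ss solves (I - Ahat) vec(D_ss) = vec(P Q P)."""
    Es = []
    for i, j in edges:
        e = np.zeros(n); e[i], e[j] = 1.0, -1.0
        Es.append(np.outer(e, e))
    L = sum(Es)
    Wbar = np.eye(n) - beta * (1 - p) * L
    P = np.eye(n) - np.ones((n, n)) / n
    Ahat = np.kron(Wbar @ P, Wbar @ P) \
        + beta**2 * p * (1 - p) * sum(np.kron(E, E) for E in Es)
    rhs = (P @ Q @ P).flatten(order='F')        # vec() stacks columns
    vec_Dss = np.linalg.solve(np.eye(n * n) - Ahat, rhs)
    Dss = vec_Dss.reshape((n, n), order='F')
    return np.trace(Dss)

n = 9
edges = [(i, (i + 1) % n) for i in range(n)]    # 9-node ring, as in Fig. 6
Q = 10.0 * np.eye(n)                            # additive noise variance 10
print(steady_state_total_variance(n, edges, beta=0.2578, p=0.05, Q=Q))
```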

In the next section, we use our computational procedures to calculate the decay factor and steady-state total deviation from average for various network examples.

    V. EXAMPLES

We examine the second order statistics of the deviation from average for the consensus algorithm as a function of uniform link failure probability. For static-valued networks, we give computational results for different network topologies and values of β to illustrate the relationship between the probability of failure, the structure of the network, and the choice of β. For networks with additive noise, we give results that consider all three of these factors, and we also explore the effects of the size of the variance of the additive noise process on the variance of the deviation from average. For each class of problems, MATLAB was used to produce results according to the computational procedures described in the previous section.

    A. Decay Factors

We first investigate the behavior of the decay factor in systems with no additive noise. For each network topology, we compute the decay factor for several values of β, including the value that is optimal for each graph when there are no communication failures. This optimal β is the edge weight that yields the smallest decay factor in networks with reliable communication links. The value is given by the following [10]:

β* = 2 / ( λ_2(L) + λ_n(L) ),

where λ_2(L) and λ_n(L) are the second smallest and the largest eigenvalues of the Laplacian matrix of the graph, respectively.
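
For example, the following short sketch computes this optimal weight for a 9-node ring from the Laplacian spectrum; the value agrees with the 0.4601 quoted in the next paragraph.

```python
import numpy as np

n = 9
L = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    L[i, i] += 1; L[j, j] += 1        # degrees
    L[i, j] -= 1; L[j, i] -= 1        # off-diagonal adjacency

lam = np.sort(np.linalg.eigvalsh(L))
beta_opt = 2.0 / (lam[1] + lam[-1])   # 2 / (lambda_2 + lambda_n)
print(beta_opt)                       # approximately 0.4601
```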

Figs. 2 and 3 give the decay factors for a ring network with 9 nodes and a 2-dimensional discrete torus with 25 nodes. For each topology, we show the decay factors for the optimal β, a β that is larger than optimal, β = 1/Δ, where Δ is the degree of each node in the network, and a β that is smaller than optimal, β = 1/(2Δ). For the ring network, the larger β is 0.5, the optimal β is approximately 0.4601, and the smaller β is 0.25. For the 2-dimensional torus, the larger β is 0.25, the optimal β is approximately 0.2321, and the smaller β is 0.125.

As expected, in both networks, when there are no link failures, the decay factor is smallest for the optimal β. Surprisingly, for the maximum β, the decay factors decrease for small probabilities of failure, and this edge weight yields better performance than the optimal weight. The decay factor continues to decrease until the failure probability reaches approximately 0.1 and then steadily increases. For the case where β = 1/(2Δ), the decay factor is consistently larger than that for the optimal β. Similar trends can be observed in the decay factors of larger networks; however, the difference for the various choices of β is not as pronounced.

Fig. 2. Decay factor for various link failure probabilities in a 9 node ring network.

Fig. 3. Decay factor for various link failure probabilities in a 25 node 2-D torus.

We also compute the decay factors for an Erdős-Rényi (ER) random graph [29] of 50 nodes where each pair of nodes is connected with probability 0.25. The graph has 319 edges and a maximum node degree of 20. The decay factors are given in Fig. 4. The optimal β is approximately 0.071. We also show decay factors for values of β that are larger and smaller than optimal. As in the results for the torus networks, the optimal β yields the smallest decay factor when there is zero probability of edge failure. When failures are introduced, the decay factor initially decreases for the larger value of β, where it actually results in faster convergence than the optimal β.

We conjecture that link failures reduce the effective weight of the values that are sent across each edge over a large number of rounds. In the case where β is larger than the optimal choice, the introduction of failures decreases the effective weight to approach the optimal β, and thus the algorithm performance actually improves. These results demonstrate that there is a relationship between the failure probability and the choice of β, and therefore it seems possible to select a β that optimizes performance for a given failure probability.

Fig. 4. Decay factor for various link failure probabilities in a 50 node ER random graph.

Fig. 5. Steady-state total variance of the deviation from average in 64 node torus networks of dimensions 1, 2, and 3.

    B. Steady-State Total Variance

We next examine the steady-state total variance for systems with communication failures where the state values are perturbed by additive noise. While we do not know of any analytical result for the optimal choice of β for these systems when there are no communication failures, it has been shown that the optimal edge weight can be bounded above and below as 1/λ_n(L) ≤ β ≤ 2/λ_n(L) [30].

Fig. 5 shows the results for 64 node tori networks of dimension 1, 2, and 3. For all networks, the variance of the additive noise is 10. For each network, we select β to be the lower bound of the optimal value, 1/λ_n(L). In a torus, this value corresponds to 1/(2Δ), where Δ is the degree of each node in the graph. So, for d = 1, we have β = 1/4, for d = 2 we have β = 1/8, and for d = 3, we have β = 1/12.

While the magnitude of the steady-state total variance is different for each of the three networks, the effect of increasing the probability of communication failure appears to be the same regardless of the dimension of the torus. In fact, for all three networks, the variance seems to grow at a common rate in the failure probability, which is also shown in the figure.

In Fig. 6, we show the steady-state total variance for a

9-node ring network. The node values are perturbed by a zero-mean additive noise with a variance of 10. We use both the upper and lower bounds on the optimal value of β, which are approximately 0.5155 and 0.2578, respectively. We observe that for the upper bound, introducing a small probability of communication failure decreases the steady-state total variance. Just as the introduction of communication failures can decrease the decay factor in systems with no additive noise, this result demonstrates that communication failures can also improve performance by decreasing variance in systems with additive noise.

Finally, in Fig. 7, we show the steady-state total variance for an ER random graph with 30 nodes, where an edge exists between each pair of nodes with probability 0.25. The graph has 132 edges and a maximum node degree of 15. We use both the upper and lower bounds on the optimal β. We show results for systems with zero-mean additive noise with variance of 1, 10, and 100. As in the previous scenario, a small probability of communication failure decreases the total variance for the larger β in all cases. An interesting observation is that the variance of the additive noise process does not affect the relationship between the probability of communication failure and the steady-state total variance. For all three additive noise processes, the behavior of the steady-state total variance is the same with respect to the probability of failure. Additionally, after the initial decrease, the variance appears to grow at a common rate for all network instances.

    VI. CONCLUSION

We have presented an analysis of the distributed average consensus algorithm in networks with stochastic communication failures and shown that the problem can be formulated as a linear system with multiplicative noise. For systems with no additive noise, we have shown that the convergence rate of the consensus algorithm can be characterized by the spectral radius of a Lyapunov-like matrix recursion, and we have developed expressions for the multiplicative decay factor in the asymptotic limits of small failure probability and large networks. For systems with additive noise, we have shown that the steady-state total deviation from average is given by the solution of a Lyapunov-like equation. For both models, we have presented simulation-free methods for computing the second order statistics of the deviation from average. Using these methods, we have computed these second order statistics for various network topologies as a function of link failure probability. These computations indicate that there is a relationship between the network topology, the algorithm parameter β, and the probability of failure that is more complex than intuition would suggest. In particular, we show that for certain choices of β, communication failures can actually improve algorithm performance.

As the subject of current work, we are investigating the extension of our model and analysis to incorporate communication

failures that are spatially and temporally correlated. Such extensions will enable the study of other network conditions such as network partitions and node failures.

Fig. 6. Steady-state total variance of the deviation from average for various link failure probabilities in a nine-node ring network.

Fig. 7. Steady-state total variance of the deviation from average for various link failure probabilities in a 30 node ER random graph.

APPENDIX A
PROOF OF THEOREM 2.3

Proof: In order to prove the existence of a covariance matrix D(0) for which the decay factor of the linear recursion (6) is precisely ρ(A), we show that every eigenvalue of A has an associated positive semidefinite eigenmatrix. By setting D(0) to be the eigenmatrix associated with the largest eigenvalue of A, the worst case decay rate is achieved.

We first show that for every eigenvalue-eigenmatrix pair (λ, Y) of A, there exists a symmetric matrix Ȳ such that (λ, Ȳ) is also an eigenvalue-eigenmatrix pair of A. Let Ȳ be the symmetric matrix (1/2)(Y + Y^T). Then, we have

A(Ȳ) = (1/2)( A(Y) + A(Y^T) ) = (1/2)( λ Y + λ̄ Y^T ).

Since A is self-adjoint, all of its eigenvalues are real, and so λ̄ = λ, giving

A(Ȳ) = λ Ȳ.

Let λ_max be the largest eigenvalue of A, and let Y be the corresponding symmetric eigenmatrix. Then the decay factor of the operator A acting on an initial state of Y is precisely λ_max = ρ(A). We note that as Y is symmetric, it can be decomposed as Y = Y₊ − Y₋, where Y₊ and Y₋ are positive and negative semidefinite, respectively. By the linearity of A, we have

A^k(Y) = A^k(Y₊) − A^k(Y₋).

Therefore the decay rate of A with the initial condition Y is equivalent to the maximum of the decay rates of A with the initial condition Y₊ and with the initial condition Y₋. This implies that there exists a covariance (positive semi-definite) matrix D(0) such that the decay factor of the operator A acting on the initial D(0) is the spectral radius of A.

APPENDIX B
PROOF OF THEOREM 2.4

Proof: We consider an n node torus network of dimension d as a d-dimensional N-array where N = n^{1/d}. The distributed average consensus algorithm is given by the following recursion equation:

x_{(i_1, …, i_d)}(k+1) = (1 − 2dβ) x_{(i_1, …, i_d)}(k) + β Σ_{j=1}^{d} ( x_{(i_1, …, i_j − 1, …, i_d)}(k) + x_{(i_1, …, i_j + 1, …, i_d)}(k) ),

where each i_j ranges from 0 to N − 1 and the indices are taken modulo N. Each node communicates with its two neighbors along each of the d axes in each round. The self weight 1 − 2dβ and the 2d neighbor weights β then sum to 1. The sum in the equation above can be written as a multidimensional convolution by defining the array b with

b_{(0, …, 0)} = 1 − 2dβ,   b_{(0, …, ±1, …, 0)} = β along each axis, and b = 0 otherwise.

We can then express the averaging operation defined above as

x(k+1) = C_b x(k),

where C_b is the circulant operator associated with the array b. The eigenvalues of C_b can be determined using the Discrete Fourier Transform, with

λ_{(k_1, …, k_d)} = 1 − 2β Σ_{j=1}^{d} ( 1 − cos(2π k_j / N) ),

for 0 ≤ k_j ≤ N − 1.

The largest eigenvalue occurs when all k_j's are zero, and this eigenvalue is 1. The next largest eigenvalue occurs when all but one of the k_j's are zero and the non-zero k_j is 1. This eigenvalue corresponds to

μ = 1 − 2β ( 1 − cos(2π / N) ).   (22)

When N, and consequently n, are large, cos(2π/N) can be expressed as

cos(2π/N) ≈ 1 − 2π²/N².

Substituting this equivalence into (22) and using the fact that N = n^{1/d}, we obtain the following expression for the second largest eigenvalue of W:

μ ≈ 1 − 4π²β / n^{2/d}.

In the case of a lattice network, the matrix W is Toeplitz rather than circulant. However, the spectra of the W matrices for a d-lattice and a d-dimensional torus are equivalent in the limit of large n [31], [32]. Therefore, the convergence results can be applied to lattice networks as well as tori.
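
For a ring (d = 1), this DFT computation can be checked numerically: the eigenvalues of the circulant weight matrix are the FFT of its first row, and the second largest matches both the exact cosine expression and the large-N approximation. The stencil values below follow our reconstruction of the weights (self weight 1 − 2β, neighbor weight β).

```python
import numpy as np

N, beta = 100, 0.25
first_row = np.zeros(N)
first_row[0] = 1 - 2 * beta            # self weight
first_row[1] = first_row[-1] = beta    # the two ring neighbors

lam = np.sort(np.real(np.fft.fft(first_row)))[::-1]
mu = lam[1]                            # second largest eigenvalue
print(mu,
      1 - 2 * beta * (1 - np.cos(2 * np.pi / N)),   # exact expression (22)
      1 - 4 * np.pi**2 * beta / N**2)                # large-N approximation
```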

    REFERENCES

[1] A. Jadbabaie, J. Lin, and A. S. Morse, "Coordination of groups of mobile autonomous agents using nearest neighbor rules," IEEE Trans. Autom. Control, vol. 48, no. 6, pp. 988–1001, Jun. 2003.

[2] R. Olfati-Saber and R. Murray, "Consensus problems in networks of agents with switching topology and time-delays," IEEE Trans. Autom. Control, vol. 49, no. 9, pp. 1520–1533, Sep. 2004.

[3] L. Moreau, "Stability of multiagent systems with time-dependent communication links," IEEE Trans. Autom. Control, vol. 50, no. 2, pp. 169–182, Feb. 2005.

[4] L. Xiao, S. Boyd, and S. Lall, "A space-time diffusion scheme for peer-to-peer least-squares estimation," in Proc. Inform. Processing Sensor Networks, 2006, pp. 168–176.

[5] G. Cybenko, "Dynamic load balancing for distributed memory multiprocessors," J. Parallel Dist. Comput., vol. 7, no. 2, pp. 279–301, 1989.

[6] J. E. Boillat, "Load balancing and Poisson equation in a graph," Concurrency: Practice Exper., vol. 2, no. 4, pp. 289–313, 1990.

[7] D. Kempe, A. Dobra, and J. Gehrke, "Gossip-based computation of aggregate information," in Proc. 44th Annu. IEEE Symp. Found. Comput. Sci., 2003, pp. 482–491.

[8] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE Trans. Inform. Theory, vol. 52, no. 6, pp. 2508–2530, Jun. 2006.

[9] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Nashua, NH: Athena Scientific, 1997.

[10] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Syst. Control Lett., vol. 52, pp. 65–78, 2004.

[11] B. Bamieh, M. Jovanovic, P. Mitra, and S. Patterson, "Effect of topological dimension on rigidity of vehicle formations: Fundamental limitations of local feedback," in Proc. 47th IEEE Conf. Decision Control, 2008, pp. 369–374.

[12] L. Xiao, S. Boyd, and S. Lall, "A scheme for robust distributed sensor fusion based on average consensus," in Proc. Inform. Processing Sensor Networks, 2005, pp. 63–70.

[13] M. Cao, A. S. Morse, and B. D. O. Anderson, "Reaching a consensus in a dynamically changing environment: Convergence rates, measurement delays, and asynchronous events," SIAM J. Control Optim., vol. 47, no. 2, pp. 601–623, 2008.

[14] A. Olshevsky and J. N. Tsitsiklis, "Convergence speed in distributed consensus and averaging," SIAM J. Control Optim., vol. 48, no. 1, pp. 33–55, 2009.

[15] Y. Hatano and M. Mesbahi, "Agreement over random networks," IEEE Trans. Autom. Control, vol. 50, no. 11, pp. 1867–1872, Nov. 2005.

[16] C. Wu, "Synchronization and convergence of linear dynamics in random directed networks," IEEE Trans. Autom. Control, vol. 51, no. 7, pp. 1207–1210, Jul. 2006.

[17] M. Porfiri and D. J. Stilwell, "Consensus seeking over random weighted directed graphs," IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1767–1773, Sep. 2007.

[18] A. Tahbaz-Salehi and A. Jadbabaie, "A necessary and sufficient condition for consensus over random networks," IEEE Trans. Autom. Control, vol. 53, no. 3, pp. 791–795, Apr. 2008.

[19] S. Kar and J. Moura, "Distributed average consensus in sensor networks with random link failures," in Proc. Int. Conf. Acoust., Speech, Signal Processing, 2007, pp. 1013–1016.

[20] S. Patterson and B. Bamieh, "Distributed consensus with link failures as a structured stochastic uncertainty problem," in Proc. 46th Allerton Conf. Commun., Control, Comput., 2008, pp. 623–627.

[21] S. Kar and J. Moura, "Distributed average consensus in sensor networks with random link failures and communication channel noise," IEEE Trans. Signal Process., to be published.

[22] E. Kranakis, D. Krizanc, and J. van den Berg, "Computing boolean functions on anonymous networks," Inform. Computat., vol. 114, no. 2, pp. 214–236, 1994.

[23] J. N. Tsitsiklis, "Problems in Decentralized Decision Making and Computation," Ph.D. dissertation, MIT, Cambridge, MA, 1985.

[24] N. Elia, "Emergence of power laws in networked control systems," in Proc. 45th IEEE Conf. Decision Control, Dec. 2006, pp. 490–495.

[25] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, ser. Studies in Applied Mathematics. Philadelphia, PA: SIAM, 1994, vol. 15.

[26] H. Baumgartel, Analytic Perturbation Theory for Matrices and Operators. Basel, Switzerland: Birkhauser Verlag, 1985.

[27] S. Patterson, B. Bamieh, and A. El Abbadi, "Brief announcement: Convergence analysis of scalable gossip protocols," in Proc. 20th Int. Symp. Distrib. Comput., 2006, pp. 540–542.

[28] S. Patterson, B. Bamieh, and A. El Abbadi, "Convergence Analysis of Scalable Gossip Protocols," Tech. Rep. 2006-09, Jul. 2006.

[29] P. Erdős and A. Rényi, "On the evolution of random graphs," Publ. Math. Inst. Hungarian Acad. Sci., vol. 5, pp. 17–61, 1960.

[30] L. Xiao, S. Boyd, and S.-J. Kim, "Distributed average consensus with least-mean-square deviation," J. Parallel Dist. Comput., vol. 67, pp. 33–46, 2007.

[31] E. E. Tyrtyshnikov, "A unifying approach to some old and new theorems on distribution and clustering," Linear Algebra Appl., vol. 232, pp. 1–43, 1996.

[32] S. Salapaka, A. Peirce, and M. Dahleh, "Analysis of a circulant based preconditioner for a class of lower rank extracted systems," Numer. Lin. Algebra Appl., vol. 12, pp. 9–32, Feb. 2005.

Stacy Patterson received the B.S. degree in mathematics and computer science from Rutgers University, Piscataway, NJ, in 1998 and the M.S. and Ph.D. degrees in computer science from the University of California, Santa Barbara, in 2003 and 2009, respectively.

She is currently a Postdoctoral Scholar in the Department of Mechanical Engineering, University of California, Santa Barbara. Her research areas include distributed systems, sensor networks, and pervasive computing.

Bassam Bamieh (S'88–M'90–SM'02–F'08) received the Electrical Engineering and Physics degree from Valparaiso University, Valparaiso, IN, in 1983, and the M.Sc. and Ph.D. degrees from Rice University, Houston, TX, in 1986 and 1992, respectively.

From 1991 to 1998, he was with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. He is currently a Professor of mechanical engineering at the University of California, Santa Barbara. He is currently an Associate Editor of Systems and Control Letters. His current research interests are in distributed systems, shear flow turbulence modeling and control, quantum control, and thermo-acoustic energy conversion devices.

Dr. Bamieh received the AACC Hugo Schuck Best Paper Award, the IEEE CSS Axelby Outstanding Paper Award, and the NSF CAREER Award. He is a Control Systems Society Distinguished Lecturer.

Amr El Abbadi (SM'00) received the Ph.D. degree in computer science from Cornell University, Ithaca, NY.

In August 1987, he joined the Department of Computer Science, University of California, Santa Barbara (UC Santa Barbara), where he is currently a Professor. He has served as Area Editor for Information Systems: An International Journal, an Editor of Information Processing Letters (IPL), and an Associate Editor of the Bulletin of the Technical Committee on Data Engineering. He is currently the Chair of the Computer Science Department, UC Santa Barbara. His main research interests and accomplishments have been in understanding and developing basic mechanisms for supporting distributed information management systems, including databases, digital libraries, peer-to-peer systems, and spatial databases.

Dr. El Abbadi received the UCSB Senate Outstanding Mentorship Award in 2007. He served as a Board Member of the VLDB Endowment from 2002 to 2008.