Systems & Control Letters 58 (2009) 202–212


Distributed randomized algorithms for probabilistic performance analysis

Giuseppe Carlo Calafiore ∗

Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy

Article info

Article history:
Received 20 May 2008
Received in revised form 27 October 2008
Accepted 27 October 2008
Available online 6 December 2008

Keywords:
Randomized algorithms
Probabilistic robustness
Distributed estimation
Consensus and agreement problems

Abstract

Randomized algorithms are a useful tool for analyzing the performance of complex uncertain systems. Their implementation requires the generation of a large number N of random samples representing the uncertainty scenarios, and the corresponding evaluation of system performance. When N is very large and/or performance evaluation is costly or time consuming, it can be necessary to distribute the computational burden of such algorithms among many cooperating computing units. This paper studies distributed versions of randomized algorithms for expected value, probability and extrema estimation over a network of computing nodes with possibly time-varying communication links. Explicit a priori bounds are provided for the sample and communication complexity of these algorithms in terms of number of local samples, number of computing nodes and communication iterations.

© 2008 Elsevier B.V. All rights reserved.

Notation. For a matrix X, X_ij denotes the element of X in row i and column j, and Xᵀ denotes the transpose of X. X > 0 (resp. X ≥ 0) denotes a positive (resp. non-negative) matrix, that is, a matrix with all positive (resp. non-negative) entries. ‖X‖ denotes the spectral (maximum singular value) norm of X, or the standard Euclidean norm in the case of vectors. For a square matrix X ∈ R^{n,n}, we denote with σ(X) = {λ_1(X), ..., λ_n(X)} the set of eigenvalues, or spectrum, of X, and with ρ(X) the spectral radius: ρ(X) ≐ max_{i=1,...,n} |λ_i(X)|, where λ_i(X), i = 1, ..., n, are the eigenvalues of X. I_n denotes the n × n identity matrix, and 1_n denotes an n-vector of ones (subscripts with dimensions are omitted whenever they can be inferred from context). e_i ∈ R^n denotes a vector with all zero entries, except for the ith position, which is equal to one. We denote with ⌊x⌋ the largest integer smaller than or equal to x.

1. Introduction

A central issue in the analysis of complex systems is the assessment of robustness, safety and degradation of performance due to the combined effect of critical parameters or uncertainties affecting the system. In quite general terms, one may denote with δ a vector collecting the critical parameters that enter a system's description, and endow δ with a probability distribution ''Prob'' which describes the likelihood of occurrence of different sets of parameter values. Then, a normalized scalar index J(δ) ∈ [0, 1] may be introduced in order to quantify some features of interest of the system, and the following three fundamental types of problems can be considered:

∗ Tel.: +39 011 5647071; fax: +39 011 5647099. E-mail address: [email protected].

0167-6911/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.sysconle.2008.10.010

P1. Evaluate the expected value of J(δ), that is, assess the average index of system performance;

P2. Evaluate the probability that J(δ) exceeds a given threshold value γ, that is, assess the probability of the event {J(δ) ≥ γ}, or, equivalently, evaluate the shortfall risk, that is, the probability of {J(δ) < γ};

P3. Evaluate a probable maximum of J(δ), that is, determine the least level J̄ such that {J(δ) ≤ J̄} is guaranteed with given high probability.

Notice that all the above problems constitute key steps in most engineering and manufacturing verification problems, and that their exact numerical solution is very difficult to obtain, in general. This is due to the fact that computing expectations and probabilities amounts to solving multi-dimensional integrals, and also to the fact that index J may well lack analytic structure: for instance, the value of J for a specific δ may be the result of a complex and time-consuming computer simulation.

A widely accepted technique to obtain estimates for the solution of the previous problems is to resort to random sampling of the parameter δ according to its underlying probability measure, and construct with these samples empirical approximations of the quantities of interest. In particular, if {δ^(i), i = 1, ..., N} are N independent and identically distributed (iid) samples, we have that

Ĵ = (1/N) ∑_{i=1}^N J(δ^(i))    (1)

is an empirical estimator of E{J(δ)} – that is, of the quantity of interest in problem P1 – and that

J̄ = max_{i=1,...,N} J(δ^(i))    (2)


is an empirical solution to problem P3. Notice that problem P2 is just a special case of P1, since Prob{J(δ) ≥ γ} = E{I(J(δ) − γ)}, where I(z) is equal to one if z ≥ 0 and is zero otherwise.

Let us concentrate on Eq. (1). This formula can be interpreted as an algorithm that processes the sampled data sequentially and returns Ĵ as its output. In order for this algorithm to provide a useful albeit approximate solution for problem P1, we shall require Ĵ to be close to E{J}. Formally, we fix a level ε ∈ (0, 1) and say that Ĵ is a good approximation of E{J} if |Ĵ − E{J}| ≤ ε. Note however that Ĵ is a random quantity that depends on the batch of random extractions {δ^(i), i = 1, ..., N}; therefore, the result of our algorithm can be a good approximation for some extraction and a bad approximation for another. In other words, there exists a chance β ∈ (0, 1) that the outcome of the algorithm is a ''bad'' solution for our problem. This fact is typical of randomized algorithms. However, the chance β of getting a bad solution can typically be rendered arbitrarily small by taking a sufficiently large number of samples N; see the discussion in Remark 1 for a more precise statement. A similar situation arises for the empirical estimate of the probable maximum in (2), for which it holds that Prob{J(δ) > J̄} ≤ β, with β decreasing with N; see the discussion in Section 4.

of uncertain systems have recently received considerable attentionin robust control and related areas. The main motivation for thisinterest stems from the fact that typical indices of performancefor uncertain control systems (such as, for instance, the structuredsingular value) provide conservative assessments and are numer-ically hard to compute by traditional deterministic methods. Ran-domized methods alleviate conservatism by allowing some smallprobability of performance shortfall, and are based on numericallyefficient random sampling techniques. A rather comprehensive ac-count of randomized techniques in control, together with pointersto related literature (such as Monte Carlo methods) can be foundin the texts [3,18]; see also the survey in [2].In a typical application of a randomized estimation technique,

In a typical application of a randomized estimation technique, one fixes the desired a priori probabilistic levels of confidence, and then derives a lower bound on the number of samples N that are required in order to build the desired estimator. For the case of estimation of expectation, for instance, the well-known Hoeffding inequality (see [11]) states that we need

N ≥ (1/ε²) ln(2/β)

in order to guarantee that

Prob{|Ĵ − E{J}| > ε} ≤ β.

Note that β can be taken to be ''practically zero'' (say, β = 10⁻¹⁶) without increasing N too much: N ≥ 40/ε². However, if a small error ε is also desired, then N can be quite large; for instance, N ≥ 4 × 10⁷ for ε = 0.001. A large N means that a large number of random samples δ^(i) need be generated and that, perhaps more critically, a correspondingly large number of performance index evaluations J(δ^(i)) need be performed. When sample generation and/or index evaluation are time-consuming operations, the overall computational burden of the randomized algorithm can be too high for a single computing node. In such cases it may be necessary to distribute the computation among a number of networked cooperating nodes; see, e.g., [1].
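To make these orders of magnitude concrete, here is a minimal Python sketch (ours, with a placeholder index function; nothing below is taken from the paper) that computes the Hoeffding sample size N ≥ (1/ε²) ln(2/β) and forms the empirical estimators (1) and (2):

import math
import numpy as np

def sample_size(eps, beta):
    # Hoeffding-based bound used in the text: N >= (1/eps^2) * ln(2/beta)
    return math.ceil(math.log(2.0 / beta) / eps**2)

def J(delta):
    # hypothetical performance index in [0, 1]; in practice this may be
    # the outcome of a costly simulation
    return float(np.clip(np.linalg.norm(delta) / 10.0, 0.0, 1.0))

rng = np.random.default_rng(0)
eps, beta = 0.05, 1e-6
N = sample_size(eps, beta)                  # about 5.8e3 for these levels
samples = [J(rng.standard_normal(4)) for _ in range(N)]
J_hat = float(np.mean(samples))             # empirical estimator (1)
J_bar = float(np.max(samples))              # empirical maximum (2)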

The objective of this paper is to analyze distributed schemes for randomized estimation algorithms. Specifically, we consider a situation where n ≥ 1 networked nodes are available for generating samples and evaluating the performance index. Each node initially computes a local estimate on the basis of a batch of N locally generated samples, and then repeatedly communicates with its instantaneous neighbors in order to ''fuse'' its local estimate with the neighboring ones, in accordance with a suitable distributed algorithm, which, in the case of estimation of expectation, is a version of a distributed consensus algorithm with Metropolis weights (see [21]), and in the case of estimation of the probable maximum is a distributed max-agreement algorithm.

We allow the communication structure of the network to be time-varying (that is, the set of neighbors of each node may change with time) and prove in Section 3.1 a ''Hoeffding-type'' result for distributed estimation of expectation. In particular, we show that all local estimates converge at geometric rate to the ''virtual'' centralized estimate (that is, the estimate that would have been obtained by an ideal single ''super node'' executing centrally the whole computation based on the total nN samples), and we provide explicit bounds that relate the desired probabilistic confidence levels to three key parameters of the distributed scheme, namely: (a) the number N of samples and corresponding index evaluations that need be performed locally at each node, (b) the number n of collaborating nodes, and (c) the number k of communication rounds that need be executed before convergence. Similar results are also given in Section 4 for distributed estimation of the probable maximum.

The analysis developed in this paper builds upon several recent contributions on distributed estimation and agreement problems for networked agents; see for instance [5,13,15,16,20,21]. In particular, the algorithm for distributed expected value estimation is analogous to the one introduced in [21] for average consensus, although our objective is to obtain explicit convergence bounds and Hoeffding-type results for distributed estimation, which are currently not available in the literature. Also, the results in Section 4 on distributed estimation of probable near extrema appear to be new, to the best of the author's knowledge.

This paper is organized as follows: Section 2 introduces some notation and preliminaries on graphs and non-negative matrices. Section 3 describes the distributed algorithm for estimation of expectation, and contains the key analysis results on this algorithm (Theorem 3 and Corollary 3). Section 4 describes the distributed algorithm for estimation of the probable maximum and contains the related analysis results (Theorem 4). Section 5 illustrates the theory through numerical examples on ring networks. Conclusions are drawn in Section 6. Some side technical details are reported in the Appendix.

General setting

To avoid unduly formal complications, in the rest of this paper we simply set x ≐ J(δ) and ''work'' directly in the space of the (scalar) random variable x ∈ [0, 1]. The probability measure Prob on x is the image through the map J of the original probability distribution on δ. The two key problems addressed in the paper are estimation of E{x} (problem P1) and of a probable maximum of x (problem P3) via distributed algorithms that use iid random samples x^(i) of x distributed according to Prob. Problem P2, related to estimation of Prob{x ≥ γ}, reduces to problem P1 by expressing the probability as an expectation of the indicator function of (x − γ), that is, Prob{x ≥ γ} = E{I(x − γ)}.
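As a quick illustration of this reduction (a sketch with uniform stand-in samples, not data from the paper), the shortfall probability is estimated by the empirical mean of the indicator:

import numpy as np

rng = np.random.default_rng(0)
gamma, N = 0.7, 100_000
x = rng.random(N)                    # stand-in iid samples of x in [0, 1]
p_hat = float(np.mean(x >= gamma))   # empirical estimate of Prob{x >= gamma} = E{I(x - gamma)}
print(p_hat)                         # approx. 0.3 for uniform x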

2. Preliminaries on graphs and non-negative matrices

In this paper, the communication structure among nodes in the network is described using graph formalism. Before attacking our central problems, we hence summarize some preliminary results on graphs and related non-negative matrices. Standard references on these topics include [8,9].

A graph is a pair G = (V, E), where V = {1, 2, ..., n} is a set of nodes and E ⊆ V × V is a set of ordered edges. We say that (i, j) ∈ E if node i can receive information from node j. If (j, i) ∈ E whenever (i, j) ∈ E, then the graph is said to be undirected. Undirected graphs model bidirectional communication between nodes. Two nodes i, j are connected if there exists a sequence of distinct edges (i.e., a path) leading from j to i. The length of a path is the number of edges composing the path. A graph is connected if every pair of nodes is connected. We here consider graphs with self loops, which means that (i, i) ∈ E, ∀i ∈ V; hence a node is always connected with itself. For each i ∈ V, the set of neighbors N_i of node i is defined as the set of nodes from which node i can receive information, that is, N_i = {j ∈ V : (i, j) ∈ E}; note that i ∈ N_i.

There is a natural way of associating a non-negative matrix A to a graph G, by considering matrices whose (i, j) entry is positive whenever an edge exists between nodes (i, j) in the corresponding graph, and is zero otherwise. Let us thus introduce the following set of non-negative matrices with positive diagonal entries:

M ≐ {A ∈ R^{n,n} : A ≥ 0, A_ii > 0, i = 1, ..., n},

and notice that M is closed under addition and multiplication. We have the following definition.

Definition 1. For A ∈ M, we say that the matrix/graph pair (A, G(V, E)) is compatible if A_ij > 0 ⇔ (i, j) ∈ E.

The notion of connectedness of a graph is related to the notion of primitiveness of a matrix compatible with that graph. A square matrix A ≥ 0 is said to be primitive if there exists an integer m ≥ 1 such that A^m > 0. The least integer m such that A^m > 0 is called the index of primitivity of A. If A is primitive then ρ(A) is an algebraically simple eigenvalue of A and the eigenspace associated with this eigenvalue is one dimensional. The following theorem can be readily established (using for instance Theorems 6.2.24 and 8.5.2 and Lemma 8.5.5 of [12]).

Theorem 1. Let A ∈ M be such that (A, G(V, E)) is a compatible pair. Then A is primitive if and only if G is connected.

It can also be readily established that if (A, G) are compatible and G is connected, then there exists a path of length no larger than m between any pair of nodes of G, where m ≥ 1 is the index of primitivity of A.

Graph composition and union. Let G_a(V, E_a), G_b(V, E_b) be two graphs with a common vertex set. The composition G_ab = G_a ◦ G_b of the two graphs is defined as the graph with vertex set V such that (i, j) ∈ E_ab if and only if (i, k) ∈ E_a and (k, j) ∈ E_b for some k ∈ V. The union G_a|b = G_a ∪ G_b of the two graphs is defined as the graph with vertex set V such that (i, j) ∈ E_a|b if and only if (i, j) ∈ E_a or (i, j) ∈ E_b. Notice that graph union is commutative whereas composition is not. Note also that the edge set of the union is a subset of the edge set of the composition of any permutation of the graphs, that is, E_a|b ⊆ E_ab and E_a|b ⊆ E_ba.¹ These notions of composition and union of two graphs can be extended to sequences of an arbitrary number of graphs in an obvious way. When the union of some graphs is connected, we say that these graphs are jointly connected; see [13]:

Definition 2 (Jointly Connected Graphs). A sequence of graphs {G_k(V, E_k), k = 1, ..., q} is said to be jointly connected if the union graph G_1 ∪ G_2 ∪ ··· ∪ G_q is connected.

Let now A ∈ M be compatible with G_a, and B ∈ M be compatible with G_b. Then, it is a matter of simple matrix algebra to verify that the product AB is compatible with the composition graph G_ab (to this end, just notice that [AB]_ij = ∑_{k=1}^n A_ik B_kj > 0 if and only if A_ik > 0 and B_kj > 0 for some k). Similarly, one can show that the sum A + B is compatible with the union graph G_a|b. From this reasoning we obtain the following result.

¹ To see why this is true, consider (i, j) ∈ E_a|b, which means that (i, j) is either in E_a or in E_b, and suppose without loss of generality (i, j) ∈ E_a. Then choosing k = j we have that (i, k) ∈ E_a and (k, j) ∈ E_b, which means indeed that (i, j) ∈ E_ab. The second inclusion follows from an analogous reasoning, by taking k = i.

Lemma 1. Let {G_k(V, E_k), k = 1, ..., q} be a sequence of graphs, and let {A_k ∈ M, k = 1, ..., q} be any sequence of matrices such that A_k is compatible with G_k, for k = 1, ..., q. Then:

• The product A_1 A_2 ··· A_q is primitive if and only if the composition graph G_1 ◦ G_2 ◦ ··· ◦ G_q is connected;
• The sum α_1 A_1 + α_2 A_2 + ··· + α_q A_q, α_i > 0, is primitive if and only if the union graph G_1 ∪ G_2 ∪ ··· ∪ G_q is connected.

As we have observed previously, the edge set of the union of a sequence of graphs is a subset of the edge set of the composition of any permutation of the graph sequence. This means in turn that if the union graph G_1 ∪ G_2 ∪ ··· ∪ G_q is connected, then any composition G_p1 ◦ G_p2 ◦ ··· ◦ G_pq, where {p_1, ..., p_q} is a permutation of {1, ..., q}, is also connected. We thus obtain the following corollary of Lemma 1.

Corollary 1. Let {G_k(V, E_k), k = 1, ..., q} be a sequence of graphs, and let {A_k ∈ M, k = 1, ..., q} be any sequence of matrices such that A_k is compatible with G_k, for k = 1, ..., q. If the union graph G_1 ∪ G_2 ∪ ··· ∪ G_q is connected, then the product A_p1 A_p2 ··· A_pq is primitive, for any permutation {p_1, ..., p_q} of {1, ..., q}.
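The correspondence between matrix operations and graph operations is easy to check numerically. In the sketch below (our own illustration, with hypothetical 0/1 adjacency matrices that include self loops), a positive entry of the matrix product marks an edge of the composition graph, and a positive entry of the sum marks an edge of the union:

import numpy as np

def compose(Aa, Ab):
    # (i, j) in E_ab iff (i, k) in E_a and (k, j) in E_b for some k
    return (Aa @ Ab > 0).astype(int)

def union(Aa, Ab):
    return ((Aa + Ab) > 0).astype(int)

n = 4
I = np.eye(n, dtype=int)
A1 = I.copy(); A1[0, 1] = A1[1, 0] = 1   # undirected edge {1, 2} plus self loops
A2 = I.copy(); A2[1, 2] = A2[2, 1] = 1   # undirected edge {2, 3} plus self loops
print(union(A1, A2))                     # union graph
print(compose(A1, A2))                   # composition graph; E_union ⊆ E_comp, cf. the footnote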

2.1. Stochastic non-negative matrices

Consider a subset of M composed of matrices in which the sum over each row is equal to one (such matrices are usually called (row) stochastic):

M_s ≐ {A ∈ M : A1 = 1}.

The set M_s is convex and closed under multiplication. For A ∈ M_s we have that 1 is an eigenvalue of A. Observe that the spectral radius of a matrix is no larger than any norm of the matrix (see Theorem 5.6.9 of [12]); hence by taking the ℓ∞-induced norm we have that for A ∈ M_s it holds that

ρ(A) ≤ ‖A‖∞ = max_{i=1,...,n} ∑_{j=1}^n |a_ij| = 1.

Since 1 is an eigenvalue of A, it must therefore hold that ρ(A) = 1. It also follows that if A ∈ M_s is primitive then A has a unique (i.e., an algebraically simple) eigenvalue at 1; hence all other eigenvalues have modulus strictly smaller than 1 and the fixed-point subspace

I(A) ≐ {x ∈ R^n : Ax = x}

is one dimensional. The following result holds (see Theorem 8.5.1 of [12]).

Theorem 2. Let A ∈ M_s and let λ_1(A), λ_2(A), ..., λ_n(A) denote the eigenvalues of A ordered with non-increasing modulus, with λ_1(A) = ρ(A) = 1. If A is primitive, then |λ_i(A)| < 1 for i = 2, ..., n. Moreover,

I(A) = span{1},

and

lim_{k→∞} A^k = (1/n) 1 vᵀ > 0,

where v > 0 is a left eigenvector of A associated with λ_1(A) = 1: vᵀA = vᵀ, and 1ᵀv = n. (In the particular case when also the sum over each column of A is one (A is doubly stochastic), we simply have that v = 1.)


2.2. Symmetric paracontractions

A matrix A ∈ R^{n,n} is called paracontractive (see [10]) with respect to the Euclidean norm if

‖Ax‖ < ‖x‖, ∀x ≠ 0, x ∉ I(A),

that is, a paracontractive matrix is contractive for all non-null vectors lying outside its fixed-point subspace. It is easy to verify that a symmetric matrix is paracontractive if and only if its spectrum belongs to the semi-open interval (−1, 1]. Now consider the subset of M_s formed by symmetric matrices:

M_ss ≐ {A ∈ M_s : A = Aᵀ}

(a matrix A ∈ M_ss is symmetric, non-negative with strictly positive diagonal elements, and doubly stochastic, that is, A1 = 1, 1ᵀA = 1ᵀ). For A ∈ M_ss we clearly have that ρ(A) ≡ ‖A‖ = 1. Moreover, the following fact holds; see the Appendix for a proof, and see also [21].

Lemma 2. If A ∈ M_ss then σ(A) ⊂ (−1, 1]; hence A is paracontractive.

We are interested in conditions under which a sequence of matrices from M_ss forms products that are primitive and paracontractive, i.e., that are contractive for all vectors not lying in span{1}. We preliminarily recall from Theorem 2 that if A ∈ M_s is primitive then ρ(A) = 1 is a simple eigenvalue of A and the corresponding eigenspace I(A) is one dimensional, that is,

I(A) = span{1}, for A ∈ M_s primitive.    (3)

We are now in position to state the following result, which specializes Corollary 1 to the case of products of symmetric stochastic matrices. A proof of this result is given in the Appendix.

Corollary 2. Let {A_k ∈ M_ss, k = 1, ..., q} be a sequence of matrices and {G_k(V, E_k), k = 1, ..., q} a sequence of graphs such that A_k is compatible with G_k, for k = 1, ..., q. If the union graph G_1 ∪ G_2 ∪ ··· ∪ G_q is connected, then the product A_p1 A_p2 ··· A_pq is primitive and paracontractive, for any permutation {p_1, ..., p_q} of {1, ..., q}.

Consider now any q matrices A_k ∈ M_ss, k = 1, ..., q. Since each A_k ∈ M_ss is symmetric, it is unitarily diagonalizable, that is, it admits a set of orthogonal eigenvectors. Moreover, the vector 1/√n is always an eigenvector of A_k associated with the largest-modulus eigenvalue λ_1(A_k) = 1, and we may write A_k ∈ M_ss in the form

A_k = (1/n) 11ᵀ + Z_k,  Z_k = V_k D_k V_kᵀ,

where V_k ∈ R^{n,n−1} is such that

V_kᵀ V_k = I_{n−1},  V_k V_kᵀ = I_n − (1/n) 11ᵀ,  1ᵀV_k = 0,

and D_k = diag(λ_2(A_k), ..., λ_n(A_k)) ∈ R^{n−1,n−1} is a diagonal matrix containing the last n−1 eigenvalues of A_k arranged in order of non-increasing modulus. Since 11ᵀ is orthogonal to all the Z_k's, we can write the product (A_1 ··· A_q) in the form

(A_1 ··· A_q) = (1/n) 11ᵀ + (Z_1 ··· Z_q).    (4)

The following lemma holds; see the Appendix for a proof.

Lemma 3. Let A_k ∈ M_ss, k = 1, ..., q. If the product (A_1 ··· A_q) is primitive and paracontractive, then

‖(A_1 ··· A_q) − (1/n) 11ᵀ‖ < 1.

3. Distributed estimation of expectation

Let x ∈ [0, 1] be a scalar random variable whose expected value E{x} = µ is unknown. The objective is to estimate µ by means of a decentralized randomized algorithm that uses independent and identically distributed (iid) random samples of x (also called experiments) generated by n distributed computing units (nodes). More precisely, we assume that each node i, i = 1, ..., n, has generated prior to ''time zero'' a batch² of N random samples x_i^(s), s = 1, ..., N, of x. Each node is hence able to construct at time k = 0, independently of all other nodes, an empirical estimate of the expectation:

x_i(0) = (1/N) ∑_{s=1}^N x_i^(s),  i = 1, ..., n.

We denote with x(0) ≐ [x_1(0) ··· x_n(0)]ᵀ the vector of initial estimates available at the nodes.

² To avoid clutter in the notation we assume that every node generates the same number N of samples. This assumption is not critical and can easily be relaxed.

Remark 1. Notice that the initial estimates x_i(0) are themselves random variables, since they depend on the random experiments performed at the nodes. It is of course well known that each local estimate x_i(0) is an unbiased estimator of µ, that is, E{x_i(0)} = µ, and, denoting with σ² = var{x} the (unknown) variance of x, the variance of the local estimates decreases linearly with N:

var{x_i(0)} = σ²/N,  i = 1, ..., n.    (5)

Moreover, the strong law of large numbers ensures that x_i(0) converges almost surely to µ as N → ∞. Further, Hoeffding's inequality can be applied locally at the nodes in order to provide an a priori assessment of the quality of the local estimates:

Prob{|x_i(0) − µ| ≥ ε} ≤ 2e^{−2Nε²},  ε ∈ (0, 1).

This inequality is well known and commonly used for determining an a priori sample-complexity bound for randomized estimation algorithms; see, e.g., [18]. That is, fixing the accuracy level ε ∈ (0, 1) and the algorithm failure probability level β ∈ (0, 1), the considered randomized estimation scheme requires the integer N (number of samples in a local batch) to be

N ≥ (1/ε²) ln(2/β)

in order to guarantee that

Prob{|x_i(0) − µ| ≥ ε} ≤ β.

Note that in the situation considered so far each node acts independently of the other nodes, and no mutual information is exploited in order to improve the reliability of the local estimates. We now introduce a distributed scheme according to which nodes can collaborate by iteratively exchanging information with their instantaneous neighbors. Representing the communication structure of the network at time k = 0, 1, ... by means of an undirected graph G(k) = (V, E(k)), where V = {1, ..., n} is the set of nodes, the proposed scheme works as follows: at time k each node i contacts its neighbors (we denote with N_i(k) the set of nodes that are neighbors of node i at time k, including node i itself, that is, N_i(k) ≐ {j ∈ V : (i, j) ∈ E(k)}), and updates its local estimate with a weighted average of the local estimates of the neighbors. In formulae, we have

x_i(k+1) = W_ii(k) x_i(k) + ∑_{j∈N_i(k)\i} W_ij(k) x_j(k),  k = 0, 1, ...,    (6)

where the W_ij(k) are the averaging weights, which satisfy the conditions W_ij(k) ≥ 0, ∑_{j=1}^n W_ij(k) = 1, and W_ij(k) > 0 if and only if (i, j) ∈ E(k). Collecting the weights in an n × n matrix W(k), we rewrite Eq. (6) in the form of the vector recursion

x(k+1) = W(k) x(k),  k = 0, 1, ...,    (7)

whence we obtain an explicit expression for the vector of local estimates at generic time k:

x(k) = Φ(k) x(0),  k = 1, 2, ...,    (8)

where we defined the transition matrix

Φ(k) ≐ W(k−1) ··· W(0),  k = 1, 2, ....    (9)

Remark 2 (Choice of Weights). Note that, in order to be able to update its estimate, node i needs to know its current neighbors' weights W_ij(k). These coefficients may be imposed a priori on the nodes (but this would require centralized a priori knowledge of the network topology) or, more practically, negotiated autonomously on-line by the nodes. A standard and effective rule for building the weights autonomously is the so-called Metropolis rule (see [19,21]), which prescribes weights as follows:

W_ij(k) = 1/max(|N_i(k)|, |N_j(k)|)   if (i, j) ∈ E(k), i ≠ j,
W_ij(k) = 1 − ∑_{j∈N_i(k)\i} W_ij(k)  if i = j,
W_ij(k) = 0                           otherwise,    (10)

where |N_i(k)| denotes the cardinality of N_i(k). An alternative and simpler rule amounts to updating the estimate by taking the arithmetic average of the neighbors' estimates, which amounts to setting the weights as follows:

W_ij(k) = 1/|N_i(k)|  if (i, j) ∈ E(k),
W_ij(k) = 0           otherwise.    (11)

Both the Metropolis and the arithmetic mean weights are well suited for distributed implementation, since each node at time k just needs to know the number of its neighbors and exchange information with them. With these weights, rule (6) is based on local-only information, which means that each node can execute the update without needing to know the global structure of the network, or even the number n of nodes composing the whole network. Notice also that the arithmetic mean and Metropolis weights coincide for regular graphs, that is, for graphs in which all nodes have the same number of incident edges.

Finally, we remark that these W(k) matrices belong to the class M_ss of non-negative, symmetric and doubly stochastic matrices, and that each W(k) is compatible with the corresponding graph G(k).
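For concreteness, here is a minimal Python sketch of the Metropolis rule (10) and of one step of recursion (7); the adjacency-matrix representation and the helper names are ours, not from the paper:

import numpy as np

def metropolis_weights(A):
    # A: 0/1 symmetric adjacency matrix with ones on the diagonal (self loops),
    # so that deg[i] = |N_i(k)| including node i itself
    n = A.shape[0]
    deg = A.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and A[i, j]:
                W[i, j] = 1.0 / max(deg[i], deg[j])
        W[i, i] = 1.0 - W[i].sum()      # rows (and, by symmetry, columns) sum to one
    return W                             # W is in M_ss and compatible with the graph

def consensus_step(x, A):
    return metropolis_weights(A) @ x     # x(k+1) = W(k) x(k), Eq. (7)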

Looking again at Eq. (7), we see that the nodes start at k = 0 with the estimates initially obtained from the local experiments. Then, for k > 0 they iteratively exchange information, and one may hope that for k → ∞ each local estimate converges to the ''full-information'' estimate, that is, the estimate each node would build if the information at all other nodes were available to it. This ideal full-information estimate is given by

x^full = (1/n) 11ᵀ x(0) = (1/n) 1 ∑_{j=1}^n x_j(0) = (1/(Nn)) 1 ∑_{j=1}^n ∑_{s=1}^N x_j^(s).    (12)

The covariance matrix for this full-information estimate is

var{x^full} = E{((1/n) 11ᵀ x(0) − 1µ)((1/n) 11ᵀ x(0) − 1µ)ᵀ}
           = E{(1/n) 11ᵀ (x(0) − 1µ)(x(0) − 1µ)ᵀ (1/n) 11ᵀ}
           = (σ²/(Nn)) 11ᵀ.    (13)

In the next section, we show that convergence of the local estimates to x^full indeed happens under rather mild hypotheses, we provide an explicit bound on the convergence rate, and we state our main result, a Hoeffding-type bound for the distributed estimation algorithm (6). To fix ideas, we shall assume in the rest of this paper that the weights are set according to the Metropolis rule (10), but all results hold unchanged for the arithmetic mean rule (11), and in fact for any rule that provides weight matrices belonging to M_ss that are compatible with the corresponding graphs.

3.1. Analysis of the distributed RA for expectation estimation

The analysis of convergence of the distributed iterations (7) is based on fairly standard average consensus results; see for instance [13,15,21]. In particular, the approach in this section is similar to the one in [21], although we soon depart from it and develop new explicit results for the rate of convergence and Hoeffding-type bounds.

Let W(k) ∈ M_ss be a symmetric stochastic matrix compatible with an undirected graph G(k), and notice that Φ(k) in (9) is doubly stochastic, hence

E{x(k)} = Φ(k) E{x(0)} = µ Φ(k) 1 = µ1,

which means that, for all k, the local estimate x_i(k) is an unbiased estimator of µ. The covariance of x(k) is given by

var{x(k)} = E{(x(k) − µ1)(x(k) − µ1)ᵀ}
         = E{(Φ(k) x(0) − µ1)(Φ(k) x(0) − µ1)ᵀ}
         = Φ(k) E{(x(0) − µ1)(x(0) − µ1)ᵀ} Φᵀ(k)
         = Φ(k) var{x(0)} Φᵀ(k)
         = (σ²/N) Φ(k) Φᵀ(k),    (14)

where the last statement follows from (5) and the fact that the elements of x(0) are independent and identically distributed. We introduce the following definition.

Definition 3 (Interval of Joint Connectivity). Let T be a finite positive integer. An ordered infinite sequence of graphs {G(t) = (V, E(t)), t = 0, 1, ...} has interval of joint connectivity T if every subsequence of length T is jointly connected.

Notice that the set of possible graphs that may occur on n vertices is of finite cardinality. We denote with G the finite set of all graphs G(k) that are allowed to occur, and with W the corresponding finite set of Metropolis weight matrices W(k), with w = |W| denoting the cardinality of W. We next define the index of joint contractivity for the set W.


Definition 4 (Index of Joint Contractivity). Consider a finite set W = {W_i ∈ M_ss, i = 1, ..., w}. Let d be a finite and positive integer, and let W_d denote the set of all products of at most d elements from W. Clearly, W_d is finite, and we define the d-index of joint contractivity of W as

γ = max_{P∈W_d} { ‖P − (1/n) 11ᵀ‖ : P is primitive and paracontractive }.

The following theorem holds.

Theorem 3. Let an infinite sequence of undirected graphs {G(k) ∈ G, k = 0, 1, ...} have interval of joint connectivity T ≥ 1, and let {W(k) ∈ W, k = 0, 1, ...} be the corresponding sequence of Metropolis weight matrices. Consider the local randomized estimates of the expectation computed according to (7) and (8). Then, for i = 1, ..., n, and k = 1, 2, ..., it holds that

|x_i(k) − x_i^full| ≤ √n γ^⌊k/T⌋,    (15)

|var{x_i(k)} − var{x_i^full}| ≤ (σ²/N) γ^{2⌊k/T⌋},    (16)

where x_i^full is the virtual ''full-information'' estimate in (12), and γ < 1 is the T-index of joint contractivity of W.

Proof. Consider the expression for x(k) in (8), with Φ(k) = W(k−1) ··· W(0), where W(0), ..., W(k−1) are the Metropolis matrices corresponding to the occurring graphs S(k) = {G(0), ..., G(k−1)}. Since the interval of joint connectivity is T, we have that for k ≥ T the graph sequence S(k) can be subdivided into q(k) = ⌊k/T⌋ jointly connected subsequences S_i(k) of length T, plus a possible residual subsequence of length r(k), where r(k) is the remainder in the division k/T. The product Φ(k) = W(k−1) ··· W(0) can thus be split accordingly into q(k) + 1 subproducts

Φ(k) = Φ_0(k) Φ_1(k) ··· Φ_{q(k)}(k),

where Φ_i(k), i = 1, ..., q(k), is the subproduct of matrices corresponding to the graphs in the subsequence S_i(k), and Φ_0(k) is the subproduct of the r(k) matrices in the residual subsequence. Since S_i(k) is jointly connected, from Corollary 2 we have that Φ_i(k) is primitive and paracontractive, for i = 1, ..., q(k). Moreover, Φ_i(k) has length T, hence

‖Φ_i(k) − 11ᵀ/n‖ ≤ γ,  i = 1, ..., q(k),

where γ < 1 is the T-index of joint contractivity in Definition 4. For the residual product Φ_0(k), instead, we use the fact that it is the product of symmetric paracontractive matrices; hence, from the sub-multiplicativity of the spectral norm, we have that ‖Φ_0(k) − 11ᵀ/n‖ ≤ 1. Define now

Z_i(k) ≐ Φ_i(k) − 11ᵀ/n,  i = 0, ..., q(k),
Z(k) ≐ Z_0(k) Z_1(k) ··· Z_{q(k)}(k),

and note that, as a consequence of the fact that Φ_i(k) is doubly stochastic, it follows that

Z(k) = Φ(k) − 11ᵀ/n,

whence

‖Φ(k) − 11ᵀ/n‖ = ‖Z(k)‖ ≤ ‖Z_0(k)‖ · ‖Z_1(k)‖ ··· ‖Z_{q(k)}(k)‖ ≤ ‖Z_1(k)‖ ··· ‖Z_{q(k)}(k)‖ ≤ γ^{q(k)}.    (17)

Note now that (1/n) 11ᵀ x(0) = x^full and Φ(k) x(0) = x(k), therefore

‖x(k) − x^full‖ = ‖(Φ(k) − 11ᵀ/n) x(0)‖ ≤ ‖Φ(k) − 11ᵀ/n‖ · ‖x(0)‖ ≤ γ^{q(k)} ‖x(0)‖ ≤ γ^{q(k)} √n ‖x(0)‖∞ ≤ √n γ^{q(k)},

where the last two inequalities follow from the Euclidean/infinity norm inequality and from the fact that each entry of x(0) is the arithmetic mean of variables that are bounded in [0, 1]. Further, since ‖·‖ ≥ ‖·‖∞, from the previous inequality it follows that

|x_i(k) − x_i^full| ≤ √n γ^{q(k)},

which proves (15). Note now that

Φ(k) Φᵀ(k) = Z(k) Zᵀ(k) + (1/n) 11ᵀ,

hence it follows from (14) that

var{x(k)} = (σ²/N) Z(k) Zᵀ(k) + (σ²/(nN)) 11ᵀ = (σ²/N) Z(k) Zᵀ(k) + var{x^full},

where the last equality is due to (13), therefore

var{x(k)} − var{x^full} = (σ²/N) Z(k) Zᵀ(k),

whereby

‖var{x(k)} − var{x^full}‖ = (σ²/N) ‖Z(k) Zᵀ(k)‖ ≤ (σ²/N) ‖Z(k)‖² ≤ (σ²/N) γ^{2q(k)},

where the last inequality follows from (17). Since each entry of a matrix is no larger than the spectral norm of the matrix, (16) immediately follows from the previous inequality. □

We are now ready to state a Hoeffding-type result for distributed estimation.

Corollary 3. Let an infinite sequence of undirected graphs {G(k) ∈ G, k = 0, 1, ...} have interval of joint connectivity T ≥ 1, let {W(k) ∈ W, k = 0, 1, ...} be the corresponding sequence of Metropolis weight matrices, and let γ < 1 be the T-index of joint contractivity of W. Consider the local randomized estimates of the expectation computed according to (7) and (8). Given ε ∈ (0, 1), β ∈ (0, 1), if the integers k, n, N are such that

⌊k/T⌋ > ln(√n/ε) / ln(1/γ),  Nn ≥ (1/(ε − γ^k √n)²) ln(2/β),

then it holds that

Prob{|x_i(k) − µ| ≥ ε} ≤ β.    (18)

Proof. Consider the following chain of relations:

|x_i(k) − µ| = |(x_i(k) − x_i^full) + (x_i^full − µ)| ≤ |x_i(k) − x_i^full| + |x_i^full − µ| ≤ √n γ^⌊k/T⌋ + |x_i^full − µ|,  [due to (15)].

Since

{|x_i(k) − µ| ≥ ε} ⇒ {|x_i^full − µ| ≥ ε − √n γ^⌊k/T⌋},

it follows that

Prob{|x_i(k) − µ| ≥ ε} ≤ Prob{|x_i^full − µ| ≥ ε(k)},    (19)

where we defined

ε(k) ≐ ε − √n γ^⌊k/T⌋.

Note now that

⌊k/T⌋ > ln(√n/ε) / ln(1/γ)  ⇒  ε(k) > 0,


hence for β ∈ (0, 1), Hoeffding's inequality applied to the random variable x_i^full guarantees that if the integers N, n are such that

Nn ≥ (1/ε²(k)) ln(2/β),

then

Prob{|x_i^full − µ| ≥ ε(k)} ≤ β,

which in turn, by (19), implies that

Prob{|x_i(k) − µ| ≥ ε} ≤ β,

thus concluding the proof. □
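In practice, Corollary 3 is used to size the number of communication rounds k and the local batch size N a priori. The sketch below follows the corollary's formulas as printed (the values from Section 5.1 are used purely as a numerical check; the function names are ours):

import math

def comm_rounds(eps, gamma, n, T):
    # smallest k such that floor(k/T) > ln(sqrt(n)/eps) / ln(1/gamma)
    q = math.floor(math.log(math.sqrt(n) / eps) / math.log(1.0 / gamma)) + 1
    return q * T

def local_samples(eps, beta, gamma, n, k):
    margin = eps - math.sqrt(n) * gamma**k
    assert margin > 0, "increase k first"
    return math.ceil(math.log(2.0 / beta) / (n * margin**2))

n, T, gamma = 10, 9, 0.9131
k = comm_rounds(0.01, gamma, n, T)            # 576, as in Section 5.1
N = local_samples(0.01, 1e-5, gamma, n, k)    # about 1.22e4, as in Section 5.1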

3.1.1. The time-invariant case

The results in the previous section allow the communication pattern among nodes to vary with time. These results can of course be easily specialized to the case of a constant communication pattern. In this case, let G denote the (constant) communication graph, and let W be the corresponding Metropolis weight matrix. In order to apply Theorem 3, we need G to be connected, in which case W is primitive and paracontractive, and the index γ is simply given by the second-largest modulus of the eigenvalues of W. The results of Theorem 3 and Corollary 3 can thus be applied simply by setting T = 1 and γ = |λ_2(W)|.

Remark 3 (Design of Network Structure and Weight Matrix). It is apparent from the previous results that the performance of the distributed algorithm depends critically on the value of the ''mixing rate'' parameter γ < 1 (the smaller the better) which, in turn, depends on the structure of the network and on the choice of the weight matrix W. Although this work focuses on the analysis of the distributed estimation algorithm with given network structure and weight matrices, a few remarks are in order with respect to network and weight design issues.

A first quite obvious observation is that the best possible processor network would be one corresponding to a complete graph, that is, to a graph where each node is a neighbor of every other node. In such a case, it holds that |N_i| = n for all i; hence convergence to the full-information estimate happens instantaneously, in one single step. More generally, if completeness cannot be achieved, a sensible rule of thumb would be to design the network so as to maximize the number of incident edges at each node.

Once the network structure has been fixed, a second issue arises, related to the choice of the weight matrix W. In principle, this matrix could be determined in an optimal way, that is, one may choose W so as to minimize γ over all matrices compatible with the graph structure. Moreover, this turns out to be a convex optimization problem, which can be solved efficiently for networks of moderate size; see [19]. However, this approach would require the presence of some ''central authority'' having exact knowledge of the whole network structure, which computes the optimal weights and transmits these weights to all nodes. The heuristics based on Metropolis or local degree weights, instead, can be autonomously negotiated by each local node and usually provide good, albeit possibly not optimal, performance.

4. Distributed estimation of extrema

In this section we study a distributed version of a randomized algorithm for estimating the ''probable extrema'' (maximum and minimum) of a random variable x. We shall consider without loss of generality the estimation of the probable maximum of x, the case of the minimum being equivalent to estimating the maximum of −x.

Suppose n computing units (nodes) are available, and that each computing unit i generates N iid random samples x_i^(s), s = 1, ..., N, of x and computes at some ''time zero'' the empirical maximum

x_i(0) = max_{s=1,...,N} x_i^(s),  i = 1, ..., n.

We denote with x(0) ≐ [x_1(0) ··· x_n(0)]ᵀ the vector of initial estimates available at the nodes. Note that each local estimate at a node is itself a random variable that depends on the random multi-sample extraction at that node. It is well known that if N is ''sufficiently large'', each local estimate x_i(0) is a good estimate of the probable maximum of x, in the sense that the probability measure of those x for which {x > x_i(0)} is small; see [4,14,17]. More precisely, using the result in [4, Proposition 4], we have that

Prob{x > x_i(0)} ≤ β(N),  β(N) = 1/(N + 1),    (20)

which shows that the probability β(N) of x exceeding the estimated level x_i(0) goes to zero as N grows. Note however that in the current situation each node is independent of the other nodes and no advantage is taken of collaboration among nodes. We next provide a result that shows analytically how the probability β can be decreased by using a distributed algorithm that exploits communication among nodes. To this end, we use again the graph formalism introduced in the previous section, and consider that each node i maintains locally the following information: (a) the current estimate x_i(k) of the maximum at iteration k, and (b) a list ℓ_i(k) containing the indices of the nodes that have influenced the local estimate up to iteration k. This local information is updated according to the following distributed algorithm.

Algorithm 1 (Distributed Estimation of Max).

- Each node i = 1, ..., n, initializes to x_i(0) = max_{s=1,...,N} x_i^(s) and ℓ_i(0) = {i};
- For k = 0, 1, ..., each node updates its local estimate according to

x_i(k+1) = max_{j∈N_i(k)} x_j(k),  i = 1, ..., n;    (21)

and its local node list according to

ℓ_i(k+1) = ∪_{j∈N_i(k)} ℓ_j(k),  i = 1, ..., n,    (22)

where we understand that the lists ℓ_i(k) contain unique (non-repeated) elements.
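A direct Python transcription of one round of (21)-(22) is straightforward; the fixed-ring neighborhood used below is only an illustrative stand-in for the occurring graphs G(k):

import numpy as np

def max_agreement_step(x, lists, neighbors):
    # one round of (21)-(22); neighbors[i] includes node i itself
    n = len(x)
    x_new = [max(x[j] for j in neighbors[i]) for i in range(n)]
    lists_new = [set().union(*(lists[j] for j in neighbors[i])) for i in range(n)]
    return x_new, lists_new

rng = np.random.default_rng(0)
n, N = 5, 100
x = [rng.random(N).max() for _ in range(n)]               # local empirical maxima x_i(0)
lists = [{i} for i in range(n)]                           # l_i(0) = {i}
ring = [{i, (i - 1) % n, (i + 1) % n} for i in range(n)]  # constant ring with self loops
for k in range(n):                                        # enough rounds for this small example
    x, lists = max_agreement_step(x, lists, ring)         # all x_i now equal the max over all Nn samples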

Notice that, according to this algorithm, the local estimate x_i(k) is actually the maximum over all samples collected by the nodes present in the list ℓ_i(k); hence the following proposition can be readily established.

Proposition 1. Let the local estimates of the maximum at nodes i = 1, ..., n, be computed according to Algorithm 1; then it holds that

Prob{x > x_i(k)} ≤ β(N|ℓ_i(k)|),  k = 0, 1, ...;  β(N|ℓ_i(k)|) = 1/(N|ℓ_i(k)| + 1),

where |ℓ_i(k)| denotes the cardinality of ℓ_i(k).

Obviously, since |ℓ_i(k)| ≥ 1, communication improves the local estimates with respect to the ''no communication'' situation in (20), and the extent to which the estimate is improved depends on the cardinality of ℓ_i(k), which in turn depends on the structure of the occurring communication graphs. Let us recall that a graph G = (V, E) is said to be complete if each vertex is a neighbor of every other vertex, that is, if N_i = V for all i. The following theorem holds.


Theorem 4. Let {G(k) ∈ G, k = 0, 1, ...} be the sequence of occurring communication graphs. If there exists an integer k̄ ≥ 1 such that the composition graph G̊(k̄) ≐ G(k̄−1) ◦ ··· ◦ G(1) ◦ G(0) is complete, then it holds for the maximum estimates (21) that

Prob{x > x_i(k)} ≤ β(Nn),  k ≥ k̄;  i = 1, ..., n;  β(Nn) = 1/(Nn + 1).    (23)

Proof. Let x_max = max_{i=1,...,n} x_i(0), and suppose without loss of generality that the maximum is attained at the first node, that is, x_max = x_1(0). Let i ∈ V be any node, and observe that G̊(k̄) being complete implies in particular that it possesses the edge (i, 1). By the definition of graph composition, this means in turn that there exist v_1 ∈ V, ..., v_{k̄−1} ∈ V such that

(v_1, 1) ∈ E(0), (v_2, v_1) ∈ E(1), ..., (i, v_{k̄−1}) ∈ E(k̄−1).

Since 1 ∈ N_{v_1}(0) (i.e., node v_1 receives information from node 1 at time zero), at k = 1 in Algorithm 1 the maximum value x_max is diffused from node 1 to node v_1. At k = 2, since v_1 ∈ N_{v_2}(1), the value x_max propagates from v_1 to v_2. Proceeding in this way, at time k = k̄ the value x_max has reached node i. Since the previous reasoning holds for any i ∈ V, we have that x_i(k̄) = x_max for all nodes i ∈ V. Due to (21), the node values then remain unchanged for all k ≥ k̄. Finally, observe that x_max is the empirical maximum computed over Nn independent samples; hence from [4, Prop. 4] we have that Prob{x > x_max} ≤ 1/(Nn + 1), which concludes the proof. □

Notice that, under the hypotheses of Theorem 4, all nodes converge to the centralized maximum estimate (and thus all the information available at the nodes is exploited) in a finite number k̄ of iterations. This number depends on the structure and order of the occurring sequence of time-varying communication graphs. We remark that, since the edge set of the union of a sequence of graphs is a subset of the edge set of the composition of the same graphs, a sufficient condition for the hypothesis of Theorem 4 to hold is that the union graph G(0) ∪ G(1) ∪ ··· ∪ G(k̄−1) is complete for some k̄ ≥ 1. Also, the value of k̄ can be easily determined in several special cases. Two such examples are given next.

Corollary 4. Let the sequence of graphs {G(k) ∈ G, k = 0, 1, ...} be such that G(k) is connected for all k = 0, 1, .... Then, the conclusions of Theorem 4 hold for k̄ = n − 1.

The result in Corollary 4 follows immediately from Theorem 4, using a result stated in [5, Proposition 4]. A stronger bound on k̄ can be obtained in the particular case of a constant communication graph, as stated in the following corollary.

Corollary 5. Consider a constant graph sequence {G(k) ≡ G(V, E), k = 0, 1, ...}. If G is connected, then the conclusions of Theorem 4 hold for k̄ = φ(G), where φ(G) is the diameter of G, that is, the maximum length of the shortest path connecting any two nodes of G.

This result is readily proved by observing that the composition of φ(G) copies of G, G ◦ G ◦ ··· ◦ G, is complete, and then applying Theorem 4.
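Sizing the scheme of Theorem 4 amounts to simple arithmetic: β(Nn) = 1/(Nn + 1) ≤ β requires N ≥ (1 − β)/(nβ). A small sketch (ours, with illustrative values):

import math

def local_samples_max(beta, n):
    # beta(Nn) = 1/(Nn + 1) <= beta  <=>  N >= (1 - beta)/(n * beta)
    return math.ceil((1.0 - beta) / (n * beta))

n, beta = 10, 1e-5
N = local_samples_max(beta, n)   # 10000, matching N = 10^4 in Section 5.2
# For a constant connected graph, Corollary 5 gives k_bar = diameter of G.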

5. Numerical examples on a ring computing architecture

5.1. Probability estimation on a time-varying ring

To exemplify our results, we next consider the case of a communication network with ring structure and time-varying edge topology. Specifically, we consider n nodes and suppose that communication happens cyclically in time between successive node pairs, that is, we assume that at each instant k = 0, 1, ..., only the two nodes with indices s_1(k), s_2(k) are able to communicate bidirectionally, where

s_1(k) = [k mod n] + 1,
s_2(k) = [(k+1) mod n] + 1.

Fig. 1. A ring network.

Fig. 2. T-index of joint contractivity for the time-varying ring network, as a function of n.

Fig. 1 shows such a time-varying graph topology. In this situation, the Metropolis weight matrix associated with the graph at time k is written as

W(k) = I + (1/2) e_{s1(k)} e_{s2(k)}ᵀ + (1/2) e_{s2(k)} e_{s1(k)}ᵀ − (1/2) e_{s1(k)} e_{s1(k)}ᵀ − (1/2) e_{s2(k)} e_{s2(k)}ᵀ,

where e_i ∈ R^n is a vector having all entries equal to zero except for the ith position, which is equal to one.

It is immediate to verify that the considered sequence of graphs has least interval of joint connectivity T = n − 1. The product of T elements P = W(n−2) ··· W(0) has the following structure:

P =
[ 1/2        1/2        0          0        ···   0   ]
[ 1/2^2      1/2^2      1/2        0        ···   0   ]
[ 1/2^3      1/2^3      1/2^2      1/2      ···   0   ]
[ ...        ...        ...        ...      ...   0   ]
[ 1/2^{n−1}  1/2^{n−1}  1/2^{n−2}  ···      1/2^2 1/2 ]
[ 1/2^{n−1}  1/2^{n−1}  1/2^{n−2}  ···      1/2^2 1/2 ]

and it can be verified that cyclic permutations of the factors in the product P do not change the value of ‖P − 11ᵀ/n‖; hence the T-index of joint contractivity is simply given by this norm, which can easily be evaluated numerically. The T-index of joint contractivity is shown in Fig. 2 for integer values of n in the range {3, ..., 50}.
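The norm above can be evaluated in a few lines of numpy; the following sketch (our own, with 0-based indices and hypothetical helper names) reproduces the computation of the T-index for the time-varying ring:

import numpy as np

def pair_exchange_matrix(n, k):
    # Metropolis matrix W(k) for the single active pair s1(k), s2(k)
    s1, s2 = k % n, (k + 1) % n
    W = np.eye(n)
    W[s1, s1] = W[s2, s2] = 0.5
    W[s1, s2] = W[s2, s1] = 0.5
    return W

def joint_contractivity(n):
    P = np.eye(n)
    for k in range(n - 1):                  # product of T = n-1 factors
        P = pair_exchange_matrix(n, k) @ P  # P = W(n-2) ... W(0)
    return np.linalg.norm(P - np.ones((n, n)) / n, 2)

print(joint_contractivity(10))              # should be approx. 0.9131, cf. Section 5.1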


Fig. 3. First 250 iterations of the distributed algorithm for estimating the probability in (24).

As a numerical example, consider a ring composed of n = 10 nodes that cooperate in order to estimate the following probability of stability for an uncertain matrix:

p = Prob{∆ : A + ∆ is Hurwitz},    (24)

where ∆ is an uncertainty matrix with independent Gaussian entries with zero mean and variance equal to 0.1, and A is the following Hurwitz stable matrix:

A =
[ −2  −1   0   0  −1   0 ]
[ −1  −2   0  −1  −1  −2 ]
[  0   0  −1   0   0  −1 ]
[  0  −1  −1  −2   0  −1 ]
[ −1  −2   0   0  −3   0 ]
[  0  −1   0  −2   0  −3 ]

For n = 10 we have T = n − 1 = 9, with corresponding index of joint contractivity γ = 0.9131. For accuracy level ε = 0.01, Corollary 3 would require ⌊k/T⌋ ≥ (ln √n/ε)/(ln 1/γ) = 63.35, which is satisfied for k ≥ 576. Also, fixing the failure probability of the randomized estimation algorithm to β = 10⁻⁵, we have that each node must execute N ≥ (1/(n(ε − γ^576 √n)²)) ln(2/β) ≈ 1.2206 × 10⁴ local experiments in order for (18) to hold. Fig. 3 displays the first 250 iterations of a run of the distributed estimation algorithm: the initial values at k = 0 are the local estimates at the individual nodes, each made on the basis of N experiments. As k increases, all local estimates converge to the common value p = 0.8732.

We remark that the considered ring structure is here used only for the purpose of showing a numerical example. In particular, we are in no way suggesting that such a structure is a good architecture for distributed computing. Indeed, one may observe from Fig. 2 that the index of joint contractivity tends to one as n grows, which reflects the fact that the ''time'' required for information to travel across the network increases as the number of nodes increases. This behavior appears to be common to graphs in which the number of neighbors of each node (i.e., the degree) remains constant as n grows; see for instance Section 4.2 in [6] for further discussion on this topic.

5.2. Estimating the maximum abscissa of stability

We next considered the problem of estimating the maximum abscissa of stability for the uncertain matrix A + ∆, where A ∈ R^{ν,ν} is a nominal Hurwitz matrix (here fixed to the same numerical value as in the previous example), and ∆ is a random matrix perturbation such that each entry of ∆ is independently uniformly distributed in the interval [−0.01, 0.01]. The abscissa of stability is defined here as follows:

α = α(A + ∆) ≐ max_{i=1,...,ν} Re(λ_i(A + ∆)),

Fig. 4. A run of the distributed algorithm for estimating the maximum of α(A + ∆).

where λ_i(·) denotes the ith eigenvalue of its matrix argument. Clearly, α is a random variable, and we want to estimate a probable maximum of this variable with respect to ∆, using the distributed algorithm for estimation of extrema introduced in Section 4.

It can be observed by direct inspection that, using the time-varying ring architecture of the previous example, the composition graph is certainly complete after k̄ = 2n − 3 iterations.³ For n = 10 nodes, fixing the probability of the unfavorable event in Eq. (23) to β = 10⁻⁵, Theorem 4 requires N ≥ (1 − β)/(nβ) = 9999.9; thus we set N = 10⁴. The estimated maximum satisfying (23) will then be available at all local nodes in at most k̄ = 17 iterations of Algorithm 1. Fig. 4 shows a run of the distributed estimation algorithm, where the initial values at k = 0 are the local maximum estimates at the individual nodes, each made on the basis of N experiments. As k increases, all local estimates converge in finite time to the common value α_max = −0.0677.

³ The worst-case situation happens when the maximum of the initial local estimates resides in node n − 1. In this case, it takes n − 1 iterations to transfer this value to node n − 2, another iteration to copy it to node 1, and then a further n − 3 iterations to diffuse it to all remaining nodes from 2 to n − 3, for a total of 2n − 3 iterations in the worst case.

5.3. A time-invariant case

As an additional example, we considered a constant graph topology G(V, E) where the nodes V = {1, ..., n} are disposed in a symmetric ring with (undirected) edges (1, 2), (2, 3), ..., (n−1, n), (n, 1). In this situation, each node has three neighbors (including itself), and the Metropolis weight matrix is

W(k) = W = (1/3) ·
[ 1  1  0  ···  0  1 ]
[ 1  1  1  0  ···  0 ]
[ 0  1  1  1  ···  0 ]
[ ·  ·  ·  ···  ·  · ]
[ 1  0  ···  0  1  1 ]
,  ∀k.

Note that W is a symmetric circulant matrix (each row is a cyclic shift of the row above it), and denote with w_0 = 1/3, w_1 = 1/3, w_2 = 0, ..., w_{n−2} = 0, w_{n−1} = 1/3 the coefficients of the first row of W. From a standard result on circulant matrices (see, e.g., [7]) we have that the eigenvalues of W are the discrete Fourier transform of the sequence w_k, hence

λ_m(W) = ∑_{k=0}^{n−1} w_k e^{−2πj(m−1)k/n} = (1/3)(1 + e^{−2πj(m−1)/n} + e^{2πj(m−1)/n}) = (1/3)(1 + 2 cos(2π(m−1)/n)),  m = 1, ..., n,


Fig. 5. First 60 iterations of the distributed algorithm for estimating the probability in (24), on a constant network with a connected and undirected ring structure.

thus λ_1(W) = 1, λ_2(W) = (1/3)(1 + 2 cos(2π/n)), . . . , λ_n(W) = (1/3)(1 + 2 cos(2(n − 1)π/n)). From the discussion in Section 3.1.1 it follows that the results of Theorem 3 and Corollary 3 can be applied with T = 1 and γ equal to the second-largest modulus of the eigenvalues of W, that is

γ = (1/3)(1 + 2 cos(2π/n)).

Considering the numerical example in Section 5.1 with n = 10 nodes, we have γ = 0.8727. Fixing the accuracy level ε = 0.01, distributed estimation of the probability in (24) requires k ≥ 43 iterations, according to Corollary 3. Then, setting for instance k = 100, and failure probability β = 10^{−5}, each node must execute

N ≥ (1/(n(ε − γ^{100}√n)²)) ln(2/β) ≃ 12 216

local experiments in order for (18) to hold. Fig. 5 displays a run of the distributed estimation algorithm on the considered graph.
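Both figures quoted above can be reproduced directly. The sketch below assumes that the iteration condition in Corollary 3 amounts to γ^k √n < ε (a reading inferred from the numbers in the text, since the corollary itself is not reproduced here), and then evaluates the sample-size bound.

import numpy as np

n, eps, beta = 10, 0.01, 1e-5
gamma = (1.0 + 2.0 * np.cos(2.0 * np.pi / n)) / 3.0   # ~0.8727

# Iterations so that gamma^k * sqrt(n) < eps (assumed form of Corollary 3):
k_min = int(np.ceil(np.log(np.sqrt(n) / eps) / np.log(1.0 / gamma)))
print(k_min)  # 43

# Local sample size for k = 100 iterations and failure probability beta:
k = 100
N = np.log(2.0 / beta) / (n * (eps - gamma**k * np.sqrt(n)) ** 2)
print(int(np.ceil(N)))  # 12216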

6. Conclusions

In this paper we studied two distributed randomized algorithms for probabilistic performance analysis of uncertain systems. A first algorithm is introduced in Section 3 for estimating the expectation or the shortfall probability of a normalized performance index, and a second one is proposed in Section 4 for estimating the probable maximum level achieved by the index. For the first algorithm, we derived in Theorem 3 an explicit result on the geometric convergence of the local estimates to the full-information estimate, and a Hoeffding-type concentration result in Corollary 3. For the second algorithm, we showed in Theorem 4 that nodes may achieve agreement on a common probable maximum in finite time. Both the discussed algorithms appear to be effective in distributing the computational load of randomized algorithms over a possibly large set of independent computing nodes. The nodes perform local-only computations and need no knowledge of the whole network structure. The algorithms work reliably in the presence of time-varying network conditions, as long as rather mild assumptions on the joint connectivity of the occurring communication graphs are satisfied. A few numerical examples illustrate the results on networks with ring structure.

Appendix

Proof of Lemma 2. By the Perron–Frobenius theorem, all eigenvalues of a matrix A ∈ M_ss must have modulus no larger than the spectral radius ρ(A) = 1, hence they must all belong to the interval [−1, 1]. We further show that −1 cannot actually be an eigenvalue of A, thus all eigenvalues must lie in (−1, 1], which is a necessary and sufficient condition for a symmetric matrix to be paracontractive.

To see this latter fact, note that −1 is an eigenvalue of A if and only if ∃v such that (A + I)v = 0, that is, if and only if 0 ∈ σ(A + I). By the Gershgorin disk theorem, the eigenvalues of A + I must belong to the union of disks in the complex plane ⋃_i D_i, where D_i = {z ∈ C : |z − c_i| ≤ r_i}, with c_i = a_ii + 1 and

r_i = Σ_{j≠i} |[A + I]_ij| = Σ_{j≠i} a_ij = 1 − a_ii,

where the last equality follows from the fact that the sum over each row of A is one. Since a_ii > 0 by hypothesis (matrices in M_ss have strictly positive diagonal elements), all disks have centers c_i > 1 and radii r_i < 1, hence none of these disks contains the origin, therefore 0 ∉ σ(A + I), which concludes the proof. □
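The disk argument can be checked on any concrete member of M_ss; the matrix below is an arbitrary illustrative choice (symmetric, row stochastic, positive diagonal), not one taken from the paper.

import numpy as np

# An arbitrary matrix in M_ss: symmetric, row stochastic, positive diagonal.
A = np.array([[0.50, 0.50, 0.00],
              [0.50, 0.25, 0.25],
              [0.00, 0.25, 0.75]])

centers = np.diag(A) + 1.0   # Gershgorin centers of A + I: a_ii + 1 > 1
radii = 1.0 - np.diag(A)     # Gershgorin radii of A + I: 1 - a_ii < 1
assert np.all(centers - radii > 0)          # no disk reaches the origin
assert np.all(np.linalg.eigvalsh(A) > -1)   # hence -1 is not an eigenvalue of A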

Proof of Corollary 2. Let us denote for ease of notation with P .= A_{p_1} A_{p_2} · · · A_{p_q} the product of a generic permutation of matrices A_1, . . . , A_q ∈ M_ss. We have that P ∈ M_s, and from Corollary 1 it follows that if the union graph G_1 ∪ G_2 ∪ · · · ∪ G_q is connected then P is primitive. Therefore, using (3) we have that

I(P) = span{1}.

We next show that

‖Pv‖ < ‖v‖ for all v ∉ I(P),

which indeed means that P is paracontractive. To this end, let v ∉ I(P) and notice that in forming the product Pv = A_{p_1} A_{p_2} · · · A_{p_q} v only two cases might arise: either v ∈ I(A_{p_k}) for all k = 1, . . . , q, or not. But v ∈ I(A_{p_k}) for all k = 1, . . . , q would imply that Pv = A_{p_1} A_{p_2} · · · A_{p_q} v = A_{p_1} A_{p_2} · · · A_{p_{q−1}} v = · · · = A_{p_1} v = v, which would mean that v ∈ I(P), and this is not possible since we selected v ∉ I(P). Therefore there must exist an index z ∈ {1, . . . , q} such that v ∈ I(A_{p_k}) for all k = z + 1, . . . , q, and v ∉ I(A_{p_z}). Hence, we have that

Pv = A_{p_1} A_{p_2} · · · A_{p_z} v,   where A_{p_z} v ≠ v.

Since A_{p_z} is paracontractive we have that ‖A_{p_z} v‖ < ‖v‖, and therefore by sub-multiplicativity of norms it follows that

‖Pv‖ = ‖A_{p_1} A_{p_2} · · · A_{p_z} v‖ ≤ ‖A_{p_1} A_{p_2} · · · A_{p_{z−1}}‖ · ‖A_{p_z} v‖ ≤ ‖A_{p_z} v‖ < ‖v‖,

which proves that P is paracontractive. □

Proof of Lemma 3. Consider (4) and let P = (A_1 · · · A_q), Q = (Z_1 · · · Z_q), with Q = P − (1/n)11^⊤. Since P is row stochastic, primitive and paracontracting, we have that Pv = v if v ∈ span{1} and ‖Pv‖ < ‖v‖ otherwise. Let then a generic vector z ∈ R^n be represented as the sum of orthogonal components:

z = x + y,   x = α1 ∈ span{1},   y ∈ span{1}^⊥,

and note that ‖z‖² = ‖x‖² + ‖y‖² ⇒ ‖y‖ ≤ ‖z‖. We have

Qz = (P − (1/n)11^⊤)(α1 + y)
   = αP1 + Py − α(1/n)11^⊤1 − (1/n)11^⊤y
   = α1 + Py − α1 − 0
   = Py,

whence

‖Qz‖ = ‖Py‖ < ‖y‖ ≤ ‖z‖,   ∀z ∈ R^n, z ≠ 0,

and therefore

‖Q‖ = sup_{z≠0} ‖Qz‖/‖z‖ < 1,

which proves that Q is contractive with respect to the spectral norm. □
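As a quick numerical sanity check of the lemma's conclusion, one can take the symmetric ring matrix of Section 5.3 as a single-factor product (q = 1; the matrix is primitive, row stochastic and paracontracting) and verify that ‖P − (1/n)11^⊤‖ < 1. The sketch below does so for an illustrative n = 6.

import numpy as np

n = 6
# Metropolis ring matrix (primitive, row stochastic, paracontracting).
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = P[i, (i - 1) % n] = P[i, (i + 1) % n] = 1.0 / 3.0

Q = P - np.ones((n, n)) / n          # Q = P - (1/n) 1 1^T
print(np.linalg.norm(Q, 2))          # spectral norm, strictly less than 1 (= 2/3 here)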


References

[1] D. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice Hall, 1989.

[2] G. Calafiore, F. Dabbene, R. Tempo, A survey of randomized algorithms for control synthesis and performance verification, Journal of Complexity 23 (3) (2007) 301–316.

[3] G. Calafiore, F. Dabbene (Eds.), Probabilistic and Randomized Methods for Design under Uncertainty, Springer-Verlag, London, 2006.

[4] M.C. Campi, G.C. Calafiore, Notes on the scenario design approach, IEEE Transactions on Automatic Control (in press). Draft available at: http://staff.polito.it/Documenti/Papers/Campi-Calafiore_TN-TAC-08.pdf.

[5] M. Cao, A.S. Morse, B.D.O. Anderson, Reaching a consensus in a dynamically changing environment: A graphical approach, SIAM Journal on Control and Optimization 47 (2) (2008) 575–600.

[6] R. Carli, F. Fagnani, A. Speranzon, S. Zampieri, Communication constraints in the average consensus problem, Automatica 44 (2008) 671–684.

[7] P.J. Davis, Circulant Matrices, Wiley-Interscience, New York, 1979.

[8] R. Diestel, Graph Theory, Springer, New York, 2005.

[9] C. Godsil, G. Royle, Algebraic Graph Theory, Springer, New York, 2001.

[10] D.J. Hartfiel, Nonhomogeneous Matrix Products, World Scientific, 2002.

[11] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58 (1963) 13–30.

[12] R.A. Horn, C.R. Johnson, Matrix Analysis, Cambridge University Press, UK, 1985.

[13] A. Jadbabaie, J. Lin, A. Morse, Coordination of groups of mobile autonomous agents using nearest neighbor rules, IEEE Transactions on Automatic Control 48 (6) (2003) 988–1001.

[14] P. Khargonekar, A. Tikku, Randomized algorithms for robust control analysis and synthesis have polynomial complexity, in: Proceedings of the IEEE Conference on Decision and Control, 1996.

[15] L. Moreau, Stability of multiagent systems with time-dependent communication links, IEEE Transactions on Automatic Control 50 (2) (2005) 169–182.

[16] R. Olfati-Saber, J.A. Fax, R.M. Murray, Consensus and cooperation in multi-agent networked systems, Proceedings of the IEEE 95 (1) (2007) 215–233.

[17] R. Tempo, E.-W. Bai, F. Dabbene, Probabilistic robustness analysis: Explicit bounds for the minimum number of samples, Systems & Control Letters 30 (1997) 237–242.

[18] R. Tempo, G. Calafiore, F. Dabbene, Randomized Algorithms for Analysis and Control of Uncertain Systems, in: Communications and Control Engineering Series, Springer-Verlag, London, 2004.

[19] L. Xiao, S. Boyd, Fast linear iterations for distributed averaging, Systems & Control Letters 53 (2004) 65–78.

[20] L. Xiao, S. Boyd, S. Lall, Distributed average consensus with time-varying Metropolis weights, Automatica, 2006, unpublished manuscript. Available at: http://www.stanford.edu/~boyd/papers/avg_metropolis.html.

[21] L. Xiao, S. Boyd, S. Lall, A scheme for robust distributed sensor fusion based on average consensus, in: International Conference on Information Processing in Sensor Networks (IPSN 2005), April 2005, pp. 63–70.