Modeling and Analysis of Composite Network Embeddings


Ben Baumer
City University of New York

Prithwish Basu
Raytheon BBN Technologies


Amotz Bar-Noy
City University of New York


ABSTRACT
In a composite network, a piece of information traveling through links in a social network may have to travel over multiple links in an associated communication network. In this paper, we propose a model of composite networks that consists of two networks and an embedding between them, and several composite metrics that characterize information flow under a particular type of embedding. We present analytic results for the scaling behavior of “constrained composite stretch” of a path, “constrained composite diameter” of a graph, and “constrained composite broadcast time” of a tree, under random uniform embeddings onto various communication network structures. We validate our analytical results on composite stretch using two data sets consisting of a friendship social network geographically spread across Western Europe and a historical deployment of a military chain of command. We also present a randomized model of field deployment consistent with real-world data, and use simulations over this model to explore the distribution of constrained composite broadcast time. Finally, we show that our analytical bounds for composite broadcast time agree well with the simulation results.

∗Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

(c) 2011 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MSWiM’11, October 31–November 4, 2011, Miami, Florida, USA.
Copyright 2011 ACM 978-1-4503-0898-4/11/10 ...$10.00.

Categories and Subject Descriptors
G.2.2 [Discrete Mathematics]: Graph Theory—Network problems; Trees; Path and circuit problems; C.2.1 [Computer-Communication Networks]: Network Architecture and Design—Network topology; Wireless communication

General Terms
Theory, Networks

1. INTRODUCTION
The past few years have seen explosive growth of online social networks along with the proliferation of communication networks such as the Internet, wireline telephone networks, cellular telephone networks, and even multi-hop wireless ad hoc and sensor networks in specialized application scenarios such as disaster relief and battlefield operations. Human beings usually share information with people they know, and that information then gets forwarded along various links in the social network, either verbatim (e.g., directives from a leader flowing through the chain of command) or after modifications (e.g., propagation of rumors, gossip, or news from the field to the newspaper reader). As a result, the social network constrains or guides the spread of information according to its topology. Such constraints can force a piece of information to traverse much longer paths in the underlying communication network between the producer and the ultimate consumer of the information. This stretch is justified because the intermediaries either play a critical role in interpreting or massaging the information, or serve as important links in the acquaintance chain, without whom the producer and the ultimate consumer would not know each other.

The total time taken for a piece of information to spread through the entire social network in the aforementioned constrained fashion is often different from the time taken to just multicast the information on the underlying communication network. Additionally, a piece of information may traverse a communication link several times during the process. While this is not an issue for lightweight content such as text, it can be a major issue for multimedia content, especially in an ad hoc network/DTN setting where accessing multimedia content directly from a server over a flaky network is not reliable.

In this paper, we present a systematic analytical study of the constrained information flow problem. In particular, we model a pair of networks (social and communication) as a composite graph: a structure that results from embedding the former network into the latter using a mapping function that maps a social network node to a communication network node when the person it represents uses that node as his/her communication portal.

Interactions between different genres of networks are becoming increasingly observable in both natural and man-made systems. Examples include peer-to-peer overlay networks [13], power grid networks and communication networks [5], and the interaction between the programmer social network and the interconnected web of software modules developed by them [12].

We analytically study how the “constrained composite path stretch” and “constrained composite broadcast time” metrics scale with the sizes of both networks under consideration. One application scenario where this can be useful is a random deployment of workers or soldiers at a disaster relief site, with their leader attempting to disseminate instructions and directives along his/her chain of command (essentially a social network of sorts). These messages trace a logical path in the social network, which translates to a longer physical path in the underlying communication network.

1.1 Related Work
Related work in this area falls into three major categories: graph embedding, overlay networks, and network science approaches to studying composite networks. Graph embedding has received attention in the parallel computing domain, where the problem is to map a task graph onto a multiprocessor interconnection network (also known as a host graph) [3, 14, 8], and in the ubiquitous computing domain, where the problem is to map heterogeneous task graphs onto non-regular networks such as mobile ad hoc networks [2]. In both cases, the goal is to determine the optimal mapping (or task-to-processor assignment) function so that metrics such as delay to task completion, edge dilation (or stretch), and node/edge congestion are minimized. Instead of the aforementioned “optimization” approaches, in this paper we follow a “scaling law analysis” approach, in which both the graphs and the mapping function are given, and we characterize how a different set of “constrained” metrics, such as composite path stretch, composite diameter, and broadcast time, scale as a function of composite graph attributes.

There has been a significant amount of work on overlay networks in the past decade [13]. Seminal works such as CAN [15] and CHORD [16] attempt to design good distributed hash tables for P2P applications: the main endeavor there was to design good logical distributed data structures for storing (key, value) pairs overlaid on top of the Internet, so that efficient insertion and retrieval of hashed content is feasible from any part of the network. While this is a good example of a composite network, its similarities with our approach are slim. Overlay networks are designed to optimize insertion/lookup overhead, whereas in our problem space the social network graph is given, and we are interested in a different set of information flow metrics. Moreover, unlike the Internet, which overlay designs treat as a complete graph (any host can reach any other), our underlying network is, in general, a multi-hop network.

Recently, network science researchers have begun studying various flavors of composite network structures. Kurant and Thiran propose the Layered Complex Network model [11] for studying load in transportation networks: they consider two-layer graphs where the physical graph models the transportation network and the logical graph models traffic flow between various cities, and they use computational methods to determine levels of load on various transportation sectors in Europe. In comparison, our approach is analytical, and we also study different metrics. A recent analytical line of research considers interdependent networks such as power grid and communication networks [5]; it uses percolation theory to determine the fraction of nodes whose removal is likely to generate cascading failures in such networks. Leicht and D’Souza show that the percolation thresholds of composite networks are lower than those of the individual networks considered separately [12]. While these approaches are all analytical, they study a different graph property, namely the degree of failure tolerance.

There is a large body of work pertaining to the embedding of one metric space into another (in particular, into normed spaces such as d-dimensional Euclidean space $\mathbb{R}^d$) with “low distortion”. This has been summarized well in [9]. This entails establishing the necessary and sufficient conditions on the properties of the two spaces for finding embedding functions that yield a particular distortion, and in many cases finding the best embedding function [1]. A related idea of finding embeddings is popular in geographic routing: virtual coordinates are assigned to nodes in a hyperbolic space, and such an embedding guarantees that a greedy algorithm on the virtual coordinate space yields a route between every source and destination, if one exists [10].

The focus of this paper is not to find the best embedding function that yields a low distortion; rather, it is to analyze the distortion (or stretch) of an information flow that results from a random embedding of the nodes of the first graph onto the second graph, in distribution or in expectation.

1.2 Our Contribution
A summary of our contributions is as follows: 1) novel models and metrics for constrained information flow in composite networks; 2) mathematical analysis of scaling laws for constrained composite path stretch when a social network path is randomly mapped onto a general graph under both one-to-one and many-to-one mappings; 3) scaling laws for constrained composite broadcast time of a tree social network (chain of command) randomly mapped onto different communication networks; 4) validation of a subset of these results using the FOAF (friend of a friend) data set embedded on a geometric graph, as well as a historical deployment of a chain of command.

2. COMPOSITE GRAPH MODELS
For two graphs G1 and G2, we define the composite graph G to be the 3-tuple (G1, G2, R), where $R \subseteq V(G_1) \times V(G_2)$ is an embedding relation between the vertex sets V(G1) and V(G2) of the two graphs, respectively.

In this paper, we focus on analyzing random embedding relations, where vertices in G1 are mapped to vertices in G2 via some random process π. In particular, we study two cases:

1. Each vertex in G1 is mapped to a vertex in G2 that has been sampled uniformly at random with replacement. This is the many-to-one scenario, where many “social network” nodes can get mapped to the same communication network node.

2. Each vertex in G1 is mapped to a vertex in G2 that has been sampled uniformly at random without replacement. This is the one-to-one scenario, where a communication network node can host at most one social network node.
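A minimal sketch of the two random embedding processes, assuming vertex sets are given as Python iterables. The function name `random_embedding` is our own (hypothetical), and storing R as a dictionary relies on each G1 vertex receiving exactly one image, which holds in both cases above.

```python
import random


def random_embedding(v1, v2, replace, rng=None):
    """Map each vertex of G1 (v1) to a vertex of G2 (v2) uniformly at random.

    replace=True  -> many-to-one (sampling with replacement)
    replace=False -> one-to-one  (sampling without replacement; needs |V1| <= |V2|)
    Returns a dict standing in for the relation R as a function V1 -> V2.
    """
    rng = rng or random.Random()
    v1, v2 = list(v1), list(v2)
    if replace:
        targets = [rng.choice(v2) for _ in v1]
    else:
        if len(v1) > len(v2):
            raise ValueError("one-to-one embedding needs |V1| <= |V2|")
        targets = rng.sample(v2, len(v1))  # distinct images
    return dict(zip(v1, targets))
```

For example, `random_embedding(range(5), range(10), replace=False, rng=random.Random(0))` places five social nodes on five distinct communication nodes.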

In general, every element of R may have multiple attributes associated with it, but in this preliminary study we only consider a binary relation. Note that this relation may be time-varying, as an information object may be replicated or may even move from one communication node to another over time. This is a topic of future research.

We first analyze constrained composite path stretch, a metric that is useful for measuring how many physical communication hops are spanned by a logical information flow under a given embedding of the logical flow on a physical network.

Throughout this paper, let G = (G1, G2, R) be a composite graph, with Vi = V(Gi) the vertex set of graph i. Unless otherwise noted, $P_k = P_{uv} = \langle u = v_0, v_1, \ldots, v_k = v \rangle$ is a path of length k in G1, and $d_{G_2} : V_2 \times V_2 \to \mathbb{R}$ is a shortest path distance metric in G2.

Definition 2.1 (cstretch). The constrained composite path stretch of $P_{uv}$ in G is defined as:

$$\mathrm{cstretch}_{G_2}(P_{uv}) = \sum_{i=0}^{k-1} \max_{\substack{x, y \in V_2 :\\ (v_i, x) \in R \,\wedge\, (v_{i+1}, y) \in R}} d_{G_2}(x, y)$$

CStretch characterizes the scenario with a stringent requirement that the information must traverse the nodes of the path $P_{uv}$ in order, and in the process must traverse the correspondingly mapped nodes in G2. This is not a far-fetched scenario: in military systems, the chain of command (modeled by graph G1) often mandates that a piece of information flow through the logical chain even when the ultimate recipient is in close proximity to the origin and the intermediate nodes are farther away. The reason is that information often needs to be refined or obfuscated at each level of the logical chain before it is passed on. Similarly, even in non-military applications (such as online social networks), information such as news or gossip is often routed along logical paths of friends who may be physically located all over the globe, at large “Internet distances” from each other.
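Definition 2.1 can be evaluated directly. The sketch below assumes G2 is unweighted and connected (so BFS hop counts give $d_{G_2}$) and stores R as a dictionary from each G1 vertex to the set of its images; the names are illustrative, not from the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as {node: [neighbors]}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def cstretch(path, relation, adj2):
    """Constrained composite path stretch (Definition 2.1).

    path:     list of G1 vertices v0, ..., vk
    relation: dict mapping each G1 vertex to the set of G2 vertices related to it in R
    adj2:     adjacency lists of G2 (assumed connected)
    """
    total = 0
    for a, b in zip(path, path[1:]):
        # worst case over all images of the consecutive pair (v_i, v_{i+1})
        total += max(bfs_distances(adj2, x)[y]
                     for x in relation[a] for y in relation[b])
    return total
```

Recomputing a BFS for every image pair keeps the sketch short; for repeated evaluation one would precompute the distance matrix of G2.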

In the composite graph setting, the notion of diameter¹ can be extended to that of the constrained composite diameter, which can be defined in terms of constrained composite path stretch.

Definition 2.2 (ccd). The constrained composite diameter of G is defined as

$$\mathrm{ccd}(G) = \max_{u, v \in V_1} \mathrm{cstretch}_{G_2}(P_{uv}).$$

The CStretch metric captures the extra distance in G2 that a message has to travel in order to move through a path in G1. We need a different metric to capture the combined stretch for a message traveling through a chain-of-command tree in a composite graph. In this context, it is more natural to consider the constrained composite broadcast time metric.

¹Diameter is the maximum length of the shortest path between any pair of nodes in a graph. It is an important measure for communication networks because it gives us a sense of the amount of time required (in the worst case) to traverse a network.

Definition 2.3 (cbtime). Let T be a tree in G1 with root u. Then the constrained composite broadcast time of T in the composite graph G is defined as

$$\mathrm{cbtime}_{G_2}(T) = \max_{v \in T} \mathrm{cstretch}_{G_2}(P_{uv}).$$

The constrained composite broadcast time represents the stretch necessary to send a message through a chain-of-command tree that is deployed in a network topology. This may be of interest, for example, in a disaster relief situation where information needs to travel from a central director to end caregivers while relief workers are deployed in the field. In other words, it measures the time at which the last worker receives a message broadcast through the chain of command.
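For the common case where R is a function (each G1 vertex has a single image, as under π), the broadcast time reduces to a maximum over accumulated link distances along root-to-node paths. A sketch under that assumption, with hypothetical names and G2 given as adjacency lists:

```python
from collections import deque


def hop_distances(adj, source):
    """Hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def cbtime(children, root, embed, adj2):
    """Constrained composite broadcast time (Definition 2.3) for a function embedding.

    children: dict mapping each tree node to its list of children in G1 (leaves may be absent)
    embed:    dict mapping each tree node to a single G2 vertex
    adj2:     adjacency lists of G2 (assumed connected)
    """
    best = 0
    stack = [(root, 0)]  # (tree node, cstretch of the root-to-node path)
    while stack:
        u, acc = stack.pop()
        best = max(best, acc)
        d = hop_distances(adj2, embed[u])
        for c in children.get(u, []):
            stack.append((c, acc + d[embed[c]]))
    return best
```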

3. COMPOSITE STRETCH ANALYSIS
In this section, we characterize the distribution of the constrained composite path stretch of Pk over uniform random embeddings into G2. We first prove some general results that apply to any graph G2, and then illustrate scaling laws for a few well-known graph families.

3.1 Theoretical Results
For any graph G = (V, E), let $D_G$ be the geodesic distance matrix between all pairs of vertices $v_i, v_j \in V$. That is, each entry $d_{ij}$ in $D_G$ represents the shortest path distance from $v_i$ to $v_j$ in G. Then we note that the sum of the geodesic distances,

$$\Delta_G = \sum_{v_i, v_j \in V} d_{ij},$$

is a constant depending only on the structure of G.
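Computing $D_G$ and $\Delta_G$ for an unweighted, connected graph takes one breadth-first search per vertex. A sketch (our illustration; function names are hypothetical):

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs hop distances D_G via one BFS per vertex (unweighted, connected G)."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    D = [[0] * len(nodes) for _ in nodes]
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        for v, d in dist.items():
            D[index[s]][index[v]] = d
    return D


def delta(adj):
    """Sum of geodesic distances over all ordered pairs; the diagonal contributes 0."""
    return sum(map(sum, distance_matrix(adj)))
```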

Lemma 3.1. Let G be a graph with |V| = n, and let X be a random variable denoting the geodesic distance between two vertices of G chosen uniformly at random. Then:

$$\mathbb{E}[X] = \begin{cases} \dfrac{\Delta_G}{n(n-1)} & \text{when sampling without replacement,} \\[6pt] \dfrac{\Delta_G}{n^2} & \text{when sampling with replacement.} \end{cases}$$

Proof. The case where sampling is done with replacement is clear: since there are $n^2$ (ordered) pairs of vertices from which to choose, the expression given is the average distance. If sampling is done without replacement, then $\Delta_G$ double-counts the distance for each of the $\binom{n}{2}$ unique pairs of vertices, so the average is $\Delta_G / \big(2\binom{n}{2}\big) = \Delta_G / (n(n-1))$. Note that the n diagonal entries in $D_G$ contribute nothing to $\Delta_G$.

Corollary 3.1. There is no asymptotic difference in E[X] between sampling vertices with or without replacement.

Proof. From the preceding lemma, the ratio of E[X] when sampling without replacement to E[X] when sampling with replacement is $\frac{n}{n-1} = 1 + \frac{1}{n-1} \to 1$ as $n \to \infty$.
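A brute-force check of Lemma 3.1 and Corollary 3.1 on a small cycle can be written with exact rational arithmetic. This is an illustrative sketch, not from the paper, and the helper names are hypothetical: it enumerates ordered vertex pairs (with or without repetition) and confirms that the ratio of the two means is n/(n − 1).

```python
from fractions import Fraction
from itertools import permutations, product


def cycle_dist(n):
    """Geodesic distance on the cycle C_n."""
    return lambda i, j: min(abs(i - j), n - abs(i - j))


def mean_distance(n, dist, replace):
    """E[X] by exhaustive enumeration of ordered vertex pairs."""
    if replace:
        pairs = list(product(range(n), repeat=2))
    else:
        pairs = list(permutations(range(n), 2))
    return Fraction(sum(dist(i, j) for i, j in pairs), len(pairs))


if __name__ == "__main__":
    for n in (5, 50, 200):
        d = cycle_dist(n)
        ratio = mean_distance(n, d, False) / mean_distance(n, d, True)
        print(n, ratio, float(ratio))  # equals n/(n-1), tending to 1
```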

Next, we show that the expected stretch of a link is independent of the choices of vertices already mapped, regardless of whether sampling is done with or without replacement.

Lemma 3.2. Let $v_1, v_2, \ldots, v_i, v_{i+1}$ be a sequence of vertices chosen uniformly at random from V (with or without replacement), and let $X_j$ be the random variable giving the distance between $v_j$ and $v_{j-1}$. Then $\mathbb{E}\big[\mathbb{E}[X_{i+1} \mid v_1, v_2, \ldots, v_i]\big] = \mathbb{E}[X_2]$.

Proof. While the statement may be obvious for the case of sampling with replacement, we exercise more care for the case where sampling is done without replacement, and prove the statement combinatorially. For the RHS, select one vertex uniformly at random and color it red (call it $v_1$). Then select another from the remaining vertices and color it blue ($v_2$). The RHS is the expected distance between these two vertices. We now argue that the LHS counts the same quantity. To see this, first color one vertex blue (call it $v_{i+1}$) and another vertex red ($v_i$). Now color $i - 1$ other vertices green ($v_{i-1}, \ldots, v_1$). The LHS is the expected distance between the blue vertex and the red vertex. By symmetry, the order in which the colors are assigned does not affect the joint distribution of the red and blue vertices, which form a uniformly random ordered pair of distinct vertices in both cases.

This leads us to a general theorem about the expected composite stretch of a path.

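Theorem 3.1. For a path Pk embedded uniformly at random into any graph G2 (with the sampling performed with or without replacement),

$$\mathbb{E}[\mathrm{cstretch}^{\pi}_{G_2}(P_k)] = k \cdot \mathbb{E}[X],$$

where X is the random variable giving the distance between two randomly chosen vertices in G2.

Proof. By Lemma 3.2, each of the k links of Pk has expected length E[X] regardless of the vertices already mapped, so the result follows from linearity of expectation.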

We emphasize that the expectation is taken over the uniform random embedding $R_\pi$. As we saw in Lemma 3.1, for a specific G2, if the sampling method of $R_\pi$ is known, then the expected distance E[X] is a constant.
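Theorem 3.1 can be sanity-checked by simulation. The sketch below assumes G2 is the cycle $C_n$ and that π is a function embedding; it compares the empirical mean stretch of $P_k$ with $k \cdot \mathbb{E}[X]$ for both sampling schemes. The helper names, seed, and tolerance are arbitrary choices for illustration.

```python
import random
from fractions import Fraction


def cycle_dist(n):
    """Geodesic distance on the cycle C_n."""
    return lambda i, j: min(abs(i - j), n - abs(i - j))


def exact_mean_distance(n, dist, replace):
    """E[X] from Lemma 3.1, with Delta_G summed over all ordered pairs."""
    total = sum(dist(i, j) for i in range(n) for j in range(n))
    return Fraction(total, n * n if replace else n * (n - 1))


def simulated_cstretch(n, dist, k, replace, trials, rng):
    """Average stretch of P_k over random uniform embeddings into C_n."""
    acc = 0
    for _ in range(trials):
        if replace:
            images = [rng.randrange(n) for _ in range(k + 1)]
        else:
            images = rng.sample(range(n), k + 1)
        acc += sum(dist(a, b) for a, b in zip(images, images[1:]))
    return acc / trials
```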

Composite Diameter: In addition to the average case, we also want to describe the worst-case cstretch for a random embedding. It is easy to see that if $R_\pi$ samples vertices with replacement, then each successive link in any path can simply bounce back and forth between the two farthest vertices in G2. Thus, ccd(G) = diam(G1) · diam(G2). However, when $R_\pi$ samples vertices without replacement, the problem is an instance of MAX-TSP, which is MAX SNP-hard [7]. Nevertheless, a greedy approximation heuristic works well in practice.
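The paper does not spell out its greedy heuristic, so the following is only one plausible farthest-first rule, stated as an assumption: start at a vertex of maximum eccentricity and repeatedly move to the farthest unvisited vertex, breaking ties by smallest index. Because it produces a feasible one-to-one embedding of $P_k$, its output is a lower bound on the worst-case stretch.

```python
def greedy_max_path(D, k):
    """Farthest-first greedy for the worst-case stretch of P_k without replacement.

    D is a distance matrix (list of lists). Assumed rule (illustration only):
    start at the first vertex of maximum eccentricity, then jump to the
    farthest unvisited vertex, breaking ties by smallest index.
    """
    n = len(D)
    if k > n - 1:
        raise ValueError("a path of length k needs k + 1 distinct vertices")
    current = max(range(n), key=lambda v: max(D[v]))
    unvisited = set(range(n)) - {current}
    total = 0
    for _ in range(k):
        # max() returns the first maximal element, so sorting breaks ties by index
        nxt = max(sorted(unvisited), key=lambda w: D[current][w])
        total += D[current][nxt]
        unvisited.remove(nxt)
        current = nxt
    return total


def cycle_matrix(n):
    """Distance matrix of the cycle C_n."""
    return [[min(abs(i - j), n - abs(i - j)) for j in range(n)] for i in range(n)]
```

On $C_5$ and $C_6$ with a path covering all vertices, this rule gives 8 and 13, which match the pair counts described for the cycle below.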

3.2 Examples
Theorem 3.1 shows that the expected stretch of a path is equal to the length of the path times a constant depending only on the structure of G2 and the distribution of the random embedding. In what follows, we present examples of some well-known graph families, and illustrate how their structure affects the distribution of cstretch.

d-dimensional Discrete Lattice: Let $D^d_n = \{0, 1, \ldots, n-1\}^d$ be the d-dimensional discrete lattice on $n^d$ points, and consider a composite graph with $G_2 = D^d_n$. On this graph topology, geodesic distance is equivalent to the $\ell_1$-norm (Manhattan distance) between two points in $D^d_n$. Thus, $d_{G_2}(v, w) = \sum_{i=1}^{d} |v_i - w_i|$, and summing over all $n^{2d}$ of these pairs gives

$$\Delta_{G_2} = \sum_{v, w \in V} \sum_{i=1}^{d} |v_i - w_i| = \frac{d n^{2d+1}}{3}\left(1 - \frac{1}{n^2}\right).$$

It follows from Lemma 3.1 and Theorem 3.1 that under a random uniform embedding with replacement into the d-dimensional discrete lattice,

$$\mathbb{E}[\mathrm{cstretch}^{\pi}_{G_2}(P_k)] = \frac{kdn}{3}\left(1 - \frac{1}{n^2}\right).$$

Note that in this case it is also straightforward to fully explicate the distribution of X. For any $1 \le i \le d$, let $X_i = |v_i - w_i|$. Then the probability mass function for $X_i$ is

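$$p_{X_i}(\delta) = \begin{cases} \dfrac{1}{n} & \text{if } \delta = 0, \\[6pt] \dfrac{2(n-\delta)}{n^2} & \text{if } 1 \le \delta \le n-1, \end{cases}$$

since each coordinate can take on any of n values, and for each δ between 1 and n − 1 there are 2(n − δ) ordered pairs of coordinate values at distance δ. Since the $X_i$'s are independent and identically distributed, we can extract (among other things) the variance of $X = \sum_{i=1}^{d} X_i$ from the first two moments of $X_i$:

$$\mathrm{Var}[X] = d \cdot \frac{(n^2 - 1)(n^2 + 2)}{18 n^2}.$$

We can infer from this that the stretch of a long path is not likely to deviate far from its mean.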
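The pmf, mean, and variance above can be checked with exact rational arithmetic. A sketch with illustrative helper names, which also brute-forces $\Delta_{G_2}$ on a small lattice:

```python
from fractions import Fraction
from itertools import product


def coordinate_pmf(n):
    """pmf of X_i = |v_i - w_i| for two coordinates drawn uniformly from {0, ..., n-1}."""
    pmf = {0: Fraction(1, n)}
    for delta in range(1, n):
        pmf[delta] = Fraction(2 * (n - delta), n * n)
    return pmf


def lattice_moments(n, d):
    """Mean and variance of X, a sum of d i.i.d. coordinate distances."""
    pmf = coordinate_pmf(n)
    mean = sum(x * p for x, p in pmf.items())
    var = sum(x * x * p for x, p in pmf.items()) - mean ** 2
    return d * mean, d * var


def lattice_delta_bruteforce(n, d):
    """Delta_G for D^d_n by summing Manhattan distances over all ordered pairs."""
    pts = list(product(range(n), repeat=d))
    return sum(sum(abs(a - b) for a, b in zip(v, w)) for v in pts for w in pts)
```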

For the discrete lattice, we have diam(G2) = d(n − 1), so, as mentioned above, the ccd for Pk is kd(n − 1). For the non-trivial “without replacement” scenario, we implemented a greedy approximation heuristic, and verified that the ccd for both the without- and with-replacement scenarios is O(n²).

Cycle: Let $C_n$ be the cycle of length n, and consider uniform discrete mappings from Pk onto $C_n$. Clearly, the maximum distance between two vertices in $C_n$ is $\lfloor n/2 \rfloor$. For each possible distance x between 0 and $\lfloor n/2 \rfloor$, there are exactly n (ordered) pairs at distance x if x = 0 or x = n/2, and exactly 2n such pairs otherwise. It is thus straightforward to show that

$$\Delta_{C_n} = \begin{cases} \dfrac{n(n^2 - 1)}{4} & \text{if } n \text{ is odd}, \\[6pt] \dfrac{n^3}{4} & \text{if } n \text{ is even}. \end{cases}$$

Application of Lemma 3.1 and Theorem 3.1 then reveals that for random uniform embeddings onto $C_n$,

$$\mathbb{E}[\mathrm{cstretch}^{\pi}_{C_n}(P_k)] = k \cdot \left(\frac{n}{4} + o(1)\right).$$
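A quick exhaustive check of the closed form for $\Delta_{C_n}$ and of the resulting with-replacement mean (illustrative code, not from the paper):

```python
from fractions import Fraction


def delta_cycle_bruteforce(n):
    """Delta_{C_n} summed over all ordered pairs."""
    return sum(min(abs(i - j), n - abs(i - j)) for i in range(n) for j in range(n))


def delta_cycle_closed_form(n):
    """n(n^2 - 1)/4 for odd n, n^3/4 for even n (both are integers)."""
    return n * (n * n - 1) // 4 if n % 2 else n ** 3 // 4


def mean_distance_with_replacement(n):
    """E[X] = Delta_{C_n} / n^2: exactly n/4 for even n, n/4 - 1/(4n) for odd n."""
    return Fraction(delta_cycle_closed_form(n), n * n)
```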

Greedy is optimal on $C_n$ (for a path visiting all n vertices): if n is odd, it finds n − 1 pairs at distance $\lfloor n/2 \rfloor$ = diam(G2) from each other, which is optimal by definition. On the other hand, if n is even, it picks all n/2 pairs at distance n/2 = diam(G2) from each other, and another n/2 − 1 pairs at the next greatest distance, n/2 − 1.

Balloon graph: Next, we consider a graph family with some interesting properties. Let $B_{n,m}$ be a balloon graph consisting of a string (line graph) of length m, connected to a balloon (clique) of size n − m, for any 0 ≤ m < n. For clarity, we specify that vertices $v_0, \ldots, v_m$ make up the string, while vertices $v_m, \ldots, v_{n-1}$ make up the balloon. Note that for any two indices $0 \le i < j \le n - 1$ in this graph, we have that

$$d_{B_{n,m}}(v_i, v_j) = \begin{cases} j - i & \text{if } i < j \le m, \\ m + 1 - i & \text{if } i < m < j, \\ 1 & \text{if } m \le i < j. \end{cases}$$

In particular, note that $\mathrm{diam}(B_{n,m}) = m + 1$. In computing the distance matrix, we distinguish three cases based on the indices of the two vertices chosen:

1. If i ≤ j ≤ m, then both vertices lie in the string, which is $D^1_{m+1}$. This contributes $\Delta_{D^1_{m+1}}$ towards $\Delta_{B_{n,m}}$.

2. If m ≤ i ≤ j, then both vertices lie in the balloon, which is a clique on n − m vertices. On the complete graph $K_n$, $\Delta_{K_n} = n^2 - n$, since every pair of distinct vertices is connected by an edge, and there are n ways to choose the same vertex twice. Hence the balloon contributes $(n - m)^2 - (n - m)$.

3. If i < m < j, then one vertex lies in the string and the other lies in the balloon. Consider any vertex $v_j$ in the balloon with j > m. Its distances from the string vertices $v_0, \ldots, v_{m-1}$ are simply m + 1, m, m − 1, ..., 2. Counting both orders of each pair, the contribution to $\Delta_{B_{n,m}}$ is

$$2(n - m - 1) \sum_{i=2}^{m+1} i = m(m + 3)(n - m - 1).$$

Adding these three quantities yields

$$\Delta_{B_{n,m}} = -\frac{2}{3} m^3 + (n - 2) m^2 + \left(n - \frac{4}{3}\right) m + n^2 - n.$$
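The closed form for $\Delta_{B_{n,m}}$ can be checked against breadth-first search on small balloon graphs. In this sketch (hypothetical helper names), vertex $v_i$ is the integer i, the string is $v_0, \ldots, v_m$, and the clique is $v_m, \ldots, v_{n-1}$, following the convention above.

```python
from collections import deque
from fractions import Fraction


def balloon_adj(n, m):
    """Adjacency sets of B_{n,m}: string v0..vm plus a clique on vm..v_{n-1}."""
    adj = {v: set() for v in range(n)}
    for i in range(m):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    for i in range(m, n):
        for j in range(i + 1, n):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def delta_bruteforce(adj):
    """Sum of BFS hop distances over all ordered pairs."""
    total = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
    return total


def delta_balloon(n, m):
    """Closed form for Delta_{B_{n,m}} in exact arithmetic."""
    return Fraction(-2, 3) * m**3 + (n - 2) * m**2 + (n - Fraction(4, 3)) * m + n**2 - n
```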

The reader may verify that setting m = 0 corresponds to the special case where the balloon graph is itself a clique, while setting m = n − 1 yields the special case where $B_{n,n-1} = D^1_n$. By Theorem 3.1 and Lemma 3.1, the expected stretch for a path of length k onto $B_{n,m}$ is thus:

$$\mathbb{E}[\mathrm{cstretch}^{\pi}_{B_{n,m}}(P_k)] = k \cdot \left(1 + O\!\left(\frac{m^2}{n}\right)\right).$$

Random Geometric Graph: Lastly, we consider the composite stretch when Pk is mapped onto a random geometric graph G2 = RGG(n, r(n)), where r(n) is the radius of communication. That is, G2 consists of n vertices placed uniformly at random in $[0, 1]^2$, wherein any two vertices are connected with an edge if and only if the Euclidean distance between them is at most r(n). Gupta and Kumar [6] showed that a radius of connectivity of

$$r(n) = \sqrt{\frac{\ln n + c(n)}{\pi n}}$$

ensures asymptotic connectivity in the RGG with high probability if and only if $c(n) \to +\infty$. In all of our discussions on RGG in this paper, we assume that the radius of connectivity is at least this large, i.e., $r(n) = \Omega\big(\sqrt{\ln n / n}\big)$.

As before, Theorem 3.1 still applies, so it remains only to characterize the distribution of the random variable X giving the geodesic distance between two vertices in RGG(n, r(n)) selected uniformly at random. Note that in contrast to the previous examples we have considered, we now have two sources of randomness: 1) the randomized construction of the RGG; and 2) the random uniform embedding. If the Euclidean distance between two vertices in an RGG is δ, then recent results confirm that, with high probability, the geodesic distance X differs from its minimum of δ/r by at most a constant factor [4].

Theorem 3.2. With high probability, the expected geodesic distance in RGG(n, r(n)) satisfies

$$\frac{\Delta^{(2)}}{r(n)} \le \mathbb{E}[X] \le \kappa(n) \cdot \frac{\Delta^{(2)}}{r(n)},$$

where $\Delta^{(2)} \approx 0.5214054331$ is a known constant, and $\kappa(n) \ge 1$ is O(1).

Proof. Let v, w be two vertices in RGG(n, r(n)) selected uniformly at random, and set $\delta = \|v - w\|_2$. Clearly, X ≥ δ/r, since each edge spans a Euclidean distance of at most r. Conversely, if $\delta = \Omega(\log^{3.5} n / r^2)$, then by a result from [4], X = O(δ/r). Taking expectations yields the result, since $\mathbb{E}[\delta] = \Delta^{(2)}$ is a known result [17].
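The constant κ(n) can be estimated empirically. The sketch below is not the authors' synthetic analysis: it assumes a fixed offset c(n) = c in the Gupta–Kumar radius (the theory above requires c(n) → +∞), estimates κ as a ratio of sample means over random vertex pairs, and skips pairs that fall in different components. Since each hop covers at most r, the estimate is always at least 1.

```python
import math
import random
from collections import deque


def rgg(n, r, rng):
    """Random geometric graph: n uniform points in the unit square, edges within distance r."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= r:
                adj[i].append(j)
                adj[j].append(i)
    return pts, adj


def kappa_estimate(n, c, pairs, rng):
    """Ratio of mean hop distance to mean (Euclidean distance / r) over sampled pairs."""
    r = math.sqrt((math.log(n) + c) / (math.pi * n))  # assumed: c(n) held constant at c
    pts, adj = rgg(n, r, rng)
    hops_sum = scaled_sum = 0.0
    for _ in range(pairs):
        s, t = rng.sample(range(n), 2)
        dist = {s: 0}
        queue = deque([s])
        while queue and t not in dist:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if t in dist:  # skip pairs in different components
            hops_sum += dist[t]
            scaled_sum += math.dist(pts[s], pts[t]) / r
    return hops_sum / scaled_sum
```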

Synthetic analysis suggests that κ(n) < 1.3 for n > 1000. Therefore, as before, we can easily bound (from above) the expected composite stretch.

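| G1 | G2 | E[cstretch] | max[cstretch] |
|----|----|-------------|---------------|
| Pk | $D^d_n$ | $\frac{kdn}{3}(1 - n^{-2})$ | $kd(n-1)$ |
| Pk | $C_n$ | $k\left(\frac{n}{4} + o(1)\right)$ | $k \lfloor n/2 \rfloor$ |
| Pk | $B_{n,m}$ | $k\left(1 + O\!\left(\frac{m^2}{n}\right)\right)$ | $k(m+1)$ |
| Pk | RGG(n, r(n)) | $O\!\left(k\sqrt{\frac{n}{\ln n}}\right)$ | |

Table 1: Summary of Path Stretch Metrics for Uniform Random Embeddings of Pk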

Corollary 3.2. For r(n) sufficiently large (i.e., greater than the critical connectivity threshold), the composite stretch of a path Pk on a random geometric graph RGG(n, r(n)) satisfies, with high probability,

$$\mathbb{E}[\mathrm{cstretch}^{\pi}_{RGG}(P_k)] = k \cdot \kappa(n) \cdot \frac{\Delta^{(2)}}{r(n)} = O\!\left(k \sqrt{\frac{n}{\ln n}}\right).$$

3.3 Average vs. Worst-Case Analysis

Thus far, we have characterized both the average case (expected cstretch) and the worst case (ccd) for a random uniform embedding of a path onto several graph families. For both the lattice and the cycle, these quantities were of the same order of magnitude. At this point a natural question arises: are there graphs for which the ratio of the maximum cstretch to the average cstretch of Pk is not O(1)? Indeed, the balloon graph is one such graph. As the diameter of $B_{n,m}$ is m + 1, the maximum stretch is diam(G1) · (m + 1). If we let $\varphi(B_{n,m})$ be the ratio of the maximum cstretch to the mean cstretch, we can see that:

$$\varphi(B_{n,m}) = \frac{\mathrm{diam}(G_1)(m + 1)}{\mathrm{diam}(G_1)\left(1 + O\!\left(\frac{m^2}{n}\right)\right)} = O\!\left(\frac{n}{m}\right).$$

In particular, for $m = \sqrt{n}$, the ratio of the maximum stretch to the mean stretch for the balloon graph $B_{n,m}$ is $O(\sqrt{n})$. Explicit calculations reveal that for $m = \sqrt{n}$, in fact $\mathbb{E}[X] \to 2$ as $n \to \infty$.

More interesting is the fact that this gap appears to be mainly an artifact of the difference between sampling with and without replacement. The results of our greedy algorithm for ccd without replacement suggest that with $m = \sqrt{n}$, the ccd and expected cstretch are of the same order of magnitude. Table 1 summarizes our theoretical results.
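The limit E[X] → 2 for m = √n can be checked numerically from the closed form for $\Delta_{B_{n,m}}$ with exact arithmetic, using with-replacement sampling so that E[X] = Δ/n². An illustrative sketch with hypothetical names; the diam(G1) factor cancels in φ, so it is omitted:

```python
from fractions import Fraction
from math import isqrt


def delta_balloon(n, m):
    """Closed form for Delta_{B_{n,m}}."""
    return Fraction(-2, 3) * m**3 + (n - 2) * m**2 + (n - Fraction(4, 3)) * m + n**2 - n


def balloon_mean_and_ratio(n):
    """E[X] (with replacement) and phi = (m + 1) / E[X] for m = floor(sqrt(n))."""
    m = isqrt(n)
    mean = delta_balloon(n, m) / n**2
    return mean, (m + 1) / mean


if __name__ == "__main__":
    for n in (10**4, 10**6, 10**8):
        mean, phi = balloon_mean_and_ratio(n)
        print(n, float(mean), float(phi))  # mean approaches 2, phi grows like sqrt(n)/2
```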

4. COMPOSITE BROADCAST TIME
In this section, we analytically characterize the expected composite broadcast time for tree topologies. Social networks for information dissemination commonly have tree structures (more on this in Section 5); hence this analysis can be useful for specific communication network deployment scenarios. Let Tk be a k-node tree of height h and maximum (out)degree δ, for some 1 ≤ δ < k. We assume that Tk exists in some G1, and examine the constrained composite broadcast time for sending a message from the root to each of the other nodes.

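Star Topology: We begin with the special case where Tk is a k-star. First, we introduce some notation. Let

$$p_k = \frac{1}{\binom{n-1}{k}} \Big(\underbrace{0, \ldots, 0}_{k},\ 1,\ \binom{k}{k-1},\ \ldots,\ \binom{n-2}{k-1}\Big)^{T} \in \mathbb{R}^n$$

be a column vector, and note that $\|p_k\|_1 = 1$. The ith entry of $p_k$ represents the probability that the ith smallest among n values is returned, when the returned value is the maximum over a subset of size k chosen uniformly at random from the n − 1 values other than the smallest. Furthermore, let $f : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ be the function that sorts the rows of a matrix in ascending order from left to right. That is,

$$D = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_m \end{pmatrix} \;\Rightarrow\; f(D) = \begin{pmatrix} \mathrm{sort}(d_1) \\ \mathrm{sort}(d_2) \\ \vdots \\ \mathrm{sort}(d_m) \end{pmatrix},$$

where $d_i$ is the ith row of D. Finally, let $v_m = \frac{1}{m}(1, \ldots, 1) \in \mathbb{R}^m$.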

Theorem 4.1. For any graph G2, the broadcast time of a star of size k satisfies

$$\mathbb{E}[\mathrm{cbtime}_{G_2}(S_k)] = v_n^T \cdot f(D_{G_2}) \cdot p_k.$$

Proof. Let $d_i$ be the ith row of $D_{G_2}$, and suppose that the root of the star $S_k$ is mapped to node i in G2. The broadcast time of $S_k$ is the maximum cstretch among its k children. Since the jth entry of $p_k$ is the probability that the jth smallest value in $d_i$ is returned, the inner product $\langle \mathrm{sort}(d_i), p_k \rangle$ gives the expected value of the maximum of the k cstretches. Multiplication on the left by $v_n^T$ simply averages these n values over all n rows.
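Theorem 4.1 can be checked against exhaustive enumeration on small graphs. The sketch assumes, as $p_k$ does, that the root is placed uniformly at random and its k children on distinct vertices other than the root's image; the names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_vector(n, k):
    """p_k: probability that the max of k distinct non-self entries is the i-th smallest."""
    tail = [Fraction(comb(j - 2, k - 1), comb(n - 1, k)) for j in range(k + 1, n + 1)]
    return [Fraction(0)] * k + tail


def star_cbtime_formula(D, k):
    """E[cbtime(S_k)] = v_n^T f(D) p_k (Theorem 4.1)."""
    n = len(D)
    p = p_vector(n, k)
    return sum(sum(a * b for a, b in zip(sorted(row), p)) for row in D) / n


def star_cbtime_bruteforce(D, k):
    """Average over root images and k-subsets of child images of the farthest child."""
    n = len(D)
    total, count = Fraction(0), 0
    for root in range(n):
        others = [v for v in range(n) if v != root]
        for children in combinations(others, k):
            total += max(D[root][c] for c in children)
            count += 1
    return total / count
```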

Note that this is consistent with Theorem 3.1 for the special case where k = 2. Theorem 4.1 allows us to compute the broadcast time of a k-star for a variety of graph families, and we later use these as building blocks for bounds on general trees. Moreover, Theorem 4.1 improves on the trivial upper bound of diam(G2) for the broadcast time of a star. A better bound can be derived by considering the average eccentricity of G2. The eccentricity ε of a vertex in a graph is defined as the maximum geodesic distance between that vertex and any other.

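Corollary 4.1. For any graph G2, the broadcast time of a star of size k satisfies

$$\mathbb{E}[\mathrm{cbtime}_{G_2}(S_k)] \le \frac{1}{n} \sum_{v \in V_2} \varepsilon(v).$$

Proof. Substituting $p_{n-1} = (0, \ldots, 0, 1)^T$ in place of $p_k$ in Theorem 4.1 puts all of the mass on the last entry of each sorted row, which is the eccentricity of the corresponding vertex, so the expression returns the average eccentricity of the vertices in G2. Since the maximum over all n − 1 other vertices is at least the maximum over any k of them, this is an upper bound.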

Corollary 4.1 provides a better bound than the diameter, but is not nearly as good as using Theorem 4.1 directly. To illustrate how Theorem 4.1 can be used for a specific G2, we provide an upper bound on the broadcast time of a star when G2 is the line lattice above.

Corollary 4.2. For $G_2 = D^1_n$, the line lattice, the broadcast time of a star of size k satisfies

$$\mathbb{E}[\mathrm{cbtime}_{G_2}(S_k)] \le \frac{k}{k+1} \cdot n.$$

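Proof. The maximum product on the right certainly occurs at $d_1 = (0, 1, 2, \ldots, n-1)$, which is already sorted and dominates every other sorted row entrywise. Thus,

$$\begin{aligned} \langle d_1, p_k \rangle &= \frac{1}{\binom{n-1}{k}} \sum_{j=k+1}^{n} (j - 1) \binom{j-2}{k-1} \\ &= \frac{k}{\binom{n-1}{k}} \sum_{j=k+1}^{n} \binom{j-1}{k} \\ &= \frac{k}{\binom{n-1}{k}} \binom{n}{k+1} = \frac{k}{k+1} \cdot n. \end{aligned}$$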
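The closed form in Corollary 4.2 and the resulting bound can be verified exactly for small n. An illustrative sketch, not from the paper, with hypothetical names:

```python
from fractions import Fraction
from math import comb


def endpoint_inner_product(n, k):
    """<d_1, p_k> for the line lattice D^1_n, where d_1 = (0, 1, ..., n-1)."""
    return sum(Fraction((j - 1) * comb(j - 2, k - 1), comb(n - 1, k))
               for j in range(k + 1, n + 1))


def average_over_roots(n, k):
    """v_n^T f(D) p_k on the line lattice: the exact value bounded by Corollary 4.2."""
    p = [Fraction(0)] * k + [Fraction(comb(j - 2, k - 1), comb(n - 1, k))
                             for j in range(k + 1, n + 1)]
    rows = ([abs(i - j) for j in range(n)] for i in range(n))
    return sum(sum(a * b for a, b in zip(sorted(r), p)) for r in rows) / n
```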

Tree topology: For any tree Tk with maximum degree δ, let $\delta_\ell$ be the maximum outdegree among nodes at height $1 \le \ell \le h$ in Tk.

Observation 4.1 (Cbtime: Lower Bound).

$$\mathbb{E}[\mathrm{cbtime}_{G_2}(T_k)] \ge \mathbb{E}[\mathrm{cstretch}_{G_2}(P_h)]$$

Proof. The lower bound is the expected cstretch of a root-to-leaf path of length h, the longest such path in Tk. Since the broadcast time is the maximum cstretch over all root-to-node paths, it is at least the cstretch of this particular path under every embedding, so its expectation is at least as large.

Observation 4.2 (Cbtime: Upper Bound).

$$\mathbb{E}[\mathrm{cbtime}_{G_2}(T_k)] \le \sum_{\ell=1}^{h} \mathbb{E}[\mathrm{cbtime}_{G_2}(S_{\delta_\ell})]$$

Proof. The upper bound is the sum (over all h levels of Tk) of the expected composite broadcast time for the largest star graph $S_{\delta_\ell}$ at each level ℓ. This is the maximum expected broadcast time, since no path from the root to a leaf could take longer than this.

Combining Theorem 3.1 with Observations 4.1 and 4.2 yields the following bounds on the expected broadcast time of a general tree.

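Corollary 4.3. For any tree Tk of height h and maximum outdegree δ,

$$h \cdot \mathbb{E}[X] \le \mathbb{E}[\mathrm{cbtime}_{G_2}(T_k)] \le h \cdot \mathbb{E}[\mathrm{cbtime}_{G_2}(S_\delta)],$$

where X is the random variable giving the geodesic distance between two vertices of G2 chosen uniformly at random.

Proof. The lower bound is an application of Theorem 3.1 to Observation 4.1, while the upper bound follows from Observation 4.2, the fact that $\delta = \max_{1 \le \ell \le h} \delta_\ell$, and the fact that $\mathbb{E}[\mathrm{cbtime}_{G_2}(S_k)]$ is nondecreasing in k.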
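The lower bound of Corollary 4.3 can be illustrated by simulation. This sketch makes several assumptions of our own: G2 is the line lattice $D^1_n$ (so $d(x, y) = |x - y|$ and, without replacement, $\mathbb{E}[X] = (n+1)/3$), the tree is a complete binary tree, and the embedding is one-to-one. It checks only the lower bound.

```python
import random


def binary_tree(height):
    """Children lists of a complete binary tree on nodes 0..2^(height+1)-2, root 0."""
    last = 2 ** (height + 1) - 1
    return {u: [c for c in (2 * u + 1, 2 * u + 2) if c < last] for u in range(last)}


def cbtime_on_line(children, embed):
    """Broadcast time when G2 is the line lattice, so d(x, y) = |x - y|."""
    best, stack = 0, [(0, 0)]  # (tree node, accumulated stretch from the root)
    while stack:
        u, acc = stack.pop()
        best = max(best, acc)
        for c in children[u]:
            stack.append((c, acc + abs(embed[u] - embed[c])))
    return best


def mean_cbtime(n, height, trials, rng):
    """Monte Carlo estimate of E[cbtime] under one-to-one uniform embeddings."""
    children = binary_tree(height)
    total = 0
    for _ in range(trials):
        embed = rng.sample(range(n), len(children))  # distinct positions on the line
        total += cbtime_on_line(children, embed)
    return total / trials
```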

5. SIMULATION BASED EVALUATION
In this section, we use a variety of simulation-based approaches to study how the composite stretch and composite broadcast time metrics behave under various choices of real (or at least realistic) social and communication networks. We first study the case where the social network (G1) is a hierarchical network like a chain of command, as found in military missions or disaster relief operations. Our second case study is a social network that is richer than a tree, as in friendship relationships.

G1: Chain of Command Tree

Number of Nodes = 514, Diameter = 5

(a) G1

G2: Geometric Graph Over Given Deployment

Radius of Connectivity = 14.93 km, Diameter = 15, Broadcast Time = 18

(b) G2

G2: Random Geometric Graph

Radius of Connectivity = 14.88 km, Diameter = 16, Broadcast Time = 51

(c) G2 under S2

Figure 1: Examples of chain of command networks. (a) a chain of command social network; (b) historical deployment of the chain of command shown in (a); (c) one random deployment of the chain of command shown in (a) under scenario S2.

5.1 Chain-of-Command

Figure 1(a) depicts the chain of command hierarchy within a representative brigade reporting structure in the US Army. Commands move from the highest ranked (root, shown in pink) node to the lower ranked nodes in the tree until they reach the lowest ranked (leaf, shown in blue) nodes. Though not depicted in Figure 1(a), each node in G1 occupies a physical location in space. In Figure 1(b), we show the geometric graph G2 constructed from these physical locations by adding a communication edge between two nodes if they lie within a prescribed radio transmission range of each other (i.e., the critical radius of connectivity r(n) outlined above). The coordinates of the nodes in G1 are specified according to a historical deployment scenario over a 124 × 148 km² area. G2 is not an RGG, but only a geometric graph with a communication radius of 14.93 km.
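The construction of G2 described above can be sketched as follows: two nodes are joined when their Euclidean distance is at most $r$, and hop counts are then ordinary BFS distances. We read "critical radius of connectivity" for a fixed set of locations as the smallest $r$ that makes the geometric graph connected, which equals the longest edge of a Euclidean minimum spanning tree. That reading, the planar coordinates (e.g., in km), and all function names are our assumptions.

```python
import math
from collections import deque


def geometric_graph(points, r):
    """Join nodes i and j when their Euclidean distance is at most r."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def connectivity_radius(points):
    """Smallest r making geometric_graph(points, r) connected: the longest
    edge of a Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    best = [math.inf] * n  # cheapest known link from each node to the tree
    best[0] = 0.0
    in_tree = [False] * n
    longest = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return longest


def eccentricity(adj, source):
    """Largest hop distance from source (assumes adj is connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


pts = [(0, 0), (1, 0), (3, 0), (3, 4)]
r = connectivity_radius(pts)  # 4.0
g2 = geometric_graph(pts, r)
print(r, eccentricity(g2, 0), max(eccentricity(g2, v) for v in g2))
```

The diameter of G2 is the largest eccentricity over all nodes; the quadratic cost is unproblematic at the scale of a few hundred nodes.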

The composite graph G∗ = (G1, G2, R) that combines Figures 1(a) and 1(b), along with the identity mapping R, is a realistic composite network structure for a military or disaster relief deployment. In this instance, the broadcast time is 18, even though the eccentricity in G2 of the root node of G1 is only 9. Thus, the constraints imposed on the information flow by the chain of command require a message to travel through twice as many hops as the deployment alone would require. A path that produces the broadcast time is highlighted in red in Figures 1(a) and 1(b).

To put this broadcast time of 18 in context, we simulated three different randomized scenarios, each of which could produce G∗ as one possible outcome:

S1 Instead of R being the identity mapping, R is a random permutation.

S2 Instead of G2 being a geometric graph over the actual coordinates of deployment, each node in G1 was assigned a random 2D coordinate drawn from a square region with the same area as the deployment region used in S1. Therefore, G2 is a random geometric graph (RGG). Fig. 1(c) shows an example of such a deployment.

S3 Instead of G2 being a geometric graph over the actual coordinates of deployment, the coordinates were generated according to a random model that is a function of the chain of command tree G1, with R as the identity mapping. Details of the model are given below.
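Scenarios S1 and S2 amount to two simple sampling steps, sketched below under our own assumptions. S1 shuffles which G2 location each G1 node occupies, and S2 draws coordinates uniformly from a square whose side is $\sqrt{124 \times 148} \approx 135.5$ km, so that its area matches the deployment region. The function names are hypothetical; the resulting coordinates would then be turned into a geometric graph as in Sec. 5.1.

```python
import math
import random


def scenario_s1(nodes, rng):
    """S1: replace the identity mapping R with a uniformly random permutation."""
    images = list(nodes)
    rng.shuffle(images)
    return dict(zip(nodes, images))


def scenario_s2(nodes, width_km, height_km, rng):
    """S2: i.i.d. uniform coordinates in a square whose area equals
    width_km * height_km, the area of the actual deployment region."""
    side = math.sqrt(width_km * height_km)
    return {v: (rng.uniform(0.0, side), rng.uniform(0.0, side)) for v in nodes}


rng = random.Random(42)
nodes = list(range(514))
R = scenario_s1(nodes, rng)                 # node of G1 -> location it occupies
coords = scenario_s2(nodes, 124, 148, rng)  # node of G1 -> (x, y) in km
```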

In the actual deployment, note that the lowest ranked nodes are collocated in groups of four; hence Fig. 1(b) appears sparser than Fig. 1(c). Moreover, the broadcast time jumps from 18 to 51 for this particular RGG. We attribute this to the strong correlation, in the real deployment, between the location of a node and its rank in the command hierarchy (even though the maximum cstretch is as high as 18), which does not exist in random deployments. Inspired by this observation, we constructed a correlated random deployment model for S3.

Details of Deployment Model for S3: Our model places each child on an equi-spaced, but randomly oriented, ring around its parent, with a random jitter applied in both the horizontal and vertical directions. This process is applied recursively down the chain of command tree, which we assume has height $h$. Let $v_i$ be the node in G1 at distance $h_i$ from the root node, with $n_i$ children. Then the locations of $v_i$'s children are determined as follows:

1. Find the $n_i$ roots of unity $\omega_1, \ldots, \omega_{n_i}$ and associate one with each of the $n_i$ children.

2. Draw a uniform random variable $u \in [0, 1]$.

3. Set the target distance from parent to child to be $\rho_i = a(h - h_i)^2$, where $a$ is a parameter determined from analysis of the actual deployment data. [In our case, $a = 1.85$.]

4. For each $j \in \{1, \ldots, n_i\}$, set the target location $c_j = \rho_i \cdot e^{2\pi \mathrm{i} u} \cdot \omega_j$, and draw two random coordinates $x_j$ and $y_j$ from normal distributions with means $\Re(c_j)$ and $\Im(c_j)$, respectively, and standard deviation $b\rho_{h_i}$. Here $b$ is a parameter determined from the data (we assume a model with a constant coefficient of variation, $b = 0.293$), and $\rho_{h_i}$ is the mean distance between parent and child at height $h_i$.

5. Return $(x_{v_i}, y_{v_i}) + (x_j, y_j)$ as the location of the $j$-th child.
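The five steps above can be sketched as a breadth-first pass over the tree. This is an illustrative implementation, not the authors' code: it places the root at the origin, reads $h_i$ as the depth of $v_i$, and takes $\rho_{h_i}$ in Step 4 to equal the $\rho_i$ of Step 3; all three are our assumptions. The defaults $a = 1.85$ and $b = 0.293$ are the values reported above.

```python
import cmath
import math
import random
from collections import deque


def deploy_s3(children, root, a=1.85, b=0.293, rng=None):
    """Sketch of the S3 correlated deployment model.

    children maps each node of G1 to the list of its children.
    Returns a dict mapping each node to an (x, y) location.
    Assumptions (ours): the root sits at (0, 0), h_i is the depth of v_i,
    and rho_{h_i} in Step 4 equals rho_i from Step 3.
    """
    if rng is None:
        rng = random.Random()

    # Depth h_i of every node and the tree height h.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for c in children.get(v, []):
            depth[c] = depth[v] + 1
            queue.append(c)
    h = max(depth.values())

    pos = {root: (0.0, 0.0)}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        kids = children.get(v, [])
        n_i = len(kids)
        if n_i == 0:
            continue
        # Step 1: the n_i roots of unity, one per child.
        omegas = [cmath.exp(2j * math.pi * k / n_i) for k in range(n_i)]
        # Step 2: one uniform draw rotates the whole ring at random.
        u = rng.random()
        # Step 3: parent-to-child distance shrinks quadratically with depth.
        rho = a * (h - depth[v]) ** 2
        px, py = pos[v]
        for child, omega in zip(kids, omegas):
            # Step 4: target point on the ring plus Gaussian jitter with
            # standard deviation b * rho (constant coefficient of variation).
            c = rho * cmath.exp(2j * math.pi * u) * omega
            dx = rng.gauss(c.real, b * rho)
            dy = rng.gauss(c.imag, b * rho)
            # Step 5: the child's location is the parent's plus the offset.
            pos[child] = (px + dx, py + dy)
            queue.append(child)
    return pos


tree = {0: [1, 2, 3], 1: [4, 5], 2: [6]}
layout = deploy_s3(tree, 0, rng=random.Random(7))
```

Setting `b=0.0` removes the jitter, which makes it easy to verify that each child sits exactly $a(h - h_i)^2$ away from its parent on an equi-spaced ring.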

Distribution of Broadcast Time (density vs. broadcast time; deployment types: Given, S1, S2, S3; Connected = 39.98 %, N = 20000, Bandwidth = 0.334)

(a)

FOAF: Broadcast Time with Bounds (broadcast time vs. G1: height of spanning tree)

(b)

Figure 2: Composite Broadcast time: (a) comparison of CBtime for various deployments of the chain ofcommand shown in Figure 1(a); (b) CBtime in the composite graph specified in Figure 3. Various upper andlower bounds derived in Section 4 are illustrated.

Fig. 2(a) plots the simulated distribution of broadcast time for each of the aforementioned scenarios. That the red vertical line (corresponding to G∗) falls within the distribution of S3 (more precisely, at the 92nd percentile) suggests that our correlated deployment model is a useful one. Moreover, the other two scenarios, both of which are agnostic to the structure of the chain of command tree, performed comparatively poorly. Empirically, the probability of obtaining a broadcast time as low as that of G∗ via either S1 or S2 appears to be negligible. This illustrates the potential efficiency pitfalls inherent in composite networks.
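Statements such as "the 92nd percentile" and "the probability of obtaining a broadcast time as low as that of G∗" depend on how the empirical distribution is summarized. The sketch below uses one common convention (the percentage of simulated values strictly below the observed one, and the fraction at or below it); the convention, the function names, and the toy values are our assumptions, and other percentile definitions differ slightly at ties.

```python
def percentile_rank(samples, value):
    """Percentage of simulated broadcast times strictly below `value`."""
    return 100.0 * sum(s < value for s in samples) / len(samples)


def lower_tail(samples, value):
    """Empirical probability that a simulated broadcast time is <= `value`."""
    return sum(s <= value for s in samples) / len(samples)


sims = [10, 12, 15, 18, 20, 22, 25, 30, 35, 40]  # toy values, not paper data
print(percentile_rank(sims, 18), lower_tail(sims, 18))  # 30.0 0.4
```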

5.2 Friend-Of-A-Friend (FOAF)

Now we consider a small social network data set that was extracted from the Semantic Web Billion Triples Challenge (BTC) program. The data contains unique identifiers and indicates the existence of friendship relations between pairs of users, as per the friend of a friend (FOAF) ontology. It also contains the geographic coordinates of users.

One can imagine the users in the data set communicating with their friends over some underlying communication network. In this scenario, the communication would typically happen over the wired Internet that connects users on the map; however, we used the location data to study the composite stretch of the FOAF social network on a geographically distributed multi-hop network, assuming a geometric graph model as described in Sec. 5.1. In particular, we place a node at each location in the data set and construct a graph using a transmission radius that is large enough to connect most nodes in the network (akin to the notion of critical radius in the case of a random geometric graph). In Figure 3, we show a 237-node social network (G1), alongside the geometric graph imposed over the same set of vertices using the connectivity radius r(n) = 5.1 degrees of latitude/longitude.

Since G1 is a rich social network, every node can act as a source of information, which can flow along random spanning trees of G1. Hence we take each node of G1 in turn and make it the root of a random spanning tree. We then measure cbtime for each root node and plot the results in Fig. 2(b) (black, jittered hollow circles). The dashed black line shows the average cbtime as a function of the height of the spanning tree. That this relationship is approximately linear accords with Corollary 4.3.
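The text does not specify how the random spanning trees were drawn. As one possibility, the sketch below uses Wilson's algorithm (loop-erased random walks), which samples a spanning tree uniformly at random from all spanning trees of a connected graph; this choice is our substitution, not necessarily the authors' method, and the function names are hypothetical. The height computed at the end is the quantity on the horizontal axis of Fig. 2(b).

```python
import random


def wilson_spanning_tree(adj, root, rng):
    """Uniform random spanning tree of a connected graph (Wilson's algorithm).

    adj maps each node to a list of neighbours. Returns a parent map in
    which parent[root] is None.
    """
    parent = {root: None}
    in_tree = {root}
    step = {}
    for start in adj:
        # Random walk from `start` until it hits the tree; overwriting
        # step[u] on each revisit erases loops implicitly.
        u = start
        while u not in in_tree:
            step[u] = rng.choice(adj[u])
            u = step[u]
        # Attach the loop-erased path to the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            parent[u] = step[u]
            u = step[u]
    return parent


def tree_height(parent, root):
    """Number of edges on the longest root-to-leaf path."""
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    height, frontier = 0, [root]
    while True:
        frontier = [c for v in frontier for c in children[v]]
        if not frontier:
            return height
        height += 1


rng = random.Random(0)
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
heights = [tree_height(wilson_spanning_tree(cycle, r, rng), r) for r in cycle]
print(heights)
```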

On the same figure (Fig. 2(b)), we plot the upper and lower bound formulas for E[cbtime] that we derived in Sec. 4. The blue lines indicate the bounds of Corollary 4.3, while the green line shows the weaker bound of Corollary 4.1 and the red line shows the trivial diameter bound. The jittered blue circles indicate the tightest upper bound for each spanning tree, derived from Observation 4.2. We observe that the lower bound is reasonably tight, whereas the upper bounds based on Theorem 4.1 and average eccentricity are looser, but much better than the trivial diameter upper bound. The upper bound obtained from Observation 4.2 is tighter, since it takes the local structure of the spanning tree into account rather than simply the global maximum outdegree (46 in this case). In summary, these simulations support our analytical results.

6. CONCLUSION AND DISCUSSION

In this paper, we presented an analytical modeling framework (both models and metrics) for studying the composite stretch (or elongation) suffered by information on a multi-hop communication network when it is constrained by social network relationships. We derived scaling laws (in the expected-value sense) for composite stretch under random embeddings of common social network structures, such as trees, onto a variety of multi-hop communication network models, ranging from simple linear networks to random geometric graphs.

We also analyzed the constrained broadcast time metric, which measures the time taken to broadcast (or gossip) information along the edges of a social network to all nodes in that network while being constrained by the underlying communication network structure. We derived analytical bounds for composite broadcast time and validated them by simulations using a friendship social network data set. We also showed, using simulations based on a historical military deployment data set, that the stretch of information flow in a chain-of-command network is non-optimal, yet far superior to that of a random deployment scenario.

G1: FOAF Social Network

Number of Nodes = 237, Diameter = 9

G2: FOAF: Geometric Graph Over Location

Radius of Connectivity = 5.1 units, Diameter = 7, Broadcast Time = 30

Figure 3: Social Network and Random Geometric Graph (constructed relative to the specified locations)

This is the first step toward an ambitious research program directed toward studying composite networks. Other composite metrics of interest include "composite load," which measures the number of times a piece of information needs to traverse a particular edge in G2. Additionally, other interesting variants of the constrained stretch metric exist. Suppose one is only allowed to use communication links between friends (since these are supposed to be trusted). This will result in a routing that is likely to have a higher stretch, since shorter communication paths between friends (through non-friends) are disallowed. How can one characterize this highly constrained stretch? Such insights should help us design better communication networks (or facilitate intelligent deployment) suited to the particular demands of the overlaid social networks of users.
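Composite load is only introduced informally here. A minimal sketch, assuming that one message crosses each social edge (u, v) once and is routed along a single BFS shortest path in G2 between R(u) and R(v) (with ties broken by BFS order), counts how many times each G2 edge is traversed. The routing rule and the function names are our assumptions, and G2 is assumed to be connected.

```python
from collections import Counter, deque


def shortest_path(adj, s, t):
    """One BFS shortest path from s to t in G2, as a list of vertices."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    path = [t]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def composite_load(social_edges, adj, R):
    """Count traversals of each G2 edge when one message crosses every
    social edge (u, v) along a BFS shortest path from R[u] to R[v]."""
    load = Counter()
    for u, v in social_edges:
        path = shortest_path(adj, R[u], R[v])
        for a, b in zip(path, path[1:]):
            load[frozenset((a, b))] += 1  # undirected G2 edge
    return load


g2 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
R = {"a": 0, "b": 3, "c": 2}
print(composite_load([("a", "b"), ("a", "c")], g2, R))
```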

7. REFERENCES

[1] Y. Bartal. On approximating arbitrary metrices by tree metrics. In Proceedings of STOC ’98.

[2] P. Basu, W. Ke, and T. D. C. Little. Dynamic task-based anycasting in mobile ad hoc networks. Mob. Netw. Appl., 8:593–612, October 2003.

[3] S. Bokhari. On the Mapping Problem. IEEE Transactions on Computers, 30(3), 1981.

[4] M. Bradonjic, R. Elsasser, T. Friedrich, T. Sauerwald, and A. Stauffer. Efficient broadcast on random geometric graphs. In Proceedings of SODA ’10, pages 1412–1421, 2010.

[5] S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, and S. Havlin. Catastrophic cascade of failures in interdependent networks. Nature, 464:1025–1028.

[6] P. Gupta and P. R. Kumar. Critical power for asymptotic connectivity. In Proceedings of the 37th IEEE Conference on Decision and Control, volume 1, pages 1106–1110. IEEE, 1998.

[7] G. Gutin and A. P. Punnen. The traveling salesman problem and its variations, volume 12. Kluwer Academic Pub, 2002.

[8] C. C. Hui and S. T. Chanson. Allocating Task Interaction Graphs to Processors in Heterogeneous Networks. IEEE Transactions on Parallel and Distributed Systems, 8(9), September 1997.

[9] P. Indyk and J. Matousek. Low-Distortion Embeddings of Finite Metric Spaces. In Handbook of Discrete and Computational Geometry, pages 177–196. CRC Press, 2004.

[10] R. Kleinberg. Geographic Routing Using Hyperbolic Space. In Proceedings of IEEE INFOCOM.

[11] M. Kurant and P. Thiran. Layered Complex Networks. Physical Review Letters, 96(138701), 2006.

[12] E. A. Leicht and R. M. D’Souza. Percolation on interacting networks. arXiv, 2009. http://arxiv.org/abs/0907.0894.

[13] E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials, 7(2):72–93, 2005.

[14] R. Monien and H. Sudborough. Embedding one Interconnection Network in Another. Computing Suppl., 7:257–282, 1990.

[15] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content addressable network. In Proceedings of SIGCOMM ’01, pages 161–172, 2001.

[16] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of SIGCOMM ’01, pages 149–160, 2001.

[17] E. Weisstein. Hypercube Line Picking, 2010. http://mathworld.wolfram.com/HypercubeLinePicking.html.
