Learning and Predicting Dynamic Networked Behavior with Graphical Multiagent Models

    Quang Duong† Michael P. Wellman† Satinder Singh† Michael Kearns∗

{qduong,wellman,baveja}@umich.edu [email protected]
†Computer Science and Engineering, University of Michigan

    ∗Computer and Information Science, University of Pennsylvania

ABSTRACT

Factored models of multiagent systems address the complexity of joint behavior by exploiting locality in agent interactions. History-dependent graphical multiagent models (hGMMs) further capture dynamics by conditioning behavior on history. The challenges of modeling real human behavior motivated us to extend the hGMM representation by distinguishing two types of agent interactions. This distinction opens the opportunity for learning dependence networks that differ from the given graphical structures representing observed agent interactions. We propose a greedy algorithm for learning hGMMs from time-series data, inducing both graphical structure and parameters. Our empirical study employs human-subject experiment data for a dynamic consensus scenario, where agents on a network attempt to reach a unanimous vote. We show that learned hGMMs directly expressing joint behavior outperform alternatives in predicting dynamic human voting behavior and end-game vote results. Analysis of the learned graphical structures reveals patterns of action dependence not directly reflected in the original experiment networks.

Categories and Subject Descriptors

I.2 [Artificial Intelligence]: Multiagent Systems

General Terms

Experimentation, Algorithms, Human Factors

Keywords

graphical models, dynamic behavior, structure learning

1. INTRODUCTION

Modeling dynamic behavior of multiple agents presents inherent scaling problems due to the exponential size of any enumerated representation of joint activity. Even if agents make decisions independently, conditioning actions on each other's prior decisions or on commonly observed history induces interdependencies over time. To address this complexity problem, researchers have exploited the localized effects of agent decisions by employing graphical models of multiagent behavior. This approach has produced several (related) graphical representations capturing various facets of multiagent interaction [9, 11, 6, 5, 3]. History-dependent graphical multiagent models (hGMMs) [4] express multiagent behavior on an undirected graph, and capture dynamic relations by conditioning actions on history.

Appears in: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Conitzer, Winikoff, Padgham, and van der Hoek (eds.), 4-8 June 2012, Valencia, Spain. Copyright © 2012, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Prior work on hGMMs presumes a fixed graph structure defined by the modeler [4]. However, it is not always apparent how to choose the most salient inter-agent dependencies for accurate and tractable modeling. We seek methods for inducing hGMM structures from observational data about dynamic multiagent scenarios. In the process, we also extend the flexibility of hGMMs by allowing distinct dependence structures for within-time and across-time probabilistic relationships.

We empirically evaluate our techniques with data from laboratory experiments on dynamic consensus [8]. Human subjects were arranged on a network, specifying for each subject (also called agent) the set of others whose current choices are observable. The network associated with each experiment provides a basis for expecting that joint agent behavior may exhibit some locality that we can exploit in a graphical model for prediction.

We stress that the graph structure of the optimal predictive model need not mirror the experiment network of the voting scenario; moreover, the complex experiment network instances we study render computation on the corresponding hGMMs intractable. Therefore, we attempt to learn the graphical structure and parameters of an hGMM that can effectively and compactly capture joint dynamic behavior. Using human-subject data, we evaluate the learned models' predictions of voting behavior and compare their performance with those of several baseline multiagent models. We generally find that models expressing joint behavior outperform the alternatives, including models originally proposed by the authors of the dynamic consensus experiments, in predicting voting dynamics. The joint behavior model provides comparable predictions on the rate of reaching consensus, and superior predictions of which consensus is reached. We further examine the learned hGMM graphical structures to gain insight into the dependencies driving voting behavior, as well as the network structure's effect on collective action.

Section 2 provides background information on hGMMs and introduces our extension to the modeling framework. Section 3 describes the dynamic consensus experiments. We present a variety of candidate model forms in Section 4. Section 5 provides the motivation and details of our greedy model-learning algorithm, which simultaneously estimates a model's parameters and constructs its interaction graph. Our empirical study in Section 6 compares the different models across three experiment settings, and examines the learned graph structures against the original experiment networks.

Figure 1: An example hGMM over three time periods. Undirected edges capture correlation among agents at a point in time. Directed edges (shown here only for agent 1) denote conditioning of an agent's action on others' past actions.

2. HISTORY-DEPENDENT GMMS

We model the behavior of n agents over a time interval divided into discrete periods, [0, . . . , T]. At time period t, agent i ∈ {1, . . . , n} chooses an action a_i^t from its action domain, A_i, according to its strategy, σ_i. Agents can observe others' and their own past actions up to time t, as captured in history H^t = {H_1^t, . . . , H_n^t}, where H_i^t denotes the sequence of actions agent i has taken by t. Limited memory capacity or other computational constraints restrict an agent to focus attention on a subset of the history H_i^t considered in its probabilistic choice of next action: a_i^t ∼ σ_i(H_i^t).

A history-dependent graphical multiagent model (hGMM) [4], hG = (V, E, A, π), is a graphical model with graph elements V, a set of vertices representing the n agents, and E, edges capturing pairwise interactions between them. Component A = (A_1, . . . , A_n) represents the action domains, and π = (π_1, . . . , π_n) the potential functions for each agent. The graph defines a neighborhood for each agent i: N_i = {j | (i, j) ∈ E} ∪ {i}, including i and its neighbors N_{−i} = N_i \ {i}.

The hGMM representation captures agent interactions in dynamic scenarios by conditioning joint agent behavior on an abstracted history of actions H^t. The history available to agent i, H^t_{N_i}, is the subset of H^t pertaining to agents in N_i. Each agent i is associated with a potential function π_i(a^t_{N_i} | H^t_{N_i}): ∏_{j∈N_i} A_j → R^+. The potential of a local action configuration specifies its likelihood of being included in the global outcome, conditional on history. Specifically, the joint distribution of the system's actions taken at time t is the normalized product of neighbor potentials [2, 4, 7]:

Pr(a^t | H^t) = ∏_i π_i(a^t_{N_i} | H^t_{N_i}) / Z.   (1)

The complexity of computing the normalization factor Z in (1) is exponential in the number of agents, and thus precludes exact inference and learning in large models. We approximate Z using the belief propagation method [1], which has shown good results with reasonable runtime on sparse cyclic graphical structures.
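To make the normalization concrete, the following minimal sketch (our own illustration, not the authors' implementation; toy_potential and the 4-agent graph are invented) computes the joint distribution of equation (1) by brute force for a binary-action hGMM, which makes the exponential cost of Z explicit:

```python
from itertools import product

n = 4
actions = (0, 1)                                   # binary action domain A_i
neighbors = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3}}  # N_i (includes i)

def toy_potential(i, local_config, history):
    # Stand-in for pi_i(a_{N_i} | H_{N_i}): rewards agreement with the most
    # recent joint action. Any positive-valued function would do here.
    last = history[-1]
    agree = sum(1 for j, a in local_config.items() if a == last[j])
    return 1.0 + agree

def joint_distribution(history):
    # Enumerating all |A|^n joint actions is the exponential cost that forces
    # approximation of Z (e.g., belief propagation) in larger models.
    weights = {}
    for joint in product(actions, repeat=n):
        w = 1.0
        for i in range(n):
            w *= toy_potential(i, {j: joint[j] for j in neighbors[i]}, history)
        weights[joint] = w
    Z = sum(weights.values())                      # normalization factor of (1)
    return {joint: w / Z for joint, w in weights.items()}

dist = joint_distribution(history=[(0, 0, 1, 1)])
print(sum(dist.values()))                          # 1.0: a proper distribution
```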

We extend the original hGMM representation by distinguishing between within-time and across-time dependencies, as depicted in Figure 1. Formally, we introduce a conditioning set Γ_i for each i, denoting the set of agents whose histories condition this agent's potential function: π_i(a^t_{N_i} | H^t_{Γ_i}). The neighborhood N_i in this extension governs only the within-time probabilistic dependencies of node i. With respect to this extended model, the original hGMM [4] corresponds to the special case where Γ_i = N_i. The joint distribution of actions at time t can be rewritten as:

Pr(a^t | H^t) = ∏_i π_i(a^t_{N_i} | H^t_{Γ_i}) / Z.   (2)

3. DYNAMIC CONSENSUS

We evaluate our modeling framework with human-subject data from a dynamic consensus game [8]. Each agent in this game chooses to vote either blue (0) or red (1), and can change votes at any time. Agents are connected in a network, such that agent i can observe the votes of those in its observation neighborhood N^O_i. The scenario terminates when: (i) agents converge on action a ∈ {0, 1}, in which case agent i receives reward r_i(a) > 0, or (ii) they cannot agree by the time limit T, in which case rewards are zero. Figure 2 illustrates the dynamic behavior of an example voting experiment network.

Agents have varying preferences for the possible consensus outcomes, reflected in their reward functions. Since nobody receives any reward without a unanimous vote, agents must balance effort to promote their own preferred outcomes against the common goal of reaching consensus. Another important feature of the dynamic consensus game is that agent i observes the votes of only those in its observation neighborhood N^O_i; all an agent is shown of the graph is the degree of each observation neighbor and the observation edges among them. This raises the question of how agents take into account their neighbors' voting patterns and their partial knowledge of the experiment network structure.

A series of human-subject experiments was conducted to study how people behave in 81 different instances of the voting game [8]. The experimenters varied reward preference assignments and experiment network structure across these instances, and thus were able to collect data about these factors' effects on the consensus voting results and the strategies employed. Figure 2 exhibits a run for the experimental network labeled power22, discussed below. Study goals included developing models to predict a given scenario's voting outcome and, if a consensus is reached, its convergence time. This problem also served as the foundation for analysis of adaptive strategies and theoretical constraints on convergence [10].

Figure 2: Time snapshots of a lab experiment run where the densely connected minority group that preferred red exerted strong influence on the blue-leaning majority. The minority group eventually succeeded in converting all the initial (unfilled) blue votes to (filled) red votes.

4. MODELING DYNAMIC VOTING

We present four model forms designed to capture voting behavior dynamics in the dynamic consensus experiments. All are expressible as hGMMs. Only the first exploits the flexibility of hGMMs to express dependence of actions within a neighborhood given history (2); hence we refer to it as the joint behavior consensus model (JCM).

The other three forms model agent behaviors individually: for each agent we specify a probabilistic strategy σ_i(H^t) = Pr(a_i^t | H^t_{Γ_i}). Such a formulation captures agent interactions through the conditioning of individual behavior on observed history. The agents' actions are probabilistically dependent, but conditionally independent given this common history, yielding the joint distribution:

Pr(a^t | H^t) = ∏_i σ_i(H^t).   (3)

We refer to a dynamic multiagent model expressible by (3) as an individual behavior hGMM (IBMM). Conditional independence given history is a compelling assumption for autonomous agents. Indeed, independent choice may even be considered definitional for autonomy. In practice, however, it is often infeasible to condition on the entire history due to finite memory and computational power, and the assumption may not hold with respect to partial history. History abstraction generally introduces correlations among agents' actions, even if they are independently generated on full history [4]. Nevertheless, assuming conditional independence between agents' actions given history exponentially reduces the model's complexity, or more specifically, the representational complexity of the joint probability distribution over the system's actions.

The first of the three IBMMs we present is designed as an independent behavior version of the JCM; thus, we call it simply the individual behavior consensus model (ICM). The remaining two models are based on proposals and observations from the original experimental analysis [8], and are labeled the proportional response model (PRM) and sticky proportional response model (sPRM), respectively.

4.1 Joint Behavior Consensus Model

Based on observations from the original experiment analysis, we seek to formulate a potential function for the JCM that captures the impact of the past collective choices of i's neighborhood, i's own past voting patterns, and its relative preference for each action.

First, we consider how to summarize a history H^t_{Γ_i} of length h relevant to agent i. Let the indicator I(a_i, a_k) = 1 if a_i = a_k, and 0 otherwise. We define f(a_i, H^t_{Γ_i}) as the frequency with which action a_i is chosen by other agents in i's conditioning set, which by definition contains the nodes whose past actions influence how i chooses its action in the present:

f(a_i, H^t_{Γ_i}) = ( ∑_{k∈Γ_i\{i}} ∑_{τ=t−h}^{t−1} I(a_i, a_k^τ) + ε ) / ( h |Γ_i \ {i}| ).   (4)

We add ε = 0.01 to the numerator to ensure that the frequency term does not vanish when a_i does not appear in H^t_{Γ_i}.
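As a concrete reading of (4), here is a small sketch (our own helper; the history layout of one vote dict per period is an assumption):

```python
EPSILON = 0.01   # the epsilon of equation (4)

def frequency(a_i, i, conditioning_set, history, h):
    """f(a_i, H_Gamma_i): smoothed frequency with which agents other than i
    in i's conditioning set chose a_i over the last h periods."""
    others = [k for k in conditioning_set if k != i]
    window = history[-h:]                          # periods t-h .. t-1
    count = sum(1 for period in window for k in others if period[k] == a_i)
    return (count + EPSILON) / (h * len(others))

# Agent 0 conditions on {0, 1, 2}: how often did agents 1 and 2 vote red (1)
# over the last two periods?
hist = [{0: 0, 1: 1, 2: 1}, {0: 0, 1: 1, 2: 0}]
print(frequency(1, 0, {0, 1, 2}, hist, h=2))       # (3 + 0.01) / (2 * 2)
```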

Second, we capture agent i's own update history in an inertia term,

I(a_i, H_i^t) = {t − max τ …

4.2 Individual Behavior Consensus Model

… each i. The probabilistic ICM behavior is then given by:

Pr(a_i | H^t_{Γ_i}) = (1/Z_i) r_i(a_i) f(a_i, H^t_{Γ_i})^γ I(a_i, H_i^t)^β.

The normalization ranges only over the single-agent actions a_i ∈ A_i; thus Z_i is easy to compute for this model.
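A sketch of the ICM choice rule follows. Because the full definition of the inertia term is cut off in this transcript, it is passed in as an opaque function, and every stand-in value in the example call is hypothetical:

```python
def icm_distribution(actions, reward, freq, inertia, gamma, beta):
    """Pr(a_i | H) proportional to r_i(a_i) * f(a_i, H)^gamma * I(a_i, H_i)^beta.
    Z_i sums over agent i's own actions only, so normalization is cheap."""
    weights = {a: reward(a) * freq(a) ** gamma * inertia(a) ** beta
               for a in actions}
    z_i = sum(weights.values())
    return {a: w / z_i for a, w in weights.items()}

# Illustrative call with made-up stand-ins for the component functions.
dist = icm_distribution(
    actions=(0, 1),
    reward=lambda a: 1.5 if a == 1 else 1.0,       # r_i: prefers red (1)
    freq=lambda a: 0.6 if a == 1 else 0.4,         # f from equation (4)
    inertia=lambda a: 2.0 if a == 1 else 1.0,      # stand-in inertia term
    gamma=1.0, beta=0.5)
print(dist)
```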

4.3 Proportional Response Model

We also consider for comparison the proportional response model, PRM, suggested in the original dynamic consensus study [8] as a reasonably accurate predictor of their experiments' final outcomes. PRM specifies that voter i chooses action a_i at time t with probability proportional to r_i(a_i) and g(a_i, a^{t−1}_{Γ_i}), the number of i's neighbors who chose a_i in the last time period:

Pr(a_i | H^t_{Γ_i}) ∝ r_i(a_i) g(a_i, a^{t−1}_{Γ_i}).

4.4 Sticky Proportional Response Model

PRM does not capture the subjects' tendency to start with their preferred option, reconsidering their votes only after collecting additional information about their neighbors over several time periods [8]. Therefore, we introduce the sticky proportional response model, sPRM, which contains a parameter ρ ∈ [−1, 1] reflecting an agent's stubbornness in maintaining its preferred option, regardless of observed neighbors' past choices. Intuitively, an agent's inherent bias toward its preferred option decays proportionally until there is no bias:

Pr(a_i | H^t_{Γ_i}) ∝ r_i(a_i) g(a_i, a^{t−1}_{Γ_i}) (1 + I^max_{a_i} ρ^t),

where I^max_{a_i} = 1 if a_i = arg max_a r_i(a), and I^max_{a_i} = 0 otherwise.
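The following sketch samples a vote under sPRM, with PRM recovered by setting ρ = 0. The decay form ρ^t follows our reading of the garbled formula above, and the fallback when all weights are zero is our own assumption:

```python
import random

def sprm_sample(actions, reward, prev_votes, neighbors, preferred, rho, t):
    """Sample a vote with weight r_i(a) * g(a) * (1 + rho^t) on the
    preferred action; rho = 0 recovers plain PRM."""
    def g(a):                                      # neighbors choosing a at t-1
        return sum(1 for j in neighbors if prev_votes[j] == a)
    weights = [reward(a) * g(a) * (1 + (rho ** t if a == preferred else 0))
               for a in actions]
    if sum(weights) == 0:                          # degenerate corner case
        return preferred
    return random.choices(actions, weights=weights)[0]

# Agent prefers red (1); two of three observed neighbors currently vote red.
print(sprm_sample(actions=(0, 1), reward=lambda a: 1.5 if a == 1 else 1.0,
                  prev_votes={1: 1, 2: 1, 3: 0}, neighbors=(1, 2, 3),
                  preferred=1, rho=0.5, t=2))
```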

    5. LEARNING

5.1 Parameter Learning

We first address the problem of learning the parameters of an hGMM hG, given the underlying graphical structure and data in the form of a set of joint actions for m time steps, X = (a^0, . . . , a^m). For ease of exposition, let θ denote the set of all parameters that define the hGMM's potential functions. We seek θ maximizing the log-likelihood of X:

L_hG(X; θ) = ∑_{k=0}^{m−h} ln Pr_hG(a^{k+h} | (a^k, . . . , a^{k+h−1}); θ).

We use gradient ascent to update the parameters: θ ← θ + λ∇θ, where the gradient is ∇θ = ∂L_hG(X; θ)/∂θ and λ is the learning rate, stopping when the gradient falls below some threshold. We employ this same technique to learn the parameters of all model forms in our study, except for the PRM, which contains no parameters.
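The update loop itself is generic; the sketch below (ours) uses a finite-difference gradient in place of the analytic ∂L/∂θ, and assumes a caller-supplied log_likelihood function:

```python
def learn_parameters(log_likelihood, theta, lr=0.01, tol=1e-5, eps=1e-6):
    """Ascend theta <- theta + lr * grad(L) until the gradient is small."""
    while True:
        grad = []
        for d in range(len(theta)):
            bumped = list(theta)
            bumped[d] += eps                       # forward difference in dim d
            grad.append((log_likelihood(bumped) - log_likelihood(theta)) / eps)
        if max(abs(g) for g in grad) < tol:
            return theta
        theta = [x + lr * g for x, g in zip(theta, grad)]

# Toy check: maximizing -(beta - 1)^2 - (gamma - 2)^2 recovers (1, 2).
print(learn_parameters(lambda th: -(th[0] - 1) ** 2 - (th[1] - 2) ** 2,
                       theta=[0.0, 0.0]))
```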

5.2 Structure Learning

Each of the consensus voting experiments involves 36 human subjects. The largest neighborhood size in these games ranges from 16 to 20, rendering computation of the exact data likelihood for a joint behavior model of this complexity (required for the parameter learning described above) infeasible. Preliminary trials with the belief propagation approximation algorithm [1] on these models, where N = Γ = N^O, indicated that this computational saving would still be insufficient for effective learning. Thus, we need to employ models with simpler graphs in order to take advantage of hGMMs' expressiveness in representing joint behavior. Toward this end, we developed a structure learning algorithm that produces graphs for hGMMs within specified complexity constraints.

Though dictated by computational necessity, automated structure learning has additional advantages. First, there is no inherent reason that the observation graph should constitute the ideal structure for a predictive graphical model of agent behavior. In other words, the most effective N is not necessarily the same as N^O. Since actual agent behavior is naturally conditioned on its observable history, we do assume that the conditioning set coincides with the observation neighborhood, Γ = N^O. Nevertheless, once we abstract the history representation, it may well turn out that non-local historical activity provides more useful predictive information. If so, the structure of the learned graph that defines each i's within-time neighborhood may provide interesting insights into the agents' networked behavior.

Our structure learning algorithm addresses the problem of learning N_i for every i, taking Γ_i = N^O_i as fixed. Note that the IBMMs described in Section 4 impose N_i = {i} for each i, and thus do not need to learn the within-time graphs. Starting from an empty graph, we greedily add edges to improve the log-likelihood of the training data, subject to a constraint that the maximum node degree not exceed a specified bound d_max. Since the set of edges E is the only structural model feature that changes during our search, we use L_E(X; θ) to abbreviate L_hG(X; θ) as induced by the hGMM hG = (V, E, A, Γ, π). We have found that the optimal setting of our parameters θ = (β, γ) is insensitive to within-time dependencies, hence we apply the parameter learning operation (Section 5.1) only once, at the beginning of our search. The algorithm is defined formally below.

1: E ← ∅
2: Use gradient ascent to identify θ ≈ arg max_θ L_E(X; θ).
3: Ẽ ← {(i, j) | i ∈ V, j ∈ V}
4: repeat
5:    newedge ← false
6:    (i*, j*) ← arg max_{(i,j)∈Ẽ} L_{E∪{(i,j)}}(X; θ)
7:    if L_{E∪{(i*,j*)}}(X; θ) ≥ L_E(X; θ) then
8:        E ← E ∪ {(i*, j*)}
9:        newedge ← true
10:   end if
11:   Ẽ ← Ẽ \ {(i*, j*)} \ {(i, j) | max(|N_i|, |N_j|) = d_max}
12: until Ẽ = ∅ ∨ newedge = false
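In code, the greedy search can be phrased as follows (a paraphrase of the listing above, assuming a caller-supplied scoring function log_likelihood(edges) that evaluates L_E(X; θ) with θ held fixed after the initial fit):

```python
from itertools import combinations

def learn_structure(nodes, log_likelihood, d_max):
    """Greedy edge addition: repeatedly add the single edge that most improves
    the training log-likelihood, pruning edges that would exceed d_max."""
    edges, degree = set(), {v: 0 for v in nodes}
    candidates = set(combinations(nodes, 2))       # all undirected pairs
    best = log_likelihood(edges)
    while candidates:
        star = max(candidates, key=lambda e: log_likelihood(edges | {e}))
        score = log_likelihood(edges | {star})
        if score < best:                           # no improving edge: stop
            break
        edges.add(star)
        best = score
        for v in star:
            degree[v] += 1
        candidates = {e for e in candidates - {star}
                      if degree[e[0]] < d_max and degree[e[1]] < d_max}
    return edges
```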

5.3 Evaluation

We evaluate the learned multiagent models by their ability to predict future outcomes, as represented by a test set Y. Given two models M1 and M2, we compute their corresponding log-likelihood measures for the test data set Y: L_M1(Y) and L_M2(Y). Since log-likelihood is negative, we instead examine the negative log-likelihood measures, meaning that M1 is better than M2 at predicting Y if −L_M1(Y) < −L_M2(Y), and vice versa.

6. EMPIRICAL STUDY

We empirically evaluate the predictive power of JCMs in comparison with ICMs, PRMs, and sPRMs, using the dynamic consensus experiment data [8]. We also examine the graphs induced by structure learning, and relate them to the corresponding observation networks by various statistical measures.

Figure 3: JCMs provide better predictions of the system's dynamics than ICMs, PRMs, and sPRMs in twelve settings: the three experiment networks power22 (left), coER_2 (middle), and coPA_2 (right), each for two history lengths, using time discretization intervals δ = 0.5 (top) and δ = 1.5 (bottom). The prediction quality differences between JCM and ICM are significant (p < 0.025) in all scenarios.

6.1 Experiment Settings

The human-subject experiments are divided into nine different sets, each associated with a network structure. These structures differ qualitatively in various ways, characterized by node degree distribution, the ratio of inter-group to intra-group edges, and the existence of a well-connected minority [8]. In particular, networks whose edges are generated by a preferential attachment (PA) process have a notably more heavy-tailed degree distribution than those generated by a random Erdos-Renyi (ER) process. For each experimental trial, human subjects were randomly assigned to nodes in the designated network structure, and given preferences based on one of three possible incentive schemes. Since subjects in these experiments can change their votes at any time, the resulting data is a stream of asynchronous vote actions. We discretize these streams for data analysis, recording the subjects' votes at the end of each time interval of length δ seconds, as sketched below. Our experiments examine interval lengths δ ∈ {0.5, 1.5}.
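A minimal sketch of this discretization step (our own helper; the event and snapshot layouts are assumptions):

```python
def discretize(events, initial_votes, duration, delta):
    """Snapshot each agent's standing vote at the end of every delta-second
    interval, given asynchronous (timestamp, agent, vote) events."""
    current = dict(initial_votes)
    events = sorted(events)                        # order by timestamp
    snapshots, k, t = [], 0, delta
    while t <= duration + 1e-9:
        while k < len(events) and events[k][0] <= t:
            _, agent, vote = events[k]
            current[agent] = vote
            k += 1
        snapshots.append(dict(current))            # state at end of interval
        t += delta
    return snapshots

stream = [(0.3, 'a', 1), (0.9, 'b', 1), (1.2, 'a', 0)]
print(discretize(stream, {'a': 0, 'b': 0}, duration=1.5, delta=0.5))
```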

In our study, we learn predictive models for each experiment network structure, pooling data across subject assignments and incentive schemes. This approach is based on the premise that network structure is the main factor governing the system's collective behavior, in line with the original study findings [8]. In each experiment set, we use eight of the nine trials to train the predictive models of each form. The within-time graphs are learned with node degree constraint d_max = 10. We then evaluate the models based on their predictions over a test set comprising the left-out experimental trial. This process is repeated five times, each with a different randomly chosen trial reserved for testing. Each data point in our reported empirical results averages over these five repetitions.

Using the original experiment labels, we distinguish three experiment networks according to their graph generator processes and the existence of a minority group of well-connected nodes that share the same vote preference (see Table 1).

Table 1: Voting Experiment Settings

Label     Strong Minority   Graph Generator Process
coER_2    No                Erdos-Renyi
coPA_2    No                Preferential attachment
power22   Yes               Preferential attachment

    6.2 Predictions

Figure 5: oJCMs provide worse predictions than JCMs and ICMs for both the system's dynamics and end-game results (power22, h = 1 and δ = 0.5).

Figure 4: JCM predictions of the probability of reaching consensus are lower than predictions from ICMs and PRMs, as well as experiment outcomes. However, the JCM is significantly more accurate than ICMs or PRMs at predicting the ultimate consensus colors.

We first examine predictions of subjects' votes in each time period conditional on available history. A comparison of the four models on twelve scenarios is presented in Figure 3. We measure predictive performance by the negative log-likelihood of the test data according to the respective models. JCMs perform better than ICMs, PRMs, and sPRMs in predicting dynamic behavior in the dynamic consensus experiments for all three experiment settings, given data discretized at interval lengths of 0.5 and 1.5 (differences significant at p < 0.025). Both the JCM and ICM representations, which share similar fundamental elements, handily outperform the PRM and its sticky variant sPRM.

Contrary to the expectation that the less historical information a model uses, the lower its prediction performance, JCMs and ICMs that employ only the last h = 1 period of historical data generate predictions similar to those with h = 5. This phenomenon is likely a consequence of the heuristic nature of the frequency function (4), and moreover may indicate that some human subjects take into account only a short history of their neighbors' actions when choosing their own. All models perform worse with the larger time interval δ = 1.5, which is unsurprising in that the coarser discretization entails aggregating data. More salient is that the results are qualitatively identical for the two δ settings, further illustrating the robustness of our findings. These results demonstrate JCMs' ability to capture joint dynamic behavior, especially behavior interdependencies induced by limited historical information, as opposed to the IBMM alternatives.

We next evaluate the models' ability to predict the end state of a dynamic consensus experiment. As noted above, the original aim of modeling in this domain was to predict this final outcome. For a particular model M, we start a simulation run with agents choosing their preferred colors, and then draw samples from M for each time period until a consensus is reached or the number of time periods exceeds the time limit, as sketched below. We average over 100 run instances for each environment setting and model. As we do not observe any considerable qualitative differences in the models' end-game predictions for different history lengths h, we display only results for h = 1 henceforth.
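The evaluation protocol just described amounts to the following sketch, where sample_joint_action stands for whichever learned model's one-step sampler is under test:

```python
from collections import Counter

def simulate_endgames(sample_joint_action, preferred, T, runs=100):
    """Roll a learned model forward from the preferred-color start until
    consensus or the time limit, tallying outcomes over many runs."""
    outcomes = Counter()
    for _ in range(runs):
        history = [dict(preferred)]                # everyone starts on preference
        for _ in range(T):
            votes = sample_joint_action(history)
            history.append(votes)
            colors = set(votes.values())
            if len(colors) == 1:                   # unanimous vote: consensus
                outcomes[colors.pop()] += 1
                break
        else:
            outcomes['no consensus'] += 1
    return outcomes
```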

The proportion of simulation instances reaching consensus induced by ICMs and PRMs correlates with observed experiment results, as shown in Figure 4.¹ Simulated runs drawn from JCMs converge to consensus at lower rates than ICMs, PRMs, and the human-subject experiments in general. However, their end-game predictions improve with the larger δ = 1.5, especially in the power22 setting, where JCMs predict the experiment outcomes almost exactly. A closer look at the end-game prediction results reveals a different picture of the relative performance of the three models. In particular, the individual behavior models' predictions of the final consensus color are considerably out of line with the actual experiments for both coER_2 and power22, rendering them ineffective in predicting end-game color results. JCMs, on the other hand, provide significantly more accurate predictions of the consensus color in the power22 setting. The ratio between blue and red consensus instances induced by JCMs in coPA_2 resembles that of the actual experiments more than those of ICMs and PRMs. In the coER_2 setting all models' predictions of the favored consensus color (blue) miss the actual experiments' favored consensus color (red), though the red-to-blue consensus ratio predicted by JCMs is less skewed than that of ICMs and PRMs.

Last, we demonstrate the benefits of our extension to the original hGMM representation by comparing the JCM representation against oJCM, which retains the original hGMM definition, assuming that the conditioning set is identical to the learned within-time neighborhood: Γ = N. Figure 5 shows that oJCMs perform worse than both JCMs and ICMs in predicting the system's votes for each time period and end-game results, for the power22 setting with h = 1 and δ = 0.5.² Moreover, we note that the graphs learned for oJCMs contain disconnected node subsets, which can prevent vote decisions from propagating throughout the network, causing failures to produce any consensus instances.

¹ End-game results from sPRMs are similar to those from PRMs, and are not shown here.
² We also obtain similar results for oJCMs in other experiment settings and environment parameters; these are not shown here.

    6.3 Graph Analysis

Figure 6: Distributions of edges from three different categories, intra red, intra blue, and inter, in the given observation graphs and the learned within-time graphs for JCM (δ = 0.5).

In this section, we seek to characterize the learned edges that define N in the JCM representation, and to discover connections between the learned graphs and the aforementioned prediction results. First, we categorize edges by their endpoint nodes' vote preferences: we refer to edges that connect two red (blue) nodes as intra red (blue) edges, and those between red and blue nodes as inter edges. Figure 6 presents the proportion of each edge type in both the given observation graphs and the learned within-time graphs. While a majority of edges in the observation graphs are inter edges, the within-time graphs that define N consist mostly of intra edges. That is, there are more interdependencies in JCMs among agents of the same preference than among conflicting agents. The ability to discover these intra edges and incorporate the information they carry into its joint action distribution may help the JCM representation to better capture dynamic behavior and end-game results, as illustrated and discussed in Section 6.2. For the power22 setting in particular, JCMs often assign a majority of edges as intra red, and thus effectively identify the presence of a strongly connected red minority that dictated end-game colors in the actual experiments. This construction allows JCMs to predict end-game consensus colors much more accurately than ICMs and PRMs, which rely entirely on the observation graphs.
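The edge categorization itself is straightforward to compute; a sketch (our own, with hypothetical preference labels):

```python
from collections import Counter

def edge_profile(edges, preference):               # preference: node -> 'red'/'blue'
    """Proportion of intra red, intra blue, and inter edges in a graph."""
    counts = Counter()
    for i, j in edges:
        if preference[i] == preference[j]:
            counts['intra ' + preference[i]] += 1
        else:
            counts['inter'] += 1
    total = sum(counts.values())
    return {kind: c / total for kind, c in counts.items()}

prefs = {0: 'red', 1: 'red', 2: 'blue', 3: 'blue'}
print(edge_profile([(0, 1), (1, 2), (2, 3)], prefs))
# each category holds one third of the edges in this toy graph
```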

We further investigate whether these proportion measures provide any predictions of the number of consensus instances induced by JCMs. We pool data from the three experiment settings (power22, coPA_2, and coER_2) and compute a simple linear regression of the number of red (blue) consensus instances with respect to the proportion of intra red (blue) edges. The resulting regression coefficients are statistically significant for both blue and red (p < 0.05).


Figure 7 suggests a weak positive correlation between the proportion of intra edges in the within-time graphs and the number of consensus instances. Intuitively, more interdependence between same-preference nodes allows them to exert more influence on one another, helping to diffuse vote choices more rapidly throughout the system.

Figure 7: The number of consensus instances in blue (left) and red (right), against the proportion of JCM intra edges of the corresponding colors.

We next examine JCM edges in terms of how far apart the nodes they connect are in the observation graph. Let φ_{i,j} ≥ 1 denote the length of the shortest path from i to j in the observation graph. Given a graph G on the same set of nodes, we can calculate the proportion of edges in G that connect nodes separated by a given distance in the original observation graph. Figure 8 presents the profile of such distances for pairs of nodes in the learned JCMs. For comparison, the profiles labeled "fully connected" simply reflect the distribution of node distances in the original observation graph: most pairs of nodes are at most two hops apart (φ ≤ 2), and the modal distance is φ = 2. A large majority of edges in the learned within-time graphs have φ = 2; that is, they connect nodes that are close but not directly linked in the observation graphs.
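A sketch of this distance profile using networkx (our own helper, not the paper's code):

```python
import networkx as nx
from collections import Counter

def distance_profile(observation_graph, learned_edges):
    """For each learned edge (i, j), look up phi_{i,j} in the observation
    graph and report the distribution of those distances."""
    phi = dict(nx.all_pairs_shortest_path_length(observation_graph))
    counts = Counter(phi[i][j] for i, j in learned_edges)
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}

G = nx.path_graph(5)                               # toy observation graph 0-1-2-3-4
print(distance_profile(G, [(0, 1), (0, 2), (1, 3), (0, 4)]))
# {1: 0.25, 2: 0.5, 4: 0.25}
```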

Figure 8: Distributions of edges in the within-time graphs based on the distance φ between their end-nodes in the observation graph (δ = 0.5).

Next we compare the assortativity [12] of the learned and original graphs. A graph's assortativity coefficient in [−1, 1] captures the tendency of nodes to attach to others that are similar (positive values) or different (negative values) in connectivity. As illustrated in Figure 9, the large difference in assortativity for the power22 setting underscores the JCM's ability to discover interdependencies among agents' actions that are not captured in the observation graph. In particular, the learned JCMs capture action correlations among nodes of similar degrees in the power22 setting, where the minority nodes are more densely connected than the majority, confirming the findings of the aforementioned analyses of intra and inter edges. We also investigate the sparsity of the learned graphs for different values of δ. Our sparsity measure is the number of edges in the learned within-time graph divided by the number of edges in the corresponding observation graph. Figure 9 illustrates that the within-time graphs become sparser as the discretization interval shrinks from 1.5 to 0.5 in all experiment settings. Intuitively, the finer-grained the discretization, the fewer simultaneous vote changes occur in one time period. As a result, there may be fewer interdependencies among agents' actions, which explains the observed relation between discretization interval and graph sparsity across all experiment settings.
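Both Figure 9 measures are short computations with networkx; the graphs in this sketch are synthetic stand-ins, not the experiment networks:

```python
import networkx as nx

def graph_stats(observation_graph, within_time_graph):
    """Degree assortativity of both graphs, plus the sparsity ratio of
    learned within-time edges to observation edges."""
    return {
        'observation assortativity':
            nx.degree_assortativity_coefficient(observation_graph),
        'within-time assortativity':
            nx.degree_assortativity_coefficient(within_time_graph),
        'sparsity': within_time_graph.number_of_edges()
                    / observation_graph.number_of_edges(),
    }

obs = nx.erdos_renyi_graph(36, 0.2, seed=0)        # synthetic stand-in graphs
learned = nx.barabasi_albert_graph(36, 2, seed=0)
print(graph_stats(obs, learned))
```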

Figure 9: (left) Assortativity of the observation graphs and the learned within-time graphs (δ = 0.5). (right) Sparsity of the within-time graphs.

7. CONCLUSIONS

Our main result is a demonstration of the feasibility of learning probabilistic models of dynamic multiagent behavior from real traces of agent activity on a network. To accomplish this we extend the original hGMM framework [4] by distinguishing within-time dependencies from conditioning sets, and introduce a structure-learning algorithm to induce these dependencies from time-series data. We evaluated our techniques by learning compact graphs capturing the dynamics of human-subject voting behavior on a network. Our investigation finds that the learned joint behavior model provides better predictions of dynamic behavior than several individual behavior models, including the proportional-response models suggested by the original experimental analysis. This provides evidence that expressing joint behavior is important for dynamic modeling, even given partial history information for conditioning individual behavior. Our graph analysis further reveals characteristics of the learned within-time graphs that provide insights about patterns of agent interdependence and their relation to the structure of the agent interaction network.

We plan to improve the learning algorithm for individual behavior models by replacing the maximum-degree constraint with a cross-validation condition that can better help avoid over-fitting. Given the formalism's generality, we consider it promising to apply our modeling technique to similar problem domains, such as graph coloring, where agents must coordinate their actions or make collective decisions while communicating only with their neighbors, as well as to large network scenarios, such as social networks and Internet protocols.

8. REFERENCES

[1] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In Thirteenth Annual Conference on Advances in Neural Information Processing Systems, pages 689-695, Denver, 2000.
[2] C. Daskalakis and C. H. Papadimitriou. Computing pure Nash equilibria in graphical games via Markov random fields. In Seventh ACM Conference on Electronic Commerce, pages 91-99, Ann Arbor, MI, 2006.
[3] Q. Duong, M. P. Wellman, and S. Singh. Knowledge combination in graphical multiagent models. In Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 153-160, Helsinki, 2008.
[4] Q. Duong, M. P. Wellman, S. Singh, and Y. Vorobeychik. History-dependent graphical multiagent models. In Ninth International Conference on Autonomous Agents and Multiagent Systems, pages 1215-1222, Toronto, 2010.
[5] Y. Gal and A. Pfeffer. Networks of influence diagrams: A formalism for representing agents' beliefs and decision-making processes. Journal of Artificial Intelligence Research, 33:109-147, 2008.
[6] A. X. Jiang, K. Leyton-Brown, and N. A. R. Bhat. Action-graph games. Games and Economic Behavior, 71:141-173, 2010.
[7] S. Kakade, M. Kearns, J. Langford, and L. Ortiz. Correlated equilibria in graphical games. In Fourth ACM Conference on Electronic Commerce, pages 42-47, San Jose, CA, 2003.
[8] M. Kearns, S. Judd, J. Tan, and J. Wortman. Behavioral experiments on biased voting in networks. Proceedings of the National Academy of Sciences, 106(5):1347-1352, 2009.
[9] M. Kearns, M. L. Littman, and S. Singh. Graphical models for game theory. In Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 253-260, Seattle, 2001.
[10] M. Kearns and J. Tan. Biased voting and the Democratic primary problem. In Fourth International Workshop on Internet and Network Economics, pages 639-652, Shanghai, 2008.
[11] D. Koller and B. Milch. Multi-agent influence diagrams for representing and solving games. Games and Economic Behavior, 45:181-221, 2003.
[12] M. E. J. Newman. Mixing patterns in networks. Physical Review E, 67(2), 2003.