Learning and Predicting Dynamic Networked Behavior with Graphical Multiagent Models

Quang Duong†, Michael P. Wellman†, Satinder Singh†, Michael Kearns∗
{qduong,wellman,baveja}@umich.edu, [email protected]
†Computer Science and Engineering, University of Michigan
∗Computer and Information Science, University of Pennsylvania
ABSTRACT
Factored models of multiagent systems address the complexity of joint behavior by exploiting locality in agent interactions. History-dependent graphical multiagent models (hGMMs) further capture dynamics by conditioning behavior on history. The challenges of modeling real human behavior motivated us to extend the hGMM representation by distinguishing two types of agent interactions. This distinction opens the opportunity for learning dependence networks that differ from the given graphical structures representing observed agent interactions. We propose a greedy algorithm for learning hGMMs from time-series data, inducing both graphical structure and parameters. Our empirical study employs human-subject experiment data for a dynamic consensus scenario, where agents on a network attempt to reach a unanimous vote. We show that the learned hGMMs directly expressing joint behavior outperform alternatives in predicting both dynamic human voting behavior and end-game vote results. Analysis of the learned graphical structures reveals patterns of action dependence not directly reflected in the original experiment networks.
Categories and Subject Descriptors
I.2 [Artificial Intelligence]: Multiagent Systems

General Terms
Experimentation, Algorithms, Human Factors

Keywords
graphical models, dynamic behavior, structure learning
1. INTRODUCTION
Modeling the dynamic behavior of multiple agents presents inherent scaling problems due to the exponential size of any enumerated representation of joint activity. Even if agents make decisions independently, conditioning actions on each other's prior decisions or on commonly observed history induces interdependencies over time. To address this complexity problem, researchers have exploited the localized effects of agent decisions by employing graphical models of multiagent behavior. This approach has produced several (related) graphical representations capturing various facets of multiagent interaction [9, 11, 6, 5, 3]. History-dependent graphical multiagent models (hGMMs) [4] express multiagent behavior on an undirected graph, and capture dynamic relations by conditioning actions on history.

Appears in: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Conitzer, Winikoff, Padgham, and van der Hoek (eds.), 4-8 June 2012, Valencia, Spain. Copyright © 2012, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
Prior work on hGMMs presumes a fixed graph structure defined by the modeler [4]. However, it is not always apparent how to choose the most salient inter-agent dependencies for accurate and tractable modeling. We seek methods for inducing hGMM structures from observational data about dynamic multiagent scenarios. In the process, we also extend the flexibility of hGMMs by allowing distinct dependence structures for within-time and across-time probabilistic relationships.
We empirically evaluate our techniques with data from laboratory experiments on dynamic consensus [8]. Human subjects were arranged on a network, specifying for each subject (also called agent) the set of others whose current choices are observable. The network associated with each experiment provides a basis for expecting that joint agent behavior may exhibit some locality that we can exploit in a graphical model for prediction.
We stress that the graph structure of the optimal predictive model need not mirror the experiment network of the voting scenario; moreover, the complex experiment network instances we study render computation on the corresponding hGMMs intractable. We therefore attempt to learn the graphical structure and parameters of an hGMM that can effectively and compactly capture joint dynamic behavior. Using human-subject data, we evaluate the learned models' predictions of voting behavior and compare their performance with that of several baseline multiagent models. We generally find that models expressing joint behavior outperform the alternatives, including models originally proposed by the authors of the dynamic consensus experiments, in predicting voting dynamics. The joint behavior model provides comparable predictions of the rate of reaching consensus, and superior predictions of which consensus is reached. We further examine the learned hGMM graphical structures to gain insight into the dependencies driving voting behavior, as well as the network structure's effect on collective action.
Section 2 provides background information on hGMMs and introduces our extension to the modeling framework. Section 3 describes the dynamic consensus experiments. We present a variety of candidate model forms in Section 4. Section 5 provides the motivation for and details of our greedy model-learning algorithm, which simultaneously estimates a model's parameters and constructs its interaction graph. Our empirical study in Section 6 compares the different models across three experiment settings, and examines the learned graph structures against the original experiment networks.

Figure 1: An example hGMM over three time periods. Undirected edges capture correlation among agents at a point in time. Directed edges (shown here only for agent 1) denote conditioning of an agent's action on others' past actions.
2. HISTORY-DEPENDENT GMMS
We model the behavior of n agents over a time interval divided into discrete periods, [0, . . . , T]. At time period t, agent i ∈ {1, . . . , n} chooses an action a_i^t from its action domain, A_i, according to its strategy, σ_i. Agents can observe others' and their own past actions up to time t, as captured in history H^t = {H_1^t, . . . , H_n^t}, where H_i^t denotes the sequence of actions agent i has taken by t. Limited memory capacity or other computational constraints restrict an agent to focus attention on a subset of the history H_i^t considered in its probabilistic choice of next action: a_i^t ∼ σ_i(H_i^t).
A history-dependent graphical multiagent model (hGMM) [4], hG = (V, E, A, π), is a graphical model with graph elements V, a set of vertices representing the n agents, and E, edges capturing pairwise interactions between them. Component A = (A_1, . . . , A_n) represents the action domains, and π = (π_1, . . . , π_n) the potential functions for each agent. The graph defines a neighborhood for each agent i: N_i = {j | (i, j) ∈ E} ∪ {i}, including i itself and its neighbors N_{−i} = N_i \ {i}.
The hGMM representation captures agent interactions in dynamic scenarios by conditioning joint agent behavior on an abstracted history of actions H^t. The history available to agent i, H_{N_i}^t, is the subset of H^t pertaining to agents in N_i. Each agent i is associated with a potential function π_i(a_{N_i}^t | H_{N_i}^t): ∏_{j∈N_i} A_j → R^+. The potential of a local action configuration specifies its likelihood of being included in the global outcome, conditional on history. Specifically, the joint distribution of the system's actions taken at time t is the normalized product of neighbor potentials [2, 4, 7]:

    Pr(a^t | H^t) = (1/Z) ∏_i π_i(a_{N_i}^t | H_{N_i}^t).    (1)
The complexity of computing the normalization factor Z in (1) is exponential in the number of agents, and thus precludes exact inference and learning in large models. We approximate Z using the belief propagation method [1], which has shown good results in reasonable time on sparse cyclic graphical structures.
We extend the original hGMM representation by distinguishing between within-time and across-time dependencies, as depicted in Figure 1. Formally, we introduce a conditioning set Γ_i for each agent i, denoting the set of agents whose histories condition this agent's potential function: π_i(a_{N_i}^t | H_{Γ_i}^t). The neighborhood N_i in this extension governs only the within-time probabilistic dependencies of node i. With respect to this extended model, the original hGMM [4] corresponds to the special case where Γ_i = N_i. The joint distribution of actions at time t can be rewritten as:

    Pr(a^t | H^t) = (1/Z) ∏_i π_i(a_{N_i}^t | H_{Γ_i}^t).    (2)
3. DYNAMIC CONSENSUS
We evaluate our modeling framework with human-subject data from a dynamic consensus game [8]. Each agent in this game chooses to vote either blue (0) or red (1), and can change votes at any time. Agents are connected in a network, such that agent i can observe the votes of those in its observation neighborhood N_i^O. The scenario terminates when: (i) agents converge on an action a ∈ {0, 1}, in which case agent i receives reward r_i(a) > 0, or (ii) they cannot agree by the time limit T, in which case rewards are zero. Figure 2 illustrates the dynamic behavior of an example voting experiment network.
Agents have varying preferences over the possible consensus outcomes, reflected in their reward functions. Since nobody receives any reward without a unanimous vote, agents must balance effort to promote their own preferred outcomes against the common goal of reaching consensus. Another important feature of the dynamic consensus game is that agent i observes the votes of only those in its observation neighborhood N_i^O; all it is shown of the graph is the degree of each observation neighbor, and the observation edges among them. This raises the question of how agents take into account their neighbors' voting patterns and their partial knowledge of the experiment network structure.
A series of human-subject experiments were conducted to study how people behave in 81 different instances of the voting game [8]. The experimenters varied reward preference assignments and experiment network structure across these instances, and were thus able to collect data about these factors' effects on the consensus voting results and the strategies employed. Figure 2 exhibits a run for the experiment network labeled power22, discussed below. Study goals included developing models to predict a given scenario's voting outcome and, if a consensus is reached, its convergence time. This problem also served as the foundation for analysis of adaptive strategies and theoretical constraints on convergence [10].

Figure 2: Time snapshots of a lab experiment run in which the densely connected minority group that preferred red exerted strong influence on the blue-leaning majority. The minority group eventually succeeded in converting all the initial (unfilled) blue votes to (filled) red votes.
4. MODELING DYNAMIC VOTING
We present four model forms designed to capture voting behavior dynamics in the dynamic consensus experiments. All are expressible as hGMMs. Only the first exploits the flexibility of hGMMs to express dependence of actions within a neighborhood given history (2); hence we refer to it as the joint behavior consensus model (JCM).

The other three forms model agent behaviors individually: for each agent we specify a probabilistic strategy σ_i(H^t) = Pr(a_i^t | H_{Γ_i}^t). Such a formulation captures agent interactions through the conditioning of individual behavior on observed history. The agents' actions are probabilistically dependent, but conditionally independent given this common history, yielding the joint distribution:

    Pr(a^t | H^t) = ∏_i σ_i(H^t).    (3)
We refer to a dynamic multiagent model expressible by (3) as an individual behavior hGMM (IBMM). Conditional independence given history is a compelling assumption for autonomous agents. Indeed, independent choice may even be considered definitional for autonomy. In practice, however, it is often infeasible to specify the entire history for conditioning due to finite memory and computational power, and the assumption may not hold with respect to partial history. History abstraction generally introduces correlations among agents' actions, even if they are independently generated on full history [4]. Nevertheless, assuming conditional independence between agents' actions given history exponentially reduces the model's complexity, or more specifically, the representational complexity of the joint probability distribution over the system's actions.
The first of the three IBMMs we present is designed as an independent behavior version of the JCM; thus, we call it simply the individual behavior consensus model (ICM). The remaining two models are based on proposals and observations from the original experimental analysis [8], and are labeled the proportional response model (PRM) and sticky proportional response model (sPRM), respectively.
4.1 Joint Behavior Consensus Model
Based on observations from the original experiment analysis, we seek to formulate a potential function for the JCM that captures the impact of the past collective choices of i's neighborhood, i's own past voting patterns, and its relative preference for each action.
First, we consider how to summarize a history H_{Γ_i}^t of length h relevant to agent i. Let the indicator I(a_i, a_k) = 1 if a_i = a_k and 0 otherwise. We define f(a_i, H_{Γ_i}^t) as the frequency with which action a_i is chosen by the other agents in i's conditioning set, which by definition contains the nodes whose past actions influence how i chooses its action in the present:

    f(a_i, H_{Γ_i}^t) = [ ∑_{k∈Γ_i\{i}} ∑_{τ=t−h}^{t−1} I(a_i, a_k^τ) + ε ] / [ h |Γ_i \ {i}| ].    (4)
We add ε = 0.01 to the numerator to ensure that the frequency term does not vanish when a_i does not appear in H_{Γ_i}^t.
Second, we capture agent i's own update history in an inertia term,

    I(a_i, H_i^t) = { t − max τ
each i. The probabilistic ICM behavior is then given by:

    Pr(a_i | H_{Γ_i}^t) = (1/Z_i) r_i(a_i) f(a_i, H_{Γ_i}^t)^γ I(a_i, H_i^t)^β.

The normalization ranges only over single-agent actions a_i ∈ A_i, so Z_i is easy to compute for this model.
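The ICM choice rule can be sketched in a few lines; the reward, frequency, and inertia callables below are toy stand-ins (not learned values), and the function name is ours:

```python
# Sketch of the ICM choice rule: Pr(a_i | H^t_{Gamma_i}) proportional to
# r_i(a_i) * f(a_i, .)^gamma * I(a_i, .)^beta, normalized over A_i only.

def icm_distribution(actions, reward, freq, inertia, gamma, beta):
    weights = {a: reward(a) * freq(a) ** gamma * inertia(a) ** beta
               for a in actions}
    Z_i = sum(weights.values())  # normalization over single-agent actions
    return {a: w / Z_i for a, w in weights.items()}

dist = icm_distribution(
    actions=(0, 1),
    reward=lambda a: 1.5 if a == 1 else 1.0,   # toy: prefers red (1)
    freq=lambda a: 0.8 if a == 1 else 0.2,     # toy: neighbors mostly red
    inertia=lambda a: 1.0,                     # toy: neutral inertia
    gamma=1.0,
    beta=1.0,
)
```

Because Z_i sums over just |A_i| terms, normalization here is trivial, in contrast to the joint-model Z of (1) and (2).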
4.3 Proportional Response Model
We also consider for comparison the proportional response model, PRM, suggested in the original dynamic consensus study [8] as a reasonably accurate predictor of their experiments' final outcomes. PRM specifies that voter i chooses action a_i at time t with probability proportional to r_i(a_i) and g(a_i, a_{Γ_i}^{t−1}), the number of i's neighbors who chose a_i in the last time period:

    Pr(a_i | H_{Γ_i}^t) ∝ r_i(a_i) g(a_i, a_{Γ_i}^{t−1}).
4.4 Sticky Proportional Response Model
PRM does not capture the subjects' tendency to start with their preferred option, reconsidering their votes only after collecting additional information about their neighbors over several time periods [8]. Therefore, we introduce the sticky proportional response model, sPRM, which contains a parameter ρ ∈ [−1, 1] reflecting an agent's stubbornness in maintaining its preferred option, regardless of observed neighbors' past choices. Intuitively, an agent's inherent bias toward its preferred option decays proportionally until there is no bias:

    Pr(a_i | H_{Γ_i}^t) ∝ r_i(a_i) g(a_i, a_{Γ_i}^{t−1}) (1 + I_{a_i}^{max} ρ^t),

where I_{a_i}^{max} = 1 if a_i = arg max_a r_i(a), and I_{a_i}^{max} = 0 otherwise.
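Both response models can be sketched side by side; the rewards and neighbor counts below are invented toy values, and the function names are ours:

```python
# Sketch of PRM and sPRM choice probabilities for a binary vote.
# g(a) counts neighbors who chose a last period; rho decays stickiness.

def prm(actions, reward, g):
    w = {a: reward(a) * g(a) for a in actions}
    Z = sum(w.values())
    return {a: v / Z for a, v in w.items()}

def sprm(actions, reward, g, rho, t):
    preferred = max(actions, key=reward)   # a_i = arg max_a r_i(a)
    w = {a: reward(a) * g(a) * (1 + (rho ** t if a == preferred else 0))
         for a in actions}
    Z = sum(w.values())
    return {a: v / Z for a, v in w.items()}

reward = lambda a: 1.5 if a == 1 else 1.0   # toy: prefers red (1)
g = lambda a: 3 if a == 0 else 2            # toy: 3 blue, 2 red neighbors

p_prm = prm((0, 1), reward, g)
p_sprm = sprm((0, 1), reward, g, rho=0.5, t=1)
```

The sticky factor (1 + ρ^t) boosts the preferred action early on and, for |ρ| < 1, vanishes as t grows, recovering plain PRM behavior.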
5. LEARNING

5.1 Parameter Learning
We first address the problem of learning the parameters of an hGMM hG given the underlying graphical structure and data in the form of a set of joint actions for m time steps, X = (a^0, . . . , a^m). For ease of exposition, let θ denote the set of all parameters that define the hGMM's potential functions. We seek θ maximizing the log likelihood of X:

    L_{hG}(X; θ) = ∑_{k=0}^{m−h} ln Pr_{hG}(a^{k+h} | (a^k, . . . , a^{k+h−1}); θ).

We use gradient ascent to update the parameters, θ ← θ + λ∇θ, where the gradient is ∇θ = ∂L_{hG}(X; θ)/∂θ and λ is the learning rate, stopping when the gradient falls below some threshold. We employ this same technique to learn the parameters of all model forms in our study, except for the PRM, which contains no parameters.
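The update θ ← θ + λ∇θ with a gradient-threshold stopping rule can be sketched generically; a numerical gradient and a toy concave objective stand in here for the model-specific likelihood and its analytic gradient:

```python
# Generic gradient-ascent sketch for maximizing a scalar objective L(theta).

def numerical_grad(L, theta, eps=1e-6):
    # Central-difference stand-in for dL/dtheta.
    return (L(theta + eps) - L(theta - eps)) / (2 * eps)

def fit(L, theta0, lr=0.1, tol=1e-8, max_iter=10_000):
    theta = theta0
    for _ in range(max_iter):
        grad = numerical_grad(L, theta)
        if abs(grad) < tol:       # stop when the gradient is below threshold
            break
        theta += lr * grad        # theta <- theta + lambda * grad
    return theta

# Toy concave objective with its maximum at theta = 2.
theta_hat = fit(lambda th: -(th - 2.0) ** 2, theta0=0.0)
```

For the actual hGMM likelihood, each gradient evaluation requires (approximate) inference to compute the normalized probabilities, which is where belief propagation enters.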
5.2 Structure Learning
Each of the consensus voting experiments involves 36 human subjects. The largest neighborhood size in these games ranges from 16 to 20, rendering computation of the exact data likelihood for a joint behavior model of this complexity (required for the parameter learning described above) infeasible. Preliminary trials with the belief propagation approximation algorithm [1] on these models, with N = Γ = N^O, indicated that this computational saving would still be insufficient for effective learning. Thus, we need to employ models with simpler graphs in order to take advantage of hGMMs' expressiveness in representing joint behavior. Toward this end, we developed a structure learning algorithm that produces graphs for hGMMs within specified complexity constraints.
Though dictated by computational necessity, automated structure learning has additional advantages. First, there is no inherent reason that the observation graph should constitute the ideal structure for a predictive graphical model of agent behavior. In other words, the most effective N is not necessarily the same as N^O. Since actual agent behavior is naturally conditioned on its observable history, we do assume that the conditioning set coincides with the observation neighborhood, Γ = N^O. Nevertheless, once we abstract the history representation, it may well turn out that non-local historical activity provides more useful predictive information. If so, the structure of the learned graph that defines each i's within-time neighborhood may provide interesting insights into the agents' networked behavior.
Our structure learning algorithm addresses the problem of learning N_i for every i, taking Γ_i = N_i^O as fixed. Note that the IBMMs described in Section 4 impose N_i = {i} for each i, and thus do not need to learn the within-time graphs. Starting from an empty graph, we greedily add edges to improve the log-likelihood of the training data, subject to the constraint that the maximum node degree not exceed a specified bound d_max. Since the set of edges E is the only structural model feature that changes during our search, we use L_E(X; θ) to abbreviate L_{hG}(X; θ) as induced by the hGMM hG = (V, E, A, Γ, π). We have found that the optimal setting of our parameters θ = (β, γ) is insensitive to within-time dependencies, hence we apply the parameter learning operation (Section 5.1) only once, at the beginning of our search. The algorithm is defined formally below.
1:  E ← ∅
2:  Use gradient ascent to identify θ ≈ arg max_θ L_E(X; θ)
3:  Ẽ ← {(i, j) | i ∈ V, j ∈ V}
4:  repeat
5:      newedge ← false
6:      (i∗, j∗) ← arg max_{(i,j)∈Ẽ} L_{E∪{(i,j)}}(X; θ)
7:      if L_{E∪{(i∗,j∗)}}(X; θ) ≥ L_E(X; θ) then
8:          E ← E ∪ {(i∗, j∗)}
9:          newedge ← true
10:     end if
11:     Ẽ ← Ẽ \ {(i∗, j∗)} \ {(i, j) | max(|N_i|, |N_j|) = d_max}
12: until Ẽ = ∅ ∨ newedge = false
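The greedy search can be sketched compactly, with a generic `score` callable standing in for the data log-likelihood L_E; the toy target-graph score at the bottom is invented purely to exercise the search:

```python
# Sketch of the greedy structure search: starting from an empty edge set,
# repeatedly add the single edge that most improves the score (standing in
# for L_E(X; theta)), subject to a maximum node degree bound dmax.

def greedy_structure(nodes, score, dmax):
    E = set()
    candidates = {(i, j) for i in nodes for j in nodes if i < j}
    while candidates:
        best = max(candidates, key=lambda e: score(E | {e}))
        if score(E | {best}) < score(E):
            break                      # best candidate does not improve
        E.add(best)
        candidates.discard(best)
        degree = lambda v: sum(1 for (a, b) in E if v in (a, b))
        candidates = {(i, j) for (i, j) in candidates
                      if degree(i) < dmax and degree(j) < dmax}
    return E

# Toy score rewarding edges in a target graph and penalizing extras.
target = {(0, 1), (1, 2)}
score = lambda E: 2 * len(E & target) - len(E - target)
learned = greedy_structure(range(4), score, dmax=2)
```

As in the pseudocode, ties and equal scores are accepted, and edges incident to a node at the degree bound are pruned from the candidate set after each addition.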
5.3 Evaluation
We evaluate the learned multiagent models by their ability to predict future outcomes, as represented by a test set Y. Given two models M_1 and M_2, we compute their corresponding log-likelihood measures for the test data set Y: L_{M_1}(Y) and L_{M_2}(Y). Since log-likelihood is negative, we instead examine the negative log-likelihood measures, so that M_1 is better than M_2 at predicting Y if −L_{M_1}(Y) < −L_{M_2}(Y), and vice versa.
6. EMPIRICAL STUDY
We empirically evaluate the predictive power of JCMs in comparison with ICMs, PRMs, and sPRMs, using the dynamic consensus experiment data [8]. We also examine the graphs induced by structure learning, and relate them to the corresponding observation networks by various statistical measures.

Figure 3: JCMs provide better predictions of the system's dynamics than ICMs, PRMs, and sPRMs in twelve settings: the three experiment networks power22 (left), coER_2 (middle), and coPA_2 (right), each for two history lengths, using time discretization intervals δ = 0.5 (top) and δ = 1.5 (bottom). The prediction quality differences between JCM and ICM are significant (p < 0.025) in all scenarios.
6.1 Experiment Settings
The human-subject experiments are divided into nine different sets, each associated with a network structure. These structures differ qualitatively in various ways, characterized by node degree distribution, ratio of inter-group to intra-group edges, and the existence of a well-connected minority [8]. In particular, networks whose edges are generated by a preferential attachment (PA) process have a notably more heavy-tailed degree distribution than those generated by a random Erdos-Renyi (ER) process. For each experimental trial, human subjects were randomly assigned to nodes in the designated network structure, and given preferences based on one of three possible incentive schemes. Since subjects in these experiments can change their votes at any time, the resulting data is a stream of asynchronous vote actions. We discretize these streams for data analysis, recording the subjects' votes at the end of each time interval of length δ seconds. Our experiments examine interval lengths δ ∈ {0.5, 1.5}.
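The discretization step can be sketched as follows, assuming each subject's vote changes arrive as timestamped `(seconds, vote)` events sorted by time; the function name and encoding are ours:

```python
# Sketch of discretizing an asynchronous vote stream: record the subject's
# standing vote at the end of every delta-second interval.

def discretize(events, initial_vote, delta, horizon):
    votes, current, idx = [], initial_vote, 0
    n_steps = int(horizon / delta)
    for step in range(1, n_steps + 1):
        boundary = step * delta
        while idx < len(events) and events[idx][0] <= boundary:
            current = events[idx][1]   # apply every change up to boundary
            idx += 1
        votes.append(current)
    return votes

# A subject starts blue (0), switches to red (1) at t=1.2s, back at t=2.9s.
series = discretize([(1.2, 1), (2.9, 0)], initial_vote=0, delta=0.5, horizon=3.0)
```

A coarser δ collapses more mid-interval switches into a single recorded vote, which is one source of the aggregation effect discussed in Section 6.2.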
In our study, we learn predictive models for each experiment network structure, pooling data across subject assignments and incentive schemes. This approach is based on the premise that network structure is the main factor governing the system's collective behavior, in line with the original study findings [8]. In each experiment set, we use eight of the nine trials to train the predictive models of each form. The within-time graphs are learned with node degree constraint d_max = 10. We then evaluate the models based on their predictions over a test set comprising the left-out experimental trial. This process is repeated five times, with a different randomly chosen trial reserved for testing. Each data point in our reported empirical results averages over these five repetitions.
Using the original experiment labels, we distinguish the three experiment networks according to their graph generator processes and the existence of a minority group of well-connected nodes that share the same vote preference (see Table 1).

Table 1: Voting Experiment Settings

    Label     Strong Minority   Graph Generator Process
    coER_2    No                Erdos-Renyi
    coPA_2    No                Preferential attachment
    power22   Yes               Preferential attachment
6.2 Predictions

Figure 5: oJCMs provide worse predictions than JCMs and ICMs for both the system's dynamics and end-game results (power22, h = 1 and δ = 0.5).
Figure 4: JCM predictions of the probability of reaching consensus are lower than predictions from ICMs and PRMs, as well as experiment outcomes. However, the JCM is significantly more accurate than ICMs or PRMs at predicting the ultimate consensus colors.
We first examine predictions of subjects' votes in each time period conditional on available history. A comparison of the four models on twelve scenarios is presented in Figure 3. We measure predictive performance by the negative log-likelihood of the test data according to the respective models. JCMs perform better than ICMs, PRMs, and sPRMs in predicting dynamic behavior in the dynamic consensus experiments for all three experiment settings, given data discretized at interval lengths of 0.5 and 1.5 (differences significant at p < 0.025). Both the JCM and ICM representations, which share similar fundamental elements, handily outperform PRM and its sticky version sPRM.
Contrary to the expectation that the less historical information a model uses, the lower its prediction performance, JCMs and ICMs that employ only the last h = 1 period of historical data generate predictions similar to those with h = 5. This phenomenon is likely a consequence of the heuristic nature of the frequency function (4), and moreover may indicate that some human subjects take into account only a short history of their neighbors' actions when choosing their own actions. All models perform worse with the larger time interval δ = 1.5, which is unsurprising in that the coarser discretization entails aggregating data. More salient is that the results are qualitatively identical for the two δ settings, further illustrating the robustness of our findings. These results in general demonstrate JCMs' ability to capture joint dynamic behavior, especially behavior interdependencies induced by limited historical information, as opposed to the IBMM alternatives.
We next evaluate the models' ability to predict the end state of a dynamic consensus experiment. As noted above, the original aim of modeling in this domain was to predict this final outcome. For a particular model M, we start a simulation run with agents choosing their preferred colors, and then proceed to draw samples from M for each time period until a consensus is reached or the number of time periods exceeds the time limit. We average over 100 run instances for each environment setting and model. As we do not observe any considerable qualitative differences in the models' end-game predictions for different history lengths h, we display only results for h = 1 henceforth.
The proportion of simulation instances reaching consensus induced by ICMs and PRMs correlates with observed experiment results, as shown in Figure 4.¹ Simulated runs drawn from JCMs converge to consensus at lower rates than those of ICMs, PRMs, and the human-subject experiments in general. However, their end-game predictions improve with the greater δ = 1.5, especially in the power22 setting, where JCMs predict the experiment outcomes almost exactly. A closer look at the end-game prediction results reveals a different picture of the relative performance of the three models. In particular, the individual behavior models' predictions of the final consensus color are considerably out of line with the actual experiments for both coER_2 and power22, rendering them ineffective in predicting end-game color results. JCMs, on the other hand, provide significantly more accurate predictions of the consensus color in the power22 setting. The ratio between blue and red consensus instances produced by JCMs in coPA_2 resembles that of the actual experiments more than those of ICMs and PRMs. In the coER_2 setting, all models' predictions of the favored consensus color (blue) miss the actual experiments' favored consensus color (red), though the ratio of red-to-blue consensus predicted by JCMs is less skewed than that of ICMs and PRMs.
Last, we demonstrate the benefits of our extension to the original hGMM representation by comparing the JCM representation against oJCM, which retains the original hGMM definition, assuming that the conditioning set is identical to the learned within-time neighborhood: Γ = N. Figure 5 shows that oJCMs perform worse than both JCMs and ICMs in predicting the system's votes for each time period and end-game results, for the power22 setting with h = 1 and δ = 0.5.² Moreover, we note that the graphs produced by oJCMs contain disconnected node subsets, which can prevent vote decisions from propagating throughout the network, causing failures to produce any consensus instances.

¹End-game results from sPRMs are similar to those from PRMs, and are not shown here.
6.3 Graph Analysis

Figure 6: Distributions of edges from three different categories, intra red, intra blue, and inter, in the given observation and learned within-time graphs for JCM (δ = 0.5).
In this section, we seek to characterize the learned edges that define N in the JCM representation, and to discover connections between the learned graphs and the aforementioned prediction results. First, we categorize edges by their endpoint nodes' vote preferences: we refer to edges that connect two red (blue) nodes as intra red (blue) edges, and those between red and blue nodes as inter edges. Figure 6 presents the proportion of each edge type in both the given observation graphs and the learned within-time graphs. While a majority of edges in the observation graphs are inter edges, the within-time graphs that define N consist mostly of intra edges. That is, there are more interdependencies in JCMs among agents of the same preference than among conflicting agents. The ability to discover these intra edges and incorporate the information they carry into its joint action distribution may help the JCM representation better capture dynamic behavior and end-game results, as illustrated and discussed in Section 6.2. For the power22 setting in particular, JCMs often assign a majority of edges as intra red, and thus effectively identify the presence of a strongly connected red minority who dictated end-game colors in the actual experiments. This construction allows JCMs to predict end-game consensus colors much more accurately than ICMs and PRMs, which rely entirely on the observation graphs.
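The edge categorization can be sketched as follows; the function name and toy preference assignment are ours (0 = blue, 1 = red):

```python
# Sketch of categorizing edges as intra_red, intra_blue, or inter, given
# each endpoint node's preferred color.
from collections import Counter

def edge_categories(edges, preference):
    def cat(i, j):
        if preference[i] != preference[j]:
            return "inter"
        return "intra_red" if preference[i] == 1 else "intra_blue"
    counts = Counter(cat(i, j) for i, j in edges)
    total = len(edges)
    return {c: n / total for c, n in counts.items()}

# Toy graph: nodes 0,1 prefer red; nodes 2,3 prefer blue.
pref = {0: 1, 1: 1, 2: 0, 3: 0}
props = edge_categories([(0, 1), (2, 3), (1, 2), (0, 3)], pref)
```

Applying this to both the observation graph and the learned within-time graph yields the per-category proportions plotted in Figure 6.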
We further investigate whether these proportion measures provide any prediction of the number of consensus instances induced by JCMs. We pool data from the three experiment settings, power22, coPA_2, and coER_2, and compute a simple linear regression of the number of red (blue) consensus instances against the proportion of intra red (blue) edges. The resulting regression coefficients are statistically significant for both blue and red (p < 0.05). Figure 7 suggests a weak positive correlation between the within-time graphs' intra edges and the number of consensus instances. Intuitively, more interdependence between same-preference nodes allows them to exert more influence on one another, helping vote choices diffuse more rapidly throughout the system.

²We also obtain similar results for oJCMs in other experiment settings and environment parameters, not shown here.
Figure 7: The number of consensus instances in blue (left) and red (right), versus the proportion of JCM intra edges of the corresponding colors.
We next examine JCM edges in terms of how far apart the nodes they connect are in the observation graph. Let φ_{i,j} ≥ 1 denote the length of the shortest path from i to j in the observation graph. Given a graph G on the same set of nodes, we can calculate the proportion of edges in G that connect nodes separated by a given distance in the original observation graph. Figure 8 presents the profile of such distances for pairs of nodes in the learned JCMs. For comparison, the profiles labeled "fully connected" simply reflect the distribution of node distances in the original observation graph: most pairs of nodes are at most two hops apart (φ ≤ 2), and the modal distance is φ = 2. A large majority of edges in the learned within-time graphs have φ = 2; that is, they connect nodes that are close but not adjacent in the observation graphs.
Figure 8: Distributions of edges in the within-time graphs based on the distance φ between their end nodes in the observation graph (δ = 0.5).
Next we compare the assortativity [12] of the learned and original graphs. A graph G's assortativity coefficient, in [−1, 1], captures the tendency of nodes to attach to others that are similar (positive values) or different (negative values) in connectivity. As illustrated in Figure 9, the large difference in assortativity for the power22 setting stresses the JCM's ability to discover interdependencies among agents' actions that are not captured in the observation graph. In particular, the resulting JCMs are able to capture action correlations among nodes of similar degrees in the power22 setting, where the minority nodes are more densely connected than the majority, confirming the findings of the aforementioned graph analyses of intra and inter edges. We also investigate the sparsity of the learned graphs for different values of δ. Our sparsity measure is the number of edges in the learned within-time graph divided by the number of edges in the corresponding observation graph. Figure 9 illustrates that the within-time graphs become sparser as the discretization interval shrinks from 1.5 to 0.5 in all experiment settings. Intuitively, the finer-grained the discretization, the fewer simultaneous vote changes there are in one time period. As a result, there may be fewer interdependencies among agents' actions, which explains the observed relation between discretization interval and graph sparsity across all experiment settings.
Figure 9: (left) Assortativity of the observation graphs and the learned within-time graphs (δ = 0.5). (right) Sparsity of the within-time graphs (within-time edges over observation edges) for δ = 0.5 and δ = 1.5, in the power22, coER_2, and coPA_2 settings.
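Both quantities plotted in Figure 9 are straightforward to compute from edge lists. The sketch below implements Newman's degree assortativity coefficient [12] and the edge-ratio sparsity measure defined above; the star-graph example is our own illustration (a hub connected only to leaves is maximally disassortative), not data from the experiments.

```python
def degree_assortativity(edges):
    """Newman's degree assortativity coefficient for an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    m = len(edges)
    # Per-edge moments of endpoint degrees (Newman 2003, Eq. for r).
    jk = sum(deg[u] * deg[v] for u, v in edges) / m
    half_sum = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / m
    half_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / m
    return (jk - half_sum ** 2) / (half_sq - half_sum ** 2)

def sparsity(within_time_edges, observation_edges):
    """Edges in the learned within-time graph over edges in the observation graph."""
    return len(within_time_edges) / len(observation_edges)

# A star graph: every edge joins the degree-3 hub to a degree-1 leaf.
star = [(0, 1), (0, 2), (0, 3)]
print(degree_assortativity(star))  # → -1.0
```

A learned within-time graph whose coefficient sits far from the observation graph's, as in the power22 setting, indicates action dependencies that do not track the original connectivity.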
7. CONCLUSIONS
Our main result is a demonstration of the feasibility of learning probabilistic models of dynamic multiagent behavior from real traces of agent activity on a network. To accomplish this we extend the original hGMM framework [4] by distinguishing within-time dependencies from conditioning sets, and introducing a structure-learning algorithm to induce these dependencies from time-series data. We evaluated our techniques by learning compact graphs capturing the dynamics of human-subject voting behavior on a network. Our investigation finds that the learned joint behavior model provides better predictions of dynamic behavior than several individual behavior models, including the proportional-response models suggested by the original experimental analysis. This provides evidence that expressing joint behavior is important for dynamic modeling, even given partial history information for conditioning individual behavior. Our graph analysis further reveals characteristics of the learned within-time graphs that provide insights about patterns of agent interdependence, and their relation to the structure of the agent interaction network.
We plan to improve the learning algorithm for individual behavior models by replacing the maximum-degree constraint with a cross-validation condition that can better help avoid over-fitting. Given the formalism's generality, we consider it promising to apply our modeling technique to similar problem domains, such as graph coloring, where agents must coordinate their actions or make collective decisions while communicating only with their neighbors, as well as to large network scenarios, such as social networks and Internet protocols.
8. REFERENCES
[1] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In Thirteenth Annual Conference on Advances in Neural Information Processing Systems, pages 689–695, Denver, 2000.
[2] C. Daskalakis and C. H. Papadimitriou. Computing pure Nash equilibria in graphical games via Markov random fields. In Seventh ACM Conference on Electronic Commerce, pages 91–99, Ann Arbor, MI, 2006.
[3] Q. Duong, M. P. Wellman, and S. Singh. Knowledge combination in graphical multiagent models. In Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 153–160, Helsinki, 2008.
[4] Q. Duong, M. P. Wellman, S. Singh, and Y. Vorobeychik. History-dependent graphical multiagent models. In Ninth International Conference on Autonomous Agents and Multiagent Systems, pages 1215–1222, Toronto, 2010.
[5] Y. Gal and A. Pfeffer. Networks of influence diagrams: A formalism for representing agents' beliefs and decision-making processes. Journal of Artificial Intelligence Research, 33:109–147, 2008.
[6] A. X. Jiang, K. Leyton-Brown, and N. A. R. Bhat. Action-graph games. Games and Economic Behavior, 71:141–173, 2010.
[7] S. Kakade, M. Kearns, J. Langford, and L. Ortiz. Correlated equilibria in graphical games. In Fourth ACM Conference on Electronic Commerce, pages 42–47, San Jose, CA, 2003.
[8] M. Kearns, S. Judd, J. Tan, and J. Wortman. Behavioral experiments on biased voting in networks. Proceedings of the National Academy of Sciences, 106(5):1347–1352, 2009.
[9] M. Kearns, M. L. Littman, and S. Singh. Graphical models for game theory. In Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 253–260, Seattle, 2001.
[10] M. Kearns and J. Tan. Biased voting and the Democratic primary problem. In Fourth International Workshop on Internet and Network Economics, pages 639–652, Shanghai, 2008.
[11] D. Koller and B. Milch. Multi-agent influence diagrams for representing and solving games. Games and Economic Behavior, 45:181–221, 2003.
[12] M. E. J. Newman. Mixing patterns in networks. Physical Review E, 67(2), 2003.