
Fuzzy Cognitive Maps of Public Support for Insurgency and Terrorism

Journal of Defense Modeling and Simulation XX(X):1–14 © The Author(s) 2016

Reprints and permission: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/ToBeAssigned www.sagepub.com/

Osonde A. Osoba1 and Bart Kosko2

Abstract

Feedback fuzzy cognitive maps (FCMs) can model the complex structure of public support for insurgency and terrorism (PSOT). FCMs are fuzzy signed directed graphs that model degrees of causality in interwoven webs of feedback causality and policy variables. Their nonlinear dynamics permit forward-chaining inference from input causes and policy options to output effects. We show how a concept node causally affects downstream nodes through a weighted product of the intervening causal edge strengths. FCMs allow users to add detailed dynamics and feedback links directly to the causal model. Users can also fuse or combine FCMs from multiple experts by weighting and adding the underlying FCM fuzzy edge matrices. The combined FCM tends to better represent domain knowledge as the expert sample size increases if the expert sample approximates a random sample. Statistical or machine-learning algorithms can use numerical sample data to learn and tune a FCM’s causal edges. A differential Hebbian learning law can approximate a PSOT FCM’s directed edges of partial causality using time-series training data. The PSOT FCM adapts to the computational factor-tree PSOT model that Davis and O’Mahony based on prior social science research and case studies. Simulation experiments compare the factor-tree PSOT model with the adapted FCM models.

Keywords: fuzzy cognitive maps, causal reasoning, knowledge fusion, differential Hebbian learning

1 Modeling Feedback Causal Webs with Fuzzy Cognitive Maps

This paper presents static and dynamic fuzzy cognitive map (FCM) models of public support for insurgency and terrorism (PSOT). We base these PSOT FCMs on the factor-tree PSOT analysis of [1, 2]. Public support for insurgency and terrorism has complex socio-political causes [3, 4] that involve numerous factors. FCMs can efficiently model and process the interwoven causal and policy structure of PSOT and other defense problems. FCM applications number in the thousands and range from control engineering and signal processing to policy analysis and social modeling [5, 6].

FCMs are fuzzy causal signed directed graphs. They are fuzzy because in general both their directed causal edges and their concept nodes are multivalued and so can assume more values than just the extremes of on or off. They locally model degrees of causality through their directed causal edge strengths [7]. Users can express their causal and policy models by drawing signed weighted causal edges between concept nodes. Figure 1 shows a FCM fragment that models an undersea causal web of dolphins in the presence of sharks or other survival threats. The next section shows how to make what-if inferences or predictions with this simple FCM that has binary concept nodes and trivalent causal edges. The inference process uses only vector-matrix multiplication and thresholding. More complex FCMs can activate concept nodes with some of the nonlinear functions in Figure 3 or with many other monotonic or nonmonotonic functions. A causal learning law can approximate the causal edge values given time-series data of the concept nodes. Figure 5 shows the approximation path of one such causal edge from a PSOT FCM.

A FCM’s overall cyclic signed digraph structure resembles a feedback neural or semantic network. The graph structure permits inference through forward chaining and allows the user to control the level of causal or conceptual granularity. A FCM concept node can itself be part of another FCM or of some other nonlinear system. Feedforward fuzzy rule-based systems can also model the input-output structure of a concept node just as they can model a single causal edge that connects one concept node to another. Such fuzzy systems are uniform function approximators if they use enough if-then rules [8]. Their rule bases adapt using both unsupervised and supervised learning laws [9, 10].

A FCM’s feedback loops model interwoven causal webs and can produce rich and predictive equilibrium dynamics [11, 12]. These causal equilibria define “hidden patterns” [11] in the often inscrutable web of edges and nodes. FCMs with binary concept variables produce limit-cycle equilibria or simple fixed-point attractors. Properly fuzzy concept nodes can in principle produce more exotic equilibria such as limit tori or chaotic attractors.

1 Osonde A. Osoba is a researcher at the RAND Corporation and a professor at the Pardee RAND Graduate School, Santa Monica, CA, USA.
2 Bart Kosko is a professor of Electrical Engineering and Law at the University of Southern California, Los Angeles, CA, USA.

Corresponding author: Osonde A. Osoba, The RAND Corporation, 1776 Main St, Santa Monica, CA, 90405, USA. Email: [email protected]


A FCM’s underlying matrix structure makes it easy to combine or fuse FCMs from several sources to produce an overall representative FCM. The strong law of large numbers shows that in many cases this fused FCM converges with probability one to the population FCM of the sampled FCMs [13]. This result holds formally if the FCM edge values from the combined experts approximate a statistical random sample with finite variance. A random sample is sufficient for this convergence result but not necessary. A combined FCM may still give a representative knowledge base when the expert responses are somewhat correlated or when the experts do not all have the same level of expertise. Users can also construct FCMs from written sources such as policy articles or books or legal testimony. They can also use statistical learning algorithms to grow FCMs from sample data. FCMs can in this way address the growing representational problems of big data and what we have called “big knowledge” [6]. Figure 7 shows the minimal fusion case of combining two FCMs with overlapping concept nodes.

Knowledge fusion is a key function in defense and intelligence decision-making processes [14, 15]. The PSOT FCMs we develop below allow such knowledge combination or fusion [16, 11, 17, 13] and can use a causal differential Hebbian learning law [11] to grow and tune the FCM causal edges from numerical time-series data.

The FCM PSOT models below rely on the prior PSOT modeling of [1, 2]. Their carefully constructed PSOT model is a factor tree model. It reflects social science research that describes the factors and societal relationships that determine a community’s propensity to support insurgent or terrorist acts. This PSOT factor tree model has limited scenario simulation capability because the “public” under study in the PSOT model is a heterogeneous aggregation. Later work introduced the related propensity for terrorism model [14, 15] that focuses on the factors that influence someone’s or some group’s propensity to support terrorism or carry out terrorist acts.

We use both static and dynamic FCM versions of the PSOT factor tree. The static FCM behaves much as does the original factor tree model. The dynamic FCM extends the static model to allow simulations of long-run what-if scenarios.

The next sections develop the FCM PSOT models. Section 2 explains and extends the basic mathematical and graph structure of FCMs and how to use them for forward-chaining inference in analysis. Section 3 presents the original PSOT model. Then we introduce a FCM version of the PSOT model. This includes adaptations to the PSOT model that allow long-run simulations. We demonstrate the fusion and simulation capabilities of some FCM-PSOT models. Section 4 summarizes the FCM PSOT experiments for causal inference and knowledge fusion.

2 Inference with Fuzzy Cognitive Maps

Fuzzy cognitive maps (FCMs) [12, 6] are fuzzy signed directed graphs that describe degrees of causality and webs of causal feedback. Most FCMs have cycles or closed loops that model causal feedback. FCMs can be acyclic and thus define trees. This is rare in practice and implies that such a FCM has no feedback dynamics.

FCMs are fuzzy because their nodes and edges can be multivalent and so need not be binary or bivalent. A property or concept is fuzzy if it admits degrees and is not just black and white [19, 20, 21]. Then the property or concept has borders that are gray and not sharp or binary. A subset $A$ of a space $X$ is properly fuzzy if and only if at least one element $x \in X$ belongs to $A$ to a degree other than 0 or 1. Then the set $A$ breaks the so-called “law” of contradiction because then $A \cap A^c \neq \emptyset$ holds where $A^c$ is the complement set of $A$. The set $A$ equivalently breaks the dual “law” of excluded middle because then $A \cup A^c \neq X$ holds. Equality holds in these two “laws” just in case $A$ is an ordinary bivalent set.
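The two broken “laws” can be checked on a small example. The sketch below assumes the standard fuzzy set operators (complement as one minus the membership degree, intersection as pairwise minimum, union as pairwise maximum). The paragraph above does not name the operators, so they are an assumption here, and the membership values are purely illustrative.

```python
# Membership degrees of three elements of X in a fuzzy set A (illustrative values).
A = {"x1": 0.0, "x2": 0.3, "x3": 1.0}

# Assumed standard operators: complement 1 - a, intersection min, union max.
A_c = {x: 1.0 - a for x, a in A.items()}
intersection = {x: min(A[x], A_c[x]) for x in A}   # A ∩ A^c
union = {x: max(A[x], A_c[x]) for x in A}          # A ∪ A^c

# A is properly fuzzy because x2 belongs to it to a degree other than 0 or 1.
properly_fuzzy = any(0.0 < a < 1.0 for a in A.values())

# The overlap is nonzero and the union falls short of full membership at x2 only.
print(properly_fuzzy)
print(intersection)
print(union)
```

The bivalent elements x1 and x3 still obey both “laws”. Only the properly fuzzy element x2 breaks them.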

A FCM concept node is fuzzy in general because it can take values in the unit interval $[0, 1]$. So its values over time define a fuzzy set. This implies that a concept node that describes a survival threat or any other property or policy both occurs and does not occur to some degree at the same time. It cannot both occur 100% and not occur 100% at the same time. The two percentages must sum to 100%. Nor does this preclude applying a probability measure to a concept node or fuzzy set. The probability of a fuzzy event combines the two distinct uncertainty types of randomness and vagueness or fuzz (and formally involves taking the expectation of a measurable fuzzy indicator function [22]). So it makes sense to speak of the probability of a partial survival threat. This differs from the compound uncertainty of a fuzzy probability such as the statement that the survival-threat probability is low or very high. This paper works only with fuzzy concept values.

A directed causal edge $e_{ij}$ is also fuzzy because in general it takes on a continuum of values. The edge can also have a positive or negative sign. So it takes values in the bipolar interval $[-1, 1]$. The use of “dis-concepts” can convert all negative causal edges into positive edge values [7]. The edge formally defines a relation or fuzzy subset of a concept product space.

A FCM consists of $n$ concept nodes $C_j$ and $n^2$ directed fuzzy causal edges $e_{ij}$. The $n$ concept nodes $C_1, C_2, \ldots, C_n$ are nonlinear and represent variable concepts or factors in a causal system. They are nonlinear in how they convert their inputs to outputs. The concept nodes can define concepts or social patterns that increase or decrease such as political instability or jihadi radicalism. Or they can be policies or control variables that increase or decrease such as weapons spending or foreign investment in a country. The very first FCM published [7] dealt with concepts related to Middle East stability such as Islamic fundamentalism and Soviet imperialism and the strength of the Lebanese government. The author based this first FCM on a 1982 newspaper editorial from political analyst Henry Kissinger titled “Starting Out in the Direction of Middle East Peace.”

A concept node’s occurrence or activation value $C_i(t_k)$ measures the degree to which the concept $C_i$ occurs in the causal web at time $t_k$. It can also reflect the degree to which it is true that the $i$th node fires or appears in a given snapshot of the causal web at time $t_k$. The FCM state vector $C(t_k)$ gives a snapshot of the FCM system at time $t_k$.

A FCM model must specify the nonlinear dynamics of the $n$ concept nodes $C_1, C_2, \ldots, C_n$. It must also specify the $n^2$ directed and signed causal edge values $e_{ij}$ that connect the $i$th concept node $C_i$ to $C_j$. The edges can be time-varying functions in more general FCMs.

To appear in the Journal of Defense Modeling and Simulation


[Figure 1 graphic omitted: signed digraph over the concept nodes C1: Herd Clustering, C2: Fatigue, C3: Rest, C4: Survival Threat, and C5: Run Away.]

Figure 1. Fragment of a predator-prey fuzzy cognitive map that describes dolphin behavior in the presence of sharks or other survival threats [18]. The FCM itself is a fuzzy signed directed graph with feedback. The concept nodes of the digraph represent fuzzy sets that activate to varying degrees of concept-node occurrence. The edges denote fuzzy or partial causal dependence between concept nodes. The edges in this FCM are trivalent: $e_{ij} \in \{-1, 0, 1\}$. Each nonzero edge defines a causal if-then rule: The dolphin pod decreases its resting behavior if a shark or other survival threat is present. But the survival threat increases if the pod rests more. These two causal links define a minimal cycle or feedback loop within the FCM’s causal web. Such feedback cycles endow the FCM with transient and equilibrium dynamics. All inputs produce equilibrium limit cycles or fixed-point attractors in the simplest case where all nodes are bivalent threshold functions and when the system updates all nodes at each iteration.

[Figure 2 graphic omitted: factor tree reproduced from Figure S.1, “Factors Underlying Public Support for Terrorism or Insurgency” (RAND MG1122-S.1). Legible node labels include: Public support for insurgency and terrorism; Effectiveness of organization; Motivation for supporting group or cause; Perceived legitimacy of violence; Acceptability of costs and risks.]

Figure 2. Factor-tree model that shows the relationships among factors that underlie the Public Support for Terrorism or Insurgency model in [1].

We start with the nonlinear structure of the concept nodes. The $j$th concept node $C_j$ depends at time $t_k$ on a scalar input $x_j(t_k)$ that weights and aggregates all the in-flowing causal activation to $C_j$. Then some nonlinear function $\Phi_j$ converts $x_j(t_k)$ into the concept node’s new state $C_j(t_{k+1})$ at the next discrete time $t_{k+1}$. The FCM literature explores discrete and continuous node models with a wide variety of nonlinearities and time lags [5, 6]. We present here the simplest case of a discrete FCM where each node’s current state depends on an edge-weighted inner product of the node activity:

$$C_j(t_{k+1}) = \Phi_j\left( \sum_{i=1}^{n} C_i(t_k)\, e_{ij}(t_k) + I_j(t_k) \right) \tag{1}$$

where $I_j(t_k)$ is some external or exogenous forcing value or input at time $t_k$. The simplest nonlinear function $\Phi_j$ is a hard threshold that produces bivalent or on-off concept node values:

$$C_j(t_{k+1}) = \begin{cases} 0 & \text{if } \sum_{i=1}^{n} C_i(t_k)\, e_{ij}(t_k) + I_j(t_k) \le 0 \\ 1 & \text{if } \sum_{i=1}^{n} C_i(t_k)\, e_{ij}(t_k) + I_j(t_k) > 0 \end{cases} \tag{2}$$

for a zero threshold value.

Fixing the input $I_j$ as some very large positive (or negative) value ensures that $C_j$ stays on (or off) during an inference cycle. We call this “clamping” on (or off) the $j$th concept node $C_j$. We clamp one or more concept nodes to test a given policy or forcing scenario. Clamping is the only way to drive policy or other nodes that have no causal fan-in from other concept nodes. We show below how to model the sustained presence of a shark in the dolphin FCM of Figure 1 by clamping on the fourth concept node.
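The threshold update (2) and clamping can be sketched in a few lines of plain Python. The function name `fcm_step`, the `clamp` argument, and the two-node edge matrix are hypothetical and chosen only for illustration. The sketch overrides the clamped node directly instead of adding a very large input $I_j$, which gives the same result for threshold nodes.

```python
def fcm_step(state, E, clamp=None):
    """One synchronous update of a binary FCM with zero thresholds.

    state: list of 0/1 node values C_i(t_k)
    E:     n-by-n list of lists, E[i][j] = causal edge from C_i to C_j
    clamp: optional dict {node_index: 0 or 1} of nodes held off or on
    """
    n = len(state)
    # Inner-product input x_j = sum_i C_i * e_ij for each node j.
    x = [sum(state[i] * E[i][j] for i in range(n)) for j in range(n)]
    # Hard threshold: a node turns on only if its input is positive.
    new_state = [1 if xj > 0 else 0 for xj in x]
    # Clamping overrides the thresholded value. This has the same effect
    # as a very large positive or negative external input I_j.
    for j, value in (clamp or {}).items():
        new_state[j] = value
    return new_state

# Hypothetical 2-node loop: node 0 excites node 1 and node 1 inhibits node 0.
E = [[0, 1],
     [-1, 0]]
print(fcm_step([1, 0], E))                 # free-running update
print(fcm_step([1, 0], E, clamp={0: 1}))   # node 0 clamped on
```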

Continuous-valued concept nodes often use a monotone increasing $\Phi_j$ nonlinearity such as the logistic sigmoid function. But $\Phi_j$ can also be nonmonotonic. This happens if it is a Gaussian or Cauchy probability density function. It can also be multimodal by forming a mixture of such unimodal probability curves. Then the Expectation-Maximization algorithm can tune the mixture parameters based on numerical training data [23]. Almost all concept nodes are monotonically nondecreasing in the FCM literature. The causal-influence theorem below holds for such activation functions $\Phi_j$.

The logistic causal activation gives a soft threshold that approximates the hard threshold in (2) if the shape parameter $c > 0$ is large enough:

$$C_j(t_{k+1}) = \frac{1}{1 + \exp\left( -c \sum_{i=1}^{n} C_i(t_k)\, e_{ij}(t_k) - c\, I_j(t_k) \right)} . \tag{3}$$

The first graph in Figure 3 shows a logistic function and its sigmoidal or soft-threshold shape. The other five graphs show other common causal activation functions. Logistic units are popular in causal and neural-network learning algorithms because they smoothly approximate the on-off behavior of threshold units and still have a simple partial derivative of the form

$$\frac{\partial C_j(x)}{\partial x} = c\, C_j (1 - C_j) > 0 \tag{4}$$

if

$$C_j(x) = \frac{1}{1 + \exp(-cx)} \tag{5}$$

for scaling constant $c > 0$. The positive derivative in (4) greatly simplifies many learning algorithms. We will also use it below to show the transitive product effect of the edges $e_{ij}$ in a causal inference.
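A quick numerical check of (4) and (5), offered as a sketch: it compares the closed-form derivative $c\,C_j(1 - C_j)$ with a central finite difference and shows how a large shape parameter $c$ approaches the hard threshold. The function names and the test values of $x$, $c$, and the step size are illustrative assumptions.

```python
import math

def logistic(x, c=1.0):
    """Logistic activation C(x) = 1 / (1 + exp(-c x)) as in (5)."""
    return 1.0 / (1.0 + math.exp(-c * x))

def logistic_derivative(x, c=1.0):
    """Closed-form derivative c * C * (1 - C) as in (4)."""
    C = logistic(x, c)
    return c * C * (1.0 - C)

# Compare the closed form with a central finite difference.
x, c, h = 0.4, 2.0, 1e-6
finite_diff = (logistic(x + h, c) - logistic(x - h, c)) / (2 * h)
print(finite_diff, logistic_derivative(x, c))

# A large shape parameter c pushes the outputs toward the on-off values of (2).
print(logistic(0.1, c=100.0), logistic(-0.1, c=100.0))
```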

We turn next to the causal edge values $e_{ij}$. These values are constants during most FCM inferences. The last section below shows how a version of the differential Hebbian learning law can learn and tune these causal edge values from time-series data.

The causal edge value $e_{ij}(t_k)$ in (1) measures the degree that concept node $C_i$ causes concept node $C_j$ at time $t_k$:

$$e_{ij} = \mathrm{Degree}\,(C_i \to C_j) . \tag{6}$$

These $n^2$ causal edge values define the FCM’s $n \times n$ fuzzy adjacency matrix or causal edge matrix $E$. The $i$th row lists the causal edge values $e_{i1}, e_{i2}, \ldots, e_{in}$ that flow out from $C_i$ to the other concept nodes (including to itself). The $j$th column lists the causal edge values $e_{1j}, e_{2j}, \ldots, e_{nj}$ that flow into $C_j$ from the other concept nodes. So the $i$th row defines the causal fan-out vector of concept node $C_i$. The $j$th column defines the causal fan-in vector of $C_j$. The matrix diagonal lists any causal self-excitation of the $n$ concept nodes.

We can also interpret $e_{ij}$ in terms of fuzzy subsethood [24, 25]. Then $e_{ij}$ states the degree to which the fuzzy concept set $C_i$ is a fuzzy or partial subset of fuzzy concept set $C_j$ [7]. This abstract framework implies that the edge value $e_{ij}$ is the degree to which the fuzzy concept set $C_i$ belongs to the fuzzy power set of fuzzy set $C_j$.

A probabilistic view might interpret the edge value $e_{ij}$ as the conditional probability $P(C_j | C_i)$ that $C_j$ occurs given that $C_i$ occurs. An immediate problem is that $e_{ij}$ takes on negative values in the bipolar interval $[-1, 1]$ to indicate causal decrease. There is a simple but somewhat costly way to address this. The original FCM paper [7] showed how to introduce $n$ companion dis-concepts to keep all causal edge values nonnegative and thus how to convert causal decrease into causal increase: “Extreme terrorism decreases government stability” holds just in case “Extreme terrorism increases government instability” holds. So dis-concepts negate the noun and not the adjective that modifies it. Using dis-concepts doubles the number of concept nodes and expands the edge matrix $E$ to a $2n \times 2n$ matrix. The technique does preserve more causal structure when combining multiple FCMs because then two combined edges of opposite polarity but the same magnitude do not cancel each other out.

There are two structural problems with viewing the directed (positive) edge $e_{ij}$ as the conditional probability $P(C_j | C_i)$. The first problem is that conditional probability is not transitive but causal implication is transitive. The transitive equality $P(C|A) = P(B|A)P(C|B)$ does not hold in general. A simple counter-example takes any two disjoint or mutually exclusive events $A$ and $B$ with positive joint probabilities $P(A \cap C) > 0$ and $P(B \cap C) > 0$ if all three set events have positive probability. Then $P(C|A) > 0$ but $P(B|A)P(C|B) = 0$ because $P(B|A) = \frac{P(A \cap B)}{P(A)} = 0$ since $A \cap B = \emptyset$. An even starker counter-example results if $A \subset C$ because then $P(C|A) = 1$ while $P(B|A)P(C|B) = 0$.
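The starker counter-example can be made concrete on a tiny finite probability space. The sketch below assumes a uniform measure on four outcomes and picks the events by hand. Both choices are illustrative and not from the source.

```python
from fractions import Fraction

omega = {1, 2, 3, 4}            # uniform sample space (illustrative)
A, B, C = {1}, {2, 3}, {1, 2}   # A and B are disjoint and A is a subset of C

def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(event & given) / prob(given)

lhs = cond(C, A)                # P(C|A)
rhs = cond(B, A) * cond(C, B)   # P(B|A) P(C|B)
print(lhs, rhs)
```

Exact rational arithmetic avoids any rounding question: the left side equals one while the product on the right equals zero even though $P(C|B)$ is positive.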

The second and deeper problem with a probability interpretation of $e_{ij}$ is that it collides with the Lewis Impossibility Theorem [26, 27]. This triviality result and its progeny show that we cannot in general equate the probability of the logical if-then conditional $A \to B$ with the conditional probability $P(B|A)$. The equality $P(A \to B) = P(B|A)$ holds only in the trivial case when $A$ and $B$ are independent and thus when there is no conditional relationship at all. So a probabilistic transitive equality of the form $P(A \to C) = P(A \to B)P(B \to C)$ lacks a formal foundation in general. One approach is to replace conditional probability with a more general probable equivalence relation. This gives upper and lower conditional probabilities based on the general inequality that $P(A)P(B) \ge P(A \cap B)P(A \cup B)$ [25] because then $P(B|A) \le Q(B|A)$ if $Q(B|A) = \frac{P(B)}{P(A \cup B)}$. But the resulting conditioning interval does not directly address the basic prohibition that lies behind Lewis’s triviality theorem. So a meta-level heuristic may be the best we can make of probabilistic interpretations of the directed edge $e_{ij}$. Such interpretations may be intuitive but they remain only heuristics.
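The general inequality behind the bound $Q(B|A)$ follows from a short partition argument, sketched here for completeness. The shorthand $a$, $b$, $c$ for the probabilities of the three disjoint pieces is introduced only for this sketch and is not notation from the cited source.

```latex
\begin{align*}
a &= P(A \setminus B), \quad b = P(B \setminus A), \quad c = P(A \cap B) \\
P(A)\,P(B) &= (a + c)(b + c) = ab + ac + bc + c^2 \\
P(A \cap B)\,P(A \cup B) &= c\,(a + b + c) = ac + bc + c^2 \\
P(A)\,P(B) - P(A \cap B)\,P(A \cup B) &= ab \;\ge\; 0
\end{align*}
```

Dividing $P(A \cap B)P(A \cup B) \le P(A)P(B)$ by $P(A)P(A \cup B) > 0$ then gives $P(B|A) \le P(B)/P(A \cup B) = Q(B|A)$. Equality holds exactly when $ab = 0$.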

We now show that FCM nodes influence one another through a weighted product of intervening causal edge strengths $e_{ij}$. This result describes a type of causal chaining along a directed path or summed over all such directed paths that connect two concept nodes. It depends on the transitive causal product $e_{j_1 j_2} e_{j_2 j_3} \cdots e_{j_k j_{k+1}}$.

Consider first the directed causal path from concept node $C_i$ to node $C_k$ by way of the intervening node $C_j$:

$$C_i \xrightarrow{\;e_{ij}\;} C_j \xrightarrow{\;e_{jk}\;} C_k . \tag{7}$$

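Then how does a change in the input node $C_i$ causally affect the downstream node $C_k$? The chain rule of differential calculus gives a transitive-based product answer for the logistic concept node activation in (3):

\begin{align}
\frac{\partial C_k}{\partial C_i} &= \frac{\partial C_k}{\partial C_j} \frac{\partial C_j}{\partial C_i} \tag{8} \\
&= \frac{\partial C_k}{\partial x_k} \frac{\partial x_k}{\partial C_j} \frac{\partial C_j}{\partial x_j} \frac{\partial x_j}{\partial C_i} \tag{9} \\
&= C_k(x_k)\,(1 - C_k(x_k))\, e_{jk}\, C_j(x_j)\,(1 - C_j(x_j))\, e_{ij} \tag{10} \\
&= e_{ij}\, e_{jk}\, \psi_{j,k} \tag{11}
\end{align}

using (4) - (5) if we define $\psi_{j,k} = C_j C_k (1 - C_j)(1 - C_k)$. The weighting function $\psi_{j,k} \ge 0$ is maximal when $C_j = 1 - C_j = \frac{1}{2} = C_k = 1 - C_k$ holds for the fuzzy concept nodes $C_j$ and $C_k$.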

So the induced causal effect of a change in $C_i$ depends directly on the causal-edge product $e_{ij} e_{jk}$. This causal influence decays in intensity the lesser $C_j$ or $C_k$ fires or occurs. The edge product $e_{ij} e_{jk}$ is negative if exactly one of the edge values is negative. It is positive otherwise.
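The product form (11) can be verified numerically on a three-node path. This is a minimal sketch that assumes logistic nodes with shape parameter $c = 1$, no external inputs, and no other edges into $C_j$ or $C_k$. The edge values and the operating point are illustrative.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

e_ij, e_jk = 0.8, -0.5   # illustrative edge strengths with one negative edge

def downstream(C_i):
    """Return (C_j, C_k) for the path C_i -> C_j -> C_k."""
    C_j = logistic(e_ij * C_i)
    C_k = logistic(e_jk * C_j)
    return C_j, C_k

C_i, h = 0.6, 1e-6
C_j, C_k = downstream(C_i)

# Weighting function psi_{j,k} and the analytic influence from (11).
psi = C_j * C_k * (1 - C_j) * (1 - C_k)
analytic = e_ij * e_jk * psi

# Central finite difference of C_k with respect to C_i.
numeric = (downstream(C_i + h)[1] - downstream(C_i - h)[1]) / (2 * h)
print(analytic, numeric)
```

The influence is negative here because exactly one of the two edges is negative. The weight $\psi_{j,k}$ can never exceed $1/16$ because each factor $C(1 - C)$ is at most $1/4$.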

The causal-influence result (11) extends directly to longer causal chains. Suppose there is a directed causal path of length $k$ from the initial concept node $C_{j_1}$ to the final node $C_{j_{k+1}}$:

$$C_{j_1} \xrightarrow{\;e_{j_1 j_2}\;} C_{j_2} \xrightarrow{\;e_{j_2 j_3}\;} \cdots \xrightarrow{\;e_{j_k j_{k+1}}\;} C_{j_{k+1}} . \tag{12}$$

Then the chain rule and (4) - (5) again give the influence of $C_{j_1}$ on $C_{j_{k+1}}$ as a weighted product of the intervening causal edge strengths:

$$\frac{\partial C_{j_{k+1}}}{\partial C_{j_1}} = \prod_{l=1}^{k} e_{j_l j_{l+1}} \; \psi_{j_2, j_3, \ldots, j_{k+1}} \tag{13}$$

where now the nonnegative weighting function $\psi_{j_2, j_3, \ldots, j_{k+1}}$ is the double product $\psi_{j_2, j_3, \ldots, j_{k+1}} = \prod_{l=2}^{k+1} C_{j_l} \prod_{l=2}^{k+1} (1 - C_{j_l})$. The edge product $\prod_{l=1}^{k} e_{j_l j_{l+1}}$ is positive if the number of negative edges is even. It is negative if the number of negative edges is odd. The magnitude of the change $\partial C_{j_{k+1}} / \partial C_{j_1}$ can only decrease as the causal chain lengthens. The fuzziness or partial firing of the concept nodes only exacerbates this monotone causal decay.

The causal influence in (13) still holds if we replace the logistic activation function (3) of concept node $C_j$ with an arbitrary monotonically nondecreasing function $\Phi_j$. Then $\frac{\partial C_{j_l}}{\partial x_{j_l}} \ge 0$ and so $\psi_{j_2, j_3, \ldots, j_{k+1}} \ge 0$ because the weighting function is just the product of these activation partial derivatives. This general result on FCM causal influence is important enough to state as a theorem.

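Theorem: Partial Causal Influence in Fuzzy Cognitive Maps. Suppose a fuzzy cognitive map has $n$ concept nodes $C_j$ and $n^2$ directed causal edges $e_{ij}$. Suppose further that the concept nodes have monotonically nondecreasing activations: $\frac{\partial C_j}{\partial x_j} \ge 0$ where the argument $x_j$ of $C_j(x_j)$ has the same inner-product form as in (1). Then the causal influence of the concept node $C_{j_1}$ on the downstream node $C_{j_{k+1}}$ of the length-$k$ directed causal chain

$$C_{j_1} \xrightarrow{\;e_{j_1 j_2}\;} C_{j_2} \xrightarrow{\;e_{j_2 j_3}\;} \cdots \xrightarrow{\;e_{j_k j_{k+1}}\;} C_{j_{k+1}} \tag{14}$$

is a nonnegatively weighted product of the intervening causal edge strengths $e_{j_1 j_2}, \ldots, e_{j_k j_{k+1}}$:

$$\frac{\partial C_{j_{k+1}}}{\partial C_{j_1}} = \prod_{l=1}^{k} e_{j_l j_{l+1}} \; \psi_{j_2, j_3, \ldots, j_{k+1}} \tag{15}$$

where the weighting function $\psi_{j_2, j_3, \ldots, j_{k+1}}$ has the form

$$\psi_{j_2, j_3, \ldots, j_{k+1}} = \prod_{l=2}^{k+1} \frac{\partial C_{j_l}}{\partial x_{j_l}} . \tag{16}$$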

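Proof. The result follows from iterated applications of the chain rule:

\begin{align}
\frac{\partial C_{j_{k+1}}}{\partial C_{j_1}} &= \frac{\partial C_{j_{k+1}}}{\partial C_{j_k}} \frac{\partial C_{j_k}}{\partial C_{j_{k-1}}} \cdots \frac{\partial C_{j_2}}{\partial C_{j_1}} \tag{17} \\
&= \frac{\partial C_{j_{k+1}}}{\partial x_{j_{k+1}}} \frac{\partial x_{j_{k+1}}}{\partial C_{j_k}} \frac{\partial C_{j_k}}{\partial x_{j_k}} \frac{\partial x_{j_k}}{\partial C_{j_{k-1}}} \cdots \frac{\partial C_{j_2}}{\partial x_{j_2}} \frac{\partial x_{j_2}}{\partial C_{j_1}} \tag{18} \\
&= \frac{\partial C_{j_{k+1}}}{\partial x_{j_{k+1}}}\, e_{j_k j_{k+1}}\, \frac{\partial C_{j_k}}{\partial x_{j_k}}\, e_{j_{k-1} j_k} \cdots \frac{\partial C_{j_2}}{\partial x_{j_2}}\, e_{j_1 j_2} \tag{19} \\
&= \prod_{l=1}^{k} e_{j_l j_{l+1}} \prod_{l=2}^{k+1} \frac{\partial C_{j_l}}{\partial x_{j_l}} \tag{20} \\
&= \prod_{l=1}^{k} e_{j_l j_{l+1}} \; \psi_{j_2, j_3, \ldots, j_{k+1}} . \tag{21}
\end{align}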

The theorem states only a partial causal result for just one causal path from concept node $C_{j_1}$ to $C_{j_{k+1}}$. A FCM may contain many other directed causal paths from $C_{j_1}$ to $C_{j_{k+1}}$. So the total causal change $\frac{dC_{j_{k+1}}}{dC_{j_1}}$ invokes the more general chain rule that sums over all the partial derivatives in (17) in all the paths involved. A discrete version of the theorem also holds. It requires keeping track of the discrete time steps as the causal activation flows from one node in the path to the next node.
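The sum over paths can be illustrated with the smallest case of two parallel paths. The sketch below assumes a hypothetical four-node feedforward fragment with logistic nodes ($c = 1$) and illustrative edge values. It adds the two single-path influences from the theorem and compares the total with a finite-difference estimate.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def dlogistic(x):
    """Activation derivative C(1 - C) for the logistic with c = 1."""
    s = logistic(x)
    return s * (1.0 - s)

# Hypothetical edges for the two paths C1 -> C2 -> C4 and C1 -> C3 -> C4.
e12, e13, e24, e34 = 0.9, -0.7, 0.6, 0.8

def C4(C1):
    C2 = logistic(e12 * C1)
    C3 = logistic(e13 * C1)
    return logistic(e24 * C2 + e34 * C3)

C1, h = 0.5, 1e-6
x2, x3 = e12 * C1, e13 * C1
x4 = e24 * logistic(x2) + e34 * logistic(x3)

# Each path contributes its edge product times the activation derivatives on it.
path_via_C2 = e12 * e24 * dlogistic(x2) * dlogistic(x4)
path_via_C3 = e13 * e34 * dlogistic(x3) * dlogistic(x4)
total = path_via_C2 + path_via_C3

numeric = (C4(C1 + h) - C4(C1 - h)) / (2 * h)
print(total, numeric)
```

The two paths have opposite signs in this example and so partly offset each other in the total causal change.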



We next develop a simple example of FCM inference. This example shows how a FCM answers a policy-based what-if question by converging to a limit-cycle equilibrium. The limit cycle itself is the policy answer.

Consider again the FCM fragment in Figure 1 that describes some of the predator-prey behavior of a dolphin pod in the presence of sharks or other survival threats [18]. The concept nodes are binary with threshold activations that obey (2). Bivalent nodes simplify the dynamical analysis because updating all $n$ nodes at the same time must lead to either a fixed-point attractor or a limit cycle of bit vectors.

The edges in the dolphin FCM fragment in Figure 1 are trivalent: $e_{ij} \in \{-1, 0, 1\}$. So an edge describes maximal causal increase ($e_{ij} = 1$) or maximal causal decrease ($e_{ij} = -1$) or there is no causal relationship at all ($e_{ij} = 0$). The causal edge adjacency matrix $E$ for the FCM in Figure 1 is a 5-by-5 trivalent matrix:

$$E = \begin{array}{c|rrrrr}
 & C_1 & C_2 & C_3 & C_4 & C_5 \\ \hline
C_1 & 0 & 1 & 0 & -1 & 0 \\
C_2 & 0 & 0 & 1 & 0 & -1 \\
C_3 & 0 & -1 & 0 & 1 & -1 \\
C_4 & 1 & 0 & -1 & 0 & 1 \\
C_5 & -1 & 1 & 0 & -1 & 0
\end{array} \tag{22}$$

A key argument for using trivalent edge weights $e_{ij}$ in $\{-1, 0, 1\}$ here and elsewhere is that experts may find it hard to accurately state a graded measure of causal intensity $e_{ij} \in [-1, 1]$ for a causal dependence. It is usually much easier to elicit just sign values from experts than real-valued magnitudes. Taber et al. [13] refer to this difficulty as the expert’s articulation burden. Real-valued magnitudes also tend to be less reliable. Experts are far more likely to agree on edge signs than on both signs and magnitudes. Even the same expert may state different edge-value magnitudes at different times. This articulation burden motivates averaging the trivalent-edge-valued FCMs of experts to approximate the unknown population FCM.

The stochastic convergence result in the appendix of [13] shows that averaging FCMs with trivalent edges approximates the underlying population FCM that has real edge values. FCM sample averages converge with probability one to the population average in accord with the strong law of large numbers. The underlying limit-cycle structure of the averaged FCM also appears to approximate the limit-cycle structure of the original or population FCM if the concept nodes are binary. The limit-cycle results in [13] are only preliminary simulations. So far no theoretical guarantee of limit-cycle convergence has appeared in the FCM literature.
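A small simulation can illustrate how trivalent expert responses average toward a real-valued edge. The sampling model below is an assumption made only for this sketch and is not the construction in [13]: each expert independently reports +1, −1, or 0 for one edge with fixed probabilities, so the population edge is the expected response.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed population model for a single edge: an expert reports +1 with
# probability p_plus, -1 with probability p_minus, and 0 otherwise.
p_plus, p_minus = 0.6, 0.1
population_edge = p_plus - p_minus   # expected trivalent response, here 0.5

def expert_edge():
    """Draw one expert's trivalent edge value."""
    u = random.random()
    if u < p_plus:
        return 1
    if u < p_plus + p_minus:
        return -1
    return 0

def averaged_edge(m):
    """Equal-weight average of m independent trivalent expert responses."""
    return sum(expert_edge() for _ in range(m)) / m

# The sample average should settle near the population edge as m grows.
for m in (10, 100, 10000):
    print(m, averaged_edge(m))
```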

FCM dynamics depend on the FCM’s nonlinear feedback structure. The long-run evolution of the FCM state vector $C$

$$\lim_{t \to \infty} C(t) \tag{23}$$

depends on the initial state $C(0)$ as well as on the nonlinear structure of the concept nodes and the structure of the FCM causal edge matrix $E$. Simple two-state or binary-node FCMs converge either to a fixed-point attractor

$$C^*(t+1) = \Phi(C^*(t)\, E) \tag{24}$$

or to a limit cycle of repeating bit vectors. This convergence assumes synchronous updating of all the concept nodes at each time step. This stability or convergence guarantee for binary-node FCMs follows from the general result that every square connection matrix is temporally stable [28, 12].

We now show how a limit-cycle hidden pattern occurs in the dolphin FCM in Figure 1. Suppose that a shark appears at time $t = 0$. Then the fourth or survival-threat concept node occurs or turns on. We can represent this initial state $C(0)$ of the FCM with the unit bit vector $C(0) = (0, 0, 0, 1, 0)$.

Each of the 5 concept nodes acts as a threshold function with zero threshold as in (2). So $C_k(t) = 1$ if and only if its total inner-product input $x$ is positive: $x > 0$. It otherwise equals zero and thus turns off or stays off if it is not active. Then a forward inference gives the following sequence of FCM state vectors:

$$\begin{aligned}
C(0)E &= (1, 0, -1, 0, 1) \to (1, 0, 0, 0, 1) = C(1) \\
C(1)E &= (-1, 2, 0, -2, 0) \to (0, 1, 0, 0, 0) = C(2) \\
C(2)E &= (0, 0, 1, 0, -1) \to (0, 0, 1, 0, 0) = C(3) \\
C(3)E &= (0, -1, 0, 1, -1) \to (0, 0, 0, 1, 0) = C(4) = C(0) .
\end{aligned}$$

This inference sequence defines an equilibrium 4-step limit cycle because the state vector $C(4) = (0, 0, 0, 1, 0)$ is just the initial state vector $C(0)$. So the FCM equilibrium or hidden pattern is the indefinitely repeating cycle $C(0) \to C(1) \to C(2) \to C(3) \to C(0) \to \cdots$. This cycle defines the equivalent cycle of bit vectors $(0, 0, 0, 1, 0) \to (1, 0, 0, 0, 1) \to (0, 1, 0, 0, 0) \to (0, 0, 1, 0, 0) \to (0, 0, 0, 1, 0) \to \cdots$. The repeating cycle predicts a predator-prey oscillation: The shark threat appears. Then the threatened dolphin pod clusters and runs away. Then the dolphins get tired. Then they rest. But the resting dolphins then attract a shark and so on. This limit cycle can model an incidental appearance of a shark.
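The 4-step limit cycle can be reproduced with the edge matrix (22) and the zero-threshold rule (2). The sketch below uses plain Python lists. The helper names `step` and `run_until_repeat` are hypothetical.

```python
# Causal edge matrix E from (22): row i lists the edges that leave node C_{i+1}.
E = [
    [0, 1, 0, -1, 0],
    [0, 0, 1, 0, -1],
    [0, -1, 0, 1, -1],
    [1, 0, -1, 0, 1],
    [-1, 1, 0, -1, 0],
]

def step(state):
    """Synchronous update: vector-matrix product followed by zero thresholding."""
    n = len(state)
    x = [sum(state[i] * E[i][j] for i in range(n)) for j in range(n)]
    return tuple(1 if xj > 0 else 0 for xj in x)

def run_until_repeat(state):
    """Iterate until a state recurs. Return the trajectory and the cycle start index."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory, seen[state]

# A shark appears at t = 0: only the fourth node is on.
trajectory, start = run_until_repeat((0, 0, 0, 1, 0))
cycle = trajectory[start:]
for t, s in enumerate(trajectory):
    print(t, s)
```

The trajectory returns to its initial state after four updates, so the whole trajectory is the limit cycle and there is no transient.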

Suppose instead that a shark appears and actively pursues the dolphins. We can model this what-if policy scenario by clamping the fourth node on during each update. This again amounts to adding a large positive input value for $I_4$ in (1). Clamping leads to two transient bit-vector states and then a stable 3-step equilibrium limit cycle:

$$\begin{aligned}
C(0)E &= (1, 0, -1, 0, 1) \to (1, 0, 0, 1, 1) = C(1) \quad \text{since we keep } C_4 = 1 \text{ throughout} \\
C(1)E &= (0, 2, -1, -2, 1) \to (0, 1, 0, 1, 1) = C(2) \\
C(2)E &= (0, 1, 0, -1, 0) \to (0, 1, 0, 1, 0) = C(3) \\
C(3)E &= (1, 0, 0, 0, 0) \to (1, 0, 0, 1, 0) = C(4) \\
C(4)E &= (1, 1, -1, -1, 1) \to (1, 1, 0, 1, 1) = C(5) \\
C(5)E &= (0, 2, 0, -2, 0) \to (0, 1, 0, 1, 0) = C(3) .
\end{aligned}$$

The equilibrium 3-step limit cycle is $C(3) \to C(4) \to C(5) \to C(3) \to \cdots$ or $(0, 1, 0, 1, 0) \to (1, 0, 0, 1, 0) \to (1, 1, 0, 1, 1) \to (0, 1, 0, 1, 0) \to \cdots$. The limit cycle defines and thus predicts a different form of predator-prey behavior: The shark tires the dolphin pod. The dolphins cluster in a safety maneuver. They then try to rest and still run away as they fatigue. The shark does not relent and the dolphins fatigue and so on.
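The clamped scenario can be reproduced the same way. This sketch repeats the edge matrix (22) so that it stands alone and holds the fourth node on after every threshold update. The name `CLAMPED` and the loop structure are illustrative choices.

```python
# Causal edge matrix E from (22).
E = [
    [0, 1, 0, -1, 0],
    [0, 0, 1, 0, -1],
    [0, -1, 0, 1, -1],
    [1, 0, -1, 0, 1],
    [-1, 1, 0, -1, 0],
]
CLAMPED = {3: 1}   # hold the fourth node C4 (index 3) on at every step

def step(state):
    """Threshold update followed by clamping of the selected nodes."""
    n = len(state)
    x = [sum(state[i] * E[i][j] for i in range(n)) for j in range(n)]
    new = [1 if xj > 0 else 0 for xj in x]
    for j, value in CLAMPED.items():
        new[j] = value
    return tuple(new)

state = (0, 0, 0, 1, 0)
trajectory = []
while state not in trajectory:      # stop at the first repeated state
    trajectory.append(state)
    state = step(state)

start = trajectory.index(state)     # index where the limit cycle begins
transient, cycle = trajectory[:start], trajectory[start:]
print("transient:", transient)
print("cycle:    ", cycle)
```

The `transient` list here includes the initial state $C(0)$ together with the two transient states $C(1)$ and $C(2)$. The `cycle` list holds the three states $C(3)$, $C(4)$, and $C(5)$.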

We show next how FCM models naturally combine or fuse knowledge networks from multiple experts. A group of $m$ experts can each produce an FCM causal edge matrix $E_k$ that describes their understanding of the prey system in Figure 1. A simple and powerful way to fuse these expert opinions is to take the weighted average of the panel’s knowledge base or FCMs by taking the convex combination of their edge matrices [11, 12, 17]:

$$E_m = \sum_{k=1}^{m} w_k E_k \tag{25}$$

where the weights $w_k$ are convex weights and hence nonnegative and sum to one.
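The convex combination (25) reduces to an entrywise weighted sum. The sketch below assumes the matrices are already conformable and uses equal weights when none are given. The function name `fuse` and the two small expert matrices are hypothetical.

```python
def fuse(matrices, weights=None):
    """Convex combination of equally sized FCM edge matrices (lists of lists)."""
    m = len(matrices)
    if weights is None:
        weights = [1.0 / m] * m          # equal weights as the default
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    n = len(matrices[0])
    return [[sum(w * M[i][j] for w, M in zip(weights, matrices))
             for j in range(n)] for i in range(n)]

# Two hypothetical experts who agree on one edge and disagree in sign on another.
E1 = [[0, 1], [-1, 0]]
E2 = [[0, 1], [1, 0]]

print(fuse([E1, E2]))                  # equal weights
print(fuse([E1, E2], [0.75, 0.25]))    # first expert weighted more heavily
```

With equal weights the two edges of opposite polarity cancel to zero. This is the cancellation effect that dis-concepts avoid.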

The weights $w_k$ can reflect relative expert credibility in the problem domain. So the weights can correspond to test scores or to subjective valuations or to some other measure of the experts’ predictive accuracy in prior experiments. Predd et al. [29] developed a method for aggregating expert contributions in cases where experts can abstain or be incoherent. We simply take the weights as given and use equal weights as a default.

The edge matrices $E_k$ in (25) must be conformable for addition. So they must have the same number of rows and columns and they must be in the same matrix positions. We take the union of all concept nodes from all $m$ knowledge sources. This gives a total of $n$ distinct concept nodes. We zero-pad or add rows and columns of zeros for missing nodes in a given knowledge source’s causal edge matrix. This produces a conformable $n$-by-$n$ adjacency matrix $E_k$ after appropriately permuting rows and columns to bring them in mutual coincidence with all other zero-padded augmented matrices.
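Zero-padding and permutation can be done in one pass if each source labels its rows and columns with concept names. The sketch below makes that assumption. The `augment` helper, the concept names, and the alphabetical ordering of the union are illustrative choices and not part of the source.

```python
def augment(nodes, E, all_nodes):
    """Zero-pad and permute E so that its rows and columns follow all_nodes."""
    index = {name: k for k, name in enumerate(nodes)}
    n = len(all_nodes)
    out = [[0] * n for _ in range(n)]
    for r, src in enumerate(all_nodes):
        for c, dst in enumerate(all_nodes):
            # Copy an edge only if this source knows both concepts.
            if src in index and dst in index:
                out[r][c] = E[index[src]][index[dst]]
    return out

# Two hypothetical experts with overlapping concept nodes.
nodes_a, E_a = ["threat", "rest"], [[0, -1], [1, 0]]
nodes_b, E_b = ["rest", "fatigue"], [[0, -1], [1, 0]]

all_nodes = sorted(set(nodes_a) | set(nodes_b))   # union of concept nodes
A = augment(nodes_a, E_a, all_nodes)
B = augment(nodes_b, E_b, all_nodes)
print(all_nodes)
print(A)
print(B)
```

The augmented matrices share the same node order and so are conformable for the weighted sum in (25).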

The strong law of large numbers gives some guarantees about the convergence of this fusion knowledge graph to a representative population FCM if the knowledge sources are approximately statistically independent and identically distributed and if they have finite variance [11, 13]. Then the weighted average in (25) can only reduce the inherent variance in the expert sample FCMs. So the knowledge fusion process tends to improve with sample size m.
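The variance claim can be checked for a single edge under the stated assumptions. The sketch below treats the m expert values of one edge as independent random variables with a common finite variance. The symbols $e^{(k)}_{ij}$ (expert k's value for the edge), $\bar{e}_{ij}$ (the fused value), and $\sigma^2$ are notation introduced here for illustration.

$$
\bar{e}_{ij} = \sum_{k=1}^{m} w_k\, e^{(k)}_{ij}, \qquad
\operatorname{Var}\!\left[\bar{e}_{ij}\right]
= \sum_{k=1}^{m} w_k^{2}\, \operatorname{Var}\!\left[e^{(k)}_{ij}\right]
= \sigma^{2} \sum_{k=1}^{m} w_k^{2}
\le \sigma^{2} \sum_{k=1}^{m} w_k = \sigma^{2} .
$$

The inequality holds because each convex weight satisfies $0 \le w_k \le 1$ and so $w_k^2 \le w_k$. Equal weights $w_k = 1/m$ give $\operatorname{Var}[\bar{e}_{ij}] = \sigma^2 / m$, which shrinks to zero as the expert sample grows.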

FCMs are digraph models and so resemble Bayesian belief networks (BBNs) and decision trees and factor trees. BBNs are directed acyclic probabilistic graphs that represent causal dependence among random variables. They form the basis of Pearl’s alternate model of causal inference [30].

FCMs differ from BBNs in many respects. They differ conceptually because an FCM’s nodes need not represent random variables and in practice seldom do. Partial causality is causality that occurs only to some degree. It is not a probability or bet that the cause or effect occurs all or none. More complex FCMs can superimpose such randomness on top of the fuzzy degrees of occurrence. Even then the two types of uncertainty are distinct even though they merge and produce a single real number.

FCMs differ dynamically from BBNs because most FCMs are rich nonlinear dynamical systems while BBNs are feedforward trees and have no dynamical structure. FCM dynamics and inference depend on their nonlinear concept-node functions and on their edge-based feedback cycles. More complex FCMs replace the constant edges of ordinary FCMs with nonlinear functions that vary with time. This leads to still more complex transient and equilibrium dynamics. The lack of cycles in BBNs precludes any nontrivial dynamics. But a BBN’s acyclic structure may permit finer control when propagating probabilistic beliefs. Probabilistic belief propagation is also NP-hard [31]. This complexity can impose a heavy computational burden for large BBNs. FCM inference involves only matrix-vector multiplication.

The cycles in a FCM directly model feedback causality among the concept nodes. This cyclic structure gives rise in turn to complex dynamics that range from simple fixed-point attractors and limit cycles to chaotic or aperiodic attractors in more advanced FCMs. The dynamic attractor regions partition the FCM’s state space into a finite number of such regions. Every input state converges to exactly one of these regions. The mapping from inputs to regions serves as a macro form of stored input-output associations or what-if questions and answers. FCMs do not easily permit backward chaining because of the nonlinearity of their concept nodes. A user must locate an output effect state within one of the attractor regions and then find the corresponding input cause states that map to that attractor region.
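Forward-chaining inference of this kind reduces to repeated vector-matrix multiplication and thresholding. The sketch below assumes a bivalent FCM whose nodes switch on when their summed input is positive. The function `run_fcm`, the threshold at zero, and the three-node edge matrix are hypothetical choices for illustration and do not come from the PSOT model.

```python
import numpy as np

def run_fcm(E, c0, max_steps=100):
    """Iterate C(t+1) = step(C(t) E) until a state repeats.

    Returns (transient, cycle). A fixed point is a cycle of length one.
    """
    E = np.asarray(E, dtype=float)
    state = tuple(int(x) for x in c0)
    seen = {}          # state -> first time step at which it appeared
    path = []
    for t in range(max_steps):
        if state in seen:
            start = seen[state]
            return path[:start], path[start:]
        seen[state] = t
        path.append(state)
        raw = np.asarray(state, dtype=float) @ E
        state = tuple(int(v > 0) for v in raw)   # assumed threshold at zero
    raise RuntimeError("no repeated state within max_steps")

# Hypothetical feedback loop C1 -> C2 -> C3 -> C1.
E_loop = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]

transient, cycle = run_fcm(E_loop, (1, 0, 0))
fixed_transient, fixed_cycle = run_fcm(E_loop, (1, 1, 1))
```

The first input falls on a 3-step limit cycle and the second input sits on a fixed point. Answering a backward-chaining question with this sketch would require running many candidate inputs forward and keeping those that land on the observed attractor.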

FCMs also differ from BBNs in how they combine expert knowledge sources. Trees do not naturally combine to produce a tree because cycles tend to appear among the nodes. So m BBN probability trees do not naturally combine to form a representative BBN. Such knowledge fusion need not improve with the expert or knowledge-engineer sample size m. But FCMs always combine to yield a new FCM from the matrix averaging process (25). So FCMs are closed under knowledge combination while BBNs are not. The same holds for AI search trees or any other knowledge representation structure based on acyclic graphs.

3 The PSOT Model: Public Support for Insurgency and Terrorism

The Public Support for Insurgency and Terrorism (PSOT) model [2, 1] is a factor-tree model that Davis [32] developed to describe the factors and causal pathways that influence a public’s support for insurgent or terrorist organizations and actions. The PSOT model synthesizes prior social-science research on terrorism and social movements theory [33, 2, 34]. This work has validated the PSOT model on case studies of terrorist groups. These groups include al-Qa’ida and the Taliban in Afghanistan, the Kurdistan Workers’ Party in Turkey, and the Maoists in Nepal. More recent work [1] has distilled the extensive prior social science research on the topic into a computational PSOT model. Davis’s later work [15, 14] used the PSOT model to motivate related models of an individual propensity for terrorism.

The PSOT model is a causal factor tree model because it depicts the degree to which child nodes influence or cause parent nodes. Figure 2 and Table 1 give more details on the components and structure of the PSOT factor tree. The PSOT nodes represent factors that directly or indirectly relate to the Public Support for Insurgency and Terrorism concept PSOT.

Davis’s factor tree models are multi-resolution models [35]. Major elements have a hierarchical structure that allows users to specify factors at different levels of detail. Each node is an exogenously driven factor or it fires or activates based on a function of its inputs.

There are also cross-cutting factors besides sub-node factors. Cross-cutting factors affect multiple factors simultaneously. The “and” nodes depend on all fan-in factors being present to a first approximation. The “or” nodes depend on any of the fan-in factors being active or on a combination of the fan-in factors being active. There are several top-level factors that directly relate to the general PSOT of [2]: Effectiveness of the organization EFF, motivation for supporting the group/cause MOTV, the perceived legitimacy of violence PLEG, and the acceptability of costs and risks ACR. Each of these factors has attendant contributory sub-factors.

Figure 3. Six types of FCM concept-node occurrence or activation functions: sigmoid logistic, sigmoid hyperbolic tangent, arctangent, linear, step function, and the delayed step. Each occurrence function maps into the unit interval [0, 1] and gives the degree to which the concept or policy occurs at a given moment in the causal web. Simulations used the logistic, linear, or step function.

PSOT edges denote positive influences by default. We denote negative edges with ‘-’ as with a FCM causal-decrease edge. Factor activation along a negative edge reduces the activation of the parent factor. We denote ambiguous edges with “+/−”. The ambiguity refers to uncertainty over the edge’s direction of influence.

We based our FCM models on the important case of the al-Qa’ida transnational terrorist organization.

Davis et al. [2] have discussed how the PSOT model explains the public support for al-Qa’ida’s mission as follows (paraphrased from [2]): The organizational effectiveness of al-Qa’ida depends in part on the charisma, strategic thinking, and organizational skills of its leadership (lead). al-Qa’ida has packaged and framed its ideology to appeal to many Muslims worldwide. Motivation for public support of al-Qa’ida’s beliefs comes from shared religious beliefs that stress common identity (id) and the sense of duty (duty) that such identity fosters. al-Qa’ida also relies on a popular narrative of shared grievances (shgr) in the Muslim world. al-Qa’ida plays up the perceived glory (glry) of supporting a cause that aims to redress these purported grievances. Religious beliefs and intolerance (intl) help increase the perceived legitimacy (PLEG) of violence against the West and against the many Muslims who do not share their Salafist views. Countervailing pressure (scst) discourages more support for al-Qa’ida. This countervailing pressure may occur in part because much of the public believes that al-Qa’ida is not likely to succeed and emerge as ultimate victors (lvic). This pressure in turn diminishes the acceptability of costs and risks (ACR) for al-Qa’ida activities. The parameters of this al-Qa’ida case study determined the relative causal edge weights in our FCM models.

Label   Full Description
lead    Leadership, Strategic or otherwise
pkg     Ideological Package & Framing
rsrc    Resource Mobilization
opp     Opportunism & Adaptation
pres    Presence, Tactics, & Deeds
EFF     Effectiveness of Organization
reli    Ideological Religious Concepts
socs    Social Services
glry    Glory, Excitement
ATT     Attractions
duty    Duty & Honor
rwrd    Rewards
MOTV    Motivation for Supporting Group, Cause
intl    Religious, Ideological, Ethical Beliefs; Intolerance
rvng    Revenge
cprop   Cultural Propensity for Accepting Violence
desp    Desperation, Necessity
PLEG    Perceived Legitimacy of Violence
intm    Intimidation
lvic    Assessment of Likely Victor
prsk    Personal Risk and Opportunity Cost
scst    Countervailing Social Costs & Pressures
ACR     Acceptability of Costs & Risks
id      Identity
shgr    Shared Grievances & Aspirations
ugb     Unacceptable Group Behavior
env     Environmental Factors
impl    Impulses, Emotions, Social Psychology
hsucc   History of Successes
mgtc    Management Competence
prop    Propaganda, Advertising
efdoc   Effectiveness of Indoctrination/Passing Beliefs
hfail   History of Failures
PSOT*   Public Support for Insurgency and Terrorism

Table 1. Table of factors in the Public Support for Insurgency and Terrorism (PSOT) model.



3.1 The FCM-PSOT Model

We cast the PSOT model as a FCM based on the original PSOT factor tree model. Then the directed edges of the factor tree became directed edges in the cognitive map. The signs of the factor-tree links determined the signs of the FCM edges. The ambiguous factor tree links (labeled “+/−” in Figure 2) defined weak bidirectional dependence between pairs of factors in the FCM. Factor tree nodes use “or” or “and” to aggregate their fan-in input signals. We used these different PSOT combination functions to specify analogous combination functions in the FCM. Factor tree nodes aggregate inputs by using functions from a predefined set of functions (Tables 2.3 and 2.4 in [1]). Nodes in a FCM apply nonlinear occurrence or activation functions to weighted linear combinations of their inputs as discussed above. The FCM-PSOT model used step functions (delayed or otherwise), logistic sigmoids, and clamped linear activation functions.

The left panel of Figure 4 shows the direct FCM translation of the original PSOT model. The left panel of Figure 6 shows the intensity plot of the causal edge connection matrix for the FCM translation of the original PSOT model. The FCM-PSOT models in Figure 4 retain the general PSOT structure. But the edge values are specific to the case of al-Qa’ida as researchers have reported in Figure 2.4 of [1] and Figure S.2 of [2].

3.2 The Dynamic FCM-PSOT Model

The PSOT factor tree gives a static snapshot of the state of public support for insurgency and terrorism. This makes the model useful for causal attribution at single points in time. This first-order static model does not require that we correctly identify causal cross-linkages among the factors. But the absence of cross-linkages can make long-run dynamic simulations misleading. An FCM with no feedback loops converges in at most L steps where L is the length of the longest chain in the FCM. Such feedforward or feedback-free models rarely give an accurate model of the real world and its causal interconnections. A dynamic time-varying PSOT model would need to identify such cross-linkages to track system behavior over time. A dynamic model gives causal attribution at snapshots in time and the ability to simulate long-run what-if scenarios.
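The L-step bound for feedback-free FCMs can be illustrated on a small acyclic map. The sketch below is a hedged example: the four-node edge matrix is hypothetical, the nodes use a step function with an assumed threshold at zero, and the source node is held on as a clamped policy input. The helper names `longest_chain` and `steps_to_fixed_point` are introduced here only for illustration.

```python
import numpy as np

def longest_chain(E):
    """Length in edges of the longest directed path in an acyclic FCM."""
    A = (np.asarray(E) != 0).astype(int)      # boolean adjacency pattern
    n = A.shape[0]
    power = np.eye(n, dtype=int)
    length = 0
    for k in range(1, n + 1):
        power = (power @ A > 0).astype(int)   # marks walks of length k
        if not power.any():
            return length                     # no walk of length k exists
        length = k
    raise ValueError("the FCM has a feedback cycle")

def steps_to_fixed_point(E, c0, clamped, max_steps=50):
    """Count the updates needed before a bivalent FCM stops changing."""
    E = np.asarray(E)
    state = np.array(c0, dtype=int)
    for t in range(1, max_steps + 1):
        nxt = (state @ E > 0).astype(int)     # assumed threshold at zero
        nxt[clamped] = state[clamped]         # hold the input nodes fixed
        if np.array_equal(nxt, state):
            return t - 1, state
        state = nxt
    raise RuntimeError("no fixed point within max_steps")

# Hypothetical feedforward map: a -> b -> c -> d plus a shortcut a -> c.
E_tree = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [0, 0, 0, 0]])

L = longest_chain(E_tree)
steps, final_state = steps_to_fixed_point(E_tree, [1, 0, 0, 0], clamped=[0])
```

The shortcut edge lets the activation reach the last node in fewer updates than the longest chain would require, so the number of updates stays within the bound L.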

Causal cross-linkages specify how sets of two or more factors co-vary in time. And they do so based on causal relationships. Causal cross-linkages are difficult to specify without insight into the causal laws that guide the related factors. Domain experts are the main source of these causal relations. The PSOT model relies on domain experts, extensive social science research, and validation to establish the snapshot relationships presented [2, 1, 33]. Specifying new causal cross-links in the model will require more such inputs from experts.

Our goal was to produce a dynamic causal model of public support for insurgency and terrorism in which new edges model covariation in time among model factors. The new edges needed grounding in subject-matter expertise. So we reviewed prior work on PSOT for information on factor covariation. We also consulted with PSOT authors and experts on the PSOT model for guidance on the new causal edges that we added. The new causal edges transformed the PSOT model from a static snapshot model into a dynamic simulation model. Figure 4 shows FCM versions of the old static and new dynamic PSOT model.

We now outline these changes to the original PSOT model. We first added a weak self-excitation feedback loop on the PSOT concept node because it is the highest-level concept node. This self-excitation loop modeled inertia in aggregate public opinion about insurgency and terrorism. This new feedback source induced a weak serial correlation in time in the PSOT concept node.

The next directed weak edges connected the top-level factors in Figure 2 from left to right: EFF → MOTV, MOTV → PLEG, and PLEG → ACR. These directed causal edges made explicit an implicit point about O’Mahony and Davis’s use of factor trees. Their factor-tree representation assumed a left-to-right dependence of the top-level factors that we have linked [32, 1]. This implicit dependence made their factor tree more readable. The FCM model made this dependence explicit.

O’Mahony and Davis [1] discuss other dynamic augmentations to the PSOT model. They point to the following new factors. A history of successes or failures can affect motivation and perceived risks. We model this dependence with the two factors “history of successes” and “history of failures.” These two nodes exert opposing influence on MOTV and prsk. We split this history factor because traditional FCM models admit only positive values that represent the degree or intensity to which a concept occurs. And the effectiveness of the organization factor EFF partly determines the history of successes: EFF → hsucc. Unacceptable group behavior ugb also influences motivation and effectiveness: ugb → MOTV and ugb → EFF.

4 Simulation Experiments and Discussion

We first compare the behavior of the FCM-PSOT and dynamic FCM-PSOT models in the previous Sections 3.1 and 3.2. Then we examine methods for adaptively updating our fuzzy causal maps by using expert opinion or hard data or by using both. Factor tree models do not have access to these update methods.

4.1 Comparing the Static and Dynamic FCM-PSOT Models

The PSOT FCM mimics the behavior of the original PSOT factor tree model because it is a direct FCM version of the factor tree model. This suggests that FCM models may be a richer class of models than factor tree models because they act as feedback-laden supersets of tree or acyclic models. Further causal analyses may well find classes of factor trees that FCM models cannot easily capture.

Augmenting the PSOT with cross-links in the dynamic model allows richer representation of system dynamics. The dynamic and static FCM models agree on many inference tasks. Both models often converged to similar fixed points or limit cycles given the same initial conditions. But examples of diverging behavior did appear. Network-analytic measures such as vertex degree and vertex centrality on the FCMs helped induce such divergent behavior.

Figure 4. Two FCM implementations of the PSOT factor-tree model. The left panel shows the FCM digraph for the original static (acyclic) PSOT model. The right panel shows the FCM digraph of the dynamic PSOT model with cross-links. Table 1 gives the key for the concept node labels in both FCMs. We based the new FCM edges in the right digraph on the findings in [1] or on expert input.

Consider a hypothetical al-Qa’ida-like insurgent group that has similar PSOT model weights. Call the group the Salafist United (SU). SU’s leadership is charismatic and competent as well as effective. Imagine Osama bin Laden or Abdullah Ocalan of the Kurdistan Workers’ Party with careful ideological framing of their group’s message and cause. This might be Salafism itself. We may assume also that those already inclined towards the group are culturally and ideologically comfortable with SU’s violence. The group is embedded in a community that shares the group’s strong Muslim identity. And SU’s militant jihadi framing makes their cause attractive to many Muslims.

Suppose that SU has a history of failed operations despite its effective organization. Suppose further that the group has not made good use of political opportunities. The public believes that SU has a good chance of success. But SU routinely intimidates the public with violent or threatening tactics. These tactics impose both social costs and personal risks on many members of the community. The group can bring only limited money and labor to bear on their campaign.

The scenario gives the following coding for the model. The factors lead, hfail, pkg, pres, MOTV, cprop, intl, intm, and lvic remain active throughout the evolution of this scenario. The factors rsrc, opp, prsk, and scst remain inactive during the simulation.
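A scenario coding of this kind amounts to clamping the listed factors at each update while the remaining nodes evolve freely. The sketch below shows the mechanics on a six-node toy map with logistic nodes. Everything specific in it is an assumption made for illustration: the edge weights, the neutral starting value of 0.5, the logistic gain, and the helper `run_scenario` are hypothetical and are not the weights or code of the FCM-PSOT model.

```python
import numpy as np

def logistic(x, gain=1.0):
    """Logistic occurrence function that maps real inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-gain * x))

def run_scenario(E, labels, active, inactive, tol=1e-6, max_steps=500):
    """Iterate a logistic FCM with clamped active (1) and inactive (0) nodes."""
    idx = {name: k for k, name in enumerate(labels)}
    clamp = {idx[name]: 1.0 for name in active}
    clamp.update({idx[name]: 0.0 for name in inactive})
    state = np.full(len(labels), 0.5)          # assumed neutral start
    for k, v in clamp.items():
        state[k] = v
    for step in range(1, max_steps + 1):
        nxt = logistic(state @ E)
        for k, v in clamp.items():             # re-impose the scenario coding
            nxt[k] = v
        if np.max(np.abs(nxt - state)) < tol:
            return dict(zip(labels, nxt)), step
        state = nxt
    raise RuntimeError("no fixed point within max_steps")

labels = ["lead", "rsrc", "EFF", "intm", "ACR", "PSOT"]
pos = {name: k for k, name in enumerate(labels)}
E_toy = np.zeros((6, 6))
# Hypothetical edge weights chosen only to make the example run.
E_toy[pos["lead"], pos["EFF"]] = 0.25
E_toy[pos["rsrc"], pos["EFF"]] = 0.25
E_toy[pos["EFF"], pos["PSOT"]] = 0.25
E_toy[pos["intm"], pos["ACR"]] = -0.25
E_toy[pos["ACR"], pos["PSOT"]] = 0.25
E_toy[pos["PSOT"], pos["PSOT"]] = 0.10         # weak self-excitation loop

result, n_steps = run_scenario(E_toy, labels,
                               active=["lead", "intm"], inactive=["rsrc"])
```

The clamped nodes keep their coded values at every step, so they act as sustained policy or scenario inputs rather than as one-time initial conditions.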

Both the static and dynamic FCM-PSOT models unfolded in time through the stated scenario constraints and initial conditions. The concept nodes used logistic activation functions. And both models converged to fixed-point attractors instead of to limit cycles. The static model converged in 4 iterations to a fixed state that predicted little public support for SU. The dynamic FCM-PSOT converged in 11 iterations to a fixed state that predicted medium-to-high public support for SU. We also started the models from random initial states under the same constraints. Most of these perturbations died out before the FCMs converged to one of the fixed points.

The dynamic model predicts that a violent terrorist group can retain public support in a community that shares its religious or ideological beliefs and cultural propensities. It can retain that support despite a history of failure or a lack of resources. It can do so at a high cost or even violence to the community in which the group acts. The static factor-tree-based model opposes this finding.

There is no easy way to validate the FCM-PSOT models because there is no clear ground-truth for such hypothetical simulations. But a good causal model can still give pattern predictions. The model can help with exploratory analysis and detecting trends.

Consider how the dynamic FCM-PSOT model can help explore the effects of transient events. We omit the detailed vector-matrix operations and just qualitatively describe the resulting equilibrium FCM limit cycle.

Suppose the public does not support an insurgent group’s cause (MOTV = 0). The group seems unlikely to succeed (lvic = 0). The public does not find the costs and risks of supporting this losing insurgency acceptable (ACR = 0).

Suppose now that some short-term event or shock occurs that strongly motivates the public to support the insurgency’s cause (MOTV → 1) and that causes the public to believe that the group could win (lvic → 1). Then the FCM-PSOT model predicts that the cost and risk of support will fall enough to become acceptable (ACR = 1). This leads the group to become bolder in its use of intimidation tactics (intm → 1). But the group’s gains will be short-lived. There will be no widespread public support for the insurgency if the shocking event is too short-lived to sustain public motivation for the cause (MOTV → 0 and lvic → 0). The public will once again find that the cost of support is too high. But suppose instead that the public motivation and the assessment of the likely victor remain steady (MOTV = 1 and lvic = 1). This could reflect an abusive government in rapid decline. Then the cost and risk will stay acceptable (ACR = 1). The public will eventually have sustained support for the insurgency (PSOT → 1).

We can also argue for the value of the dynamic FCM model’s findings by identifying insurgent organizations that resemble those in our scenario and that have managed to maintain public support. Resource mobilization problems pose a common concern for smaller extra-legal groups such as Uganda’s Lord’s Resistance Army. Online radicalization appears to be replacing such resource mobilization problems. A related concern is a history of failure and the use of intimidation and violence against the locals as with Colombia’s FARC rebels.

This argument by analogy is not itself a robust method of validation. It requires a comprehensive analysis of a larger sampling of terrorist and insurgent operations. The main way to build confidence in FCM inferences is to carefully build the causal knowledge model from representative evidence and then compare these what-if inferences or predictions with observed patterns or outcomes.

[Plot: “Adaptively Learned Edge Weight” for the edge from lvic to ACR. Horizontal axis: iteration. Vertical axis: edge value $e_{ij}$. The dashed line is the true edge weight.]

Figure 5. Learning FCM causal-edge values $e_{ij}$ with time-series data from activated causal concept nodes. The directed causal edge lvic → ACR between Assessment of likely victor and Acceptability of costs and risks is uncertain in the original PSOT model. We can infer the value of this directed causal edge with adaptive inference algorithms such as differential Hebbian learning if we have access to the time-series data history of both concept nodes. The time-series data may come from survey data or field measurements or expert elicitations. The plot shows that differential Hebbian learning quickly converged to a good approximation of the edge value $e_{ij}$.

4.2 Causal Knowledge Updating by Averaging Expert Responses

The tree-structure of the original PSOT model implies that it inherits a key structural limitation of AI decision trees. There is no easy or natural way to combine or fuse several expert trees into a representative knowledge structure that is still a tree. Cycles too easily appear in general as the number or sample size m of fused experts increases. Combining even a small number of expert trees is likely to produce some cycles and hence feedback loops in the combined knowledge structure.

Some form of ad hoc cycle clipping must ensure that the combined trees produce a tree. But removing causal cycles removes some of the very expert knowledge that the tree structure tries to capture. And it does so solely to maintain the tree structure. FCM models are in this sense at least as expressive as factor tree models. And again they benefit from the strong law of large numbers if the combined experts behave at least approximately as independent and identically distributed knowledge sources [11, 13].

Figure 7 shows how such model updates can occur. The mixture or convex combination of FCMs creates a new fused FCM as the weighted average of the FCMs’ augmented edge matrices. Users can add new factors at will. Each new factor converts the n-by-n adjacency matrices into (n+1)-by-(n+1) adjacency matrices. This amounts to adding a new zero-padded row and column to an adjacency matrix if its corresponding FCM does not include the factor as a concept node.

This fusion averaging technique may not directly account for effects such as active sabotage or extreme variance in expert opinions. Highly variable expert inputs will tend to produce a highly variable FCM causal knowledge base. There may be no benefit from combining expert edge values that approximate thick-tailed probability densities. Cauchy probability bell curves closely resemble normal probability bell curves. Cauchy bell curves have slightly thicker tails that give rise to far more variable realizations. But the sample average of Cauchy random variables is itself a Cauchy random variable. So there is no benefit or decrease in system variance whatsoever in this thick-tailed case. The combined result has the same infinite variance that any one of the individual Cauchy samples has. Combining knowledge sources with even thicker-tailed probability densities can produce variability even more extreme than the variability of any of the combined knowledge sources.
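The Cauchy claim follows from a short characteristic-function argument. The sketch below assumes independent expert values $X_1, \ldots, X_m$ that share one Cauchy density with location $x_0$ and scale $\gamma$. These symbols and the characteristic function $\varphi$ are notation introduced here for illustration.

$$
\varphi_{X}(t) = \exp\!\left(i x_0 t - \gamma |t|\right), \qquad
\varphi_{\sum_k w_k X_k}(t) = \prod_{k=1}^{m} \varphi_{X}(w_k t)
= \exp\!\left(i x_0 t \sum_{k=1}^{m} w_k - \gamma |t| \sum_{k=1}^{m} w_k\right)
= \exp\!\left(i x_0 t - \gamma |t|\right) .
$$

The last step uses the convex weights $w_k \ge 0$ with $\sum_k w_k = 1$. So the fused value has exactly the same Cauchy density as any single expert value and its dispersion $\gamma$ does not shrink with m. This contrasts with the finite-variance case where equal-weight averaging divides the variance by m.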

Model averaging for FCMs helps update our knowledge of causal edge values. But we may also want to update our knowledge of the causal concepts $C_k(t)$. We can base this on data from expert opinion surveys, from direct time-series data on measurable factors, or from indirect instrumental variables linked to the factors of interest. Such instrumental variables include social media trends, Google trends, and topic modeling on news corpuses.

4.3 Learning Causal Structure with Differential Hebbian Learning

We draw a learning distinction between factor correlation and factor covariation. Consider first factor correlation.

Directed causal edges induce correlations between linked factors. These correlations themselves need not indicate a causal dependence. We can estimate these correlations from observations given enough samples and effort. This estimation of correlative links requires time-series data about variation in the factors. The two learning approaches below assume for simplicity that there are no time delays between factors. Users can easily insert such time delays as needed.

Figure 6. Graphical display of the causal-edge connection matrices E for the original static PSOT FCM on the left and for the dynamical PSOT FCM on the right. Each FCM’s causal-edge or connection matrix E is the adjacency matrix for the FCM’s fuzzy signed directed graph. Each square shows the fuzzy causal edge value $e_{ij}$ that denotes how much the ith concept $C_i$ influences the jth concept $C_j$. The matrix entries $e_{ij}$ in these FCMs are fuzzy values in the bipolar interval [−1, 1]. Blue squares represent negative causal influence as the color bars indicate. Orange squares represent positive causal influence. White squares represent the absence of causal influence. These matrix intensity plots are larger scale analogues of the matrix in Equation (22) but for a larger set of concepts. The dynamic FCM-PSOT is marginally less sparse than the static FCM-PSOT because the dynamic FCM-PSOT asserts more directed causal edge values between factors.

Figure 7. FCM knowledge combination or fusion by averaging weighted FCM adjacency matrices. The three digraphs show the minimal case of combining two FCMs that have overlapping concept nodes. The third FCM is the weighted combination of the first two FCMs. The expert problem domain is the medical problem of strokes and blood clotting involving Virchow’s Triad: blood stasis stas, endothelial injury inju, and hypercoagulation factors HCP and HCF [13]. Expert 1 has a larger FCM than Expert 2 has because Expert 1 uses an extra concept node. We can fuse their knowledge webs by averaging their causal-edge adjacency matrices with (25). This weighted average uses each expert’s causal edge matrix: $E = \frac{2}{3} E_1 + \frac{1}{3} E_2$ as shown in the combined (third) FCM. The weighting assumes that the first expert is twice as credible as the second expert. Note that Expert 2 ignored the HCF factor. This results in a new row and column of all zeroes in $E_2$.

We can learn causal edge strengths through the concomitant activation among the factor pairs. This approach assumes that events (factor activities) are more likely to involve a causal connection if the events occur together [36, 12, 37]. This suggests the well-known Hebbian correlation learning law (neurons that fire together wire together) for training neural network synaptic weights [12]:

$$\dot{e}_{ij} = -e_{ij} + C_i C_j \tag{26}$$

where $\dot{x}$ denotes the time derivative of the signal x. The passive decay term $-e_{ij}$ stabilizes the learning in the differential-equation model. It also models a “forgetting” constraint that helps the network prune inactive connections. The product term $C_i C_j$ directly models concomitant correlation.

We can alternatively use concomitant variation [38] in time between factors as partial evidence of a causal relation between those factors. Suppose that the data indicate that increases in factor $C_i$ occur at the same time as increases in the factor $C_j$. This concomitant increase suggests that the edge value $e_{ij}$ should be positive. Concomitant decreases in $C_i$ and $C_j$ likewise suggest a positive edge value. Suppose instead that increases in $C_i$ occur with decreases in $C_j$. Then such opposed variation suggests a negative causal edge value $e_{ij}$. (Even a slight time lag between the two concept nodes can indicate the direction of causality in practice.) Such concomitant variation leads to the differential Hebbian learning (DHL) law [37, 12, 11]:

$$\dot{e}_{ij} = -e_{ij} + \dot{C}_i \dot{C}_j . \tag{27}$$

Both Hebbian learning and DHL can learn causal edge values in a FCM. But Hebbian learning grows spurious causal relations between any two concept nodes that occur at the same time. This quickly leads to a matrix of nearly all unity values if most of the nodes are active. DHL correlates node velocities and thus has a type of arrow of time built into it. DHL correlates the signs of the time derivatives. So it grows a positive causal edge value $e_{ij}$ if and only if the concept nodes $C_i$ and $C_j$ both increase or both decrease. It grows a negative edge value if and only if one of the nodes increases and the other decreases.
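A forward-Euler integration of one edge makes the contrast concrete. The sketch below is an illustration under stated assumptions: the step size, the finite-difference estimate of the node velocities, the synthetic node signals, and the helper `integrate_edge` are all choices made here and are not taken from the paper's simulations.

```python
import numpy as np

def integrate_edge(Ci, Cj, dt, differential):
    """Euler-integrate one edge from e = 0.

    differential=False uses the Hebbian signal Ci * Cj.
    differential=True uses the DHL signal dCi/dt * dCj/dt.
    """
    Ci = np.asarray(Ci, dtype=float)
    Cj = np.asarray(Cj, dtype=float)
    dCi = np.gradient(Ci, dt)      # finite-difference node velocities
    dCj = np.gradient(Cj, dt)
    e = 0.0
    for k in range(len(Ci)):
        signal = dCi[k] * dCj[k] if differential else Ci[k] * Cj[k]
        e += dt * (-e + signal)    # passive decay plus learning signal
    return e

dt = 0.01
on = np.ones(1000)                 # two nodes that stay fully active
ramp = np.linspace(0.0, 1.0, 1000) # a node that rises steadily

hebb_const = integrate_edge(on, on, dt, differential=False)
dhl_const = integrate_edge(on, on, dt, differential=True)
dhl_same = integrate_edge(ramp, ramp, dt, differential=True)
dhl_opposite = integrate_edge(ramp, 1.0 - ramp, dt, differential=True)
```

The Hebbian edge between two constantly active nodes grows toward unity even though neither node changes. The DHL edge stays at zero in that case, turns positive when both nodes rise together, and turns negative when one rises while the other falls.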

We can combine both learning laws to give a more general version of DHL [11]:

$$\dot{e}_{ij} = -e_{ij} + C_i C_j + \dot{C}_i \dot{C}_j . \tag{28}$$

This hybrid learning law fills in expected values for edge weights when there is no signal variation in the factor set [39]. It takes advantage of the relatively rarer variation events to update the edge weights. It tends to produce limit cycles or other attractors. It can produce fixed-point attractors given some strong mathematical assumptions [11, 12].

We can use the DHL learning scheme to infer causal weights in FCM-PSOT models if we have access to adequate time-series data. Such data can again come from expert opinion surveys, from direct time-series data on measurable factors, or from indirect instrumental variables linked to the factors of interest: social media trends, Google trends, or topic modeling on news corpuses. Figure 5 shows a DHL learning path for a single causal edge value. The algorithm used data under a hypothetical relationship for an uncertain link lvic → ACR in the PSOT model. Complete concept-node data let the DHL algorithm learn the causal edge matrix E. We found that DHL training gave a close approximation of the true causal edge values after only a few iterations.

Our simulations with adaptive FCMs used the followingdiscretized version of the DHL learning law [9] in (27):

eij(t+ 1) =

{eij(t) + µ [∆Ci(t)∆Cj(t)− eij(t)] ∆Ci(t) 6= 0

eij(t) else(29)

where ∆Ck(t) = Ck(t)− Ck(t− 1).We note last that we can fuse soft and hard knowledge

sources through the above averaging technique in (25). Let

Edata denote the data-driven FCM. Let Eexp denote theexpert-elicited FCM. Then the fused causal edge matrixEfusion is a simple mixture of the two edge matrices:

Efusion = ωdataEdata + ωexpEexp . (30)

Then (29) or some other statistical learning law can continue the adaptation process by using new numerical data or occasional opinion updates from experts.
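The discretized update (29) and the mixture (30) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions and not the code behind the reported simulations: the three-node toy time series, the learning rate `mu = 0.1`, the expert matrix `E_exp`, and the equal mixing weights are all hypothetical. The sketch also skips self-edges ($i = j$), which is a choice of this example and not something (29) states.

```python
def dhl_step(E, c_prev, c_now, mu=0.1):
    """Apply one step of the discretized DHL law (29) and return a new matrix.

    E[i][j] holds the causal edge value from concept node C_i to C_j.
    """
    n = len(E)
    delta = [now - prev for prev, now in zip(c_prev, c_now)]
    E_next = [row[:] for row in E]
    for i in range(n):
        if delta[i] == 0:
            continue  # "else" branch of (29): no change in C_i, keep e_ij(t)
        for j in range(n):
            if i != j:  # assumption of this sketch: leave self-edges alone
                E_next[i][j] = E[i][j] + mu * (delta[i] * delta[j] - E[i][j])
    return E_next


def fuse(E_data, E_exp, w_data=0.5, w_exp=0.5):
    """Mix a data-driven and an expert-elicited edge matrix as in (30)."""
    return [
        [w_data * d + w_exp * x for d, x in zip(row_d, row_x)]
        for row_d, row_x in zip(E_data, E_exp)
    ]


# Hypothetical time series: C_0 toggles, C_1 moves the opposite way,
# and C_2 never changes.
states = [[t % 2, 1 - t % 2, 1] for t in range(101)]

E_data = [[0.0] * 3 for _ in range(3)]
for prev, now in zip(states, states[1:]):
    E_data = dhl_step(E_data, prev, now)

# Hypothetical expert opinion: a moderate negative edge from C_0 to C_1.
E_exp = [[0.0, -0.5, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]

E_fusion = fuse(E_data, E_exp)
```

The learned edge from $C_0$ to $C_1$ approaches $-1$ because the two nodes always move in opposite directions. The row for the unchanging node $C_2$ stays at its initial values. The fused edge lands between the data-driven value and the expert value in proportion to the mixing weights.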

5 Conclusion

We developed static and dynamic FCM versions of a factor-tree model of public support for insurgency and terrorism. The FCM models allow forward-chaining causal inference as well as updates based on numerical training data or expert opinions. Their underlying matrix structure permits natural knowledge fusion that tends to improve with the number of combined experts.

Current FCM learning techniques involve two major limitations. FCMs do not easily permit backward chaining to answer which input cause produced an observed output effect. Users cannot simply run the FCM in reverse because of the node nonlinearities. The best that current techniques allow is to find one of the many input states that map to an observed output equilibrium. Future research needs to address this limitation with new inferencing or other techniques. The second limitation is even more challenging. Current adaptive techniques infer and tune causal edge values only for a known set of concept nodes. An open research question is how to use data-based techniques to infer new or missing concept nodes in large-scale FCM causal models.
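A toy example illustrates why backward chaining is ambiguous. The sketch below assumes binary concept nodes with a threshold at zero and the synchronous update $C_j(t+1) = 1$ if $\sum_i C_i(t) e_{ij} > 0$ and $0$ otherwise. The three-node edge matrix is hypothetical and is not the PSOT model. The code runs every initial state forward and groups the initial states by the fixed point they reach.

```python
from itertools import product


def fcm_step(state, E):
    """One synchronous update of a binary threshold FCM (assumed dynamics)."""
    n = len(E)
    return tuple(
        1 if sum(state[i] * E[i][j] for i in range(n)) > 0 else 0
        for j in range(n)
    )


def equilibrium(state, E, max_steps=50):
    """Return the fixed point reached from `state`, or None if none is found."""
    for _ in range(max_steps):
        nxt = fcm_step(state, E)
        if nxt == state:
            return state
        state = nxt
    return None  # limit cycle or no convergence within max_steps


# Hypothetical edge matrix: E[i][j] is the edge from C_i to C_j.
E = [[0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]

# Group every possible initial state by the equilibrium it reaches.
preimages = {}
for start in product((0, 1), repeat=3):
    preimages.setdefault(equilibrium(start, E), []).append(start)
```

In this toy map seven of the eight initial states reach the same fixed point $(1, 1, 1)$. Observing that equilibrium alone cannot tell us which of those seven inputs produced it.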



References

[1] Paul K Davis and Angela O’Mahony. A computational model of public support for insurgency and terrorism: A prototype for more-general social-science modeling. RAND Corporation, 2013.

[2] Paul K Davis, Eric V Larson, Zachary Haldeman, Mustafa Oguz, and Yashodhara Rana. Understanding and Influencing Public Support for Insurgency and Terrorism. RAND Corporation, 2012.

[3] Raymond Ibrahim. The Al Qaeda Reader: The Essential Texts of Osama Bin Laden’s Terrorist Organization. Broadway Books, 2007.

[4] Maajid Nawaz. Radical: My journey from Islamist extremism to a democratic awakening. Random House, 2015.

[5] Michael Glykas, editor. Fuzzy Cognitive Maps. Springer, 2010.

[6] E.I. Papageorgiou. Fuzzy Cognitive Maps for Applied Sciences and Engineering: From Fundamentals to Extensions and Learning Algorithms. Intelligent Systems Reference Library. Springer Berlin Heidelberg, 2013. ISBN 9783642397394. URL https://books.google.com/books?id=S3LGBAAAQBAJ.

[7] Bart Kosko. Fuzzy cognitive maps. International Journal of man-machine studies, 24(1):65–75, 1986.

[8] Bart Kosko. Fuzzy systems as universal approximators. IEEE transactions on computers, 43(11):1329–1333, 1994.

[9] B. Kosko. Fuzzy Engineering. Prentice Hall, 1996.

[10] Osonde Osoba, Sanya Mitaim, and Bart Kosko. Bayesian inference with adaptive fuzzy priors and likelihoods. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(5):1183–1197, 2011.

[11] Bart Kosko. Hidden patterns in combined and adaptive knowledge networks. International Journal of Approximate Reasoning, 2(4):377–393, 1988.

[12] B. Kosko. Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice Hall, 1991.

[13] Rod Taber, Ronald R Yager, and Cathy M Helgason. Quantization effects on the equilibrium behavior of combined fuzzy cognitive maps. International Journal of Intelligent Systems, 22(2):181–202, 2007.

[14] Paul K Davis, David Manheim, Walter L Perry, and John S Hollywood. Causal Models and Exploratory Analysis in Heterogeneous Information Fusion for Detecting Potential Terrorists. RAND Corporation, 2015.

[15] Paul K Davis, David Manheim, Walter L Perry, and John Hollywood. Using causal models in heterogeneous information fusion to detect terrorists. In 2015 Winter Simulation Conference (WSC), pages 2586–2597. IEEE, 2015.

[16] Bart Kosko. Fuzzy knowledge combination. International Journal of Intelligent Systems, 1(4):293–320, 1986.

[17] Rod Taber. Knowledge processing with fuzzy cognitive maps. Expert Systems with Applications, 2(1):83–87, 1991.

[18] J. A. Dickerson and B. Kosko. Virtual worlds as fuzzy cognitive maps. Presence, 3(2):173–189, 1994.

[19] L. A. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.

[20] Lotfi A. Zadeh. Outline of a new approach to the analysis of complex systems and decision analysis. IEEE Transactions on Systems, Man, and Cybernetics, 3(1):28–44, 1973.

[21] Bart Kosko and Satoru Isaka. Fuzzy logic. Scientific American, 269(1):62–7, 1993.

[22] Lotfi Asker Zadeh. Probability measures of fuzzy events. Journal of mathematical analysis and applications, 23(2):421–427, 1968.

[23] Osonde Osoba and Bart Kosko. The noisy expectation-maximization algorithm for multiplicative noise injection. Fluctuation and Noise Letters, page 1650007, 2016.

[24] Bart Kosko. Fuzzy entropy and conditioning. Information sciences, 40(2):165–174, 1986.

[25] B. Kosko. Probable Equality, Superpower Sets, and Superconditionals. International Journal of Intelligent Systems, 19:1151–1171, 12 2004.

[26] David Lewis. Probabilities of conditionals and conditional probabilities. The Philosophical Review, 85:297–315, 1976.

[27] David Lewis. Probabilities of conditionals and conditional probabilities–part two. The Philosophical Review, 95(4):581–589, 1986.

[28] Bart Kosko. Bidirectional associative memories. IEEE Transactions on Systems, Man, and Cybernetics, 18(1):49–60, 1988.

[29] Joel B Predd, Daniel N Osherson, Sanjeev R Kulkarni, and H Vincent Poor. Aggregating probabilistic forecasts from incoherent and abstaining experts. Decision Analysis, 5(4):177–189, 2008.

[30] Judea Pearl. Causality. Cambridge university press, 2009.

[31] Gregory F Cooper. The computational complexity of probabilistic inference using bayesian belief networks. Artificial intelligence, 42(2-3):393–405, 1990.

[32] Paul K Davis. Primer for building factor trees to represent social-science knowledge. In Proceedings of the Winter Simulation Conference, pages 3121–3135. Winter Simulation Conference, 2011.

[33] Paul K Davis and Kim Cragin. Social science for counterterrorism: Putting the pieces together. RAND Corporation, 2009.

[34] David A Snow, Sarah A Soule, and Hanspeter Kriesi. The Blackwell companion to social movements. John Wiley & Sons, 2008.

[35] Paul K Davis and James H Bigelow. Experiments in multiresolution modeling (mrm). RAND Corporation, 1998.

[36] Donald Olding Hebb. The organization of behavior: A neuropsychological approach. John Wiley & Sons, 1949.

[37] B. Kosko. Differential Hebbian learning. In AIP Conference Proceedings, volume 151, pages 277–282, 1986.

[38] J.S. Mill. A System of Logic, Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence and the Methods of Scientific Investigation. Number v. 1 in A System of Logic, Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence and the Methods of Scientific Investigation. John W. Parker, 1843. URL https://books.google.com/books?id=y4MEAAAAQAAJ.

[39] Bart Kosko. Unsupervised learning in noise. IEEE Transactions on Neural Networks, 1(1):44–57, 1990.

