Pseudo-Static Cooperators: Moving Isn’t Always about Going Somewhere

Olaf Witkowski, Nathanael Aubert-Kato
University of Tokyo, Japan
[email protected], [email protected]

Abstract

The evolution of cooperation has long been studied in Game Theory and Evolutionary Biology. In this study, we investigate the impact of movement control in a spatial version of the Prisoner’s Dilemma in a three-dimensional space. A population of agents is evolved via an asynchronous genetic algorithm to optimize their strategy. Our results show that cooperators rapidly join into static clusters, creating favorable niches for fast replication. Surprisingly, even though they remain inside those clusters, cooperators keep moving faster than defectors. We analyze the system dynamics to explain the stability of this behavior.

Introduction

The problem of the evolution of cooperation has been of interest for a long time. It is often tackled with simple models, such as treating interactions as a game of Prisoner’s Dilemma (PD). Early results in game theory showed that cooperation in a well-mixed population is not a given (Axelrod and Hamilton 1981; Smith 1982), yet it is a very common phenomenon in nature.

The PD is a classic two-player “game” in which players are given two options: cooperate (C) or defect (D). The payoffs are such that T > R > P > S, where T stands for Temptation (D versus C), R for Reward (C versus C), P for Punishment (D versus D) and S for Sucker’s payoff (C versus D). It is also often assumed that 2R > T + S, meaning that cooperating is overall better for the whole system, while defecting is better for the individual. In particular, T > R and P > S mean that it is always the best choice for an individual to defect, no matter the strategy of its opponent. In a system where everyone can interact with everyone else, without memory of past games or ways to distinguish opponents, defecting is obviously the best strategy.
However, it has been shown that spatial locality helps cooperators survive and even thrive (Nowak and May 1993).

This early work has triggered several lines of investigation, in particular attempts to add movement. While results can be mixed in specific cases (Sicardi et al. 2009), it is widely recognized that movement is helpful (Vainstein et al. 2007). Particular interest has been given to random movement (e.g. Chen et al. 2011; Gelimson et al. 2013). In this case, though, we argue that this movement acts as a way to restrict the neighborhood of specific individuals, thus increasing locality. Diffusion (Vainstein and Arenzon 2014) is another example, where the environment is sparse, allowing agents to move to empty areas. Interesting dynamics can also be obtained when the agents can actually choose on their own when and/or where to move (Aktipis 2004, 2011).

In this work, we investigate the impact of limited movement control on agents in a three-dimensional space. Agents all move at a common constant speed, but choose their direction through the output of a neural network. We also add the possibility to communicate, through the emission of a signal. Such communication might be similar to greenbeards, a phenomenon where an otherwise useless phenotype element is used to choose whether to cooperate or not (see for instance Gardner and West 2010). We argue, however, that a slightly different mechanism is at work in our case. Indeed, since the signal is also an output of the neural network, agents can adapt their response to the environment. The signal may be used both to detect where friendly agents are and as a way to choose a strategy. In the latter case, cooperation can arise both from the fact that related agents will have similar signaling (similar to kin selection) and from the adaptability of an external agent (mimicry).
We show that, when left to their own devices, cooperators will move more than defectors, even though their cluster is static. They also tend to communicate much more than defectors, displaying a complex dynamic that prevents defectors from taking over. We also show that speed matters, as it impacts the radius of the cluster.

In the following, we describe the details of the model used in our experiments. Then, we present the proportion of cooperators over time, and compare it to the static case (no movement allowed). We also show other metrics, such as the average displacement over time and the amount of received signal over time. We then analyse those results, give a simple condition for the survival of a cluster, and conclude.


Figure 1: Graphical representation of the world in a simulation. Each agent is represented as an arrow indicating its current direction. The color of an agent indicates its current action, either cooperation (blue) or defection (red). Note the cluster of cooperators being invaded by defectors.

Model

A population of agents moves around in a three-dimensional space. Each one plays the Prisoner’s Dilemma game with its direct neighbors. The strategies are evolved via a continuous genetic algorithm, that is, agents with a high level of fitness are allowed to replicate (with mutations) whenever possible.

Environment

Agents are placed in a three-dimensional world with periodic boundary conditions. While most previous work focuses on two-dimensional simulations, a third dimension gives the system more freedom of movement, making it easier to choose not to play (i.e. move away). The environment is a toroidal cube of size 600 (arbitrary unit), where each face connects directly to the opposite one. The world is considered to be continuous, so that agents can get arbitrarily close to each other (Figure 1), up to the precision of the simulation. Thus, the dimensionality of the simulation comes down to the choice of the agent’s interaction radius.
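The toroidal geometry can be made concrete with a wrap rule and a minimal-image distance; a short sketch (the helper names and per-axis treatment are ours, not from the paper):

```python
# Sketch of the periodic world: positions wrap modulo the world size,
# and distances use the minimal-image convention on each axis.
WORLD_SIZE = 600.0  # arbitrary units, as in the paper

def wrap(pos):
    """Wrap a 3-D position back into the toroidal cube."""
    return tuple(c % WORLD_SIZE for c in pos)

def torus_distance(a, b):
    """Euclidean distance on the torus: each axis takes the shorter
    of the direct and wrapped separations."""
    sq = 0.0
    for ca, cb in zip(a, b):
        d = abs(ca - cb) % WORLD_SIZE
        d = min(d, WORLD_SIZE - d)
        sq += d * d
    return sq ** 0.5
```

With this convention, an agent at x = 590 and one at x = 10 are 20 units apart, not 580, so clusters can straddle the boundary.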

We enforce a maximum size for the population. This makes it easier to compare, for instance, to lattices, where the number of agents also has a physical maximum due to the number of positions. Note that this maximum does not have to be equal to the number of agents at any moment in the simulation. This might also happen in lattices, for instance in Vainstein and Arenzon (2014), where partially empty lattices are used to add a diffusion phenomenon.

Finally, a given simulation is prevented from stopping for lack of agents by adding one new random agent per time step if the current population is below a threshold (see Table 1).

Agents

Agents are given a certain energy, which also acts as their fitness. Each agent comes with a set of 12 different sensors. The neural network (represented in Figure 2) takes the information from those sensors as inputs, in order to decide the agent’s actions at every time step. The possible actions amount to the agent’s movement, a Prisoner’s Dilemma action (cooperate or defect) and two output signals. The architecture is composed of 12 input, 10 hidden, 5 output, and 10 context neurons connected to the hidden layer (see Figure 2).

The agents’ motion is controlled by M1 and M2, outputting two Euler rotation angles: ψ for pitch (i.e. elevation) and θ for yaw (i.e. heading), with floating point values between 0 and π. Even though the agents’ speed is fixed, the rotation angles still allow an agent to control its average speed (for example, if ψ is constant and θ equals zero, the agent will continuously loop on a circular trajectory, which results in an almost-zero average speed over 100 steps).
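To illustrate how a fixed speed can still yield a near-zero average displacement, here is a minimal 2-D sketch of the circular-trajectory case mentioned above (yaw held at zero so motion stays in one plane; the speed and pitch values are illustrative, not taken from the paper):

```python
import math

def trajectory_displacement(speed, pitch, steps):
    """Advance an agent that turns by a constant pitch angle each step;
    return (net displacement from start, total path length)."""
    x = y = 0.0
    heading = 0.0
    for _ in range(steps):
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        heading += pitch  # constant turn per step -> circular path
    return math.hypot(x, y), speed * steps

# Constant turning closes the loop: over 100 steps the agent covers a
# path of length 100 * speed but ends up (almost) where it started.
net, path = trajectory_displacement(speed=1.0, pitch=2 * math.pi / 20, steps=100)
```

Setting the pitch to zero instead recovers a straight line, where net displacement equals path length; this is the distinction the average-displacement metric in the Results section measures.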

The outputs $S^{(1)}_{out}$ and $S^{(2)}_{out}$ control the signals emitted on two distinct channels, which are propagated through the environment to the agents within a neighboring radius set to 50. The choice of two channels was made to allow for signals of higher complexity, and possibly more interesting dynamics than in greenbeard studies (Gardner and West 2010).

The received signals are summed separately for each direction (front, back, right, left, up, down), and weighted by the inverse square of the emitter’s distance. This way, agents further away have much less impact on the sensors than closer ones do. Every agent is able to receive signals on the two emission channels, from 6 different directions, totalling 12 different values sensed per time step. For example, the input $S^{(6,1)}_{in}$ corresponds to the signals reaching the agent from the neighbors below.
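A sketch of the sensing step as we read it (the paper gives no code; binning by world axes rather than the agent's own frame, and the exact bin ordering, are our assumptions):

```python
# Each received signal is weighted by 1/d^2 and summed into one of six
# directional bins (assumed convention: +x front, -x back, +y right,
# -y left, +z up, -z down), separately per channel.
SIGNAL_RADIUS = 50.0

def sense(receiver, emitters):
    """receiver: (x, y, z). emitters: list of ((x, y, z), (s1, s2)).
    Returns 12 inputs: 6 directions x 2 channels."""
    bins = [[0.0, 0.0] for _ in range(6)]  # front, back, right, left, up, down
    for pos, signals in emitters:
        dx = [p - r for p, r in zip(pos, receiver)]
        d2 = sum(c * c for c in dx)
        if d2 == 0.0 or d2 > SIGNAL_RADIUS ** 2:
            continue  # self or out of signalling range
        axis = max(range(3), key=lambda i: abs(dx[i]))  # dominant axis
        direction = 2 * axis + (0 if dx[axis] > 0 else 1)
        for ch in (0, 1):
            bins[direction][ch] += signals[ch] / d2  # inverse-square weight
    return [v for pair in bins for v in pair]
```

For instance, a single emitter two units in front of the receiver with signals (1.0, 0.5) contributes 0.25 and 0.125 to the two front inputs, and nothing elsewhere.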

Fitness

At every time step, agents play an N-player version of the Prisoner’s Dilemma with their surroundings, meaning that they make a single decision that affects all agents around them. They get a reward and/or punishment based on the number of cooperators around them. Their decision is one of the outputs of their neural network.

Figure 2: Architecture of the agents’ controller, composed of 12 input neurons, 10 hidden neurons, 10 context neurons and 5 output neurons.

The payoff matrix is an extension of Chiong and Kirley (2012), where we added the distance to take into account the spatial continuity. It is defined by:

$$
\begin{aligned}
C &:\; b \sum_{coop \in radius} \frac{1}{1 + \mathrm{distance}(coop, me)} \;-\; c \sum_{any \in radius} \frac{1}{1 + \mathrm{distance}(any, me)} \\
D &:\; b \sum_{coop \in radius} \frac{1}{1 + \mathrm{distance}(coop, me)}
\end{aligned}
\tag{1}
$$

with b the bonus, c the cooperation cost, b > c > 0, and distance the Euclidean distance between two agents. Here radius denotes the sphere of that radius around the agent. Note that the agent itself is not considered part of its neighborhood. The distance is not part of the original fitness, which made sense since Chiong and Kirley (2012) base their simulation on a lattice, where the distance is always the same. Our version nicely integrates the fact that interactions with distant agents should be much weaker than interactions with closer ones.

Another advantage of this fitness is that defection can also be assimilated to not playing (no cost). Note that there is also no cost and no reward for cooperating when alone.

We can see that this fitness is equivalent to the traditional PD game, since, for two agents A and B at a distance d of each other, (1) yields the payoff matrix:

$$
\begin{array}{c|cc}
 & C & D \\
\hline
C & \dfrac{b - c}{1 + d} & \dfrac{-c}{1 + d} \\[2mm]
D & \dfrac{b}{1 + d} & 0
\end{array}
$$

It is clear that under the condition b > c > 0, this matrix corresponds to a PD.
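A direct way to check this reduction is to evaluate equation (1) numerically for the two-agent case; a minimal sketch (the b and c values are illustrative, not from the paper):

```python
def payoff(action, me, neighbors, b=2.0, c=1.0):
    """Payoff of equation (1). neighbors: list of (position, action)
    within the interaction radius; positions are 3-D tuples."""
    def inv_dist(pos):
        d = sum((p - q) ** 2 for p, q in zip(pos, me)) ** 0.5
        return 1.0 / (1.0 + d)
    coop_term = sum(inv_dist(p) for p, a in neighbors if a == "C")
    if action == "C":
        # b * sum over cooperators - c * sum over all neighbors
        return b * coop_term - c * sum(inv_dist(p) for p, _ in neighbors)
    return b * coop_term  # defectors pay no cost

# Two agents at distance d = 1: reproduces the payoff matrix above.
origin = (0.0, 0.0, 0.0)
R = payoff("C", origin, [((1.0, 0.0, 0.0), "C")])  # Reward: (b - c)/(1 + d) = 0.5
T = payoff("D", origin, [((1.0, 0.0, 0.0), "C")])  # Temptation: b/(1 + d) = 1.0
S = payoff("C", origin, [((1.0, 0.0, 0.0), "D")])  # Sucker: -c/(1 + d) = -0.5
P = payoff("D", origin, [((1.0, 0.0, 0.0), "D")])  # Punishment: 0.0
```

The resulting values satisfy T > R > P > S, the defining ordering of the Prisoner’s Dilemma, and a lone cooperator indeed earns exactly zero.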

Based on the outcome of the match, agents can choose a new direction, which is similar to leaving the group in the walk away strategy (Aktipis 2004), the main difference being that, in our case, it is also possible for groups to split. It is also similar in another aspect: there is a cost to leaving a group, as a lone agent may need time to meet others.

Initial energy: 2
Maximum age: 5000
Maximum energy: 20
Maximum population size: 500
Population threshold: 100
Reproduction threshold: 10
Reproduction cost: 2
Reproduction radius: 2
Survival cost per turn: 2
Mutation rate (per gene): 0.05

Table 1: Parameters used for the simulation.

Evolution/Parameters

Evolution is done continuously. Agents with negative or zero energy are removed, while agents with energy above a threshold are forced to reproduce, within the limit of one infant per time step. The reproduction cost is low enough, considering the threshold, not to put the life of the agent at risk. Table 1 indicates the various parameters used for evolution.
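The continuous evolution rule can be sketched as follows, using the Table 1 values (the data layout, helper names, and the order in which the survival cost is charged are our assumptions; the genome copy and mutation are elided):

```python
# Table 1 parameters
INITIAL_ENERGY = 2.0
REPRODUCTION_THRESHOLD = 10.0
REPRODUCTION_COST = 2.0
SURVIVAL_COST = 2.0
MAX_POPULATION = 500

def evolution_step(population):
    """One step of continuous evolution: pay the survival cost, cull
    dead agents, then let energetic agents reproduce (one infant per
    agent per time step), respecting the population cap."""
    survivors = []
    for agent in population:
        agent["energy"] -= SURVIVAL_COST
        if agent["energy"] > 0:
            survivors.append(agent)
    infants = []
    for agent in survivors:
        if agent["energy"] >= REPRODUCTION_THRESHOLD and \
                len(survivors) + len(infants) < MAX_POPULATION:
            agent["energy"] -= REPRODUCTION_COST
            infants.append({"energy": INITIAL_ENERGY})  # genome copy + mutation elided
    return survivors + infants
```

Because the reproduction cost (2) is well below the threshold (10), reproducing never kills the parent, matching the remark above.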

Results

Results were obtained on a set of 10 runs, with additional sets used for control. In our setting, all agents have a constant speed, but can choose in which direction they are heading. This allows for pseudo-static behaviors by looping in circles.

While some characteristics, such as agents’ movement, were strongly run dependent, the overall dynamics of the system were not. At the beginning of a run, the environment is seeded with random agents. Since all weights in their neural networks are set at random, roughly half of the agents initially choose to cooperate while the other half choose to defect. This leads to a fast extinction of cooperators (Figure 3, until approximately 50000 time steps), until a group emerges strong enough to survive. A second phase follows, in which cooperators quickly increase in number due to the autocatalytic nature of this strategy (Figure 3).

A third step happens eventually, where defectors invade the cluster, followed either by the survival of the cluster due to cooperators running away or by a reboot of the cycle. In case of survival, oscillations in the proportion of cooperators can be observed. However, this phenomenon is averaged away over multiple runs, since the period and phase of the oscillations are not correlated from one experiment to the other. Figure 4 shows those oscillations in a typical run. The frequency of this phenomenon is shown in Table 2.


Figure 3: First quartile, average and third quartile of cooperation proportion over 20 runs. Note that agents may choose at each time step which action (cooperation or defection) they will perform, leading to high-frequency noise.

Figure 4: Proportion of cooperating agents in a typical run. Clear oscillations between the “high cooperation” state and the “low cooperation” state are observable.

Minimum: 2
First quartile: 2.5
Median: 4
Third quartile: 8
Maximum: 9
Average: 5

Table 2: Number of oscillations between high and low cooperation over 10^6 time steps in ten runs.

As a control, we ran the simulation after removing the possibility for agents to move. In this case, cooperators have much less to fear from defectors and quickly take over the whole population, while defectors quickly exhaust their energy as well as the energy of their cooperative neighbors (Figure 5). Were a defector to appear near a cluster of cooperators, the cluster would react by “reproducing away”. However, the chances of being overtaken by defectors are much higher than in the dynamic case.

Figure 5: Average proportion of cooperators; comparison between the static and dynamic cases.

Another control was to allow agents to have a neighborhood large enough to interact with all other agents, or a speed such that the system is virtually well-mixed. In both cases, the classical result holds, with an almost homogeneous population of defectors, with the occasional cooperator obtained from random generation.

Finally, we observed the movement tendencies (Figure 6) and signal transmission (Figure 8) among the two groups of agents. The average displacement is the norm of the total movement over 100 steps (an example for 5 steps is illustrated in Figure 7). It is interesting to note that, even though they mostly stay in clusters, cooperators move more than defectors. In the next section, we will attempt to interpret those results.
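The displacement metric described above can be computed from a position trace as follows (a sketch; wrap-around across the periodic boundary is ignored for simplicity):

```python
import math

def average_displacement(trace, window=100):
    """Norm of the net movement over a sliding window of positions.
    trace: list of 3-D positions, one per time step."""
    out = []
    for t in range(window, len(trace)):
        delta = [a - b for a, b in zip(trace[t], trace[t - window])]
        out.append(math.sqrt(sum(d * d for d in delta)))
    return out

# A straight-line walker at unit speed scores 100 per window, while an
# agent looping on a small circle scores near zero despite moving at
# the same constant speed.
straight = [(float(t), 0.0, 0.0) for t in range(201)]
```

This is why the metric separates "going somewhere" from merely moving: it measures net travel, not path length.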

Analysis

Figure 6: Average displacement of agents over a 100-step sliding window.

Figure 7: Illustration of the average displacement based on 5 time steps.

Figure 8: Average signal transmitted by cooperators and defectors.

The critical mass necessary for a cooperator to survive can be computed from its surroundings and from the costs of cooperation (Nowak and May 1993). Let us denote by R the maximum interaction radius, N the total number of agents inside the neighborhood (excluding the cooperator itself), and n the number of other cooperators within the radius. For the cooperator to survive over time, the costs have to be exactly balanced or exceeded by the benefits of cooperation. If we assume that agents are homogeneously distributed in the Euclidean sphere around our focal agent, we can rewrite the sum over all surrounding agents weighted by the distance as an integral over the densities $\rho_{coop}$ and $\rho_{all}$:

$$
\rho_{coop} = \frac{3}{4} \cdot \frac{n}{\pi R^3}, \qquad \rho_{all} = \frac{3}{4} \cdot \frac{N}{\pi R^3}
$$

This gives us the equivalence:

$$
\sum_{coop} \frac{1}{1 + \mathrm{dist}} \;\simeq\; \int_0^R \rho_{coop} \cdot \frac{1}{1 + r} \, dr
$$

Which yields:

$$
\mathrm{fit}_{coop} = (bn - cN) \, \frac{3 \ln(1 + R)}{4 \pi R^3}
$$

Therefore the condition for survival is simply that the proportion of cooperators $n/N$ should be at least $c/b$.
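The threshold can be checked numerically against the fitness expression above (the parameter values are illustrative):

```python
import math

def fit_coop(n, N, R, b, c):
    """Mean-field fitness of a cooperator surrounded by n cooperators
    among N neighbors, per the integral approximation above."""
    return (b * n - c * N) * 3.0 * math.log(1.0 + R) / (4.0 * math.pi * R ** 3)

# With b = 2 and c = 1 the survival threshold is n/N = c/b = 1/2:
# the fitness changes sign exactly at n = N/2.
b, c, N, R = 2.0, 1.0, 10, 50.0
```

Note that the prefactor 3 ln(1 + R) / (4πR³) is always positive, so the sign of the fitness, and hence survival, depends only on bn − cN.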

Note that this condition is strongly dependent on the actual distribution of agents. The closer the cooperators, the stronger they are against external threats. Conversely, a defector at the very center of a group of cooperators can be much more damaging.

In previous work (Chen et al. 2011), it was observed that random mobility helps cooperators if the speed is low enough. However, in that case, the mobility only has the effect of reducing the neighborhood. Additionally, if the speed is too high, the system gets to an almost well-mixed state, with the expected results on cooperation. Note that even the effect of high speed can be counterbalanced by a motion that keeps the agents in a neighborhood.

In the absence of movement, we have pseudo-movement arising from cooperators dying near defectors. As a result, the cluster of cooperators “reproduces away” from its previous position.

When movement is enabled, cooperators also appear in clusters, inside which they seem to be moving quickly. This mainly results from the major phenomenon helping cooperators, that is, their autocatalytic tendencies, which might be a bias from the limit on the population size. If enough cooperators are close to each other, they will keep their energy high at all times, allowing them to reproduce as much as possible. Once the population reaches its maximum capacity, the cooperators typically represent a larger fraction of the population, especially when weighted by the energy they possess. For this reason, the cluster will remain stable until some agents die of old age, before being immediately replaced by other cooperators with high probability.

Also, this strategy might allow them to avoid spending too much time close to defectors, while remaining constantly in the neighborhood of fellow cooperators.

The clustering is strongly dependent on signaling among the cooperating agents, as hinted by the difference in signal emission between cooperators and defectors. Additionally, we performed two batches of five control runs with the signal respectively on or off at all times. In the “off” case, no cluster can form, yielding a near-uniform population of defectors. The “on” case still shows qualitatively the emergence of clusters, but they are much more diffuse, as signaling is now ambiguous.

Conclusion

In this article, we introduced a three-dimensional model of agents playing the Prisoner’s Dilemma. While it can be expected that cooperators, if any are present, would quickly evolve to form clusters, it was interesting to see that they still have a higher movement rate overall than defectors. This is even more surprising considering that those clusters do not seem to move fast. Instead, analysis shows that cooperators are moving quickly inside the cluster, which may be a way to adapt to an aggressive environment.

Additionally, comparison with the static case showed that movement makes the appearance of cooperators harder, but more stable in the long run. Since it is harder for defectors to overtake a cluster of cooperators, our systems often show a soft bistability, meaning that they will eventually switch from one state to the other. It is even possible to observe a sort of symbiosis, where cooperators generate more energy than necessary, which is in turn used by peripheral defectors. In this case, replacement rates allow cooperators to stay ahead, keeping this small ecosystem stable.

Finally, this cohesion among cooperators seems to be enhanced by signaling, even though the signal might attract defectors. Additional investigation of, for instance, transfer entropy could be a promising next step.

Acknowledgements

We would like to thank Julien Hubert for his valuable comments.

References

Aktipis, C. (2004). Know when to walk away: contingent movement and the evolution of cooperation. Journal of Theoretical Biology, 231(2):249–260.

Aktipis, C. (2011). Is cooperation viable in mobile organisms? Simple walk away rule favors the evolution of cooperation in groups. Evolution and Human Behavior, 32(4):263–276.

Axelrod, R. and Hamilton, W. D. (1981). The evolution of cooperation. Science, 211(4489):1390–1396.

Chen, Z., Gao, J., Cai, Y., and Xu, X. (2011). Evolution of cooperation among mobile agents. Physica A: Statistical Mechanics and its Applications, 390(9):1615–1622.

Chiong, R. and Kirley, M. (2012). Random mobility and the evolution of cooperation in spatial n-player iterated prisoner’s dilemma games. Physica A: Statistical Mechanics and its Applications, 391(15):3915–3923.

Gardner, A. and West, S. A. (2010). Greenbeards. Evolution, 64(1):25–38.

Gelimson, A., Cremer, J., and Frey, E. (2013). Mobility, fitness collection, and the breakdown of cooperation. Physical Review E, 87(4):042711.

Nowak, M. A. and May, R. M. (1993). The spatial dilemmas of evolution. International Journal of Bifurcation and Chaos, 3(01):35–78.

Sicardi, E. A., Fort, H., Vainstein, M. H., and Arenzon, J. J. (2009). Random mobility and spatial structure often enhance cooperation. Journal of Theoretical Biology, 256(2):240–246.

Smith, J. M. (1982). Evolution and the Theory of Games. Cambridge University Press.

Vainstein, M. H. and Arenzon, J. J. (2014). Spatial social dilemmas: Dilution, mobility and grouping effects with imitation dynamics. Physica A: Statistical Mechanics and its Applications, 394:145–157.

Vainstein, M. H., Silva, A. T. C., and Arenzon, J. J. (2007). Does mobility decrease cooperation? Journal of Theoretical Biology, 244(4):722–728.