
J. theor. Biol. (1990) 142, 189-200

An ESS-analysis for Ensembles of Prisoner's Dilemma Strategies

B. BORŠTNIK†, D. PUMPERNIK†, I. L. HOFACKER‡ AND G. L. HOFACKER‡

† Boris Kidrič Institute of Chemistry, P.O. Box 30, 61115 Ljubljana, Yugoslavia and ‡ Lehrstuhl für Theoretische Chemie, Technische Universität München, 8046 Garching, F.R.G.

(Received on 6 June 1989, Accepted on 25 August 1989)

The ESS (Evolutionarily Stable Strategy) concept of Maynard Smith can be applied in its weak form to ensembles of competing PD ("Prisoner's Dilemma") strategies memorizing two to three of one's own and one's opponent's moves. The format of our study is: (1) games have very long duration; (2) Taylor-Jonker dynamics applies; (3) effects of finite population size can be ignored.

It is shown that in the case $R > (T + S)/2$ a set of strategies can be singled out which do not lose against any other strategy while co-operating with themselves. Such a set is uninvadable by other PD strategies if it constitutes more than half of the total population.

1. Introduction

Prisoner's dilemma (PD) (Axelrod & Hamilton, 1981; Axelrod, 1984; Axelrod & Dion, 1988; Nowak & Sigmund, 1989a, b) is one of the simplest strategic games. For this reason it is frequently used as a proving ground on which theoretical or numerical schemes can be tested. For mathematical biologists game theory is of particular interest with regard to the dynamics of populations under evolutionary changes (Maynard Smith, 1982, 1984; Schuster & Sigmund, 1985). In a previous paper on molecular evolution (Borštnik et al., 1987) we interpreted mutational processes as a search for those sites in the protein sequence space which represent sequences contributing most to the fitness of a phenotype. Similarly, strategies can be coded in game theory like molecular information in genes, and the rules of the game allow for the use of concepts analogous to biological fitness. In the PD case, fitness can be defined as a quantity proportional to the rewards gained in mutual encounters and may therefore serve as a selective criterion in the dynamics of whole populations of strategies. The question of evolutionary stability of PD populations is of considerable current interest and to some extent also a matter of controversy (Axelrod & Hamilton, 1981; Boyd & Lorberbaum, 1987; Kurka, 1986).

We consider deterministic strategies which compute their next move from past experience. To characterize the populations of competing strategies participating in a PD game, one can either search for evolutionarily stable strategies as defined by Maynard Smith (1982) or explicitly analyse the dynamics of populations (Taylor & Jonker, 1979).

0022-5193/90/020189+ 12 $03.00/0 © 1990 Academic Press Limited

Despite the game's great simplicity, it is known and demonstrated here again that even finer details of the rules can strongly influence its outcome. Nevertheless, the results are categorical if interpreted in terms of appropriate concepts.

Adopting the most common and straightforward version of the rules (Axelrod & Hamilton, 1981; Axelrod, 1984, 1987) favours co-operativity. The dynamics of populations within the frame of a Taylor & Jonker (1979) description will then not lead to very surprising distributions. We are able to show, however, that there exists a subset of PD strategies with special properties, called "self-co-operative non-losers" (SCNL), which populates clusters stable in the sense of the Maynard Smith criterion of uninvadability.

SCNL strategies can be constructed algorithmically. They constitute a very appropriate concept in the dynamic analysis of interacting populations.

2. Rules of the Game and Strategies

Prisoner's Dilemma is a strategic game in which two opponents interact repeatedly in pairwise contests, each time choosing whether to co-operate (C) or defect (D). Depending on their mutual choice, they are rewarded the amount R for mutual co-operation and P for mutual defection. If one player defects while the other co-operates, the defector receives T and the co-operator receives S.

Normally, payoff values are ordered as $T > R > P > S$. The game is then suited to study the evolution of co-operation. A co-operative mode prevails when both players co-operate repeatedly. If it is not to be inferior to more defective modes, R, T and S should be related as $R \geq (T + S)/2$. Here we adopt the payoff parametrization S = 0, P = 1, R = 3, T = 5, and the results of our numerical simulations are typical as long as the above condition holds.
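As a quick check of the two conditions on the payoffs, the following minimal Python sketch uses the parametrization adopted above. The function name `is_cooperative_pd` is hypothetical, and reading (T + S)/2 as the per-move average of a pair that alternates between exploiting and being exploited is an illustrative interpretation rather than a statement from the text.

```python
# Payoff parametrization adopted in the text.
S, P, R, T = 0, 1, 3, 5

def is_cooperative_pd(t, r, p, s):
    """Return True if the payoffs satisfy T > R > P > S and R >= (T + S)/2."""
    return t > r > p > s and 2 * r >= t + s

# Assumed interpretation: two players who alternate (C, D) and (D, C) collect
# T and S in turn, i.e. (T + S)/2 per move on average, whereas steady mutual
# co-operation yields R per move.
alternating_average = (T + S) / 2

print(is_cooperative_pd(T, R, P, S))
print(alternating_average, "versus", R)
```

With these values the alternating mode is inferior to steady co-operation, which is what the second condition is meant to guarantee.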

Strategies are defined as a set of rules which determine the next move of a player based on the preceding moves of both players. Consider a strategy which takes into account the last $n_1$ own moves and $n_2$ opponent's moves. The number of different move combinations which two players can submit in $n_1$, respectively $n_2$, turns is $m = 2^{n_1+n_2}$. A game strategy should then determine the next move for each combination of the $n_1 + n_2$ last moves, each of which can be either C or D. Any strategy can thus be represented as a string of length m of C, D (or 1, 0) entries. The number of distinct strings is consequently equal to $2^m$. This definition must be augmented for the initial steps of the game when the history comprises fewer than $\max(n_1, n_2)$ moves. Therefore, each strategy has $2^{n_{12}}$ variants, where $n_{12} = \max(n_1, n_2)$.
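The counting argument above can be sketched in a few lines of Python. This is an illustration under one assumption: each response string is combined with every possible sequence of max(n1, n2) initiation moves, a reading that reproduces the totals 8, 64, 1024, 2048 and 262144 quoted in Table 1. The function name `count_codes` is hypothetical.

```python
def count_codes(n1, n2):
    """Number of strategy codes for memory n1 (own moves) and n2 (opponent's moves)."""
    m = 2 ** (n1 + n2)      # number of distinct histories, one answer each
    n12 = max(n1, n2)       # number of initiation moves (assumed reading)
    return 2 ** m * 2 ** n12

for n1, n2 in [(0, 1), (0, 2), (1, 2), (0, 3), (2, 2), (3, 3)]:
    print(n1, n2, count_codes(n1, n2))
```

For the largest class, three own and three opponent's moves, the count is 2 to the power 67, i.e. a string of 64 answers plus three initiation moves.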

One can now calculate the relative efficiency for each pair of strategies by the game rules. Following the above methodology it is easy to simulate the pairwise contest for each pair of strategies. This means accounting for an arbitrarily long sequence of turns which starts with the initiation moves which are a part of the strategy. The sequence of moves becomes periodic, at the latest after $2^{n_1+n_2}$ steps. Under the hypothesis of very long duration of the contest the average payoff of both players can be determined on the basis of one period of moves while neglecting the initial sequences. Within this frame we worked out algorithms for the calculation of average rewards gained in pairwise encounters, leading to the payoff matrix (V). Its matrix elements $V_{ij}$ represent the reward gained by strategy i when playing against j.
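The pairwise contest described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the response-string layout follows the ordering of Table 2 (own history first, then the opponent's, each written oldest move first), the initiation moves are assumed to be played from left to right, and the names `make_strategy` and `average_payoffs` are hypothetical.

```python
from itertools import product

# Reward to a player for the move pair (own, opponent), with S=0, P=1, R=3, T=5.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def make_strategy(code, init, n1=1, n2=2):
    """Pair a response string with its initiation moves.

    The k-th character of `code` answers the k-th history, histories being
    enumerated with the own moves varying slowest (assumed layout).
    """
    histories = [("".join(own), "".join(opp))
                 for own in product("CD", repeat=n1)
                 for opp in product("CD", repeat=n2)]
    return dict(zip(histories, code)), init

def average_payoffs(s1, s2, n1=1, n2=2):
    """Long-run average rewards of two strategies, taken over one period."""
    (table1, init1), (table2, init2) = s1, s2
    a, b = list(init1), list(init2)      # move records of players 1 and 2
    n12 = max(n1, n2)
    first_seen = {}                      # joint recent history -> turn index
    while True:
        state = (tuple(a[len(a) - n12:]), tuple(b[len(b) - n12:]))
        if state in first_seen:          # the play has become periodic
            start = first_seen[state]
            break
        first_seen[state] = len(a)
        move_a = table1[("".join(a[len(a) - n1:]), "".join(b[len(b) - n2:]))]
        move_b = table2[("".join(b[len(b) - n1:]), "".join(a[len(a) - n2:]))]
        a.append(move_a)
        b.append(move_b)
    period = list(zip(a[start:], b[start:]))   # initial transient is dropped
    v12 = sum(PAYOFF[(x, y)] for x, y in period) / len(period)
    v21 = sum(PAYOFF[(y, x)] for x, y in period) / len(period)
    return v12, v21

tit_for_tat = make_strategy("CDCDCDCD", "CC")
all_d = make_strategy("DDDDDDDD", "DD")
all_c = make_strategy("CCCCCCCC", "CC")
print(average_payoffs(tit_for_tat, tit_for_tat))
print(average_payoffs(tit_for_tat, all_d))
print(average_payoffs(all_d, all_c))
```

Because only one period is averaged, the early exploitation of Tit for Tat by an unconditional defector does not show up in the result, in line with the long-duration hypothesis.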

In this way, the strategies can be divided into two classes, losers and non-losers. The strategies which never lose a game in pairwise encounters form the non-loser class and all the remaining strategies are losers. In terms of matrix elements the non-loser criterion can be formulated as follows. If the index i refers to a non-loser and j to an arbitrary strategy it is

$$V_{ij} \geq V_{ji}. \tag{1}$$

The strict inequality in eqn (1) can hold only in the case where j is a loser strategy. Among non-loser strategies there are some which co-operate with themselves, i.e.

$$V_{ii} = R. \tag{2}$$

We call these self-co-operative non-losers (SCNL) and it will be shown that they are distinguished by their evolutionary stability in the PD game.
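Given a payoff matrix, the two criteria can be applied mechanically. The sketch below is an illustration only: the 3 x 3 matrix holds assumed long-run payoffs for Tit for Tat, an unconditional defector and an unconditional co-operator (a toy ensemble, not data from the paper), and the function names are hypothetical.

```python
R = 3  # reward for mutual co-operation in the adopted parametrization

def non_losers(V):
    """Indices i with V[i][j] >= V[j][i] for every opponent j (eqn 1)."""
    n = len(V)
    return [i for i in range(n) if all(V[i][j] >= V[j][i] for j in range(n))]

def scnl(V, reward=R):
    """Non-losers that also co-operate with themselves, V[i][i] == R (eqn 2)."""
    return [i for i in non_losers(V) if V[i][i] == reward]

# Toy long-run payoffs (assumed): 0 = Tit for Tat, 1 = always defect,
# 2 = always co-operate. V[i][j] is the reward of i when playing against j.
V = [[3, 1, 3],
     [1, 1, 5],
     [3, 0, 3]]

print(non_losers(V))
print(scnl(V))
```

In this toy ensemble the unconditional defector never loses a contest but fails eqn (2), so only Tit for Tat qualifies as SCNL. With averaged floating-point payoffs an exact equality test would need a tolerance.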

3. Dynamics of Populations Interacting by Fixed PD Strategies

The average reward which a strategy wins within an ensemble of strategies can readily be related to the fitness concept. Among several possible assumptions for the development of PD playing populations the most reasonable and widely accepted takes the reproductive rate proportional to the excess of the average reward gained by a strategy over the average reward of all strategies. This is expressed in the Taylor-Jonker (1979) replicator equation:

$$\dot{x}_i = x_i \left\{ \sum_{j=1}^{N} V_{ij} x_j - \sum_{j=1}^{N} \sum_{k=1}^{N} V_{jk} x_j x_k \right\}. \tag{3}$$

Equation (3) can easily be solved numerically by the trapezoid rule. A time step of dt = 0.1 is sufficiently short that further reduction has no influence on the solution.
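A numerical integration of this kind can be sketched as follows. The text names only "the trapezoid rule"; the sketch assumes its explicit predictor-corrector form (Heun's method), which may differ from the scheme actually used. The payoff matrix is an assumed toy example for Tit for Tat, an unconditional defector and an unconditional co-operator, and the function names are hypothetical.

```python
def replicator_rhs(x, V):
    """Right-hand side of the replicator equation for population vector x."""
    n = len(x)
    fitness = [sum(V[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean = sum(x[i] * fitness[i] for i in range(n))   # mean productivity
    return [x[i] * (fitness[i] - mean) for i in range(n)]

def integrate(x0, V, dt=0.1, t_end=20.0):
    """Heun's method (explicit trapezoidal rule), an assumed choice of scheme."""
    x = list(x0)
    for _ in range(round(t_end / dt)):
        k1 = replicator_rhs(x, V)
        predictor = [xi + dt * ki for xi, ki in zip(x, k1)]
        k2 = replicator_rhs(predictor, V)
        x = [xi + 0.5 * dt * (a + b) for xi, a, b in zip(x, k1, k2)]
    return x

# Toy long-run payoffs (assumed): 0 = Tit for Tat, 1 = always defect,
# 2 = always co-operate.
V = [[3, 1, 3],
     [1, 1, 5],
     [3, 0, 3]]

x = integrate([1 / 3, 1 / 3, 1 / 3], V)
print(x)
```

The step size and the end time t = 20 are the values quoted in the text; the scheme conserves the total population up to rounding because the right-hand side sums to zero on the simplex.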

Individual solutions of the replicator equation display many interesting features (Nowak & Sigmund, 1989; Schuster & Sigmund, 1985) but, except for short-memory strategies, it is difficult to derive general rules for the evolution of populations from them. We therefore try to analyse the population dynamics in terms of SCNL-sets and subsets.

According to eqns (1) and (2) the submatrix containing the matrix elements from SCNL strategies is symmetric (eqn (1) applied to two non-losers i and j gives both $V_{ij} \geq V_{ji}$ and $V_{ji} \geq V_{ij}$), with diagonal elements equal to R and off-diagonal elements less than or equal to R. Moreover, within the SCNL strategies one can identify subsets of strategies which co-operate mutually (abbreviated CSS). We will show that, due to the replicator equation, the SCNL-CSS wins out as soon as it gains 50% of the total population.

Suppose that at an arbitrary time the components of the population vector belonging to the SCNL-CSS are $x_1, \ldots, x_n$ while the remaining strategies are given by $x_{n+1}, \ldots, x_N$. The most direct way to predict in coarse terms the time evolution of the population is then to evaluate the time derivative of the cumulative population of SCNL-CSS strategies.

With

$$p_A = \sum_{i=1}^{n} x_i, \qquad p_B = 1 - p_A \tag{4}$$

the time derivative of $p_A$ can be expressed through eqn (3) as follows:

$$\dot{p}_A = \sum_{i=1}^{n} x_i \sum_{j=1}^{N} V_{ij} x_j - p_A \sum_{j=1}^{N} \sum_{k=1}^{N} V_{jk} x_j x_k. \tag{5}$$

Introducing $Q_{\alpha\beta} = \sum_{i \in \alpha} \sum_{j \in \beta} V_{ij} x_i x_j$, where $\alpha$ and $\beta$ stand for the index intervals $A \equiv (1, n)$ and $B \equiv (n+1, N)$, respectively, eqn (5) can be written in the form

$$\dot{p}_A = p_B (Q_{AA} + Q_{AB}) - p_A (Q_{BA} + Q_{BB}). \tag{6}$$

As A refers to SCNL-CSS strategies and there will be losers among the strategies in the B subset

$$Q_{AA} = p_A^2 R; \quad Q_{AB} \geq Q_{BA}; \quad Q_{BA} \leq p_A p_B R; \quad Q_{BB} \leq p_B^2 R.$$

If these relationships are inserted into eqn (6) one obtains

$$\dot{p}_A \geq (p_A - p_B)(p_A p_B R - Q_{BA}) + p_B (Q_{AB} - Q_{BA}) \tag{7}$$

and both terms on the rhs of eqn (7) are non-negative provided that $p_A > p_B$. This is an important property since it proves that as soon as a CSS grows beyond 50% of the total population it becomes evolutionarily stable since its share in the population will not decrease.
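The step from eqn (6) to eqn (7) can be spelled out as a short worked derivation. It is a sketch that assumes the bounds on the Q terms read $Q_{AA} = p_A^2 R$ and $Q_{BB} \leq p_B^2 R$; the squares are a reconstruction, chosen because they are what the definitions of $Q_{\alpha\beta}$, $p_A$ and $p_B$ imply.

```latex
\begin{align*}
\dot{p}_A &= p_B\,(Q_{AA} + Q_{AB}) - p_A\,(Q_{BA} + Q_{BB}) \\
          &\geq p_B\,(p_A^2 R + Q_{AB}) - p_A\,(Q_{BA} + p_B^2 R) \\
          &= p_A p_B R\,(p_A - p_B) + p_B Q_{AB} - p_A Q_{BA} \\
          &= (p_A - p_B)(p_A p_B R - Q_{BA}) + p_B\,(Q_{AB} - Q_{BA}).
\end{align*}
```

The first factor of the first term is positive when $p_A > p_B$, the second is non-negative because $Q_{BA} \leq p_A p_B R$, and the last bracket is non-negative because $Q_{AB} \geq Q_{BA}$.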

The above concepts are related to those introduced by Kurka (1986). In his nomenclature our SCNL-CSS are "final compatible sets" from which no evolutionary transitions can lead away.

The question arises what one can say about the time evolution of populations in which no CSS are present at all, respectively where the latter constitute less than 50% of the population. In such cases we cannot present a simple analysis. The numerical simulations described later give evidence, however, that SCNL strategies, once introduced into the population, will ultimately lead to a 50% CSS population which takes over according to the conclusions based on eqn (7).

4. The Structure of Population in Terms of SCNL and CSS Strategies

In view of the importance of SCNL properties it seemed appropriate to analyse the dynamics of populations playing the PD game by the rules defined in section 2.

TABLE 1

The total number of strategies, $N_{tot}$, the number of strategies which co-operate with themselves, $N_c$, and the number of SCNL strategies, $N_{SCNL}$, for six different classes of strategies defined by the pair $n_1$, $n_2$. $n_1$ refers to the number of own, $n_2$ of opponent's preceding moves determining the next step in the game. It is necessary to point out that $N_{tot}$ is calculated as the number of possible realizations of coding. Since synonymous codes can exist, the number of distinct strategies is usually lower than $N_{tot}$. This remark applies also to $N_c$ and $N_{SCNL}$. The detection of synonyms is rather time consuming. We found that there are two quadruplets of synonymous strategies among the 32 strategies with $n_1 = 1$, $n_2 = 2$

n1    n2    N_tot            N_c              N_SCNL

0     1     8                3                1
0     2     64               17               4
1     2     1 024            272              32
0     3     2 048            382              37
2     2     262 144          69 632           > 260
3     3     1.48 x 10^20     2.4 x 10^19      > 10^12

This could be done for up to three of one's own and one's opponent's moves remembered. Results are given in Table 1. Besides the total number of strategies the number of strategies which co-operate with themselves and the number of SCNL strategies are given.

In cases where the number of strategies is so large that it prohibits the search for SCNL strategies by scanning the (V) matrix, only lower bounds are given. Methods by means of which lower bounds can be estimated will be described later. It is obvious that the number of SCNL strategies grows with the total number of strategies but their share decreases.

It is most instructive to analyse in some depth a set of strategies of modest size, such as $n_1 = 1$, $n_2 = 2$, where 32 SCNL strategies exist as listed in Table 2. The SCNL are, in fact, the ones possessing the standard optimal properties defined by Boyd & Lorberbaum (1987). They have the essential attributes of being nice and provocable. Niceness is a prerequisite for self-co-operativity and provocability is necessary to avoid defeat by strategies which are inclined to defect.

As far as forgiveness is concerned, members of the SCNL set are not uniform and one can find strategies exhibiting this trait from hardly any to a large measure. It can roughly be quantified as the number of different move histories leading to co-operation. This number is equal to the number of C's in the representation of strategies in Table 2. The Table is arranged such that forgiveness increases from top to bottom. The first line will thus display the "retaliator", a strategy co-operating only if all the moves remembered are co-operative. In the lower part of the Table, at positions 22 and 23, one finds the strategy which repeats the opponent's last move ("Tit for Tat"), usually considered a prime candidate for an optimal strategy. The last three strategies (nos 30-32) are the most forgiving ones. They reward with co-operation not only the opponents co-operating in their last move but also opponents which defected in the last move but co-operated one move earlier, provided that their own last move was defection.

TABLE 2

The list of codes for 32 SCNL strategies in the $n_1 = 1$, $n_2 = 2$ case. The first two rows define the histories ($h_1$, preceding own move; $h_2$, two preceding opponent's moves ordered in such a way that the last move is the character to the right). Subsequent rows code the moves (C or D) as the answer to the eight possible histories given above. In the columns to the right the initiation moves are given, on the left the strategies are enumerated. Strategy no. 1 is the Retaliator and nos 22, 23 are Tit for Tat. The codes 7, 16, 23 and 32, respectively 6, 14, 22 and 30, are synonymous

Code no.      h1:  C   C   C   C   D   D   D   D     Initiation
              h2:  CC  CD  DC  DD  CC  CD  DC  DD    moves

1                  C   D   D   D   D   D   D   D     CC
2, 3               C   D   C   D   D   D   D   D     DC, CC
4                  C   C   D   D   D   D   D   D     CC
5                  C   D   D   D   D   D   C   D     CC
6, 7               C   D   C   D   D   D   C   D     DC, CC
8                  C   D   D   D   D   C   D   D     CC
9, 10, 11          C   D   C   D   D   C   D   D     DC, CD, CC
12                 C   C   D   D   D   C   D   D     CC
13                 C   D   D   D   D   C   C   D     CC
14, 15, 16         C   D   C   D   D   C   C   D     DC, CD, CC
17                 C   D   D   D   C   D   D   D     CC
18, 19             C   D   C   D   C   D   D   D     DC, CC
20                 C   C   D   D   C   D   D   D     CC
21                 C   D   D   D   C   D   C   D     CC
22, 23             C   D   C   D   C   D   C   D     DC, CC
24                 C   D   D   D   C   C   D   D     CC
25, 26, 27         C   D   C   D   C   C   D   D     DC, CD, CC
28                 C   C   D   D   C   C   D   D     CC
29                 C   D   D   D   C   C   C   D     CC
30, 31, 32         C   D   C   D   C   C   C   D     DC, CD, CC

Figure 1 gives estimates for the partitioning of the SCNL set with regard to co-operative subsets (CSS). The greatest CSS consists of all the strategies out of the SCNL set which start the game with CC. The remaining strategies form a distinct pattern which can be brought out by examining the elements of the reward matrix. With the help of Fig. 1, where circles mark non-co-operative pairs, one can construct more alternative CSS. The usefulness of the loser-non-loser concept becomes even more apparent when applied to the results of a complete Taylor & Jonker dynamics study of the strategies population with $n_1 = 1$, $n_2 = 2$.

FIG. 1. The structure of the SCNL set of strategies for $n_1 = 1$, $n_2 = 2$ is displayed. All the non-co-operating pairs are marked by circles. The enumeration of strategies (both axes: Strategy No.) is the same as in Table 2.

Taylor-Jonker computations were performed in the following way. The complete set of 1024 binary strings was attributed a uniform initial population [$x_i(0) = 1/1024$] and the time evolution of the population vector was followed by numerical integration of eqn (3) up to t = 20. This time span proved to be sufficient. The computer time spent was modest (8 hr on Micro Vax III). The main results are depicted in Fig. 2. Its most striking feature is the emergence of CSS-SCNL strategies with double co-operation CC as initiation moves. Towards the end of the time interval the system is close to a stationary state. Nearly all strategies remaining are mutually co-operating, though a remainder of about 20% non-SCNL strategies persists in the final population. The latter consists of loser strategies co-operating with all members of the CSS. They expanded only after the strategies which could eliminate them had died out. These loser strategies could be called "hitchhikers" in analogy to the term used in genetics for a gene which codes for no fitness-increasing properties but persists in the population due to its link to some proliferating supporter gene.

The results of Fig. 2 corroborate the general analyses given in the previous section. We showed that a CSS population is nondecreasing once it rises beyond 50% but could not account for the early stages of the population development. Nevertheless, some insight into the dynamical processes at an early stage can be gained from the time dependence of the mean productivity

$$E = \sum_{i,j} x_i x_j V_{ij}$$

given in Fig. 2(b). Initially, the mean productivity decreases but then starts rising until it saturates at E = R. This pattern can be explained as follows. In the early stages of population development all strategies are present in similar proportions such that loser strategies constitute a major fraction. The rather high mean productivity is provided by the high profit of selfish strategies in encounters with more generous strategies lacking provocability. As time goes on the generous and non-provocable strategies die out, thus starving to death the selfish strategies, with the consequence of a decrease in mean productivity. This process is to the advantage of SCNL strategies which become competitive, thus raising mean productivity again after reaching a sufficient level in the population. Naturally, less forgiving SCNL strategies are better off in the earlier stages of population development, as can be seen from the upper part of Fig. 2, whereas later on more forgiving strategies rise quicker as evidenced by the lower part of the figure.

FIG. 2. (a) Time evolution of the population vectors of SCNL strategies in the $n_1 = 1$, $n_2 = 2$ case. The calculation was started from a uniform initial population for all 1024 strings of strategies. The enumeration of strategies is the same as in Table 2. Since the strategies 7, 16, 23 and 32 are synonymous their time evolution is the same. (b) The time dependence of the mean productivity.

5. Populations with Large Numbers of Strategies

As soon as $n_1$ and $n_2$ of a strategy exceed 2 we can see from Table 1 that the total number of possible strategies exceeds the limits where one can classify each particular strategy. In these cases one cannot scan the entire (V) matrix in order to determine the SCNL strategies on the basis of eqns (1) and (2). We will consider two approaches which still allow a search for optimal strategies: a computer simulation approach and an analysis of the representation of strategies in terms of binary strings.

Computer simulations of the evolution towards optimal populations of strategies should be conducted in analogy to biological evolution of behaviour where genotypes are generated which code for behavioural phenotypes. The latter have to compete successfully for survival with other strategies in the population. We therefore changed an initial population of random strategies according to replicator equations and introduced new mutants at various time intervals. Each newly introduced mutant was given a low initial share in the population, such as to keep the equilibrium between the existing strategies intact. For simplicity the number of strategies in the population was held fixed; for each mutant introduced the strategy with the lowest share in the population was eliminated.

The simulation procedure described above allows one to choose the following parameters and procedures:

frequency of appearance of new mutants;

selection of the strategy to be mutated;

selection of the algorithm by which the new mutant is generated from the parent strategy.

In our simulations the frequency of mutant creation was rather low, such that between two subsequent mutant introductions the population approached a steady state. As parental strategy to be mutated, the strategy with the highest component in the population vector was selected, but other choices of parental strategies were also investigated. Furthermore, the procedure for constructing the mutant strategies started from a randomly chosen string site of the parental strategy and generated interchanges C ↔ D. The number of point mutations was varied from single bit to multiple replacements by which entire random strategies could be generated.
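One mutation-and-replacement step of such a simulation might look like the sketch below. Several details are assumptions made for illustration and are not specified in the text: the k flipped sites are taken to be consecutive (wrapping around the string), the mutant's initial share of 0.001 is arbitrary, shares are renormalized after the replacement, and the function names `mutate` and `introduce_mutant` are hypothetical.

```python
import random

def mutate(code, k, rng):
    """Flip k consecutive sites C <-> D, starting at a random site (assumed scheme)."""
    n = len(code)
    start = rng.randrange(n)
    sites = {(start + i) % n for i in range(min(k, n))}
    flip = {"C": "D", "D": "C"}
    return "".join(flip[c] if i in sites else c for i, c in enumerate(code))

def introduce_mutant(population, shares, k, rng, initial_share=1e-3):
    """Replace the lowest-share strategy by a mutant of the highest-share one.

    Requires at least two strategies; modifies both lists in place.
    """
    parent = max(range(len(population)), key=lambda i: shares[i])
    others = [i for i in range(len(population)) if i != parent]
    weakest = min(others, key=lambda i: shares[i])
    population[weakest] = mutate(population[parent], k, rng)
    shares[weakest] = initial_share          # low share, existing balance kept
    total = sum(shares)
    shares[:] = [s / total for s in shares]  # renormalize to a unit total
    return population, shares

rng = random.Random(0)
population = ["CDCDCDCD", "DDDDDDDD", "CCCCCCCC"]
shares = [0.6, 0.3, 0.1]
introduce_mutant(population, shares, k=1, rng=rng)
print(population, shares)
```

Between two such steps the shares would be propagated with the replicator equation until they approach a steady state, as described in the text.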

The results of these evolutionary simulations are compatible with the previous ones about the time evolution of the population of strategies with $n_1 = 1$, $n_2 = 2$, and also with the results of other authors (Axelrod, 1987). Evolutionary simulations within the set of $n_1 = 1$, $n_2 = 2$ strategies are not of great interest though, because the density of SCNL strategies is rather high, so that among, say, 100 random strategies there is likely to be at least one SCNL strategy which will take over the entire population in a relatively short time.

Simulation of evolution makes good sense in the cases of large sets of strategies, such as $n_1 = 3$, $n_2 = 3$, with $\approx 10^{20}$ strategies. The search for optimal strategies within this pool was performed by Axelrod (1987) in terms of genetic algorithms where, besides point and multiple mutations, recombinant creation of strategies was also introduced. Both Axelrod's and our results favour an outcome with populations of strategies having standard optimal properties as described in section 2.

Our main interest still concerns the question whether the populations of strategies resulting in the simulation agree with our analysis in terms of loser-non-loser properties. In particular, we are faced here with the problem of generating a finite number of SCNL strategies for sets of strategies so large that the entire reward matrix (V) cannot be scanned. As an example we will take the $n_1 = 3$, $n_2 = 3$ case where a strategy can be represented as a string of C's and D's (or binary digits) of length 67, of which 64 digits represent the moves corresponding to 64 possible histories while the first three define the initialization moves.

First we try to answer the following question: What is the minimal number of C's in the 64 string positions for a strategy to be of SCNL type? Only a single C is needed, in response to the history in which the last three moves of both players were co-operative. If the answer to all other histories is D we obtain the retaliatory strategy which must, in order to co-operate with itself, have CCC as initialization moves. To find more forgiving SCNL strategies one has to search for histories under which a strategy can afford to co-operate without losing the game. This is certainly the case for histories where the number of the opponent's C-moves exceeds own co-operations by at least 1. Extending this analysis one realizes that this also holds true for all histories in which the last opponent's move was co-operation. The combination of both requirements creates 37 of 64 histories where co-operation is tolerated (without counting the CCC/CCC history where co-operation is unconditionally required in order to guarantee the SCNL property). This criterion gives us a lower estimate for the number of SCNL strategies, namely $2^{37}$. There is, unfortunately, no simple algorithm to detect all synonyms and for this reason the estimated lower bound is not quite certain for the number of distinct SCNL strategies. Table 3 displays some simple rules about the structure of SCNL strategies which can be predicted on the basis of these considerations. Following the procedures outlined above one can construct at least part of the SCNL strategies for any pair of $n_1$, $n_2$ values which may be of practical importance.
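The count of 37 tolerated histories can be reproduced by direct enumeration. The sketch below is an illustration that assumes histories are written oldest move first, so that the last character of a three-letter string is the most recent move; the function name `cooperation_tolerated` is hypothetical.

```python
from itertools import product

# All 64 histories: (own last three moves, opponent's last three moves).
histories = [("".join(own), "".join(opp))
             for own in product("CD", repeat=3)
             for opp in product("CD", repeat=3)]

def cooperation_tolerated(own, opp):
    """Criterion from the text; the CCC/CCC history is counted separately."""
    if (own, opp) == ("CCC", "CCC"):
        return False                     # here C is required, not optional
    opponent_ahead = opp.count("C") >= own.count("C") + 1
    opponent_just_cooperated = opp[-1] == "C"
    return opponent_ahead or opponent_just_cooperated

free = [h for h in histories if cooperation_tolerated(*h)]
print(len(free))                         # histories with a free choice of C or D
print(len(histories) - 1 - len(free))    # histories that must be answered by D
print(2 ** len(free))                    # lower estimate for the number of codes
```

The free histories split into those where the opponent's last move was C and the six further cases in which the opponent's last move was D but the opponent has still co-operated more often than the player.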

These findings about SCNL strategies lend more meaning to computer simulations in searching for pools of optimal strategies. Provided a fraction of SCNL strategies is known, their stability and yield provide a yardstick to judge the efficacy of the mutation-selection scheme applied. In our simulation of the $n_1 = n_2 = 3$ case, we noted two characteristic features of the evolution:

the simulation rarely ends completely within the subset of SCNL strategies given in Table 3;

if an SCNL strategy appears in the population it will most likely spread and overrule less co-operative strategies.

TABLE 3

By means of the data presented in this table it is possible to construct $2^{37}$ codes for SCNL strategies belonging to the $n_1 = 3$, $n_2 = 3$ set. Since there exist $2^{3+3} = 64$ different histories one needs to define 64 answers. Following the arguments given in the text it turns out that the SCNL property will be realized if:

(i) the answer to CCC/CCC would be C,

(ii) the answer to 37 histories defined in columns 2 and 3 would be either C or D,

(iii) the answers to all the remaining 26 histories would be D,

(iv) the initialization moves, which are also part of the code, would be CCC.

The number of possible ways to fulfil condition (ii) is $2^{37}$ and this is the number of different strategies. Tit for Tat and Retaliator strategies are comprised in this set

History no.    One's own last three moves    Opponent's last three moves

1              DDD                           DCD
2              DDD                           CDD
3              DDD                           CCD
4              DDC                           CCD
5              DCD                           CCD
6              CDD                           CCD
7-37           XXX                           XXC

X = C or D.

This can be understood qualitatively in terms of the dynamic properties of replicator equations and the structure of PD sets of strategies. The dependence of the population dynamics on the algorithm which creates new strategies, however, is a major unresolved problem. One promising route of approach will be, in our opinion, to study its effect on SCNL sets.

6. Discussion

The analysis of PD populations in terms of the concept of SCNL strategies is easy and straightforward as long as the total number of strategies is of moderate size. It allows concrete predictions for the outcome of the game. Strategies remembering a larger number of own and, particularly, opponent's moves are more difficult to assess under such viewpoints. The rules given in section 5 are meant as a first attempt at a problem of many practical implications.

Recently it has become more and more apparent that asking for the one optimal strategy, as Tit for Tat was considered to be, does not make sense. Rather, it is the dynamics and stability of self-co-operative subpopulations which must be analysed under ESS aspects but also in their dependence on game rules (rewards, stochasticity, population size, duration and elimination/proliferation). The latter problem is still widely unresolved within the realm of the complex strategies. Simulations, like the ones reported here in section 5, allow one to discern some simple features, but the great number of parameters involved makes them of limited value if one does not find a description in terms of time- or ensemble-smoothing concepts (of which SCNL is just one).

Finally, we would like to point to the enormous generality inherent in a scheme where populations of strategies, represented by binary strings, are set against each other to play by single- or more-digit comparisons valued according to a reward matrix.

REFERENCES

AXELROD, R. & HAMILTON, W. D. (1981). The evolution of co-operation. Science 211, 1390-1396.
AXELROD, R. (1984). The Evolution of Co-operation. New York: Basic Books.
AXELROD, R. (1987). The evolution of strategies in the iterated Prisoner's Dilemma. In: Genetic Algorithms and Simulated Annealing (Davis, L., ed.) pp. 32-41. London: Pitman.
AXELROD, R. & DION, D. (1988). The further evolution of co-operation. Science 242, 1385-1390.
BORŠTNIK, B., PUMPERNIK, D. & HOFACKER, G. L. (1987). Point mutations as an optimal search process in biological evolution. J. theor. Biol. 125, 249-268.
BOYD, R. & LORBERBAUM, J. P. (1987). No pure strategy is evolutionarily stable in the repeated Prisoner's Dilemma Game. Nature, Lond. 327, 58-59.
KURKA, P. (1986). Game dynamics and evolutionary transitions. Biol. Cybern. 54, 85-90.
MAYNARD SMITH, J. (1982). Evolution and the Theory of Games. Cambridge: University Press.
MAYNARD SMITH, J. (1984). Game theory and the evolution of behaviour. Behav. Brain Sci. 7, 95-125.
NOWAK, M. (1989). Stochastic Strategies in the Prisoner's Dilemma. (preprint).
NOWAK, M. & SIGMUND, K. (1989a). Oscillations in the evolution of reciprocity. J. theor. Biol., in press.
NOWAK, M. & SIGMUND, K. (1989b). Game dynamical aspects of the Prisoner's Dilemma. J. appl. Math. Comp., in press.
SCHUSTER, P. & SIGMUND, K. (1985). Towards a dynamics of social behaviour: Strategic and genetic models for the evolution of animal conflicts. J. Social Biol. Struct. 8, 255-277.
TAYLOR, P. & JONKER, L. (1979). Evolutionarily stable strategies and game dynamics. Math. Biosc. 40, 145-156.