Top Banner
Co-evolution time, changing environments agents
49

Co-evolution time, changing environments agents. Static search space solutions are determined by the optimization process only, by doing variation and.

Dec 29, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Co-evolution

time,

changing environments

agents

Page 2: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Static search space

solutions are determined by the optimization process only, by doing variation and selection

evaluation determines fitness of each solution

BUT…

Page 3: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Dynamic search space

what happens if state of environment changes between actions

of the optimization process

AND / OR other processes are concurrently trying to

change the state also

Page 4: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Agents to act in changing environments the search is now for an agent strategy to

act successfully in a changing environmentexample:

agent is defined with an algorithm based on a set of parameters

‘learning’ means searching for parameter settings to make the algorithm successful in the environment

agent algorithm takes current state of environment as input and outputs an action to change the environment

Page 5: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Optimizing an agent

procedural optimization recall 7.1.3 finite state machines

(recognize a string)

and

7.1.4 symbolic expressions

(control operation of a cart)

which are functions

Page 6: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

The environment of activity

parameter space, e.g. game nodes are game states edges are legal moves that transform one

state to another example: tic-tac-toe, rock-scissors-paper

Page 7: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Agent in the environment

procedure to input current state and determine an action to alter the state

evaluated according to its success at achieving goals in the environment external goal - environment in desirable state

(win game) internal goal - maintain internal state

(survive)

Page 8: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Models of agent implementation

1. look-up table: state / action2. deterministic algorithm

hard-coded intelligence

ideal if perfect strategy is known,otherwise, parameterized algorithm

‘tune’ the parameters to optimize performance

3. neural network

Page 9: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Optimizing agent procedure

initialize agent parameters

evaluate agent success in environment

repeat

modify parameters

evaluate modified agent success

if improved success, retain modified parameters

Page 10: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Environment features

1. other agents?

2. state changes independent of agent(s)?

3. randomness?

4. discrete or continuous?

5. sequential or parallel?

Page 11: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Environment featuresgame examples

1. other agents?

solitaire, two-person, multi-player

Page 12: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Environment featuresgame examples

2. state changes independent of agent(s)?

YES

simulators (weather, resources)

NO

board games

Page 13: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Environment featuresgame examples

3. randomness?

NO

chess, sudoku(!)

YES

dice, card games

Page 14: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Environment featuresgame examples

4. discrete or continuous?video gamesboard games

Page 15: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Environment featuresgame examples

5. sequential or parallel?turn-taking, simultaneous

which sports?marathon , javelin, hockey, tennis

Page 16: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Models of agent optimization

evolving agent vs environment evolving agent vs skilled player co-evolving agents

competing as equals e.g., playing a symmetric game

competing as non-equals (predator - prey) e.g., playing an asymmetric game

co-operative

Page 17: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Example: Game Theoryinvented by John von Neumann models concurrent* interaction between

agents each player has a set of choices players concurrently make a choice each player receives a payoff based on the

outcome of play: the vector of choices e.g. paper-scissors-rock

2 players, with 3 choices each 3 outcomes: win, lose, draw

*there are sequential Game Theory models also

Page 18: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Payoff matrix

tabulates payoffs for all outcomes e.g. paper-scissors-rock

P1\P2 paper scissors rock

paper (draw,draw) (lose,win) (win,lose)

scissors (win,lose) (draw,draw) (lose,win)

rock (lose,win) (win,lose) (draw,draw)

Page 19: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Two-player two-choiceordinal games (2 x 2 games) Four outcomes Each player has four ordered payoffs

example game:

P1\P2 L R

U 1,2 3,1

D 2,4 4,3

Page 20: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

2 x 2 games

144 distinct games based on relation of payoffs to outcomes: (4! x 4!) / (2 x 2)

model many real world encounters environment for studying agent strategies

how do players decide what choice to make?

P1\P2 L R

U 1,2 3,1

D 2,4 4,3

Page 21: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

2 x 2 games

player strategy: minimax - minimize losses

pick row / column with largest minimum assumes nothing about other player

P1\P2 L R

U 1,2 3,1

D 2,4 4,3

Page 22: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

2 x 2 games

player strategies: dominant strategy

preferred choice, whatever opponent does does not determine a choice in all games

P1\P2 L R

U 1,3 4,2

D 3,4 2,1

P1\P2 L R

U 1,2 4,3

D 3,4 2,1

P1\P2 L R

U 2,1 1,2

D 4,3 3,4

Page 23: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

2 x 2 games

some games are dilemmas

“prisoner’s dilemma”

C = ‘cooperate’, D = ‘defect’

P1\P2 C D

C 3,3 1,4

D 4,1 2,2

Page 24: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

2 x 2 games - playing strategies

what happens if players use different strategies?

how do/should players decide in difficult games?

modeling players and evaluating in games

P1\P2 C D

C 3,3 1,4

D 4,1 2,2

Page 25: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

iterated games

can players analyze games after playing and determine a better strategy for next encounter?

iterated play

e.g., prisoner’s dilemma:

can players learn to trust each other and get better payoffs?

P1\P2 C D

C 3,3 1,4

D 4,1 2,2

Page 26: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

iterated prisoner’s dilemma

player strategy for choice as a function of previous choices and outcomes

P1\P2 C D

C 3,3 1,4

D 4,1 2,2

game 1 2 … i-1 i i+1 …

P1 C C … D D ?

P2 D C … C D ?

P1: choicei+1,P1 = f( choicei,P1, choicei,P2)

Page 27: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

iterated prisoner’s dilemma

evaluating strategies based on payoffs in iterated play

P1\P2 C D

C 3,3 1,4

D 4,1 2,2

P1: choicen+1,P1 = f( choicen,P1, choicen,P2)

game 1 2 … i-1 i i+1 … n-1 n Σ

1 3 … 4 2 3 … 4 3 298

P1 C C … D D C … D C

P2 D C … C D C … C C

4 3 … 1 2 3 … 1 3 343

Page 28: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

tournament play

set of K strategies for a game,e.g., iterated prisoner’s dilemma

evaluate strategies by performance against all other strategies

round-robin tournament

Page 29: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

example of tournament play

P1 P2 … P(K-1) PK ΣP1 308 298 … 340 278 7140

P2 343 302 … 297 254 6989

… … … … … … …

P(K-1) 280 288 … 301 289 6860

PK 312 322 … 297 277 7045

Page 30: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

tournament play

P1 P2 … P(K-1) PK Σ

P1 308 298 … 340 278 7140

P2 343 302 … 297 254 6989

… … … … … … …

P(K-1) 280 288 … 301 289 6860

PK 312 322 … 297 277 7045

order players by total payoff

change ‘weights’ (proportion of population) based on success

repeat tournament until weights stabilize

Page 31: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

tournament survivors

final weights of players define who is successful

typical distribution:

1 dominant player

a few minor players

most players eliminated

Page 32: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

variations on fitness evaluation

spatially distributed agents

random players on an n x n grid

each player plays 4 iterated games against neighbours and computes total payoff

each player compares total payoff with 4 neighbours; replaced by neighbour with best payoff

Page 33: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

spatial distribution

Page 34: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

defining agents, e.g., prisoner’s dilemma

algorithms from experts Axelrod 1978

exhaustive variations on a model e.g. react to previous choices of both players:

need choices for:first moveafter (C,C) // (self, opponent)after (C,D)after (D,C)after (D,D)

5 bit representation of strategy32 possible strategies, 32 players in

tournament

tit-for-tatfirst: C(C,C) C(C,D) D(D,C) C(D,D) D

tit-for-tatfirst: C(C,C) C(C,D) D(D,C) C(D,D) D

Page 35: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

tit-for-tat interpreting the strategy

start by trusting repeat opponent’s last behaviour

cooperate after cooperation punish defection

don’t hold a grudge

BUT… Axelrod’s payoffsbecame the standard

tit-for-tatfirst: C(C,C) C(C,D) D(D,C) C(D,D) D

tit-for-tatfirst: C(C,C) C(C,D) D(D,C) C(D,D) D

P1\P2 C D

C 3,3 0,5

D 5,0 1,1

Page 36: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

problems with the research

if payoffs are changed but the prisoner’s dilemma pattern remains,

tournaments can produce different winners

--> relative payoff does matter

P1\P2 C D

C 3,3 1,4

D 4,1 2,2

P1\P2 C D

C 3,3 0,5

D 5,0 1,1

game 1 2 … i-1 i i+1 … n-1 n Σ

1 3 … 4 2 3 … 4 3 298

P1 C C … D D C … D C

P2 D C … C D C … C C

4 3 … 1 2 3 … 1 3 343

Page 37: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

agent is function of more than previous strategy

choiceP1,n influenced by variables previous choices - how many? 2, 4, 6, ..? payoffs - of both players 4, 8 ? individual’s goals 1

same, different measures? maximum payoff maximum difference altruism equality social utility

other factors? ?

Page 38: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

agent research (Robinson & Goforth) choiceP1,n influenced by

previous choices - how many payoffs - of both playersrepeated tournaments by sampling the space of payoffs- demonstrated that payoffs matterexample of discovery: extra level of trust beyond tit-for-tat

effect of players’ goals on outcomes and payoffs and classic strategies 3 x 3 games ~3 billion games -> grid computing

Page 39: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

general model environment is payoff space where encounters

take place multiple agents interact in the space

repeated encounters where agents get payoffs agents ‘display’ their own strategies and agents ‘learn’ the strategies of others

goals of research reproduce real-world behaviour by simulating strategies develop effective strategies

Page 40: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Developing effective agent strategies

co-evolution bootstrapping symmetric competition

games, symmetric game theory asymmetric competition

asymmetric game theory cooperation

common goal, social goal

Page 41: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Coevolutionin symmetric competition

Fogel’s checkers player evolving player strategies search space of strategies

fitness is determined by game play variation by random change to parameters selection by ranking in play against others initial strategies randomly generated

Page 42: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Checkers game representation

32 element vector for the playable squares on the board (one colour)

Domain of each element: {-K,-1,0,1,K} e.g., start state of game:

{-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,0,0,0,0,0,0,0, 1,1,1,1,1,1,1,1,1,1,1,1}

Page 43: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Game play algorithm

board.initialize()

while (not board.gameOver())

player1.makeMove(board)

if (board.gameOver()) break

player2.makeMove(board)

end while

board.assignPoints(player1,player2)

Page 44: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

makeMove algorithm

makeMove(board) {

boardList = board.getNextBoards(this)

bestBoardVal = -1; bestBoard = null

forAll (tempBoard in boardList)

boardVal = evaluate(tempBoard)

if (boardVal>bestBoardVal)

bestBoardVal = boardVal

bestBoard = tempBoard

board = bestBoard

Page 45: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

evaluate algorithm

evaluate(board) { function of 32 elements of board parameters are variables of search space, including K output is in range [-1,1]

used for comparing boards}

what to put in the evaluation?

Page 46: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Chellapilla and Fogel’s evaluation

no knowledge of chess rules board generates legal moves and updates

state no ‘look-ahead’ to future moves implicitly a pattern recognition system

BUT only relational patterns predefined - see diagram

(neural network)

Page 47: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Neural network

input layer output layerhidden layer(s)

i1

h1 = w11.i1+w12.i2+w13.i3

o1=w31.g1+w32.g2+w33.g3g1

Page 48: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Neural network gent 32 inputs - board state 1st hidden layer: 91 nodes

3x3 squares (36), 4x4 (25), 5x5 (16), 6x6 (9), 7x7 (4), 8x8 (1)

2nd hidden layer: 40 nodes 3rd hidden layer: 10 nodes 1 output - evaluation of board [-1, 1] parameters - weights on neural network

plus K (value of king)

Page 49: Co-evolution time, changing environments agents. Static search space  solutions are determined by the optimization process only, by doing variation and.

Genetic algorithm population: 15 player agents one child agent from each parent: variation

by random changes in K and weights fitness based on score in tournament

among 30 agents (win: 1, loss: -2, draw: 0) agents with best 15 scores selected for

next generationafter 840 generations, effective checkers

player in competition against humans