The complexity of solving reachability games using value and strategy iteration Kristoffer Arnsfelt Hansen Rasmus Ibsen-Jensen Peter Bro Miltersen Aarhus University Denmark CSR 2011, 14’th June

Oct 21, 2014


Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration
Transcript
Page 1: Csr2011 june14 16_30_ibsen-jensen

The complexity of solving reachability games using value and strategy iteration

Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen

Aarhus University, Denmark
CSR 2011, 14th June

Page 2: Csr2011 june14 16_30_ibsen-jensen

Overview

What are concurrent reachability games?

Two standard algorithms for solving concurrent reachability games:
  The value iteration algorithm
  The strategy iteration algorithm

Exemplify important facts for the proof of the time lower bound for both algorithms

1/42

Page 3: Csr2011 june14 16_30_ibsen-jensen

Matrix games von Neumann 1928

0 -1 1

1 0 -1

-1 1 0
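
As a small illustration of the matrix above, the sketch below checks what a mixed strategy guarantees in this game. It assumes the entries are the row player's payoffs (the slide does not say which player they belong to), and the function name `guaranteed_payoff` is only illustrative.

```python
from fractions import Fraction

# Payoff matrix from the slide, read as payoffs to the row player (an assumption).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def guaranteed_payoff(matrix, row_strategy):
    """Worst-case expected payoff of a mixed row strategy over all pure columns."""
    n_cols = len(matrix[0])
    return min(
        sum(p * matrix[i][j] for i, p in enumerate(row_strategy))
        for j in range(n_cols)
    )

uniform = [Fraction(1, 3)] * 3
print(guaranteed_payoff(A, uniform))      # uniform play guarantees 0
print(guaranteed_payoff(A, [1, 0, 0]))    # a pure row can be exploited: -1
```

Because the matrix is skew-symmetric, no strategy can guarantee more than 0, so under this reading the uniform strategy is optimal.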

2/42


Page 5: Csr2011 june14 16_30_ibsen-jensen

Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998)

Dante* vs. Lucifer*

Each entry can be either 0, 1 or a pointer

[Figure: the matrix game from the previous slide, with rows "0 -1 1", "1 0 -1", "-1 1 0", next to the entries 0 and 1.]

* Naming convention from Hansen, Koucky and Miltersen, 2009

3/42


Page 11: Csr2011 june14 16_30_ibsen-jensen

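Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998)

Each entry can be either 0, 1 or a pointer

[Figure: an example game with four 3x3 matrices. One matrix has rows "1 S S", "0 1 S", "0 0 1". The other three show rows "S S", "0 S", "0 0"; their remaining entries are pointers drawn as arrows. "S:" labels the start matrix, and an entry S is a pointer to it.]

3/42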

Page 12: Csr2011 june14 16_30_ibsen-jensen

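Histories

Each entry can be either 0, 1 or a pointer

[Figure: the four-matrix example game, with rows "1 S S", "0 1 S", "0 0 1" in one matrix and "S S", "0 S", "0 0" in the other three; "S:" labels the start matrix.]

4/42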

Page 13: Csr2011 june14 16_30_ibsen-jensen

Histories and strategies

History: Sequence of positions and choices for each player in each position.

Strategy: Map from histories to probability distributions over choices in the position we arrive at after the history

S1: Set of strategies for Dante

S2: Set of strategies for Lucifer

H1/H2: Sets of stationary strategies (strategies that depend only on the position we arrive at after the history)

5/42

Page 14: Csr2011 june14 16_30_ibsen-jensen

Payoffs

v(i,σ,π): The probability of eventually reaching a 1 from position i, if Dante plays strategy σ and Lucifer plays strategy π.

6/42

Page 15: Csr2011 june14 16_30_ibsen-jensen

Everett 1957

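Value of i:

$$v_i := \sup_{\sigma \in S_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in S_2} \sup_{\sigma \in S_1} v(i,\sigma,\pi)$$

$$v_i = \sup_{\sigma \in H_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in H_2} \sup_{\sigma \in S_1} v(i,\sigma,\pi)$$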

7/42

Page 16: Csr2011 june14 16_30_ibsen-jensen

Algorithmic problems

Quantitatively solving a game: Given the game, compute the value of all positions.

Strategically solving a game: Given the game and ε > 0, compute σ such that for all π and i: v(i,σ,π) > v_i - ε.

8/42

Page 17: Csr2011 june14 16_30_ibsen-jensen

Value iteration (Shapley 1953)

G_t: A modified version of G, where Dante loses after t moves.

Value iteration computes the value of each position in G_t in iteration t, on the basis of the value of each position in G_{t-1}.

9/42
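
A minimal sketch of this update, assuming a one-position game whose matrix has rows (1, pointer back to the same position) and (0, 1), of the kind shown near the end of the talk. The helper `value_2x2` uses the standard closed form for 2x2 zero-sum games; both function names are illustrative and not taken from the talk.

```python
from fractions import Fraction

def value_2x2(a, b, c, d):
    """Value of the zero-sum matrix game [[a, b], [c, d]] for the row maximizer."""
    lower = max(min(a, b), min(c, d))   # best pure guarantee for the row player
    upper = min(max(a, c), max(b, d))   # best pure guarantee for the column player
    if lower == upper:                  # saddle point in pure strategies
        return lower
    return Fraction(a * d - b * c) / (a + d - b - c)

def value_iteration(steps):
    """Values of the single position in G_0, G_1, ..., G_steps."""
    v = Fraction(0)                     # in G_0 Dante has already lost
    values = [v]
    for _ in range(steps):
        # Replace the pointer entry by the previous value and solve the matrix.
        v = value_2x2(1, v, 0, 1)
        values.append(v)
    return values

print(value_iteration(4))
```

For this assumed game the iterates are 0, 1/2, 2/3, 3/4, 4/5, that is t/(t+1): they approach the value 1, but the gap only shrinks like 1/(t+1).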

Page 18: Csr2011 june14 16_30_ibsen-jensen

Our results: Lower bound for value iteration

There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, so that:

val(G) = 1 and val(G_t) = 3m^{-N/2}, for t = 2^{m^{N/2}}
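
To get a feel for the sizes involved, the sketch below evaluates the two expressions for one concrete pair (m, N). It reads the flattened slide text as the quantity 3m^(-N/2) and the iteration count t = 2^(m^(N/2)); that reading of the superscripts is an assumption, as is the restriction to even N, and the function name is illustrative.

```python
from fractions import Fraction

def lower_bound_instance(m, n_positions):
    """Return (t, bound) with t = 2**(m**(N/2)) and bound = 3 * m**(-N/2), for even N."""
    if n_positions % 2 != 0:
        raise ValueError("this sketch assumes an even number of matrices N")
    half = n_positions // 2
    t = 2 ** (m ** half)
    bound = Fraction(3, m ** half)
    return t, bound

t, bound = lower_bound_instance(m=2, n_positions=10)
print(t, float(bound))   # 4294967296 0.09375
```

So already for m = 2 and N = 10, more than four billion iterations leave the computed quantity below 0.1 although the game has value 1.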

10/42

Page 19: Csr2011 june14 16_30_ibsen-jensen

Our results: Upper bound for value iteration

For any concurrent reachability game G: val(G) - val(G_t) < ε for t = (1/ε)^{m^{O(N)}}

11/42

Page 20: Csr2011 june14 16_30_ibsen-jensen

Value iteration example – G0

[Figure: the four-matrix example game, with rows "1 S S", "0 1 S", "0 0 1" in one matrix and "S S", "0 S", "0 0" in the other three; "S:" labels the start matrix.]

In G0 every position has value 0.

12/42

Page 22: Csr2011 june14 16_30_ibsen-jensen

Value iteration example – G1

[Figure: the four-matrix example game. To evaluate a position, each pointer entry is replaced by the value of its target in G0 and the resulting matrix game is solved.]

Since all values in G0 are 0, the matrix with the 1 entries becomes

1 0 0
0 1 0
0 0 1

and has value 0.33333 (previous value 0). The other three matrices become all-zero and keep the value 0.

13/42

Page 32: Csr2011 june14 16_30_ibsen-jensen

Value iteration example – G2 to G9

[Figure: the four-matrix example game, annotated on each slide with the value of every position in Gt next to its value in Gt-1.]

Values of the four positions in Gt. Positions are numbered along the chain from the start matrix S (1st) to the matrix containing the 1 entries (4th).

t    4th      3rd      2nd      S (1st)
2    0.33333  0.11111  0        0
3    0.33333  0.11111  0.03704  0
4    0.33333  0.11111  0.03704  0.01235
5    0.33748  0.11533  0.04147  0.01754
6    0.33925  0.11855  0.04493  0.02172
7    0.34068  0.12064  0.04772  0.02519
8    0.34187  0.12388  0.04991  0.02815
9    0.34378  0.12517  0.05129  0.03070

14/42 - 21/42
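
The frames G1 to G7 above can be recomputed with a short sketch. It assumes the example is a chain of four 3x3 matrices in which the diagonal entries point to the next position (or are 1 in the last matrix), entries above the diagonal point back to the start S, and entries below are 0. Under that assumption each matrix has the closed-form value x / (1 + r + r^2) with r = 1 - s/x, where x is the previous value of the next position and s the previous value of S. This formula is derived here for illustration and is not stated in the talk; the function names are illustrative.

```python
def matrix_value(next_value, start_value, m=3):
    """Value of the m x m matrix with next_value on the diagonal,
    start_value above it and 0 below it (assumes start_value <= next_value)."""
    if next_value == 0:
        return 0.0
    r = 1.0 - start_value / next_value
    return next_value / sum(r ** k for k in range(m))

def value_iteration(n_positions=4, m=3, steps=7):
    """Yield the value vectors of G_1, ..., G_steps; index 0 is the start position S."""
    values = [0.0] * n_positions            # G_0: Dante has already lost
    for _ in range(steps):
        start = values[0]
        # The last position points at the goal, which is worth 1.
        targets = values[1:] + [1.0]
        values = [matrix_value(x, start, m) for x in targets]
        yield values

for t, vals in enumerate(value_iteration(), start=1):
    print(t, [round(v, 5) for v in vals])
```

For t = 5, for instance, this gives 0.01754, 0.04147, 0.11533 and 0.33748 (start position first), matching the G5 frame. The G8 and G9 frames differ from this recursion in a few displayed digits, which the sketch does not try to explain.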

Page 40: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration (Chatterjee, de Alfaro, Henzinger ’06)

Was conjectured to be fast

22/42

Page 41: Csr2011 june14 16_30_ibsen-jensen

Our results: Upper bound for strategy iteration

An ε-optimal strategy is computed after t = (1/ε)^{m^{O(N)}} iterations of strategy iteration.

This follows from the corresponding results for value iteration

23/42

Page 42: Csr2011 june14 16_30_ibsen-jensen

Our results: Lower bound for strategy iteration

There exists a concurrent reachability game G, with N matrices, for large N, and m rows and columns in each matrix, so that:

val(G) = 1 and the strategy obtained by strategy iteration guarantees winning probability at most 4m^{-N/2}, for t = 2^{m^{N/4}}

24/42

Strategy iteration, m = 2

N    Number of iterations needed to get over 1/2
7    18446744073709551617
8    340282366920938463463374607431768211457
9    115792089237316195423570985008687907853269984665640564039457584007913129639937
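
The three counts in the table happen to fit the doubly exponential expression 2^(2^(N-1)) + 1. The sketch below only checks that observation against the three listed rows; the closed form is a hypothetical pattern read off the numbers, not a formula given in the talk.

```python
# Iteration counts transcribed from the table (m = 2).
iterations_needed = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}

def doubly_exponential(n):
    """Hypothetical closed form matching the three table rows: 2^(2^(N-1)) + 1."""
    return 2 ** (2 ** (n - 1)) + 1

for n, count in iterations_needed.items():
    print(n, count == doubly_exponential(n))
```

Each added matrix squares the count (up to the +1), which is the doubly exponential growth the lower bound describes.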

Page 43: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Before iteration 1

1. Start strategy for Dante := Uniform

[Figure: the four-matrix example game. Dante plays each of his three choices in every matrix with probability 0.33333.]

25/42

Page 45: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below, and the chain obtained once both strategies are fixed, with edge labels 0.66667.]

The numbers on the edges are the probability that the edge is used. Edges without a number have probability 0.33333 to be used.

Position values after steps 1 and 2, as displayed: 0.11111, 0.03704, 0.01235, 0.33333

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.33748  0.33332  0.32920
0.34599  0.33317  0.32084
0.37327  0.33180  0.29493
0.47368  0.31579  0.21053

26/42
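
The three steps listed on the slides can be sketched on a much smaller game than the four-matrix example. The code below assumes a one-position game with matrix rows (1, pointer back to S) and (0, 1), where Dante picks the row and Lucifer the column; that game, the row/column roles and the function name are assumptions made for illustration.

```python
from fractions import Fraction

def strategy_iteration(iterations):
    """Run the three-step loop on the assumed one-position game
    [[1, S], [0, 1]] and return Dante's probability of row 1 after each round."""
    p_row1 = Fraction(1, 2)                 # start strategy for Dante: uniform
    history = [p_row1]
    for _ in range(iterations):
        # Steps 1-2: Lucifer's best response is column 1 whenever Dante plays
        # row 2 with positive probability (column 2 would let Dante win
        # eventually), so the value of S equals the probability of row 1.
        value = p_row1
        # Step 3: Dante plays optimally in [[1, value], [0, 1]];
        # the equalizing strategy puts weight 1 / (2 - value) on row 1.
        p_row1 = 1 / (2 - value)
        history.append(p_row1)
    return history

print(strategy_iteration(3))
```

In this assumed game the guaranteed winning probability goes 1/2, 2/3, 3/4, 4/5, so it improves in every round but approaches the value 1 only slowly.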

Page 55: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 2

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below.]

Position values after steps 1 and 2, as displayed: 0.11677, 0.04359, 0.02065, 0.33748

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.34031  0.33329  0.32640
0.35458  0.33289  0.31253
0.39987  0.33180  0.32917
0.55453  0.29186  0.15361

27/42

Page 60: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 3

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below.]

Position values after steps 1 and 2, as displayed: 0.12067, 0.04825, 0.02676, 0.34031

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.34241  0.33325  0.32434
0.36097  0.33259  0.30644
0.41947  0.32646  0.25407
0.60831  0.27098  0.12071

28/42

Page 66: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 4

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below.]

Position values after steps 1 and 2, as displayed: 0.12360, 0.05185, 0.03154, 0.34241

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.34407  0.33322  0.32271
0.36601  0.33230  0.30169
0.43486  0.32390  0.24125
0.64720  0.25350  0.09930

29/42

Page 73: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 5

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below.]

Position values after steps 1 and 2, as displayed: 0.12593, 0.05476, 0.03544, 0.34407

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.34543  0.33319  0.32138
0.37015  0.33202  0.29783
0.44745  0.32152  0.23103
0.67692  0.23882  0.08426

30/42

Page 78: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 6

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below.]

Position values after steps 1 and 2, as displayed: 0.12786, 0.05721, 0.03873, 0.34543

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.34658  0.33316  0.32026
0.37366  0.33177  0.29457
0.45807  0.31933  0.22260
0.70055  0.22633  0.07312

31/42

Page 85: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 7

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below.]

Position values after steps 1 and 2, as displayed: 0.12950, 0.05932, 0.04156, 0.34658

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.34758  0.33313  0.31929
0.37670  0.33153  0.29177
0.46723  0.31730  0.21547
0.71988  0.21557  0.06455

32/42

Page 90: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 8

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below.]

Position values after steps 1 and 2, as displayed: 0.13093, 0.06118, 0.04404, 0.34758

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.34845  0.33311  0.31844
0.37937  0.33130  0.28933
0.47527  0.31541  0.20932
0.73606  0.20618  0.05776

33/42

Page 95: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 9

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game, annotated with the values and strategies below.]

Position values after steps 1 and 2, as displayed: 0.13219, 0.06283, 0.04624, 0.34845

Dante's strategy after step 3 (one row of probabilities per position, as displayed):
0.34923  0.33309  0.31768
0.38176  0.33109  0.28715
0.48241  0.31366  0.20393
0.74985  0.19791  0.05224

34/42

Page 101: Csr2011 june14 16_30_ibsen-jensen

Generalized Purgatory P(N,m)

Lucifer repeatedly hides a number between 1 and m.
Dante must try to guess the number.
If he guesses correctly N times in a row, he goes to heaven.
If he ever guesses incorrectly, overshooting Lucifer’s number, he goes to hell.

35/42
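The rules above can be sketched as a small simulator. This is an illustration rather than anything from the talk: the function name `play_purgatory`, the callback interface and the `max_rounds` cut-off are hypothetical, and the handling of an undershoot (the run of correct guesses restarts at zero) is an assumption read off the phrase "N times in a row".

```python
HEAVEN, HELL, UNDECIDED = "heaven", "hell", "undecided"

def play_purgatory(N, m, dante, lucifer, max_rounds=10_000):
    """Simulate one play of P(N, m).

    dante(streak) and lucifer(streak) each return a number in 1..m,
    where streak is Dante's current run of correct guesses.
    """
    streak = 0
    for _ in range(max_rounds):
        hidden = lucifer(streak)
        guess = dante(streak)
        if not (1 <= hidden <= m and 1 <= guess <= m):
            raise ValueError("numbers must lie between 1 and m")
        if guess > hidden:
            # Overshoot: Dante goes to hell.
            return HELL
        if guess == hidden:
            # Correct guess: extend the run, heaven after N in a row.
            streak += 1
            if streak == N:
                return HEAVEN
        else:
            # Undershoot: assumed to restart the run of correct guesses.
            streak = 0
    # The play was cut off before reaching heaven or hell.
    return UNDECIDED
```

For example, `play_purgatory(3, 2, lambda s: 1, lambda s: 1)` ends in heaven after three rounds, while a Dante who always guesses 2 against the same Lucifer overshoots in the first round.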

Page 102: Csr2011 june14 16_30_ibsen-jensen

Interesting fact

Dante can make the probability of going to heaven from Purgatory arbitrarily close to 1 by playing well enough.

36/42
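A worked bound for the smallest case, P(1,2), makes the claim concrete. It assumes that an undershoot simply lets Dante guess again; the strategy and the symbol ε are illustrative and do not come from the slides. Suppose that in every round Dante guesses 2 with probability ε and 1 otherwise.

```latex
\begin{align*}
\text{Lucifer hides 1:}\quad & \Pr[\text{heaven}] = 1-\varepsilon, & \Pr[\text{hell}] &= \varepsilon,\\
\text{Lucifer hides 2:}\quad & \Pr[\text{heaven}] = \varepsilon, & \Pr[\text{guess again}] &= 1-\varepsilon, \qquad \Pr[\text{hell}] = 0.
\end{align*}
```

Against a Lucifer who always hides 1, Dante reaches heaven with probability 1 − ε. Against one who always hides 2, he can never overshoot and eventually guesses 2, so he reaches heaven with probability 1. Shrinking ε pushes the guarantee towards 1, but with ε = 0 Dante would guess 1 forever against a hidden 2 and never reach heaven.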

Page 103: Csr2011 june14 16_30_ibsen-jensen

Exemplifying important facts

Value iteration on 1 matrix
Strategy iteration on 1 matrix
Strategy iteration on 3 matrices

[Figure: three panels, one per heading, showing matrices with entries 0 and 1.]

t := 0
Value iteration: value 0
Strategy iteration: every strategy is (0.5, 0.5)

37/42

Page 106: Csr2011 june14 16_30_ibsen-jensen

Exemplifying important facts

Value iteration on 1 matrix
Strategy iteration on 1 matrix
Strategy iteration on 3 matrices

[Figure: three panels, one per heading, showing matrices with entries 0 and 1.]

t := 1
Values: value iteration 0.5; strategy iteration on 1 matrix 0.5; strategy iteration on 3 matrices 0.5, 0.25, 0.125
Updated strategies: 1 matrix (0.66667, 0.33333); 3 matrices (0.66667, 0.33333), (0.57143, 0.42857), (0.53333, 0.46667)

38/42
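The value-iteration panel can be reproduced with a few lines of code. This is a minimal sketch, assuming the single matrix is [[1, S], [0, 1]] with S pointing back to the matrix itself (the smallest Purgatory, where an undershoot means guessing again). That reading and the helper names `value_2x2` and `value_iteration` are assumptions; the transcript only preserves the entries 0 and 1. Under this reading the iterates are 1/2, 2/3 and 3/4, which agrees with the 0.5, 0.66667 and 0.75000 shown for t = 1, 2, 3.

```python
from fractions import Fraction

def value_2x2(a, b, c, d):
    """Value of the zero-sum matrix game [[a, b], [c, d]] for the row maximiser."""
    lower = max(min(a, b), min(c, d))   # best pure guarantee for the row player
    upper = min(max(a, c), max(b, d))   # best pure guarantee for the column player
    if lower == upper:
        # Saddle point: pure strategies suffice.
        return lower
    # No saddle point: both players mix (standard 2x2 formula).
    return (a * d - b * c) / (a + d - b - c)

def value_iteration(steps):
    """Iterate v_{t+1} = val([[1, v_t], [0, 1]]) starting from v_0 = 0."""
    v = Fraction(0)
    history = [v]
    for _ in range(steps):
        # The pointer entry S is replaced by the previous estimate v_t.
        v = value_2x2(Fraction(1), v, Fraction(0), Fraction(1))
        history.append(v)
    return history

if __name__ == "__main__":
    for t, v in enumerate(value_iteration(5)):
        print(t, float(v))
```

In this example the estimate after t steps is t/(t+1), so it approaches the value 1 only slowly.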

Page 110: Csr2011 june14 16_30_ibsen-jensen

Exemplifying important facts

Value iteration on 1 matrix
Strategy iteration on 1 matrix
Strategy iteration on 3 matrices

[Figure: three panels, one per heading, showing matrices with entries 0 and 1.]

t := 2
Values: value iteration 0.66667; strategy iteration on 1 matrix 0.66667; strategy iteration on 3 matrices 0.53333, 0.30476, 0.20317
Updated strategies: 1 matrix (0.75000, 0.25000); 3 matrices (0.75000, 0.25000), (0.61765, 0.38235), (0.55654, 0.44346)

39/42

Page 114: Csr2011 june14 16_30_ibsen-jensen

Exemplifying important facts

Value iteration on 1 matrix
Strategy iteration on 1 matrix
Strategy iteration on 3 matrices

[Figure: three panels, one per heading, showing matrices with entries 0 and 1.]

t := 3
Strategies: 1 matrix (0.75000, 0.25000); 3 matrices (0.75000, 0.25000), (0.61765, 0.38235), (0.55654, 0.44346)
Values: value iteration 0.75000; strategy iteration on 1 matrix 0.75000; strategy iteration on 3 matrices 0.55654, 0.34374, 0.25781

40/42

Page 117: Csr2011 june14 16_30_ibsen-jensen

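Exemplifying important facts

Value iteration on 1 matrix
Strategy iteration on 1 matrix
Strategy iteration on 3 matrices

[Figure: three panels, one per heading, showing matrices with entries 0 and 1.]

t := 3
Values: value iteration 0.75000; strategy iteration on 1 matrix 0.75000; strategy iteration on 3 matrices 0.55654, 0.34374, 0.25781
Updated strategies: 1 matrix (0.80000, 0.20000); 3 matrices (0.80000, 0.20000), (0.65072, 0.34928), (0.57399, 0.42601)

41/42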
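The three-matrix panel can be replayed in code under some assumptions. The sketch below assumes the game is Purgatory with three positions and two numbers, where position k has the matrix [[next, start], [0, next]] and "next" after the last position is heaven. The names `evaluate`, `improve` and `strategy_iteration` are hypothetical. Steps 1 and 2 (Lucifer's best response and the resulting values) are merged into one routine that repeats min-backups from zero, and step 3 uses the closed form for a 2x2 matrix game without a saddle point, which holds here because the start value stays below every "next" value. None of this is claimed to be the authors' implementation.

```python
def evaluate(p, sweeps=1000):
    """Values of Dante's strategy against a best-responding Lucifer.

    p[k] is the probability that Dante guesses 1 in position k (0 = start).
    Repeated min-backups from 0 converge to the values Lucifer can hold
    Dante to; the sweep count is an arbitrary, generous cut-off.
    """
    n = len(p)
    w = [0.0] * n
    for _ in range(sweeps):
        new = []
        for k in range(n):
            nxt = 1.0 if k == n - 1 else w[k + 1]    # heaven after the last position
            hide1 = p[k] * nxt                       # a guess of 2 overshoots: hell
            hide2 = p[k] * w[0] + (1 - p[k]) * nxt   # a guess of 1 undershoots: start
            new.append(min(hide1, hide2))
        w = new
    return w

def improve(w):
    """Dante's optimal mix in each matrix [[nxt, start], [0, nxt]].

    Uses (d - c) / (a + d - b - c), valid while w[0] < nxt (no saddle point).
    """
    n = len(w)
    p = []
    for k in range(n):
        nxt = 1.0 if k == n - 1 else w[k + 1]
        p.append(nxt / (2 * nxt - w[0]))
    return p

def strategy_iteration(n_positions, iterations):
    """Start from uniform strategies; return (values, updated strategy) per iteration."""
    p = [0.5] * n_positions
    trace = []
    for _ in range(iterations):
        w = evaluate(p)
        p = improve(w)
        trace.append((w, p))
    return trace

if __name__ == "__main__":
    for t, (w, p) in enumerate(strategy_iteration(3, 3), start=1):
        print(t, [round(x, 5) for x in w], [round(x, 5) for x in p])
```

Positions are indexed from the start, so the values come out in the reverse of the order shown on the slides (0.125, 0.25, 0.5 rather than 0.5, 0.25, 0.125 at t = 1). With that reordering, the first three iterations agree with the numbers on the slides to five decimals.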

Page 118: Csr2011 june14 16_30_ibsen-jensen

The end

Open problem: find a fast algorithm for solving concurrent reachability games.

A PSPACE algorithm for the problem exists, but it is not fast.

Thanks for listening

42/42