Evaluation Through Conflict Martin Zinkevich Yahoo! Inc. http://martin.zinkevich.org/lemonade
Page 1

Evaluation Through Conflict

Martin Zinkevich

Yahoo! Inc.

http://martin.zinkevich.org/lemonade

Page 2

Who was I

• Worked with U Alberta Computer Poker Research Group
  – Designed Counterfactual Regret Algorithm
  – Theory behind DIVAT

• Worked on AAAI Computer Poker Competition
  – 2006 as lead programmer, 2007 as chair

• Work used in Man Vs Machine

Page 3

Who am I

• Run the Lemonade Stand Game Competition
• Work with Yahoo Anti-Abuse Team

Page 4

AAAI Computer Poker Competition

• 5 years running
• Now the ANNUAL Computer Poker Competition
• Latest: 11 universities et al.

Page 5

Competitions: Science vs Entertainment

Page 6

AAAI Computer Poker Competition

May The Best Program Win!

And Win Again IF WE PLAYED AGAIN!

Page 7

Head to Head

VS

for 1000 hands

Page 9

All Combinations

-         7,-7      10,-10

-7,7      -         5,-5

-10,10    -5,5      -

Page 10

OK, But Who Won?

• Online: Maximize total winnings
• Equilibrium: Maximize number of people I can win money from (or don’t lose against)
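The two criteria can be compared on the "All Combinations" crosstable. The sketch below is illustrative only: the program names A, B, and C are placeholders (the slide showed the entrants as images), and the function names are made up for this example.

```python
# Head-to-head results from the "All Combinations" slide.
# winnings[x][y] is what program x won from program y.
# The names A, B, C are placeholders, not the real entrants.
winnings = {
    "A": {"B": 7, "C": 10},
    "B": {"A": -7, "C": 5},
    "C": {"A": -10, "B": -5},
}

def total_winnings(results):
    """Online criterion: sum of winnings over all opponents."""
    return {p: sum(row.values()) for p, row in results.items()}

def opponents_not_lost_to(results):
    """Equilibrium criterion: number of opponents a program did not lose to."""
    return {p: sum(1 for v in row.values() if v >= 0) for p, row in results.items()}

online = total_winnings(winnings)
equilibrium = opponents_not_lost_to(winnings)
print(online)       # {'A': 17, 'B': -2, 'C': -15}
print(equilibrium)  # {'A': 2, 'B': 1, 'C': 0}
```

On this table both criteria rank the programs in the same order. They can disagree when a program wins large amounts from weak opponents but loses narrowly to strong ones.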

Page 11

Why a New Competition?

Computing Equilibria ✓

Choosing Equilibria ?

Page 12

Bach or Stravinsky

2,1 0,0

0,0 1,2
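A small check makes the "choosing" problem concrete. The sketch below enumerates the pure-strategy Nash equilibria of the payoff table above; the action labels (0 = Bach, 1 = Stravinsky) and the function name are assumptions made for this example.

```python
from itertools import product

# Bach or Stravinsky payoffs from the slide:
# payoff[row][col] = (row player's payoff, column player's payoff).
# Action 0 = Bach, action 1 = Stravinsky (the labelling is an assumption).
payoff = [
    [(2, 1), (0, 0)],
    [(0, 0), (1, 2)],
]

def pure_nash_equilibria(game):
    """Return all (row, col) pairs where neither player gains by deviating alone."""
    n_rows, n_cols = len(game), len(game[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        row_best = all(game[r][c][0] >= game[r2][c][0] for r2 in range(n_rows))
        col_best = all(game[r][c][1] >= game[r][c2][1] for c2 in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

equilibria = pure_nash_equilibria(payoff)
print(equilibria)  # [(0, 0), (1, 1)]
```

Both coordinated outcomes are equilibria, and each favors a different player, so finding them is easy while agreeing on one is not. The game also has a mixed equilibrium, which this check does not search for.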

Page 13

Big Question: How Do (or Would) People Get to Nash Equilibria?

Page 14

Solvable Games

$

Page 15

Unsolvable Games

$ ?

Page 16

An Old Idea

• Think about learning in the presence of other intelligent agents.

• Prove cool stuff about your learning algorithm given:
  – constraints about the adversary
  – constraints about the game

Page 17

Solving the Unsolvable

• In current competitions, people are often applying techniques that are effective in solvable games, even when the game is not solvable.

• In what competitions is it useless to approximate the game as solvable?

Page 18

Axelrod’s Iterated Prisoner’s Dilemma

• A competition between many competitors.
• One entry: tit-for-tat (Anatol Rapoport)
  – Nice (initially)
  – Retaliating
  – Forgiving
  – Non-envious
• Learned that cooperation has value, but:
  – Cooperate with whom?
  – How do we cooperate?
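Tit-for-tat is short enough to write out in full. The sketch below is a minimal illustration, not the tournament code: the payoff values (3, 0, 5, 1) are the commonly used prisoner's dilemma numbers and are an assumption here, since the slide does not list them, and the helper names are invented for the example.

```python
# Standard prisoner's dilemma payoffs (assumed; the slide does not list them).
# PAYOFF[(mine, theirs)] is my score; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(opponent_history):
    """Cooperate first (nice), then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    """Baseline opponent that never cooperates."""
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return both total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy sees only the other's past moves
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (30, 30)
print(play(tit_for_tat, always_defect))  # (9, 14)
```

Against itself tit-for-tat cooperates throughout. Against a constant defector it loses only the first round and then retaliates, so it never outscores its opponent head to head yet limits its losses.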

Page 19

The Lemonade Stand Game

Page 22

What Is The Lemonade Stand Game?

• Every round for 100 rounds:
  – each person selects an action privately
  – then, the actions are revealed

• The score of a player is the distance clockwise to the next player plus the distance counterclockwise.
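The scoring rule can be sketched as a small function. Two details below are assumptions, since the slide does not state them: the circle has 12 positions (like a clock face), and players who share a position split the score for that position evenly. The function name is also invented for this example.

```python
N_POSITIONS = 12  # assumed: spots arranged like the hours on a clock face

def lemonade_scores(positions, n=N_POSITIONS):
    """Return one score per player for a single round.

    Each score is the clockwise distance to the nearest other stand plus the
    counterclockwise distance. Players sharing a spot split that total evenly
    (assumed tie rule, not stated on the slide).
    """
    scores = []
    for p in positions:
        apart = [q for q in positions if q != p]
        sharing = list(positions).count(p)
        if apart:
            clockwise = min((q - p) % n for q in apart)
            counterclockwise = min((p - q) % n for q in apart)
            total = clockwise + counterclockwise
        else:
            total = 2 * n  # everyone on one spot: the whole circle, both ways
        scores.append(total / sharing)
    return scores

print(lemonade_scores((0, 4, 8)))  # [8.0, 8.0, 8.0]
print(lemonade_scores((0, 1, 6)))  # [7.0, 6.0, 11.0]
print(lemonade_scores((0, 6, 6)))  # [12.0, 6.0, 6.0]
```

Under these assumptions every round hands out 24 points in total, an average of 8 per player, so one player's gain is always another's loss.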

Page 23

Key Observations

• A constant-sum game between 3 players.
  – For every gain, someone has to lose.
• Possibilities For Cooperation
  – Opposite sides of the circle, “sandwiching”
• Not a “Solvable Game” (Nash, 1951)
  – Playing equilibrium strategies is not advisable
• Easy To Set “Table Image”
  – The constant strategy often evokes cooperative behavior
• Existing Techniques Fail
  – Experts algorithms lose to constant strategy

Strategy #1: Play Constant

Strategy #2: Play Opposite

Strategy #3: Sandwich
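To see why a constant player and an opposite player do well together, the sketch below pits the pair against a uniformly random third player. It is a toy simulation under stated assumptions: 12 positions, even splitting of ties, and an opposite player that already knows where the constant player sits (in a real match it would have to learn this from early rounds). Sandwiching is not modeled, and all names are invented for the example.

```python
import random

N = 12  # assumed number of positions on the circle

def scores(positions):
    """Clockwise plus counterclockwise distance to the nearest other stand,
    split evenly among players sharing a spot (tie rule assumed)."""
    result = []
    for p in positions:
        apart = [q for q in positions if q != p]
        sharing = positions.count(p)
        if apart:
            total = min((q - p) % N for q in apart) + min((p - q) % N for q in apart)
        else:
            total = 2 * N
        result.append(total / sharing)
    return result

def simulate(rounds=100, seed=0):
    """Return average per-round scores for (constant, opposite, random)."""
    rng = random.Random(seed)
    constant_spot = 0
    totals = [0.0, 0.0, 0.0]
    for _ in range(rounds):
        actions = (
            constant_spot,                 # Strategy #1: play constant
            (constant_spot + N // 2) % N,  # Strategy #2: play opposite the constant player
            rng.randrange(N),              # third player: uniformly random
        )
        for i, s in enumerate(scores(actions)):
            totals[i] += s
    return [t / rounds for t in totals]

averages = simulate()
print(averages)
```

With the pair fixed on opposite sides, the random player scores exactly 6 in every round under these rules, while the other two share the remaining 18 points.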

Page 24

Competition Structure

• Every set of three players played 180 matches of 100 rounds each (about 1.5 million rounds total)
• Highest Total Score Wins
• Mean, Standard Error can be calculated
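The round count can be checked with a little arithmetic, and the standard error is a one-line formula. The sketch below assumes 9 entrants (the number of teams on the Competitors slide); the sample scores are made-up numbers used only to show the calculation.

```python
import math
import statistics

competitors = 9          # assumed: one entry per team on the Competitors slide
rounds_per_match = 100
repetitions = 180

triples = math.comb(competitors, 3)  # every set of three players
total_rounds = triples * repetitions * rounds_per_match
print(triples, total_rounds)  # 84 1512000

def mean_and_standard_error(samples):
    """Sample mean and standard error of the mean (sample stdev / sqrt(n))."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem

# Hypothetical per-match average scores for one competitor (made-up numbers).
example = [8.4, 7.9, 8.8, 8.1, 8.3]
print(mean_and_standard_error(example))
```

With 9 entrants there are 84 possible triples, and 84 × 180 × 100 = 1,512,000 rounds, which matches the figure of roughly 1.5 million.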

Page 25

Competitors

• 28 players, 9 teams
  – University of Southampton/Imperial College London (Soton)
  – Yahoo! Inc. (Pujara)
  – Rutgers University (RL3)
  – Brown University (Brown)
  – Carnegie Mellon (2 teams: Waugh, ACTR)
  – University of Michigan (FrozenPontiac)
  – Princeton University (Schapire)
  – (Greg Kuhlmann)

Page 26

Competition Results

[Bar chart: Score Per Round (axis from 0 to 10) by Competitor: Soton, Pujara, RL3, Waugh, ACTR, Schapire, Brown, FrozenPontiac, Kuhlmann]

Page 27

Results

[Bar chart: Score Per Round - 8 (axis from -1.2 to 0.8) by Competitor: Soton, Pujara, RL3, Waugh, ACTR, Schapire, Brown, FrozenPontiac, Kuhlmann. Legend: Modified Constant, Uniformly Random]

Page 28

Restricting to Top 6

[Bar chart: Score Per Round - 8 (axis from -1.5 to 1) by Competitor: Pujara, Soton, RL3, Waugh, ACTR, Schapire]

Page 29

Restricting to Top 4

[Bar chart: axis from -1.5 to 1, by competitor: Pujara, RL3, Soton, Waugh]

Page 30

Teach Simply!

EQUILIBRIUM

FREE

=

Page 31

Learn

=

=

= ?

Page 32

Learn

=

=

10 7

Page 33

The High Level

• Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

Page 34

Lofty Goals

• Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

• behavior: a fully specified strategy.
• used: actually leveraged

Page 35

Practical Concessions

• Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

• Not any intelligent agent
• Not any time (people change)
• Not any task (context matters)

Page 36

Thank You

http://martin.zinkevich.org/lemonade