CS 188: Artificial Intelligence
Adversarial Search
Prof. Scott Niekum, The University of Texas at Austin
[These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Game Playing State-of-the-Art
▪ Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended the 40-year reign of human champion Marion Tinsley using a complete 8-piece endgame database. 2007: Checkers solved!
▪ Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second and used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
▪ Go: 2016: AlphaGo, created by Google DeepMind, beat 9-dan professional Go player Lee Sedol 4-1 on a full-sized 19x19 board. AlphaGo combined Monte Carlo Tree Search with deep neural networks, improving via reinforcement learning through self-play.
▪ OpenAI Five (Dota 2): getting close to world-class
How to consider the behavior of ghosts?
Types of Games
▪ Many different kinds of games!
▪ Axes:
  ▪ Deterministic or stochastic?
  ▪ One, two, or more players?
  ▪ Zero sum?
  ▪ Perfect information (can you see the state)?
▪ Want algorithms for calculating a strategy (policy) which recommends a move from each state
Deterministic Games
▪ Many possible formalizations, one is:
  ▪ States: S (start at s0)
  ▪ Players: P = {1...N} (usually take turns)
  ▪ Actions: A (may depend on player / state)
  ▪ Transition Function: S × A → S
  ▪ Terminal Test: S → {t, f}
  ▪ Terminal Utilities: S × P → ℝ
▪ Solution for a player is a policy: S → A
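To make the formalization concrete, here is a minimal sketch that maps each component onto a method of a tiny hypothetical game. The class `ToyGame`, its string-encoded states, the utility table, and the helper `play` are all illustrative assumptions, not part of the slides: states record the move history, two players alternate, and the game ends after one move each.

```python
from typing import Callable, Dict, List


class ToyGame:
    """Hypothetical two-move game used only to illustrate the formalization."""

    start = "s0"  # start state s0

    def player(self, s: str) -> int:
        # Players P = {1, 2} take turns: player 1 moves first, then player 2.
        return 1 if len(s) == 2 else 2

    def actions(self, s: str) -> List[str]:
        # A(s): "L" and "R" are legal in every non-terminal state.
        return [] if self.is_terminal(s) else ["L", "R"]

    def result(self, s: str, a: str) -> str:
        # Transition function S x A -> S: the state records the move history.
        return s + a

    def is_terminal(self, s: str) -> bool:
        # Terminal test S -> {t, f}: the game ends after two moves.
        return len(s) == 4

    def utility(self, s: str, p: int) -> float:
        # Terminal utilities S x P -> R (here player 2 gets the negation).
        u1 = {"s0LL": 3, "s0LR": -1, "s0RL": 0, "s0RR": 2}[s]
        return u1 if p == 1 else -u1


Policy = Callable[[str], str]  # a policy maps states to actions: S -> A


def play(game: ToyGame, policies: Dict[int, Policy]) -> str:
    """Follow each player's policy from the start state until a terminal state."""
    s = game.start
    while not game.is_terminal(s):
        s = game.result(s, policies[game.player(s)](s))
    return s


end = play(ToyGame(), {1: lambda s: "L", 2: lambda s: "R"})
print(end, ToyGame().utility(end, 1))  # s0LR -1
```

The policies here are constant functions for brevity; the search algorithms that follow are ways of computing good policies instead of hard-coding them.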
Zero-Sum Games
▪ Zero-Sum Games
  ▪ Agents have opposite utilities (values on outcomes)
  ▪ Lets us think of a single value that one maximizes and the other minimizes
  ▪ Adversarial, pure competition
▪ General Games
  ▪ Agents have independent utilities (values on outcomes)
  ▪ Cooperation, indifference, competition, and more are all possible
  ▪ More later on non-zero-sum games
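A small sketch of why opposite utilities let us track a single value. The outcome tables `chess_like` and `coop` are hypothetical examples: in the zero-sum table, player 2's favorite outcome is exactly the one that minimizes player 1's utility, so one number per outcome is enough.

```python
from typing import Dict, Tuple

# Hypothetical outcomes: outcome name -> (utility for player 1, utility for player 2).
Outcomes = Dict[str, Tuple[float, float]]


def is_zero_sum(outcomes: Outcomes) -> bool:
    # Opposite utilities: at every outcome the two players' utilities cancel out.
    return all(u1 + u2 == 0 for u1, u2 in outcomes.values())


def single_value(outcomes: Outcomes) -> Dict[str, float]:
    # Keep only player 1's utility; player 2 maximizing -u1 is the same as minimizing u1.
    assert is_zero_sum(outcomes), "only valid for zero-sum games"
    return {o: u1 for o, (u1, _) in outcomes.items()}


chess_like = {"p1_wins": (1, -1), "draw": (0, 0), "p2_wins": (-1, 1)}
values = single_value(chess_like)
# Player 2's favorite outcome is the one that minimizes player 1's value.
assert min(values, key=values.get) == max(chess_like, key=lambda o: chess_like[o][1])

coop = {"both_win": (1, 1), "both_lose": (0, 0)}  # a general (non-zero-sum) game
print(is_zero_sum(chess_like), is_zero_sum(coop))  # True False
```

In the cooperative table both players want the same outcome, so collapsing to one value that one player maximizes and the other minimizes would misrepresent the game.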
Single-Agent Trees
[Figure: single-agent game tree with terminal utilities at the leaves]
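Value of a State
▪ Value of a state: the best achievable outcome (utility) from that state
[Figure: single-agent tree annotated with the value of each node]
Non-terminal states: $V(s) = \max_{s' \in \text{children}(s)} V(s')$
Terminal states: $V(s) = \text{known}$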
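The recursion above can be sketched in a few lines, assuming a hypothetical encoding in which a number is a terminal state's utility and a list holds a non-terminal state's children:

```python
from typing import List, Union

# Hypothetical encoding: a number is a terminal state's utility,
# a list holds the successor states of a non-terminal state.
Tree = Union[float, List["Tree"]]


def value(s: Tree) -> float:
    """Best achievable utility from s when one agent chooses every move."""
    if isinstance(s, (int, float)):
        return s  # terminal: V(s) is known
    return max(value(child) for child in s)  # non-terminal: V(s) = max over children


print(value([[2, 0], [2, 6], [4, 6]]))  # 6
```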
Adversarial Game Trees
[Figure: adversarial game tree with terminal utilities at the leaves]
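Minimax Values
[Figure: adversarial game tree annotated with minimax values]
States under agent's control: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
States under opponent's control: $V(s) = \min_{s' \in \text{successors}(s)} V(s')$
Terminal states: $V(s) = \text{known}$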
Tic-Tac-Toe Game Tree
Adversarial Search (Minimax)
▪ Deterministic, zero-sum games:
  ▪ Tic-tac-toe, chess, checkers
  ▪ One player maximizes result
  ▪ The other minimizes result
▪ Minimax search:
  ▪ A state-space search tree
  ▪ Players alternate turns
  ▪ Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
[Figure: MAX root (minimax value 5) over two MIN nodes (values 2 and 5); terminal values 8, 2 and 5, 6]
Terminal values: part of the game
Minimax values: computed recursively
Minimax Implementation
def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v
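A runnable transcription of the two mutually recursive functions, as a sketch. The slide's pseudocode leaves out the terminal check (the dispatch version handles it), so this version adds an `is_terminal` test of its own; the nested-list tree encoding is an assumption for illustration. The example tree is the one in the figure on the previous slide.

```python
import math


def is_terminal(state) -> bool:
    # Assumption for this sketch: a bare number is a terminal utility,
    # a list holds the successors of a non-terminal state.
    return isinstance(state, (int, float))


def max_value(state) -> float:
    if is_terminal(state):
        return state
    v = -math.inf
    for successor in state:
        v = max(v, min_value(successor))  # MAX's successors are MIN's turn
    return v


def min_value(state) -> float:
    if is_terminal(state):
        return state
    v = math.inf
    for successor in state:
        v = min(v, max_value(successor))  # MIN's successors are MAX's turn
    return v


print(max_value([[8, 2], [5, 6]]))  # 5
```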
Minimax Implementation (Dispatch)
def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
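One benefit of routing every call through `value` is that the turn order no longer has to be strictly MAX, MIN, MAX, MIN. The sketch below assumes, hypothetically, an `agent` index passed down the tree: agent 0 is MAX (think Pacman) and every other agent is MIN (think ghosts), with agents taking turns in order. The same tree gets a different value when two MIN agents move in a row.

```python
import math

# Hypothetical encoding: a number is a terminal utility, a list holds successors.
# Agent 0 is MAX (e.g., Pacman); agents 1..num_agents-1 are MIN (e.g., ghosts).


def value(state, agent: int, num_agents: int) -> float:
    """Dispatch on the agent to move at `state`."""
    if isinstance(state, (int, float)):
        return state  # terminal: return its utility
    if agent == 0:
        return max_value(state, agent, num_agents)
    return min_value(state, agent, num_agents)


def max_value(state, agent: int, num_agents: int) -> float:
    v = -math.inf
    for successor in state:
        v = max(v, value(successor, (agent + 1) % num_agents, num_agents))
    return v


def min_value(state, agent: int, num_agents: int) -> float:
    v = math.inf
    for successor in state:
        v = min(v, value(successor, (agent + 1) % num_agents, num_agents))
    return v


tree = [[[3, 9], [4, 1]], [[7, 8], [6, 5]]]
print(value(tree, 0, 2))  # 6: turn order MAX, MIN, MAX
print(value(tree, 0, 3))  # 5: turn order MAX, MIN, MIN (two ghosts in a row)
```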
Minimax Example
[Figure: worked minimax example tree with alternating MAX and MIN levels]
Minimax Efficiency
▪ How efficient is minimax?
  ▪ Just like (exhaustive) DFS
  ▪ Time: $O(b^m)$
  ▪ Space: $O(bm)$
▪ Example: For chess, b ≈ 35, m ≈ 100
  ▪ Exact solution is completely infeasible
  ▪ But, do we need to explore the whole tree?
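A quick sanity check of the time bound and of the chess estimate, as a sketch: `count_leaves` is a hypothetical helper that walks a uniform tree exhaustively, the way minimax's DFS does, and the chess figure is computed in log space because $35^{100}$ is far too large to enumerate.

```python
import math


def count_leaves(b: int, m: int) -> int:
    """Leaves visited by exhaustive DFS on a uniform tree (branching b, depth m)."""
    if m == 0:
        return 1
    return sum(count_leaves(b, m - 1) for _ in range(b))


print(count_leaves(3, 4))  # 81, i.e. 3^4: time grows as b^m

# Rough chess figures from the slide: b ≈ 35, m ≈ 100.
b, m = 35, 100
print(f"time ~ b^m ~ 10^{m * math.log10(b):.0f} leaves")  # 10^154
print(f"space ~ b*m = {b * m} stored nodes")  # 3500
```

The contrast is the point: the DFS stack stays tiny, but the number of leaves is astronomically large.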
Minimax Properties
▪ Optimal against a perfect player. Otherwise?
[Figure: MAX root over two MIN nodes; leaf values 10, 10, 9, 100]
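A sketch of the "Otherwise?" question, assuming the figure's leaves are grouped as (10, 10) under the first MIN node and (9, 100) under the second. Minimax assumes the opponent always takes the smallest leaf. Under a hypothetical opponent that picks uniformly at random, the average leaf value is a better guide, and the preferred move flips.

```python
# Assumption: MAX root with two MIN children whose leaves are (10, 10) and (9, 100).
tree = [[10, 10], [9, 100]]

# Minimax: assume the opponent always picks the smallest leaf.
minimax_values = [min(leaves) for leaves in tree]
# Hypothetical alternative: the opponent picks uniformly at random (average outcome).
average_values = [sum(leaves) / len(leaves) for leaves in tree]


def best(values):
    # Index of MAX's preferred child (first one on ties).
    return values.index(max(values))


print(minimax_values, best(minimax_values))  # [10, 9] 0
print(average_values, best(average_values))  # [10.0, 54.5] 1
```

Averaging over the opponent's moves is the idea behind the expectimax comparison that follows.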
Minimax vs Expectimax (Min)
End your misery!
Minimax vs Expectimax (Exp)
Hold on to hope, Pacman!
Resource Limits
▪ Problem: In realistic games, we cannot search to the leaves!
▪ Solution: Depth-limited search
  ▪ Instead, search only to a limited depth in the tree
  ▪ Replace terminal utilities with an evaluation function for non-terminal positions
▪ Example:
  ▪ Suppose we have 100 seconds and can explore 10K nodes/sec
  ▪ So we can check 1M nodes per move
  ▪ α-β reaches about depth 8: a decent chess program
▪ Guarantee of optimal play is gone
▪ More plies make a BIG difference
▪ Use iterative deepening for an anytime algorithm
[Figure: depth-limited tree; MAX root (value 4) over two MIN nodes (values -2 and 4); evaluated positions -1, -2 and 4, 9; deeper subtrees unexplored (marked ?)]
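A minimal sketch of depth-limited minimax with iterative deepening. The `Node` class, its precomputed `estimate` field (standing in for an evaluation function), and the estimates 5 and 3 are hypothetical. Below the cutoff the search returns the estimate instead of recursing; true terminals still return their exact utility. The example tree reuses the figure's leaf values: a depth-1 search trusts the estimates and prefers the left move, while a depth-2 search sees the real outcomes and changes its answer to 4. Each completed depth yields a usable answer, which is what makes iterative deepening an anytime algorithm. (For scale, the slide's budget is 100 s × 10,000 nodes/s = 1,000,000 nodes per move.)

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    estimate: float  # Eval(s): hypothetical heuristic score, precomputed here
    children: List["Node"] = field(default_factory=list)
    utility: Optional[float] = None  # only set for true terminal states


def leaf(u: float) -> Node:
    return Node(estimate=u, utility=u)


def dl_value(node: Node, depth: int, maximizing: bool) -> float:
    if node.utility is not None:  # real terminal state: exact utility
        return node.utility
    if depth == 0:  # search cutoff: fall back on the evaluation function
        return node.estimate
    values = [dl_value(c, depth - 1, not maximizing) for c in node.children]
    return max(values) if maximizing else min(values)


def iterative_deepening(root: Node, max_depth: int) -> List[float]:
    # Anytime behavior: each finished depth gives an answer; deeper ones refine it.
    return [dl_value(root, d, True) for d in range(1, max_depth + 1)]


root = Node(0, [Node(5, [leaf(-1), leaf(-2)]), Node(3, [leaf(4), leaf(9)])])
print(iterative_deepening(root, 2))  # [5, 4]
```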
Depth Matters
▪ Evaluation functions are always imperfect
▪ The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
▪ An important example of the tradeoff between complexity of features and complexity of computation
Video of Demo Limited Depth (2)
Video of Demo Limited Depth (10)
Evaluation Functions
▪ Evaluation functions score non-terminals in depth-limited search
▪ Ideal function: returns the actual minimax value of the position
▪ In practice: typically a weighted linear sum of features:
  $\text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
  ▪ e.g., $f_1(s) = (\text{num white queens} - \text{num black queens})$, etc.
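The weighted linear sum can be sketched directly. Only the queen-difference feature comes from the slide; the dictionary state encoding, the pawn feature, and the weights (9 for queens, 1 for pawns, loosely following conventional piece values) are illustrative assumptions, not tuned values.

```python
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical: piece counts keyed by name


def linear_eval(weights: List[float], features: List[Callable[[State], float]], s: State) -> float:
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f(s) for w, f in zip(weights, features))


features = [
    lambda s: s["white_queens"] - s["black_queens"],  # f1 from the slide
    lambda s: s["white_pawns"] - s["black_pawns"],  # hypothetical extra feature
]
weights = [9.0, 1.0]  # illustrative weights, not tuned

s = {"white_queens": 1, "black_queens": 0, "white_pawns": 5, "black_pawns": 7}
print(linear_eval(weights, features, s))  # 9*1 + 1*(-2) = 7.0
```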
Thrashing (d=2)
Evaluation function: Score
Why Pacman Starves
▪ A danger of replanning agents!
  ▪ He knows his score will go up by eating the dot now (left, right)
  ▪ He knows his score will go up just as much by eating the dot later (right, right)
  ▪ There are no point-scoring opportunities after eating the dot (within the horizon, two here)
  ▪ Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!
Thrashing: Fixed (d=2)
Evaluation function: Score + proximity to nearest dot
Smart Ghosts (Implicit Coordination)
Evaluation function: proximity to Pacman
Minimax Pruning
[Figure: the minimax example tree with some leaves left unexplored (pruned)]
Alpha-Beta Pruning
▪ General configuration (MIN version)
  ▪ We're computing the MIN-VALUE at some node n
  ▪ We're looping over n's children
  ▪ n's estimate of the children's min is dropping
  ▪ Who cares about n's value? MAX
  ▪ Let α be the best value that MAX can get at any choice point along the current path from the root
  ▪ If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children (it's already bad enough that it won't be played)
▪ MAX version is symmetric
[Figure: path from the root through alternating MAX and MIN levels, with MAX's best option α higher on the path and the MIN node n below it]
Alpha-Beta Implementation
def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

α: MAX's best option on path to root
β: MIN's best option on path to root
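A runnable sketch of the pseudocode above, with the same pruning tests (`v >= beta`, `v <= alpha`). The nested-list tree encoding, the `maximizing` flag used for dispatch, and the `visited` list that records evaluated leaves are assumptions added for illustration; the tree itself is a hypothetical example. In the second MIN subtree the first leaf (2) is already no better for MAX than the 3 it can secure elsewhere, so the leaves 4 and 6 are never examined.

```python
import math

# Hypothetical encoding: a number is a terminal utility, a list holds successors.
# `visited` records which leaves were actually evaluated.


def value(state, alpha, beta, maximizing, visited):
    if isinstance(state, (int, float)):
        visited.append(state)
        return state
    if maximizing:
        return max_value(state, alpha, beta, visited)
    return min_value(state, alpha, beta, visited)


def max_value(state, alpha, beta, visited):
    v = -math.inf
    for successor in state:
        v = max(v, value(successor, alpha, beta, False, visited))
        if v >= beta:
            return v  # a MIN ancestor can already hold MAX to beta: prune
        alpha = max(alpha, v)  # MAX's best option on the path so far
    return v


def min_value(state, alpha, beta, visited):
    v = math.inf
    for successor in state:
        v = min(v, value(successor, alpha, beta, True, visited))
        if v <= alpha:
            return v  # a MAX ancestor already has alpha or better: prune
        beta = min(beta, v)  # MIN's best option on the path so far
    return v


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
print(value(tree, -math.inf, math.inf, True, visited))  # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]: leaves 4 and 6 are pruned
```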
Alpha-Beta Pruning Properties
▪ This pruning has no effect on the minimax value computed for the root!
▪ Values of intermediate nodes might be wrong
  ▪ Important: children of the root may have the wrong value
  ▪ So the most naïve version won't let you do action selection
▪ Good child ordering improves the effectiveness of pruning
▪ With “perfect ordering”:
  ▪ Time complexity drops to $O(b^{m/2})$
  ▪ Doubles solvable depth!
  ▪ Full search of, e.g., chess is still hopeless…
▪ This is a simple example of metareasoning (computing about what to compute)
[Figure: small MAX/MIN tree with leaf values 10, 10, 0]
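A sketch of why the naïve version cannot select actions. The tree `[[3], [3, 0]]` is a hypothetical example, and the root loop that calls alpha-beta on each child is one assumed way of doing "naïve" action selection. The second child is cut off after its first leaf and reports 3, although its true minimax value is 0. The root value is still correct.

```python
import math


def minimax(node, maximizing):
    if isinstance(node, (int, float)):
        return node
    vals = [minimax(c, not maximizing) for c in node]
    return max(vals) if maximizing else min(vals)


def alphabeta(node, alpha, beta, maximizing):
    # Same pruning tests as the slide's pseudocode (v >= beta, v <= alpha).
    if isinstance(node, (int, float)):
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        w = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            v = max(v, w)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v


tree = [[3], [3, 0]]  # hypothetical: MAX root over two MIN children
alpha, reported = -math.inf, []
for child in tree:  # naive root loop: reuse alpha-beta on each child
    w = alphabeta(child, alpha, math.inf, False)
    reported.append(w)
    alpha = max(alpha, w)

print(reported)  # [3, 3]
print([minimax(c, False) for c in tree])  # [3, 0]
print(alphabeta(tree, -math.inf, math.inf, True))  # 3: the root value is still right
```

An agent that broke the tie in favor of the second child would walk into a true value of 0. One common remedy is to switch the chosen action only on a strict improvement, so a pruned child that merely ties the current best is never selected.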
Alpha-Beta Quiz
[Figure: alpha-beta pruning quiz tree]
Alpha-Beta Quiz 2
[Figure: alpha-beta pruning quiz tree]
Next Time: Uncertainty!