AI in Games: Achievements and Challenges Yuandong Tian Facebook AI Research

Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Jan 23, 2018

Transcript
Page 1: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

AI in Games: Achievements and Challenges

Yuandong Tian, Facebook AI Research

Page 2: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game as a Vehicle of AI

Fewer safety and ethical concerns

Faster than real-time

Infinite supply of fully labeled data

Controllable and replicable

Low cost per sample

Complicated dynamics with simple rules.

Page 3: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game as a Vehicle of AI

Abstract game to real-world

Algorithm is slow and data-inefficient

Requires a lot of resources.

Hard to benchmark the progress

Page 4: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game as a Vehicle of AI

Better Games

Abstract game to real-world

Algorithm is slow and data-inefficient

Requires a lot of resources.

Hard to benchmark the progress

Page 5: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game Spectrum

Timeline: Good old days, 1970s, 1980s, 1990s, 2000s, 2010s

Page 6: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game Spectrum

Timeline: Good old days, 1970s, 1980s, 1990s, 2000s, 2010s

Chess, Go, Poker

Page 7: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game Spectrum

Timeline: Good old days, 1970s, 1980s, 1990s, 2000s, 2010s

Pong (1972), Breakout (1978)

Page 8: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game Spectrum

Timeline: Good old days, 1970s, 1980s, 1990s, 2000s, 2010s

Super Mario Bros. (1985), Contra (1987)

Page 9: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game Spectrum

Timeline: Good old days, 1970s, 1980s, 1990s, 2000s, 2010s

Doom (1993), KOF ’94 (1994), StarCraft (1998)

Page 10: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game Spectrum

Timeline: Good old days, 1970s, 1980s, 1990s, 2000s, 2010s

Counter-Strike (2000), The Sims 3 (2009)

Page 11: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game Spectrum

Timeline: Good old days, 1970s, 1980s, 1990s, 2000s, 2010s

StarCraft II (2010), GTA V (2013), Final Fantasy XV (2016)

Page 12: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Game as a Vehicle of AI

Better Algorithm/System

Better Environment

Abstract game to real-world

Algorithm is slow and data-inefficient

Requires a lot of resources.

Hard to benchmark the progress

Page 13: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Our work

DarkForest Go Engine (Yuandong Tian, Yan Zhu, ICLR 2016)

Doom AI (Yuxin Wu, Yuandong Tian, ICLR 2017)

ELF: Extensive, Lightweight and Flexible Framework (Yuandong Tian et al., arXiv)

Better Algorithm/System

Better Environment

Page 14: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

How Game AI works

Even with a super-supercomputer, it is not possible to search the entire space.

Page 15: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

How Game AI works

Even with a super-supercomputer, it is not possible to search the entire space.

Current game situation -> Extensive search -> Evaluate -> Consequence (Black wins / White wins)

Lufei Ruan vs. Yifan Hou (2010)

Page 16: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

How Game AI works

Current game situation -> Extensive search -> Evaluate -> Consequence (Black wins / White wins)

How many actions do you have per step?
Checkers: a few possible moves
Poker: a few possible moves
Chess: 30-40 possible moves
Go: 100-200 possible moves
StarCraft: 50^100 possible moves

Alpha-beta pruning + Iterative deepening [Major Chess engines]

Monte-Carlo Tree Search + UCB exploration [Major Go engines]

???

Counterfactual Regret Minimization [Libratus, DeepStack]
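
As an illustration of the first search technique named on this slide, here is a minimal sketch of alpha-beta pruning wrapped in iterative deepening. It is not taken from any chess engine: the toy game tree (`TREE`), the `children` and `evaluate` helpers, and the fixed depth limit are assumptions made only so the example runs on its own. Real engines deepen until a time budget runs out and reuse earlier iterations for move ordering.

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        best = -math.inf
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizing player will never allow this line
                break
        return best
    best = math.inf
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, evaluate))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizing player already has a better option
            break
    return best

def iterative_deepening(root, max_depth, children, evaluate):
    """Search depth 1, 2, ..., max_depth and return the deepest result."""
    value = evaluate(root)
    for depth in range(1, max_depth + 1):
        value = alphabeta(root, depth, -math.inf, math.inf,
                          True, children, evaluate)
    return value

# Hypothetical toy game: inner nodes are lists, leaves are integer scores.
TREE = [[3, 5], [2, 9]]

def children(node):
    return node if isinstance(node, list) else []

def evaluate(node):
    # Leaves carry their score; unfinished positions get a neutral guess of 0.
    return node if isinstance(node, int) else 0

value = iterative_deepening(TREE, 2, children, evaluate)
```

On this toy tree the maximizing player picks the left branch, whose worst case (3) beats the right branch's worst case (2).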

Page 17: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

How Game AI works

Current game situation -> Extensive search -> Evaluate -> Consequence (Black wins / White wins)

How complicated is the game situation? How deep is the game?

Chess, Go, Poker, StarCraft

Linear function for situation evaluation [Stockfish]

Deep Value network [AlphaGo, DeepStack]

Random game play with simple rules [Zen, CrazyStone, DarkForest]

Endgame database

Rule-based

Page 18: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

How to model Policy/Value function?

Non-smooth + high-dimensional. Sensitive to situations: changing one stone in Go leads to a different game.

Traditional approach
• Many manual steps
• Conflicting parameters, not scalable.
• Need strong domain knowledge.

Deep Learning
• End-to-End training
• Lots of data, less tuning.
• Minimal domain knowledge.
• Amazing performance

Page 19: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Case study: AlphaGo

• Computations
  • Train with many GPUs and inference with TPU.
• Policy network
  • Trained supervised from human replays.
  • Self-play network with RL.
• High quality playout/rollout policy
  • 2 microseconds per move, 24.2% accuracy. ~30%
  • Thousands of times faster than DCNN prediction.
• Value network
  • Predicts game consequence for current situation.
  • Trained on 30M self-play games.

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 20: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

AlphaGo
• Policy network SL (trained with human games)

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 21: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

AlphaGo
• Fast Rollout (2 microseconds), ~30% accuracy

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 22: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Monte Carlo Tree Search

[Figure: search tree in three panels (a), (b), (c). Each node carries a win/visit count, e.g. 22/40 at the root. After one new playout the counts along the selected path are updated: 9/10 -> 10/11, 10/18 -> 11/19, 20/30 -> 21/31, 22/40 -> 23/41. Labels: Tree policy, Default policy.]

Aggregate win rates, and search towards the good nodes.

PUCT
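
To make "search towards the good nodes" concrete, the sketch below scores children with UCB1 and with a PUCT-style rule of the form used in AlphaGo (mean value plus a prior-weighted exploration bonus). The dictionary layout, the constants `c` and `c_puct`, and the prior values are assumptions for illustration; the win/visit counts 2/10 and 20/30 are borrowed from the figure only as sample numbers.

```python
import math

def ucb1(child, parent_visits, c=1.4):
    """Mean win rate plus an exploration bonus that shrinks with visits."""
    if child["visits"] == 0:
        return math.inf  # always try an unvisited move first
    mean = child["wins"] / child["visits"]
    return mean + c * math.sqrt(math.log(parent_visits) / child["visits"])

def puct(child, parent_visits, c_puct=1.0):
    """Mean win rate plus a bonus scaled by the policy network's prior."""
    mean = child["wins"] / child["visits"] if child["visits"] else 0.0
    bonus = c_puct * child["prior"] * math.sqrt(parent_visits) / (1 + child["visits"])
    return mean + bonus

def select(children, parent_visits, score):
    """Index of the child with the highest score (the tree-policy step)."""
    return max(range(len(children)), key=lambda i: score(children[i], parent_visits))

# Sample statistics: two children of a root visited 40 times (priors are made up).
children = [
    {"wins": 2, "visits": 10, "prior": 0.5},
    {"wins": 20, "visits": 30, "prior": 0.5},
]
best_ucb = select(children, 40, ucb1)
best_puct = select(children, 40, puct)
```

With equal priors both rules prefer the second child, which has the higher win rate; a strong prior on the first child and a larger `c_puct` can reverse that choice.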

Page 23: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

AlphaGo
• Value Network (trained via 30M self-played games)
• How are the data collected?

Game start -> Sampling SL network (more diverse moves) -> Current state (uniform sampling) -> Sampling RL network (higher win rate) -> Game terminates

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 24: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

AlphaGo
• Value Network (trained via 30M self-played games)

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 25: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

AlphaGo

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 26: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Our work

Page 27: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Our computer Go player: DarkForest

• DCNN as a tree policy
• Predict next k moves (rather than next move)
• Trained on 170k KGS dataset / 80k GoGoD, 57.1% accuracy.
• KGS 3d without search (0.1 s per move)
• Released 3 months before AlphaGo, <1% GPUs (from Aja Huang)

Yuandong Tian and Yan Zhu, ICLR 2016

Yan Zhu

Page 28: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Our computer Go player: DarkForest

Features used for DCNN

Name
Our/enemy liberties
Ko location
Our/enemy stones/empty place
Our/enemy stone history
Opponent rank

Page 29: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Pure DCNN

Win rate between DCNN and open source engines.

darkforest: Only use top-1 prediction, trained on KGS
darkfores1: Use top-3 prediction, trained on GoGoD
darkfores2: darkfores1 with fine-tuning.

Page 30: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Monte Carlo Tree Search

[Figure: search tree in three panels (a), (b), (c). Each node carries a win/visit count, e.g. 22/40 at the root. After one new playout the counts along the selected path are updated: 9/10 -> 10/11, 10/18 -> 11/19, 20/30 -> 21/31, 22/40 -> 23/41. Labels: Tree policy, Default policy.]

Aggregate win rates, and search towards the good nodes.

Page 31: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Win rate between DCNN + MCTS and open source engines.

darkfmcts3: Top-3/5, 75k rollouts, ~12 sec/move, KGS 5d

94.2%

DCNN + MCTS

Page 32: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Our computer Go player: DarkForest

• DCNN + MCTS
  • Use top 3/5 moves from DCNN, 75k rollouts.
  • Stable KGS 5d. Open source.
  • 3rd place on KGS January Tournaments
  • 2nd place in 9th UEC Computer Go Competition (Not this time :))

DarkForest versus Koichi Kobayashi (9p)

https://github.com/facebookresearch/darkforestGo

Page 33: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Win Rate analysis (using DarkForest) (AlphaGo versus Lee Sedol)

https://github.com/facebookresearch/ELF/tree/master/go

New version of DarkForest on ELF platform

Page 34: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

First Person Shooter (FPS) Game

Play the game from the raw image!

Yuxin Wu, Yuandong Tian, ICLR 2017

Yuxin Wu

Page 35: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Network Structure

Simple Frame Stacking is very useful (rather than using LSTM)
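
Frame stacking here means feeding the network the last few frames together instead of a recurrent state. A minimal sketch, assuming a stack of `k` frames and a hypothetical `FrameStack` helper (the slide does not give the stack size or the implementation):

```python
from collections import deque

class FrameStack:
    """Keeps the k most recent frames so the network sees short-term motion."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is dropped automatically

    def reset(self, first_frame):
        # At episode start there is no history, so repeat the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        # Append the newest frame and return the stacked observation.
        self.frames.append(frame)
        return list(self.frames)

stack = FrameStack(4)
obs0 = stack.reset("f0")   # frames are strings here; real frames would be image arrays
obs1 = stack.push("f1")
```

In practice the returned list would be concatenated along the channel axis before being passed to the convolutional network.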

Page 36: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Actor-Critic Models

Update Policy network: encourage actions leading to states with higher-than-expected value.

Update Value network: encourage the value function to converge to the true cumulative rewards.

Keep the diversity of actions.

[Diagram labels: trajectory from s_0 to s_T, Reward, V(s_T)]
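
The three objectives on this slide correspond to the usual advantage actor-critic terms: a policy term weighted by the advantage, a value regression term, and an entropy bonus. The sketch below computes them for one short trajectory in plain Python, bootstrapping from V(s_T). The function names, the discount `gamma`, and the 0.5 factor on the value loss are assumptions, not the talk's exact formulation.

```python
import math

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns R_t, bootstrapped from V(s_T) at the end of the rollout."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def actor_critic_terms(action_probs, actions, values, rewards,
                       bootstrap_value, gamma=0.99):
    """Return (policy_loss, value_loss, entropy) summed over the trajectory."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    policy_loss = 0.0
    value_loss = 0.0
    entropy = 0.0
    for probs, action, value, ret in zip(action_probs, actions, values, returns):
        advantage = ret - value  # positive when the outcome beat the expectation
        policy_loss -= math.log(probs[action]) * advantage
        value_loss += 0.5 * (ret - value) ** 2
        entropy -= sum(p * math.log(p) for p in probs if p > 0)
    return policy_loss, value_loss, entropy

returns = n_step_returns([1.0, 0.0, 2.0], bootstrap_value=10.0, gamma=0.5)
terms = actor_critic_terms(
    action_probs=[[0.5, 0.5]], actions=[0], values=[1.0],
    rewards=[0.0], bootstrap_value=2.0, gamma=0.5,
)
```

A training step would minimize something like `policy_loss + a * value_loss - b * entropy` for hypothetical weights `a` and `b`, with the advantage treated as a constant in the policy term.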

Page 37: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Curriculum Training

From simple to complicated

Page 38: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Curriculum Training

Page 39: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

VizDoom AI Competition 2016 (Track 1)

Rank  Bot     1    2    3    4    5    6    7    8    9    10   11   Total frags
1     F1      56   62   n/a  54   47   43   47   55   50   48   50   559
2     Arnold  36   34   42   36   36   45   36   39   n/a  33   36   413
3     CLYDE   37   n/a  38   32   37   30   46   42   33   24   44   393

Videos:
https://www.youtube.com/watch?v=94EPSjQH38Y
https://www.youtube.com/watch?v=Qv4esGWOg7w&t=394s

We won first place!

Page 40: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Page 41: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Visualization of Value functions

Best 4 frames (agent is about to shoot the enemy)

Worst 4 frames (agent missed the shot and is out of ammo)

Page 42: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

ELF: Extensive, Lightweight and Flexible Framework for Game Research

• Extensive
  • Any games with C++ interfaces can be incorporated.
• Lightweight
  • Fast. Mini-RTS (40K FPS per core)
  • Minimal resource usage (1 GPU + several CPUs)
  • Fast training (a couple of hours for an RTS game)
• Flexible
  • Environment-Actor topology
  • Parametrized game environments.
  • Choice of different RL methods.

Yuandong Tian, Qucheng Gong, Wendy Shang, Yuxin Wu, Larry Zitnick (NIPS 2017 Oral)

arXiv: https://arxiv.org/abs/1707.01067

Larry Zitnick

Qucheng Gong Wendy Shang

Yuxin Wu

https://github.com/facebookresearch/ELF

Page 43: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

How an RL system works

[Diagram labels: Game 1, Game 2, ..., Game N, running in Process 1, Process 2, ..., Process N; Replay Buffer; Consumers (Python): Actor, Model, Optimizer]

Page 44: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

ELF design

Plug-and-play; no need to worry about concurrency anymore.

[Diagram labels: Producer (games in C++): Game 1, Game 2, ..., Game N, each with a History buffer; Daemon (batch collector); Batch with History info; Reply; Consumers (Python): Actor, Model, Optimizer]
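
The producer, batch collector, and consumer pattern in this diagram can be sketched with standard-library threads and queues. This is not ELF's API and ELF's games run in C++. Every name below (`game_loop`, `batch_collector`, `run`), the toy integer states, and the short wait used to flush partial batches are assumptions chosen to keep the example self-contained.

```python
import queue
import threading

def game_loop(game_id, num_steps, requests, results):
    """Producer: one game sends its state, then blocks until the actor replies."""
    reply_box = queue.Queue(maxsize=1)
    history = []                      # stands in for the per-game history buffer
    state = game_id * 10              # toy integer state
    for _ in range(num_steps):
        requests.put((state, reply_box))
        action = reply_box.get()      # wait for the consumer's reply
        history.append((state, action))
        state += 1
    results[game_id] = history

def batch_collector(requests, batch_size, actor, wait=0.01):
    """Daemon: gather up to batch_size pending states, run the actor once, reply."""
    while True:
        first = requests.get()
        if first is None:             # sentinel: no more games
            return
        batch, stop = [first], False
        while len(batch) < batch_size:
            try:
                item = requests.get(timeout=wait)
            except queue.Empty:
                break                 # flush a partial batch rather than stall
            if item is None:
                stop = True
                break
            batch.append(item)
        actions = actor([state for state, _ in batch])
        for (_, reply_box), action in zip(batch, actions):
            reply_box.put(action)
        if stop:
            return

def run(num_games=4, num_steps=3, batch_size=2):
    requests, results = queue.Queue(), {}
    actor = lambda states: [s % 3 for s in states]   # stand-in for a batched model
    collector = threading.Thread(target=batch_collector,
                                 args=(requests, batch_size, actor))
    collector.start()
    games = [threading.Thread(target=game_loop,
                              args=(g, num_steps, requests, results))
             for g in range(num_games)]
    for t in games:
        t.start()
    for t in games:
        t.join()
    requests.put(None)                # tell the collector to exit
    collector.join()
    return results

results = run()
```

The game code never touches locks directly: it only sends a state and waits for a reply, which is the plug-and-play property the slide describes.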

Page 45: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Page 46: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Possible Usage

• Game Research
  • Board game (Chess, Go, etc.)
  • Real-time Strategy Game
• Complicated RL algorithms.
• Discrete/Continuous control
• Robotics
• Dialog and Q&A System

Page 47: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Initialization

Page 48: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Main Loop

Page 49: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Training

Page 50: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Self-Play

Page 51: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Multi-Agent

Page 52: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Monte-Carlo Tree Search

[Figure: search tree with a win/visit count on each node, e.g. 22/40 at the root]

Page 53: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Flexible Environment-Actor topology

(a) One-to-One: Vanilla A3C

(b) Many-to-One: BatchA3C, GA3C

(c) One-to-Many: Self-Play, Monte-Carlo Tree Search

[Diagram: Environment and Actor boxes connected according to each topology]

Page 54: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

RLPytorch

• An RL platform in PyTorch
• A3C in 30 lines.
• Interfacing with dict.

Page 55: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Architecture Hierarchy

ELF: An extensive framework that can host many games.

Specific game engines: Go (DarkForest), ALE, RTS Engine

Environments: Pong, Breakout, Mini-RTS, Capture the Flag, Tower Defense

Page 56: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

A miniature RTS engine

[Screenshot labels: Enemy base, Your base, Your barracks, Worker, Enemy unit, Resource, Fog of War]

Game Name         Descriptions                                                     Avg Game Length
Mini-RTS          Gather resources and build troops to destroy opponent’s base.    1000-6000 ticks
Capture the Flag  Capture the flag and bring it to your own base.                  1000-4000 ticks
Tower Defense     Build defensive towers to block enemy invasion.                  1000-2000 ticks

Page 57: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Simulation Speed

Platform  ALE    RLE  Universe  Malmo
FPS       6000   530  60        120

Platform  DeepMind Lab                       VizDoom  TorchCraft            Mini-RTS
FPS       287(C) / 866(G) (6 CPU + 1 GPU)    7,000    2,000 (Frameskip=50)  40,000

Page 58: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Training AI

Using internal game data and A3C. Reward is only available once the game is over.

Network: Conv, BN, ReLU block (x4), followed by Policy and Value outputs.

[Figure panels: Game visualization; Game internal data (respecting fog of war)]

Game internal data:
• Location of all workers
• Location of all melee tanks
• Location of all range tanks
• HP portion
• Resource

Page 59: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

MiniRTS

Building that can build workers and collect resources.

Resource unit that contains 1000 minerals.

Worker who can build barracks and gather resources. Low movement speed and low attack damage.

Building that can build melee attackers and range attackers.

Tank with high HP, medium movement speed, short attack range, high attack damage.

Tank with low HP, high movement speed, long attack range and medium attack damage.

Page 60: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Training AI

9 discrete actions.

No.  Action name         Descriptions
1    IDLE                Do nothing.
2    BUILDWORKER         If the base is idle, build a worker.
3    BUILDBARRACK        Move a worker (gathering or idle) to an empty place and build a barracks.
4    BUILDMELEEATTACKER  If we have an idle barracks, build a melee attacker.
5    BUILDRANGEATTACKER  If we have an idle barracks, build a range attacker.
6    HITANDRUN           If we have range attackers, move towards the opponent base and attack. Take advantage of their long attack range and high movement speed to hit and run if the enemy counter-attacks.
7    ATTACK              All melee and range attackers attack the opponent’s base.
8    ATTACKINRANGE       All melee and range attackers attack enemies in sight.
9    ALLDEFEND           All troops attack enemy troops near the base and resource.

Page 61: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Win rate against rule-based AI

Frame skip (how often AI makes decisions)

Frame skip  AI_SIMPLE    AI_HIT_AND_RUN
50          68.4(±4.3)   63.6(±7.9)
20          61.4(±5.8)   55.4(±4.7)
10          52.8(±2.4)   51.1(±5.0)

Network Architecture (Conv, BN, ReLU)

                 SIMPLE (median)  SIMPLE (mean/std)  HIT_AND_RUN (median)  HIT_AND_RUN (mean/std)
ReLU             52.8             54.7(±4.2)         60.4                  57.0(±6.8)
Leaky ReLU       59.8             61.0(±2.6)         60.2                  60.3(±3.3)
ReLU + BN        61.0             64.4(±7.4)         55.6                  57.5(±6.8)
Leaky ReLU + BN  72.2             68.4(±4.3)         65.5                  63.6(±7.9)

Page 62: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Effect of T-steps

Large T is better.

Page 63: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Transfer Learning and Curriculum Training

                          AI_SIMPLE     AI_HIT_AND_RUN  Combined (50% SIMPLE + 50% H&R)
SIMPLE                    68.4(±4.3)    26.6(±7.6)      47.5(±5.1)
HIT_AND_RUN               34.6(±13.1)   63.6(±7.9)      49.1(±10.5)
Combined (No curriculum)  49.4(±10.0)   46.0(±15.3)     47.7(±11.0)
Combined                  51.8(±10.6)   54.7(±11.2)     53.2(±8.5)

                             AI_SIMPLE    AI_HIT_AND_RUN  CAPTURE_THE_FLAG
Without curriculum training  66.0(±2.4)   54.4(±15.9)     54.2(±20.0)
With curriculum training     68.4(±4.3)   63.6(±7.9)      59.9(±7.4)

[Figure labels: Mixture of SIMPLE_AI and Trained AI; Training time; 99%]

Highest win rate against AI_SIMPLE: 80%

Page 64: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Monte Carlo Tree Search

        MiniRTS (AI_SIMPLE)  MiniRTS (Hit_and_Run)
Random  24.2 (±3.9)          25.9 (±0.6)
MCTS    73.2 (±0.6)          62.7 (±2.0)

MCTS uses complete information and perfect dynamics.
MCTS evaluation is repeated on 1000 games, using 800 rollouts.

Page 65: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Page 66: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Ongoing Work

• One framework for different games.
  • DarkForest remastered: https://github.com/facebookresearch/ELF/tree/master/go
• Richer game scenarios for MiniRTS.
  • Multiple bases (Expand? Rush? Defending?)
  • More complicated units.
  • Provide a Lua interface for easier modification of the game.
• Realistic action space
  • One command per unit
• Model-based Reinforcement Learning
  • MCTS with perfect information and perfect dynamics also achieves ~70% win rate
• Self-Play (Trained AI versus Trained AI)

Page 67: Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges

Thanks!