Integrating Learning in Interactive Gaming Simulators
Testbed for Integrating and Evaluating Learning Techniques (TIELT)
David W. Aha (1) & Matthew Molineaux (2)
(1) Intelligent Decision Aids Group, Navy Center for Applied Research in AI, Naval Research Laboratory; Washington, DC
(2) ITT Industries; AES Division; Alexandria, VA
first.surname@nrl.navy.mil
17 November 2004
Problem: Status of Learning in Cognitive Systems

Few deployed cognitive systems integrate techniques that exhibit rapid & enduring learning behavior on complex tasks
– It's costly to integrate & evaluate embedded learning techniques

Complication: Machine learning (ML) researchers tend to investigate:
• ¬Rapid: Knowledge-poor algorithms
• ¬Enduring: Learning over a short time period
• ¬Embedded: Stand-alone evaluations
TIELT Motivation

We want cognitive agents that learn
• rapidly,
• in context, and
• over the long term.
We have few (if any) of them.
TIELT Objective

Encourage research on learning in cognitive systems, with subsequent transition goals:
ML researchers → learning modules → cognitive agents that learn → military & industry
Current ML Research Focus

Benchmark studies of multiple algorithms on simple (e.g., supervised) learning tasks from many static datasets: an ML researcher runs each of n ML systems against m databases, collecting m results per system for benchmark analysis.

This was encouraged (in part) by the availability of datasets in a standard (interface) format.
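The m-by-n benchmark pattern above can be sketched as a simple loop. This is a hypothetical illustration only; the "learner" and dataset below are toy stand-ins, not anything from the original talk.

```python
# Hypothetical sketch of the benchmark pattern above: each of n ML systems
# is scored on each of m static datasets, yielding m results per system.

def run_benchmark(ml_systems, datasets):
    """Return results[system][dataset] = score for every system/dataset pair."""
    return {sys_name: {ds_name: score(ds) for ds_name, ds in datasets.items()}
            for sys_name, score in ml_systems.items()}

# Toy stand-in "learner": scores a dataset by its positive-label rate.
base_rate = lambda ds: sum(label for _, label in ds) / len(ds)
ml_systems = {"base-rate": base_rate}
datasets = {"toy": [((0,), 1), ((1,), 1), ((2,), 0)]}
results = run_benchmark(ml_systems, datasets)
```

Swapping in a real learner or another UCI-style dataset only adds one entry to the corresponding dictionary, which is what made standard-format repositories so convenient for this style of study.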
Previous API for ML Investigations

Supervised-learning ML system_j ↔ database_i interface (standard format) ↔ decision system_k

Inspiration: UC Irvine Repository of Machine Learning (ML) Databases
• An interface for empirical benchmarking studies on supervised learning
• 1525 citations (and many publications use it w/o citing) since 1986

Limitation
• Only useful for isolated ML studies
• Has not encouraged studies of ML in cognitive systems
Accomplishing TIELT's Objective

One approach: Shift ML research focus from static datasets to dynamic simulators of rich environments.

From: supervised-learning ML system_j ↔ database_i interface (standard format; e.g., UCI Repository of ML Databases) ↔ decision system_k

To: world_i (simulated/real) ↔ sensors & effectors ↔ interface (standard API; e.g., TIELT) ↔ ML module_j within a cognitive learning decision system_k
Refining TIELT's Objective

Objective: Develop a tool for evaluating decision systems in simulators
– Specific support for evaluating learning techniques
– Demonstrate research utility prior to approaching industry/military

Benefits
1. Reduces system-simulator integration costs from m*n to m+n (see next)
2. Permits benchmark studies on selected simulator tasks
3. Encourages study of ML for knowledge-intensive problems
4. Provides support for DARPA Challenge Problems on Cognitive Learning
Reducing Integration Costs

Integrating a simulator & cognitive system is expensive (time, $).

Problem: Prohibitive integration costs retard research progress. Connecting each of m simulators directly to each of n cognitive systems requires m*n integrations.

Proposed solution: Standardize integrations to reduce costs. With TIELT mediating between the simulators and the cognitive systems, each side is integrated once, so only m+n integrations are needed.
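The m+n saving is the classic mediator/adapter pattern. A minimal sketch, with entirely hypothetical class and method names (TIELT's actual interfaces are defined by its knowledge bases, not this API):

```python
# Illustrative sketch: with a shared message interface, each simulator and
# each decision system needs exactly one adapter, so m + n integrations
# replace m * n pairwise bridges.

class SimulatorAdapter:
    """One per simulator: maps engine events into a common percept format."""
    def __init__(self, name):
        self.name = name
    def to_percept(self, raw_event):
        return {"source": self.name, "event": raw_event}

class DecisionSystemAdapter:
    """One per decision system: consumes common percepts, emits actions."""
    def __init__(self, name):
        self.name = name
    def decide(self, percept):
        return f"{self.name} acts on {percept['event']}"

def mediate(sim_adapter, ds_adapter, raw_event):
    """TIELT-like mediator: any simulator can drive any decision system."""
    return ds_adapter.decide(sim_adapter.to_percept(raw_event))

sims = [SimulatorAdapter(f"Sim{i}") for i in range(3)]        # m = 3 adapters
agents = [DecisionSystemAdapter(f"DS{j}") for j in range(2)]  # n = 2 adapters
# 5 adapters (3 + 2) support all 6 (3 * 2) simulator/agent pairings:
pairings = [mediate(s, a, "unit-moved") for s in sims for a in agents]
```

Here 5 adapters cover all 6 pairings; at research scale (many simulators, many cognitive systems) the gap between m+n and m*n is what makes the mediation worthwhile.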
What Domain?

Desiderata
1. Available implementations (cheap to acquire & run)
2. Challenging problems for CogSys/ML research
3. Significant interest (academia, military, industry, funding, public)

Simulation games?
Gaming Genres of Interest (modified from Laird & van Lent, 2001)

• Action. Example: Quake, Unreal. Description: Control a character. Sub-genres: 1st vs. 3rd person, solo vs. team play. AI roles: Control enemies.
• Role-Playing. Example: Temple of Elemental Evil. Description: Be a character (includes puzzle solving, etc.). Sub-genres: Solo vs. (massively) multi-player. AI roles: Control enemies, partners, and supporting characters.
• Strategy (real-time, discrete). Example: Empire Earth 2, AoE, Civilization. Description: Controlling at multiple levels (e.g., strategic, tactical warfare). Sub-genres: God, first-person perspectives. AI roles: Control all units and strategic enemies.
• Individual Sports. Example: Many (e.g., driving games). Description: Individual competition. Sub-genres: 1st vs. 3rd person. AI roles: Control enemy.
• Team Sports. Example: Madden NFL Football. Description: Act as coach and a key player. AI roles: Control units and strategic enemy (i.e., the other coach), commentator.
Some Game Environment Challenges
• Significant background knowledge available
– e.g., processes, tasks, objects, actions
– Use: Provide opportunities for rapid learning
• Meetings
– AAAI symposia (several in recent years)
– International Conference on Computers and Games
– AAAI'04 Workshop on Challenges in Game AI
– AI in Interactive Digital Entertainment Conference (2005-) …
• New journals focusing on (e.g., real-time) simulation games
– J. of Game Development
– Int. J. of Intelligent Games and Simulation
• Game engines (e.g., GameBots, ORTS, RoboCup Soccer Server)
– Use (other) open source engines (e.g., FreeCiv, Stratagus)

Survey: Selected Previous Work on Learning & Gaming Simulators
• (Ponsen, 2004 M.S. Thesis): Dynamic scripting & GA for rule learning; learns rule weights and new rules; task: defeat a Wargus opponent; evaluation: offline; vary map size, learning algorithm, and opponent control algorithm; measure % wins
• (Ulam et al., AAAI'04 Workshop): Self-adaptation; learns task edits; task: defend a city (FreeCiv); evaluation: offline; vary trace size; measure % successes
Industry: Learning in Simulation Games

Focus: Increase sales via enhanced gaming experience
• USA: $7B in sales in 2003 (ESA, 2004)
– Strategy games: $0.3B
• Simulators: Many! (e.g., SimCity, Quake, SoF, UT)
• Target: Control avatars, unit behaviors

Evidence of commitment
• Developers: "keenly interested in building AIs that might learn, both from the player & environment around them." (GDC'03 Roundtable Report)
• Middleware products that support learning (e.g., MASA, SHAI, LearningMachine)
• Long-term investments in learning (e.g., iKuni, Inc.)
• Conferences:
– Game Developers Conference
– Computer Game Technology Conference
Industry: Learning in Simulation Games (continued)

Some promising techniques (Rabin, 2004)
• Belief networks for probabilistic inference
• Decision tree learning
• Genetic algorithms (e.g., for offline parameter tuning)
• Statistical prediction (e.g., using N-grams to predict future events)
• Neural networks (e.g., for offline applications)
• Player modeling (e.g., to regulate game difficulty, model reputation)
• Reinforcement learning
• Weakness modification learning (e.g., don't repeat failed strategies)

Status
• Few deployed systems have used learning (Kirby, 2004): e.g.,
1. Black & White: on-line, explicit (player immediately reinforces behavior)
2. C&C Renegade: on-line, implicit (agent updates set of legal paths)
3. Re-Volt: off-line, implicit (GA tunes racecar behaviors prior to shipping)
• Problems: Performance, constraints (preventing learning "something dumb"), trust in learning system
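The N-gram prediction technique listed above is simple enough to sketch directly: count observed action sequences and predict a player's next action from the last N-1 actions. The class below and the strategy names are illustrative assumptions, not drawn from any shipped game.

```python
# Minimal N-gram predictor: learns which action tends to follow each
# length-(N-1) context in a player's observed action stream.
from collections import Counter, defaultdict

class NGramPredictor:
    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # context tuple -> next-action counts

    def observe(self, actions):
        """Tally every length-n window in an observed action sequence."""
        for i in range(len(actions) - self.n + 1):
            context = tuple(actions[i:i + self.n - 1])
            self.counts[context][actions[i + self.n - 1]] += 1

    def predict(self, recent):
        """Most frequent follow-up to the last n-1 actions, or None if unseen."""
        context = tuple(recent[-(self.n - 1):])
        dist = self.counts.get(context)
        return max(dist, key=dist.get) if dist else None

p = NGramPredictor(n=3)
p.observe(["rush", "rush", "turtle", "rush", "rush", "turtle"])
print(p.predict(["rush", "rush"]))  # prints "turtle"
```

A game AI can use such a table online (it is one dictionary update per observed action), which is one reason N-gram prediction shows up on industry lists despite its simplicity.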
Military: Learning in Simulation Games

Focus: Training, analysis, & experimentation
• Learning: Acquisition of new knowledge or behaviors
• Simulators: JWARS, OneSAF, Full Spectrum Command, etc.
• Target: Control strategic opponent or own units

Evidence of commitment
• "Learning is an essential ability of intelligent systems" (NRC, 1998)
• "To realize the full benefit of a human behavior model within an intelligent simulator, …the model should incorporate learning" (Hunter et al., CCGBR'00)
• "Successful employment of human behavior models…requires that [they] possess the ability to integrate learning" (Banks & Stytz, CCGBR'00)
• Conferences: BRIMS, I/ITSEC

Status: No CGF simulator has been deployed with learning (D. Reece, 2003)
Some problems (Petty, CGFBR'01):
• Cost of training phase
• Loss of training control
• Learning non-doctrinal behaviors
• Learning unpredictable behaviors
Analysis: Conclusions

State of the art
1. Research on learning in complex gaming simulators is in its infancy
• Knowledge-poor approaches are limited to simple performance tasks
• Knowledge-intensive approaches require huge knowledge bases, which to date have been manually encoded
2. Existing approaches have many simplifying assumptions
• Scenario limitations (e.g., on number and/or capabilities of adversaries)
• Learning is (usually) performed only off-line
• Learned knowledge is not transferred (e.g., to playing other games)

Significant advances would include:
1. Fast acquisition approaches for a large amount of domain knowledge
• This would enable rapid learning without requiring manual encoding
2. Demonstrations of on-line learning (i.e., within a single simulation run)
3. Increasing knowledge transfer among tasks & simulators over time
TIELT Specification

1. Simplifies integration & evaluation!
• Learning-embedded decision systems & gaming simulators
• Supports communications, game model, perf. task, evaluation
• Free & available
2. Learning foci
• Task (e.g., learn how to execute, or advise on, a task)
• Player (e.g., accept advice, predict a player's strategies)
• Game (e.g., learn/refine its objects, their relations, & behaviors)
3. Learning modes: active/passive, online/offline, direct/indirect, automated/interactive
• Learning results should be available for inspection
4. Gaming simulators: Those with challenging learning tasks
5. Reuse:
• Communications are separated from the game model & perf. task
• Provides access to libraries of simulators & decision systems
TIELT's Knowledge Bases

• Game Interface Model: Defines communication processes with the game engine
• Decision System Interface Model: Defines communication processes with the decision system
• Game Model: Defines the interpretation of the game (e.g., initial state, classes, operators, behaviors (rules)); behaviors could be used to provide constraints on learning
• Agent Description: Defines what decision tasks (if any) TIELT must support
• Experiment Methodology: Defines selected performance tasks (taken from the Game Model description) and the experiment to conduct
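To make the division of labor among the five knowledge bases concrete, here is a hedged sketch modeling them as plain Python dataclasses. TIELT itself specifies these in XML; every field name below is an assumption chosen to mirror the descriptions above, not the actual schema.

```python
# Illustrative-only models of TIELT's five knowledge bases; field names
# are assumptions, not TIELT's real XML schema.
from dataclasses import dataclass, field

@dataclass
class GameInterfaceModel:            # communication with the game engine
    message_templates: list = field(default_factory=list)

@dataclass
class DecisionSystemInterfaceModel:  # communication with the decision system
    message_templates: list = field(default_factory=list)

@dataclass
class GameModel:                     # interpretation of the game
    initial_state: dict = field(default_factory=dict)
    operators: list = field(default_factory=list)
    behaviors: list = field(default_factory=list)  # may constrain learning

@dataclass
class AgentDescription:              # decision tasks TIELT must support
    decision_tasks: list = field(default_factory=list)

@dataclass
class ExperimentMethodology:         # performance tasks + experiment to run
    performance_tasks: list = field(default_factory=list)
    trials: int = 1
```

The useful point of the structure is the separation it enforces: the two interface models are the only pieces that change when a simulator or decision system is swapped, while the game model, agent description, and experiment methodology can be reused.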
TIELT: Supported Performance Tasks

Types of problem-solving tasks: analysis (e.g., classification, diagnosis) vs. synthesis (e.g., planning, design (structural/parametric), scheduling), plus decision support.

Performance vs. learning tasks
• Performance: Application of the learned knowledge (e.g., classification)
• Learning: Activity of the learning system (e.g., update weights in a neural net)

TIELT users will define complex, user-configurable performance tasks.
An Example Complex Learning Task

Task description: Win a real-time strategy game. This involves several challenging learning tasks.

Subtasks and supporting operations
1. Diagnosis: Identify (computer and/or human) opponent strategies & goals
• Classification: Opponent recognition
• Recording: Actions of opponents and their effects
– This repeatedly involves classification
• Diagnosis: Identify goal(s) being solved by these effects
• Classification: Identify goal(s), if solved, that prevent opponent goals
2. Planning: Select/adapt or create a plan to achieve goals and win the game
• Classification: Select top-level actions to achieve goals
– Iteratively identify necessary sub-goals and, finally, primitive actions
• Design (parametric): Identify a good initial layout of controllable assets
3. Execution: Execute the plan
• Recording: Collect measures of effectiveness, to provide feedback
• Planning: If needed, re-plan at Step 2, based on feedback
Using TIELT
1. Define/store game interface model
2. Define/store game model
3. Select decision system/interface
4. Define performance task(s)
5. Define/select expt. methodology
6. Run experiments
7. Analyze displayed results

These steps draw on the selected game engine and TIELT's knowledge base libraries (game interface models, decision system interface models, game models, agent descriptions, and experiment methodologies).
TIELT's Internal Communication Modules

TIELT sits between a selected game engine and a selected decision system. Percepts from the game engine pass through the Model Updater into the Current State; the Controller invokes the Learning Translator (Mapper), which sends a translated model (subset) and learning task to the decision system; learning outputs return through the Action Translator (Mapper) as actions or control messages to the game engine. Supporting modules include the five knowledge-base editors (Game Interface Model, Decision System Interface Model, Game Model, Agent Description, Experiment Methodology), the Evaluation Interface and Evaluator for the performance task, the Advice Interface, databases for current and stored state, and the engine-state Controller.
Sensing the Game State (city placement example, inspired by Alpha Centauri, etc.)

1. In the Game Engine, the game begins; a colony pod is created and placed.
2. The Game Engine sends a "See" sensor message identifying the pod's location.
3. The Model Updater receives the sensor message and finds the corresponding message template in the Game Interface Model.
4. This message template provides updates (instructions) to the Current State, telling it that there is a pod at the location See describes.
5. The Model Updater notifies the Controller that the See action event has occurred.
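Steps 2-5 amount to template-driven state updating. A minimal sketch, with invented names (TIELT's real message templates live in its Game Interface Model knowledge base, not in Python):

```python
# Hypothetical sketch of the sensing path: a "See" message matches a
# template, the template updates the current state, and the controller
# is notified. All names are illustrative stand-ins.

game_interface_model = {
    # template: message name -> instructions for updating the current state
    "See": lambda state, args: state.update({"pod_at": args["location"]}),
}

class Controller:
    def __init__(self):
        self.events = []
    def notify(self, event):
        self.events.append(event)

class ModelUpdater:
    def __init__(self, templates, current_state, controller):
        self.templates = templates
        self.state = current_state
        self.controller = controller

    def receive(self, message, args):
        self.templates[message](self.state, args)  # step 4: update current state
        self.controller.notify(message)            # step 5: notify the controller

state, controller = {}, Controller()
updater = ModelUpdater(game_interface_model, state, controller)
updater.receive("See", {"location": (4, 7)})       # step 2: engine sends See
```

Because the game-specific knowledge sits entirely in the template table, supporting a new game engine means writing new templates, not new updater code, which is the point of the Game Interface Model.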
Fetching Decisions from the Decision System (city placement example)

1. The Controller notifies the Learning Translator that it has received a See message.
2. The Learning Translator finds a city location task, which is triggered by the See message. It queries the Controller for the learning mode, then creates a TestInput message to send to the reasoning system with information on the pod's location and the map from the Current State.
3. The Learning Translator transmits the TestInput message to the Decision System.
4. The Decision System transmits output to the Action Translator.

(The selected decision system comprises learning modules #1 … #n.)
Acting in the Game World (city placement example)

1. The Action Translator receives a TestOutput message from the Decision System.
2. The Action Translator finds the TestOutput message template, determines it is associated with the city location task, and builds a MovePod operator (defined by the Current State) with the parameters of TestOutput.
3. The Action Translator determines that the Move action from the Game Interface Model is triggered by the MovePod operator and binds Move using information from MovePod.
4.a. The Game Engine receives Move and updates the game to move the pod toward its destination, or
4.b. The Advice Interface receives Move and displays advice to a human player on what to do next, or
4.c. The Prediction Interface receives Move and makes a prediction.
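The outbound path mirrors the sensing path: template match, operator construction, action binding, dispatch. A hedged sketch with invented names (the real TestOutput/MovePod/Move definitions live in TIELT's knowledge bases):

```python
# Hypothetical sketch of the action path: TestOutput -> MovePod operator
# -> bound Move action -> game engine. All names are illustrative.

def build_operator(test_output):
    # step 2: TestOutput's parameters fill in a MovePod operator
    return {"op": "MovePod", "dest": test_output["dest"]}

def bind_action(operator):
    # step 3: the Game Interface Model triggers Move on a MovePod operator
    assert operator["op"] == "MovePod"
    return ("Move", operator["dest"])

class GameEngine:
    def __init__(self):
        self.pod = (0, 0)
    def receive(self, action):
        # step 4.a: the engine moves the pod toward its destination
        name, dest = action
        if name == "Move":
            self.pod = dest

engine = GameEngine()
engine.receive(bind_action(build_operator({"dest": (4, 7)})))
```

In step 4.b or 4.c the same bound Move would instead be routed to the Advice or Prediction Interface; only the final dispatch target changes, not the translation chain.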
• Message content: Configurable
– Instantiated templates tell TIELT how to communicate with other modules
– Initialization messages: Start, Stop, Load Scenario, Set Speed
• Game Model representations (w/ Lehigh University)
– Simple programs
– TMK process models
– PDDL (language used in planning competitions)
TIELT Status (November 2004)

Documentation
• TIELT User's Manual (82 pages)
1. TIELT Overview
2. The TIELT User Interface
3. Scripting in TIELT
4. Theory of the Game Model
5. Communications
6. TMK Models
7. Experiments
• TIELT Tutorial (45 pages)
1. The Game Model
2. The Game Interface Model
3. Decision System Interface Model
4. Agent Description
5. Experiment Methodology
Access
• TIELT www site (new)
• Selected components
– Documents: Documentation, publications, XML Spec
– Status
– Forum: A full-featured web forum/bulletin board
– Bug Tracker: TIELT bug/feature tracking facility
– FAQ-o-Matic: Questions and problem solutions; user-driven
– Download
TIELT Issues (November 2004)

1. Communication
• TIELT is a multilingual application; TCP/IP, library calls, and SWIG bindings provide interfacing with many different games.

2. Resources for learning to use TIELT
• TIELT Scripting syntax highlighting
• Map of TIELT component interactions (thanks, Megan)
• Typed script interface
3. Game Model formatting
To no one's surprise, everyone agrees that TIELT's Game Model representation is inadequate. ("We're working on it.")
Requests have been made for:
• 3D maps (Quake)
• A different programming language
• A relational operator representation
• Standardized events
Military example: Full Spectrum Command (USC/Institute for Creative Technologies)
• RTS; leading an Army light infantry company; 1st-person perspective
Promising Learning Strategies

• Advice Giving. Description: An expert explains how to perform in a given state (this is the only interactive strategy listed here). When to use: Speedup needed & an expert is available. Justification: Permits quick acquisition of specific and general domain knowledge.
• Backpropagation. Description: Trains a 3-layer neural network (NN) of sigmoidal hidden units. When to use: The target is a non-linear function; offline training is OK. Justification: Many learning tasks are non-linear, and some can be performed off-line.
• Case-Based Reasoning. Description: Use/adapt solutions from experiences to solve similar problems. When to use: Cases complement an incomplete domain model; problem-solving speed is crucial. Justification: Quicker to adapt cases than reason from scratch, but requires domain-specific adaptation knowledge.
• Chunking. Description: Compile a sequence of steps into a macro. When to use: For tasks requiring speedup. Justification: Transforms a complex reasoning task into a fast retrieval task.
• Dynamic Scripting. Description: RL for tasks with large state spaces that, with domain knowledge, can be collapsed into a smaller set. When to use: A small set of states exists, with a set of rules for each. Justification: Greatly speeds up the RL approach, but requires analysis of task states.
• Evolutionary Computation. Description: Evolutionary (genetic) selection on a population of genomes, where the application dictates their representation. When to use: The search space is huge, and training can be done offline. Justification: Genome representations can be task specific, so this powerful search method can be tuned for the task.
• Meta-Reasoning. Description: After a failure, identifies its type & the task that failed, retrieves a task-specific strategy to avoid this failure, and updates its model. When to use: To support self-adaptation. Justification: Although knowledge intensive, this is an excellent method for changing problem-solving strategies.
• Neuroevolution. Description: Uses a separate genetic-algorithm population to learn each hidden unit's weights in a NN. When to use: To support cooperating heterogeneous agents. Justification: A good offline agent-based learning approach for multi-agent gaming.
• Reinforcement Learning (RL). Description: Reinforce a sequence of decisions after problem solving is completed. When to use: The reward is known only after the sequence ends, and blame can be ascribed. Justification: Well-understood paradigm for learning action policies (i.e., what action to perform in a given state).
• Relational MDPs. Description: Learn a Markov decision process over objects & their relations using probabilistic relational models. When to use: Seeking knowledge transfer (KT) to similar environments. Justification: KT is crucial for learning quickly, and feasibly, for some tasks.
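The dynamic-scripting entry above (the technique used in the Ponsen/Wargus work surveyed earlier) reduces to weighted rule selection plus a win/loss weight update. This is a minimal sketch of that idea only; the rule names, constants, and the simple additive update are assumptions (published dynamic scripting redistributes weight across the rulebase).

```python
# Minimal dynamic-scripting sketch: each state's rulebase carries weights;
# rules used in a winning episode gain weight, rules in a losing one lose it.
import random

def select_rules(rulebase, k, rng=random):
    """Draw a k-rule script, weight-proportionally, from one state's rulebase."""
    rules, weights = zip(*rulebase.items())
    return rng.choices(rules, weights=weights, k=k)

def update_weights(rulebase, used_rules, won, delta=10, floor=1):
    """Reward or penalize the rules that appeared in the finished episode."""
    for rule in used_rules:
        rulebase[rule] = max(rulebase[rule] + (delta if won else -delta), floor)

rulebase = {"rush": 50, "turtle": 50, "expand": 50}   # one state's rulebase
script = select_rules(rulebase, k=2)                  # build an episode script
update_weights(rulebase, used_rules=set(script), won=True)
```

The weights encode everything learned, so inspecting the rulebase after training directly shows which tactics the adapting opponent favors; that inspectability fits TIELT's requirement that learning results be available for inspection.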
TIELT-General Game Player Integration (with Stanford University's Michael Genesereth)

GGP provides: logical game formalisms; access to remote players; WWW access.
TIELT provides: experiment design/control capabilities; a common game engine interface; support for several learning approaches.
GGP-TIELT will:
• Play an entire class of general games as well as TIELT-integrated gaming simulators.
• Compete remotely against reference players and other GGP systems.
• Define evaluation methodologies for learning experimentation.
• Participate in the AAAI'05 GGP Competition.

Integration architecture: the GGP test bed and GGP competitors connect over the WWW with TIELT, TIELT-ready GGP competitors, and reference opponents.
Upcoming Events

1. National Conference on AI (AAAI'05; 24-28 July; Pittsburgh)
– General Game Playing Competition ($10K prize)
2. Int. Joint Conference on AI (IJCAI'05; 30 July-5 August; Edinburgh)
– Workshop: Reasoning, Representation, and Learning in Gaming Simulation Tasks (tentative title)
3. Int. Conference on ML (ICML'05; 7-11 August; Bonn)
– Workshop submission in progress
4. Int. Conference on CBR (ICCBR'05; 23-26 August; Chicago)
– Workshop & Competition: CBR in Games
Summary

TIELT: Mediates between a (gaming) simulator and a learning-embedded decision system
• Goals:
– Simplify running learning experiments with cognitive systems
– Support DARPA challenge problems in learning
• Designed to work with many types of simulators & decision systems
– This enhances the probability that TIELT will achieve its goals
• We're planning several TIELT-related events
Backup Slides
Metrics

Industry perspective
1. Ability to develop learned/learning behaviors of interest
2. Time required to
• develop game interface & model KBs, and
• develop these behaviors
3. Availability of learning-embedded reasoning systems
4. Support for both off-line and on-line learning

Research perspective
1. Time required to develop reasoning interface KB
2. Ability to design/facilitate selected evaluation methodology
3. Expressiveness of KB representation
4. Breadth of learning techniques supported
5. Breadth of learning and performance tasks supported
6. Availability of integrated gaming simulators & challenges
58Testbed for Integrating and Evaluating Learning Techniques
2. Decision making speed and accuracy
3. Plan execution quality (e.g., time to execute, mission-specific Measures of Effectiveness)
4. Number of constraint violations
5. Ability to transfer learned knowledge
59Testbed for Integrating and Evaluating Learning Techniques
TIELT: Potential Learning Challenge Problems
1. Learn to win a game (i.e., accomplish an objective)
• e.g., solve a challenging diplomacy task, provide a realistic military training course facing intelligent adversaries, or help users develop real-time cognitive reasoning skills for a defined role in support of a multi-echelon mission
2. Learn an adversary's strategy
• e.g., predict a terrorist group's plan and/or tactics, suggest appropriate responses to prevent adversarial goals, help users identify characteristics of adversarial strategies
3. Learn crucial processes of an environment
• e.g., learn to improve an incorrect/incomplete game model so that it more accurately/reliably defines objects/agents in the game, their behaviors, their capabilities, and their limitations
4. Intelligent situation assessment
• e.g., learn which factors in the simulation require attention to accomplish different types of tasks
60Testbed for Integrating and Evaluating Learning Techniques
Example Game: FreeCiv (Discrete-time strategy)
http://www.freeciv.org
Civilization II (MicroProse)
• Civilization II (1996-): 850K+ copies sold
– PC Gamer: Game of the Year Award winner
– Many other awards
• Civilization series (1991-): Introduced the civilization-based game genre
FreeCiv (Civ II clone)
• Open source freeware
• Discrete strategy game
• Goal: Defeat opponents, or build a spaceship
• Resource management
– Economy, diplomacy, science, cities, buildings, world wonders
– Units (e.g., for combat)
• Up to 7 opponent civs
• Partial observability
61Testbed for Integrating and Evaluating Learning Techniques
Previous FreeCiv/Learning Research
(Ulam et al., AAAI’04 Workshop on Challenges in Game AI)
• Title: Reflection in Action: Model-Based Self-Adaptation in Game Playing Agents
• Scenarios:
– City defense: Defend a city for 3000 years
62Testbed for Integrating and Evaluating Learning Techniques
FreeCiv CP Scenario
General description
• Game initialization: Your only unit, a "settler", is placed at a random location on a randomly generated world (see Game Options below). Players cyclically alternate play
• Objective: Obtain the highest score, conquer all opponents, or build the first spaceship
• Scoring: The "basic" goal is to obtain 1000 points. Game options affect the score.
– Citizens: 2 pts per happy citizen, 1 per content citizen
– Advances: 20 pts per World Wonder, 5 per "futuristic" advance
– Peace: 3 pts per turn of world peace (no wars or combat)
– Pollution: -10 pts per square currently polluted
• Top-level tasks (to achieve a high score):
– Develop an economy
– Increase population
– Pursue research advances
– Opponent interactions: Diplomacy and defense/combat
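The "basic" scoring rules above can be sketched as a small function (a minimal sketch in Python; the function and parameter names are illustrative, not part of FreeCiv's actual code):

```python
def basic_score(happy_citizens, content_citizens, world_wonders,
                futuristic_advances, peace_turns, polluted_squares):
    """Score a game state under the 'basic' FreeCiv scoring rules listed above."""
    return (2 * happy_citizens          # 2 pts per happy citizen
            + 1 * content_citizens      # 1 pt per content citizen
            + 20 * world_wonders        # 20 pts per World Wonder
            + 5 * futuristic_advances   # 5 pts per "futuristic" advance
            + 3 * peace_turns           # 3 pts per turn of world peace
            - 10 * polluted_squares)    # -10 pts per polluted square

# e.g., a civilization still short of the 1000-point "basic" goal:
print(basic_score(10, 5, 2, 0, 100, 3))  # → 335
```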
Game Option | Y1 | Y2 | Y3
World size | Small | Normal | Large
Difficulty level | Warlord (2/6) | Prince (3/6) | King (4/6)
#Opponent civilizations | 5 | 5 | 7
Level of barbarian activity | Low | Medium | High
63Testbed for Integrating and Evaluating Learning Techniques
FreeCiv CP Information Sources
Concepts in an Initial Knowledge Base
• Resources: Collection and use
o Food, production, trade (money)
• Terrain:
o Resources gained per turn
o Movement requirements
• Units:
o Type (military, trade, diplomatic, settlers, explorers)
o Health
o Combat: Offense & defense
o Movement constraints (e.g., land, sea, air)
• Government types (e.g., anarchy, despotism, monarchy, democracy)
• Research network: Identifies constraints on what can be studied at any time
• Buildings (e.g., cost, capabilities)
• Cities
o Population growth
o Happiness
o Pollution
• Civilizations (e.g., military strength, aggressiveness, finances, cities, units)
• Diplomatic states & negotiations
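One way to encode such initial knowledge is as typed records. The sketch below renders two of the listed concepts as Python dataclasses; the field choices and example values are illustrative assumptions, not TIELT's actual KB format:

```python
from dataclasses import dataclass

@dataclass
class Terrain:
    """Terrain concept: per-turn resource yield and movement cost."""
    name: str
    food_per_turn: int
    production_per_turn: int
    trade_per_turn: int
    movement_cost: int

@dataclass
class Unit:
    """Unit concept: type, health, combat strengths, movement domain."""
    unit_type: str            # military, trade, diplomatic, settler, explorer
    health: int
    attack: int
    defense: int
    domain: str = "land"      # land, sea, or air

# Hypothetical instances for illustration
grassland = Terrain("grassland", food_per_turn=2, production_per_turn=0,
                    trade_per_turn=1, movement_cost=1)
phalanx = Unit("military", health=10, attack=1, defense=2)
```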
64Testbed for Integrating and Evaluating Learning Techniques
FreeCiv CP Decisions
Civilization decisions
• Choice of government type (e.g., democracy)
• Distribution of income devoted to research, entertainment, and wealth goals
• Strategic decisions affecting other decisions (e.g., coordinated unit movement for trade)

City decisions
• Production choice (i.e., what to create, including city buildings and units)
• Citizen roles (e.g., laborers, entertainers, or specialists), and laborer placement
– Note: Locations vary in their terrain, which generates different amounts of food, income, and production capability

Unit decisions
• Task (e.g., where to build a city, whether/where to engage in combat, espionage)
• Movement

Diplomacy decisions
• Whether to sign a proffered peace treaty with another civilization
• Whether to offer a gift
65Testbed for Integrating and Evaluating Learning Techniques
FreeCiv CP Decision Space
Variables
• Civilization-wide variables
o N: Number of civilizations encountered
o D: Number of diplomatic states (that you can have with an opponent)
o G: Number of government types available to you
o R: Number of research advances that can be pursued
o I: Number of partitions of income into entertainment, money, & research
• U: Number of units
o L: Number of locations a unit can move to in a turn
• C: Number of cities
o Z: Number of citizens per city
o S: Number of citizen statuses (i.e., laborer, entertainer, doctor)
o B: Number of choices for city production
Decision complexity per turn (for a typical game state)
• O(D^N · G·R·I · L^U · (S^Z · B)^C); this ignores both other variables and domain knowledge
o This becomes large with the number of units and cities
o Example: N=3; D=5; G=3; R=4; I=10; U=25; L=4; C=8; Z=10; S=3; B=10
o Size of decision space (i.e., possible next states): ≈2.5×10^65 (in one turn!)
o Comparison: Decision space of chess per turn is well below 140 (e.g., 20 at first move)
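Under the reconstruction O(D^N · G·R·I · L^U · (S^Z · B)^C), the example numbers reproduce the ≈2.5×10^65 figure. A minimal sketch (variable names follow the slide):

```python
def decision_space(N, D, G, R, I, U, L, C, Z, S, B):
    """Possible next states per turn: D^N * G*R*I * L^U * (S^Z * B)^C."""
    return D**N * G * R * I * L**U * (S**Z * B)**C

# Example from the slide: N=3; D=5; G=3; R=4; I=10; U=25; L=4; C=8; Z=10; S=3; B=10
n = decision_space(N=3, D=5, G=3, R=4, I=10, U=25, L=4, C=8, Z=10, S=3, B=10)
print(f"{n:.1e}")  # ≈ 2.5e+65
```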
66Testbed for Integrating and Evaluating Learning Techniques
FreeCiv CP: A Simple Example Learning Task
Situation• We’re England (e.g., London)• Barbarians are north (in red)• Two other civs exist• Our military is weak
• We’re England (e.g., London)• Barbarians are north (in red)• Two other civs exist• Our military is weak
What should we do?
• Ally with Wales? If so, how?
• Build a military unit? Which?
• Improve defenses?
• Increase city's production rate?
• Build a new city to the south? Where?
• Research "Gun Powder"? Or…?
• Move our diplomat back to London?
• A combination of these?
What information could help with this decision?
• Previous similar experiences
• Generalizations of those experiences
• Similarity knowledge
• Adaptation knowledge
• Opponent model
• Statistics on barbarian strength, etc.
67Testbed for Integrating and Evaluating Learning Techniques
Decision Space Size
Analysis of the Example Learning Task
Situation
• D: 3 (war, neutral, peace)
• N: Only 1 other civilization contacted (i.e., Wales)
• G: 2 government types known
• R: 4 research advances available
• I: 5 partitions of income available
• L: ~14 per unit
• U: 3 units (1 external, 2 in city)
• C: 1 city
– S: 3 (entertainer, laborer, doctor)
– Z: 6 citizens
– B: 5 units/buildings it can produce
• ≈1.2×10^9
• This reduces to ~32 sensible choices after applying some domain knowledge
– e.g., don't change diplomatic status now, keep units in city for defense, don't change government now (because it'll slow production), keep external unit away from danger
Complexity function
• O(D^N · G·R·I · L^U · (S^Z · B)^C)
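Plugging the situation's values into the complexity function reproduces the ≈1.2×10^9 figure (a sketch; variable names follow the slide, and ~14 locations per unit is taken as exactly 14):

```python
def decision_space(N, D, G, R, I, U, L, C, Z, S, B):
    """Possible next states per turn: D^N * G*R*I * L^U * (S^Z * B)^C."""
    return D**N * G * R * I * L**U * (S**Z * B)**C

# Values from the example learning task above
n = decision_space(N=1, D=3, G=2, R=4, I=5, U=3, L=14, C=1, Z=6, S=3, B=5)
print(n)  # 1200225600 raw choices, before domain knowledge prunes them to ~32
```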
68Testbed for Integrating and Evaluating Learning Techniques
FreeCiv CP: Learning Opportunities
Learn to keep citizens happy
• Citizens in a city who are unhappy will revolt; this temporarily eliminates city production
• Several factors influence happiness (e.g., entertainment, military presence, gov't type)
Learn to obtain diplomatic advantages
• Countries at war tend to have decreased trade, lose units and cities, etc.
• Diplomats can sometimes obtain peace treaties or otherwise end wars
• Unit movement decisions can also impact opponents' diplomatic decisions

Learn how to wage war successfully
• Good military decisions can yield new cities/citizens/trade, but losses can be huge
• Unit decisions can benefit from learning tactical coordinated behaviors
• The selection of military unit(s) for a task depends on the opponent's capabilities

Learn how to increase territory size
• Initially, unexplored areas are unknown; their resources (e.g., gold) cannot be harvested
• Exploration needs to be balanced with security
• City placement decisions influence territory expansion
69Testbed for Integrating and Evaluating Learning Techniques
FreeCiv CP: Example Learned Knowledge
Learn what playing strategy to use in each adversarial situation
[Matrix: rows = Combat Strength Advantage (Favorable, None, Unfavorable); columns = Current Diplomatic Status with Opponent (Allied, Peace, Neutral, Distrustful, War); each cell names a strategy from the legend: Attack, Retreat, Fortify, Trade, Seek Peace, Bribe]
Strategy to use per adversarial situation
• Situations are defined by relative military strength, diplomatic status, whether the opponent has strong alliances, locations of forces, etc.
• Selecting a good playing strategy depends on many of these variables
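A learned strategy table of this kind can be sketched as a lookup keyed on the situation. The axes and legend follow the slide, but the specific cell assignments below are illustrative assumptions, since the slide's matrix does not survive in this transcript:

```python
# Learned strategy table sketch: (combat advantage, diplomatic status) -> strategy.
# Cell assignments are hypothetical; the slide supplies only the axes and legend.
STRATEGY = {
    ("Favorable",   "War"):         "Attack",
    ("Favorable",   "Distrustful"): "Fortify",
    ("None",        "Neutral"):     "Trade",
    ("None",        "Distrustful"): "Seek Peace",
    ("Unfavorable", "War"):         "Retreat",
    ("Unfavorable", "Distrustful"): "Bribe",
}

def choose_strategy(advantage, status, default="Fortify"):
    """Look up a strategy for an adversarial situation; fall back to a default."""
    return STRATEGY.get((advantage, status), default)

print(choose_strategy("Favorable", "War"))  # Attack
```

A fuller version would also key on the other situation features the slide mentions (opponent alliances, force locations, etc.).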
70Testbed for Integrating and Evaluating Learning Techniques
What Techniques Could Learn the Task of Selecting a Playing Strategy?
Meta-reasoning (e.g., Ulam et al., AAAI’04 Wkshp on Challenges in Game AI)
• Requires knowledge on:
1. Tasks being performed
2. Types of failures that can occur when performing these tasks
• T2: Overestimate own strength, underestimate enemy strength, …
• T3: Incorrect assessment of enemy's diplomatic status, …
3. Strategies for adapting these tasks
• S1: Increase military strength
• S2: Assess distribution of enemy forces
• S3: Consider enemy's diplomatic history
4. Mapping of failure types in (2) to adaptation strategies in (3)
• Example: We decided to Attack, but underestimated enemy strength. This was indexed by strategy S2, which we'll do from now on in T2.
[Task diagram: T1: Determine Playing Strategy decomposes into T2: Assess Military Advantage, T3: Assess Diplomatic Status, and T4: Select Strategy (Attack, Retreat, Fortify, Trade, Seek Peace, Bribe)]
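The four knowledge types above can be sketched as a failure-to-adaptation mapping (a hypothetical structure; the task, failure, and strategy names follow the slide, but the data layout is an assumption):

```python
# Knowledge required for meta-reasoning (sketch; names follow the slide)
FAILURE_TYPES = {
    "T2": ["overestimated own strength", "underestimated enemy strength"],
    "T3": ["incorrect assessment of enemy's diplomatic status"],
}

STRATEGIES = {
    "S1": "Increase military strength",
    "S2": "Assess distribution of enemy forces",
    "S3": "Consider enemy's diplomatic history",
}

# Mapping of failure types to adaptation strategies; the slide's example
# indexes "underestimated enemy strength" in T2 by strategy S2.
ADAPTATIONS = {
    ("T2", "underestimated enemy strength"): "S2",
    ("T3", "incorrect assessment of enemy's diplomatic status"): "S3",
}

def adapt(task, failure):
    """Given a diagnosed failure, return the adaptation strategy to apply (or None)."""
    sid = ADAPTATIONS.get((task, failure))
    return STRATEGIES[sid] if sid else None

print(adapt("T2", "underestimated enemy strength"))  # Assess distribution of enemy forces
```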
71Testbed for Integrating and Evaluating Learning Techniques
Challenges for Using Learning via Meta-Reasoning
How can its background knowledge be learned (efficiently)?
• i.e., tasks, failure types, failure adaptation strategies, mappings
• Also, the agent needs to understand how to diagnose an error (i.e., identify which task failed and its failure type)
Can we scale it to more challenging learning problems?
• Currently, it has only been applied to simpler tasks
– "Defend a City" (in FreeCiv)
• More difficult would be "Play Entire Game"
What if only incomplete background knowledge exists?
• Could learning techniques be used to extend/correct it?
– e.g., learning from advice, case-based reasoning
72Testbed for Integrating and Evaluating Learning Techniques
Full Spectrum Command & Warrior (http://www.ict.usc.edu/disp.php?bd=proj_games)
Focus: US Army training tools (deployed @ Ft Benning & Afghanistan)
1. Full Spectrum Command (PC-based simulator)
– Role: Commander of a U.S. Army light infantry Company (120 soldiers)
– Tasks: Interpret the assigned mission, organize the force, plan strategically, & coordinate the actions of the Company
2. Full Spectrum Warrior (MS Xbox-based simulator)
Organization: USC’s Institute for Creative Technologies
• POC: Michael van Lent (Editor-in-Chief, Journal of Game Development)
• Goal: Develop immersive, interactive, real-time training simulations to help the Army create decision-making & leadership-development tools
73Testbed for Integrating and Evaluating Learning Techniques
METAGAME (Pell, 1992)
Focus: Learn strategies to win any game in a pre-defined category
• Initial category: "Chess-like" games
– Games are produced by a game generator
• Input: Rules on how to play the game
– A move grammar is used to communicate actions
• Output (desired): A winning playing strategy
e.g., Knight-Zone Chess
Annual Competition based on METAGAME
• Title: General Game Playing (games.stanford.edu)
• Champion: Michael Genesereth (Stanford U.)
• AAAI'05 Prize: $10K
[Architecture diagram: a Game Manager maintains game records and temporary state data, exchanges percepts, actions, and clocks with Players, and renders graphics for spectators]
74Testbed for Integrating and Evaluating Learning Techniques
Collaborator: Mad Doc Software
Summary
• PI: Ron Rosenberg (Producer)
• Experience:
• Mad Doc is a leader in real-time strategy games; Empire Earth II is expected to sell millions of copies
• CEO Ian Davis (CMU PhD in Robotics) is a well-known collaborator with the AI research community, and gave an invited presentation at AAAI'04. He will work with Ron on this contract.
• Deliverables: Mad Doc (RTS) game simulator API
– This will be used by multiple other collaborators
75Testbed for Integrating and Evaluating Learning Techniques
Collaborator: Troika Games
Summary
• PI: Tim Cain, Joint-CEO
• Experience:
• Troika has outstanding experience developing state-of-the-art role-playing games, including Temple of Elemental Evil (ToEE)
• A game developer since 1982, Tim obtained an M.S. with a focus on machine learning at UC Irvine in the late 1980s.
• Deliverables: ToEE (RPG) game simulator API
– This will be used by some other collaborators (e.g., U. Michigan)
76Testbed for Integrating and Evaluating Learning Techniques
Collaborator: ISLE
Summary
• PIs: Dr. Seth Rogers, Dr. Pat Langley
• Experience:
• ISLE (Institute for the Study of Learning and Expertise) is known for its ICARUS cognitive architecture, which is distinguished in part by its commitment to ground every symbol in a physical world object
• Pat Langley, founder of the journal Machine Learning, is known for his expertise in cognitive architectures and evaluation methodologies for learning systems.
• Deliverables:
• ICARUS reasoning system API
• FreeCiv agent (with assistance from NWU) and SimCity agent
– This will also be used by USC/ICT
• SimCity (RTS) game simulator API
77Testbed for Integrating and Evaluating Learning Techniques
Collaborator: Lehigh U.
Summary
• PI: Prof. Héctor Muñoz-Avila
• Experience:
• Héctor is an expert on hierarchical planning technology, and in particular has expertise in case-based planning
• Collaborating with NRL on TIELT during CY04 on (1) Game Model description representations, (2) the Stratagus/Wargus game simulator API, and (3) feedback on TIELT usage
• Deliverables:
• Software for translating among Game Model representations
• Stratagus/Wargus (RTS) game simulator API
– This may be used by UT Austin
• Case-based planning reasoning system API
78Testbed for Integrating and Evaluating Learning Techniques
Collaborator: NWU
Summary
• PIs: Prof. Ken Forbus, Prof. Tom Hinrichs
• Experience:
• Ken is a leading AI/games researcher. He is also the leading worldwide researcher in computational approaches to reasoning by analogy.
• Ken's group has extensive experience with qualitative reasoning approaches and with using the FreeCiv gaming simulator.
• Deliverables:
• FreeCiv (Discrete Strategy) game simulator API
– This will be used by ISLE
• Qualitative spatial reasoning system for FreeCiv API
79Testbed for Integrating and Evaluating Learning Techniques
Collaborator: U. Michigan
Summary
• PI: Prof. John Laird
• Experience:
• John is the best-known AI/games researcher, and has extensive experience integrating many commercial, freeware, and military game simulators with the Soar cognitive architecture.
• Deliverables:
• Soar reasoning system API
– This will be used by USC/ICT
• Applications of Soar to two game simulators (e.g., ToEE, Wargus)
80Testbed for Integrating and Evaluating Learning Techniques
Collaborator: USC/ICT
Summary
• PI: Dr. Michael van Lent
• Experience:
• Extensive implementation experience with AI/game research; his PhD advisor was John Laird.
• Led ICT's development of Full Spectrum Warrior and Full Spectrum Command (FSC) in collaboration with Quicksilver Software and the Army's PEO STRI. FSC is deployed at Ft. Benning and in Afghanistan.
• Editor-in-Chief, Journal of Game Development
• Deliverables:
• FSC (RTS) game simulator API
• Applications of FSC with U. Michigan's Soar and ISLE's ICARUS
81Testbed for Integrating and Evaluating Learning Techniques
Collaborator: UT Arlington
Summary
• PIs: Prof. Larry Holder, G. Michael Youngblood
• Experience:
• Larry has extensive experience developing unsupervised machine learning systems that use relational representations, and has led efforts on developing the D'Artagnan cognitive architecture.
• Deliverables:
• Urban Terror (FPS) game simulator API
• D'Artagnan reasoning system API (partial)
82Testbed for Integrating and Evaluating Learning Techniques
Collaborator: UT Austin
Summary
• PI: Prof. Risto Miikkulainen
• Experience:
• Risto has significant experience integrating neuro-evolution and similar approaches with game simulators.
• Collaborating on the UT Austin Digital Media Laboratory's development of the NERO (FPS) game simulator
• Deliverables:
• Knowledge-intensive neuro-evolution reasoning system API
• Application of this API using other simulators (e.g., FSC, Wargus) and U. Wisconsin's advice processing module
83Testbed for Integrating and Evaluating Learning Techniques
Collaborator: U. Wisconsin
Summary
• PIs: Prof. Jude Shavlik (UW), Prof. Richard Maclin (U. Minn-Duluth)
• Experience:
• Jude advised the first significant M.S. thesis on applying machine learning to FPS game simulators (Geisler, 2002)
• Maclin, who will be on sabbatical at U. Wisconsin during this project, has performed extensive work applying AI techniques (e.g., advice processing) to the RoboCup game simulator
• Deliverables:
• RoboCup (team sports) game simulator API
• Advice processing module
• WWW-based repository for TIELT software components (e.g., APIs)