A Multi-Domain Evaluation of Scaling in Soar’s Episodic Memory

Nate Derbinsky, Justin Li, John E. Laird University of Michigan


Motivation

Prior Work
• Nuxoll & Laird (’12): integration and capabilities
• Derbinsky & Laird (’09): efficient algorithms

Core Question
To what extent is Soar’s episodic memory effective and efficient for real-time agents that persist for long periods of time across a variety of tasks?

20 June 2012 Soar Workshop 2012 - Ann Arbor, MI 2


Approach: Multi-Domain Evaluation

• Existing agents from diverse tasks (49)
  – Linguistics, planning, games, robotics
• Long agent runs
  – Hours to days of real time (10⁵ to 10⁸ episodes)
• Evaluate every X episodes
  – Memory consumption
  – Reactivity for >100 task-relevant cues
    • Is the maximum cue-matching time < 50 msec.?
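The evaluation loop on this slide can be sketched as a small harness. This is a minimal illustration, not the authors' tooling. It assumes hypothetical callables: `step` advances the agent one episode, `retrieve` answers one cue, and `memory_bytes` reports store size. Every `every` episodes it records memory use and the worst cue latency, then checks that latency against the 50 msec. threshold.

```python
import time
from typing import Callable, Hashable, List, Sequence, Tuple

REACTIVITY_MSEC = 50.0  # reactivity threshold named on the slide

def max_cue_latency_msec(retrieve: Callable[[Hashable], object],
                         cues: Sequence[Hashable]) -> float:
    """Worst-case wall-clock time (msec) to answer any one cue."""
    worst = 0.0
    for cue in cues:
        start = time.perf_counter()
        retrieve(cue)
        worst = max(worst, (time.perf_counter() - start) * 1000.0)
    return worst

def evaluate_run(step: Callable[[], None],
                 retrieve: Callable[[Hashable], object],
                 cues: Sequence[Hashable],
                 total_episodes: int,
                 every: int,
                 memory_bytes: Callable[[], int]) -> List[Tuple[int, int, float, bool]]:
    """Advance the agent one episode at a time; every `every` episodes record
    (episode, memory in bytes, worst cue latency in msec, reactive?)."""
    checkpoints = []
    for episode in range(1, total_episodes + 1):
        step()
        if episode % every == 0:
            worst = max_cue_latency_msec(retrieve, cues)
            checkpoints.append((episode, memory_bytes(), worst, worst < REACTIVITY_MSEC))
    return checkpoints
```

Wall-clock timing with `time.perf_counter` is one reasonable choice here. A real study would also repeat measurements to smooth out noise.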


Outline

• Overview of Soar’s EpMem
• Word Sense Disambiguation (WSD)
• Planning
• Video Games & Robotics


Episodic Memory Problem Formulation

Representation
• Episode: connected di-graph
• Store: temporal sequence

Encoding/Storage
• Automatic
• No dynamics

Retrieval
• Cue: acyclic graph
• Semantics: desired features in context
• Find the most recent episode that shares the most leaf nodes with the cue

[Figure: diagram linking Working Memory and Episodic Memory via Encoding, Storage, Cue, and Matching]
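The retrieval semantics above can be illustrated with a brute-force sketch. It assumes each episode is flattened into a set of hypothetical leaf features, so it ignores the graph structure that real cue matching must respect. The score is the number of cue leaves an episode shares. Ties go to the most recent episode.

```python
from typing import FrozenSet, List, Optional, Tuple

Feature = Tuple[str, ...]  # hypothetical: a leaf written as a path of labels

def most_recent_best_match(episodes: List[FrozenSet[Feature]],
                           cue: FrozenSet[Feature]) -> Optional[int]:
    """Index of the most recent episode sharing the most leaf features with
    the cue, or None if no episode shares any."""
    best_index, best_score = None, 0
    for index, episode in enumerate(episodes):
        score = len(cue & episode)
        # '>=' lets a later episode with an equal score win (recency tie-break)
        if score > 0 and score >= best_score:
            best_index, best_score = index, score
    return best_index
```

This linear scan visits every episode. That cost is what the interval-based algorithm on the next slide is designed to avoid.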


Episodic Memory Algorithmic Overview

Storage
– Capture WM changes as temporal intervals

Cue Matching (reverse walk of cue-relevant Δ’s)
– 2-phase search
  • Only graph-match episodes that have all cue features independently
– Only evaluate episodes that have changes relevant to cue features
– Incrementally re-score episodes
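The storage and surface-matching ideas above can be sketched under simplifying assumptions. Working-memory features are flat hashable values, and the graph-match phase is omitted. `encode_intervals` stores each feature as [start, end] ranges, so storage grows with changes rather than with snapshot size. `reverse_walk` then visits only the episodes where a cue-relevant interval begins or ends, updating the score incrementally. Both names are hypothetical and are not Soar's implementation.

```python
from collections import defaultdict

def encode_intervals(snapshots):
    """Map each feature to the [start, end] episode ranges during which
    it was continuously present in working memory."""
    intervals = defaultdict(list)
    open_since = {}
    for t, wm in enumerate(snapshots):
        for f in wm:
            if f not in open_since:
                open_since[f] = t  # feature added at episode t
        for f in list(open_since):
            if f not in wm:
                intervals[f].append((open_since.pop(f), t - 1))  # removed at t
    last = len(snapshots) - 1
    for f, start in open_since.items():
        intervals[f].append((start, last))  # still present at the end
    return dict(intervals)

def reverse_walk(intervals, cue):
    """Return (episode, score) for the most recent episode containing the
    most cue features, visiting only cue-relevant interval endpoints."""
    deltas = defaultdict(int)
    for f in cue:
        for start, end in intervals.get(f, []):
            deltas[end] += 1        # walking backward, the feature appears at its end
            deltas[start - 1] -= 1  # ... and disappears just before its start
    best_t, best_score, score = None, 0, 0
    for t in sorted(deltas, reverse=True):
        if t < 0:
            break
        score += deltas[t]          # incremental re-score, no full recount
        if score > best_score:      # strict '>' keeps the most recent episode on ties
            best_t, best_score = t, score
            if best_score == len(cue):
                break  # all cue features present; a real system would now graph-match
    return best_t, best_score
```

For example, the snapshots `[{"a"}, {"a", "b"}, {"b"}, {"a"}, {"c"}]` with cue `{"a", "b"}` yield episode 1, the only episode holding both features.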


Episodic Memory Storage Characterization

[Figure: Avg. Bytes per Episode (0 to 6000) vs. Avg. Working Memory Changes (0 to 160); fit R² = 0.9825; reference levels labeled "1 week, 8GB" and "1 day, 8GB"]
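Reference levels like "1 day, 8GB" correspond to a per-episode storage budget. The arithmetic can be sketched as storage divided by the number of episodes in the run. The episode rate here is purely hypothetical: one episode per 50 msec. decision cycle, or 20 episodes/sec. It treats 8GB as 8 GiB. The slide does not state either assumption, so the numbers are illustrative only.

```python
GIB = 2 ** 30
DAY = 24 * 60 * 60  # seconds
RATE = 20.0         # hypothetical episodes per second (one per 50 msec.)

def bytes_per_episode_budget(storage_bytes: float, run_seconds: float,
                             episodes_per_second: float) -> float:
    """Average bytes each episode may use if the whole run must fit in storage_bytes."""
    return storage_bytes / (run_seconds * episodes_per_second)

print(round(bytes_per_episode_budget(8 * GIB, DAY, RATE)))      # 1 day  → 4971
print(round(bytes_per_episode_budget(8 * GIB, 7 * DAY, RATE)))  # 1 week → 710
```

Under these assumptions, a one-week run fits in 8 GiB only if episodes average about 710 bytes or fewer.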


Episodic Memory Retrieval Characterization

Assumptions
– Few changes per episode (temporal contiguity)
– Representational re-use (structural regularity)
– Small cue

Scaling
– Search distance (# changes to walk)
  • Temporal Selectivity: how often a WME changes
  • Feature Co-Occurrence: how often WMEs co-occur within a single episode (related to search-space size)
– Episode scoring (similar to rule matching)
  • Structural Selectivity: how many ways a cue WME can match an episode (i.e., multi-valued attributes)
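The three task-independent properties can be measured directly from a recorded run. This is a sketch under stated assumptions, with hypothetical function names. An episode is a frozenset of WMEs for the first two metrics. For structural selectivity, an episode is a list of (identifier, attribute, value) triples, so a multi-valued attribute shows up as several triples with the same identifier and attribute.

```python
from typing import Dict, FrozenSet, Hashable, Sequence, Tuple

def temporal_selectivity(episodes: Sequence[FrozenSet[Hashable]]) -> Dict[Hashable, int]:
    """Number of times each WME is added to or removed from working memory
    (its first appearance counts as a change)."""
    changes: Dict[Hashable, int] = {}
    previous: FrozenSet[Hashable] = frozenset()
    for wm in episodes:
        for wme in wm ^ previous:  # symmetric difference = added or removed
            changes[wme] = changes.get(wme, 0) + 1
        previous = wm
    return changes

def co_occurrence(episodes: Sequence[FrozenSet[Hashable]],
                  cue: FrozenSet[Hashable]) -> int:
    """Number of episodes in which every cue WME is present at once."""
    return sum(1 for wm in episodes if cue <= wm)

def structural_selectivity(episode: Sequence[Tuple[str, str, str]],
                           parent: str, attr: str) -> int:
    """Ways one cue edge (parent ^attr ?) can bind in an episode:
    the number of values the (possibly multi-valued) attribute has."""
    return sum(1 for ident, a, _value in episode if ident == parent and a == attr)
```

Frequently changing WMEs lengthen the reverse walk. Rare co-occurrence means more episodes must be scanned before a full match. Multiple bindings make each graph match more expensive.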


Word Sense Disambiguation Experimental Setup

• Input: <"word", POS>; Output: sense #; Result
  – Corpus: SemCor (~900K episodes/exposure)
• Agent
  – Maintain context as n-gram
  – Query EpMem for context
    • If success, get next episode, output result
    • If failure, null

Accuracy    First exposure    Second exposure
2-gram      14.57%            92.82%
3-gram      2.32%             99.47%
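The WSD agent's lookup can be sketched as follows. A flat Python list stands in for EpMem. `history` holds hypothetical ((word, POS), sense) records, and the "get next episode, output result" step is folded into each record by storing the sense beside its word. The n-gram context is matched newest-first, and `None` plays the role of the null output on failure.

```python
from typing import Optional, Sequence, Tuple

Token = Tuple[str, str]  # (word, part of speech)

def predict_sense(history: Sequence[Tuple[Token, int]],
                  context: Sequence[Token]) -> Optional[int]:
    """Sense recorded at the most recent point where the last len(context)
    tokens of history equal the context n-gram; None on failure."""
    n = len(context)
    for end in range(len(history) - 1, n - 2, -1):  # newest position first
        window = [token for token, _sense in history[end - n + 1:end + 1]]
        if window == list(context):
            return history[end][1]  # sense of the word that ends the n-gram
    return None
```

Longer n-grams are more specific, so they fail more often on first exposure. This fits the lower first-exposure accuracy for 3-grams in the table.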


Word Sense Disambiguation Results

Storage
– Avg. 234 bytes/episode

Cue Matching
– All 1-, 2-, and 3-gram cues reactive
– 0.2% of 4-gram cues exceed 50 msec.
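The storage figure can be put in perspective with one line of arithmetic. This assumes the 234-byte average holds across a full SemCor exposure of roughly 900K episodes, as stated on the setup slide. The result is an estimate, not a measured value.

```python
BYTES_PER_EPISODE = 234          # average from the WSD results
EPISODES_PER_EXPOSURE = 900_000  # approximate SemCor size (~900K episodes)

total_bytes = BYTES_PER_EPISODE * EPISODES_PER_EXPOSURE
print(total_bytes)                    # → 210600000
print(round(total_bytes / 2**20, 1))  # MiB → 200.8
```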


N-gram Retrieval Scaling

[Figure: Retrieval Time (msec) vs. Episodes (x1000), 0 to 4000, in two panels labeled "Feature Co-Occurrence" and "Temporal Selectivity". First panel (0 to 2.5 msec): cues {be, say} (69), {say, group} (6), {friday, say} (1), {say}. Second panel (0 to 25 msec): cues {well, be, say}, {friday, say, group}, {friday, say}, {say}.]


Planning Experimental Setup

• 12 automatically converted PDDL domains
  – Logistics, Blocksworld, Eight-puzzle, Grid, Gripper, Hanoi, Maze, Mine-Eater, Miconic, Mystery, Rockets, and Taxi
  – 44 distinct problem instances (e.g., # blocks)
• Agent: randomly explore state space
  – 50K episodes, measure every 1K
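The random-exploration protocol can be sketched on one of the listed domains. This is a hand-written Eight-puzzle model, not the automatic PDDL conversion. The walk runs for 50K episodes and checkpoints every 1K. In the experiment each step would become an EpMem episode and each checkpoint would measure storage and cue matching. Here the number of distinct states seen is a hypothetical stand-in metric.

```python
import random
from typing import List, Tuple

State = Tuple[int, ...]  # 3x3 board in row-major order, 0 = blank

def neighbors(state: State) -> List[State]:
    """States reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            swap = r * 3 + c
            board = list(state)
            board[blank], board[swap] = board[swap], board[blank]
            result.append(tuple(board))
    return result

def random_exploration(start: State, episodes: int = 50_000,
                       every: int = 1_000, seed: int = 0) -> List[Tuple[int, int]]:
    """Random walk; every `every` episodes record (episode, distinct states seen)."""
    rng = random.Random(seed)
    state, seen, checkpoints = start, {start}, []
    for episode in range(1, episodes + 1):
        state = rng.choice(neighbors(state))
        seen.add(state)
        if episode % every == 0:
            checkpoints.append((episode, len(seen)))
    return checkpoints
```

A seeded `random.Random` keeps runs reproducible across problem instances.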


Planning Results

Storage
– Reactive: < 12.04 msec./episode
– Memory: 562 to 5454 bytes/episode

Cue Matching (reactive: < 50 msec.)
1. Full State: only smallest state + space size (12)
2. Relational: none
3. Schema: all (max = 0.08 msec.)


Video Games & Mobile Robotics Experimental Setup

• Hand-coded cues (per domain)

Domain          Agent                                Duration          Evaluated every
TankSoar        mapping-bot                          3.5M episodes     50K episodes
Eaters          advanced-move                        3.5M episodes     50K episodes
Infinite Mario  [Mohan & Laird ’11]                  3.5M episodes     50K episodes
Rooms World     [Laird, Derbinsky & Voigt ’11]       12 hours          300K episodes


Data: Eaters

[Figure: Retrieval Time (msec), 0 to 50, vs. Episodes (x1 Million), 0 to 3.5; cue series 1 to 7]

813 bytes/episode


Data: Infinite Mario

[Figure: Retrieval Time (msec), 0 to 50, vs. Episodes (x1 Million), 0 to 3; cue series 1 to 11; annotation: Structural Selectivity]

2646 bytes/episode


Data: TankSoar

[Figure: Retrieval Time (msec), 0 to 50, vs. Episodes (x1 Million), 0 to 3; cue series 1 to 11; annotation: Co-Occurrence]

1035 bytes/episode


Data: Mobile Robotics

[Figure: Retrieval Time (msec), 0 to 50, vs. Episodes (x1 Million), 0 to 110; cue series 1 to 6; annotation: Temporal Selectivity]

113 bytes/episode
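The per-episode averages on the video-game slides translate into total store sizes for the 3.5M-episode runs in the setup table. This sketch assumes each run reaches the full 3.5M episodes and that the reported average holds throughout. Rooms World is left out because its duration is given in hours, not episodes. `store_size_bytes` is a hypothetical helper.

```python
EPISODES_PER_RUN = 3_500_000  # 3.5M-episode runs from the setup table

BYTES_PER_EPISODE = {  # averages reported on the data slides
    "Eaters": 813,
    "Infinite Mario": 2646,
    "TankSoar": 1035,
}

def store_size_bytes(per_episode: int, episodes: int = EPISODES_PER_RUN) -> int:
    """Total episodic store size if every episode costs the reported average."""
    return per_episode * episodes

for domain, per_episode in BYTES_PER_EPISODE.items():
    print(f"{domain}: {store_size_bytes(per_episode) / 1e9:.1f} GB")
```

Because storage grows linearly with episode count, these totals keep rising with run length.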


Summary of Results

Generality
– Demonstrated 7 cognitive capabilities
  • Virtual sensing, action modeling, long-term goal management, …

Reactivity
– < 50 msec. storage time for all tasks (except those with temporal discontiguity)
– < 50 msec. cue matching for many cues

Scalability
– No growth in cue-matching time for many cues (days!)
  • Validated predictive performance models
– 0.18 to 4 kb/episode (days to months)


Evaluation

Nuggets
• Unprecedented evaluation of general episodic memory
  – Breadth, temporal extent, analysis
• Characterization of EpMem performance via task-independent properties
• Soar’s EpMem (v9.3.2) is effective and efficient for many tasks and cues!
• Domains and cues available

Coal
• Still easy to construct a domain/cue that makes Soar unreactive
• Unbounded memory consumption (given enough time)


For more details, see the paper in the proceedings of AAAI 2012.