
Aug 28, 2018


doanhuong

CS 188: Artificial Intelligence

Fall 2010

Advanced Applications:

Robotics / Vision / Language

Dan Klein – UC Berkeley

Many slides from Pieter Abbeel, John DeNero


Announcements

• Project 5: Classification up now!
  • Due date now after contest
  • Also: drop-the-lowest
• Contest: In progress!
  • New staff bot (w/ extra credit)
  • New achievements


So Far: Foundational Methods


Now: Advanced Applications


Web Search / IR

• Information retrieval:
  • Given information needs, produce information
  • Includes, e.g., web search, question answering, and classic IR
• Web search: not exactly classification, but rather ranking

x = “Apple Computers”

Feature-Based Ranking

x = “Apple Computers”


Perceptron for Ranking

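• Inputs: x
• Candidates: y
• Many feature vectors: f(x, y)
• One weight vector: w
• Prediction: y = argmax_y w · f(x, y)
• Update (if wrong): w = w + f(x, y*) − f(x, y), where y* is the correct candidate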
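A minimal sketch of the ranking perceptron, assuming the standard rule: predict the candidate whose feature vector scores highest under w, and on a mistake add the correct candidate's features and subtract the predicted candidate's. The feature values and the names `score`, `predict`, and `update` are hypothetical, chosen only for illustration.

```python
def score(w, f):
    """Dot product of the weight vector and one candidate's feature vector."""
    return sum(wi * fi for wi, fi in zip(w, f))

def predict(w, candidates):
    """Index of the highest-scoring candidate (ties go to the lowest index)."""
    scores = [score(w, f) for f in candidates]
    return scores.index(max(scores))

def update(w, candidates, correct):
    """One perceptron step: change w only if the prediction is wrong."""
    guess = predict(w, candidates)
    if guess == correct:
        return w
    return [wi + fc - fg
            for wi, fc, fg in zip(w, candidates[correct], candidates[guess])]

# Hypothetical feature vectors for three candidate results of one query.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.0, 0.0]
for _ in range(5):
    w = update(w, candidates, correct=1)
```

On this toy input a single update is enough: afterwards the correct candidate ranks first and further passes leave w unchanged.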

Inverse RL: Motivation

• How do we specify a task like this?

[demo: hover / autorotate]


Autonomous Helicopter Setup

On-board inertial measurement unit (IMU)

Send out controls to helicopter

Position

Helicopter MDP

• State:
• Actions (control inputs):
  • a_lon: Main rotor longitudinal cyclic pitch control (affects pitch rate)
  • a_lat: Main rotor latitudinal cyclic pitch control (affects roll rate)
  • a_coll: Main rotor collective pitch (affects main rotor thrust)
  • a_rud: Tail rotor collective pitch (affects tail rotor thrust)
• Transitions (dynamics):
  • s_{t+1} = f(s_t, a_t) + w_t
    [f encodes helicopter dynamics]
    [w is a probabilistic noise model]
• Can we solve the MDP yet?
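The transition model s_{t+1} = f(s_t, a_t) + w_t can be sketched as a rollout. This is only an illustration: `toy_dynamics` is a hypothetical two-number stand-in for the real helicopter dynamics f, and zero-mean Gaussian noise is an assumption about w_t (the slide only says the noise model is probabilistic).

```python
import random

def toy_dynamics(state, action, dt=0.1):
    """Hypothetical stand-in for f: state = [altitude, vertical speed],
    action = commanded vertical acceleration."""
    z, vz = state
    return [z + dt * vz, vz + dt * action]

def step(state, action, noise_std, rng):
    """Sample s_{t+1} = f(s_t, a_t) + w_t with assumed Gaussian noise w_t."""
    nxt = toy_dynamics(state, action)
    return [x + rng.gauss(0.0, noise_std) for x in nxt]

def rollout(s0, actions, noise_std=0.0, seed=0):
    """Apply a sequence of actions and return every visited state."""
    rng = random.Random(seed)
    states = [s0]
    for a in actions:
        states.append(step(states[-1], a, noise_std, rng))
    return states

clean = rollout([0.0, 0.0], [1.0, 1.0])                  # noise switched off
noisy = rollout([0.0, 0.0], [1.0, 1.0], noise_std=0.05)  # same controls, noisy
```

With the noise switched off the rollout is deterministic; with noise, the same controls give a different trajectory for each seed. Note that the sketch supplies dynamics only: nothing in it says which trajectories are desirable.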


Problem: What’s the Reward?

• Rewards for hovering:
• Rewards for “Tic-Toc”?
  • Problem: what’s the target trajectory?
  • Just write it down by hand?

[demo: hover / tic-toc]

[demo: bad]

Apprenticeship Learning

• Goal: learn reward function from expert demonstration
• Assume the reward is linear in features: R_w(s) = w · f(s)
• Get expert demonstrations
• Guess initial policy
• Repeat:
  • Find w which makes the expert better than the policies found so far
  • Solve MDP for new weights w to get a new policy
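A rough sketch of this loop on a toy problem, not the method used for the helicopter. Assumptions, all hypothetical: a 4-state chain MDP with deterministic left/right moves, one-hot state features, a reward linear in those features, and a simplified weight step that compares the expert only against the most recent policy (w = expert feature counts minus current policy feature counts).

```python
GAMMA, HORIZON, N = 0.9, 30, 4   # toy values chosen for illustration
ACTIONS = (-1, +1)               # move left / move right along the chain

def move(s, a):
    """Deterministic transition, clipped at both ends of the chain."""
    return min(max(s + a, 0), N - 1)

def features(s):
    """One-hot indicator of the current state."""
    return [1.0 if i == s else 0.0 for i in range(N)]

def feature_expectations(policy, s0=0):
    """Discounted feature counts along the rollout of a deterministic policy."""
    mu, s = [0.0] * N, s0
    for t in range(HORIZON):
        mu = [m + GAMMA ** t * f for m, f in zip(mu, features(s))]
        s = move(s, policy[s])
    return mu

def solve_mdp(w, sweeps=100):
    """Value iteration for reward R_w(s) = w . f(s); returns a greedy policy."""
    reward = [sum(wi * fi for wi, fi in zip(w, features(s))) for s in range(N)]
    v = [0.0] * N
    for _ in range(sweeps):
        v = [reward[s] + GAMMA * max(v[move(s, a)] for a in ACTIONS)
             for s in range(N)]
    return [max(ACTIONS, key=lambda a: v[move(s, a)]) for s in range(N)]

def apprenticeship(expert_policy, iterations=10, tol=1e-9):
    mu_expert = feature_expectations(expert_policy)
    policy, w = [-1] * N, [0.0] * N        # initial guess: always move left
    for _ in range(iterations):
        gap = [e - m for e, m in zip(mu_expert, feature_expectations(policy))]
        if sum(g * g for g in gap) < tol:  # learner now matches the expert
            break
        w = gap                            # weights under which the expert wins
        policy = solve_mdp(w)
    return policy, w

policy, w = apprenticeship([+1] * N)       # the expert always moves right
```

Here the learner recovers the expert's always-move-right policy after one weight step, and the learned weights reward the far end of the chain more than the start.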


Pacman Apprenticeship!

• Demonstrations are expert games
• Features defined over states s
• Score of a state given by: w · f(s)
• Learning goal: find weights which explain expert actions
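One way to make "explain expert actions" concrete is to check, for each recorded decision, whether the successor state the expert chose scores at least as high as every alternative under a linear score. The feature names and the recorded decisions below are hypothetical, invented only to illustrate the check.

```python
def state_score(w, feats):
    """Linear score of a state: sum of weight * feature value."""
    return sum(w[name] * value for name, value in feats.items())

def agreement(w, decisions):
    """Fraction of decisions where the expert's chosen successor state
    scores at least as high as every other available successor."""
    hits = 0
    for options, expert_choice in decisions:
        scores = {a: state_score(w, f) for a, f in options.items()}
        if scores[expert_choice] >= max(scores.values()):
            hits += 1
    return hits / len(decisions)

# Each decision: (features of the successor state per action, expert's action).
decisions = [
    ({"North": {"food_dist": 1.0, "ghost_near": 0.0},
      "South": {"food_dist": 3.0, "ghost_near": 0.0}}, "North"),
    ({"East": {"food_dist": 1.0, "ghost_near": 1.0},
      "West": {"food_dist": 2.0, "ghost_near": 0.0}}, "West"),
]
w_food_only = {"food_dist": -1.0, "ghost_near": 0.0}
w_with_ghosts = {"food_dist": -1.0, "ghost_near": -5.0}
```

On this toy data, weights that ignore ghosts explain only the first decision, while weights that also penalize nearby ghosts explain both.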

[demo: pac apprentice]

Helicopter Apprenticeship?


[demo: unaligned / aligned]


Probabilistic Alignment

• Intended trajectory satisfies dynamics.
• Expert trajectory is a noisy observation of one of the hidden states.
  • But we don’t know exactly which one.

[figure labels: Intended trajectory; Expert demonstrations; Time indices]

Alignment of Samples

• Result: inferred sequence is much cleaner!

[demo: alignment]
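The idea of matching each demonstration sample to an unknown time index can be illustrated with dynamic time warping (DTW). DTW is a simpler, deterministic stand-in named here for illustration; it is not the probabilistic alignment described on the slides. The 1-D trajectories are hypothetical.

```python
def dtw_align(reference, demo):
    """Return (cost, path). path lists (i, j) pairs matching reference[i]
    to demo[j]; both indices never decrease along the path."""
    n, m = len(reference), len(demo)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (reference[i - 1] - demo[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1],   # advance both
                                 cost[i - 1][j],       # advance reference only
                                 cost[i][j - 1])       # advance demo only
    # Backtrack from the end to recover which time indices were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return cost[n][m], path[::-1]

reference = [0, 1, 2, 3]        # hypothetical intended trajectory
demo = [0, 0, 1, 2, 2, 3]       # same shape, but the expert lingers twice
total_cost, path = dtw_align(reference, demo)
```

The recovered path maps the repeated demo samples back onto single reference time steps, which is the sense in which alignment removes timing differences between demonstrations.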


Final Behavior


[demo: airshow]

What is NLP?

• Fundamental goal: analyze and process human language, broadly, robustly, accurately…
• End systems that we want to build:
  • Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering…
  • Modest: spelling correction, text categorization…


Problem: Ambiguities

• Headlines:
  • Enraged Cow Injures Farmer With Ax
  • Hospitals Are Sued by 7 Foot Doctors
  • Ban on Nude Dancing on Governor’s Desk
  • Iraqi Head Seeks Arms
  • Local HS Dropouts Cut in Half
  • Juvenile Court to Try Shooting Defendant
  • Stolen Painting Found by Tree
  • Kids Make Nutritious Snacks
• Why are these funny?

Parsing as Search


Grammar: PCFGs

• Natural language grammars are very ambiguous!
• PCFGs are a formal probabilistic model of trees
  • Each “rule” has a conditional probability (like an HMM)
  • Tree’s probability is the product of the probabilities of all rules used
• Parsing: Given a sentence, find the best tree – search!

ROOT → S 375/420

S → NP VP . 320/392

NP → PRP 127/539

VP → VBD ADJP 32/401

…..
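A small sketch of "tree probability is the product of all rules used", using the four rule probabilities listed above. The tree stops at part-of-speech tags because the remaining rules are elided on the slide, so those leaves are treated as contributing probability 1; that cutoff and the tuple encoding of trees are assumptions of this example.

```python
from fractions import Fraction

# Rule probabilities exactly as listed on the slide.
RULES = {
    ("ROOT", ("S",)): Fraction(375, 420),
    ("S", ("NP", "VP", ".")): Fraction(320, 392),
    ("NP", ("PRP",)): Fraction(127, 539),
    ("VP", ("VBD", "ADJP")): Fraction(32, 401),
}

def tree_probability(tree, rules=RULES):
    """Multiply the probabilities of every rule used in the tree.
    A tree is (label, [children]); a node with no children is a leaf and
    contributes 1, because its expansion is not modeled in this sketch."""
    label, children = tree
    if not children:
        return Fraction(1)
    rhs = tuple(child[0] for child in children)
    p = rules[(label, rhs)]
    for child in children:
        p *= tree_probability(child, rules)
    return p

tree = ("ROOT", [("S", [("NP", [("PRP", [])]),
                        ("VP", [("VBD", []), ("ADJP", [])]),
                        (".", [])])])
p = tree_probability(tree)
```

Parsing then amounts to searching over all trees that yield the sentence for the one with the highest such product.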


Syntactic Analysis

Hurricane Emily howled toward Mexico 's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters .


[demo]


Machine Translation

• Translate text from one language to another
• Recombines fragments of example translations
• Challenges:
  • What fragments? [learning to translate]
  • How to make it efficient? [fast translation search]
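A toy sketch of recombining fragments: given a table of source phrases with scored target phrases, dynamic programming picks the best way to cover the sentence left to right. The phrase table, its scores, and the monotone (no reordering between phrases) restriction are all assumptions made for illustration; a real system would learn its fragments from example translations.

```python
def translate(words, phrase_table, max_len=3):
    """Monotone phrase-based decoding by dynamic programming.
    best[i] holds the highest-scoring (score, phrases) covering words[:i]."""
    best = [(0.0, [])] + [None] * len(words)
    for end in range(1, len(words) + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None:
                continue                      # prefix cannot be translated
            src = " ".join(words[start:end])
            for tgt, score in phrase_table.get(src, []):
                cand = (best[start][0] + score, best[start][1] + [tgt])
                if best[end] is None or cand[0] > best[end][0]:
                    best[end] = cand
    return None if best[-1] is None else " ".join(best[-1][1])

# Hypothetical phrase table: source phrase -> [(target phrase, score)].
PHRASES = {
    "la": [("the", -0.1)],
    "maison": [("house", -0.3)],
    "bleue": [("blue", -0.2)],
    "maison bleue": [("blue house", -0.2)],
}
output = translate("la maison bleue".split(), PHRASES)
```

The two-word fragment wins over translating word by word, which is how word order gets fixed inside a fragment even though the decoder itself never reorders. Reusing `best[start]` for every longer span is what keeps the search fast.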


Levels of Transfer

Machine Translation


[demo: MT]