Learning Reward Machines for Partially Observable Reinforcement Learning Rodrigo Toro Icarte Ethan Waldie Toryn Q. Klassen Richard Valenzano Margarita P. Castro Sheila A. McIlraith KR 2020 September 16

Mar 14, 2021


Learning Reward Machines for Partially Observable Reinforcement Learning

Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Valenzano, Margarita P. Castro, Sheila A. McIlraith

KR 2020, September 16

Hi, I’m Rodrigo :)

“The ultimate goal of AI is to create computer programs that can solve problems in the world as well as humans.”

— John McCarthy

Our research incorporates insights from knowledge, reasoning, and learning, in service of building general-purpose agents.

Hi, I’m an AI researcher

Reinforcement Learning (RL)

[Diagram: the RL agent (policy) sends an action to the environment (transition probabilities, reward function), which returns an observation and a reward.]

This learning process captures some aspects of human intelligence.
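
The interaction loop in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state `ToyEnv`, its 0.8 success probability, its rewards, and the `run_episode` helper are all hypothetical and do not come from the talk.

```python
import random


class ToyEnv:
    """Hypothetical 2-state environment, for illustration only."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Transition probabilities: action 1 moves state 0 -> 1 with probability 0.8.
        if self.state == 0 and action == 1 and self.rng.random() < 0.8:
            self.state = 1
        # Reward function: 1 once state 1 is reached, 0 otherwise.
        reward = 1.0 if self.state == 1 else 0.0
        done = self.state == 1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: the policy picks an action, the environment answers."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(obs)                  # agent -> environment
        obs, reward, done = env.step(action)  # environment -> agent
        total += reward
        if done:
            break
    return total
```

A learning agent would adjust `policy` from the observed rewards; here the policy is fixed so that only the loop itself is shown.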

How to enhance RL with KR

Reinforcement Learning (RL)

Long-standing RL problems that we tackled using KR:

Reward specification.

Sample efficiency.

Memory.

...

Reward specification

Make a bridge: get wood, iron, and use the factory

LTL specifications¹: ◇(got_wood ∧ ◇used_factory) ∧ ◇(got_iron ∧ ◇used_factory)

Reward machines²: automata-based reward functions

1 Teaching Multiple Tasks to an RL Agent using LTL (AAMAS-18).

2 Using Reward Machines for High-Level Task Specification and Decomposition in RL (ICML-18).
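
To make the LTL specification for the bridge task concrete, here is a minimal sketch that checks the formula against a finite trace. The finite-trace reading of "eventually" (true at step i if the subformula holds at some step j ≥ i), the combinator names, and the proposition names `got_wood`, `got_iron`, and `used_factory` are assumptions of this sketch, not the encoding used in the cited papers.

```python
def prop(name):
    """Atomic proposition: true at step i if `name` is in the set observed at step i."""
    return lambda trace, i: name in trace[i]


def conj(f, g):
    """Conjunction of two formulas at the same step."""
    return lambda trace, i: f(trace, i) and g(trace, i)


def eventually(f):
    """'Eventually f': f holds at the current step or at some later step of the trace."""
    return lambda trace, i: any(f(trace, j) for j in range(i, len(trace)))


# eventually(got_wood and eventually(used_factory))
#   and eventually(got_iron and eventually(used_factory))
bridge = conj(
    eventually(conj(prop("got_wood"), eventually(prop("used_factory")))),
    eventually(conj(prop("got_iron"), eventually(prop("used_factory")))),
)


def satisfied(formula, trace):
    """Evaluate the formula at the first step of a non-empty trace."""
    return len(trace) > 0 and formula(trace, 0)
```

For example, `satisfied(bridge, [{"got_wood"}, {"got_iron"}, {"used_factory"}])` holds, while a trace that uses the factory before collecting either material does not satisfy the formula.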

Reward machine

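[Diagram: reward machine for the task below, with states u0 (start), u1, u2, and u3. Edge labels, written 〈condition, reward〉: 〈w, 0〉, 〈¬w ∧ ¬i, 0〉, 〈i, 0〉, 〈¬i, 0〉, 〈f, 1〉, 〈¬f, 0〉, 〈¬w ∧ i, 0〉, 〈w, 0〉, 〈¬w, 0〉.]

Make a bridge: get wood, iron, and use the factory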
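
A reward machine like this one can be written down as a small automaton whose edges carry a condition and a reward. The sketch below is an assumption-heavy reconstruction: the transcript only preserves the state names and edge labels, so the wiring between states, the reading of `w`, `i`, `f` as "got wood", "got iron", "used factory", and the extra terminal state `done` are guesses made for illustration.

```python
class RewardMachine:
    """Finite-state machine whose transitions emit rewards."""

    def __init__(self, initial, transitions, terminal=()):
        self.initial = initial
        # state -> list of (guard, next_state, reward); guards take the set of true propositions
        self.transitions = transitions
        self.terminal = set(terminal)

    def step(self, u, true_props):
        """Return (next_state, reward) for state u given the propositions that currently hold."""
        if u in self.terminal:
            return u, 0
        for guard, nxt, reward in self.transitions[u]:
            if guard(true_props):
                return nxt, reward
        raise ValueError(f"no transition from {u} for {true_props}")


# Assumed wiring of the "make a bridge" machine (not confirmed by the transcript).
bridge = RewardMachine(
    initial="u0",
    transitions={
        "u0": [(lambda p: "w" in p, "u1", 0),
               (lambda p: "w" not in p and "i" in p, "u2", 0),
               (lambda p: "w" not in p and "i" not in p, "u0", 0)],
        "u1": [(lambda p: "i" in p, "u3", 0),
               (lambda p: "i" not in p, "u1", 0)],
        "u2": [(lambda p: "w" in p, "u3", 0),
               (lambda p: "w" not in p, "u2", 0)],
        "u3": [(lambda p: "f" in p, "done", 1),
               (lambda p: "f" not in p, "u3", 0)],
    },
    terminal={"done"},
)


def run(rm, events):
    """Feed a sequence of proposition sets to the machine; return final state and total reward."""
    u, total = rm.initial, 0
    for true_props in events:
        u, r = rm.step(u, true_props)
        total += r
    return u, total
```

Under this wiring, using the factory too early earns nothing: the machine only pays the reward of 1 on `f` once both wood and iron have been collected.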

Formal languages³: many formal languages → reward machines.

3 LTL and Beyond: Formal Languages for Reward Function Specification in RL (IJCAI-19).

Sample efficiency

Reward machine

[Diagram: the “make a bridge” reward machine, with states u0 (start), u1, u2, and u3.]

How to exploit the reward machine’s structure:

CRM: Counterfactual reasoning.

HRM: Task decomposition.

RS: Reward shaping.
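
As a rough illustration of the counterfactual idea, one environment step can be replayed from every reward machine state, so a single experience updates many entries of a Q-table. The sketch below assumes tabular Q-learning over (environment state, machine state, action) triples; the function names, the tiny two-step machine in `rm_step`, and the hyperparameters are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict


def rm_step(u, true_props):
    """Hypothetical machine: u0 --w--> u1 --f--> done (reward 1); otherwise stay, reward 0."""
    if u == "u0" and "w" in true_props:
        return "u1", 0.0
    if u == "u1" and "f" in true_props:
        return "done", 1.0
    return u, 0.0


def crm_update(q, rm_states, rm_step, terminal, s, a, s_next, true_props,
               actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update per machine state for a single environment step."""
    for u in rm_states:
        if u in terminal:
            continue
        # Counterfactual question: what would the machine have done had it been in state u?
        u_next, reward = rm_step(u, true_props)
        if u_next in terminal:
            target = reward
        else:
            target = reward + gamma * max(q[(s_next, u_next, b)] for b in actions)
        q[(s, u, a)] += alpha * (target - q[(s, u, a)])


q = defaultdict(float)
crm_update(q, ["u0", "u1", "done"], rm_step, {"done"},
           s=0, a="right", s_next=1, true_props={"f"},
           actions=["left", "right"])
```

After this single step, the entry for machine state `u1` has moved toward the reward of 1 even if the agent was actually in `u0`, which is where the sample-efficiency gain would come from.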

Sample efficiency

[Plot: Craft World. Average reward per step (0 to 1) vs. training steps (in thousands; 500 to 1,500). Legend: QL, QL+RS, HRM, CRM, CRM+RS.]

Sample efficiency

[Plot: Half-Cheetah. Average reward per step (0 to 8) vs. training steps (in thousands; 500 to 2,500). Legend: DDPG, DDPG+RS, HRM, CRM, CRM+RS.]

Memory

[Figure: cookie domain. Labels: Agent, Button, (Cookie).]

(+1 Reward)

The most popular approach:

Training LSTM policies using a policy gradient method.

... starves in the cookie domain.

[Plot: reward (0 to 200) vs. training steps (1 · 10^6 to 5 · 10^6). Legend: Optimal, ACER, A3C, PPO, DDQN.]

Reward Machines as memory

If the agent can detect the color of the rooms, and when it presses the button, eats a cookie, and sees a cookie, then:

[Diagram: reward machine with states B0, B1, B2, and B3. Four edges are labeled 〈o/w, 0〉; the remaining edges are labeled with detector events (icons not recoverable), five with reward 0 and two with reward 1.]

... becomes a “perfect” memory for the cookie domain.

Learning Reward Machines for Partially Observable Reinforcement Learning (NeurIPS-19).
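
One way to picture a reward machine acting as memory: the agent tracks the machine's current state from detector events and feeds the pair (observation, machine state) to its policy, so two identical observations can be told apart. The sketch below is hypothetical throughout. The three-state machine, the state names, and the event names `button`, `cookie_seen`, and `cookie_eaten` are invented for illustration and are not the B0 to B3 machine from the slide.

```python
class RMMemory:
    """Tracks a reward machine state and attaches it to each observation."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = transitions  # (state, event) -> (next_state, reward)
        self.state = initial

    def reset(self):
        self.state = self.initial

    def update(self, events):
        """Follow the first detected event with an outgoing edge; otherwise stay (reward 0)."""
        for event in sorted(events):
            if (self.state, event) in self.transitions:
                self.state, reward = self.transitions[(self.state, event)]
                return reward
        return 0.0

    def augment(self, observation):
        """Policy input: the raw observation plus the current machine state."""
        return (observation, self.state)


# Hypothetical machine: pressing the button makes a cookie appear somewhere.
transitions = {
    ("no_cookie", "button"): ("cookie_somewhere", 0.0),
    ("cookie_somewhere", "cookie_seen"): ("cookie_located", 0.0),
    ("cookie_somewhere", "cookie_eaten"): ("no_cookie", 1.0),
    ("cookie_located", "cookie_eaten"): ("no_cookie", 1.0),
}
memory = RMMemory("no_cookie", transitions)
```

With this wrapper, the same hallway observation maps to different policy inputs before and after the button press, which is the information a memoryless policy would lack.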

[Plots: reward (0 to 200) vs. training steps, in two panels. Cookie domain: 0 to 3 · 10^6 steps. Two keys domain: 0 to 4 · 10^6 steps. Legend: Optimal, LRM-V2, LRM-V1, ACER, A3C, PPO, DDQN.]

∗Note: The detectors were also given to the baselines.

Summary

If you are interested in KR ∩ RL, consider reading our papers:

Advice-Based Exploration in Model-Based Reinforcement Learning (Canadian AI-18)
Teaching Multiple Tasks to an RL Agent using LTL (AAMAS-18)
Using Reward Machines for High-Level Task Specification and Decomposition in RL (ICML-18)
LTL and Beyond: Formal Languages for Reward Function Specification in RL (IJCAI-19)
Learning Reward Machines for Partially Observable RL (NeurIPS-19)
Symbolic Plans as High-Level Instructions for Reinforcement Learning (ICAPS-20)

Code: https://bitbucket.org/RToroIcarte/

Thanks! :)
