RLPy: A Value-Function-Based Reinforcement Learning Framework for Education and Research

Alborz Geramifard*, Christoph Dann, Robert H. Klein, William Dabney*, Jonathan P. How


[Figure: linear value-function approximation. A simulator provides states whose features φ1, …, φn are weighted by learned parameters θ1, …, θn to form the value function and policy: V^π(s) ≈ θ^⊤ φ(s).]
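As a concrete illustration of the linear form in the figure, here is a minimal NumPy sketch; the one-hot feature map is only a stand-in for RLPy's richer representations (e.g., tile coding or radial basis functions).

import numpy as np

# Toy instance of the linear form above: V(s) ~= theta . phi(s).
n_features = 12
theta = np.zeros(n_features)          # learned weights theta_1 .. theta_n

def phi(s):
    """Feature vector phi(s) for a discrete state index s (one-hot here)."""
    features = np.zeros(n_features)
    features[s] = 1.0
    return features

def V(s):
    """Approximate state value: inner product of weights and features."""
    return theta @ phi(s)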

Interest in value-based RL with linear function approximation [1-4]

Increased specialization in RL ➙ need for a more granular framework

Comparison with state-of-the-art techniques in various domains [3-4]

An easy-to-use RL framework for both research and education

Easy Installation

Rapid Prototyping in Python

Grid World

Existing Gap

Lack of granularity to accommodate recent advances in RL [6]

Challenging for entry-level users due to the implementation language (e.g., C++) [6-7]

Not self-contained [5]

Examples

Conclusion

RLPy is an object-oriented reinforcement learning (RL) software package with a focus on value-function-based methods using linear function approximation and discrete actions. The framework was designed for both education and research purposes. It provides a rich library of fine-grained, easily exchangeable components for learning agents (e.g., policies or representations of value functions), facilitating the recent increased specialization in RL. RLPy is written in Python to allow fast prototyping, but is also suitable for large-scale experiments through its built-in support for optimized numerical libraries and parallelization. Code profiling, domain visualizations, and data analysis are integrated in a self-contained package available under the Modified BSD License. All these properties allow users to compare various RL algorithms with little effort.
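The exchangeable components described above are assembled explicitly in user code. Below is a condensed sketch following the Grid World example pattern documented at rlpy.readthedocs.org; the class names come from that documentation, and exact constructor arguments may differ between versions.

from rlpy.Domains import GridWorld
from rlpy.Agents import Q_Learning
from rlpy.Representations import Tabular
from rlpy.Policies import eGreedy
from rlpy.Experiments import Experiment

def make_experiment(exp_id=1, path="./Results/gridworld"):
    """Assemble exchangeable components into a reproducible experiment."""
    domain = GridWorld()                                  # the MDP / simulator
    representation = Tabular(domain)                      # value-function features (tabular here)
    policy = eGreedy(representation, epsilon=0.1)         # exploration policy
    agent = Q_Learning(policy=policy,                     # learning algorithm
                       representation=representation,
                       discount_factor=domain.discount_factor)
    return Experiment(exp_id=exp_id, path=path,
                      domain=domain, agent=agent,
                      max_steps=10000, num_policy_checks=10,
                      checks_per_policy=10)

if __name__ == "__main__":
    experiment = make_experiment()
    experiment.run()     # learn, periodically evaluating the greedy policy
    experiment.plot()    # built-in plotting of return, steps, features, ...
    experiment.save()    # write results to `path` for later analysis

Swapping in a different representation, policy, or learning algorithm only changes the corresponding line, which is the fine-grained exchangeability the abstract refers to.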

RLPy is a new Python-based, open-source RL framework.

Simplifies the construction of new RL ideas

Accessible for both novice and expert users

Enables reproducible experiments


Problem

Abstract

References

[1] A. Tamar, H. Xu, S. Mannor, "Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations", In Proceedings of the 31st International Conference on Machine Learning (ICML), pages 127-135, 2014.
[2] D. Calandriello, A. Lazaric, M. Restelli, "Sparse Multi-Task Reinforcement Learning", In Proceedings of Advances in Neural Information Processing Systems (NIPS), pages 819-827, Quebec, Canada, 2014.
[3] J. Z. Kolter and A. Y. Ng, "Regularization and Feature Selection in Least-Squares Temporal Difference Learning", In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 521-528, New York, NY, USA, 2009.
[4] A. Geramifard, T. Walsh, N. Roy, and J. How, "Batch iFDD: A Scalable Matching Pursuit Algorithm for Solving MDPs", In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2013.
[5] D. Aberdeen, "LibPGRL: A High Performance Reinforcement Learning Library in C++", 2007. URL https://code.google.com/p/libpgrl.
[6] F. de Comite, "PIQLE: A Platform for Implementation of Q-Learning Experiments", 2006. URL http://piqle.sourceforge.net.
[7] G. Neumann, "The Reinforcement Learning Toolbox, Reinforcement Learning for Optimal Control Tasks", MSc thesis, TU Graz, 2005.

* Author is currently employed at Amazon.

[Figure: RLPy architecture. An Experiment couples a Learning Agent with a Domain: the agent's Policy π and Representation (value functions Q, V) choose action a_t, and the domain returns the next state s_{t+1} and reward r_{t+1} to the RL agent.]
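In code, the data flow in this figure is roughly the loop below; the method names (s0, step, pi, learn) are illustrative and not RLPy's exact internal interfaces.

def run_episode(domain, agent, max_steps=1000):
    """Schematic agent/domain interaction loop; interface names are illustrative."""
    s = domain.s0()                              # initial state from the domain
    episode_return = 0.0
    for _ in range(max_steps):
        a = agent.policy.pi(s)                   # policy picks a_t from the representation's Q-values
        r, s_next, terminal = domain.step(a)     # domain returns r_{t+1}, s_{t+1}
        agent.learn(s, a, r, s_next, terminal)   # update the representation's weights
        episode_return += r
        s = s_next
        if terminal:
            break
    return episode_return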

> pip install -U RLPy

[Figure: Cart Pole domain visualization (simulator view).]

Plotting

[Figure: built-in plotting of experiment results, e.g. return, number of features, and termination plotted against learning steps, learning episodes, or learning time.]

[Figure: call graph produced by the built-in profiler for a Pendulum learning run (total time 84.49 s). Dominant costs include feature computation (Representation:phi ≈ 48.6%, tiles:phi_nonTerminal ≈ 48.3%, tiles:_hash ≈ 15.7%), action selection (Representation:bestActions ≈ 35.4%, eGreedy:pi ≈ 33.9%), and the Greedy_GQ learning update (≈ 46.1%).]
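RLPy ships its own profiling support (the call graph above). As a rough stand-alone equivalent, a run can also be profiled with only the Python standard library; this sketch uses cProfile/pstats, not RLPy's built-in profiling interface, and make_experiment refers to the hypothetical helper sketched earlier.

import cProfile
import pstats

# Profile a learning run with the standard library (generic sketch).
profiler = cProfile.Profile()
profiler.enable()
make_experiment(exp_id=1).run()                  # code under measurement
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)  # top 15 by cumulative time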

RLPy (rlpy.readthedocs.org)

Sponsored by:

FA9550-09-1-0522 N000141110688

Built-in Hyperparameter Optimization

Improved Experimentation

Reproducible

Parallel Execution (see the sketch after this list)

Optimized Implementation (Cython, C++)

Batteries Included!

20 Domains, 8 Learning Algorithms

4 Policies, 7 Representations

Built-in Profiling

Improved Granularity of Agent Components via Object-Oriented Python
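A minimal sketch of how the reproducibility and parallel-execution points combine: each run is identified by an integer exp_id (used as the seed in the experiment sketch earlier), so independent runs can be farmed out to worker processes and still be reproduced individually. multiprocessing here is a generic stand-in, not RLPy's own job-distribution tooling.

from multiprocessing import Pool

def run_one(exp_id):
    """One seeded, independent run; make_experiment is the hypothetical helper sketched above."""
    exp = make_experiment(exp_id=exp_id, path="./Results/gridworld")
    exp.run()
    exp.save()
    return exp_id

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        pool.map(run_one, range(1, 11))   # 10 reproducible runs executed in parallel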
