Developmental Mechanisms for Life-Long Autonomous Learning in Robots Pierre-Yves Oudeyer Project-Team INRIA-ENSTA-ParisTech FLOWERS http://www.pyoudeyer.com http://flowers.inria.fr

Dec 25, 2015


Page 2

Sensorimotor and social learning:

• Autonomous

• Open, “life-long learning”

• Real world, physical and social

Experimental validation

Developmental robotics

• Intrinsic Motivation

• Maturation

• Imitation, Social guidance

Fundamental understanding of the mechanisms of development

Application to assistive robotics

Page 3

“Engineered” robot learning

• The engineer demonstrates, with a fixed interaction protocol in the lab:

• Target: a mapping from state/context to action

Regression algorithms (e.g. LGP, LWPR, Gaussian Mixture Regression)

OR

• The engineer provides a reward/fitness function:

• Target: an action policy

Optimization algorithms (e.g. NAC, non-linear Nelder-Mead, …)
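The first option, learning a context-to-action mapping from demonstrations, can be illustrated with the simplest locally weighted regressor: a Gaussian-kernel weighted average of the demonstrated actions (Nadaraya-Watson regression). This is a minimal sketch, not LWPR or Gaussian Mixture Regression. The 1-D context, the `bandwidth` value and the demonstrator's policy are all hypothetical.

```python
import math


def kernel_regression(demos, query, bandwidth=0.5):
    """Predict the action for context `query` as a Gaussian-weighted average
    of demonstrated actions. `demos` is a list of (context, action) float pairs."""
    weights = [math.exp(-((c - query) ** 2) / (2 * bandwidth ** 2)) for c, _ in demos]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every demonstration for this bandwidth")
    return sum(w * a for w, (_, a) in zip(weights, demos)) / total


# Hypothetical demonstrations from an engineer whose policy is action = 2 * context.
demos = [(x / 10, 2 * x / 10) for x in range(11)]
print(round(kernel_regression(demos, 0.5, bandwidth=0.1), 3))  # 1.0
```

Near the edges of the demonstrated range, a plain weighted average is biased toward interior actions. This is one reason locally linear methods such as LWPR are used in practice.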

“Real” world: developmental approach

Axis 1: behaviour of a human (non-engineer)?

Axis 2: which generic reward function for spontaneous, curiosity-driven learning?

Page 4

Learning from interactions with non-engineers

Non-engineer human behaviour?

1. Intuitive multimodal interfaces

• Synthesis and recognition of emotion in speech (IJHCS, 2001; 5 patents)

• Clicker-training (RAS, 2002; 1 patent)

• Physical human-robot interfaces (Humanoids 2011)

• User studies (Humanoids 2009, HRI 2011)

• Adaptation: learning flexible teaching interfaces (Conn. Sci., 2006, ICDL 2011, IROS 2010)

Page 5

Spontaneous active exploration, artificial curiosity

Which generic reward function?

No!

Intrinsic motivation: Berlyne (1960), Csikszentmihalyi (1996), Dayan and Balleine (2002)

Intrinsic reward: learning progress, i.e. the decrease of prediction error e over time (-de/dt), in the vicinity of the current sensorimotor situation

• Non-stationary function, difficult to model

• Algorithms for empirical evaluation of de/dt with statistical regression: IAC (2004, 2007), R-IAC (2009), SAGG-RIAC (2010), McSAGG-RIAC (2011), SGIM (2011)
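Learning progress can be estimated empirically as -de/dt, the rate at which prediction error decreases. The sketch below assumes a single stream of scalar errors and takes the negative least-squares slope of the most recent errors. The class and parameter names are hypothetical. The published algorithms use their own estimators; IAC, for instance, compares mean errors over two time-shifted windows.

```python
from collections import deque


def slope(values):
    """Least-squares slope of `values` plotted against their index 0, 1, ..., n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


class LearningProgressEstimator:
    """Keeps a sliding window of prediction errors and reports progress as -de/dt."""

    def __init__(self, window=20):
        self.errors = deque(maxlen=window)  # oldest errors drop out automatically

    def add_error(self, error):
        self.errors.append(error)

    def progress(self):
        if len(self.errors) < 2:
            return 0.0  # a trend needs at least two points
        # A negative error slope (errors shrinking) means positive learning progress.
        return -slope(list(self.errors))


lp = LearningProgressEstimator(window=5)
for e in [1.0, 0.8, 0.6, 0.4, 0.2]:
    lp.add_error(e)
print(round(lp.progress(), 3))  # 0.2
```

A region whose errors stay constant yields a progress close to zero, whether the errors are near zero (already mastered) or high (unlearnable noise). This is what distinguishes the reward from raw error or novelty.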

Page 6

Exploring and learning generalized forward and inverse models
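A forward model predicts the effect of an action in a given context. An inverse model proposes an action expected to produce a desired effect. The following is a minimal memory-based sketch. It assumes a hypothetical 1-D environment `system` and uses plain nearest-neighbour lookup in place of the regression models (e.g. Gaussian processes) a real system would use.

```python
import math
import random


class NearestNeighbourModels:
    """Memory-based forward and inverse models over (context, action, effect) samples."""

    def __init__(self):
        self.samples = []  # list of (context, action, effect) tuples of floats

    def add(self, context, action, effect):
        self.samples.append((context, action, effect))

    def forward(self, context, action):
        """Predict the effect of `action` in `context` from the closest stored (context, action)."""
        _, _, effect = min(self.samples,
                           key=lambda s: math.hypot(s[0] - context, s[1] - action))
        return effect

    def inverse(self, context, desired_effect):
        """Return the stored action whose (context, effect) is closest to the request."""
        _, action, _ = min(self.samples,
                           key=lambda s: math.hypot(s[0] - context, s[2] - desired_effect))
        return action


def system(context, action):
    """Hypothetical 1-D environment: the effect depends on context and action."""
    return context + math.sin(action)


rng = random.Random(0)
models = NearestNeighbourModels()
for _ in range(2000):  # random motor babbling to fill the memory
    c, a = rng.uniform(-1, 1), rng.uniform(-math.pi, math.pi)
    models.add(c, a, system(c, a))

action = models.inverse(0.2, 0.7)
print(system(0.2, action))  # should be close to the desired effect 0.7
```

Because sin(a) takes the same value for several actions, many actions produce the same effect, and the inverse model above just returns one of them. This redundancy is one reason inverse models can be harder to learn than forward models.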


Page 7

Active learning of models

[Figure: regions of the sensorimotor space labelled “simple” and “complex”]

Which experiment?

Classical active learning explores zones where:
• uncertainty/errors are maximal
• the space is least explored

It assumes:
• spatial or temporal stationarity
• everything is learnable within a lifetime

Developmental approach: explore zones where the empirically measured learning progress is maximal
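The contrast between the two strategies can be shown on three hypothetical regions with made-up error histories:

* one region already mastered;
* one region being learned;
* one region that is unlearnable (pure noise).

Choosing by maximal error keeps sampling the unlearnable region. Choosing by learning progress picks the region where learning actually happens; here, progress is simplified to the drop in mean error between two consecutive windows. The hypothetical `epsilon` parameter adds the random exploration that such algorithms typically mix in.

```python
import random


def learning_progress(errors, window=10):
    """Mean error over the older window minus mean error over the most recent window."""
    if len(errors) < 2 * window:
        return 0.0
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window


def choose_region(histories, criterion, epsilon=0.0, rng=random):
    """Index of the region maximizing `criterion`; random region with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(histories))
    return max(range(len(histories)), key=lambda i: criterion(histories[i]))


# Hypothetical error histories (20 time steps) for three regions.
rng = random.Random(1)
mastered = [0.05] * 20                                    # simple, already learned
learnable = [1.0 - 0.04 * t for t in range(20)]           # error decreasing steadily
unlearnable = [rng.uniform(0.8, 1.0) for _ in range(20)]  # noise: high error, no progress
histories = [mastered, learnable, unlearnable]

max_error = choose_region(histories, lambda h: h[-1])       # "errors maximal" strategy
max_progress = choose_region(histories, learning_progress)  # developmental strategy
print(max_error, max_progress)  # 2 1
```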

Page 8

• Inputs: sensory state, action state and context state

• Classic machine learner M (e.g. neural net, SVM, Gaussian process): predicts the sensory state at t+1; the prediction error is fed back for learning

• Meta machine learner metaM: progressive categorization of the sensorimotor space, with a local model of learning progress in each region

• Action selection system: driven by local learning progress used as an intrinsic reward

IAC, IEEE Trans. EC (2007); R-IAC, IEEE Trans. AMD (2009)
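Progressive categorization can be sketched as a tree of regions. A region splits whenever it stores more than a threshold number of samples, so that learning progress can then be tracked separately in each leaf. This sketch assumes that points live in a box and places the cut at the mid-range of the most spread-out dimension. That rule is a simplification: IAC instead chooses the cut that minimizes the variance of the stored outputs, and R-IAC uses a criterion based on differences in learning progress between the two sub-regions.

```python
import random


class Region:
    """Box of the sensorimotor space that stores (point, error) samples and splits
    into two sub-regions once it holds more than `max_samples`."""

    def __init__(self, lower, upper, max_samples=10):
        self.lower, self.upper = list(lower), list(upper)
        self.max_samples = max_samples
        self.samples = []     # (point, prediction error) pairs, in arrival order
        self.children = None  # [left, right] after a split
        self.split_dim = self.split_value = None

    def add(self, point, error):
        if self.children is not None:
            self._child_for(point).add(point, error)
            return
        self.samples.append((point, error))
        if len(self.samples) > self.max_samples:
            self._split()

    def _child_for(self, point):
        return self.children[0] if point[self.split_dim] < self.split_value else self.children[1]

    def _split(self):
        # Simplified rule: cut the dimension where stored points are most spread,
        # at the middle of their range.
        dims = range(len(self.lower))
        lows = [min(p[d] for p, _ in self.samples) for d in dims]
        highs = [max(p[d] for p, _ in self.samples) for d in dims]
        dim = max(dims, key=lambda d: highs[d] - lows[d])
        cut = (lows[dim] + highs[dim]) / 2
        n_left = sum(1 for p, _ in self.samples if p[dim] < cut)
        if n_left == 0 or n_left == len(self.samples):
            return  # degenerate cut (e.g. duplicate points): keep collecting
        left_upper, right_lower = list(self.upper), list(self.lower)
        left_upper[dim] = right_lower[dim] = cut
        self.split_dim, self.split_value = dim, cut
        self.children = [Region(self.lower, left_upper, self.max_samples),
                         Region(right_lower, self.upper, self.max_samples)]
        samples, self.samples = self.samples, []
        for p, e in samples:  # redistribute, preserving arrival order
            self._child_for(p).add(p, e)

    def leaves(self):
        if self.children is None:
            return [self]
        return self.children[0].leaves() + self.children[1].leaves()


rng = random.Random(0)
root = Region([0.0, 0.0], [1.0, 1.0], max_samples=10)
for _ in range(200):
    point = [rng.random(), rng.random()]
    root.add(point, error=rng.random())  # hypothetical prediction error
print(len(root.leaves()), "leaf regions")
```

Each leaf keeps its samples in arrival order. A per-region learning-progress estimate such as -de/dt can therefore be computed from the `error` values stored in that leaf.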

Page 9

The Playground Experiments

(IEEE Trans. EC 2007; Connection Science 2006; AAAI Work. Dev. Learn. 2005)

Page 10

Experiments on Open Learning in the Real World

Playground Experiments

• Autonomous learning of novel affordances and skills, e.g. object manipulation

IEEE TEC, 2007; IROS 2010; IEEE TAMD, 2009; Front. Neurorobotics, 2007; Connect. Sc., 2006; IEEE ICDL 2010, 2011


• Self-organization of developmental trajectories, bootstrapping of communication

• New hypotheses for understanding infant development

Front. Neuroscience 2007, Infant and Child Dev. 2008, Connect. Science 2006

Page 11

Active learning of inverse models: SAGG-RIAC (RAS, 2012)

(Context, Movement) → Effect

Redundancy of sensorimotor spaces

From the active choice of an action, followed by observation of its effect …

… to the active choice of an effect, followed by the search for a corresponding action policy through goal-directed optimization (e.g. using NAC, PoWER, PI²-CMA, …)

Self-defined RL problem

Spontaneous active exploration of a space of fitness functions parameterized by goals (desired effects), where one iteratively chooses the goal that maximizes the empirical evaluation of competence progress.
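A toy sketch of this goal-babbling loop rests on heavy assumptions:

* a hypothetical 2-joint system with a 1-D effect, so the sensorimotor space is redundant;
* fixed goal regions instead of SAGG-RIAC's adaptive region splitting;
* stochastic hill climbing standing in for NAC, PoWER or PI²-CMA;
* competence defined as minus the distance to the goal;
* interest defined as the absolute change in mean competence between two recent windows.

It illustrates the structure of the loop, not the published algorithm.

```python
import math
import random


def effect(action):
    """Hypothetical redundant system: two joint angles map to one scalar effect."""
    return math.sin(action[0]) + math.sin(action[1])


def reach(goal, memory, rng, steps=30, sigma=0.1):
    """Maximize the self-defined fitness f_goal(a) = -|effect(a) - goal| by stochastic
    hill climbing, starting from the remembered action whose effect is closest to the goal."""
    best = min(memory, key=lambda m: abs(m[1] - goal))[0] if memory else [0.0, 0.0]
    best_fit = -abs(effect(best) - goal)
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, sigma) for x in best]
        fit = -abs(effect(candidate) - goal)
        memory.append((candidate, effect(candidate)))  # every trial feeds the inverse model
        if fit > best_fit:
            best, best_fit = candidate, fit
    return best_fit  # competence: 0 is perfect, more negative is worse


def interest(competences, window=5):
    """Absolute competence progress between the two most recent windows of goals."""
    if len(competences) < 2 * window:
        return math.inf  # assumption: barely tried regions are sampled first
    older = sum(competences[-2 * window:-window]) / window
    recent = sum(competences[-window:]) / window
    return abs(recent - older)


def goal_babbling(iterations=300, explore=0.2, seed=0):
    rng = random.Random(seed)
    # Hypothetical fixed goal regions covering the reachable task space [-2, 2].
    regions = [(-2.0, -1.0), (-1.0, 0.0), (0.0, 1.0), (1.0, 2.0)]
    history = {r: [] for r in range(len(regions))}
    memory = []  # (action, observed effect) pairs
    for _ in range(iterations):
        if rng.random() < explore:
            r = rng.randrange(len(regions))  # occasional random goal region
        else:
            r = max(history, key=lambda k: interest(history[k]))
        goal = rng.uniform(*regions[r])
        history[r].append(reach(goal, memory, rng))
    return history, memory


history, memory = goal_babbling()
print({r: len(c) for r, c in history.items()})  # goals chosen per region
```

In this sketch every trial is stored in `memory`, so reaching one goal also improves the starting points for nearby goals. With only five goals per window, the interest estimate is noisy; a fuller implementation would smooth it.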

Page 12

Learning of omnidirectional locomotion

Higher performance than more classical active learning algorithms in real sensorimotor spaces (non-stationary, non-homogeneous) (IEEE TAMD 2009; ICDL 2010, 2011; IROS 2010; RAS 2012)

Experimental evaluation of active learning efficiency

[Figure: control space and task space]

Page 13

Page 14

Maturational constraints

• Progressive growth of the number of DOFs and of spatio-temporal resolution

• Adaptive maturational schedule controlled by active learning/learning progress

(Bjorklund, 1997; Turkewitz and Kenny, 1985)
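One way to read "adaptive maturational schedule controlled by learning progress" is a clock that unlocks one more degree of freedom once progress on the currently unlocked ones has plateaued. The rule, `threshold` and `patience` below are hypothetical. They model only the number of DOFs, not spatio-temporal resolution. McSAGG-RIAC (ICDL-Epirob 2011a) defines its own schedule, which may differ.

```python
class MaturationalClock:
    """Hypothetical maturational schedule: unlock the next degree of freedom (DOF)
    once learning progress on the unlocked DOFs has stayed low for a while."""

    def __init__(self, n_dofs, threshold=0.01, patience=3):
        self.n_dofs = n_dofs
        self.unlocked = 1           # start with a single DOF
        self.threshold = threshold  # progress below this counts as a plateau
        self.patience = patience    # consecutive low-progress updates before maturing
        self._low_count = 0

    def update(self, learning_progress):
        """Feed the latest learning-progress estimate; return the number of unlocked DOFs."""
        if self.unlocked == self.n_dofs:
            return self.unlocked
        if learning_progress < self.threshold:
            self._low_count += 1
        else:
            self._low_count = 0
        if self._low_count >= self.patience:
            self.unlocked += 1
            self._low_count = 0
        return self.unlocked

    def mask(self, action):
        """Freeze locked DOFs at 0.0 so exploration only moves the unlocked ones."""
        return [a if i < self.unlocked else 0.0 for i, a in enumerate(action)]


clock = MaturationalClock(n_dofs=3)
progress = [0.2, 0.1, 0.005, 0.004, 0.003, 0.3, 0.002, 0.001, 0.0]
stages = [clock.update(lp) for lp in progress]
print(stages)  # [1, 1, 1, 1, 2, 2, 2, 2, 3]
```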

Page 15

McSAGG-RIAC: maturationally constrained curiosity-driven learning

(IEEE ICDL-Epirob 2011a)

Page 16

SGIM: Socially Guided Intrinsic Motivation

(ICDL-Epirob, 2011b)

Page 17

“Life-long” experimentation

Acroban (SIGGRAPH 2010, IROS 2011, World Expo, South Korea, 2012)

• Experimentation with algorithms for “life-long” learning in the real world

Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap

Page 18

Ergo-Robots (exhibition “Mathematics, a beautiful elsewhere”, Fondation Cartier, 2011-2012)

• Experimentation with algorithms for “life-long” learning in the real world

Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap

Ergo-Robots

Mid-term: open-source distribution of the platform to the scientific community

“Life-long” experimentation

Page 19

Baranes, A., Oudeyer, P-Y. (2012) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems. http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf

Baranes, A., Oudeyer, P-Y. (2011a) The Interaction of Maturational Constraints and Intrinsic Motivation in Active Motor Development, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/BaranesOudeyerICDL11.pdf

Lopes, M., Melo, F., Montesano, L. (2009) Active Learning for Reward Estimation in Inverse Reinforcement Learning, European Conference on Machine Learning (ECML/PKDD), Bled, Slovenia. http://flowers.inria.fr/mlopes/myrefs/09-ecml-airl.pdf

Nguyen, M., Baranes, A., Oudeyer, P-Y. (2011b) Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/NguyenBaranesOudeyerICDL11.pdf

Oudeyer, P-Y., Kaplan, F., Hafner, V. (2007) Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2), pp. 265-286. http://www.pyoudeyer.com/ims.pdf

Baranes, A., Oudeyer, P-Y. (2009) R-IAC: Robust Intrinsically Motivated Exploration and Active Learning, IEEE Transactions on Autonomous Mental Development, 1(3), pp. 155-169.

Ly, O., Lapeyre, M., Oudeyer, P-Y. (2011) Bio-inspired vertebral column, compliance and semi-passive dynamics in a lightweight robot, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, US.

Lopes, M., Lang, T., Toussaint, M., Oudeyer, P-Y. (2012) Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, Neural Information Processing Systems (NIPS 2012), Tahoe, USA. http://flowers.inria.fr/mlopes/myrefs/12-nips-zeta.pdf

Lopes, M., Oudeyer, P-Y. (2012) The Strategic Student Approach for Life-Long Exploration and Learning, in Proceedings of IEEE ICDL-Epirob 2012. http://flowers.inria.fr/mlopes/myrefs/12-ssp.pdf