Developmental Mechanisms for Life-Long Autonomous Learning in Robots Pierre-Yves Oudeyer Project-Team INRIA-ENSTA-ParisTech FLOWERS .

Developmental Mechanisms for Life-Long Autonomous Learning

in RobotsPierre-Yves Oudeyer

Project-Team INRIA-ENSTA-ParisTech FLOWERS

http://www.pyoudeyer.comhttp://flowers.inria.fr

Sensorimotor and social learning:

•Autonomous

•Open, « life-long learning »

•Real world, physical and social Experimental validation

Developmental robotics

• Intrinsic Motivation

• Maturation

• Imitation, Social guidance

Fundamental understanding of the mechanisms of development

Application to assistive robotics

“Engineered” robot learning

• Engineer shows, with fixed interaction protocol in the lab:

• Target:

Regression algorithms (e.g. LGP, LWPR, Gaussian Mixture Regression)

ActionState/context

Actionpolicy

• Engineer provides a reward/fitness function:

• Target:

Optimization algorithms (e.g. NAC, non-linear Nelder-Mead, …)

OROR

« Real » world

Developmental approach

Which generic reward function for spontaneous curiosity

driven learning?

Axe 2

?

Behaviour of human (non-engineer)

?

Axe 1

Learning from interactions with non-engineers

Non-engineer human behaviour

?

1. Intuitive multimodal interfaces• Synthesis and recognition of

emotion in speech (IJHCS, 2001, 5 patents)

• Clicker-training (RAS, 2002; 1 patent)

• Physical human-robot interfaces (Humanoids 2011)

• User studies (Humanoids 2009, HRI 2011)

• Adaptation: learning flexible teaching interfaces (Conn. Sci., 2006, ICDL 2011, IROS 2010)

Spontaneous active exploration, artificial curiosity

in the vicinity of

Non-stationary function, difficult to model

Algorithms for empirical evaluation of de/dt with statistical regression

IAC (2004, 2007), R-IAC (2009), SAGG-RIAC (2010)

McSAGG-RIAC (2011), SGIM (2011)

Non !

Intrinsic MotivationBerlyne (1960), Csikszentmihalyi (1996)Dayan and Belleine (2002)

Quelle fonction de récompense générique

?

Exploring and learning generalized forward and inverse models

Parameterized byParameterized by

simple

complexe

complexe

simple

complexe

complexe

Explore zones where:•Uncertainty/errors maximal•Least exploredAssume:•Spatial or temporal stationarity•Everything is learnable within lifetime

Which experiment ?

Developmental approach

Explore zones where empirically learning progress is maximal

Active learning of models

Sensori state

Sensori state

Actionstate

Actionstate

Contextstate

Contextstate

Classic machine learnerM

(e.g. neural net, SVM, Gaussian process)

Classic machine learnerM

(e.g. neural net, SVM, Gaussian process)

Meta machine learnermetaM

Progressive categorization

Local model of learning progress

. . .

Sensori state at t+1

Prediction

Error feedback

Action selection system

Action selection system Intrinsic reward

Local model of learning progress

IAC, IEEE Trans. EC (2007)R-IAC, IEEE Trans. AMD (2009)

The Playground Experiments

(IEEE Trans. EC 2007; Connection Science 2006; AAAI Work. Dev. Learn. 2005)

Experimentations on Open Learning in the Real World

Playground Experiments

•Autonomous learning of novel affordances and and skills, e.g. object manipulation

IEEE TEC, 2007; IROS 2010; IEEE TAMD, 2009; Front. Neurorobotics, 2007; Connect. Sc., 2006; IEEE ICDL 2010, 2011

simple

complex

complex

• Self-organization of developmental trajectories, bootstrapping of communication New hypotheses for understanding infant development

Front. Neuroscience 2007, Infant and Child Dev. 2008, Connect. Science 2006

Active learning of inverse models SAGG-RIAC (RAS, 2012)

(Context, Movement)

Effect

Redundancy of sensorimotor spaces

From the active choice of action, followed by observation of effect …

… to the active choice of effect, followed by the search of a corresponding action policy through goal-directed optimization (e.g. using NAC, POWER, PI^2-CMA, …)

self-defined RL problem

Spontaneous active exploration of a space of fitness functions parameterized by where one iteratively chooses the which maximizes the empirical evaluation of:

Apprentissage de la locomotion omnidirectionnelle

Performance higher than more classical active learning algorithms in real sensorimotor spaces (non-stationary, non homogeneous) (IEEE TAMD 2009; ICDL 2010, 2011; IROS 2010; RAS 2012)

Experimental evaluation of active learning efficiency

Control Space: Task Space:

Maturational constraints

• Progressive growths of DOF number and spatio-temporal resolution

• Adaptive maturational schedule controlled by active learning/learning progress

(Bjorklund, 1997; Turkewitz and Kenny, 1985)

McSAGG-RIACMaturationally constrained curiosity-driven learning

(IEEE ICDL-Epirob 2011a)

SGIM: Socially Guided Intrinsic Motivation

(ICDL-Epirob, 2011b)

« Life-long » Experimentation

Acroban(Siggraph 2010, IROS 2011, World Expo, South Korea, 2012)

• Experimentation of algorithms for « life-long » learning in the real world

Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap

Ergo-Robots(Exhibition « Mathematics, a beautiful elsewhere »,Fond. Cartier, 2011-2012)

• Experimentation of algorithms for « life-long » learning in the real world

Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap

Ergo-Robots

Mid-term: open-source distribution of the platform to the scientific community

« Life-long » Experimentation

Baranes, A., Oudeyer, P-Y. (2012) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems.http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf

Baranes, A., Oudeyer, P-Y. (2011a) The Interaction of Maturational Constraints and Intrinsic Motivation in Active Motor Development, in Proceedings of IEEE ICDL-Epirob 2011.http://flowers.inria.fr/BaranesOudeyerICDL11.pdf

Lopes, M., Melo, F., Montesano, L. (2009) Active Learning for Reward Estimation in Inverse Reinforcement Learning, European Conference on Machine Learning (ECML/PKDD), Bled, Slovenia, 2009.http://flowers.inria.fr/mlopes/myrefs/09-ecml-airl.pdf

Nguyen, M., Baranes, A., Oudeyer, P-Y. (2011b) Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/NguyenBaranesOudeyerICDL11.pdf

Oudeyer P-Y, Kaplan , F. and Hafner, V. (2007) Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2), pp. 265--286.http://www.pyoudeyer.com/ims.pdf

Baranes, A., Oudeyer, P-Y. (2009 )R-IAC: Robust intrinsically motivated exploration and active learning, IEEE Transactions on Autonomous Mental Development, 1(3), pp. 155--169.

Ly, O., Lapeyre, M., Oudeyer, P-Y. (2011) Bio-inspired vertebral column, compliance and semi-passive dynamics in a lightweight robot, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, US.

Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress , Manuel Lopes, Tobias Lang, Marc Toussaint and Pierre-Yves Oudeyer. Neural Information Processing Systems (NIPS 2012), Tahoe, USA. http://flowers.inria.fr/mlopes/myrefs/12-nips-zeta.pdf

The Strategic Student Approach for Life-Long Exploration and Learning, Manuel Lopes and Pierre-Yves Oudeyer. In Proceedings of IEEE ICDL-Epirob 2012, http://flowers.inria.fr/mlopes/myrefs/12-ssp.pdf

http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf

http://flowers.inria.fr/BaranesOudeyerICDL11.pdf

http://flowers.inria.fr/mlopes/myrefs/09-ecml-airl.pdf

Developmental Mechanisms for Life-Long Autonomous Learning in Robots Pierre-Yves Oudeyer Project-Team INRIA-ENSTA-ParisTech FLOWERS .

Documents

social learning

open learning

assistive robotics slide

lifelong learning real

developmental mechanisms

ieee icdl

ieee tamd

playground experiments