EE141
Motivated Learning based on Goal Creation
Janusz Starzyk
School of Electrical Engineering and Computer Science, Ohio University, USA
www.ent.ohiou.edu/~starzyk
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 4 December 2009.
Outline
– Embodied Intelligence (EI)
– Embodiment of Mind
– How to Motivate a Machine
– Goal Creation Hierarchy
– GCS Experiment
– Motivated Learning
Design principles of intelligent systems
From Rolf Pfeifer, “Understanding Intelligence”, 1999
Drawing by Ciarán O’Leary, Dublin Institute of Technology
Embodied Intelligence
Definition
Embodied Intelligence (EI) is a mechanism that learns how to survive in a hostile environment
– Mechanism: biological, mechanical or virtual agent with embodied sensors and actuators
– EI acts on environment and perceives its actions
– Environment hostility is persistent and stimulates EI to act
– Hostility: direct aggression, pain, scarce resources, etc.
– EI learns, so it must have associative self-organizing memory
– Knowledge is acquired by EI
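The loop of acting on a hostile environment and perceiving the result can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the talk: the `Environment` class, its integer `supply`, the `replenish` action, and the 0.25 pain threshold. The response is hard-coded, so the learning part of the definition is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical hostile environment: a supply that drains every step."""
    capacity: int = 10
    supply: int = 10

    def step(self, action: str) -> float:
        # Hostility is persistent: one unit is lost whether or not the agent acts.
        self.supply = max(0, self.supply - 1)
        if action == "replenish":
            self.supply = self.capacity
        # The sensor reading is a pain signal in [0, 1] that grows as supply falls.
        return 1.0 - self.supply / self.capacity


def run_agent(steps: int, threshold: float = 0.25):
    """Sense-act loop: the agent acts only when the sensed pain exceeds the threshold."""
    env = Environment()
    pain, pains, actions = 0.0, [], []
    for _ in range(steps):
        action = "replenish" if pain > threshold else "wait"
        pain = env.step(action)  # act on the environment, then perceive the result
        pains.append(pain)
        actions.append(action)
    return pains, actions


pains, actions = run_agent(20)
```

With these assumed values the agent waits while the pain is low and replenishes once it crosses the threshold, so the pain stays bounded.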
[Figure: embodiment diagram with labels Embodiment, Intelligence core, Sensors, Actuators, channel, and Environment]
Embodiment of a Mind
– Embodiment is a part of environment under control of the mind
– It contains intelligence core and sensory motor interfaces to interact with environment
– It is necessary for development of intelligence
– It is not necessarily constant
Embodiment of Mind
– Changes in embodiment modify brain’s self-determination
– Brain learns its own body’s dynamics
– Self-awareness is a result of identification with own embodiment
– Embodiment can be extended by using tools and machines
– Successful operation is a function of correct perception of environment and own embodiment
How to Motivate a Machine?
A fundamental question is what motivates an agent to do anything, and in particular, to enhance its own complexity?
What drives an agent to explore the environment and learn ways to effectively interact with it?
How to Motivate a Machine?
– Pfeifer claims that an agent’s motivation should emerge from the developmental process.
  – He called this the “motivated complexity” principle.
  – Chicken and egg problem? An agent must have a motivation to develop while its motivation comes from development?
– Steels suggested equipping an agent with self-motivation.
  – “Flow” experienced when people perform their expert activity well would motivate them to accomplish even more complex tasks.
  – But what is the mechanism of “flow”?
– Oudeyer proposed an intrinsic motivation system.
  – Motivation comes from a desire to minimize the prediction error.
  – Similar to “artificial curiosity” presented by Schmidhuber.
How to Motivate a Machine?
– Exploration is needed in order to learn and to model the environment.
– But is exploration the only motivation we need to develop EI?
– Can we find a more efficient mechanism for learning?
– I suggest a simpler mechanism to motivate a machine.
– Although artificial curiosity helps to explore the environment, it leads to learning without a specific purpose. It may be compared to exploration in reinforcement learning.
How to Motivate a Machine?
– I suggest that it is the hostility of the environment, in the definition of EI, that is the most effective motivational factor.
– It is the pain we receive that moves us.
– It is our intelligence determined to reduce this pain that motivates us to act, learn, and develop.
– Both are needed: hostility of the environment and intelligence that learns how to reduce the pain.
– Thus pain is good.
– Without pain we would not be motivated to develop.
Fig. englishteachermexico.wordpress.com/
Motivated Learning
– I suggest a goal-driven mechanism to motivate a machine to act, learn, and develop.
– A simple pain-based goal creation system.
– It uses externally defined pain signals that are associated with primitive pains.
– Machine is rewarded for minimizing the primitive pain signals.
Definition: Motivated learning (ML) is learning based on the self-organizing system of goal creation in an embodied agent.
– Machine creates abstract goals based on the primitive pain signals.
– It receives internal rewards for satisfying its goals (both primitive and abstract).
– ML applies to EI working in a hostile environment.
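One way to read "rewarded for minimizing the pain signals" is that the internal reward for an action is the drop in pain it produces. The sketch below assumes exactly that. The talk does not give a formula, and `internal_reward`, the goal names, and the pain values are all hypothetical.

```python
def internal_reward(pain_before: dict, pain_after: dict) -> dict:
    """Assumed reward: the per-goal reduction in pain caused by one action."""
    return {goal: pain_before[goal] - pain_after[goal] for goal in pain_before}


# Hypothetical pain levels before and after the agent eats:
# hunger (primitive) falls, while lack of food (abstract) rises.
before = {"hunger": 0.75, "lack_of_food": 0.25}
after = {"hunger": 0.25, "lack_of_food": 0.5}

rewards = internal_reward(before, after)
```

Under this assumption the same action can be rewarded with respect to one goal and penalized with respect to another, which is why each goal keeps its own reward signal.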
Pain-center and Goal Creation
Sensory-motor pairs and their effect on the environment
PAIR #  SENSORY  MOTOR     INCREASES          DECREASES
1       Food     Eat       sugar level        food supplies
8       Grocery  Buy       food supplies      money at hand
15      Bank     Withdraw  money at hand      spending limits
22      Office   Work      spending limits    job opportunities
29      School   Study     job opportunities  -
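The sensory-motor pairs can be read as a dependency chain: each action restores one resource and consumes another, which a further action must then restore. The sketch below encodes the five pairs as data and follows that chain. Treating the table this way is an interpretation, and the names `PAIRS`, `RESTORES`, and `goal_chain` are hypothetical.

```python
# Sensory-motor pairs from the table: (sensory, motor, increases, decreases).
PAIRS = [
    ("Food", "Eat", "sugar level", "food supplies"),
    ("Grocery", "Buy", "food supplies", "money at hand"),
    ("Bank", "Withdraw", "money at hand", "spending limits"),
    ("Office", "Work", "spending limits", "job opportunities"),
    ("School", "Study", "job opportunities", None),
]

# Map each resource to the motor action that increases it and the resource it uses up.
RESTORES = {increases: (motor, decreases) for _, motor, increases, decreases in PAIRS}


def goal_chain(resource: str) -> list:
    """Follow the table upward: each action consumes a resource that another action restores."""
    chain = []
    while resource in RESTORES:
        motor, consumed = RESTORES[resource]
        chain.append(motor)
        resource = consumed
    return chain


print(goal_chain("sugar level"))
```

Starting from the primitive need (sugar level), the chain lists the actions in the order in which shortages would propagate through the table.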
Goal Creation Experiment in ML
Pain signals in GCS simulation
[Figure: three panels, Primitive Hunger, Lack of Food, and Empty Grocery; pain versus discrete time, 0 to 600]
Goal Creation Experiment in ML
Action scatters in 5 GCS simulations
[Figure: Goal Scatter Plot; goal ID, 0 to 40, versus discrete time, 0 to 600]
Goal Creation Experiment in ML
The average pain signals in 100 GCS simulations
[Figure: five panels, Primitive Hunger, Lack of Food, Empty Grocery, Lack of Money, and Lack of Job Opportunities; pain versus discrete time, 0 to 600]
Compare RL (TDF) and ML (GCS)
Mean primitive pain Pp value as a function of the number of iterations:
– green line for TDF
– blue line for GCS
Primitive pain ratio with pain threshold 0.1
Compare RL (TDF) and ML (GCS)
Comparison of execution time on log-log scale:
– TD-Falcon: green
– GCS: blue
Combined efficiency of GCS is 1000 times better than TDF
Problem solved
Conclusion: embodied intelligence with motivated learning based on goal creation is an effective learning and decision-making system for dynamic environments.
Reinforcement Learning
– Single value function
– Measurable rewards
  – Can be optimized
– Predictable
– Objectives set by designer
– Maximizes the reward
  – Potentially unstable
– Learning effort increases with complexity
– Always active

Motivated Learning
– Multiple value functions
  – One for each goal
– Internal rewards
  – Cannot be optimized
– Unpredictable
– Sets its own objectives
– Solves minimax problem
  – Always stable
– Learns better in complex environment than RL
– Acts when needed
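The Motivated Learning properties "multiple value functions, one for each goal" and "acts when needed" can be sketched as follows. This is an illustration under stated assumptions, not the GCS algorithm: the value tables, the update rule, the 0.1 activation threshold, and the names `select_action` and `update_value` are all hypothetical. Attending to the largest pain first is one possible reading of the minimax property.

```python
def select_action(pains: dict, values: dict, threshold: float = 0.1):
    """Pick the goal with the largest pain and the best-valued action for it.

    Returns None when every pain is at or below the threshold ("acts when needed").
    """
    goal = max(pains, key=pains.get)
    if pains[goal] <= threshold:
        return None
    actions = values[goal]  # one value table per goal
    return goal, max(actions, key=actions.get)


def update_value(values: dict, goal: str, action: str, reward: float, rate: float = 0.5):
    """Move the stored value toward the internal reward received (assumed update rule)."""
    values[goal][action] += rate * (reward - values[goal][action])


# Hypothetical value tables, one per goal.
values = {
    "hunger": {"Eat": 0.0, "Buy": 0.0},
    "lack_of_food": {"Eat": 0.0, "Buy": 0.0},
}
update_value(values, "hunger", "Eat", reward=1.0)
update_value(values, "lack_of_food", "Buy", reward=1.0)

choice = select_action({"hunger": 0.6, "lack_of_food": 0.3}, values)
idle = select_action({"hunger": 0.05, "lack_of_food": 0.02}, values)
```

Keeping a separate table per goal means that learning how to reduce one pain does not overwrite what was learned for another, in contrast to a single value function.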
Sounds like science fiction
If you’re trying to look far ahead, and what you see seems like science fiction, it might be wrong.
But if it doesn’t seem like science fiction, it’s definitely wrong.
From presentation by Foresight Institute
Questions?
Resources – Evolution of Electronics
[Figure from Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006]
[Figure by Gordon E. Moore]
Clock Speed (doubles every 2.7 years)
[Figure from Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006]
Doubling (or Halving) times
Dynamic RAM Memory “Half Pitch” Feature Size   5.4 years
Dynamic RAM Memory (bits per dollar)           1.5 years
Average Transistor Price                       1.6 years
Microprocessor Cost per Transistor Cycle       1.1 years
Total Bits Shipped                             1.1 years
Processor Performance in MIPS                  1.8 years
Transistors in Intel Microprocessors           2.0 years
Microprocessor Clock Speed                     2.7 years
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
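A doubling time T translates into a growth factor of 2^(t/T) after t years, assuming the doubling stays steady over the whole period. The short sketch below applies that arithmetic to three rows of the table. The function names and the 20-year and 1000x examples are chosen here for illustration only.

```python
import math

# Doubling times in years, copied from three rows of the doubling-times table.
DOUBLING_YEARS = {
    "Processor Performance in MIPS": 1.8,
    "Transistors in Intel Microprocessors": 2.0,
    "Microprocessor Clock Speed": 2.7,
}


def growth_factor(doubling_years: float, elapsed_years: float) -> float:
    """Factor by which a quantity grows after elapsed_years of steady doubling."""
    return 2.0 ** (elapsed_years / doubling_years)


def years_to_reach(doubling_years: float, factor: float) -> float:
    """Years of steady doubling needed to grow by the given factor."""
    return doubling_years * math.log2(factor)


transistors_20y = growth_factor(DOUBLING_YEARS["Transistors in Intel Microprocessors"], 20)
clock_1000x = years_to_reach(DOUBLING_YEARS["Microprocessor Clock Speed"], 1000)
```

For the rows that describe halving (feature size, prices, cost per cycle), the same formula applies with the reciprocal factor, 0.5^(t/T).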
[Figure from Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006]
[Figure from Hans Moravec, Robot, 1999]
Software or hardware?
– Sequential
– Error prone
– Require programming
– Low cost
– Well developed