Study on Genetic Network Programming (GNP) with Learning and Evolution
Hirasawa laboratory, Artificial Intelligence section, Information architecture field
Graduate School of Information, Production and Systems, Waseda University
Study on Genetic Network Programming (GNP) with Learning and Evolution
• Solutions (programs) are represented by genes.
• The programs are evolved (changed) by selection, crossover and mutation.
Structure of GNP
Graph structure and its gene structure (each row encodes one node):

0 0 3 4
0 1 1 6
0 2 5 7
1 0 8 0
1 0 0 4
1 5 1 2
… … … …
• GNP represents its programs using directed graph structures.
• The graph structures can be represented as gene structures.
• The graph structure is composed of processing nodes and judgment nodes.
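As an illustration, the gene rows above can be held in a simple data structure. The column meanings used here (node type, function id, and two connection targets) are assumptions for illustration; the slide does not label the columns.

```python
# Hypothetical sketch of a GNP gene structure: each node is one row of
# integers. The column interpretation (node_type, function_id, conn_a,
# conn_b) is an assumption, not the slide's exact encoding.

GENE = [
    # (node_type, function_id, conn_a, conn_b)
    (0, 0, 3, 4),   # node 0
    (0, 1, 1, 6),   # node 1
    (0, 2, 5, 7),   # node 2
    (1, 0, 8, 0),   # node 3
    (1, 0, 0, 4),   # node 4
    (1, 5, 1, 2),   # node 5
]

def connections(gene, node_id):
    """Return the nodes reachable from node_id in one step."""
    _, _, a, b = gene[node_id]
    return (a, b)
```

With this encoding, the directed graph and the integer table are two views of the same program, which is what makes genetic operators on the table meaningful.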
Khepera robot
• The Khepera robot is used for the performance evaluation of GNP.
[Figure: Khepera robot with infrared sensors and two wheels]
• Sensor value: close to 0 when far from obstacles, close to 1023 when close to obstacles
• Speed of the right wheel VR: -10 (back) ~ 10 (forward)
• Speed of the left wheel VL: -10 (back) ~ 10 (forward)
Node functions
• Processing node: each node determines an agent action.
• Judgment node: each node selects a branch based on the judgment result.

Ex) Khepera robot behavior
• Processing node: "Set the speed of the right wheel at 10"
• Judgment node: "Judge the value of sensor 1", branching on "500 or more" / "less than 500"
An example of node transition
• Judge sensor 1: branch on "the value is 700 or more" / "the value is less than 700"
• Judge sensor 5: branch on "80 or more" / "less than 80"
• Set the speed of the right wheel at 5
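A transition like the one above can be sketched as a small interpreter that walks the graph: judgment nodes read a sensor and choose a branch, processing nodes emit an action and continue. The wiring, thresholds, and node names below are illustrative, loosely based on the slide's example.

```python
# Minimal node-transition sketch. Each judgment node compares a sensor
# value against a threshold and picks the next node; each processing
# node performs an action and moves on. The graph here is illustrative.

def run(nodes, start, sensors, steps):
    actions = []
    node = start
    for _ in range(steps):
        spec = nodes[node]
        if spec["kind"] == "judge":
            value = sensors[spec["sensor"]]
            node = spec["ge"] if value >= spec["threshold"] else spec["lt"]
        else:  # processing node
            actions.append(spec["action"])
            node = spec["next"]
    return actions

NODES = {
    0: {"kind": "judge", "sensor": 1, "threshold": 700, "ge": 1, "lt": 2},
    1: {"kind": "judge", "sensor": 5, "threshold": 80, "ge": 2, "lt": 0},
    2: {"kind": "proc", "action": "right_wheel=5", "next": 0},
}

# With sensor 1 reading 400 (< 700), the transition alternates between
# the first judgment node and the processing node.
trace = run(NODES, 0, {1: 400, 5: 60}, 4)
```

Note that execution never terminates on its own: the program is a graph, not a tree, so the agent keeps transitioning for as long as the task runs.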
Flowchart of GNP

start
→ Generate an initial population (initial programs)
→ Task execution (Reinforcement Learning)
→ Evolution (Selection / Crossover / Mutation)
→ repeat (task execution + evolution = one generation) until the last generation
→ stop
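The flowchart can be summarized as a main loop. This is a schematic skeleton only: `execute_task`, `learn`, `select`, `crossover`, and `mutate` are placeholders for the steps named above, passed in as functions.

```python
import random

def evolve_gnp(population, generations,
               execute_task, learn, select, crossover, mutate):
    """Schematic GNP main loop: task execution with reinforcement
    learning inside each generation, followed by evolution."""
    for _ in range(generations):
        # Task execution: each individual runs the task while
        # reinforcement learning tunes its node parameters.
        fitnesses = [execute_task(learn(ind)) for ind in population]
        # Evolution: selection, then crossover and mutation to
        # produce the next generation of the same size.
        parents = select(population, fitnesses)
        offspring = []
        while len(offspring) < len(population):
            a, b = random.sample(parents, 2)
            offspring.append(mutate(crossover(a, b)))
        population = offspring
    return population
```

The key structural point, visible in the loop body, is that learning happens during task execution (within one individual's lifetime) while evolution happens between generations.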
Evolution of GNP: selection
• Select good individuals (programs) from the GNP population based on their fitness.
• Fitness indicates how well each individual achieves a given task.
• The selected individuals are used for crossover and mutation.
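Fitness-based selection can be sketched, for example, as tournament selection. This is one common scheme chosen for illustration; the slides do not specify which selection method is used.

```python
import random

def tournament_select(population, fitnesses, n_selected, tournament_size=2):
    """Pick n_selected individuals; each pick keeps the fittest member
    of a small random tournament. Higher fitness is better."""
    selected = []
    indices = range(len(population))
    for _ in range(n_selected):
        contenders = random.sample(indices, tournament_size)
        winner = max(contenders, key=lambda i: fitnesses[i])
        selected.append(population[winner])
    return selected
```

Selection is done with replacement here, so a strong individual can be chosen several times, which is typical for evolutionary algorithms.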
Evolution of GNP: crossover
• Some nodes and their connections are exchanged between Individual 1 and Individual 2.

Evolution of GNP: mutation
• Change connections
• Change node function (Ex: "Speed of right wheel: 5" → "Speed of left wheel: 10")
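Both operators act directly on the row-per-node gene encoding. The sketch below uses uniform node exchange for crossover and random reconnection for mutation; the rates and details are illustrative assumptions, not the slides' exact settings.

```python
import random

def crossover(parent1, parent2, rate=0.1):
    """Exchange whole nodes (gene rows) between two individuals at the
    same positions, each with probability `rate`."""
    child1, child2 = list(parent1), list(parent2)
    for i in range(len(child1)):
        if random.random() < rate:
            child1[i], child2[i] = child2[i], child1[i]
    return child1, child2

def mutate_connections(gene, n_nodes, rate=0.05):
    """Rewire each connection to a random node with a small probability.
    Assumes rows of (node_type, function_id, conn_a, conn_b)."""
    mutated = []
    for (kind, func, a, b) in gene:
        if random.random() < rate:
            a = random.randrange(n_nodes)
        if random.random() < rate:
            b = random.randrange(n_nodes)
        mutated.append((kind, func, a, b))
    return mutated
```

Because nodes are exchanged at the same positions, crossover preserves the program size of both children, while mutation only redirects edges within the existing node set.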
The role of Learning
Example)
• Processing node: "Set the speed of the right wheel at 10" → Collision! → 10 is changed to 5 so as not to collide with the obstacle.
• Judgment node: "Judge sensor 0", branching on "1000 or more" / "less than 1000" → 1000 is changed to 500 in order to judge obstacles more sensitively.

Node parameters are changed by reinforcement learning.
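One simple way to picture the parameter update is a value-based choice among candidate parameter values, where rewards shift the choice toward values that work better. This is a hedged sketch of the idea only; the slides do not detail the actual learning rule, and the `NodeParameter` class and its candidate values are hypothetical.

```python
import random

class NodeParameter:
    """A node keeps candidate parameter values (e.g. wheel speeds 10 or 5,
    thresholds 1000 or 500), each with an estimated value; reinforcement
    learning shifts the choice toward candidates that earn more reward."""

    def __init__(self, candidates, alpha=0.1, epsilon=0.1):
        self.q = {c: 0.0 for c in candidates}   # value estimate per candidate
        self.alpha = alpha                      # learning rate
        self.epsilon = epsilon                  # exploration probability

    def choose(self):
        # Epsilon-greedy: usually exploit the best-valued candidate.
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, chosen, reward):
        # Incremental update of the chosen candidate toward the reward.
        self.q[chosen] += self.alpha * (reward - self.q[chosen])

# Example mirroring the slide: the right-wheel-speed processing node.
speed = NodeParameter(candidates=[10, 5])
speed.update(10, reward=-1.0)   # collision while using speed 10
speed.update(5, reward=1.0)     # safe movement with speed 5
```

After a few such updates the node prefers speed 5, which is the behavior change described in the example above.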
The aim of combining evolution and learning
• Create efficient programs
• Search for solutions faster

Evolution uses many individuals, and better ones are selected after task execution.
Learning uses one individual, and better action rules can be determined during task execution.
VI Simulation
• Wall-following behavior
1. All the sensor values must not be more than 1000.
2. At least one sensor value is more than 100.
3. Move straight.
4. Move fast.

[Figure: Simulation environment]
Reward = \sum_{t=1}^{1000} C(t) \cdot \frac{v_R(t)+v_L(t)}{20} \cdot \left(1-\frac{|v_R(t)-v_L(t)|}{20}\right)

fitness = Reward / 1000

C(t) = 1 if conditions 1 and 2 are satisfied at time t; C(t) = 0 otherwise.
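The reward and fitness definitions above translate directly into code. This is a minimal sketch; the thresholds follow conditions 1 and 2, and the `history` format is an assumption for illustration.

```python
def step_reward(v_r, v_l, sensors):
    """Per-step reward term: C(t) * (vR+vL)/20 * (1 - |vR-vL|/20).
    C(t) = 1 when no sensor value exceeds 1000 (condition 1) and at
    least one sensor value exceeds 100 (condition 2)."""
    c = 1 if max(sensors) <= 1000 and max(sensors) > 100 else 0
    return c * (v_r + v_l) / 20 * (1 - abs(v_r - v_l) / 20)

def fitness(history):
    """history: list of (vR, vL, sensors) tuples over the 1000 steps."""
    reward = sum(step_reward(v_r, v_l, s) for v_r, v_l, s in history)
    return reward / 1000
```

The formula rewards exactly the listed goals: the (vR+vL)/20 factor favors moving fast, the (1 - |vR-vL|/20) factor favors moving straight, and C(t) requires staying near a wall without touching it.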
Node functions
• Processing node (2 kinds): determine the speed of the right wheel; determine the speed of the left wheel
• Judgment node (8 kinds): judge the value of sensor 0, ..., judge the value of sensor 7
Simulation result
• Conditions:
  – The number of individuals: 600
  – The number of nodes: 34 (judgment nodes: 24, processing nodes: 10)

[Figure: fitness (0–0.8) vs. generation (0–1000); fitness curves of the best individuals averaged over 30 independent simulations, for "GNP with learning and evolution" and "Standard GNP (GNP with evolution)"]
[Figure: Track of the robot from the start position]
Simulations in the inexperienced environments
Simulation on the generalization ability
• The best program obtained in the previous environment is executed in an inexperienced environment.
• The robot can still show the wall-following behavior.
VII Conclusion
• An algorithm of GNP combining evolution and reinforcement learning is proposed.
  – The simulation results show that the proposed method can learn wall-following behavior well.
• Future work
  – Apply GNP with evolution and reinforcement learning to real-world applications:
    • Elevator control system
    • Stock trading model
  – Compare with other evolutionary algorithms
VIII Other simulations
Example of tileworld
• The tileworld consists of walls, floors, tiles, holes and agents.
• The agent can push a tile and drop it into a hole.
• The aim of the agent is to drop as many tiles into holes as possible.
• Fitness = the number of dropped tiles
• Reward rt = 1 (when dropping a tile into a hole)
Node functions
• Processing node: go forward, turn right, turn left, stay
• Judgment node:
  – What is in the forward cell? (floor, tile, hole, wall or agent)
  – What is in the backward cell / left cell / right cell?
  – The direction of the nearest tile (forward, backward, left, right or nothing)
  – The direction of the nearest hole
  – The direction of the nearest hole from the nearest tile
  – The direction of the second nearest tile
Example of node transition