2D1431 Machine Learning: Fuzzy Logic & Learning in Robotics
Transcript
Page 1: 2D1431 Machine Learning

2D1431 Machine Learning

Fuzzy Logic & Learning in Robotics

Page 2: 2D1431 Machine Learning

Outline

• Fuzzy Logic
• Learning Control
• Evolutionary Robotics

Page 3: 2D1431 Machine Learning

Types of Uncertainty

• Stochastic uncertainty (example: rolling a die)

• Linguistic uncertainty (examples: low price, tall people, young age)

• Informational uncertainty (examples: creditworthiness, honesty)

Page 4: 2D1431 Machine Learning

Classical Set

young = { x ∈ P | age(x) ≤ 20 }

Characteristic function:

µyoung(x) = 1 if age(x) ≤ 20, 0 if age(x) > 20

[Figure: membership µyoung(x) of the crisp set A = "young" over x [years]: a step that jumps from 1 to 0 at age 20]

Page 5: 2D1431 Machine Learning

Fuzzy Set vs. Classical Logic

Classical logic: element x belongs to set A or it does not:

µ(x) ∈ {0,1}

Fuzzy logic: element x belongs to set A with a certain degree of membership:

µ(x) ∈ [0,1]

[Figure: for A = "young" over x [years], the classical µA(x) is a step from 1 to 0, while the fuzzy µA(x) decreases gradually]

Page 6: 2D1431 Machine Learning

Fuzzy Set

Definition: a fuzzy set A = {(x, µA(x)) : x ∈ X, µA(x) ∈ [0,1]} consists of
• a universe of discourse X: 0 ≤ x ≤ 100
• a membership function µA : X → [0,1]

[Figure: A = "young" over x [years]; for example, x = 23 has membership µ = 0.8]

Page 7: 2D1431 Machine Learning

Types of Membership Functions

• Trapezoid ⟨a,b,c,d⟩: µ(x) rises from 0 at a to 1 at b, stays at 1 until c, and falls back to 0 at d
• Gaussian N(m,s): bell-shaped curve with mean m and width s
• Singletons, e.g. (a,1) and (b,0.5): isolated points with a given membership degree
• Triangular ⟨a,b,b,d⟩: a trapezoid whose plateau shrinks to the single peak b
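The shapes above can be sketched as plain Python functions; the parameter names follow the slide's notation:

```python
import math

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership <a,b,c,d>: rises a..b, flat b..c, falls c..d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def triangular(x, a, b, d):
    """Triangular membership <a,b,b,d>: a trapezoid with peak at b."""
    return trapezoid(x, a, b, b, d)

def gaussian(x, m, s):
    """Gaussian membership N(m,s): peak 1 at the mean m, width s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2)
```

A singleton is simply a function returning the stored degree at its point and 0 elsewhere.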

Page 8: 2D1431 Machine Learning

The Extension Principle

Assume a fuzzy set A and a function f: what does the fuzzy set f(A) look like?

For arbitrary functions f:

µf(A)(y) = max{ µA(x) | y = f(x) }

[Figure: f maps the x-values of A to y-values; where several x map to the same y, the maximum membership is taken]
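On a discrete universe the extension principle is a short maximization; a minimal sketch:

```python
def extend(f, A):
    """Extension principle: mu_f(A)(y) = max{ mu_A(x) | y = f(x) },
    for a fuzzy set A given as a dict {x: mu_A(x)} on a discrete universe."""
    fA = {}
    for x, mu in A.items():
        y = f(x)
        fA[y] = max(fA.get(y, 0.0), mu)  # several x may map to the same y
    return fA

# Example: squaring merges x = -1 and x = 1 into y = 1 with the larger degree.
A = {-1: 0.5, 0: 1.0, 1: 0.8}
fA = extend(lambda x: x * x, A)
```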

Page 9: 2D1431 Machine Learning

Operators on Fuzzy Sets

Intersection (fuzzy AND):

µA∧B(x) = min{µA(x), µB(x)} (min operator)
µA∧B(x) = µA(x) • µB(x) (algebraic product)

Union (fuzzy OR):

µA∨B(x) = max{µA(x), µB(x)} (max operator)
µA∨B(x) = min{1, µA(x) + µB(x)} (bounded sum)
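Applied pointwise to membership degrees, these operators (together with the standard negation µ¬A(x) = 1 − µA(x)) are one-liners:

```python
def f_and_min(a, b):  return min(a, b)        # intersection: min operator
def f_and_prod(a, b): return a * b            # intersection: algebraic product
def f_or_max(a, b):   return max(a, b)        # union: max operator
def f_or_bsum(a, b):  return min(1.0, a + b)  # union: bounded sum
def f_not(a):         return 1.0 - a          # complement (negation)

# The law of the excluded middle fails for fuzzy degrees:
# for mu_A(x) = 0.6, max(mu_A, 1 - mu_A) = 0.6, not 1.
```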

Page 10: 2D1431 Machine Learning

Complement

Negation: µ¬A(x) = 1 - µA(x)

The classical laws of excluded middle and contradiction,

µ¬A∨A(x) ≡ 1 and µ¬A∧A(x) ≡ 0,

do not always hold:

Example: µA(x) = 0.6, µ¬A(x) = 1 - µA(x) = 0.4
µ¬A∨A(x) = max(0.6, 0.4) = 0.6 ≠ 1
µ¬A∧A(x) = min(0.6, 0.4) = 0.4 ≠ 0

Page 11: 2D1431 Machine Learning

Fuzzy Relations

A classical relation R : X × Y is defined by

µR(x,y) = 1 if (x,y) ∈ R, 0 if (x,y) ∉ R

A fuzzy relation R : X × Y is defined by µR(x,y) ∈ [0,1].

µR(x,y) describes to which degree x and y are related. It can also be interpreted as the truth value of the proposition x R y.

Page 12: 2D1431 Machine Learning

Fuzzy Relations

Example:

X = { rainy, cloudy, sunny }
Y = { swimming, bicycling, camping, reading }

X/Y      swimming  bicycling  camping  reading
rainy    0.0       0.2        0.0      1.0
cloudy   0.0       0.8        0.3      0.3
sunny    1.0       0.2        0.7      0.0
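The relation table maps directly onto a nested dictionary; µR(x,y) then becomes a lookup:

```python
# Fuzzy relation between weather and activity, values from the example table.
R = {
    "rainy":  {"swimming": 0.0, "bicycling": 0.2, "camping": 0.0, "reading": 1.0},
    "cloudy": {"swimming": 0.0, "bicycling": 0.8, "camping": 0.3, "reading": 0.3},
    "sunny":  {"swimming": 1.0, "bicycling": 0.2, "camping": 0.7, "reading": 0.0},
}

def mu_R(x, y):
    """Degree to which x and y are related (truth value of x R y)."""
    return R[x][y]
```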

Page 13: 2D1431 Machine Learning

Fuzzy Sets & Linguistic Variables

A linguistic variable combines several fuzzy sets.

linguistic variable: temperature
linguistic terms (fuzzy sets): { cold, warm, hot }

[Figure: membership functions µcold, µwarm, µhot over x [°C], with the transitions around 20 and 60 °C]

Page 14: 2D1431 Machine Learning

Fuzzy Rules

• Causal dependencies can be expressed in the form of if-then rules.
• General form: if <antecedent> then <consequence>
• Example: if temperature is cold and oil is cheap then heating is high

Here temperature, oil, and heating are linguistic variables; cold, cheap, and high are their linguistic values/terms (fuzzy sets).

Page 15: 2D1431 Machine Learning

Fuzzy Rule Base

if temperature is cold and oil price is cheap then heating is high
if temperature is hot and oil price is normal then heating is low
…

Heating:

Oil price \ Temperature   cold     warm     hot
cheap                     high     high     medium
normal                    high     medium   low
expensive                 medium   low      low

Page 16: 2D1431 Machine Learning

Fuzzy Knowledge Base

Fuzzy Data-Base: definition of the linguistic input and output variables, and of the fuzzy membership functions (e.g. µcold, µwarm, µhot over x [°C]).

Fuzzy Rule-Base: if temperature is cold and oil price is cheap then heating is high, …

Page 17: 2D1431 Machine Learning

Fuzzification

1. Fuzzification: determine the degree of membership for each term of an input variable:

temperature: t = 15 °C
oil price: p = $13/barrel

If temperature is cold ... µcold(t) = 0.5
and oil is cheap ... µcheap(p) = 0.3

Page 18: 2D1431 Machine Learning

Fuzzy Combination

2. Combine the terms into one degree of fulfillment for the entire antecedent by fuzzy AND (min-operator):

µante = min{µcold(t), µcheap(p)} = min{0.5, 0.3} = 0.3

Page 19: 2D1431 Machine Learning

Fuzzy Inference

3. Inference step: apply the degree of membership of the antecedent to the consequent of the rule ("... then heating is high"):

min-inference: µconsequent = min{µante, µhigh}
prod-inference: µconsequent = µante • µhigh

[Figure: with µante = 0.3, min-inference clips µhigh(h) at 0.3; prod-inference scales µhigh(h) by 0.3]

Page 20: 2D1431 Machine Learning

Fuzzy Aggregation

4. Aggregation: aggregate all the rules' consequents ("... then heating is high / medium / low") using the max-operator for union.

Page 21: 2D1431 Machine Learning

Defuzzification

5. Defuzzification: determine a crisp value from the output membership function, for example using the "Center of Gravity" (COG) method. In the example, the COG of µconsequent(h) yields h = 73.

Center of singletons defuzzification:

h = Σi mi • Ai • ci / Σi mi • Ai

mi = degree of membership of fuzzy set i
Ai = area of fuzzy set i
ci = center of gravity of fuzzy set i
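Steps 1-5 can be strung together in a short sketch. The rule set and membership functions passed in are the caller's stand-ins for the heating example; the defuzzifier integrates the max-aggregated output numerically:

```python
def fuzzify(x, terms):
    """Step 1: degree of membership of crisp input x in each linguistic term."""
    return {name: mf(x) for name, mf in terms.items()}

def infer(rules, mu):
    """Steps 2-4: min-AND over each rule's antecedent (combination),
    min-inference, and max-aggregation per output term.
    rules: list of ((term, term, ...), output_term) pairs."""
    clip = {}
    for antecedent, out_term in rules:
        mu_ante = min(mu[t] for t in antecedent)
        clip[out_term] = max(clip.get(out_term, 0.0), mu_ante)
    return clip

def defuzzify_cog(clip, out_mfs, lo, hi, n=1001):
    """Step 5: center of gravity of the clipped, max-aggregated output."""
    num = den = 0.0
    for i in range(n):
        h = lo + (hi - lo) * i / (n - 1)
        mu_h = max(min(level, out_mfs[t](h)) for t, level in clip.items())
        num += mu_h * h
        den += mu_h
    return num / den if den else 0.0
```

With µcold(t) = 0.5 and µcheap(p) = 0.3 as in the slides, `infer([(("cold", "cheap"), "high")], {"cold": 0.5, "cheap": 0.3})` clips the "high" output at 0.3.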

Page 22: 2D1431 Machine Learning

Schema of a Fuzzy Decision

Fuzzification → Inference → Defuzzification

[Figure: a measured temperature t is fuzzified against µcold, µwarm, µhot (here µcold = 0.7, µwarm = 0.2, µhot = 0.0); the rule base

if temp is cold then valve is open
if temp is warm then valve is half
if temp is hot then valve is close

is applied, and defuzzification over µopen, µhalf, µclose yields a crisp output for the valve setting]

Page 23: 2D1431 Machine Learning

Machine vs. Robot Learning

Page 24: 2D1431 Machine Learning

Machine vs. Robot Learning

Machine Learning:
• Learning in vacuum
• Statistically well-behaved data
• Mostly off-line
• Informative feedback
• Computational time not an issue
• Hardware does not matter
• Convergence proof

Robot Learning:
• Embedded learning
• Data distribution not homogeneous
• Mostly on-line
• Qualitative and sparse feedback
• Time is crucial
• Hardware is a priority
• Empirical proof

Page 25: 2D1431 Machine Learning

Methods of Robot Learning

• Dynamic Programming / Reinforcement Learning: the desired behavior is expressed as an optimization criterion r to be optimized over a temporal horizon, resulting in a cost function (long-term accumulated reward):

J(xt) = Σt r(xt, ut)

• Problem: curse of dimensionality, large state spaces, large amount of exploration
• Idea: modularize the control policy
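The accumulated cost is just a sum of per-step rewards along a trajectory; a minimal sketch, with a hypothetical reward that penalizes distance to a goal state and control effort:

```python
def accumulated_reward(trajectory, r):
    """J(x_t) = sum_t r(x_t, u_t) over a list of (state, action) pairs."""
    return sum(r(x, u) for x, u in trajectory)

# Hypothetical per-step reward for a 1-D state with goal x = 10.
r = lambda x, u: -(abs(x - 10.0) + 0.1 * abs(u))
```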

Page 26: 2D1431 Machine Learning

Learning Task

• Learn a task-specific control policy π that maps the continuous-valued state vector x to a continuous-valued control action u:

u = π(x,α,t)

[Figure: the control policy π(x,α,t) sends actions u to the robot & environment, which returns states s; a learning system adjusts the policy parameters α to match the desired behavior]

Page 27: 2D1431 Machine Learning

Learning Control with Sub-Policies

• Learn or design sub-policies and subsequently build the complete policy out of the sub-policies.

[Figure: sub-policies π1 ... π4 jointly produce the action u for the robot & environment; the learning system tunes them from the observed states s and the desired behavior]

Page 28: 2D1431 Machine Learning

Indirect Learning of Control Policies

• Decompose the task into a planning and an execution stage
• Planning generates a desired kinematic trajectory
• Execution transforms the plan into appropriate motor commands
• Learn an inverse kinematic model for the execution module

[Figure: trajectory planning feeds a feedforward controller and a feedback controller, whose summed outputs form the motor command u to the robot & environment; the learning system adapts the control policy]

Page 29: 2D1431 Machine Learning

Learning Inverse Models

• Learn an inverse kinematic model for feed-forward control
• Kinematic function: x = f(u)
• Inverse model: u = f⁻¹(x)
• Dynamic model: dx/dt = f(x,u)
• Inverse dynamic model: u = g(xdesired, x)
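One simple way to approximate an inverse model u = f⁻¹(x) is to sample the forward model and invert by nearest-neighbor lookup; a sketch (the 1-D forward kinematics here is a hypothetical stand-in):

```python
def learn_inverse(f, u_samples):
    """Tabulate (x, u) pairs from the forward model x = f(u) and
    approximate u = f^-1(x) by returning the u of the nearest stored x."""
    table = [(f(u), u) for u in u_samples]
    def inverse(x_desired):
        return min(table, key=lambda xu: abs(xu[0] - x_desired))[1]
    return inverse

# Hypothetical forward kinematics of a 1-D joint: x = 2u
inv = learn_inverse(lambda u: 2.0 * u, [i * 0.1 for i in range(101)])
```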

Page 30: 2D1431 Machine Learning

Evolutionary Robotics in a Nutshell

[Figure: a population of bitstring genotypes (e.g. 1001, 0011, 0100, 0110, 1101); each genotype is decoded into behavior parameters α, the controller u = f(s,α) is evaluated in the environment, and a fitness is assigned; selection, recombination, and mutation produce the next generation]
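The loop in the figure (evaluate, select, recombine, mutate) can be sketched as a minimal generational GA over bitstring genotypes. The population size and generation count follow the numbers used later in the lecture; truncation selection, one-point crossover, and the mutation rate are generic textbook choices, not the lecture's exact settings:

```python
import random

def evolve(fitness, n_bits=32, pop_size=10, generations=20, p_mut=0.02, seed=0):
    """Minimal generational GA: evaluate, select, recombine, mutate."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# best = evolve(sum)  # maximize the number of 1-bits (the OneMax toy problem)
```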

Page 31: 2D1431 Machine Learning

Evolutionary Behavior Design

[Figure: the evolutionary algorithm produces genotypes, decoded into behavior parameters; the robotic behavior issues control actions a to the environment and observes states s; the evaluation scheme turns observed rewards r into a fitness fed back to the evolutionary algorithm]

Page 32: 2D1431 Machine Learning

Evolving in Simulation vs. Reality

Simulation:
• Requires a model of the sensors and environment
• Brittleness of adapted behaviors
• Identical test cases for all candidate controllers
• Automated, fast fitness evaluation

Reality:
• Real world is the model
• Robust behaviors
• Difficult to initialize for a new controller under evaluation
• Time-consuming, manual fitness evaluation

Page 33: 2D1431 Machine Learning

Environment

Real-time online evolution in a 200×100 cm maze, with about 10-15 minutes per generation.

Page 34: 2D1431 Machine Learning

Robot & Sensors

• 6 binary sensors (4 antennae + 2 bumpers)
• 1 rotation sensor

Page 35: 2D1431 Machine Learning

External vs. Internal Fitness

External fitness:
• Cannot be measured by the robot itself (e.g. location in world coordinates)
• External observer perspective
• Useful in simulations

Internal fitness:
• Directly accessible to the robot by means of sensors (e.g. sensor readings, battery level)
• Useful when learning on the real robot
• Fitness function might be more difficult to design

Page 36: 2D1431 Machine Learning

Functional vs. Behavioral Fitness

Functional:
• Directly measures the way in which the system functions; observes the causes of a behavior
• Example: learn to generate a desired oscillatory pattern of leg motion

Behavioral:
• Measures the resulting behavior; observes the effects of the behavior
• Example: measure the absolute distance traveled by the robot using the rotation sensor

Page 37: 2D1431 Machine Learning

Explicit vs. Implicit Fitness

Explicit:
• Large number of constraints
• Actively steers the evolutionary system towards desired behaviors
• Problem: weighting and aggregating multiple constraints

Implicit:
• Small number of constraints
• Allows evolution of emergent, novel behaviors
• Problem: for complex behaviors (e.g. find cylinders, pick up cylinders, and drop them outside the arena), finding an initial behavior is like searching for a needle in a haystack

Page 38: 2D1431 Machine Learning

Behavior Representation

• The robot is controlled by the duration and direction of the left and right motor commands.
• Sensory states: s1, …, s6 (2^6 possible states, reduced to 9 different states)
• Control action: direction of left and right motor; duration of left and right motor action
• Mapping: for each of the nine different sensory states, the direction and duration of the left and right motor commands are encoded by one byte.

Page 39: 2D1431 Machine Learning

Sensor States to Motor Actions

Sensor state        Left motor action  Right motor action
S3: left bumper     0 [ms]             0 [ms]
S2: front bumper    50 [ms]            50 [ms]
S1: no contact      40 [ms]            70 [ms]

Page 40: 2D1431 Machine Learning

Sensor States to Motor Actions

Sensor state               Left motor action  Right motor action
S6: left antenna inward    30 [ms]            30 [ms]
S5: left antenna outward   60 [ms]            60 [ms]
S4: right bumper           30 [ms]            float 20 [ms]

(if the black vertical axle is pressed, this state is equivalent to S3)

Page 41: 2D1431 Machine Learning

Sensor States to Motor Actions

Sensor state                       Left motor action  Right motor action
S9: left & right antenna outward   60 [ms]            70 [ms]
S8: right antenna outward          70 [ms]            40 [ms]
S7: right antenna inward           20 [ms]            10 [ms]

(if the black vertical axle is pressed, this state is equivalent to S4)

Page 42: 2D1431 Machine Learning

Communication between RCX and PC

[Figure: the host computer is connected via a serial link to the IR communication tower, which communicates with the RCX IR port on the robot in the environment]

Page 43: 2D1431 Machine Learning

Behavior Evaluation

• The parameters of the robotic behavior are downloaded onto the LEGO robot.
• The robot performs the behavior for one minute.
• The number of rotations of the tracking wheel, equivalent to the distance traveled, is returned as the fitness.
• Based on the fitness, the evolutionary algorithm selects good behaviors and generates new candidate behaviors by means of recombination and mutation.
• Population size 10 individuals, 20 generations; one run of the evolutionary algorithm takes about 3-4 hours.

Page 44: 2D1431 Machine Learning

Evolved Behavior

• ..\..\..\Movies\p90913g2.mov

Page 45: 2D1431 Machine Learning

Evolution of a Wall-Following Behavior

• 2 light sensors
• 2 bumpers
• 1 rotation sensor

Page 46: 2D1431 Machine Learning

Sensor Characteristic

• Light sensor readings S1, S2 as a function of the distance to the obstacle

Page 47: 2D1431 Machine Learning

Behavior Representation and Fitness

• Neural network: ω = f(S1, S2, wij, θi)
• Turn rate ω → motor commands (forward/forward/backward, with durations ω ∆T and (1−ω) ∆T for the two motors)
• Genotype encodes: 7 ANN parameters {wij, θi}, 8 bits/parameter; motor commands for the collision states left and right bumper
• Fitness: absolute distance traveled (number of rotations)

Page 48: 2D1431 Machine Learning

Network Architectures

• Feed-forward network (purely reactive behaviors): inputs S1, S2 feed a hidden unit H and the turn-rate output ω via weights wij
• Recurrent network (dynamic behaviors): the activations X(t) of the units ω, H, S1, S2 feed back as inputs for X(t+1)
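A purely reactive feed-forward controller of this kind with one hidden unit has exactly 7 parameters (5 weights + 2 thresholds), matching the slide's count; the particular wiring below, with direct sensor-to-output connections and sigmoid units, is an assumption:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def turn_rate(s1, s2, w, theta):
    """omega = f(S1, S2, w_ij, theta_i): one hidden unit H plus an output unit.
    Assumed layout: w = [w_s1h, w_s2h, w_h_out, w_s1_out, w_s2_out],
    theta = [theta_h, theta_out] (7 parameters in total)."""
    h = sigmoid(w[0] * s1 + w[1] * s2 - theta[0])
    return sigmoid(w[2] * h + w[3] * s1 + w[4] * s2 - theta[1])
```

With all parameters at zero the controller outputs the neutral turn rate ω = 0.5.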

Page 49: 2D1431 Machine Learning

Evolved Behavior

..\..\..\Movies\PB251814.MOV

Page 50: 2D1431 Machine Learning

Distance Maximization

• The fitness function contains an additional penalty term for low proximity to obstacles (Si < Smin).

[Figure: trajectories evolved without the proximity penalty vs. with the proximity penalty]