MULTI-AGENT REINFORCEMENT LEARNING: Sparse Interactions
59

MULTI-AGENT REINFORCEMENT LEARNINGCQ-LEARNING : STATISTICAL TESTS 30 • Agents have been learning alone in the environment • Agent k acts independently using only local state information

Sep 24, 2020


dariahiddleston

MULTI-AGENT REINFORCEMENT LEARNING

1

Sparse Interactions

REINFORCEMENT LEARNING

• Agent acting in an unknown environment, learning to maximise a numerical reward signal

[Figure: agent-environment loop. In state s(t) the agent takes action a(t); the environment returns reward r(t+1) and the next state s(t+1), giving the sequence s(t), a(t), r(t+1), s(t+1), a(t+1), r(t+2), s(t+2), a(t+2), ...]

2

MARKOV DECISION PROCESS

• Single agent only

$M = \langle S, A, T, R \rangle$ with
• $S$: the set of states of the agent
• $A$: the set of actions the agent can take
• $T : S \times A \times S \rightarrow [0, 1]$: the transition function
• $R : S \times A \times S \rightarrow \mathbb{R}$: the reward function

3

Q-LEARNING

• Model-free reinforcement learning algorithm
• Stores Q-values for every state-action pair
• Update rule (learning rate $\alpha$, discount factor $\gamma$):

$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$

4
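A minimal tabular sketch of the update rule above, assuming a `defaultdict` Q-table keyed by (state, action) with zero initial values; the function name `q_update` and the default α and γ are illustration choices, not taken from the slides.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Q-table with every unseen (state, action) pair starting at 0.0
Q = defaultdict(float)
```

Note that the bootstrap term uses the maximum over next actions (the value of the greedy action), not the greedy action itself.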


SIMPLE EXAMPLE

5

RL WITH BOLTZMANN EXPLORATION

[Figure: steps to goal versus episodes for Agent 1 and Agent 2; left panel 0-1000 episodes (y-axis 0-100), right panel 0-5000 episodes (y-axis 0-4000).]

6
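Boltzmann (softmax) exploration picks action a with probability proportional to exp(Q(s, a)/τ), where τ is the temperature. A minimal sketch; the temperature is left to the caller, and subtracting the largest Q-value before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import math
import random


def boltzmann_probs(q_values, tau):
    """Softmax over Q-values with temperature tau > 0."""
    m = max(q_values)  # shift by the max to avoid overflow in exp
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, tau, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A high τ makes the choice nearly uniform; as τ approaches 0 the choice becomes greedy.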

RL WITH ε-GREEDY (ε = 0.9)

[Figure: steps to goal versus episodes for Agent 1 and Agent 2; left panel 0-1000 episodes, right panel 0-5000 episodes (y-axis 0-100 in both).]

7
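A minimal ε-greedy sketch. Conventions differ: here `epsilon` is assumed to be the probability of a uniformly random (exploratory) action, so the slide's ε = 0.9 would mean exploring 90% of the time under this reading; other texts use ε for the probability of acting greedily, and the slide does not say which is meant.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties between equally good actions at random
    return rng.choice([i for i, q in enumerate(q_values) if q == best])
```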

MULTI-AGENT REINFORCEMENT LEARNING

• Agents influence each other
• Possibly conflicting interests
• Observations
• Expensive communication

[Figure: agents 1 to n choose actions a1, a2, ..., an, which together form the joint action a(t); the environment returns the joint state s(t+1) to all agents and an individual reward r1(t+1), r2(t+1), ..., rn(t+1) to each agent.]

8

MARKOV GAMES

• $n$: the number of agents
• $S = \{s_1, \ldots, s_N\}$: a finite set of states
• $A = A_1 \times \ldots \times A_n$: the joint action set, with $A_k$ the action set of agent $k$
• $T : S \times A_1 \times \ldots \times A_n \times S \rightarrow [0, 1]$: the transition function
• $R_k : S \times A_1 \times \ldots \times A_n \times S \rightarrow \mathbb{R}$: the reward function of agent $k$

[Figure: Markov game trajectory. In state s(t) the agents take actions a1(t), ..., an(t), receive rewards r1(t+1), ..., rn(t+1) and the system moves to s(t+1); the process continues with a1(t+1), ..., an(t+1), rewards r1(t+2), ..., rn(t+2) and state s(t+2).]

9

SPARSE INTERACTIONS

• 1 agent: transitions and rewards depend on that agent only
• 2 agents far away and not interacting with each other: transitions and rewards are independent of the state/action of the other agent
• 2 agents close to each other and interacting: transitions and rewards are dependent

[Figure: a single agent with goal G; two agents with goals G1 and G2, first far apart and then close to each other.]

10

SPARSE INTERACTIONS

• 2 agents close to each other and interacting, i.e. transitions and rewards are dependent (goals G1 and G2)

Assumptions:
• Agents can do something useful alone
• Interactions are sparse, e.g. air traffic control, automated warehouses, ...

10

TAXONOMY BASED ON STRATEGIC INTERACTIONS

                                   Local state        Joint state
Independent actions                Single agent RL    MMDP-ILA (Vrancx et al. 2008), MG-ILA (Vrancx et al. 2008)
Joint action (view or selection)   JAL                Nash-Q, CE-Q, ..., SuperAgent

• State and actions must be communicated among agents
• State-action space is exponential in the number of agents

11
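The exponential growth can be made concrete by counting the dimensions of one agent's Q-table. This sketch assumes every agent has the same number of local states and actions (a simplification for illustration); `table_size` is a hypothetical helper. With 9 local states and 4 actions for two agents it gives 81 joint states and 16 joint actions, and with 43 local states it gives 1849 joint states, the figures reported in the JS and JSA rows of the experimental results tables.

```python
def table_size(n_states, n_actions, n_agents, joint_state, joint_action):
    """(rows, columns) of one agent's Q-table.

    joint_state / joint_action choose between the agent's local view and the
    product over all agents, which grows exponentially with n_agents.
    """
    rows = n_states ** n_agents if joint_state else n_states
    cols = n_actions ** n_agents if joint_action else n_actions
    return rows, cols
```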

TAXONOMY BASED ON STRATEGIC INTERACTIONS (continued)

The same taxonomy, extended with approaches that exploit sparse interactions:
• Utile Coordination (Kok et al. 2005)
• Learning of Coordination (Melo et al. 2009)
• 2Observe (De Hauwere et al. 2009)
• CQ-Learning (De Hauwere et al. 2010)
• FCQ-Learning (De Hauwere et al. 2011)

11

INTUITION OF SPARSE INTERACTIONS

When should agents observe the state information of other agents to avoid coordination problems?

Is there influence from another agent? Can another agent influence me?
• No: act independently, as if single-agent.
• Yes: use a multi-agent technique to coordinate.

[Figure: a single agent with goal G; two agents with goals G1 and G2, far apart and close to each other.]

12

MODELING INTERACTIONS

• Dynamics of the system are a Markov game
• Model sparse interactions as a DEC-SIMDP (Melo et al., 2010):

$\Gamma = \left( \{M_k\}, \{(M^{I,l}, S^{I,l})\} \right)$

• $M_k$: an MDP for each agent $k$ in the absence of other agents (containing local states)
• $(M^{I,l}, S^{I,l})$: a team Markov game $M^{I,l}$ for the local interaction between $K$ agents in the interaction states $S^{I,l}$ (containing system states), $l = 1, \ldots, L$

[Figure: a single agent with goal G, and two interacting agents with goals G1 and G2.]

13

OUTLINE

• Learning of Coordination
• 2Observe
• CQ-Learning
• FCQ-Learning
• Transfer learning

14


Learning of Coordination

15

LEARNING OF COORDINATION

• Add a pseudo COORDINATE action
• External active perception
• Cost for coordination

16
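A minimal sketch of greedy action selection with the pseudo COORDINATE action. Everything here is an illustrative assumption: the helper name `choose`, the dictionary Q-tables, the purely greedy choice and the cost of 0.05 are hypothetical. The slide only states that a COORDINATE action is added, that choosing it triggers external active perception, and that coordination has a cost. (The LOC rows in the results tables list 5 actions, presumably the 4 movement actions plus COORDINATE.)

```python
COORDINATE = "COORDINATE"  # pseudo-action added to the local action set


def choose(q_local, q_joint, local_s, observe_joint, actions, coord_cost=0.05):
    """Greedy choice over the augmented action set (hypothetical helper).

    q_local: dict (local_state, action or COORDINATE) -> value
    q_joint: dict (joint_state, action) -> value
    observe_joint: zero-argument callable standing in for active perception
    Returns (environment_action, coordination_penalty).
    """
    augmented = list(actions) + [COORDINATE]
    best = max(augmented, key=lambda a: q_local.get((local_s, a), 0.0))
    if best != COORDINATE:
        return best, 0.0  # act on local state information only
    joint_s = observe_joint()  # perceive the other agent, at a cost
    env_action = max(actions, key=lambda a: q_joint.get((joint_s, a), 0.0))
    return env_action, -coord_cost
```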


THE ALGORITHM

17


RESULTS

18


2Observe

19

PROBLEM SETTING

• Learn when to act upon sensory input
• Adaptive obstacle avoidance
• Save energy

20

INTERACTIONS AS A FUNCTION

• State space contains sensor data
• Sensor information is only partly relevant
• Interaction area is relative to the agent
• Special kind of sparse interactions, modeled as a DEC-LIMDP (Section 4.2)
• $I_k : S_k \rightarrow S_1 \times \ldots \times S_M$
• This function is approximated using a generalized learning automaton (GLA): 2Observe

21

SOLUTION METHOD: 2OBSERVE

Can another agent influence me? A GLA approximating the interaction function decides.
• No: act independently, as if single-agent. Single-agent Q-learning selects actions based on local state information.
• Yes: use a multi-agent technique to coordinate. A communication protocol between the agents avoids a collision in the next timestep.

22
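A hypothetical sketch of a two-action generalized learning automaton for the "Can another agent influence me?" decision. The slides do not give the GLA's parametrisation; this sketch assumes a logistic output over the features (1, |Δx|, |Δy|), where (Δx, Δy) is the other agent's position relative to this agent, trained with a REINFORCE-style update. The class name, features and learning rate are all illustration choices.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticGLA:
    """Hypothetical GLA: action 1 = coordinate, action 0 = act independently.

    P(action = 1 | dx, dy) = sigmoid(w . phi) with phi = (1, |dx|, |dy|).
    """

    def __init__(self, lr=0.1, seed=0):
        self.w = [0.0, 0.0, 0.0]
        self.lr = lr
        self.rng = random.Random(seed)

    @staticmethod
    def features(dx, dy):
        return (1.0, abs(dx), abs(dy))

    def prob_coordinate(self, dx, dy):
        z = sum(w * f for w, f in zip(self.w, self.features(dx, dy)))
        return _sigmoid(z)

    def act(self, dx, dy):
        return 1 if self.rng.random() < self.prob_coordinate(dx, dy) else 0

    def update(self, dx, dy, action, reward):
        # REINFORCE step: gradient of log P(action) w.r.t. w is (action - p) * phi
        p = self.prob_coordinate(dx, dy)
        for i, f in enumerate(self.features(dx, dy)):
            self.w[i] += self.lr * reward * (action - p) * f
```

A positive reward after an action makes that action more likely for similar relative positions; a negative reward (for example a collision, or the cost of an unnecessary coordination) makes it less likely.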

EXPERIMENTAL SETTING

• Reach the goal
• Avoid collisions

23

EXPERIMENTAL RESULTS (TUNNELTOGOAL)

[Figure: three panels over 0-10,000 episodes: steps to goal (0-50) and collisions (0-8) for Independent Q-learning, joint-state learners, MMDP and 2Observe; coordinations (0-8) performed by 2Observe.]

24



EXPERIMENTAL RESULTS (2) (TUNNELTOGOAL)

• Interactions are relative to the agent
• A GLA can approximate this interaction area

25

CQ-Learning

[Figure: two agents with goals G1 and G2.]

26

PROBLEM SETTING

• Agents only interact where their policies interfere
• Locally adapt the policy

[Figure: a single agent with goal G; two agents with goals G1 and G2.]

27

REPRESENTATION IDEA

[Figure: a local state space with states 1-9. An arrow labelled "Expand" turns local states 4 and 6 into the augmented states 4-1, 4-2, 4-3 and 6-1, 6-2; a second arrow is labelled "Generalise".]

28
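One way to hold this representation in code, as a hypothetical sketch: a local Q-table for agent k plus augmented entries for the few (local state, other agent's state) pairs that have been expanded, with lookups falling back to the local entry everywhere else. The class name and the choice to initialise augmented entries from the local Q-values are assumptions, not something the slide states.

```python
from collections import defaultdict


class SparseQTable:
    """Local Q-values plus augmented entries for expanded states only."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.local = defaultdict(float)      # (s_k, a) -> Q
        self.augmented = defaultdict(float)  # ((s_k, s_l), a) -> Q
        self.expanded = set()                # (s_k, s_l) pairs, e.g. (4, 1) for "4-1"

    def expand(self, s_k, s_l):
        # Start the augmented entries from the local values (an assumption)
        self.expanded.add((s_k, s_l))
        for a in self.actions:
            self.augmented[((s_k, s_l), a)] = self.local[(s_k, a)]

    def q(self, s_k, s_l, a):
        if (s_k, s_l) in self.expanded:
            return self.augmented[((s_k, s_l), a)]
        return self.local[(s_k, a)]  # not expanded: ignore the other agent
```

Only the expanded pairs cost extra memory, so the table stays close to the size of the single-agent table when interactions are sparse.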

SOLUTION METHOD: CQ-LEARNING

Can another agent influence me? A statistical test on the rewards decides.
• No: act independently, as if single-agent. Single-agent Q-learning selects actions based on local state information.
• Yes: use a multi-agent technique to coordinate. Q-learning is based on the combination of local state information and the state information of another agent.

29

CQ-LEARNING: STATISTICAL TESTS

• Agents have been learning alone in the environment
• Agent k acts independently using only local state information ($s_k$) in a multi-agent environment
• Perform a statistical test against a baseline
• When the test detects a change, agent k samples its rewards based on the state information of other agents and performs the same test

Rewards observed by agent k per local state, and the expected reward from learning alone:

                   $s_k^1$   $s_k^2$   $s_k^3$   $s_k^4$
Samples            11.0      20.0      15.0      10.0
                   10.0      20.0      15.0      19.0
                    9.0      20.0      14.8       9.0
                   10.0      19.0      15.0      20.0
                   11.0      20.0      14.9      20.0
                   ...       ...       ...       ...
Expected reward    10.0      20.0      15.0      20.0

Rewards in $s_k^4$, sampled per state of agent l:

                   $s_l^1$   $s_l^2$   $s_l^3$
                   20.0      20.0      10.0
                   19.0      20.0       9.0
                   20.0      20.0      10.0
                   ...       ...       ...

Expand: $s_k^4 \Rightarrow \langle s_k^4, s_l^3 \rangle$

30
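The slides do not name the statistical test. The sketch below assumes Welch's two-sample t-test with a fixed, hypothetical threshold (`t_crit = 2.0`) in place of a proper p-value lookup; the function names are illustrative. Stage 1 compares recent rewards against the baseline; stage 2 repeats the test per state of agent l to find which states should be expanded.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # Both samples are constant: either no difference or an unbounded one
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def rewards_changed(baseline, recent, t_crit=2.0):
    """Stage 1: do the recent rewards for (s_k, a) differ from the baseline?"""
    return abs(welch_t(baseline, recent)) > t_crit


def states_to_expand(baseline, rewards_by_other_state, t_crit=2.0):
    """Stage 2: which states s_l of agent l explain the change?"""
    return sorted(
        s_l
        for s_l, samples in rewards_by_other_state.items()
        if len(samples) >= 2 and rewards_changed(baseline, samples, t_crit)
    )
```

For example, with a made-up baseline window `[20.0, 19.0, 20.0, 20.0, 20.0]` for $s_k^4$ (matching its expected reward of 20.0) and the per-$s_l$ samples from the slide, `states_to_expand` returns `['sl3']`, i.e. only $s_l^3$ is flagged.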

CQ-LEARNING BASELINE FOR STATISTICAL TESTS

[Figure: steps to goal (0-100) versus episodes (0-1000) for Agent 1 and Agent 2.]

31

CQ-LEARNING BASELINE FOR STATISTICAL TESTS

• Initial rewards (sliding window) for a particular state-action pair: $W_k^1$
• Compare $W_k^2$ against $W_k^1$

[Figure: steps to goal (0-100) versus episodes (0-1000) for Agent 1 and Agent 2, with the windows $W_k^1$ and $W_k^2$ marked.]

31
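A sketch of the two windows for one state-action pair, under stated assumptions: `RewardWindows` is a hypothetical helper, W1 stores the first `size` rewards observed and W2 the `size` most recent rewards received after W1 is full. Once both are full they can be passed to a two-sample test.

```python
from collections import deque


class RewardWindows:
    """Baseline window W1 and sliding window W2 for one state-action pair."""

    def __init__(self, size):
        self.size = size
        self.w1 = []                   # first `size` rewards (baseline)
        self.w2 = deque(maxlen=size)   # oldest reward drops out automatically

    def add(self, reward):
        if len(self.w1) < self.size:
            self.w1.append(reward)
        else:
            self.w2.append(reward)

    def ready(self):
        """True once both windows are full and can be compared."""
        return len(self.w1) == self.size and len(self.w2) == self.size
```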

EXPERIMENTAL RESULTS (1)

Env              Alg     #states       #actions   #collisions   #steps
Grid game 2      Indep   9             4          2.7           22.2 ± 17.9
(min steps: 3)   JS      81            4          0.1           4.0 ± 0.2
                 JSA     81            16         0.0           4.7 ± 0.1
                 LOC     9.9 ± 0.5     5          0.1           4.0 ± 0.4
                 CQ      10 ± 0.0      4          0.0           3.6 ± 0.3
                 CQ NI   10.9 ± 2.0    4          0.1           4.0 ± 0.3

Env              Alg     #states       #actions   #collisions   #steps
ISR              Indep   43            4          0.4           9.3 ± 44.8
(min steps: 4)   JS      1849          4          0.1           5.7 ± 1.6
                 JSA     1849          16         0.0           7.6 ± 1.4
                 LOC     51.3 ± 82.3   5          0.2           6.7 ± 7.5
                 CQ      49.0 ± 2.3    4          0.1           5.1 ± 0.7
                 CQ NI   49.9 ± 7.8    4          0.1           6.0 ± 1.9

32

EXPERIMENTAL RESULTS (2)

• Sample run

[Figure: sample run with two agents and goals G1 and G2.]

33

FCQ-Learning

[Figure: two agents with goals G1 and G2.]

34

PROBLEM SETTING

• Reflected in the immediate reward signal
• Too late to solve the problem

[Figure: agents 1 and 2 in two situations, with reward +20 and reward +10.]

35

DETECTING RELEVANT STATES

• Changes in the reward signal are reflected in the Q-values

[Figure: values (0-20) over 0-500 episodes for the pairs (12,4), (13,4), (14,4), (15,4), (10,3), (9,2), (8,2), (7,2), (6,2) and (1,3), in an environment with goal G.]

36

FCQ-LEARNING STATISTICAL TESTS

• Agent k has been learning alone, and its Q-values have converged
• Agent k acts independently using only local state information ($s_k$) in a multi-agent environment
• Performs a statistical test against the single-agent Q-values
• Samples rewards (Monte Carlo) and performs a comparison test to determine what information should be included

                   $s_k^1$   $s_k^2$   $s_k^3$   $s_k^4$
Samples            11.1      20.0      15.0      20.0
                   10.9      19.9      15.0      18.8
                   11.0      19.9      14.8      17.4
                   11.1      20.0      15.0      16.1
                   11.0      20.0      14.9      15.9
                   ...       ...       ...       ...
Learned Q-value    11.0      20.0      15.0      20.0

Samples in $s_k^4$, per state of agent l:

                   $s_l^1$   $s_l^2$   $s_l^3$
                   20.0      20.0      10.0
                   19.0      20.0       9.0
                   20.0      20.0      10.0
                   ...       ...       ...

Expand: $s_k^4 \Rightarrow \langle s_k^4, s_l^3 \rangle$

37
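The slide does not spell out the test. As a minimal sketch, assume agent k computes discounted Monte Carlo returns after each visit to a state-action pair and runs a one-sample t-test against the converged single-agent Q-value; the function names and the fixed threshold `t_crit = 2.0` are hypothetical.

```python
import math
from statistics import mean, stdev


def discounted_return(rewards, gamma):
    """Monte Carlo return G = r_1 + gamma * r_2 + gamma^2 * r_3 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_value_contradicted(q_value, returns, t_crit=2.0):
    """One-sample t-test: do the sampled returns disagree with the learned Q-value?"""
    if len(returns) < 2:
        return False  # not enough samples to estimate the spread
    se = stdev(returns) / math.sqrt(len(returns))
    diff = mean(returns) - q_value
    if se == 0.0:
        return diff != 0.0
    return abs(diff / se) > t_crit
```

Because the returns include future rewards, a state can be flagged before the conflict shows up in the immediate reward, which is the motivation given on the FCQ problem-setting slide.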

EXPERIMENTAL RESULTS

Environment   Algorithm   #states        #actions   #collisions   #steps              reward
Grid game 2   Indep       9              4          2.4 ± 0.0     22.7 ± 30.4         -24.3 ± 35.6
              JS          81             4          0.1 ± 0.0     6.3 ± 0.3           18.2 ± 0.6
              LOC         9.0 ± 0.0      5          1.8 ± 0.0     10.3 ± 2.7          -6.8 ± 8.0
              FCQ         19.4 ± 4.4     4          0.1 ± 0.0     8.1 ± 13.9          17.6 ± 3.7
              FCQ NI      21.7 ± 3.1     4          0.1 ± 0.0     7.1 ± 6.9           17.9 ± 0.7

Environment   Algorithm   #states        #actions   #collisions   #steps              reward
Bottleneck    Indep       43             4          n.a.          n.a.                n.a.
              JS          1849           4          0.0 ± 0.0     23.3 ± 30.8         13.1 ± 36.1
              LOC         54.0 ± 0.8     5          1.7 ± 0.6     167.2 ± 19,345.1    -157.5 ± 10,327.0
              FCQ         124.5 ± 32.8   4          0.1 ± 0.0     17.3 ± 1.3          16.6 ± 0.4
              FCQ NI      135.0 ± 88.7   4          0.2 ± 0.0     19.2 ± 5.6          15.4 ± 2.3

38

EXPERIMENTAL RESULTS

• Order to reach the goal:
  • Red agent
  • Blue agent
  • Green agent

[Figure: three agents, with rewards +20, +20 and +20.]

39

Transfer Learning

[Figure: a source agent and a target agent, each running the 2Observe algorithm: a generalized learning automaton decides whether coordination is needed; if it is not, the agent uses single-agent Q-learning, and if it is, the agents coordinate through communication.]

40

TRANSFER LEARNING

“Transfer of learning occurs when learning in one context enhances (positive transfer) or undermines (negative transfer) a related performance in another context.”

(D. Perkins, G. Salomon, Transfer of Learning, 1992, International Encyclopedia of Education)

41

MOTIVATIONS FOR TRANSFER LEARNING

• Learning tabula rasa can be extremely slow
• Lots of data / time may be needed
• Every algorithm has biases: why use an uninformed bias?
• Humans always use past knowledge

• What knowledge is relevant?
• How can it be effectively leveraged?

42

TRANSFER LEARNING WITH 2OBSERVE

[Figure: a source agent and a target agent, each running the 2Observe algorithm (a generalized learning automaton decides whether coordination is needed; single-agent Q-learning when it is not, coordination through communication when it is), next to the 2Observe decision scheme.]

Is there influence from another agent? Can another agent influence me? A GLA approximating the interaction function decides.
• No: act independently, as if single-agent. Single-agent Q-learning selects actions based on local state information.
• Yes: use a multi-agent technique to coordinate. A communication protocol between the agents avoids a collision in the next timestep.

43

RESULTS

[Figure: three panels of the number of steps to goal versus iterations (0-10,000) for Agent 1, Agent 2 and Agent 3.]

44

RESULTS (COORDINATION)

[Figure: three panels of the number of collisions and coordinations versus iterations (0-10,000).]

45

GENERALISATION WITH CQ-LEARNING

[Figure: a neural network over Δ(x), Δ(y), a1 and a2 with output 0 | 1 generalises over the local state space (states 1-9), in which "Expand" turns states 4 and 6 into 4-1, 4-2, 4-3 and 6-1, 6-2; a second arrow is labelled "Generalise". Caption: Generalisation learned with 2Observe.]

GENERALISATION WITH CQ-LEARNING

[Figure: a neural network over Δ(x), Δ(y), a1 and a2 with output 0 | 1 generalises over the local state space (states 1-9), in which "Expand" turns states 4 and 6 into 4-1, 4-2, 4-3 and 6-1, 6-2; a second arrow is labelled "Generalise". Caption: Generalisation learned with CQ-learning.]

GENERALISATION WITH CQ-LEARNING (2)

[Figure panels: safe initialisation; danger initialisation.]

47

GENERALISATION WITH CQ-LEARNING (2)

[Figure panels: EAST, WEST, NORTH, SOUTH.]

48

TRANSFER LEARNING WITH CQ-LEARNING

[Figure: in the source task, CQ-learning augments the state space of agent k (states 1-9) with the state space of agent l (states 1-9); a rule learning system (Ripper) generalises the result, and the trained classifier is transferred together with the Qaug-table. In the target task, the trained classifier decides whether coordination is needed: if it is not, the agent uses single-agent Q-learning; if it is, it uses Q-learning initialised with the Qaug-table.]

49
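A hypothetical sketch of how the target agent could pick its initial action values under this scheme. The names are invented for illustration: `classifier` stands in for the trained classifier (any callable returning True when coordination is needed), and the transferred Qaug-table is assumed to be keyed by the relative offset between the two agents so that it can be reused in a different environment; the slide does not specify the key.

```python
def target_q_values(local_q, q_aug, classifier, s_k, offset, actions):
    """Initial action values for agent k in local state s_k (hypothetical).

    local_q: dict (s_k, action) -> value from single-agent Q-learning
    q_aug: transferred dict (offset, action) -> value
    offset: the other agent's position relative to agent k, e.g. (dx, dy)
    """
    if classifier(offset):
        # Coordination needed: start from the transferred augmented values,
        # falling back to the local value where the Qaug-table has no entry
        return {a: q_aug.get((offset, a), local_q.get((s_k, a), 0.0)) for a in actions}
    # Coordination not needed: plain single-agent values
    return {a: local_q.get((s_k, a), 0.0) for a in actions}
```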


TRANSFER LEARNING WITH CQ-LEARNING (2)

50

RESULTS

[Figure: three panels of steps to goal (0-300) versus episodes (0-1000), comparing CQ-learning with transfer learning.]

51

RESULTS (2)

[Figure: three panels of collisions (0.0-2.0) versus episodes (0-1000), comparing CQ-learning with transfer learning.]

52

CONCLUSIONS

• In multi-agent environments with sparse interactions, learning these interaction states improves the learning process
• Interaction states can be learned through increased penalties for miscoordination
• A GLA can approximate interaction areas relative to the agent
• Interaction states can be identified using statistical tests on the reward signal (immediate + future)
• Information about interaction states can be generalized and transferred between agents and environments

53