Adaptive Routing: Reinforcement Learning Approaches

Dec 21, 2015

Transcript
Page 1: Adaptive Routing: Reinforcement Learning Approaches

Adaptive Routing

Reinforcement Learning Approaches

Page 2: Adaptive Routing: Reinforcement Learning Approaches

Contents

- Routing Protocols
- Reinforcement Learning
- Q-Routing
- PQ-Routing
- Ant Routing
- Summary

Page 3: Adaptive Routing: Reinforcement Learning Approaches

Routing Classification

Centralized:
- A main controller updates all nodes' routing tables.
- Suitable for small networks.

Distributed:
- Route computation is shared among nodes by exchanging routing information.
- Fault tolerant.
- Widely used.

Page 4: Adaptive Routing: Reinforcement Learning Approaches

Routing Classification…

Static:
- Routing is based only on source and destination.
- The current network state is ignored.

Adaptive:
- Adapts the routing policy to time and traffic.
- More attractive.
- Can cause oscillations in paths.

Page 5: Adaptive Routing: Reinforcement Learning Approaches

Routing Classification Based on Optimization

Minimal / Non-Minimal

Shortest Path / Optimal

Distance Vector (DV) / Link State (LS)

Page 6: Adaptive Routing: Reinforcement Learning Approaches

Shortfalls of Static Routing

Dynamic networks are subject to the following changes:
- Topology changes, as nodes are added and removed
- Traffic patterns change cyclically
- Overall network load changes

So, routing algorithms that assume that the network is static don’t work in this setting

Page 7: Adaptive Routing: Reinforcement Learning Approaches

Tackling Dynamic Networks

Periodic Updates? Routing traffic? When to update?

Adaptive Routing’s the Answer?

Page 8: Adaptive Routing: Reinforcement Learning Approaches

Reinforcement Learning

- An agent playing against an opponent (e.g., Chess and Tic-Tac-Toe)
- Learning a value function

Page 9: Adaptive Routing: Reinforcement Learning Approaches

Learning a Value Function

Temporal difference: V(e) = V(e) + K [ V(g) − V(e) ]

Example: with V(e) = 0.5, V(g) = 1 and K = 0.4, we get V(e) = 0.5 + 0.4 (1 − 0.5) = 0.7

Exploration vs. exploitation: e and e*
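As a rough illustration of the temporal-difference update above, here is a minimal Python sketch (the dictionary-based value table and the function name are illustrative assumptions, not from the slides):

```python
def td_update(values, state, next_state, k=0.4):
    """One temporal-difference step: V(e) <- V(e) + K * (V(g) - V(e))."""
    values[state] += k * (values[next_state] - values[state])
    return values[state]

# Worked example from this page: V(e) = 0.5, V(g) = 1, K = 0.4
values = {"e": 0.5, "g": 1.0}
print(td_update(values, "e", "g", k=0.4))  # 0.7
```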

Page 10: Adaptive Routing: Reinforcement Learning Approaches

Reinforcement Learning - Networks

[Figure: a source S connected to a destination D through a network of routers R]

V(s) = V(s) + K [ V(s') − V(s) ]

Page 11: Adaptive Routing: Reinforcement Learning Approaches

Q-Routing

Qx(d, y) is the time that node x estimates it will take to deliver a packet to node d through its neighbor y.

When y receives the packet, it sends back a message (to node x) containing its (i.e. y's) best estimate of the time remaining to get the packet to d, i.e. t = min(Qy(d, z)) over all neighbors z of y.

x then updates Qx(d, y) by:

[Qx(d, y)]new = [Qx(d, y)]old + K · (s + q + t − [Qx(d, y)]old)

where
- s = RTT from x to y
- q = time spent in the queue at x
- t = the new estimate reported by y

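A minimal Python sketch of this update rule at node x (the dictionary-keyed Q-table, the function name, and the default learning rate are illustrative assumptions, not part of the original algorithm description):

```python
def q_routing_update(q_table, d, y, s, q_delay, t, k=0.5):
    """Q-Routing update at node x:
    Qx(d, y) <- Qx(d, y) + K * ((s + q + t) - Qx(d, y)),
    where s is the RTT from x to y, q_delay the time spent queued at x,
    and t is y's best remaining-time estimate, min over z of Qy(d, z)."""
    old = q_table[(d, y)]
    q_table[(d, y)] = old + k * ((s + q_delay + t) - old)
    return q_table[(d, y)]
```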

Page 12: Adaptive Routing: Reinforcement Learning Approaches

Q-Routing…

[Figure: node x forwards a packet for destination d through neighbor y; an alternative neighbor w also leads to d. x's routing table before the update: Dest d, Next y, Qx(d, y) = 20; Dest d, Next w, Qx(d, w) = 21. y's estimates through its own neighbors: Qy(d, z1) = 25, Qy(d, z2) = 17, Qy(d, ze) = 70, so y reports t = min(Qy(d, z)) = 17. The measured RTT from x to y is s = 11.]

With K = 0.25, x updates: Qx(d, y) += 0.25 · [(11 + 17) − 20], giving Qx(d, y) = 22.
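As a quick arithmetic check of this example (a standalone Python snippet; the queueing delay q is taken to be 0, which the example leaves implicit):

```python
q_old, k = 20, 0.25
s, q_delay = 11, 0
t = min(25, 17, 70)                     # y's best estimate over its neighbors z
q_new = q_old + k * ((s + q_delay + t) - q_old)
print(q_new)                            # 22.0, the updated Qx(d, y)
```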

Page 13: Adaptive Routing: Reinforcement Learning Approaches

Results

Page 14: Adaptive Routing: Reinforcement Learning Approaches

Shortfalls

A shortest-path algorithm performs better than Q-Routing under low load.

Failure to converge back to shortest paths when network load decreases again.

Failure to explore new shortcuts

Page 15: Adaptive Routing: Reinforcement Learning Approaches

Shortfalls…

[Figure: the same topology as before. x's routing table now reads: Dest d, Next y, Qx(d, y) = 22; Dest d, Next w, Qx(d, w) = 21, so packets for d are sent via w. y's estimates are unchanged: Qy(d, z1) = 25, Qy(d, z2) = 17, Qy(d, ze) = 70.]

Even if the route via y improves later (e.g., Qx(d, y) drops back to 20), it never gets used until the route via w becomes congested.

Page 16: Adaptive Routing: Reinforcement Learning Approaches

Predictive Q-Routing

ΔQ = s + q + t − [Qx(d, y)]old

[Qx(d, y)]new = [Qx(d, y)]old + K · ΔQ

Bx(d, y) = min[ Bx(d, y), Qx(d, y) ]

If (ΔQ < 0)                                  // path is improving
    ΔR = ΔQ / (currentTime − lastUpdatedTime)
    Rx(d, y) = Rx(d, y) + β · ΔR             // decrease in R
Else
    Rx(d, y) = γ · Rx(d, y)                  // increase of R
End If

lastUpdatedTime = currentTime

Page 17: Adaptive Routing: Reinforcement Learning Approaches

PQ-Routing Policy…

Finding neighbour y:

For each neighbour y of x
    Δt = currentTime − lastUpdatedTime
    Qx_pred(d, y) = Qx(d, y) + Δt · Rx(d, y)
Choose the y with minimum Qx_pred(d, y)
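A minimal sketch that puts the update rule from the previous page and this selection policy together. The class layout, the parameter values K, β, γ, and the use of wall-clock time are illustrative assumptions, not prescribed by the slides:

```python
import time

class PQRouter:
    """Sketch of the Predictive Q-Routing state kept at one node x for one destination d."""

    def __init__(self, neighbors, k=0.7, beta=0.7, gamma=0.9):
        self.k, self.beta, self.gamma = k, beta, gamma
        self.q = {y: 0.0 for y in neighbors}       # Qx(d, y): estimated delivery time via y
        self.best = dict(self.q)                   # Bx(d, y): best Qx(d, y) seen so far
        self.rate = {y: 0.0 for y in neighbors}    # Rx(d, y): recovery rate of the path via y
        self.last_update = {y: time.time() for y in neighbors}

    def update(self, y, s, q_delay, t, now=None):
        """Apply the PQ-Routing update after neighbor y reports its best estimate t."""
        now = time.time() if now is None else now
        dq = (s + q_delay + t) - self.q[y]
        self.q[y] += self.k * dq
        self.best[y] = min(self.best[y], self.q[y])
        dt = max(now - self.last_update[y], 1e-9)  # guard against a zero time step
        if dq < 0:                                 # path is improving
            self.rate[y] += self.beta * (dq / dt)  # decrease in R
        else:
            self.rate[y] *= self.gamma             # increase of R (decay toward 0)
        self.last_update[y] = now

    def choose_next_hop(self, now=None):
        """Predict each Q forward in time and pick the neighbor with the smallest prediction."""
        now = time.time() if now is None else now
        return min(self.q, key=lambda y: self.q[y] + (now - self.last_update[y]) * self.rate[y])
```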

Page 18: Adaptive Routing: Reinforcement Learning Approaches

PQ-Routing Results

Performs better than Q-Routing under low, high and varying network loads.

Adapts faster if "probing of inactive paths" for shortcuts is introduced.

Under high loads, behaves like Q-Routing. Uses more memory than Q-Routing.

Page 19: Adaptive Routing: Reinforcement Learning Approaches

Comparison – Low Load

Page 20: Adaptive Routing: Reinforcement Learning Approaches

Ant Routing: Stigmergy - Inspirations from Nature…

Ants:
- Sort brood and food items
- Explore particular areas for food, and preferentially exploit the richest available food source
- Cooperate in carrying large items
- Leave pheromones on their way back
- Always find the shortest paths to their nests or food sources
- Are blind, cannot foresee the future, and have very limited memory

Page 21: Adaptive Routing: Reinforcement Learning Approaches

Ants

Each router x in the network maintains, for each destination node d, a list of the form <d, <y1, p1>, <y2, p2>, …, <ye, pe>>, where y1, y2, …, ye are the neighbors of x, and p1 + p2 + … + pe = 1.

This is a parallel (multi-path) routing scheme

This also multiplies the number of degrees of freedom the system has by a factor of |E|

Page 22: Adaptive Routing: Reinforcement Learning Approaches

Ants…

Every destination host hd periodically generates an “ant” to a random source host hs

An “ant” is a 3-tuple of the form: < hd, hs, cost>

cost is a counter of the cost of the path the ant has covered so far

Page 23: Adaptive Routing: Reinforcement Learning Approaches

Ant Routing Example

[Figure: an example network with nodes 0, 1, 2, 3 and 4.]

Routing Table for node 1 (rows: destination node, columns: next node):

Dest \ Next     0       2       3
0               0.33    0.33    0.33
2               0.33    0.33    0.33
3               0.26    0.48    0.26
4               0.33    0.33    0.33

An ant < 4, 0, cost > travels from destination host 4 toward source host 0.

Page 24: Adaptive Routing: Reinforcement Learning Approaches

Ants: Update

When a router x receives an ant < hd, hs, cost> from neighbor yi, it:

1. Updates cost by the cost of traversing the link from x to yi (i.e. the cost of the link in reverse)

2. Updates its entry for host hd (<hd, <y1, p1>, <y2, p2>, …, <ye, pe>>):

   pi = (pi + p) / (1 + p)

   pj = pj / (1 + p), for j ≠ i

   where p = k / cost, for some constant k

   (this renormalizes the probabilities so that they again sum to 1)
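A minimal Python sketch of this update. The dictionary layout, the constant k, and the example numbers (which reuse the routing table for node 1 from page 23, with the ant assumed to arrive via neighbor 3 over a link of cost 1) are illustrative assumptions:

```python
def ant_update(prob_table, link_cost, ant, from_neighbor, k=1.0):
    """Process an ant <hd, hs, cost> arriving at router x from neighbor yi.

    prob_table[hd] maps each neighbor y of x to its probability p;
    link_cost[yi] is the cost of the link between x and yi (traversed in reverse)."""
    hd, hs, cost = ant
    cost += link_cost[from_neighbor]          # 1. add the cost of the traversed link
    p = k / cost                              # reinforcement, larger for cheaper paths
    probs = prob_table[hd]
    for y in probs:                           # 2. reinforce yi, then renormalize
        bump = p if y == from_neighbor else 0.0
        probs[y] = (probs[y] + bump) / (1.0 + p)
    return (hd, hs, cost)                     # the ant travels on with the updated cost

# Example: routing table for node 1, ant <4, 0, cost> arriving from neighbor 3
table = {4: {0: 1/3, 2: 1/3, 3: 1/3}}
ant_update(table, link_cost={3: 1.0}, ant=(4, 0, 1.0), from_neighbor=3)
print(table[4])   # neighbor 3's probability grows, the others shrink, and they still sum to 1
```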

Page 25: Adaptive Routing: Reinforcement Learning Approaches

Ants: Propagation

Two sub-species of ants:

Regular Ants:

P( ant sent to yi ) = pi

Uniform Ants:

P( ant sent to yi ) = 1 / e

Regular ants use learned tables to route ants

Uniform ants explore randomly
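A small sketch of the two forwarding rules (function names are illustrative; the probabilities are assumed to be stored in a dict keyed by neighbor):

```python
import random

def forward_regular(probs):
    """Regular ant: the next hop yi is drawn with probability pi from the learned table."""
    neighbors, weights = zip(*probs.items())
    return random.choices(neighbors, weights=weights)[0]

def forward_uniform(probs):
    """Uniform ant: each of the e neighbors is chosen with probability 1/e."""
    return random.choice(list(probs))
```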

Page 26: Adaptive Routing: Reinforcement Learning Approaches

Ants: Comparison

Regular Ants:
- Explore the best paths very thoroughly, and the others hardly at all
- Propagate "bad news" extremely quickly and "good news" extremely slowly
- Tend to find shortest paths
- Converge to Q-Routing in a static network

Uniform Ants:
- Explore all paths equally
- Propagate "good" and "bad" news equally fast
- Are natural parallel (multi-path) routers
- Do not converge to Q-Routing

Page 27: Adaptive Routing: Reinforcement Learning Approaches

Q-Routing vs. Ants

Q-Routing only changes its currently selected route when the cost of that route increases, not when the cost of an alternate route decreases

Q-Routing involves overhead linear in the volume of traffic in the network; ants are effectively free in moderate traffic

Q-Routing cannot route messages by parallel paths; uniform ants can

Page 28: Adaptive Routing: Reinforcement Learning Approaches

Ants with Evaporation

Evaporation mimics a real-life phenomenon: the pheromone laid by real ants evaporates over time.

Link-usage statistics are used to compute an evaporation factor E(x).

It is the proportion of ants received from node x out of the total number of ants received by the current node.

[Slide formulas: E(x) relates ant_send(x) to the total Σ i=0..N ant_send(i); the routing probabilities P(i) are then scaled by E(x) and renormalized so that Σ P(i) = 1.]

Page 29: Adaptive Routing: Reinforcement Learning Approaches

Summary

Routing algorithms that assume a static network don’t work well in real-world networks, which are dynamic

Adaptive routing algorithms avoid these problems, at the cost of a linear increase in the size of the routing tables

Q-Routing is a straightforward application of Q-Learning to the routing problem

Routing with ants is more flexible than Q-Routing

Page 30: Adaptive Routing: Reinforcement Learning Approaches

References

Boyan, J., & Littman, M. (1994). Packet routing in dynamically changing networks: A reinforcement learning approach. In Advances in Neural Information Processing Systems 6 (NIPS 6), pp. 671-678. San Francisco, CA: Morgan Kaufmann.

Di Caro, G., & Dorigo, M. (1998). Two ant colony algorithms for best-effort routing in datagram networks. In Proceedings of the Tenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS'98), pp. 541-546. IASTED/ACTA Press.

Choi, S., & Yeung, D.-Y. (1996). Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. In Advances in Neural Information Processing Systems 8 (NIPS 8), pp. 945-951. MIT Press.

Dorigo, M., Maniezzo, V., & Colorni, A. (1996). The ant system: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26(1), 29-41.