Reinforcement Learning Based Mobility Adaptive Routing for Vehicular Ad-Hoc Networks

Jinqiao Wu · Min Fang · Xiao Li
School of Computer Science and Technology, Xidian University, No. 2, South Taibai Street, Xi'an 710071, Shaanxi, People's Republic of China
Published online: 7 May 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Wireless Pers Commun (2018) 101:2143–2171, https://doi.org/10.1007/s11277-018-5809-z

Keywords: VANET · Adaptive routing · Reinforcement learning · Q-Learning
Abstract Vehicular ad-hoc networks (VANETs) are drawing more and more attention in intelligent transportation systems as a means to reduce road accidents and assist safe driving. However, due to the high mobility and uneven distribution of vehicles in VANETs, multi-hop communication between vehicles remains particularly challenging. Considering the distinctive characteristics of VANETs, this paper proposes an adaptive routing protocol based on reinforcement learning (ARPRL). Through a distributed Q-Learning algorithm, ARPRL proactively learns fresh network link status from periodic HELLO packets, in the form of Q-table updates, which improves its dynamic adaptability to network changes. Novel Q-value update functions that take vehicle mobility into account are designed to reinforce the Q values of wireless links through the exchange of HELLO packets between neighboring vehicles. To avoid the routing loops that can arise during Q-learning, the HELLO packet structure is redesigned. In addition, a reactive route probe strategy is applied during learning to speed up the convergence of Q-learning. Finally, feedback from the MAC layer is used to further adapt Q-learning to the VANET environment. Simulation results show that ARPRL outperforms existing protocols in terms of average packet delivery ratio, end-to-end delay and number of route hops, while its network overhead remains within an acceptable range.
Fig. 6 Average packet delivery ratio versus number of vehicles
Figure 6 shows that the average packet delivery ratio (APDR) of each protocol increases as the vehicle density increases when the number of vehicles is less than 350 (for QROUTING, 300), because the connectivity of the network improves as the node density grows. However, the APDR decreases slightly with increasing vehicle density when the number of vehicles exceeds 300 (for QROUTING, 250), because the higher the node density, the greater the possibility of channel collisions. In general, ARPRL outperforms all of the other protocols because it considers link reliability and vehicle mobility in the dynamic Q-learning process. GPSR relies only on the locations of neighboring vehicles to select the next hop, and thus easily falls into local optima; therefore, GPSR has the lowest APDR when the number of vehicles is less than 300. QROUTING has the lowest APDR when the number of vehicles is more than 300, because of routing loops and hence excessive collisions. QLAODV performs better than AODV at low and medium vehicle density by continuously learning the network status through broadcast HELLO packets; at high vehicle density, the result is reversed because of QLAODV's high overhead in that regime. ARPRL performs better than QLAODV in all cases, because ARPRL improves on QLAODV through periodic learning, on-demand route probing and MAC-layer feedback. On average, ARPRL improves the APDR by 23.4% and 22.6% compared with QLAODV and AODV, respectively.
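The "dynamic Q learning process" credited above for ARPRL's advantage can be pictured with a standard Q-routing style update, where each neighbor's best Q value for a destination arrives in its periodic HELLO packet. The following is a sketch under assumptions: the reward shaping and the `mobility_factor` weighting are illustrative, not the paper's published update function.

```python
# Hypothetical sketch of a distributed, mobility-aware Q-routing update,
# in the spirit of ARPRL's HELLO-driven learning (not the paper's exact rule).

ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor

def update_q(q_table, dest, neighbor, neighbor_best_q, mobility_factor):
    """Reinforce the Q value of the link to `neighbor` for destination `dest`.

    `neighbor_best_q` is max_y Q_neighbor(dest, y), carried in the
    neighbor's periodic HELLO packet; `mobility_factor` in [0, 1]
    scales the reward by link stability (e.g. a normalized link
    expiration time), so stable links are reinforced more strongly.
    """
    reward = mobility_factor
    old = q_table.get((dest, neighbor), 0.0)
    new = (1 - ALPHA) * old + ALPHA * (reward + GAMMA * neighbor_best_q)
    q_table[(dest, neighbor)] = new
    return new

def best_next_hop(q_table, dest, neighbors):
    """Greedy forwarding choice: the neighbor with the highest Q value."""
    return max(neighbors, key=lambda n: q_table.get((dest, n), 0.0))
```

In this sketch a vehicle would call `update_q` once per HELLO received, then forward data packets via `best_next_hop`; ARPRL additionally constrains the exchanged Q-table entries to avoid loops, which this fragment does not model.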
Figure 7 shows the Average End-to-End Delay (AEED) of each protocol for the successfully delivered CBR packets with varying the number of vehicles. For AODV and QLAODV, the AEED decreases as the number of vehicles increases from 50 to 350. This is because the lower the vehicle density, the higher the probability of network partition, in which case packets must be stored for later forwarding, increasing the AEED. AODV shows the highest AEED because of the excessive route discoveries triggered by fast vehicle movement. For QLAODV, the slow convergence and route loops introduced in the learning process increase the AEED. ARPRL and QROUTING show AEED similar to GPSR, which has the lowest AEED. This is because the proactive Q-table maintenance in ARPRL and QROUTING can switch away from a sub-optimal route, while AODV and QLAODV will not change to a better route until the current active one breaks. GPSR and QROUTING perform better than ARPRL, with AEED lower by 6.9 and 2.6 ms, respectively, as a result of ARPRL's route probe mechanism, which introduces slight additional delay. However, compared with QLAODV and AODV, ARPRL reduces the AEED by 162.2 and 384.8 ms on average, respectively.
Figure 8 shows the Average Hops Count (AHC) of each protocol for the successfully delivered CBR packets with varying the number of vehicles. In most cases, the AHC decreases as the vehicle density increases beyond 50 vehicles for all five protocols. This is because, at low density, frequent network partitions result in more route breaks and loops, and frequent topology changes also contribute to longer paths. For QROUTING and ARPRL, however, the average hop count increases as the number of vehicles varies from 50 to 150, due to more and more vehicles participating in packet forwarding. As the number of vehicles varies from 150 to 350, the average hop count decreases, because more and more vehicles congregate at intersections, which helps find shorter routes. In addition, AODV, QLAODV and ARPRL adopt a route discovery strategy and accordingly have a lower AHC than QROUTING and GPSR in most cases. More importantly, ARPRL shows significantly fewer hops than AODV and QLAODV at high vehicle density, owing to its route probing and its handling of MAC-layer packet loss notifications. Compared with QLAODV and AODV, ARPRL reduces the AHC by 3.58 and 4.44 hops on average, respectively.
5.3.2 Performance for Varying Maximum Velocity

In this part, we evaluate the performance of each protocol by varying the maximum vehicle velocity from 1 to 30 m/s, while the number of vehicles and the CBR packet interval are fixed at 200 and 1 s, respectively. The results are discussed below.
Fig. 7 Average end-to-end delay versus number of vehicles
Fig. 8 Average hops count versus number of vehicles
Figure 9 shows the average packet delivery ratio (APDR) of each protocol with varying the maximum allowable velocity. From Fig. 9, it can be seen that the APDR of all five protocols decreases as the maximum vehicle velocity varies from 1 to 30 m/s. This is because the increase in vehicle velocity causes more frequent topology changes and network partitions, in which more packets are dropped. As the velocity varies from 25 to 30 m/s, the packet delivery ratio of the five protocols tends to increase. The reason is that the packet carry time decreases in this range, so fewer packets are dropped due to timeout. ARPRL not only considers the number of hops, as AODV does, but also overcomes the slow convergence and routing loops of QROUTING. In addition, the link expiration time (LET) is considered in the learning process, which further enhances route reliability. Thus, ARPRL performs better than the other four protocols. On average, ARPRL increases the APDR by 20.3% and 24.8% compared with QLAODV and AODV, respectively.
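The link expiration time mentioned above is commonly computed in mobility-aware VANET routing with a closed-form expression for two nodes moving in straight lines; whether ARPRL uses exactly this form is an assumption here, so the sketch below is illustrative.

```python
import math

def link_expiration_time(p1, v1, p2, v2, radio_range):
    """Predict how long two vehicles stay within radio range of each other.

    Classic closed-form LET for straight-line motion: p = (x, y) position
    in meters, v = (vx, vy) velocity in m/s. Returns +inf when there is
    no relative motion (the link never expires on this model).
    """
    a = v1[0] - v2[0]          # relative velocity, x component
    b = p1[0] - p2[0]          # relative position, x component
    c = v1[1] - v2[1]          # relative velocity, y component
    d = p1[1] - p2[1]          # relative position, y component
    rel_speed_sq = a * a + c * c
    if rel_speed_sq == 0:
        return math.inf        # identical velocities: distance is constant
    disc = rel_speed_sq * radio_range ** 2 - (a * d - b * c) ** 2
    if disc < 0:
        return 0.0             # trajectories never bring the nodes in range
    return (-(a * b + c * d) + math.sqrt(disc)) / rel_speed_sq
```

For example, a vehicle at (0, 0) moving at 10 m/s along x toward a stationary vehicle at (100, 0), with a 250 m radio range, stays connected until it is 250 m past it, i.e. for 35 s. A learning process can normalize such LET values into the reward so that short-lived links receive weaker reinforcement.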
Figure 10 shows the Average End-to-End Delay (AEED) of each protocol for the successfully delivered CBR packets with varying the maximum allowable velocity. Figure 10 indicates that the AEED of all five protocols increases as the maximum vehicle velocity varies from 5 to 30 m/s. This is because high mobility leads to rapid changes in network topology, which increases the chance of selecting a sub-optimal routing path and hence the delay. High mobility also aggravates network partitions, which incurs packet carrying and further increases delay. When the maximum vehicle velocity varies from 25 to 30 m/s, the duration of network partitions becomes shorter; thus the packet carry time introduced by partitions is reduced and the AEED tends to decrease for all five protocols. GPSR and QROUTING perform better than ARPRL, with AEED lower by 9.4 and 1.8 ms, respectively, due to ARPRL's route probe mechanism. However, on average, ARPRL reduces the AEED by 112.3 and 284.6 ms compared with QLAODV and AODV, respectively.
Fig. 9 Average packet delivery ratio versus maximum allowable velocity

Figure 11 shows the Average Hops Count (AHC) of each protocol for the successfully delivered CBR packets with varying the maximum allowable velocity. The results show that the AHC increases as the maximum vehicle velocity varies from 1 to 10 m/s, because routes break more frequently as velocity increases. Since ARPRL considers both the number of hops and the link expiration time, it performs better than QLAODV and AODV. As the maximum vehicle velocity varies from 15 to 30 m/s, the average hop count decreases slightly for all five protocols. The reason is that high velocity improves network connectivity and reduces the probability of network partition, which results in shorter route paths. On average, ARPRL reduces the AHC by 3.3 and 3.9 hops compared with QLAODV and AODV, respectively.
Fig. 10 Average end-to-end delay versus maximum allowable velocity
Fig. 11 Average hops count versus maximum allowable velocity
5.3.3 Performance for Varying Data Generation Interval

After analyzing the effect of vehicle velocity on protocol performance, in this part we evaluate each protocol by varying the data generation interval from 0.1 to 6 s, while the maximum allowable velocity and the number of vehicles are fixed at 15 m/s and 200, respectively. The results are discussed below.
In Fig. 12, we evaluate the Average Packet Delivery Ratio (APDR) of each protocol with varying the data generation interval. As shown in Fig. 12, the APDR of ARPRL, QLAODV and AODV decreases as the Packet Interval (PI) varies from 0.1 to 6 s. This is because a larger PI triggers route discovery less frequently, so more packets are dropped over invalid route paths. For QROUTING and GPSR, the APDR remains approximately constant in all configurations, mainly because their routing paths are maintained only through periodic HELLO packets. Figure 12 also shows that ARPRL achieves the highest APDR for all values of PI. This can be explained by the fact that ARPRL combines the advantages of proactive route learning through the distributed Q-Learning algorithm with a reactive route probe mechanism. On average, ARPRL delivers 19.0% and 24.0% more packets than QLAODV and AODV, respectively.
Figure 13 shows the Average End-to-End Delay (AEED) of each protocol for the successfully delivered CBR packets with varying the data generation interval. As shown in Fig. 13, ARPRL achieves a much lower AEED than QLAODV and AODV in all configurations of the Packet Interval (PI). This is because QLAODV and AODV adopt a route discovery mechanism that introduces a longer AEED; in ARPRL, route discovery is triggered far less often thanks to the periodic route learning, and hence the AEED is further reduced. GPSR and QROUTING perform better than ARPRL, with AEED lower by 1.5 and 7.8 ms on average, respectively, due to ARPRL's route probe mechanism. However, ARPRL reduces the AEED by 216.2 and 503.4 ms on average compared with QLAODV and AODV, respectively.
Fig. 12 Average packet delivery ratio versus data generation interval
Fig. 13 Average end-to-end delay versus data generation interval

Fig. 14 Average hops count versus data generation interval

Figure 14 shows the Average Hops Count (AHC) of each protocol for the successfully delivered CBR packets with varying the data generation interval. The AHC of AODV and QLAODV is much higher than that of the other three protocols as the Packet Interval (PI) varies from 0.1 to 6 s, and the higher the CBR data rate, the larger the gap. This is expected since, in both AODV and QLAODV, a better route is not discovered until the current active one breaks. For ARPRL, QROUTING and GPSR, the AHC stays approximately constant in all cases, with GPSR having the lowest AHC among them. This is because GPSR adopts a greedy forwarding strategy that always forwards packets progressively toward the destination. Although GPSR has the minimal AHC, it comes at the high cost of packet loss due to the local optima caused by greedy forwarding, as Fig. 12 confirms. In addition, ARPRL has a lower AHC than QROUTING because of its route probe mechanism. On average, ARPRL reduces the AHC by 0.11, 3.4 and 4.3 hops compared with QROUTING, QLAODV and AODV, respectively.
6 Analysis of ARPRL

In this part, the Average Routing Overhead (ARO) is first evaluated and compared with some related existing protocols. The ARO is defined as the ratio of the average number of bytes of non-data packets broadcast by vehicles for routing maintenance to the average number of bytes of data packets received by the destinations. This metric reflects the extra communication overhead introduced by the routing protocols. In addition, the complexity of ARPRL is analyzed.
6.1 Routing Overhead Analysis

In ARPRL, the non-data packets fall into two categories: (1) periodic proactive HELLO packets; (2) on-demand reactive Learning Probe REQuest/REPly (LPREQ and LPREP) packets. Periodic HELLO packets are the main source of the Routing Overhead (RO) introduced by ARPRL; they are nevertheless essential for real-time sensing of network changes. The significant difference in RO between ARPRL and the other protocols under consideration (except QROUTING) is the dynamic, variable-length part of ARPRL's HELLO packet, which is used to exchange Q-table information between neighbors.
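The ARO metric defined at the start of this section reduces to a simple bytes ratio. A minimal bookkeeping sketch follows; the traffic figures are illustrative numbers, not results from the paper's simulations.

```python
def average_routing_overhead(control_bytes_sent, data_bytes_received):
    """ARO: total non-data (control) bytes broadcast for route maintenance
    divided by total data bytes received at the destinations."""
    if data_bytes_received == 0:
        return float('inf')    # nothing delivered: overhead is unbounded
    return control_bytes_sent / data_bytes_received

# Illustrative example: HELLO plus LPREQ/LPREP traffic vs. delivered CBR payload.
control_bytes = 120_000        # bytes of HELLO and learning-probe packets sent
data_bytes = 960_000           # bytes of CBR data received by destinations
aro = average_routing_overhead(control_bytes, data_bytes)   # 0.125
```

Under this definition, a protocol whose HELLO payload grows with the number of neighbors (as in ARPRL and QROUTING) sees its numerator, and hence its ARO, grow with vehicle density, which matches the trends discussed next.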
Figure 15a presents the ARO of each protocol with varying the number of vehicles. As shown in Fig. 15a, the ARO of AODV, QLAODV and GPSR remains almost constant for all configurations of the number of vehicles, because the ARO mainly depends on the average bytes of broadcast control packets. For AODV and QLAODV, it is determined by the number of RREQ packets broadcast during route discovery; this is why their ARO increases only slightly with the number of vehicles, as the CBR data rate is fixed at one packet per second. For GPSR, fixed-length periodic HELLO packets for neighbor position maintenance are the main source of RO, so its ARO stays constant as the number of vehicles increases. For QROUTING, the ARO is determined by the variable length of the periodic HELLO packets broadcast for route learning, which grows linearly with the number of vehicles. For ARPRL, the LPREQ packets also contribute part of the ARO, in addition to HELLO packets equivalent to QROUTING's. This is why the ARO of ARPRL and QROUTING increases linearly with the number of vehicles, with ARPRL incurring slightly more ARO than QROUTING at high vehicle density.
In Fig. 15b, it can be observed that the ARO of AODV and QLAODV increases with the maximum allowable vehicle velocity. This is expected, since higher vehicle velocity causes more frequent topology changes and hence more route discoveries. In contrast, GPSR's ARO remains constant in all cases and is the lowest of the five protocols, because the length and number of its periodic HELLO packets are independent of topology changes. In general, ARPRL and QROUTING have approximately the same ARO at low and medium velocity. At high velocity, ARPRL has slightly more ARO than QROUTING due to the increased number of LPREQ broadcasts.
Figure 15c shows that the ARO of AODV and QLAODV decreases as the packet interval increases, since the number of route discoveries is proportional to the number of packets to transmit. Meanwhile, ARPRL, QROUTING and GPSR have almost constant ARO in all cases, which can be explained by the fact that periodic HELLO packets are independent of the data generation rate. At a high data generation rate, ARPRL has slightly more ARO than QROUTING due to its learning probe mechanism.

In summary, ARPRL shows higher routing overhead than the other four protocols, as shown in Fig. 15. This is expected, since the combination of a proactive route learning algorithm and a reactive route probe strategy inevitably incurs higher overhead while improving overall performance. Efficiently reducing the ARO of ARPRL further is left as future work.
6.2 Complexity Analysis
Through the above experimental results, we can conclude that ARPRL is better suited to the VANET environment, offering a higher data delivery ratio, lower delay and fewer routing hops, since it adopts a variety of optimization strategies on top of AODV and QLAODV.

Fig. 15 Average routing overhead versus: a number of vehicles; b maximum velocity; c packet interval
However, it is also necessary to analyze the time and space complexity of ARPRL when applying it to VANETs. The time complexity of ARPRL depends mainly on the maintenance of the Q table, which consists of three parts, each requiring O(1) time. For a network with N vehicle nodes, the time complexity of ARPRL is therefore O(N). The space complexity of ARPRL depends mainly on the memory required to store the Q table. In a network with N vehicle nodes, the space complexity of ARPRL is O(N³) in the worst case, which is higher than that of the other four protocols. For VANETs, however, this is acceptable, because each vehicle can be equipped with a computing device of sufficiently high capacity.
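The Q table behind these complexity figures can be pictured as a per-vehicle nested map keyed by destination and next hop: up to N destinations times up to N neighbors gives O(N²) entries per node, and O(N³) summed across N nodes, matching the worst case above. The structure below is a sketch assumed for illustration, not the paper's data layout.

```python
from collections import defaultdict

class QTable:
    """Per-vehicle Q table: q[dest][next_hop] -> estimated route quality."""

    def __init__(self):
        self.q = defaultdict(dict)

    def set(self, dest, next_hop, value):
        self.q[dest][next_hop] = value

    def best(self, dest):
        """Best next hop and its Q value for `dest` ((None, 0.0) if unknown)."""
        row = self.q.get(dest)
        if not row:
            return None, 0.0
        hop = max(row, key=row.get)
        return hop, row[hop]

    def entries(self):
        """Stored (dest, next_hop) pairs: O(N^2) per node in the worst case."""
        return sum(len(row) for row in self.q.values())
```

An O(1) lookup or update on this structure is consistent with the constant-time maintenance steps claimed above, since both operations touch a single dictionary entry.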
7 Conclusion
In this paper, we proposed ARPRL, a reinforcement-learning-based heuristic routing protocol for VANETs. ARPRL employs Q-Learning to dynamically learn optimal, stable and reliable routes through a variety of strategies that update the Q table maintained by each vehicle node. Periodic exchange of HELLO messages between neighboring vehicles, forwarding of DATA packets and a MAC-layer feedback mechanism are used to assist in updating the Q table. To speed up the convergence of the learning process, LPREQ and LPREP messages are used at the beginning of learning. We also designed the HELLO message structure for exchanging the optimal part of the Q-table contents and for avoiding route loops to some extent. More importantly, we proposed a novel Q-value update function that takes into consideration the distinctive features of VANETs. ARPRL forwards data packets according to the Q table, which is updated through the Q-learning algorithm and takes the number of hops, vehicle mobility and link expiration time into account; it thus performs better and is more suitable for packet-loss- and delay-sensitive applications.
Acknowledgements We would like to thank the editors and the anonymous reviewers for their helpful comments and suggestions. This work is supported by the National Natural Science Foundation of China (Grant No. 61472305), the Aeronautical Science Foundation of China (Grant No. 20151981009) and the Science Research Program, Xi'an, China (Grant No. 2017073CG/RC036(XDKD003)).
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Jinqiao Wu received the MS degree in 2014 from the Xi'an University of Posts & Telecommunications, Xi'an, China. He is currently a Ph.D. candidate in computer science at Xidian University, Xi'an, China. His research interests include machine learning, networking architectures, and routing protocols.

Min Fang received her B.S. degree in computer control, M.S. degree in computer software engineering and Ph.D. degree in computer application from Xidian University, Xi'an, China, in 1986, 1991 and 2004, respectively, where she is currently a professor. Her research interests include intelligent information processing, multi-agent systems and network technology.

Xiao Li received the BS degree from the Xi'an University of Finance and Economics, Xi'an, China, in 2012. She is currently a Ph.D. candidate in computer science at Xidian University, Xi'an, China. Her research interests include pattern recognition, machine learning and computer vision.