Reinforcement learning based traffic
optimization at an intersection with GLOSA
Master Thesis
Submitted in Fulfillment of the
Requirements for the Academic Degree
M.Sc.
Dept. of Computer Science
Chair of Computer Engineering
Submitted by: Rajitha Jayasinghe
Student ID: 456470
Date: 07.01.2019
Supervising tutor: Prof. Dr. W. Hardt
Prof. Dr. Uranchimeg Tudevdagva
Dr. Leonhard Lücken, DLR - Berlin
Abstract
Traffic flow optimization at an intersection helps to maintain a smooth urban traffic flow; it can reduce travel time and emissions. New algorithms for controlling approaching vehicles and traffic light phases are introduced regularly, and the combination of reinforcement learning and traffic optimization is a novel direction pursued by the research community. This thesis proposes a methodology to reduce the travel time and emissions of vehicles for a specific intersection design. The solution takes the driving route of each vehicle approaching the intersection into account. Using reinforcement learning and route information, this research suggests a vehicle ordering mechanism that improves the throughput of the intersection. Before proposing the solution, the author gives a thorough review of previous studies: the literature review explains various reinforcement learning algorithms and how they have been applied to traffic optimization. Furthermore, the author uses GLOSA as a baseline to evaluate the new solution, and several GLOSA variations are discussed in this report. A new approach, which can be seen as an extension of the existing GLOSA algorithms, is described in the concept chapter. A deep Q network approach and a rule-based policy are introduced as the solution. The proposed solution was implemented and evaluated, and the author was able to achieve promising results with the rule-based policy approach. Finally, the issues related to both approaches are discussed in detail, together with suggestions to further improve the proposed solutions.
Action        Values         Description
Speed change  0 to 14 m/s    Indicates acceleration or deceleration. The speed change is decided by the extended GLOSA function. Possible values: 0 (full stop) to max speed.
Lane changes  0 or 1         Two Boolean values representing the left and the right lane.

Table 10 : Actions - rule-based policy approach
4.3.4 Rules
The following set of rules defines the previously declared actions. Note that each vehicle executes these rules in every simulation step to find the best possible action.
4.3.4.1 Rule 1
If a left-turn vehicle is traveling in the right lane, it should change to the left lane.
Figure 36: Rule 1 expected result
Figure 37 : Rule 1 algorithm
4.3.4.2 Rule 2
If a straight-going vehicle is following a left-turn vehicle in the left lane, it should change to the right lane.
Figure 38 : Rule 2 expected results
Figure 39 : Rule 2 algorithm
4.3.4.3 Rule 3
This rule is only valid for straight-going vehicles traveling in the right lane. If there are no leading left turners in the left lane and free SLOTS are available there, the vehicle can change from the right lane to the left. The number of SLOTS in the left lane is a parameter defined according to the scenario; it takes into account the number of vehicles that can pass during a green light phase and the extended green light phase for left turners.

SLOTS = maximum desired number of leading straight-going vehicles in front of the first left turner traveling in the left lane
Figure 40 : Rule 3 expected results
After changing lanes, several flags are set to indicate that the ego vehicle has already executed the particular rule; thereafter the program keeps the vehicle in the changed lane.
Figure 41 : Rule 3 algorithm
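To make the interplay of the three rules concrete, the following minimal Python sketch combines them into one lane-selection function. The per-vehicle keys (route, lane, follows_left_turner, leading_left_turners, free_slots) are hypothetical stand-ins for the data extracted in each simulation step, not the author's identifiers; lanes are numbered as in the SUMO setup (0 = right, 1 = left).

```python
def choose_lane(veh):
    """Return the advised lane index for one vehicle, given its per-step data."""
    # Rule 1: a left-turn vehicle in the right lane changes to the left lane.
    if veh["route"] == "left" and veh["lane"] == 0:
        return 1
    # Rule 2: a straight-going vehicle following a left turner in the left lane
    # changes to the right lane.
    if veh["route"] == "straight" and veh["lane"] == 1 and veh["follows_left_turner"]:
        return 0
    # Rule 3: a straight-going vehicle in the right lane moves left only when no
    # leading left turner occupies the left lane and free SLOTS remain there.
    if (veh["route"] == "straight" and veh["lane"] == 0
            and veh["leading_left_turners"] == 0
            and veh["free_slots"] > 0):
        return 1
    return veh["lane"]  # otherwise keep the current lane
```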
4.3.4.4 Extended GLOSA
This is an extension of the traditional GLOSA algorithm, which decides the approaching speed of a vehicle. The purpose of the extension is to establish the connection between the traditional GLOSA and the rules. Several modifications were added, mainly to control the left turners and to determine when a vehicle reaches the intersection. The following diagram shows the activity flow of the proposed extended GLOSA function.
The extended GLOSA algorithm introduces the following equations.
If the condition dMax > distance to junction is true, the following equation is valid. In other words, when the vehicle tries to reach the maximum speed and the distance it needs to reach MAX_SPEED is larger than its current distance d to the junction, the vehicle never reaches MAX_SPEED before the junction. With current speed v and acceleration a, the arrival time follows from d = vt + (1/2)at^2:

\[ t_{\mathrm{arr}} = \frac{-v + \sqrt{v^{2} + 2ad}}{a} \]

Eq 2 : Arrival time calculation equation 1
If the above condition is false, the vehicle reaches MAX_SPEED before the junction and the next equation (Eq 3) is valid: the time to accelerate to v_max plus the time to cover the remaining distance at v_max.

\[ t_{\mathrm{arr}} = \frac{v_{\max} - v}{a} + \frac{d - \frac{v_{\max}^{2} - v^{2}}{2a}}{v_{\max}} \]

Eq 3 : Arrival time calculation equation 2
The following equation (Eq 4) is executed when the acceleration is 0 and the speed is not 0:

\[ t_{\mathrm{arr}} = \frac{d}{v} \]

Eq 4 : Arrival time calculation equation 3
All three equations above determine the arrival time at the junction from the current vehicle position under different circumstances.
Figure 42 : Extended-GLOSA algorithm
The final equation (Eq 5) finds the speed a vehicle needs in order to fulfill the GLOSA constraints, i.e., to arrive at the junction at the targeted time t_target (the start of the next suitable green phase plus the arrival margin):

\[ v_{\mathrm{adv}} = \frac{d}{t_{\mathrm{target}}} \]

Eq 5 : Advised speed calculation
The GLOSA algorithm is executed for all RL agents in every simulation step.
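For illustration, the three arrival-time cases and the advised-speed formula can be combined into the following minimal Python sketch. The function and variable names are illustrative, and placing the margin inside the target time is an assumption based on Table 11, not the author's exact implementation.

```python
import math

def arrival_time(v, a, d, v_max):
    """Arrival time at the junction (Eq 2 - Eq 4) for current speed v,
    acceleration a, distance to junction d, and maximum speed v_max."""
    if a == 0:
        return d / v if v > 0 else math.inf      # Eq 4: constant, non-zero speed
    d_to_vmax = (v_max ** 2 - v ** 2) / (2 * a)  # distance needed to reach v_max
    if d_to_vmax > d:
        # Eq 2: v_max is not reached before the junction
        return (-v + math.sqrt(v ** 2 + 2 * a * d)) / a
    # Eq 3: accelerate to v_max, then continue at constant v_max
    return (v_max - v) / a + (d - d_to_vmax) / v_max

def advised_speed(d, time_to_green, margin):
    """Eq 5: speed that lets the vehicle arrive when the green phase starts."""
    return d / (time_to_green + margin)
```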
4.3.5 Rewards
The following equations are used to compute the reward value for training the rule-based policies; each equation is explained step by step. The final reward is constructed from the accumulated total travel time and the accumulated emission. Several parallel simulation runs are executed to optimize the values chosen by the optimizer.

First, the accumulated total travel time is measured.
\[ T = \sum_{\mathrm{runs}} \sum_{\mathrm{vehicles}} \left( \mathrm{duration}_v + \mathrm{departDelay}_v \right) \]

Eq 6 : Travel time calculation
Equation 6 gives the travel time of all vehicles over the whole simulation in each parallel run. Duration represents how long a particular vehicle runs in a simulation. Depart delay is the delay caused when a vehicle enters the simulation at the beginning because there is no space in the lane.
\[ \bar{T} = \sum_{\mathrm{vehicles}} \frac{\mathrm{duration}_v + \mathrm{departDelay}_v}{\mathrm{routeLength}_v} \]

Eq 7 : Accumulated travel time calculation
Equation 7 derives from Equation 6. Route length is the total distance a vehicle travels from start to finish; dividing by it yields the accumulated travel time per km. The second factor relevant for constructing the reward is the accumulated emission.
\[ E = \sum_{\mathrm{runs}} \sum_{\mathrm{vehicles}} \mathrm{CO2}_v \]

Eq 8 : Accumulated emission calculation (in g)
Equation 8 calculates the accumulated CO2 emission of all vehicles over the whole simulation in each parallel run.
\[ \bar{E} = \sum_{\mathrm{vehicles}} \frac{\mathrm{CO2}_v}{\mathrm{routeLength}_v} \]

Eq 9 : Accumulated emission calculation (per km)
Equation 9 is used to find the accumulated emission figure per km. Finally, the author introduces the reward function, which combines the previously calculated accumulated travel time (per km) and accumulated emission (per km) using two separate normalization coefficients. The travel time normalization coefficient is 1/120 and the emission normalization coefficient is 1/150; both values were found experimentally. Alpha is a weighting factor that controls the relative influence of travel time and emission on the reward: Alpha = 1 means that only travel time matters for the final reward.
\[ R = \alpha \cdot \tfrac{1}{120} \cdot \bar{T} + (1 - \alpha) \cdot \tfrac{1}{150} \cdot \bar{E} \]

Eq 10 : Reward calculation
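For illustration, Eq 6 to Eq 10 condense into a few lines of Python; the per-vehicle field names (duration, departDelay, routeLength, co2) are illustrative assumptions, not the author's identifiers.

```python
T_NORM = 1 / 120  # travel time normalization coefficient (found experimentally)
E_NORM = 1 / 150  # emission normalization coefficient (found experimentally)

def reward(vehicles, alpha):
    """Combine accumulated travel time and CO2 emission (both per km) into
    one figure; alpha = 1 means only travel time matters (Eq 10)."""
    t_per_km = sum((v["duration"] + v["departDelay"]) / v["routeLength"]
                   for v in vehicles)
    e_per_km = sum(v["co2"] / v["routeLength"] for v in vehicles)
    return alpha * T_NORM * t_per_km + (1 - alpha) * E_NORM * e_per_km
```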
4.3.6 Policy parameters
Optimizing variable                     Description
Margin                                  Extra time added to the traffic light cycle, used to control all vehicles. Extended GLOSA uses the margin as a parameter when calculating the arrival time at the intersection.
Extra delay for the first left turner   Duration added to the first left turner's targeted arrival time in the GLOSA algorithm in order to slow it down, allowing straight-going vehicles to overtake the left turner easily.

Table 11 : Optimizing variables
A suitable optimization algorithm is selected depending on the nature of the problem; here the objective is a stochastic, discontinuous function. The author uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm for the optimization [45]. The optimizer runs until it finds the margin and left-turner delay that yield the best reward value; if it does not converge, it runs for a specific number of turns and returns the best values found across all turns. Furthermore, it is generally impossible to minimize both objective quantities simultaneously, which is why the author uses Alpha as a weighting factor to combine both accumulated figures into a single figure. The author needs to provide starting values and bounds for the margin and the delay.

Bound values: margin from 0.1 to 5 and delay from 0 to 40
Starting values: margin = 2 and delay = 31
4.4 Summary
The author introduced the DQN and the rule-based policy approach, discussing the major steps of creating the RL algorithms. The section covered the design, components, and major processes/functions of both solutions, and defined the architecture of the solution, the observation and action spaces, and the reward functions. Two architectures were used for the DQN design: a single-agent and a multi-agent based design. The multi-agent based DQN had seven neurons in the input layer and two in the output layer; the single-agent architecture was more complicated due to its many parameters. For the rule-based approach, the author introduced the rules and their expected results. The rule-based policy consists of three rules that decide the best suited lane for each vehicle in every simulation step, while the extended GLOSA, which contains several modifications to support the rules, calculates the speed for each vehicle. Finally, the chapter discussed the optimization process used to improve the rules and the extended GLOSA.
5 Implementation
This chapter focuses on the implementation aspects of the proposed solutions. First, it introduces the software stack the author uses to implement them. Next, it discusses the steps to follow for both approaches: FLOW requires several implementation steps to create an RL prototype, and the rule-based policy likewise consists of several components. Code snippets are provided alongside the explanations.
5.1 Technical details
5.1.1 Deep Q network approach
Software     Description
FLOW [11]    FLOW is used as a connector linking the reinforcement learning library and the SUMO microscopic traffic simulator. FLOW itself uses several other third-party libraries: Theano [46], OpenAI Gym [37], and SUMO-Traci.
SUMO [10]    Microscopic traffic simulator which provides the environment for the RL agents.
Rllab [36]   Reinforcement learning library used to train the deep Q network.
Python       All the above frameworks are written in Python.

Table 12 : Software - DQN approach
5.1.2 Rule-based policy approach
Software          Description
Python            All components are written in Python.
SUMO              As in the above approach, SUMO is the simulation environment.
SUMO-Traci [47]   Traci can change the state of the simulation at runtime: the user can pass commands to the simulation and retrieve data from it.
SciPy [48]        SciPy provides Python-based optimization with various optimization algorithms.

Table 13 : Software - rule-based policy approach
5.2 Implementation of Deep Q network approach
5.2.1 Flow configuration
The following diagram shows the steps the user needs to follow in order to create a FLOW-based SUMO simulation.

Figure 43 : FLOW steps

5.2.2 Dynamic SUMO network configuration
Note that the author only explains how to create a FLOW simulation here; more information regarding FLOW's architecture and functionality was given in the Literature Review chapter.
5.2.2.1 FLOW Generator creation
The user needs to create a custom Generator class which extends the base Generator and overrides several inherited methods. In the SUMO context, the user usually needs to provide node, edge, and route information in order to generate the network configuration file. Correspondingly, the FLOW creators have provided several methods for supplying node, edge, and route information to create the net file.

specify_nodes() states the locations of the nodes relative to one specific node. As the author only considers an intersection scenario, the center node which creates the intersection is taken as the main node (junction node), and all other nodes are positioned relative to it.

The following code snippet shows the node array. The proposed intersection scenario consists of 9 nodes, and the coordinates of the "center" node are (0, 0).
Figure 44 : Specify nodes code snippet
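Since the snippet itself appears only as a figure, the following is a minimal sketch of what such a specify_nodes() override could look like; the node ids follow the text, but the coordinates and the edge_length parameter name are illustrative assumptions.

```python
def specify_nodes(self, net_params):
    # The "center" node sits at the origin; all other nodes are placed
    # relative to it (only part of the 9-node layout is shown).
    r = net_params.additional_params["edge_length"]  # assumed parameter name
    return [
        {"id": "center", "x": 0,  "y": 0},
        {"id": "left",   "x": -r, "y": 0},
        {"id": "right",  "x": r,  "y": 0},
        {"id": "top",    "x": 0,  "y": r},
        {"id": "bottom", "x": 0,  "y": -r},
        # ... remaining nodes of the 9-node intersection layout
    ]
```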
specify_edges() gives the edge details: mainly the type of the edge, its length, and which nodes form the edge. The following code shows the edge array, which consists of 9 edges.
Figure 45 : Specify edges code snippet
specify_routes() states the route names and the edges that form each route.
Figure 46 : Specify route code snippet
The code above shows the routing array. As an example, the route "left" consists of 3 edges ("left", "altleft1", and "right"), which creates the road from left to right through the intersection.
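A minimal sketch of such a specify_routes() override is shown below; the "left" route and its edges come from the text, while the dictionary form and the remaining routes are assumptions.

```python
def specify_routes(self, net_params):
    # Route name -> ordered list of edges the vehicles traverse.
    return {
        "left": ["left", "altleft1", "right"],  # west to east through the junction
        # ... remaining routes of the scenario
    }
```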
5.2.2.2 FLOW Scenario creation
Similar to the Generator creation, the user needs to create a custom Scenario class which overrides several methods of the base Scenario class.
specify_edge_starts(): Here the user provides the starting coordinates of an edge relative to one specific edge. The following code snippet shows part of the edge array: the "bottom" edge is taken as the main edge, and the other edges, such as "top", are given starting positions relative to the "bottom" edge.
Figure 47 : Specify edge starts code snippet
specify_intersection_edge_starts() and specify_internal_edge_starts() are two further functions similar to specify_edge_starts(). The only difference is that the internal-edge function targets the internal edges of an intersection, which allow vehicles to pass through the intersection in various directions.
5.2.2.3 FLOW environment creation
The user needs to create a custom environment class that extends the base Environment. As usual, several inherited methods need to be overridden.

action_space(): The following function declares the possible actions the agent can execute at a given time. The agent can only change the lane and the speed of the vehicles, and the author has set upper and lower bounds on acceleration and deceleration. As FLOW currently only supports single-agent scenarios, the actions for all vehicles in the simulation must be provided by a single DQN. For example, if a simulation has 30 RL agents, the DQN must provide 60 (2*30) actions.
Figure 48 : Action space code snippet
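As an illustration, an action_space() override along these lines could declare two bounded actions (acceleration and lane change) per RL vehicle; the Box space and the accessor names are assumptions, not FLOW's exact API.

```python
import numpy as np
from gym.spaces import Box

@property
def action_space(self):
    # Two actions per controlled vehicle: an acceleration value within the
    # configured bounds and a lane-change value in [-1, 1].
    num_rl = self.vehicles.num_rl_vehicles  # assumed accessor
    max_decel = self.env_params.additional_params["max_decel"]
    max_accel = self.env_params.additional_params["max_accel"]
    lb = np.array([-abs(max_decel), -1.0] * num_rl)
    ub = np.array([max_accel, 1.0] * num_rl)
    return Box(lb, ub)
```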
observation_space(): Here the author declares the observations/inputs of the DQN. The following code snippet shows all inputs: for 30 RL agents, the DQN must support 240 (30*8) inputs/observations, and the user needs to provide lower and upper bounds for all of them. More information regarding the inputs was given in the Design chapter.
Figure 49 : Observation space code snippet
get_state() retrieves the observations of the state reached after the actions are executed. Here the user can call getters already defined by FLOW.
Figure 50 : Get state code snippet
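A sketch of such a get_state() override, using getter names of the kind FLOW exposes (the exact names here are assumptions):

```python
import numpy as np

def get_state(self, **kwargs):
    # Collect the per-vehicle observations after the actions were applied.
    ids = self.vehicles.get_rl_ids()                    # assumed getter
    speeds = [self.vehicles.get_speed(v) for v in ids]  # assumed getter
    lanes = [self.vehicles.get_lane(v) for v in ids]    # assumed getter
    positions = [self.get_x_by_id(v) for v in ids]      # assumed getter
    return np.array([speeds, lanes, positions]).T
```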
There are several other functions, such as apply_rl_actions(), which executes the previously defined actions, and compute_reward(), which calculates the reward as specified in the Design chapter.
5.2.2.4 Master configuration creation
The main configuration connects the previously defined generator, scenario, and experiment classes. Furthermore, the user needs to enter additional information such as the lengths of the vertical and horizontal lanes, the number of agents to create, velocity bounds for the vehicles, and other technical details.
Figure 51 : Master configuration code snippet
The following code snippet shows how the author used the predefined Gaussian MLP policy, which consists of a DQN with 2 hidden layers (64*64). FLOW uses the TRPO algorithm to adjust the network weights.
Figure 52 : DQN policy
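In rllab, wiring the predefined Gaussian MLP policy to TRPO looks roughly like the following sketch; make_flow_env() is a hypothetical factory for the FLOW-generated SUMO environment, and the hyperparameter values are illustrative.

```python
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

env = normalize(make_flow_env())  # hypothetical factory for the FLOW environment
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(64, 64))
baseline = LinearFeatureBaseline(env_spec=env.spec)
algo = TRPO(env=env, policy=policy, baseline=baseline,
            batch_size=4000, n_itr=1000, discount=0.999)
algo.train()
```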
5.3 Implementation of Rule-based policy approach
5.3.1 Data extraction
The first step is data extraction using SUMO-Traci, which allows extracting data at runtime in every simulation step. Traci provides a wide range of functions, as seen in the following code snippet. Here the author collects information about each vehicle and its neighboring vehicles, using a Python dictionary to store the information for all agents and several lists (e.g., leaders_rightlane_straight_list) to store information about neighboring vehicles. Furthermore, several flags indicate whether a rule has already been applied to a specific vehicle.
Figure 53 : Data extraction code snippet
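A condensed sketch of such an extraction step with real Traci calls (the dictionary layout and the rule_applied flag are illustrative, not the author's exact structures):

```python
import traci

def extract_vehicle_data():
    """Collect per-vehicle data in one simulation step."""
    data = {}
    for veh_id in traci.vehicle.getIDList():
        data[veh_id] = {
            "speed": traci.vehicle.getSpeed(veh_id),
            "lane": traci.vehicle.getLaneIndex(veh_id),
            "route": traci.vehicle.getRouteID(veh_id),
            "leader": traci.vehicle.getLeader(veh_id, 100.0),  # (id, gap) or None
            "rule_applied": False,  # set once a rule has fired for this vehicle
        }
    return data
```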
5.3.2 Rules implementation
The rules were introduced in the previous chapter. The following code snippet shows rules 1 and 2. The author uses Traci commands to control agents at runtime, mainly changeLane(), which moves a vehicle from one lane to another. Rule 1 changes left turners from the right lane to the left lane (lane 0 to 1 in the SUMO environment), and rule 2 changes a straight-going vehicle from the left to the right lane (1 to 0) if it is following a left turner.
Figure 54 : Rule 1 and 2 code snippet
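Reusing the per-vehicle dictionary from the extraction sketch above, rules 1 and 2 could be driven by traci.vehicle.changeLane() as follows; the duration value and helper structure are illustrative.

```python
import traci

LANE_RIGHT, LANE_LEFT = 0, 1
CHANGE_DURATION = 10.0  # seconds the advised lane is kept (illustrative value)

def apply_rules_1_and_2(veh_id, data):
    info = data[veh_id]
    leader = info["leader"]  # (leader_id, gap) or None
    # Rule 1: a left turner in the right lane moves to the left lane (0 -> 1).
    if info["route"] == "left" and info["lane"] == LANE_RIGHT:
        traci.vehicle.changeLane(veh_id, LANE_LEFT, CHANGE_DURATION)
        info["rule_applied"] = True
    # Rule 2: a straight-going vehicle behind a left turner in the left lane
    # moves to the right lane (1 -> 0).
    elif (info["route"] == "straight" and info["lane"] == LANE_LEFT
          and leader is not None and data[leader[0]]["route"] == "left"):
        traci.vehicle.changeLane(veh_id, LANE_RIGHT, CHANGE_DURATION)
        info["rule_applied"] = True
```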
The code snippet relevant to rule 3 can be found in the Appendix.
5.3.3 Extended-GLOSA implementation
All equations and conditions needed to create the extended GLOSA were introduced in the Concept chapter, and the full algorithm is added to the appendix; here the author discusses its implementation aspects.

The proposed SUMO traffic light has 8 phases in one cycle. The following code snippet shows the available phases: phase 1 is a green phase for all vehicles traveling from left to right and vice versa, and phase 3 is the extended green phase for left turners, which is not available for straight-going vehicles. The snippet finds the next green light phase for a given current phase.
Figure 55 : Traffic cycle
The next code snippet shows how the author slows vehicles down during red phases. The program has two main ways to handle the slowdown of a vehicle. First, it checks whether the vehicle can arrive in the next green phase or the following one, using the previously calculated arrival time; furthermore, it considers the extended green phase for left turners. The program has an extreme slowdown technique for the first left turner, controlled by the "slow_down_flag".

For slowing down a vehicle, the author uses SUMO's slowDown(), which reduces a specific vehicle's speed to a given value over a given time duration.
Figure 56 : Extended GLOSA code snippet
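The slowdown itself maps onto a single Traci call; the handling of the first left turner's extra delay is sketched here under the assumptions described above.

```python
import traci

def slow_down_for_red(veh_id, advised_speed, time_to_green,
                      first_left_turner=False, extra_delay=0.0):
    """Reduce a vehicle's speed so that it arrives when the green phase starts;
    extra_delay is the optimized delay applied only to the first left turner."""
    duration = time_to_green + (extra_delay if first_left_turner else 0.0)
    # slowDown(vehID, speed, duration): reach `speed` within `duration` seconds.
    traci.vehicle.slowDown(veh_id, advised_speed, duration)
```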
5.3.4 Optimizer implementation
The following implementation shows how the author supplies the starting values and bounds for the arrival margin and the delay of the first left turner, as specified in the Concept chapter. The author uses the SciPy framework and minimize() to invoke the built-in optimization process, here with the L-BFGS-B optimization algorithm; during the optimization phase, various algorithms were tried to find the best fit. Once started, the process runs until the optimizer finds the lowest accumulated travel time and emission figure.
Figure 57 : Optimizer code snippet
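The corresponding SciPy call is sketched below; run_simulations() is a placeholder for launching the parallel SUMO runs and returning the combined reward figure (Eq 10).

```python
from scipy.optimize import minimize

x0 = [2.0, 31.0]                     # starting values: margin = 2, delay = 31
bounds = [(0.1, 5.0), (0.0, 40.0)]   # bounds for margin and left-turner delay

def objective(params):
    margin, left_turner_delay = params
    return run_simulations(margin, left_turner_delay)  # placeholder objective

result = minimize(objective, x0, method="L-BFGS-B", bounds=bounds)
print(result.x)  # best margin and delay found
```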
5.3.5 Reward calculation
This is part of the reward function called by the minimize() function above. After every simulation, SUMO generates two separate files containing the emission figures and travel times of all vehicles. The program runs several simulations in parallel, and each simulation creates these files separately. The program reads all files, calculates the accumulated travel time and emission, and computes the final reward using the normalization factors mentioned above.
Figure 58 : Reward calculation code snippet
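If the travel time file is SUMO's standard tripinfo output, the accumulated travel time of one run can be read as in this sketch (duration and departDelay are SUMO's standard tripinfo attributes):

```python
import xml.etree.ElementTree as ET

def accumulated_travel_time(tripinfo_file):
    """Sum duration + departDelay over all vehicles in one tripinfo file."""
    total = 0.0
    for trip in ET.parse(tripinfo_file).getroot().iter("tripinfo"):
        total += float(trip.get("duration")) + float(trip.get("departDelay"))
    return total
```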
5.4 Summary
This chapter explained how the author implemented both solutions. FLOW was used to implement the DQN: following the guidelines provided by the FLOW development team [11], the author implemented the required classes (Generator, Scenario, and Experiment). Furthermore, the reward was introduced, based on the accumulated travel time and emission of each vehicle, and important code snippets were provided where necessary. Next, the rule-based approach was described. The author introduced the necessary inputs to the rules; the input vector was similar to the earlier approach but contains several additional inputs, such as data about leading vehicles. The chapter further explained the implementation of the rules and how SUMO-Traci was used, including its important functions. Finally, the optimizer, built with SciPy, was introduced.
6 Evaluation
This chapter explains the results and the main observations obtained after executing the previously implemented prototypes. Issues that occurred during the testing phase are also discussed, followed by brief explanations of solutions to the discovered problems.
6.1 DQN based approach
6.1.1 Tests
As mentioned in the chapters above, the author uses FLOW to create the DQN. During the training phase of the DQN, the author completed the following subtasks.

Preliminary tests: Initially the simulation ran for a very short time, with a low number of simulation steps and iterations, in order to see how FLOW performs under test circumstances. The runs started with 10 minutes of execution time and were later extended to 1 hour; the duration of a single simulation is 5 minutes. The next step was to increase the simulation time (up to 15 minutes) and the number of iterations (up to 1000); the total execution time for this set of experiments is approximately 20 hours.

Changing the number of hidden layers and neurons: Experiments started with a single hidden layer of 32 neurons, followed by two other variations:
32*32: 2 hidden layers with 32 neurons each.
64*64: likewise 2 hidden layers, but with 64 neurons in each layer.

Parallel runs: Experiments ran with up to 4 parallel runs for 20 hours.
6.1.2 Results and discussion of DQN approach
Preliminary tests: These checked whether FLOW generates the intersection scenario as expected; there were no errors with FLOW's capabilities.
Single-agent architecture: The author was able to observe how the agents react to RL. The starting positions of all vehicles in the first simulation step varied, as FLOW uses a custom algorithm introduced by the author to generate the starting positions of vehicles on the lanes; every simulation therefore starts with different starting positions. One advantage of this approach is that it creates a new scenario each time, instead of the vehicles starting at the same positions in every simulation.

In the first few iterations, the movement of the vehicles was extremely slow and there were few lane changes. As mentioned in the literature review, RL needs considerable time to learn the best action for each state, and the experience gained from the first set of iterations is not sufficient to do so. Due to this slowness, the vehicles did not reach the intersection and remained structured in the initial partition. The following figure illustrates the observed output.
Figure 59 : FLOW results1
After 5 iterations, the vehicles started to form groups (partitions), breaking away from the original partition as the leading vehicles tried to reach the intersection at higher speeds than initially. The author observed higher speeds from the leading vehicles of each partition, and one or two vehicles passed the intersection. Furthermore, there were many overtakes in every step.
Figure 60 : Flow results2
Running 50 iterations took around 20 hours and still showed similar output. From the results, the author could see that FLOW tries various actions to find the best actions for all vehicles, but after the 5th iteration the changes between iterations were small. When more neurons were added to the hidden layers (e.g., 64*64), training was much slower. The above results are only valid for the single-agent approach.
6.1.3 Issues observed
Extreme slowness of the single-agent approach: As mentioned above, after the initial iterations the results between iterations were very similar, with little variation even after 20 hours of running. Lane changes and speed changes were visible, but the improvement was very slow. In other words, the issue is training time: training needs more time than expected. Given the project timeline, the available resources, and the preliminary results from the rule-based policy approach, the author shifted the experiments towards the rule-based approach rather than the DQN.

FLOW-based multi-agent approach: Because of the extremely slow progress in the above tests, the author next tried to extend FLOW to a multi-agent approach. However, due to FLOW's design, it was not possible to convert FLOW to a multi-agent system with minor modifications.
6.2 Rule-based policy approach
6.2.1 Tests
SUMO route files were used to generate various traffic flows. By changing attributes such as "period", "number of vehicles per hour", and "probability", various traffic scenarios were generated [49], and the author was able to change the number of vehicles for each direction/route. All these scenarios were tested with the following variants.

GLOSA-only scenario: The main goal here is to see how the traditional GLOSA algorithm performs for a given scenario.

Extended GLOSA with rule-based system: Instead of traditional GLOSA, the author used the extended GLOSA algorithm together with the newly developed rules, so the system is equipped with both a speed and a lane advisory.

Estimation: This is a test step to verify whether the proposed approach functions as expected. Vehicles enter in an already arranged order: according to the proposed approach, straight-going vehicles need to arrive earlier than left turners, so this order is created when the vehicles enter. No speed or lane change advisory is used here.

Random SUMO runs: No speed or lane advisory is provided, and no order is imposed as in the estimation step. All vehicles are controlled by SUMO's built-in models (Krauss model [49]).

Another important factor is that SUMO can create random behavior when loading vehicles, which means it does not run the same simulation twice. Further tests were done for the above variants with simulation times varying between 10 and 60 minutes per run.

The author evaluates the final results for the above variants by calculating the accumulated travel time and emission.
6.2.2 Results and discussion
First, the author explains how the accumulated travel time changes across the above variants.
Figure 61 : Travel time evaluation
[Bar chart: accumulated travel time (%) for each category, comparing Random, Estimation, GLOSA, and GLOSA+rules]

Category 1: the number of approaching straight-going vehicles in the left lane is 11 and the number of left-turn vehicles is 5 for a single green phase.
Category 2: 11 approaching straight-going vehicles in the left lane and 4 left-turn vehicles for a single green phase.
Category 3: 11 approaching straight-going vehicles in the left lane and 3 left-turn vehicles for a single green phase.
Category 4: 13 approaching straight-going vehicles in the left lane and 3 left-turn vehicles for a single green phase.

This figure was constructed after conducting the SUMO experiments described above. According to the traffic light cycle used for the experiments, the maximum number of vehicles that can pass during the green phase plus the extended green phase (reserved for left turners) is 15. For the first 3 categories, the number of straight-going vehicles was fixed at 11, as only 11 can pass the intersection during the whole green phase, while the number of left turners was varied to check how the rules react to it; up to 5 left turners can pass the intersection during the extended green phase. The last category examines the delay when the number of vehicles exceeds 15; in other words, it deliberately creates a traffic jam, and the author wanted to see how the rules react to it.

Next, the author explains how the accumulated emission changes for the same 4 categories and variants; all the above conditions apply to this experiment as well.
Figure 62 : Emission evaluation
[Bar chart: accumulated emission (%) for each category, comparing Random, Estimation, GLOSA, and GLOSA+rules]

According to the above results, this section can conclude the following:
Comparing accumulated travel time, the newly introduced extended GLOSA+rules achieves up to a 10% improvement over traditional GLOSA.
According to the estimation, this can be reduced further.
Comparing accumulated travel time with random flows, the new prototype shows an improvement of 15%.
The accumulated emission improves by up to 12% when comparing the new prototype with GLOSA.
When more straight-going vehicles are added, both the accumulated travel time and emission percentages drop significantly.
6.2.3 Optimization results and discussion
The optimization process was carried out as described in the Concept chapter. Even though the optimizer ran for several hours with different optimization algorithms, it was unable to find optimal values for the arrival margin and the left-turner delay. The author therefore ran a parameter grid scan: the idea was to examine a range of values for the algorithm parameters and monitor the accumulated travel time and emission figures. Several parallel simulations (15 runs) were executed with various simulation times.
Arrival margin: 0.1 to 5
Left turner delay: 0 to 40
After running several grid tests with various traffic flows, the following results were obtained.

Figure 63 : Optimizer travel time results

Figure 64 : Optimizer emission results

As seen in the above diagrams, no clear patterns were found in the grid test either. Another main observation was that the optimization process was slower than expected; the author discusses the reasons for this in the next section.
6.2.4 Further discussion of Grid test
Here the author explains the reasons for the above results in more detail.
One major issue was created by the newly designed rules. This does not mean the rules do not work properly; as shown in the results above, they work as expected, and the rule-based policy approach performed much better than existing GLOSA. However, when closely examining the random scenarios generated by SUMO, there were situations where no gap could be found for a lane change. For example, when rule 3 applies to a straight-going vehicle in the right lane but no space opens up in the left lane before the vehicle reaches the intersection (the marked vehicle in the following figure), the proposed order is not created. In parallel runs with randomness, there were many situations where a vehicle could not change lanes as expected. This was the major reason for the unclear results during the grid test and the optimization.
Figure 65 : Gap creation issue
Slowness of the simulation execution: The author observed unexpected slowness when running the rule-based algorithm and the optimization. All rules and the extended GLOSA were executed in every simulation step for all monitored vehicles, which was the main reason for the slowness.
6.2.5 Solutions and improvements
A novel gap creation strategy needs to be introduced to create space when there is no usable gap in the target lane; this requires cooperative decision making among neighboring vehicles.

To reduce the slowness of the simulation, further experiments need to be carried out to find the optimum execution frequency for the rules and GLOSA.

Even with the proposed rule set, there were situations where a few left turners could not cross the intersection and had to stop next to it at a red light. This happens when the number of left turners is higher than the rules expect; special rules need to be introduced to address this situation.
6.3 Summary
This chapter explained the results obtained during testing. First, the author explained the results from the FLOW-based solution. Preliminary tests were run to verify that it created the designed SUMO road network; later, several tests were run to test the DQN with a few variations. A custom algorithm was created to generate vehicles on each lane, and the author was able to create the proposed scenario as expected. However, FLOW only supported a single-agent mechanism at the time, although the development team is currently working on a multi-agent support toolkit; due to the project timeline, the author was unable to use the multi-agent version of FLOW. The model was trained for up to 20 hours with several parallel simulations. It started to respond well during the first few hours of running and showed promising results: FLOW was able to order vehicles and adjust speeds. But after the preliminary runs, the author did not see any further progress even though training ran for several more hours (nearly a day). The main issue of this approach was discussed in detail in this chapter.

According to the currently available results, the rule-based policy was more successful than FLOW. The implemented rules worked as expected, and the proposed solution performed better than traditional GLOSA in a direct comparison. However, the optimizer produced unexpected results, so a parameter grid scan was carried out to investigate further. The results were examined thoroughly and the issues causing the unexpected outcomes were identified. Finally, suitable solutions were suggested in order to obtain better results in the future.
7 Conclusion
This chapter summarizes the work that has been done during the research. The
challenges that the author has faced and future work to improve the proposed
solution are outlined.
7.1 Challenges
The author faced several challenges during the research.
FLOW initial simulation setup: The FLOW framework is a newly introduced toolkit, and its development is still ongoing; at the moment FLOW only supports simple scenarios. The author had to spend a considerable amount of time creating the proposed road network and traffic flow.

Finalization of GLOSA: The author found various approaches to creating GLOSA; this research used the GLOSA algorithm suggested by [5]. During the implementation phase it did not work as expected, and the author had to take extra steps to solve the issues. Furthermore, extensive tests were run to check whether GLOSA functions properly.

FLOW multi-agent approach setup: As mentioned in the evaluation chapter, FLOW only supports a single-agent architecture. The FLOW development team has recently started to implement multi-agent support, but the author was unable to use this new FLOW solution due to the research timeline. For that reason, the author investigated the possibility of converting the existing FLOW to a multi-agent architecture.

Initial development of rules: After the unexpected issues with FLOW, the research focused on developing a rule-based RL policy. Due to the nature of the problem, the initial design phase of the rules was complicated; the main question was how to arrange the approaching vehicles and what the best arrangement is.

Investigating optimization issues: As discussed in the evaluation chapter, the optimization produced unexpected results even after running it several times. After a comprehensive parameter scan, the author was able to identify the causes of the unexpected results.
7.2 Future improvements
DQN approach with multi-agent architecture: The FLOW-based DQN solution was very slow due to the single-agent architecture. It should run in a GPU environment, and the necessary hardware needs to be provided. One disadvantage of DQN is its long training times, and it needs even more training time for complex problems due to its reward-driven trial-and-error mechanism.

Extension of rules: The author observed several issues during the optimization phase, especially that gap creation between vehicles is necessary for lane changes. This version does not handle that issue: the rules only execute a lane change when there is already available space.

Speed up simulations: The simulation was slow; it needs to be sped up by reducing the calling frequency of the rules and GLOSA.
7.3 Concluding remarks
The research introduces a reinforcement learning based approach in order to
optimize the traffic flow at an intersection. It focuses on reducing travel time and
emission of vehicles. This approach is very successful when the scenario consists of
more straight going vehicles than left turners. The author proposes a vehicle
reordering mechanism that establishes a specific sequence of vehicles in the traffic
flow before it reaches the intersection.
A separate chapter describes the basics of reinforcement learning. It explains the theoretical aspects: mainly how reinforcement learning works, the components of an RL scenario, and existing algorithms.
Before introducing the new approach, a literature review was carried out, pointing out the most important findings. First, it introduced the basics of traffic engineering. Before describing GLOSA variants, the author covered Car2X systems: the structure of Car2X and how it works in a real environment. GLOSA is the only Car2X application discussed, as it is considered the baseline for the project. Next, the literature review focused on various RL algorithms that have been used to optimize traffic flow and intersections; simple DQNs, CNNs, and more complex DQNs are predominant in the existing research body.
The concept chapter introduced the DQN-based solution and the rule-based policy. The DQN structure, observation space, action space, and reward functions were described in detail for both solutions. The DQN had 7 neurons in the input layer and 2 in the output layer, with hidden layer configurations of 32*32 (2 hidden layers of 32 neurons each) and 64*64. The second solution was the rule-based policy. It consists of several rules that control each approaching vehicle in every simulation step; the rules were designed to slow down left turners and allow more straight-going vehicles to overtake when possible, so that all possible vehicles pass the intersection during the next green phase. An improved GLOSA was introduced as the speed advisory, supporting the newly introduced rules that provide lane changes. GLOSA lets vehicles reach the intersection exactly on time (when the green phase starts), so a vehicle does not come to a full stop next to the intersection. An optimizer was responsible for optimizing the rules further.
The implementation chapter explains the technical aspects of the solutions. All third-party frameworks were introduced there: the author used SUMO [10], SUMO-Traci [47], and FLOW [11] to implement the first solution, while the rules were fully implemented using Traci [47]. The chapter also contains important code snippets for creating the SUMO road network, the rules, and the extended GLOSA.
The final step was to evaluate the solutions and point out the issues. The FLOW-based solution did not provide the expected results: after running the simulation for up to 20 hours, progress was slow, although the author could still see some development, with lane changes and speed changes taking place. Due to the single-agent setup, the simulation was much slower than expected, and the FLOW multi-agent setup was not available during the development phase of this research.

In the second approach, the rules were evaluated against GLOSA and were 10 to 12% more efficient than existing GLOSA. However, the optimizer again showed unexpected results; the author troubleshot the issue and described it extensively in the evaluation chapter, together with promising solutions. As mentioned in the introduction chapter, the initial proposal was to reorder approaching left turners and straight-going vehicles in order to improve the intersection throughput. This was successful, as the new rule-based policy uses rules to dynamically change the speed and lane of each vehicle to achieve a better sequence, and the output was more efficient than the currently existing GLOSA.
Bibliography
[1] Jakob Erdmann, “Combining Adaptive Junction Control with Simultaneous Green-Light-Optimal-Speed-Advisory,” presented at the 2013 IEEE 5th International Symposium on Wireless Vehicular Communications (WiVeC), Dresden,Germany, 2013.
[2] Yang, Kaidi, Isabelle Tan, and Monica Menendez., “A reinforcement learning based traffic signal control algorithm in a connected vehicle environment,” presented at the 17th Swiss Transport Research Conference (STRC 2017)., Lausanne, 2017.
[3] B. Baker, O. Gupta, N. Naik, and R. Raskar, "Designing neural network architectures using reinforcement learning," arXiv preprint, 2016.
[4] Harding, Y. Gregory, and J. Wang, “Vehicle-to-vehicle communications: Readiness of V2V technology for application,” NHTSA, Technical HS 812 014, 2014.
[5] K. Katsaros, R. Kernchen, M. Dianati, and D. Rieck, "Performance study of a Green Light Optimized Speed Advisory (GLOSA) Application Using an Integrated Cooperative ITS Simulation Platform," presented at the 2011 7th International Wireless Communications and Mobile Computing Conference, Turkey, 2011, p. 6.
[6] R. German, D. Eckhoff, et al., "Multi-hop for GLOSA Systems: Evaluation and Results From a Field Experiment," presented at the 2017 IEEE Vehicular Networking Conference (VNC), Torino, Italy, 2017.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2017.
[8] "DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning," arXiv preprint, Jan. 2018.
[9] R. S. Sutton, "Introduction: The Challenge of Reinforcement Learning," in Reinforcement Learning, Boston: Springer, 1992.
[10] DLR, "SUMO - Simulation of Urban Mobility," company website, 2018.
[11] C. Wu, A. Kreidieh, K. Parvate, E. Vinitsky, and A. M. Bayen, "Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control," 2017.
[12] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Playing Atari with deep reinforcement learning," arXiv preprint, vol. 1312.5602, 2013.
[13] M. Coggan, “Exploration and exploitation in reinforcement learning,” McGill Univ., vol. Research supervised by Prof. Doina Precup, CRA-W DMP Project at McGill University, 2004.
[14] M. Lauer and M. Riedmiller, "An algorithm for distributed reinforcement learning in cooperative multi-agent systems," in Proceedings of the Seventeenth International Conference on Machine Learning, Citeseer, 2000.
[15] Thomas Simonini, "Diving deeper into Reinforcement Learning with Q-Learning," Apr. 2018.
[16] Mathew, Tom V., and KV Krishna Rao, Fundamental parameters of traffic flow. NPTE, 2016.
[17] C. F. D. Daganzo, Carlos, “Fundamentals of transportation and traffic operations,” in Fundamentals of transportation and traffic operations, vol. 30, Oxford: Pergamon, 1997.
[18] B. D. M. Sven Maerivoet, “Traffic flow theory,” in Physics and Society, 2005, p. 33.
[19] D. Stephens and J. Schroeder, “Vehicle-to-infrastructure (V2I) safety applications performance,” US Dept.of Transportation, Technical FHWA-JPO-16-253, 2013.
[20] IEEE Standards Association, "IEEE 802.11p."
[21] H. Stübing, "Car-to-X Communication: System Architecture and Applications," in Multilayered Security and Privacy Protection in Car-to-X Networks, Wiesbaden: Springer, 2013, pp. 9-19.
[22] German Association of the Automotive Industry, "SimTD," Project SimTD, 2018.
[23] R. Baldessari and W. Zhang, "CAR-2-X Communication SDK - A Software Toolkit for Rapid Application Development and Experimentations," presented at the International Conference on Communication, Dresden, 2009.
[24] D. Eckhoff, B. Halmos, and R. German, "Potentials and Limitations of Green Light Optimal Speed Advisory Systems," presented at the 2013 IEEE Vehicular Networking Conference, Boston, USA, 2013.
LIGHT OPTIMIZATION WITH USE OF REINFORCEMENT LEARNING,” presented at the Intelligent Transportation Systems (ITS), 2014.
[27] PTV Group, "PTV Vissim," 2010.
[28] M. Liu, J. Deng, et al., "Cooperative Deep Reinforcement Learning for Traffic Signal Control," presented at the International Workshop on Urban Computing, Canada, 2017.
[29] US Department of Transportation, "Federal Highway Administration," 2018.
[30] H. Wei and Z. Li, "IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control," presented at the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, 2018, p. 10.
[31] H. van Hasselt, A. Guez, and D. Silver, "Deep Reinforcement Learning with Double Q-Learning," Cornell Univ., vol. 2, p. 5, 2016.
[32] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, and M. Lanctot, "Dueling network architectures for deep reinforcement learning," arXiv, 2015.
[33] J. G. Minoru Ito, Norio Shiratori Yulong Shen, Jia Liu, “Adaptive Traffic Signal Control: Deep Reinforcement Learning Algorithm with Experience Replay and Target Network,” 2017, p. 10.
[34] D. I. Kaushik Subramanian, Kikuo Fujimura Reza Rahimi, Akansel Cosgun, “Navigating Occluded Intersections with Autonomous Vehicles using Deep Reinforcement Learning,” presented at the IEEE International Conference on Robotics and Automation, 2017.
[35] Ankur Mehta ; Eugene Vinitsky, “Framework for control and deep reinforcement learning in traffic,” presented at the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 2017.
[36] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, "Benchmarking Deep Reinforcement Learning for Continuous Control," presented at the Proceedings of the 33rd International Conference on Machine Learning, 2016.
[37] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv, 2016.
[38] M. Maulida, H. Y. Sutarto, et al., "Queue Length Optimization of Vehicles at Road Intersection Using Parabolic Interpolation Method," presented at the International Conference on Automation, Cognitive Science, Optics, Micro Electro-Mechanical System, and Information Technology, Indonesia, 2015.
[39] R. K. yin Min Keng, Helen Chuo Kenneth Tze, "Genetic algorithm based signal optimizer for oversaturated urban signalized intersection," presented at the IEEE International Conference on Consumer Electronics, Malaysia, 2016.
[40] X. Guo and Y. Song, "Research of traffic assignment algorithm based on adaptive genetic algorithm," presented at Computing, Control and Industrial Engineering (CCIE), 2011 IEEE 2nd International Conference, 2011.
[41] Leng, Junqiang, and Yuqin Feng., “Research on the Fuzzy Control and Simulation for Intersection Based on the Phase Sequence Optimization,” in Measuring Technology and Mechatronics Automation, 2009, 2009.
[42] V. V. Sawake and P. B., "Review of Traffic Signal Timing Optimization based on Fuzzy Logic Controller," presented at the International Conference on Innovation in Information Embedded and Communication System, 2017.
[43] M. and H. Abdelhameed, Magdy M and Abdelaziz S. and Shehata, Omar M., “A hybrid fuzzy-genetic controller for a multi-agent intersection control system,” presented at the Engineering and Technology (ICET), 2014 International Conference, 2014.
[44] A. and C. Choi, Myungwhan and Rubenecia Hyo Hyun, “Reservation-based cooperative traffic management at an intersection of multi-lane roads,” presented at the Information Networking (ICOIN), 2018 International Conference, 2018.