ORIGIN-DESTINATION MATRIX ESTIMATION USING BIG TRAFFIC DATA: A STRUCTURAL PERSPECTIVE Krishna Nikhil Sumanth Behara Master of Civil (Transportation) Engineering Birla Institute of Technology and Science (BITS), Pilani, India, 2012 Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy (PhD) School of Civil Engineering and Built Environment Science and Engineering Faculty Queensland University of Technology 2019
ORIGIN-DESTINATION MATRIX
ESTIMATION USING BIG TRAFFIC DATA:
A STRUCTURAL PERSPECTIVE
Krishna Nikhil Sumanth Behara
Master of Civil (Transportation) Engineering Birla Institute of Technology and Science (BITS), Pilani, India, 2012
Submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy (PhD)
School of Civil Engineering and Built Environment
Science and Engineering Faculty
Queensland University of Technology
2019
Origin-Destination Matrix Estimation using Big Traffic Data: A Structural Perspective i
Keywords
Bi-level; Bluetooth; subpaths; Brisbane city; BSTM; clustering OD matrices;
DBSCAN; gradient descent; Mean geographical window based SSIM (GSSI); Mean
Levenshtein distance for OD matrices (NLOD); non-assignment-based; local sliding
window; origin destination (OD) matrix; OD matrix estimation; OD matrix structure;
List of Publications
JOURNALS
Behara, K. N., A. Bhaskar, and E. Chung. Levenshtein distance for the
structural comparison of origin-destination matrices (Chapter 3 of thesis and under
review in Transportation Research Part C: Emerging Technologies).
Behara, K. N., A. Bhaskar, and E. Chung. Geographical window based
structural similarity index for OD matrices comparison (Chapter 3 of thesis and under
review in Journal of Intelligent Transportation Systems).
Behara, K. N., A. Bhaskar, and E. Chung. OD matrix estimation using observed
traffic counts and Bluetooth subpath flows (Chapter 4 of thesis and to be submitted to
Transportation Research Part C: Emerging Technologies by 31st July 2019).
Behara, K. N., A. Bhaskar, and E. Chung. A non-assignment-based approach to
estimate OD matrices using observed turning proportions and structural knowledge of
Bluetooth trips (Chapter 5 of thesis and to be submitted to IEEE Transactions on
Intelligent Transportation Systems by 7th Aug 2019).
Behara, K. N., A. Bhaskar, and E. Chung. Clustering multi-density OD matrices
datasets using structural proximity measures: A case study on Brisbane Bluetooth
based OD (Chapter 6 of thesis and to be submitted to a Q1 journal by 21st Aug 2019).
CONFERENCES
Behara, K. N., Bhaskar, A., & Chung, E. (2017). Insights into geographical
window based SSIM for comparison of OD matrices. In 39th Australasian Transport
Research Forum (ATRF), 27-29 November 2017, Auckland, New Zealand (abridged
version).
Behara, K. N., Bhaskar, A., & Chung, E. (2017). Classification of typical
Bluetooth OD matrices based on structural similarity of travel patterns- Case study on
Brisbane city. In Transportation Research Board 97th Annual Meeting, 7th-11th
January 2018, Washington D.C., USA.
Behara, K. N., Bhaskar, A., & Chung, E. (2018). Novel approach for OD
estimation based on observed turning proportions and Bluetooth structural
information: Proof of the concept. In 40th Australasian Transport Research Forum
(ATRF), 30-31 October 2018, Darwin Convention Centre, Darwin, Australia
(abridged version).
Behara, K. N., Bhaskar, A., & Chung, E. (2018). Levenshtein distance for the
structural comparison of OD matrices. In 40th Australasian Transport Research Forum
(ATRF), 30-31 October 2018, Darwin Convention Centre, Darwin, Australia
(abridged version).
Behara, K. N., Bhaskar, A., & Chung, E. (2019). Estimating OD matrices from
observed trajectories and link counts. In World Conference on Transport Research -
WCTR 2019, 26-31 May 2019, Mumbai, India (abridged version).
Notations
It refers to the origin zone number e.g. oth origin
Number of zones which serve as origin points
It refers to the destination zone number e.g. dth destination
Number of zones which serve as destination locations
It refers to the OD pair e.g. wth OD pair
It is the number of OD pairs in the OD matrix; w ϵ W
OD vector to be estimated
Target OD vector
True OD matrix
Prior OD matrix
OD matrix in Aimsun format
The flows of wth OD pair in
The flows of wth OD pair in
General dimensions of OD matrix whenever expressed in matrix form
Trips produced from oth zone
Origin flows vector to be estimated
Trips attracted to dth zone
It refers to the link number e.g. lth link
It is the total number of selected links in the network
It is the simulated/estimated flow on lth link
It is the observed flow on lth link
It is the estimated link flows vector of size L*1
It is the observed link flows vector of size L*1
It refers to the path number connecting wth OD pair
It refers to the path number connecting lth link with oth origin
It refers to the number of paths connecting wth OD pair
It refers to the number of possible paths connecting lth link with oth origin
It is flow on kth path
Kronecker Delta function. It is equal to 1, if lth link is present in kth path, and 0 otherwise.
Weight factor of OD flows deviation from target OD matrix in the
objective function
Weight factor of link flows deviation in the objective function
It refers to the travel cost for lth link.
It is the path cost through kth path between wth OD pair
It is the cost on the shortest route for wth OD pair,
It represents the observed Bluetooth OD vector
It represents the observed Bluetooth subpath flows vector
It represents the consolidated vector of Bluetooth subpath flows observed from several days of similar travel patterns
Path flows from the model (Aimsun)
It represents the vector of OD flows that are Bluetooth connected
It represents the vector of true OD flows that are Bluetooth connected
Incidence matrix that converts X to X*
It is the proportional assignment matrix linking link flows with OD
User equilibrium assignment (link-proportion) matrix (either analytical or simulated)
User equilibrium path-proportion matrix (either analytical or simulated)
The proportion of Xw flowing in lth link
Local window ID
Number of local windows
Likelihood
Error term for the OD matrix (difference between and X)
Error term for the link flows (difference between and Y)
Error term for the link flows (difference between and AX)
It is a dispersion parameter to describe road users’ perception of travel costs
Link OD matrix
It is the incidence matrix; that is, a network-based information
Time slice
It is the trips generated from oth origin during tth time slice
It is the proportion of trips generated from oth origin to dth destination
It is the OD flow between oth origin and dth destination during time-slice, t
Correlation coefficient between and Y
Scale factor expressed as a sum of ratios of and
compares the mean values ( ) of the group of OD pairs (i.e. x and y) from both matrices, X and Y.
compares the standard deviations ( of the group of OD pairs
compares the structure by computing correlation between the normalised group of OD pairs (i.e. x and y) from both matrices, X and Y.
Sequence of Levenshtein edit operations to transform strings or sorted
kth Levenshtein edit operation
The Levenshtein matrix for comparing strings
It is set including that is ith preferred destination from oth origin
It is set including that is the corresponding demand value of from oth origin
It is the sorted set of destination IDs ( ) and the corresponding demand from oth origin ( )
Penetration rate of Bluetooth inferred trips
Percentage of Bluetooth connected OD pairs
Step length at kth iteration
Objective function value
Step length parameter to scale-up by times
Step length parameter to scale-down by times
Turning Proportion matrix developed from observed turning proportions
It refers to the intersection number
It is the turning proportion observed at intersection present along (kl,o)th path
It refers to the probability of origin flows passing through (kl,o)th path and observed at lth link
It is the total probability of trips generated from oth origin observed at lth link
Abbreviations
ABS Australian bureau of statistics
ANOVA Analysis of variance
ATAP Australian transport assessment and planning
AVI Automatic vehicle identification
BCC Brisbane City Council
BMS Bluetooth media access control scanner
B-OD Bluetooth based Origin Destination matrix
BPR Bureau of Public Roads
B-SP Bluetooth based subpaths
BSTM Brisbane Strategic Transport Model
CBD Central Business District
CDA Combined distribution and assignment
DBSCAN Density-based spatial clustering of applications with noise
EBM Eigenvalue-based measure
EM Entropy maximisation
GEH Geoffrey E. Havers statistic
GPS Global positioning system
GLD Generalised Levenshtein distance
GLS Generalised least squares
GU Global Theil measure of fit
HTS Household travel survey
IM Information minimisation
ITS Intelligent transport systems
KF Kalman filter
LOD Levenshtein distance for OD matrices
LSQR Least squares
LW Long weekend
MAE% Mean absolute error percent
MAER Mean absolute error ratio
MAPE% Mean absolute percent error
GSSI Mean geographical window based structural similarity index
ML Maximum likelihood
MLPP Most likely possible paths
NLOD Mean Normalised Levenshtein distance for OD matrices
Miska, 2015), trajectories identification (Michau, Nantes, et al., 2017), and OD
demand estimation (Michau, et al., 2016). The validity of Bluetooth OD data has been
confirmed in the past by using data from other sources, such as video and automatic
license plate recognition OD data (Blogg, Semler, Hingorani, & Troutbeck, 2010) and
vehicle tracking using time lapse aerial photography (TLAP) (Chitturi, et al., 2014).
Interested readers can refer to Bhaskar and Chung (2013) for a fundamental
understanding of Bluetooth MAC scanner (BMS) data as complementary transport
data.
2.7 SUMMARY OF LITERATURE REVIEW
In summary, this comprehensive review of the literature identified the following
major research gaps:
1. Most studies have focused on developing formulations and solution
algorithms to improve the quality of OD matrix estimates.
Specifically, these studies adopted a bi-level framework for OD
estimation. The focus has also shifted from static to dynamic and,
more recently, to quasi-dynamic formulations. However, less attention
has been paid to exploiting the higher dimensions of OD flows; that is,
the structural information of OD matrices, which should not be neglected
in either the OD matrix estimation process or the formulation of
statistical performance measures.
Chapter 2: Literature Review 55
2. Most studies depend entirely on traffic count-based observations
because loop detectors are the dominant source of traffic data. Although
advancements in technology provide additional data sources,
integrating them into existing transport models remains
challenging.
By addressing these gaps, this study aims to develop statistical methods that
exploit the structural information of OD matrices for the comparison of OD matrices,
and to develop methods that incorporate the structural knowledge of Bluetooth trips
into the OD matrix estimation process, in the forthcoming chapters.
Chapter 3: Development of Statistical Metrics for the Structural Comparison of OD Matrices 57
Chapter 3: Development of Statistical
Metrics for the Structural
Comparison of OD Matrices
This chapter begins with a background (Section 3.1); introduces and discusses
the limitations of SSIM (Section 3.2); develops GSSI (Section 3.3); introduces the
traditional Levenshtein distance, extends its formulation for the comparison of OD
matrices (NLOD), and compares it with the Wasserstein metric (Section 3.4); performs a
sensitivity analysis for the proposed GSSI and NLOD (Section 3.5); and finally
provides a summary of the chapter (Section 3.6).
3.1 BACKGROUND
Mathematical formulations of some widely used traditional metrics for comparing
OD matrices were discussed in Section 2.5. These metrics compare the individual
cells of OD matrices and compute a single statistic by aggregating or averaging
the deviation over individual cells. However, they cannot capture structural
information about the matrices. To demonstrate this, consider comparing OD
matrices M1 and M2 with a reference OD matrix MR (Figure 3.1). Here, M1 is simply
1.1 times MR, and M2 is chosen randomly. The results of comparing M1 and M2 with MR
using traditional metrics (MSE, RMSE, GU, and MAE) are presented in Table 3.1. The
first column of Table 3.1 lists the metrics, and the second and third columns give
the metric values for the two cases, respectively. Visual inspection shows that the
demand distribution (or structure) of M1 is closer to that of MR than the demand
distribution of M2 is. This is expected, because M1 is a scaled version (1.1 times)
of the reference matrix. Nevertheless, the traditional metrics yield the same
results for both cases (Table 3.1) and thus fail to capture the structural
differences between OD matrices. Structural comparison therefore calls for new
metrics in addition to the existing traditional ones. To address this need, the
Structural Similarity (SSIM) index has been applied in the literature; its details
are presented in the following section.
Figure 3.1: Comparison of MR with OD matrices M1 and M2
Table 3.1: Comparison results using the traditional metrics
Traditional metrics    Comparison of (M1, MR)    Comparison of (M2, MR)
MSE                    17370                     17370
RMSE                   131.8                     131.8
GU                     0.05                      0.05
MAE                    0.10                      0.10
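The point made by Table 3.1 can be reproduced with a small numerical sketch. The flows below are hypothetical (they are not the matrices of Figure 3.1), M2 is constructed by reassigning the error magnitudes of M1 to other cells rather than being drawn randomly, and the Pearson correlation is used only as a simple stand-in for a structural measure. All function and variable names are illustrative assumptions.

```python
import math


def mse(a, b):
    """Mean squared error between two flattened OD matrices."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def mae(a, b):
    """Mean absolute error between two flattened OD matrices."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def pearson(a, b):
    """Pearson correlation, used here as a simple proxy for structure."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical reference OD flows (a flattened 3 x 3 matrix).
m_ref = [100.0, 400.0, 50.0, 300.0, 20.0, 250.0, 80.0, 500.0, 10.0]

# M1: uniformly scaled copy of the reference, so the structure is preserved.
m1 = [1.1 * v for v in m_ref]

# M2: the same error magnitudes as M1, but assigned to other cells
# (reversed order) with alternating signs, so the structure is distorted.
errors = [0.1 * v for v in m_ref]
reassigned = list(reversed(errors))
m2 = [v + e if i % 2 == 0 else v - e
      for i, (v, e) in enumerate(zip(m_ref, reassigned))]

for name, m in (("M1", m1), ("M2", m2)):
    print(name, mse(m, m_ref), mae(m, m_ref), pearson(m, m_ref))
```

Both candidates return the same MSE and MAE against the reference, while the correlation separates the scaled copy from the distorted matrix.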
3.2 STRUCTURAL SIMILARITY (SSIM) INDEX
The SSIM is borrowed from the field of image processing. Wang et al. (2004)
discussed the limitations of traditional metrics to capture structural differences in
images. They proposed the SSIM index as a quantitative measure to compare the
quality of two natural images and observed that statistical measures such as MSE may
fail to measure the structural degradation of one image with respect to another. As
shown in Figure 3.2a, two images estimated from two different algorithms, namely
gradient ascent and gradient descent, can each have the same MSE of 2,500 but
different SSIM values of 0.9337 and -0.5411, respectively.
Djukic, et al. (2013) applied the SSIM rationale in the context of OD matrices
and demonstrated that two OD matrices can have the same MSE value but different SSIM
values. For instance, the two comparisons in Figure 3.2b each have an MSE of 69,
while the SSIM values are 0.8724 and 0.9702.
Figure 3.2: (a) Comparison of Images (source Wang et al., 2004) vs (b) comparison of OD matrices (source Djukic et al., 2013)
The formulation for local SSIM, as provided by Djukic, et al. (2013), is based
on the product of three individual formulations (Equations (37), (37a) and (37b))
related to the means, standard deviations, and correlation coefficients of the
groups of OD pairs.
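\[ l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \tag{37} \]
\[ c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{37a} \]
\[ s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \tag{37b} \]
\[ SSIM(x, y) = [l(x, y)]^{\alpha}\,[c(x, y)]^{\beta}\,[s(x, y)]^{\gamma}; \quad \alpha, \beta, \gamma > 0 \tag{37c} \]
Assuming \(\alpha = \beta = \gamma = 1\) and \(C_3 = C_2/2\):
\[ SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}; \quad -1 \le SSIM \le 1 \tag{37d} \]
\[ MSSIM(X, Y) = \frac{1}{N}\sum_{n=1}^{N} SSIM(x_n, y_n); \quad -1 \le MSSIM \le 1 \tag{37e} \]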
Where,
\(X\) and \(Y\) represent the two OD matrices to be compared, while \(x\) and \(y\)
represent the groups of OD pairs within the \(n\)th local window in both matrices. The
concept of local windows is further explained in Section 3.2.1.
\(l(x, y)\) compares the mean values (\(\mu_x\), \(\mu_y\)) of the group of OD pairs in
both matrices;
\(c(x, y)\) compares the standard deviations (\(\sigma_x\), \(\sigma_y\)) of the group
of OD pairs in both matrices;
\(s(x, y)\) compares the structure by computing the correlation between the
normalised groups of OD pairs in both matrices. Normalised \(x\) and \(y\) with unit
standard deviation and zero mean are equal to \((x - \mu_x)/\sigma_x\) and
\((y - \mu_y)/\sigma_y\), respectively;
\(C_1\), \(C_2\) and \(C_3\) are constants that stabilise the result when either the
mean or the standard deviation is close to zero. \(C_3\) is generally assumed to be
\(C_2/2\). Previous studies have suggested small positive values for \(C_1\) and
\(C_2\) (Pollard, Taylor, van Vuren, & MacDonald, 2013). For the analysis conducted
in this research, the OD values in the SSIM window were not all zero; hence, both
\(C_1\) and \(C_2\) were assumed to be zero.
The parameters \(\alpha\), \(\beta\) and \(\gamma\) are used to adjust the relative
importance of the mean, standard deviation and structural components, respectively.
Generally, they are assumed to be equal to 1.
\(SSIM(x, y)\) is the structural similarity of the local windows from both
matrices.
\(MSSIM(X, Y)\) is the overall similarity of the OD matrices \(X\) and \(Y\), computed
by taking the average of the SSIM values of the \(N\) local windows.
The values of SSIM and MSSIM range between -1 and 1. A value of 1 implies that
the matrices are the same, while a value of -1 implies the reverse.
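As a worked illustration of Equation (37d), consider the special case in which one group of OD pairs is a uniformly scaled copy of the other, \(y = a\,x\) with \(a > 0\), and \(C_1 = C_2 = 0\). The scaling factor \(a\) is introduced here for illustration only, and the derivation assumes \(\mu_x \neq 0\) and \(\sigma_x \neq 0\). Then \(\mu_y = a\mu_x\), \(\sigma_y^2 = a^2\sigma_x^2\) and \(\sigma_{xy} = a\sigma_x^2\), so that:

```latex
\begin{aligned}
SSIM(x, y)
  &= \frac{(2\mu_x\mu_y)(2\sigma_{xy})}{(\mu_x^2 + \mu_y^2)(\sigma_x^2 + \sigma_y^2)}
   = \frac{2a\,\mu_x^2}{(1 + a^2)\,\mu_x^2}\cdot\frac{2a\,\sigma_x^2}{(1 + a^2)\,\sigma_x^2}
   = \left(\frac{2a}{1 + a^2}\right)^{2} \\
a = 1.1:\quad SSIM(x, y) &= \left(\frac{2.2}{2.21}\right)^{2} \approx 0.991
\end{aligned}
```

A uniform scaling of 10% therefore lowers the SSIM only slightly, because the correlation between the two groups remains perfect and only the mean and standard deviation terms are penalised.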
3.2.1 Local sliding window
The local window is generally a square box much smaller than the OD matrix. It
is often referred to as a “local sliding window” because the traditional SSIM
computes statistics on the local window (consisting of a group of pixels or OD
pairs) that slides pixel by pixel, or cell by cell, over the entire image or OD
matrix. The concept of sliding was originally used for the comparison of images,
where it allows SSIM to compute local statistical characteristics so that local
image distortions are better accounted for (Brooks, Zhao, & Pappas, 2008). For ease
of explanation, consider the example presented in Figure 3.3. Here, two 4 × 4 OD
matrices, X and Y, are presented in columns one and two, respectively. These two OD
matrices are compared using SSIM. A local sliding window of 2 × 2 cells is
considered and represented as coloured cells. This window slides cell by cell over
the entire OD matrix and, in the current example, results in 9 matrix comparison
pairs, as illustrated in Figure 3.3. The local SSIM computes the structural
similarity between the sub-matrices corresponding to the windows from both OD
matrices. The final SSIM value, represented as mean SSIM (MSSIM), is computed by
averaging the local SSIM values of all sliding windows. In the example, the SSIM
value for the local window in Figure 3.3a is 0.5963 and the MSSIM over all 9 local
SSIMs is 0.6777.
Figure 3.3: An example of sliding window for SSIM calculation.
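The sliding-window procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this research: the two 4 × 4 matrices are hypothetical (they are not the matrices of Figure 3.3), the function names are assumptions, sample (n - 1) variances are assumed, and the constants C1 and C2 are set to zero as in the text.

```python
def ssim(x, y, c1=0.0, c2=0.0):
    """Local SSIM of two equally sized groups of OD flows, following the
    form of Equation (37d). Sample (n - 1) statistics are an assumption.
    With c1 = c2 = 0 the windows must not be constant or all zero."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / (n - 1)
    var_y = sum((v - mu_y) ** 2 for v in y) / (n - 1)
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def sliding_windows(matrix, size):
    """Yield every size x size sub-matrix (flattened), moving cell by cell."""
    rows, cols = len(matrix), len(matrix[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield [matrix[i][j]
                   for i in range(r, r + size)
                   for j in range(c, c + size)]


def mssim(mat_x, mat_y, size):
    """Mean of the local SSIM values over all sliding windows."""
    scores = [ssim(wx, wy)
              for wx, wy in zip(sliding_windows(mat_x, size),
                                sliding_windows(mat_y, size))]
    return sum(scores) / len(scores)


# Hypothetical 4 x 4 OD matrices.
X = [[10, 20, 30, 40], [15, 25, 35, 45], [5, 50, 60, 70], [80, 90, 100, 110]]
Y = [[12, 18, 33, 38], [14, 30, 31, 50], [9, 45, 66, 64], [70, 95, 90, 120]]

n_windows = len(list(sliding_windows(X, 2)))  # (4 - 2 + 1) ** 2 windows
print(n_windows, mssim(X, Y, 2))
```

For an n × n matrix and a w × w window, the number of comparison pairs is (n - w + 1)², which gives the 9 pairs of the 4 × 4 example with a 2 × 2 window.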
The differences between the structural comparison in images and OD matrices
include:
• In images, nearby pixels are correlated with respect to contrast
and other features. However, in an OD matrix, the correlations between
OD pairs depend on many factors. Generally, OD pairs sharing
similar activities, trip attractions, trip productions, distances, travel
costs or geographical locations are correlated. According to
Djukic (2014), correlations between OD pairs are reflected in their
demand volumes (especially if volumes are high), and by matrix
reordering, correlated OD pairs can be placed in the same neighbourhood;
that is, all high-volume OD pairs on one side and the remaining pairs on
the other. Djukic (2014) proposed re-ordering the OD matrix (i.e., sorting
each row of the OD matrix in the order of OD pair volumes). However, if
the arrangement of zonal IDs differs between the two matrices upon
re-ordering, then reordering is avoided.
• The cell of an OD matrix is equivalent to the pixel of an image. However,
pixel values range between 0 and 255 for greyscale images, whereas the
range of OD flows is large and depends on many factors such as
activities and distance.
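The row re-ordering idea attributed to Djukic (2014) above, together with the condition under which re-ordering is avoided, can be sketched as follows. This is only an illustration under stated assumptions: descending order of volume is assumed (the text does not state the direction), the matrices are hypothetical, and the function names are not taken from any cited work.

```python
def volume_order(row):
    """Destination indices of one origin row, sorted by descending flow
    (descending order is an assumption made for this sketch)."""
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)


def reorder_rows(matrix):
    """Sort every row of the OD matrix by OD pair volume (largest first)."""
    return [[row[j] for j in volume_order(row)] for row in matrix]


def same_arrangement(mat_x, mat_y):
    """True if sorting by volume arranges the zonal IDs identically in
    both matrices, i.e. re-ordering keeps the cells comparable."""
    return all(volume_order(rx) == volume_order(ry)
               for rx, ry in zip(mat_x, mat_y))


# Hypothetical 2 x 3 OD matrices.
X = [[50, 10, 30], [5, 40, 20]]
Y = [[55, 12, 28], [6, 35, 25]]  # same ranking of destinations per row
Z = [[10, 50, 30], [5, 40, 20]]  # first row ranks destinations differently

print(reorder_rows(X))
print(same_arrangement(X, Y), same_arrangement(X, Z))
```

When the arrangement differs, as for X and Z, the sorted rows would place flows of different OD pairs in the same cell position, which is why re-ordering is avoided in that case.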
Although the formulation of SSIM seems to be holistic, its existing application
still has the following shortcomings.
First, SSIM is sensitive to the size of the local window, and as such, there is
no clear consensus on the final MSSIM value. To circumvent this ambiguity, Djukic
(2014) suggested computing the SSIM over the entire OD matrix without using any
local window. However, doing so results in a statistic that is less sensitive to
structural changes within the OD matrix. According to the law of large numbers, the
variance of a sample tends to decrease as the sample size increases. Since larger
window dimensions imply a greater number of OD pairs to be compared, the variance
(distortion) and covariance (correlation distortion) parameters that capture
structural changes within and between OD matrices are reduced. In other words,
SSIM is less sensitive to correlation distortions when the covariance is computed
over larger window sizes.
To demonstrate the sensitivity of SSIM to window size, consider the mean SSIM
(MSSIM) values computed using different window sizes (3×3 to 20×20) for the Monday
and Sunday, and the Monday and Tuesday, OD matrix pairs constructed from the BCC
data. Figure 3.4 presents the results, where the blue line is for the Monday and
Sunday comparison and the orange line is for the Monday and Tuesday comparison. The
x-axis represents the size of the local window and the y-axis shows the MSSIM
value. The order of OD pairs is the same in the matrices for Sunday, Monday, and
Tuesday. As the size of the sliding window increases, the sensitivity of SSIM
towards subtle differences within the OD matrix decreases, and the MSSIM values
increase. Similar results were observed by Brooks et al. (2008) when comparing
images using different window sizes. The rate of increase of the MSSIM values is
lower for the Monday and Tuesday pair than for the Monday and Sunday pair. This is
due to similar travel patterns on Monday and Tuesday (both working days) and less
similar patterns on Monday and Sunday. There is no clear consensus in the
literature regarding an acceptable sliding window size and the resulting SSIM
values.
Figure 3.4: Sensitivity of MSSIM towards local window size
(x-axis: size of the sliding window; y-axis: MSSIM values)
Size of the sliding window    3×3       6×6       8×8       15×15     20×20
Mon and Sun                   0.7337    0.7892    0.807     0.8164    0.8292
Mon and Tue                   0.9939    0.9975    0.9985    0.9986    0.9986
Second, the local SSIM value computed on a group of OD pairs does not have
any physical meaning or significance unless the OD pairs are correlated. Groups
of OD pairs sharing similar structural properties or travel patterns are generally
correlated. Djukic (2014) tried to capture these correlations among the OD pairs
from their flow values (especially if volumes are high) by matrix reordering (i.e.
sorting each row of the OD matrix in the order of OD pair volumes). However, the
structural properties of an OD matrix include many other underlying factors, such
as the distribution of trips, geographical integrity and network topology, which,
if accounted for, could capture OD structural information better.
To this end, this study develops the mean geographical window-based SSIM (GSSI)
as an extension of the SSIM approach of Djukic (2014). It is discussed further in
the following section.
3.3 MEAN GEOGRAPHICAL WINDOW-BASED SSIM (GSSI)
In this study, the SSIM is applied by first arranging the origins and
destinations of the OD matrix in order of geographical similarity, and then
defining the windows for the SSIM analysis consistent with the geographical
boundaries. Here, the window size varies with the geographical boundaries
considered in the rearranged OD matrix. This differs from the traditional SSIM
application, where the size of the window is fixed. The window associated with a
geographical boundary is termed a geographical window, and the SSIM computed over
the geographical windows is hereafter termed the geographical window-based SSIM.
This process is explained with the help of an example from the Brisbane City
Council (BCC) region, as detailed below.
The proposed geographical window has a physical significance: it ensures
geographical integrity and captures spatial correlation by computing statistics on
all lower zonal level OD pairs belonging to the same higher zonal level OD pair.
For instance, for the BCC region the higher zonal level is SA4, with SA3 as the
lower level. The size and shape of a geographical window are defined by the number
of SA3 zonal pairs present within the respective SA4 zonal pair. Therefore, in
this approach, the local geographical window need not always be a square matrix.
In Figure 3.5, each cell of the OD matrix represents an SA3-level OD pair. Here,
the OD matrix is rearranged so that the SA3-level origins (rows) and destinations
(columns) are grouped by their respective SA4 level. For instance, SA3 (1) to
SA3 (j) from the SA4 (1) level are arranged together. The SA4-level boundaries now
define the geographical SSIM windows. The yellow shaded region represents a
window covering OD pairs from SA4 (1) to SA4 (2).
Figure 3.5: An example to illustrate the proposed geographical window-based approach
Figure 3.6 demonstrates the application of the SA4-based geographical windows
for comparing the SA3 (20 × 20) OD matrices of a Monday (Figure 3.6a) and a Sunday
(Figure 3.6b). The SA4 zones used in designing the geographical windows are:
Brisbane East, Brisbane North, Brisbane South, Brisbane West, and Brisbane Inner.
For example, consider the geographical window of the SA4 OD pair Brisbane East and
Brisbane North, which consists of SA3 OD pairs 30,101 to 30,201, 30,202, 30,203,
and 30,204; 30,103 to 30,201, 30,202, 30,203, and 30,204. These SA3 OD pairs are
geographically correlated because they belong to the same SA4 origin (Brisbane
East) and SA4 destination (Brisbane North). Here, Brisbane East and Brisbane North
consist of 2 and 4 lower level (SA3) zones, respectively. The size of the
corresponding local geographical window is therefore 2 × 4.
The local SSIM values are calculated for each geographical window individually,
and the mean geographical window-based SSIM (GSSI) is the average of all local
SSIM values. In the above example, the total number of geographical windows is
equal to the number of higher order OD pairs, which is 25. The GSSI for the
Sunday-Monday matrix pair is 0.7231. See Table 3.2 for the local geographical
window-based SSIM values and the GSSI.
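The computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research: the 5 × 5 matrices and the two higher-level zones ("East" and "North") are hypothetical and much smaller than the BCC example, the function names are illustrative, sample (n - 1) statistics are assumed, and C1 = C2 = 0 as in Section 3.2.

```python
def ssim(x, y):
    """Local SSIM (form of Equation (37d)) with C1 = C2 = 0. Sample
    (n - 1) statistics are an assumption; each window needs at least two
    cells and must not be constant."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / (n - 1)
    var_y = sum((v - mu_y) ** 2 for v in y) / (n - 1)
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / (n - 1)
    return (2 * mu_x * mu_y * 2 * cov) / (
        (mu_x ** 2 + mu_y ** 2) * (var_x + var_y))


def gssi(mat_x, mat_y, groups):
    """Average the local SSIM over all geographical windows.

    groups maps each higher-level zone to the row/column indices of its
    lower-level zones; every (origin zone, destination zone) pair defines
    one window, which need not be square."""
    local = {}
    for o_name, rows in groups.items():
        for d_name, cols in groups.items():
            wx = [mat_x[i][j] for i in rows for j in cols]
            wy = [mat_y[i][j] for i in rows for j in cols]
            local[(o_name, d_name)] = ssim(wx, wy)
    return sum(local.values()) / len(local), local


# Hypothetical lower-level OD matrices (5 zones) for two days.
X = [[12, 30, 45, 8, 20],
     [25, 10, 60, 15, 5],
     [40, 55, 9, 35, 70],
     [18, 22, 50, 11, 28],
     [6, 14, 65, 33, 16]]
Y = [[15, 28, 40, 12, 18],
     [20, 12, 66, 10, 9],
     [44, 50, 7, 30, 75],
     [16, 25, 20, 14, 60],
     [9, 11, 30, 38, 13]]

# Hypothetical hierarchy: 2 lower-level zones in "East", 3 in "North".
groups = {"East": [0, 1], "North": [2, 3, 4]}

overall, local = gssi(X, Y, groups)
print(overall)
for pair, value in local.items():
    print(pair, value)
```

With two higher-level zones the sketch produces 2 × 2 = 4 windows of sizes 2 × 2, 2 × 3, 3 × 2 and 3 × 3; the same logic gives the 25 windows of the five SA4 zones in the example above.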
Figure 3.6: Splitting (a) Monday and (b) Sunday OD matrices into geographical (SA4) windows
Note that the aforementioned example is explained from the perspective of the
statistical zones used in Australia. However, the proposed geographical
window-based approach is valid for any other study region with its own
hierarchical zonal structure. Although the method was demonstrated using SA4 zones
as geographical windows on SA3 OD pairs, any combination of higher and lower level
OD pairs can be used for the same purpose; for instance, SA3 OD pairs can be used
as higher level geographical windows for SA1 OD pairs. The geographical
window-based SSIM approach has the following advantages over the traditional SSIM.
3.3.1 Structural comparison of local travel patterns
While the GSSI value provides the overall structural comparison, the local
geographical window-based SSIM value has its own practical significance. For
instance, it provides opportunities to compare the local travel demand
distribution (travel patterns) between different suburbs of a region, which a
sliding local window cannot do. Figure 3.7 illustrates that Sunday travel patterns
differ substantially from those of Monday for the suburb pair Brisbane South to
Brisbane North. This is reflected by a local SSIM value of 0.4653 (see Figure 3.7
(left) and the bold value in Table 3.2). On the other hand, for the suburb pair
Brisbane South to Brisbane West, the Sunday travel patterns are similar (if not
identical) to those of Monday, with an SSIM value of 0.8037 (see Figure 3.7
(right) and the bold value in Table 3.2).
Figure 3.7: Insights into local travel patterns using geographical local window: (left) Brisbane South to Brisbane North and (right) Brisbane South to Brisbane West
Table 3.2: GSSI and local SSIM values: Monday vs Sunday B-OD matrices
2008) etc. While the trajectory-based information provides more mobility detail than
that of OD flows, the latter is computationally effective for analysing larger spatio-
temporal dimensions of travel patterns (say daily mobility of any large-scale city)
(Guo, et al., 2012).
Very few studies in the literature address the classification of days
based on traffic data such as speed/occupancy (Rakha & Van Aerde, 1995); travel time
series (Chung, 2003); traffic load profiles (Friedrich, Immisch, Jehlicka, Otterstätter,
& Schlaich, 2010) and OD flows (Andrienko, Andrienko, Fuchs, & Wood, 2017; Guo,
et al., 2012; Yang, Yan, & Xu, 2017). With respect to OD flow-related patterns, graph
partitioning methods have become more popular. For example, Guo, et al. (2012)
applied dynamic graph partitioning to represent day-of-the-week patterns using smart
card and Bluetooth data; and Naveh and Kim (2018) used trip ends of taxi trajectory
data to spatially cluster the GPS points and analyse their patterns across space and
Chapter 6: Methodology to Cluster B-OD Matrices and Identify Typical Travel Patterns: Case Study Application
of the BCC region 161
time. However, representation of OD flows in dynamic graphs is computationally
expensive due to the huge spatio-temporal dimensions of OD flows. To address this,
previous studies have proposed dimensionality reduction methods such as Principal
Component Analysis (PCA) and Singular Value Decomposition (SVD) (Yang, et al., 2017);
Non-Negative Tensor Factorization methods (Guo, et al., 2012); and spatial abstraction
methods (converting graphs into multi-dimensional vectors) (Andrienko, et al.,
2017). While these methods can capture most of the OD flow information, they might
miss the subtle differences within the underlying patterns. For instance, PCA and
SVD may not be appropriate if the data points lie in different subspaces/density
regimes (Steinbach, Ertöz, & Kumar, 2004); and in spatial abstraction methods, the
discretization of flows and distances might place different values within the same
class.
With regard to exploiting the hidden structure of OD matrices, Laharotte, et al.
(2015) presented a Latent Dirichlet Allocation (LDA) approach to identify temporal
patterns of the Brisbane network based on different LDA templates, such as high
level of traffic, even peak (high) or leisure. While Laharotte, et al. (2015)
reduced the B-OD matrices into LDA (B-OD) templates and clustered those B-OD pairs,
this study proposes to cluster daily B-OD matrices to identify day-to-day
variations. Although past studies (Djukic, et al., 2013; Ruiz de Villa, et al.,
2014) proposed structural similarity metrics to compare OD matrices, the clustering
of daily OD matrices and the identification of typical OD matrices based on their
structural proximity have not been addressed before.
With respect to the travel patterns, many questions were raised in Section 1.2.5.
To answer these questions, this chapter explores a clustering-based approach to
classify an individual B-OD matrix into specific groups, where OD matrices within
the same group should have similar travel patterns. Raw Bluetooth data from 845
BMSs (Figure 1.9) were obtained for 415 days (June, July, August and December
months of 2015 and all months excluding April of 2016). In the following section, a
detailed methodology is discussed to cluster high-dimensional (Osorio (2017)
emphasizes that a dimension of 200 is generally considered high) and multi-density
B-OD matrices and identify typical OD matrices of typical travel patterns.
6.2 METHODOLOGY TO CLUSTER B-OD MATRICES AND
IDENTIFY TYPICAL TRAVEL PATTERNS
The following sections discuss the traditional DBSCAN approach followed by
the proposed three-level approach and distance measures for the clustering algorithm.
6.2.1 Traditional DBSCAN approach
A density-based spatial clustering of applications with noise (DBSCAN)
algorithm was selected for the current application. The algorithm, originally proposed
by Ester, Kriegel, Sander, and Xu (1996), is widely used to cluster data points based on
their density. The advantage of a DBSCAN algorithm is that it does not require any
predetermined number of clusters and the size of a cluster is not fixed (Kieu, et al.,
2015). The following sections provide a conceptual framework for the algorithm,
where the data point in the current application should be read as a B-OD matrix.
The algorithm first marks all of the data points as “non-visited”, starting with an
arbitrary selection of a “non-visited” point and identifying all other data points within
ε distance (distance threshold). These data points, if any, are termed as neighbourhood
points. If the number of neighbourhood points is at least MinPts (size threshold) then
the data point under consideration becomes the first point of a new cluster where the
neighbourhood points are part of the same cluster; otherwise, the data point is labelled
as noise. In either case, the data point is now marked as “visited”. If a cluster is
identified, then the above process for defining neighbourhood points is repeated for all
of the new neighbourhood points in the current cluster, and the cluster is extended
accordingly. Thereafter, a new “non-visited” point is selected,
and the process is repeated until all of the points are marked as “visited”. This leads to
each point being either assigned to a cluster or marked as noise.
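The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: it assumes a pre-computed symmetric distance matrix `dist` (here, one row per B-OD matrix), and the names `dbscan`, `eps` and `min_pts` are chosen for this sketch. Following the description above, a point's neighbourhood excludes the point itself, and a point first labelled as noise may later be absorbed into a cluster as a border point.

```python
from typing import List, Sequence

NOISE = -1        # label for noise points
UNVISITED = None  # marker for points that have not been processed yet


def dbscan(dist: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Cluster points from a pre-computed distance matrix.

    Returns one label per point: 0, 1, 2, ... for clusters and -1 for noise.
    """
    n = len(dist)
    labels: List = [UNVISITED] * n

    def neighbours(i: int) -> List[int]:
        # All other points within eps of point i (point i itself is excluded).
        return [j for j in range(n) if j != i and dist[i][j] <= eps]

    cluster = -1
    for p in range(n):
        if labels[p] is not UNVISITED:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may still become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        queue = list(seeds)
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point reached from a core point
                continue
            if labels[q] is not UNVISITED:
                continue
            labels[q] = cluster
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # q is a core point: keep expanding
    return labels
```

With points placed on a line at 0, 1, 2, 10, 11, 12 and 50 and `eps = 1.5`, the sketch returns two clusters and marks the isolated point as noise.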
From the above, it is clear that the algorithm does not require a pre-determined
number of clusters, as in k-NN (Altman, 1992), and is able to define clusters with
varying density. It also identifies outliers as noise. However, as the algorithm is
sensitive to the setting of its parameters (ε and MinPts), the algorithm does not perform
well for multi-density data sets (Huang, Yu, Li, & Zeng, 2009). Moreover, in the
current application, where data points are high dimensional matrices, a relevant
indicator is required to define the ε. To address these needs, the following sub-sections
discuss setting DBSCAN parameters and distance measures for B-OD clustering.
Following this, the experiments and results are discussed.
6.2.1.1 Setting DBSCAN parameters
The optimum DBSCAN parameters in the traditional approach are identified using
a simple and interactive heuristic proposed by Ester et al. (1996), as discussed below
(see Figure 6.1):
Step 1: First, a k-dist function is defined to map each data point, p, to the
distance value (k-dist (p)) corresponding to its kth-nearest neighbour.
Step 2: For a given value of k, determine the kth nearest neighbour of every point in
the database and plot the points (x-axis) in the descending order of k-dist values
(y-axis). The graph resulting from this distribution is referred to as the sorted
k-dist graph.
Step 3: The shape of the sorted k-dist graph further helps to identify the
threshold point. The parameter MinPts is set to k, and ε is chosen as the k-dist value
corresponding to the valley of the sorted k-dist graph. The valley points are identified through
a visual observation, and as such, this technique is an interactive approach. All
points on the left side of the threshold point (i.e., higher k-dist value) are
considered noise and the remaining points are assigned to some clusters.
Figure 6.1: Typical shape of sorted k-dist graph
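Steps 1 and 2 of the heuristic can be illustrated with a short sketch. It assumes a pre-computed distance matrix `dist`; the function names `k_dist` and `sorted_k_dist` are introduced here for illustration only and are not taken from the study.

```python
def k_dist(dist, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    result = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        result.append(others[k - 1])
    return result


def sorted_k_dist(dist, k):
    """(point index, k-dist) pairs in descending order of k-dist.

    These pairs are the points of the sorted k-dist graph: the position in
    the list is the x-coordinate and the k-dist value is the y-coordinate.
    """
    values = k_dist(dist, k)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [(i, values[i]) for i in order]
```

Plotting the second element of each pair against its position gives the sorted k-dist graph, from which the valley is then read off.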
For ease of explanation, the above technique is presented with an example.
Figure 6.2 (left) shows five data points (P1, P2, P3, P4, and P5) that need to be
clustered using their k-dist values corresponding to their kth nearest neighbour. Here,
the value presented on each link joining two points is the distance between them.
The kth nearest neighbour and k-dist (within brackets) of all points are shown in Figure
6.2 (right) for k=1, k=2, and k=3. For instance, the 1st, 2nd, and 3rd nearest neighbours
of P3 are P2, P4, and P1, respectively. The sorted k-dist plots for k=1, k=2, and k=3
along with corresponding valley points are illustrated in Figure 6.3. Here, the y-axis
represents k-dist values and the x-axis shows the order of points. It should be noted
that the order of points changes as k changes. After setting MinPts equal to k, the
optimal ε values are the k-dist values corresponding to the valley points of the
sorted k-dist plots. For instance, the optimal ε is 3 for MinPts=1,
3.2 for MinPts=2, and 7 for MinPts=3. The points on the left side of the valley correspond to noise
and the rest form clusters, as shown in Figure 6.3. As can be seen, for MinPts=1,
clusters can be formed using points that are in proximity (i.e., P1, P2, P3, and P4) while
considering one point (P5) as noise. Similarly, for MinPts=2 and MinPts=3, clusters
can be formed using P1, P3, and P4 while considering P2 and P5 as noise.
Alternatively, it can also be observed from Figure 6.2 that P2 and P5 are slightly away
from the rest of the points. Thus, they have a higher possibility of being classified as noise
than the others.
Ester et al. (1996) identified that k-dist graphs for k > 4 did not significantly
differ from the 4-dist graph. Thus, they fixed MinPts to be 4 and identified the
threshold corresponding to the valley of 4-dist graph.
Figure 6.2: Sample data points (left) along with kth nearest neighbour and k-dist of all points (right)
Figure 6.3: Sorted k-dist graphs for k=1, k=2 and k=3 and the resulting clusters
[Figure 6.3 panels: k=1, k=2 and k=3. x-axis: order of the points (P5, P2, P3, P1, P4 for k=1; P2, P5, P1, P3, P4 for k=2; P2, P5, P4, P1, P3 for k=3); y-axis: k-dist. In each panel the valley separates noise (left) from clusters (right).]
A traditional DBSCAN algorithm performs poorly if the data points are of varied
density (multi-density data sets). To address this, some researchers have suggested
dividing datasets into different density levels (referred to as subspaces) prior to the
clustering process (Elbatta & Ashour, 2013; Parsons, Haque, & Liu, 2004). The
difference in density levels can be observed from sorted k-dist plots. For instance,
Figure 6.4 shows a typical sorted k-dist plot if there are two density levels in the data
points. Thus, the decision to consider subspace clustering is made based on the density
distribution. As such, major subspaces/clusters are initially identified within the
datasets and the clustering process is then performed within the subspaces.
Figure 6.4: Demonstration of two density levels through sorted k-dist plot
6.2.2 Three-level approach for identifying DBSCAN parameters
This section discusses the methodology developed to identify the optimum
DBSCAN parameters using a three-level approach. Figure 6.5 illustrates the overview
of the three-level approach based on DBSCAN clustering algorithm. It describes the
methodology adopted to estimate typical OD matrices of typical travel patterns by
clustering β (in the study β= 415) B-OD matrices. The step by step approach is
described as follows.
Figure 6.5: Three level approach to cluster B-OD matrices
First level: Identify the possible subspaces
o Step 1: First, the density distribution of data points is observed from
sorted k-dist plots for k=1 to k=15. Based on the experiments in this
study, for k>15 the number of clusters formed was less than or equal
to 2. Thus, an upper limit of k=15 was selected. If the plots show v
distinct valleys, then it is a v-density dataset. Thus, the data points are
further split into v subspaces for subspace clustering. If the plots
show only one valley, then no subspace clustering is undertaken.
Second level: Identifying the initial set of DBSCAN parameters
o Step 2: Unlike the approach adopted by Ester, et al. (1996); that is,
visually inferring the threshold from the valley of sorted k-dist plots, it is
proposed in this study that the shortest distance from origin criterion be
used to identify the initial set of DBSCAN parameters, that is, one
(ε, MinPts) pair for each value of k. According to this
criterion, the valley of a sorted k-dist graph corresponds to the point at the
shortest distance from the origin of axes formed by k-dist values in the
y-axis and sorted data points (OD matrices) in the x-axis.
Third level: Identifying the optimum set of DBSCAN parameters and the
resulting clusters.
o Step 3: DBSCAN clustering is now performed using the set of
parameters identified in the second level.
o Step 4: Although a good number of clusters is required, at the same time
unimportant clusters are not wanted. Thus, only those parametric
combinations of ε and MinPts that result in c homogeneous clusters,
where cl <= c <= cu, are selected. The lower and upper limits are at the analyst’s
discretion, and in this study 3 <= c <= 6 is considered. The selected
parameters are referred to as the optimum parameters and the rest of the parametric
combinations are ignored. The homogeneous clusters belonging to
these parametric combinations are the final clusters.
Section 6.3 explains the above process further, with an example from the real
data.
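The second and third levels can be sketched as follows. This is an illustrative reading of the steps above, not the code used in the study. Two assumptions are made: the distance from the origin is computed on the raw axes (x is the 1-based rank in the sorted k-dist graph and y is the k-dist value, with no rescaling), and the clustering itself is supplied through a hypothetical callback `count_clusters(eps, min_pts)` that returns the number of clusters found. The check that the clusters are homogeneous across parameter combinations is not covered by this sketch.

```python
import math


def valley_by_origin_distance(sorted_k_dists):
    """Return (rank, eps): the point of the sorted k-dist graph nearest the origin.

    Assumption: x is the 1-based rank and y the raw k-dist value.
    """
    best_rank, best_eps, best_d = None, None, math.inf
    for rank, y in enumerate(sorted_k_dists, start=1):
        d = math.hypot(rank, y)
        if d < best_d:
            best_rank, best_eps, best_d = rank, y, d
    return best_rank, best_eps


def initial_parameters(dist, k_max=15):
    """Second level: one (eps, MinPts = k) pair for each k = 1..k_max."""
    n = len(dist)
    params = []
    for k in range(1, min(k_max, n - 1) + 1):
        # k-dist of every point (self excluded), sorted in descending order.
        kd = sorted(
            (sorted(d for j, d in enumerate(row) if j != i)[k - 1]
             for i, row in enumerate(dist)),
            reverse=True,
        )
        _, eps = valley_by_origin_distance(kd)
        params.append((eps, k))
    return params


def select_parameters(params, count_clusters, c_low=3, c_up=6):
    """Third level: keep the pairs that yield c_low <= c <= c_up clusters."""
    return [(eps, m) for eps, m in params
            if c_low <= count_clusters(eps, m) <= c_up]
```

In practice `count_clusters` would wrap a DBSCAN run on the same distance matrix and count the distinct non-noise labels.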
6.2.3 Distance measures for clustering B-OD matrices
The two statistical metrics proposed in Chapter 3; that is, the GSSI and NLOD,
were deployed as the structural proximity measures for comparison of OD matrices.
In this research, the applicability of these metrics was independently tested as distance
measures for clustering B-OD matrices. First, the formulations and characteristics of
both statistical metrics are discussed and thereafter the distance measures are defined.
Since the DBSCAN algorithm uses a distance matrix for the clustering process,
GSSI values are initially converted into distance values (dGSSI) using Equation (74).
The pre-computed 415*415 GSSI matrix is multiplied by 1,000 so that the distance
value is close to one decimal place.
(74)
The NLOD in itself is a distance value; thus, it requires no further conversion.
However, to be consistent with GSSI the 415*415 NLOD matrix is multiplied by 1000
as shown in Equation (75).
dNLOD = 1000*(NLOD( )) (75)
To compare the results of experiments based on the structural proximity measures
with a traditional metric that does not account for the OD matrix structural information,
the normalized root mean square error (RMSN) is chosen. The formulation for RMSN is
taken from Antoniou, et al. (2004) and is shown in Equation (76). To be consistent
with other distance measures, the equivalent distance measure for RMSN is obtained
by multiplying Equation (76) by 1000 as shown in Equation (77).
RMSN ( ) =
(76)
dRMSN = 1000*(RMSN ) (77)
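As an illustration of how the RMSN-based distance could be computed, the sketch below assumes the commonly used form of RMSN, namely sqrt(N * sum((x - y)^2)) / sum(x), where N is the number of OD pairs and x is treated as the reference matrix. This form, the choice of reference matrix, and the function names `rmsn` and `scaled_distance_matrix` are assumptions made for this example.

```python
import math


def rmsn(x, y):
    """Normalised root mean square error between two equally sized OD matrices.

    Assumed form: sqrt(N * sum((x - y)^2)) / sum(x), with N the number of
    OD pairs and x taken as the reference matrix.
    """
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    if len(xs) != len(ys):
        raise ValueError("OD matrices must have the same number of OD pairs")
    total = sum(xs)
    if total == 0:
        raise ValueError("reference OD matrix has zero total flow")
    squared_error = sum((a - b) ** 2 for a, b in zip(xs, ys))
    return math.sqrt(len(xs) * squared_error) / total


def scaled_distance_matrix(values, factor=1000):
    """Multiply every entry of a pre-computed proximity matrix by a factor."""
    return [[factor * v for v in row] for row in values]
```

Applying `scaled_distance_matrix` to a matrix of pairwise RMSN (or NLOD) values corresponds to the multiplication by 1000 described above.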
6.3 EXPERIMENTS AND RESULTS
This section details the experiments conducted using dGSSI (Experiment-1) and
dNLOD (Experiment-2) as proximity measures; their corresponding results are
compared against Experiment-3, which is based on dRMSN.
The initial observations from sorted k-dist plots indicated a possibility of two
different density regimes in the datasets for all three experiments (Figure 6.6, Figure
6.7 and Figure 6.8). Thus, based on Step-1 of the three-level approach (Section 6.2.2),
all data points were first divided into two different subspaces. It was observed that the
first 129 points (in the order shown by x-axis) defined subspace-1 and consisted of
Saturdays, Sundays, public holidays, and long weekends. The rest of the data points
were pre-classified as subspace-2, which consisted of regular weekdays (WDR) and
weekday school holidays (WDSH). The experiments for individual subspaces are
described in the following subsections.
Figure 6.6: Sorted k-dist plots for experiment-1
Figure 6.7: Sorted k-dist plots for experiment-2
Figure 6.8: Sorted k-dist plots for experiment-3
6.3.1 Experiment-1: dGSSI as proximity measure
6.3.1.1 Subspace-1 analysis
Here, the analysis was performed on 129 data points of subspace-1. The initial
set of DBSCAN parameters was identified based on the shortest
distance from origin criterion. Figure 6.9a presents the number of clusters formed for
different MinPts. The pie-chart represents a consistent proportion of clusters
(homogeneous clusters) for MinPts=4 to MinPts=9. The relationship between the
optimum parameters (ε and MinPts) was observed to be linear, with an R2 value of 0.8932 (see
Figure 6.9b).
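A linear fit of this kind, and its R2 value, can be obtained with ordinary least squares. The sketch below is a generic illustration; the function name `linear_fit_r2` is hypothetical, and any numbers passed to it in an example are placeholders rather than values from the study.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line y = intercept + slope * x and its R^2 value."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot
```

Here `xs` would hold the selected MinPts values and `ys` the corresponding optimum ε values.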
The clusters of subspace-1 from Experiment-1 were:
Cluster-1 (C1) included weekends, public holidays and long weekends,
January to June 2016;
Cluster-2 (C2) included Sundays of spring and summer, 2016;
Cluster-3 (C3) included Saturdays of spring and summer, 2016;
Cluster-4 (C4) included Sundays of winter, 2015; and
Cluster-5 (C5) included Saturdays of winter, 2015.
Figure 6.9: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-1 of experiment-1
[Figure 6.9 panels: (A) x-axis: MinPts (0 to 15), y-axis: number of clusters; pie chart of cluster proportions: C1 48%, C2 15%, C3 12%, C4 13%, C5 12%. (B) x-axis: MinPts, y-axis: optimum ε; linear fit with R² = 0.8932.]
6.3.1.2 Subspace-2 analysis
Similar to the last analysis, the graphs presented in Figure 6.10a indicate the
number of clusters formed for different MinPts and Figure 6.10b indicates the linear
relationship (with R2 =0.94) between optimal DBSCAN parameters. The following are
the observed clusters:
Cluster-1 (C1) included regular weekdays of 2016 except summer;
Cluster-2 (C2) included regular weekdays, 2015;
Cluster-3 (C3) included weekday school holidays, 2015 and 2016; and
Cluster-4 (C4) included regular weekdays of November 2016.
Figure 6.10: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-2 of experiment-1
6.3.2 Experiment-2: dNLOD as proximity measure
6.3.2.1 Subspace-1 analysis
The relationship between MinPts and the number of clusters is illustrated in
Figure 6.11a, while Figure 6.11b shows the linear relationship between the optimal
DBSCAN parameters (R2 =0.8977) that resulted in the following clusters:
Cluster-1 (C1) included weekends, public holidays and long weekends,
January to June 2016;
Cluster-2 (C2) included Sundays of winter, 2015; and
Cluster-3 (C3) included Saturdays of winter, 2015.
[Figure 6.10 panels: (A) x-axis: MinPts (0 to 15), y-axis: number of clusters; pie chart of cluster proportions: C1 44%, C2 24%, C3 24%, C4 8%. (B) x-axis: MinPts, y-axis: optimum ε; linear fit with R² = 0.94.]
Figure 6.11: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-1 of experiment-2
6.3.2.2 Subspace-2 analysis
The relationship between MinPts and number of clusters is shown in Figure
6.12a. The relationship between ε and MinPts for subspace-2 of experiment-2 was also
found to be linear, with an R2 value of 0.9716 (Figure 6.12b). The clusters resulting
from this analysis were:
Cluster-1 (C1) included regular weekdays of 2016 except Summer;
Cluster-2 (C2) included regular weekdays, 2015;
Cluster-3 (C3) included weekday school holidays, 2015 and 2016; and
Cluster-4 (C4) included regular weekdays of November 2016.
Figure 6.12: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-2 of experiment-2
[Figure 6.11 panels: (A) x-axis: MinPts (0 to 15), y-axis: number of clusters; pie chart of cluster proportions: C1 37%, C2 33%, C3 30%. (B) x-axis: MinPts, y-axis: optimum ε; linear fit with R² = 0.8977.]
[Figure 6.12 panels: (A) x-axis: MinPts (0 to 15), y-axis: number of clusters; pie chart of cluster proportions: C1 46%, C2 23%, C3 22%, C4 9%. (B) x-axis: MinPts, y-axis: optimum ε; linear fit with R² = 0.9716.]
6.3.1 Experiment-3: dRMSN as proximity measure
6.3.1.1 Subspace-1 analysis
The distance measure dRMSN resulted in only one major cluster for subspace-1.
It included all Saturdays, Sundays and public holidays of 2015 and 2016 except
Saturdays of spring and summer, 2016, which were considered to be noise.
6.3.1.2 Subspace-2 analysis
A total of 4 homogeneous clusters were formed for MinPts ranging from 4 to 13.
The relationship between MinPts and the number of clusters is illustrated in Figure
6.13(a), and Figure 6.13(b) shows the linear relationship between the optimal
DBSCAN parameters (R2 =0.9832) that resulted in the following clusters:
Cluster-1 (C1) includes WDR of 2016 except summer;
Cluster-2 (C2) includes WDR, 2015;
Cluster-3 (C3) includes WDSH, 2015 and 2016; and
Cluster-4 (C4) includes WDR of November 2016.
Figure 6.13: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-2 of experiment-3
[Figure 6.13 panels: (A) x-axis: MinPts (0 to 15), y-axis: number of clusters; pie chart of cluster proportions: C1 41%, C2 24%, C3 23%, C4 12%. (B) x-axis: MinPts, y-axis: optimum ε; linear fit with R² = 0.9832.]
6.3.2 Typical B-OD flows
One of the ways to derive typical B-OD matrices and typical OD flows for
individual OD pairs is by taking the average of all B-OD matrices within each cluster type.
To give an example of the difference among the typical OD flows, the OD flows for the
OD pair Mt. Gravatt and Brisbane CBD are shown in the Box-Whisker plot (Figure
6.14). The plot is shown for the clusters resulting from experiment-1, where the first 5
clusters in the x-axis correspond to C-1 to C-5 of subspace-1 and the last 4 clusters
correspond to C-1 to C-4 of subspace-2, respectively. The y-axis represents the OD
flow values.
Figure 6.14: Box-Whisker plot demonstrating the difference among the typical B-OD flows for OD pair – Mt. Gravatt and Brisbane CBD (results of experiment-1)
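The averaging of B-OD matrices within each cluster can be sketched as below. The sketch assumes that each B-OD matrix is a list of rows, that `labels` holds one cluster label per matrix, and that noise points (label -1) are left out of the averages; the function name `typical_od_matrices` and the treatment of noise are assumptions made for this example.

```python
from collections import defaultdict


def typical_od_matrices(matrices, labels, noise_label=-1):
    """Element-wise mean of the OD matrices in each cluster.

    Returns a dict mapping each cluster label to its typical OD matrix.
    Matrices labelled as noise are ignored (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for matrix, label in zip(matrices, labels):
        if label != noise_label:
            groups[label].append(matrix)

    typical = {}
    for label, members in groups.items():
        n_rows, n_cols = len(members[0]), len(members[0][0])
        typical[label] = [
            [sum(m[r][c] for m in members) / len(members) for c in range(n_cols)]
            for r in range(n_rows)
        ]
    return typical
```

The typical flow of a single OD pair is then the corresponding entry of the typical matrix of each cluster.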
6.3.3 Discussion
Since the ground truth is unknown, one of the ways to compare the clusters
resulting from all three experiments is to see how well they are able to reproduce
pre-classified day types. The number of days in each category of day type is shown in
Figure 6.15. (Refer to Figure 6.16 or the notations section for the expansion of the terms used
in Figure 6.15).
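One simple way to carry out such a comparison is to cross-tabulate the cluster labels against the pre-classified day types. The sketch below is illustrative only; the function name `cross_tabulate` is hypothetical, and the day-type codes in any example follow the notation of Figure 6.15.

```python
from collections import Counter


def cross_tabulate(cluster_labels, day_types):
    """Count, for each cluster, how many days of each pre-classified type it contains."""
    table = {}
    for cluster, day_type in zip(cluster_labels, day_types):
        table.setdefault(cluster, Counter())[day_type] += 1
    return table
```

A cluster reproduces a day type well when its row is dominated by a single day type and that day type does not spread over many clusters.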
Figure 6.15: Classification of day types
[Figure 6.15 data (number of days): SATR 39, SATSH 18, SUNR 40, SUNSH 16, WDR 219, WDSH 67, PH 5, LW 11.]
While the comparison in Figure 6.16 shows that PH (Public Holidays), LW
(Long Weekends), School Holidays during Saturdays and Sundays could not form
standalone clusters, both GSSI (9 clusters) and NLOD (7 clusters) could represent the
pre-classification better than RMSN (5 clusters). The similarities in the clusters resulting
from GSSI and NLOD are explained in detail below.
Both metrics were able to differentiate weekday and weekend patterns. In
fact, there was no typical weekend travel pattern because travel patterns
during Saturday and Sunday were found to differ from each other.
Both metrics observed seasonal trends in travel patterns. For instance,
Saturdays during the Australian Winter, 2015 were observed to have
different travel patterns compared to the rest of the Saturdays. A similar
observation was noted for the Sundays of Winter, 2015.
Both metrics identified a group of Saturdays and Sundays during the
school holiday season that shared similar travel patterns with a few well
noted public holidays of Australia.
The classification of subspace-2 (i.e., WDR and WDSH) was the same in
both experiments. This identified that the travel patterns during WDSH
differed from those of the WDR. Interestingly, WDSH from both 2015 and
2016 were grouped into one single cluster by both metrics.
Both metrics identified that WDR travel patterns during November 2016
differed from those of other regular working weekdays. The difference in
travel patterns during November 2016 could be attributed to major events
held in that month. The annual report published by Royal National
Agricultural and Industrial Association of Queensland (RNA, 2016)
estimated that, in 2016, the Brisbane Showgrounds attracted almost a
million people by hosting more than 250 events, with an increase of 20%
compared to 2015. The month of November was the busiest month of
2016, due to hosting a total of 35 events.
However, the only difference between them is that GSSI separated Saturdays
and Sundays from the Australian spring and summer of 2016 into two individual clusters,
which NLOD failed to differentiate. The lower sensitivity of NLOD in this regard can
be attributed to the fact that it computes statistics on OD pairs belonging to one specific
origin, whereas GSSI computes statistics on groups of OD pairs belonging to more
than one origin. Due to this, GSSI is able to capture subtle structural differences in
travel patterns during the afore-mentioned days.
On the other hand, clusters produced from experiment-3 (based on RMSN)
demonstrated seasonal trends in subspace-2 travel patterns and were similar to the
results of other experiments. However, RMSN failed to distinguish the differences among
the daily travel patterns during Saturdays, Sundays and Public Holidays. Resulting in
one major cluster, it was unable to recognize seasonal variations within other days in
subspace-1. This is because RMSN is based on deviations of individual OD flows, due
to which it could not identify the structural differences within the respective B-OD
matrices.
The typical OD flows (see section 6.3.2 for results of experiment-1) from each
cluster demonstrated typical travel patterns of the Brisbane city and are better than the
observations from a similar study by Guo, et al. (2012) conducted on Brisbane city
over the same time period. Guo, et al. (2012) could identify only three types of travel
patterns, namely Saturday, Sunday and Weekday patterns. This is perhaps because, in that
study, travel patterns were analysed on the dimensionally reduced OD matrices. However, the
present study is able to identify other patterns highlighting the strength of structural
proximity measures to identify more typical travel patterns.
For travel demand modelling, the knowledge of travel patterns can be used for
estimating typical OD matrices using bi-level solution algorithms. Moreover, the
knowledge of travel patterns is important for effective policy decisions; for example, shifting
public holidays of similar travel patterns towards weekends can create more
long weekends (Chung, 2003). This would encourage the public to spend more during the
holidays, thus boosting the nation’s economy. Further, the knowledge of the seasonal
distribution of travel patterns helps transport planners to schedule travel surveys
across the study network over any period. For instance, the Household Travel Survey
(HTS) for South East Queensland (SEQTS, 2010) was conducted for over 10 weeks
from mid-April through late-June and in July in 2009. However, the survey period
avoided the days during School/University holidays. Since the study showed that the
travel patterns are different during school holidays and during different seasons,
distributing the survey period over a year based on the knowledge of Bluetooth travel
patterns can better capture the travel patterns of any study region. There are also short-term
ITS applications of identifying typical OD matrices. For instance, developing a
database of typical historical time-sliced OD matrices can improve the performance of
OD prediction algorithms (like the Kalman Filter) for real-time traffic management and
decision-making tools such as Aimsun Live (Aimsunlive, 2017).
Figure 6.16: Comparison of clusters resulting from all three experiments
[Figure 6.16 layout: rows are the clusters of each experiment; columns are the day types with year sub-columns (2015, 2016): Saturdays, regular (SATR) and during school holidays (SATSH); Sundays, regular (SUNR) and during school holidays (SUNSH); regular weekdays (WDR); school holidays during weekdays (WDSH); normal public holidays (PH); long weekends (LW).
Experiment-1 (GSSI), subspace-1: (1) weekends, PH and LW, Jan-Jun 2016; (2) Sundays, spring and summer 2016; (3) Saturdays, spring and summer 2016; (4) Sundays, winter 2015; (5) Saturdays, winter 2015. Subspace-2: (6) WDR, 2016 except summer; (7) WDR, 2015; (8) WDSH, 2015 and 2016; (9) WDR, November 2016.
Experiment-2 (NLOD), subspace-1: (1) weekends, PH and LW, Jan-Jun 2016; (2) Sundays, winter 2015; (3) Saturdays, winter 2015. Subspace-2: (4) WDR, 2016 except summer; (5) WDR, 2015; (6) WDSH, 2015 and 2016; (7) WDR, November 2016.
Experiment-3 (RMSN), subspace-1: (1) weekends, PH and LW, 2015 and 2016. Subspace-2: (2) WDR, 2015; (3) WDSH, 2015 and 2016; (4) WDR, November 2016; (5) WDR, 2016 except summer.]
6.4 SUMMARY
Although the DBSCAN clustering algorithm is not new, the study has two major
contributions:
Firstly, clustering multi-density OD matrices based on structural proximity measures
to identify typical daily travel patterns of a large-scale network has not been addressed
in the literature.
Secondly, the proposed three-level clustering approach is simple and effective in
identifying the OD clusters. The prior identification of subspaces addresses the
incapacity of classical DBSCAN with respect to multi-density datasets. Identification
of the set of optimum DBSCAN parameters demonstrates that different parametric
combinations can produce homogeneous clusters and their relationship is nearly linear.
The clustering results demonstrated many typical travel patterns for the BCC
region. All three experiments showed that there were seasonal variations in the travel
patterns for weekdays, and that the travel patterns during weekday school holidays and
November 2016 were unique. The experiments based on structural proximity measures
could identify the seasonal variations even among the travel patterns during Saturdays
and Sundays. On the other hand, RMSN failed to identify any unique travel patterns
within subspace-1 because of its incapacity to capture the subtle structural differences
within those patterns. This highlights the importance of accounting for the structural
information of OD matrices, with many practical benefits for both long-term strategic
and short-term transport planning applications.
Chapter 7: Conclusion 179
Chapter 7: Conclusion
This chapter contains the conclusions, limitations, and recommendations related
to the research. First, a summary of this thesis is provided in Section 7.1. Second, the
findings of the study and their connection to the research questions raised in Chapter
1 are reflected upon in Section 7.2. Lastly, based on the understanding gained in this
research, new and pertinent questions for future research are discussed in Section 7.3.
7.1 BRIEF SUMMARY
Estimating OD matrices has been a subject of transport modelling research for
more than the last three decades. Ever since traffic counts began to be treated as indirect
observations of OD flows, “matrix estimation” has been considered an optimisation
problem. Since then, many methods have been proposed and implemented with respect
to solution algorithms, assignment models, rules-based heuristics, objective function
formulations, measurements from alternate data sources, and statistical performance
measures. While most of the methods developed thus far fall under the schema of the
bi-level modelling framework, many challenges are yet to be resolved. First, a traffic
count-based bi-level method is an under-determined problem and, to address this, most
methods are still dependent on an outdated target OD matrix to maintain the structural
consistency in an OD matrix estimation. Second, assignment models remain
challenging due to modelling errors and their inseparable dependency on the OD matrix. Third,
bi-level methods are computationally challenging due to the dimensionality of an OD
matrix and the lower-level user-equilibrium assignment problem. Fourth, most existing
statistical performance measures do not account for the structural information of OD
matrices. Fifth, there is a great need to identify typical travel patterns and their
corresponding typical OD matrices in demand modelling. The last challenge is related
to bridging the gap between the availability of massive amounts of big-traffic data and
their direct implementation into transport models, especially tackling the issue related
to unknown market penetration rates of trips inferred from advanced data sources.
This research is an attempt to review the literature, understand the state-of-the-
art techniques, and propose methods to address some of the challenges. Specifically,
this study proposes methods to exploit the additional structural knowledge available
from other big data sources, such as Bluetooth, to maintain structural consistency and
address the problem of under-determinacy; develops an alternative methodology to the
existing bi-level framework; develops new statistical performance measures for
the structural comparison of OD matrices; and proposes a methodological approach to
cluster B-OD matrices and identify typical travel patterns based on structural
proximity measures, using a case study application on real Bluetooth datasets from
the BCC region.
The Brisbane City network is already equipped with several Bluetooth scanners.
Bluetooth data is a good source of travel-related information in both spatial and
temporal contexts. While current applications are limited to travel time
estimation, the unexplored potential of trip-related information formed the strong
motivation for the current research. Taking one step beyond the existing
implementation, the current study investigated the potential of Bluetooth data and
proposed new methods for improving the quality of OD matrix estimates using
additional knowledge (the “structure of trips” and/or turning proportions) from
Bluetooth observations. A few analyses were conducted as part of this research (see
Appendix B) to add confidence in the structural knowledge of real Bluetooth
observations from the BCC region. However, in the absence of ground truth,
simulation-based experiments are the only way to strengthen the argument that the
“structure” of Bluetooth trips could improve the quality of OD estimates. Although
the current research is based on Bluetooth observations and applied to the BCC region,
the methodology is applicable to data from any similar source that can
provide additional information related to the structure of trips over any other study
network.
Overall, the study enhances existing research with respect to OD matrix
comparison (through structural similarity measures); OD matrix
estimation (through the knowledge of Bluetooth trips and turning proportions); and
identification of typical travel patterns and typical OD matrices (through a
structural proximity-based clustering method).
7.2 RESEARCH FINDINGS
The study identified major research gaps, which led to the development of four
research questions (Chapter 1) following a comprehensive review of the literature
(Chapter 2). In conjunction with the research questions, the research findings are
discussed as follows:
The sensitivity analysis results from Chapter 3 demonstrated that GSSI and
NLOD are robust statistical performance measures capable of structurally
comparing OD matrices, which answered the first research question
(RQ-1).
The findings of Chapter 4 answered RQ-2, as follows:
o The B-OD method demonstrated that the additional structural
knowledge of Bluetooth OD flows can improve the quality of OD
matrix estimates. The B-OD method is suitable for networks (such
as the BCC region) that have good connectivity of Bluetooth scanners.
Although the B-OD method assumes that the trip ends are exactly
known, the methodology still holds for observations from any
other emerging data source that can provide more confidence about
trip ends than Bluetooth.
o The B-SP method suits situations where the penetration rate of
Bluetooth trajectories is low. This method demonstrated the
applicability of Bluetooth subpath flows. The quality of the OD matrix
estimates was found to be better than that of the traditional traffic
counts-based approach even at a 2.5% penetration rate of Bluetooth trips.
o Since the core of both methods is based on the structural information of
Bluetooth trips, the need to estimate unknown penetration rates of
Bluetooth trips is relaxed.
The findings of Chapter 5 answered RQ-3, as follows:
o Chapter 5 demonstrated that the proposed turning-proportion-based
technique can serve as an alternative to assignment-based
models.
o The improvement in the quality of the OD matrix estimates through
additional knowledge of Bluetooth trips strengthened the proposed
single-level formulation. In fact, knowledge about traffic assignment
was implicitly considered in the observed turning proportions and
Bluetooth trips.
The core of Chapter 6 was to develop a methodological approach to cluster a
multi-density database of B-OD matrices and identify typical travel patterns, with
a real case study application on the BCC region. This chapter addressed RQ-4. The
major findings of the clustering analysis were:
o The clusters resulting from experiment-1 and experiment-2
demonstrated that the proposed statistical metrics, GSSI and
NLOD, can serve as structural proximity measures for the DBSCAN
clustering algorithm.
o The clusters from experiment-3, which was based on RMSN, failed to
distinguish travel patterns during weekends and public holidays.
This is because most traditional metrics do not account for the structure
of OD matrices in their mathematical formulation and therefore
could not identify the subtle structural differences in the
aforementioned travel patterns.
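The contrast between RMSN and a structural measure can be illustrated with a toy example. The sketch below does not reproduce the GSSI or NLOD formulations of Chapter 3; it assumes the commonly used RMSN definition and applies a plain Levenshtein distance to the destination rankings of a single origin. All function names and flow values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def rmsn(reference, estimate):
    """Root mean square normalised error over paired cells (assumed standard form)."""
    n = len(reference)
    squared_error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return sqrt(n * squared_error) / sum(reference)


def destination_ranking(flows):
    """Destination indices ordered from the highest to the lowest flow."""
    return sorted(range(len(flows)), key=lambda d: -flows[d])


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute) between two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical flows from one origin to three destinations.
reference = [10, 12, 30]
estimate_a = [8, 14, 30]   # same destination ranking as the reference
estimate_b = [12, 10, 30]  # two lowest-ranked destinations swapped

for name, estimate in (("A", estimate_a), ("B", estimate_b)):
    ranking_distance = levenshtein(destination_ranking(reference),
                                   destination_ranking(estimate))
    print(name, round(rmsn(reference, estimate), 4), ranking_distance)
```

In this example both estimates have the same squared error against the reference, so RMSN cannot separate them, whereas the edit distance between destination rankings is zero for estimate A and non-zero for estimate B. This is only an analogy for why a cell-wise metric can miss differences that a structure-aware measure detects.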
7.3 RECOMMENDATIONS FOR FUTURE RESEARCH
This section discusses the future research directions and some pertinent
questions:
Although introducing randomness in Bluetooth flows demonstrated
improvement in the quality of OD flow estimates, to achieve more realistic
modelling, the experiments could include errors and inconsistencies in the
observed traffic counts and turning proportions.
In this study, Bluetooth subpaths were created by trimming the first and last
BMS IDs from the complete sequence of each trip. However, as shown in
Figure 1.12, there could be mis-detections within the Bluetooth trajectories.
Accounting for these mis-detections before incorporating the subpaths into the
optimisation model would make the modelling more realistic.
Future studies could test state-of-the-art solution algorithms,
such as versions of SPSA (Tympakianaki et al., 2018) or metamodels
(Osorio, 2019), and compare them with other solution algorithms,
such as a genetic algorithm (Kim et al., 2001), over a benchmark
network. Further improvements could be made with respect to the parameters
of gradient-based algorithms. For instance, in the present study, the prior
step-size was chosen through trial and error. However, the sensitivity of OD
flows to different step-sizes and to the rate of change of step-sizes
needs to be investigated. The step-sizes could also be sensitive to the OD
flow values; that is, to higher and lower flow values. Convergence criteria
could also be investigated.
The current research focussed only on utilising the knowledge of Bluetooth
trips in the objective function formulation. As vehicle trajectories can be
inferred from Bluetooth observations, they could be used to calibrate the
assignment model in future research.
This study can be extended to dynamic OD space. Current state-of-the-art
techniques to estimate better quality time-dependent OD matrices use quasi-
dynamic approaches. Thus, the methods proposed in this research could
incorporate a quasi-dynamic assumption with respect to the distribution of
origin flows and estimate better time-dependent offline OD matrices. Quasi-
dynamic Kalman filter algorithms could then be investigated with additional
measurements from Bluetooth observed flows for real-time estimation of
OD flows.
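The step-size sensitivity raised in the recommendations above can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the formulation used in this study: it takes a fixed, hypothetical link-proportion matrix in place of an assignment model, minimises the squared deviation between modelled and observed link counts by plain gradient descent, and omits the non-negativity of OD flows for brevity. All names and numbers are assumptions.

```python
def link_flows(od_flows, proportions):
    """Link flows implied by OD flows: y_l = sum_k P[l][k] * x_k."""
    return [sum(p * x for p, x in zip(row, od_flows)) for row in proportions]


def loss(od_flows, proportions, counts):
    """Sum of squared deviations between modelled and observed link counts."""
    modelled = link_flows(od_flows, proportions)
    return sum((m - c) ** 2 for m, c in zip(modelled, counts))


def gradient_descent(prior, proportions, counts, step_size, iterations=200):
    """Fixed-step gradient descent starting from a prior OD vector."""
    x = list(prior)
    for _ in range(iterations):
        residuals = [m - c for m, c in zip(link_flows(x, proportions), counts)]
        gradient = [2 * sum(proportions[l][k] * residuals[l]
                            for l in range(len(counts)))
                    for k in range(len(x))]
        x = [xk - step_size * gk for xk, gk in zip(x, gradient)]
    return x


# Hypothetical network: two counted links, three OD pairs (under-determined).
proportions = [[1, 1, 0],
               [0, 1, 1]]
counts = [30, 50]
prior = [5, 10, 20]

for step_size in (0.1, 0.4):
    estimate = gradient_descent(prior, proportions, counts, step_size)
    print(step_size, loss(estimate, proportions, counts))
```

For this quadratic objective the iteration is stable only when the step-size is below 2 divided by the largest eigenvalue of the Hessian, which is 6 for the matrix above, giving a limit of 1/3. A step-size of 0.1 therefore drives the count error towards zero, while 0.4 makes it grow. Because the toy problem is under-determined, the flows reached with the small step-size depend on the prior, which mirrors the role of the target OD matrix discussed in this thesis.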
Bibliography 184
Bibliography
Abedi, N., Bhaskar, A., & Chung, E. (2013). Bluetooth and Wi-Fi MAC address based crowd data collection and monitoring: benefits, challenges and enhancement.
Abedi, N., Bhaskar, A., & Chung, E. (2014). Tracking spatio-temporal movement of human in terms of space utilization using Media-Access-Control address data. Applied Geography, 51, 72-81.
Abedi, N., Bhaskar, A., Chung, E., & Miska, M. (2015). Assessment of antenna characteristic effects on pedestrian and cyclists travel-time estimation based on Bluetooth and WiFi MAC addresses. Transportation Research Part C: Emerging Technologies, 60, 124-141.
ABS. (2017). More than two in three drive to work, Census reveals. Retrieved from http://www.abs.gov.au/ausstats/[email protected]/mediareleasesbyReleaseDate/7DD5DC715B608612CA2581BF001F8404?OpenDocument
ABS. (2018). Census of Population and Housing: Community Profile, DataPack and TableBuilder Templates, Australia, 2016. Retrieved from http://www.abs.gov.au/AUSSTATS/[email protected]/Latestproducts/2079.0Main%20Features42016?opendocument&tabname=Summary&prodno=2079.0&issue=2016&num=&view=
Ahas, R., Silm, S., Järv, O., Saluveer, E., & Tiru, M. (2010). Using mobile positioning data to model locations meaningful to users of mobile phones. In Journal of urban technology (Vol. 17, pp. 3-27).
Aimsun. (2019). Aimsun Next 8.4 User's Manual. Aimsun, Barcelona, Spain. Retrieved from https://www.aimsun.com/
Aimsunlive. (2017). Gold Coast: Predictive Solutions Trial. Retrieved from https://www.aimsun.com/gold-coast-predictive-solutions-trial/
Alexander, L., Jiang, S., Murga, M., & González, M. C. (2015). Origin–destination trips by purpose and time of day inferred from mobile phone data. In Transportation research part c: emerging technologies (Vol. 58, pp. 240-250).
Alibabai, H., & Mahmassani, H. (2008). Dynamic origin-destination demand estimation using turning movement counts. Transportation Research Record: Journal of the Transportation Research Board(2085), 39-48.
Allahviranloo, M., & Recker, W. (2015). Mining activity pattern trajectories and allocating activities in the network. In Transportation (pp. 1-19).
Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), 175-185.
Andrienko, G., Andrienko, N., Fuchs, G., & Wood, J. (2017). Revealing patterns and trends of mass mobility through spatial and temporal abstraction of origin-destination movement data. IEEE Transactions on Visualization & Computer Graphics(1), 1-1.
Antoniou, C., Barceló, J., Breen, M., Bullejos, M., Casas, J., Cipriani, E., . . . Marzano, V. (2016). Towards a generic benchmarking platform for origin–destination flows estimation/updating algorithms: Design, demonstration and validation. Transportation Research Part C: Emerging Technologies, 66, 79-98.
Antoniou, C., Ben-Akiva, M., & Koutsopoulos, H. (2004). Incorporating automated vehicle identification data into origin-destination estimation. Transportation Research Record: Journal of the Transportation Research Board(1882), 37-44.
Antoniou, C., Ben-Akiva, M., & Koutsopoulos, H. N. (2006). Dynamic traffic demand prediction using conventional and emerging data sources. In IEE Proceedings-Intelligent Transport Systems (Vol. 153, pp. 97-104): IET.
M. (2014). A framework for the benchmarking of OD estimation and prediction algorithms. In 93rd Transportation Research Board Annual Meeting.
Asakura, Y., Hato, E., & Kashiwadani, M. (2000). Origin-destination matrices estimation model using automatic vehicle identification data and its application to the Han-Shin expressway network. Transportation, 27(4), 419-438.
ASGS. (2017). Australian Statistical Geography Standard (ASGS).
Balakrishna, R., Ben-Akiva, M., & Koutsopoulos, H. (2007). Offline calibration of dynamic traffic assignment: simultaneous demand-and-supply estimation. Transportation Research Record: Journal of the Transportation Research Board(2003), 50-58.
Bar-Gera, H., Mirchandani, P. B., & Wu, F. (2006). Evaluating the assumption of independent turning probabilities. Transportation Research Part B: Methodological, 40(10), 903-916.
Barceló Bugeda, J., Montero Mercadé, L., Marqués, L., & Carmona, C. (2010). A Kalman-filter approach for dynamic OD estimation in corridors based on bluetooth and Wi-Fi data collection. In 12th World Conference on Transportation Research WCTR, 2010.
Barceló, J., Gilliéron, F., Linares, M., Serch, O., & Montero, L. (2012). Exploring link covering and node covering formulations of detection layout problem. Transportation Research Record: Journal of the Transportation Research Board(2308), 17-26.
Barceló, J., Montero, L., Bullejos, M., Linares, M., & Serch, O. (2013). Robustness and Computational Efficiency of Kalman Filter Estimator of Time-Dependent Origin-Destination Matrices: Exploiting Traffic Measurements from Information and Communications Technologies. Transportation Research Record: Journal of the Transportation Research Board(2344), 31-39.
Barceló, J., Montero, L., Bullejos, M., Serch, O., & Carmona, C. (2013). A Kalman filter approach for exploiting bluetooth traffic data when estimating time-dependent OD matrices. Journal of Intelligent Transportation Systems, 17(2), 123-141.
Battiti, R. (1989). Accelerated backpropagation learning: Two optimization methods. Complex systems, 3(4), 331-342.
Bauer, D., Richter, G., Asamer, J., Heilmann, B., Lenz, G., & Kölbl, R. (2018). Quasi-Dynamic Estimation of OD Flows From Traffic Counts Without Prior OD Matrix. IEEE Transactions on Intelligent Transportation Systems, 19(6), 2025-2034.
Behara, K. N., Bhaskar, A., & Chung, E. (2018, 7-11 January). Classification of typical Bluetooth OD matrices based on structural similarity of travel patterns-Case study on Brisbane city. In Transportation Research Board 97th Annual Meeting.
Bell, M. G. (1983). The estimation of an origin-destination matrix from traffic counts. Transportation Science, 17(2), 198-217.
Bell, M. G. (1991). The estimation of origin-destination matrices by constrained generalised least squares. Transportation Research Part B: Methodological, 25(1), 13-22.
Ben-Akiva, M. E., Gao, S., Wei, Z., & Wen, Y. (2012). A dynamic traffic assignment model for highly congested urban networks. Transportation research part C: emerging technologies, 24, 62-82.
Bera, S., & Rao, K. (2011). Estimation of origin-destination matrix from traffic counts: the state of the art. European Transport - Trasporti Europei, 49, 2-23.
Bhaskar, A., & Chung, E. (2013). Fundamental understanding on the use of Bluetooth scanner as a complementary transport data. Transportation Research Part C: Emerging Technologies, 37, 42-72.
Bhaskar, A., Qu, M., & Chung, E. (2015). Bluetooth vehicle trajectory by fusing bluetooth and loops: motorway travel time statistics. IEEE Transactions on Intelligent Transportation Systems, 16(1), 113-122.
Bhaskar, A., Qu, M., Nantes, A., Miska, M., & Chung, E. (2015). Is bus overrepresented in Bluetooth MAC scanner data? Is MAC-ID really unique? International Journal of Intelligent Transportation Systems Research, 13(2), 119-130.
Bierlaire, M. (2002). The total demand scale: a new measure of quality for static and dynamic origin–destination trip tables. In Transportation Research Part B: Methodological (Vol. 36, pp. 837-850).
Bierlaire, M., & Crittin, F. (2004). An efficient algorithm for real-time estimation and prediction of dynamic OD tables. Operations Research, 52(1), 116-127.
Bierlaire, M., & Toint, P. L. (1995). Meuse: An origin-destination matrix estimator that exploits structure. Transportation Research Part B: Methodological, 29(1), 47-60.
Blogg, M., Semler, C., Hingorani, M., & Troutbeck, R. (2010). Travel time and origin-destination data collection using Bluetooth MAC address readers. In Australasian Transport Research Forum (pp. 1-15).
Bluetooth data from Brisbane City Council. (2016).
Brooks, A. C., Zhao, X., & Pappas, T. N. (2008). Structural similarity quality metrics in a coding context: Exploring the space of realistic distortions. IEEE Transactions on image processing, 17(8), 1261-1273.
BSTM (Cartographer). (2015). Traffic Analysis Zonal network on Google Earth.
BSTM. (2016). Brisbane Strategic Transport Demand Model.
Bullejos, M., Barceló Bugeda, J., & Montero Mercadé, L. (2014). A DUE based bilevel optimization approach for the estimation of time sliced OD matrices. In Proceedings of the International Symposia of Transport Simulation (ISTS) and the International Workshop on Traffic Data Collection and its Standardisation (IWTDCS), ISTS'14 and IWTCDS'14.
Calabrese, F., Di Lorenzo, G., Liu, L., & Ratti, C. (2011). Estimating origin-destination flows using mobile phone location data. IEEE Pervasive Computing, 10(4), 0036-0044.
Cantelmo, G., Cipriani, E., Gemma, A., & Nigro, M. (2014). An adaptive bi-level gradient procedure for the estimation of dynamic traffic demand. IEEE Transactions on Intelligent Transportation Systems, 15(3), 1348-1361.
Carpenter, C., Fowler, M., & Adler, T. (2012). Generating route-specific origin-destination tables using Bluetooth technology. Transportation Research Record: Journal of the Transportation Research Board(2308), 96-102.
Cascetta, E. (1984). Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator. Transportation Research Part B: Methodological, 18(4-5), 289-299.
Cascetta, E., Inaudi, D., & Marquis, G. (1993). Dynamic estimators of origin-destination matrices using traffic counts. Transportation science, 27(4), 363-373.
Cascetta, E., & Nguyen, S. (1988). A unified framework for estimating or updating origin/destination matrices from traffic counts. Transportation Research Part B: Methodological, 22(6), 437-455.
Cascetta, E., Papola, A., Marzano, V., Simonelli, F., & Vitiello, I. (2013). Quasi-dynamic estimation of o–d flows from traffic counts: Formulation, statistical validation and performance analysis on real data. Transportation Research Part B: Methodological, 55, 171-187.
Cascetta, E., & Postorino, M. N. (2001). Fixed point approaches to the estimation of O/D matrices using traffic counts on congested networks. Transportation science, 35(2), 134-147.
Chang, G.-L., & Wu, J. (1994). Recursive estimation of time-varying origin-destination flows from traffic counts in freeway corridors. Transportation Research Part B: Methodological, 28(2), 141-160.
Cheung, W., Wong, S., & Tong, C. (2006). Estimation of a time-dependent origin-destination matrix for congested highway networks. Journal of advanced transportation, 40(1), 95-117.
Chitturi, M. V., Shaw, J. W., Campbell IV, J. R., & Noyce, D. A. (2014). Validation of Origin–Destination Data from Bluetooth Reidentification and Aerial Observation. Transportation Research Record, 2430(1), 116-123.
Chung, E. (2003). Classification of traffic pattern. In Proc. of the 11th World Congress on ITS (pp. 687-694).
Chung, E. (2016). Use of Bluetooth and Wifi for Measuring Vehicles and People Movements, PATREC. Retrieved from http://www.patrec.uwa.edu.au/announcements/use-of-bluetooth-and-wifi-for-measuring-vehicles-and-people-movements
Cipriani, E., Florian, M., Mahut, M., & Nigro, M. (2010). Investigating the efficiency of a gradient approximation approach for the solution of dynamic demand estimation problems. Chapters.
Cipriani, E., Florian, M., Mahut, M., & Nigro, M. (2011). A gradient approximation approach for adjusting temporal origin–destination matrices. Transportation Research Part C: Emerging Technologies, 19(2), 270-282.
Ciuffo, B., & Punzo, V. (2010). Verification of traffic micro-simulation model calibration procedures: Analysis of goodness-of-fit measures. In Proceeding of the 89th Annual Meeting of the Transportation Research Record, Washington, DC.
Cools, M., Moons, E., & Wets, G. (2010). Assessing the quality of origin-destination matrices derived from activity travel surveys: Results from a Monte Carlo experiment. Transportation Research Record: Journal of the Transportation Research Board(2183), 49-59.
Cooper, R. (1977). Abstract Structure and the Indian Rāga System. In Ethnomusicology (pp. 1-32).
Crawford, F., Watling, D. P., & Connors, R. D. (2018). Identifying road user classes based on repeated trip behaviour using Bluetooth data. Transportation research part A: policy and practice, 113, 55-74.
Cremer, M., & Keller, H. (1981). Dynamic identification of flows from traffic counts at complex intersections. In Proc., 8th International Symposium on Transportation and Traffic Theory (pp. 121-142): University of Toronto Press, Canada.
Cremer, M., & Keller, H. (1987). A new class of dynamic methods for the identification of origin-destination flows. Transportation Research Part B: Methodological, 21(2), 117-132.
Dandy, G., Daniell, T., Foley, B., & Warner, R. (2017). Planning and design of engineering systems: CRC Press.
de Dios Ortuzar, J., & Willumsen, L. G. (2011). Modelling transport: John Wiley & Sons.
De Haas, M. (2016). Travel pattern transitions: A study on the effects of life events on changes in travel patterns.
Dictionary. (Ed.) (2018) Cambridge online dictionary. Cambridge, UK.
Dixit, V., Gardner, L. M., & Waller, S. T. (2013). Strategic User Equilibrium Assignment Under Trip Variability. In Transportation Research Board 92nd Annual Meeting (Vol. 9).
Dixon, M. P. (2000). Incorporation of automatic vehicle identification data into the synthetic OD estimation process. Ph.D. thesis, Texas A&M University, College Station, TX.
Dixon, M. P., & Rilett, L. (2002). Real-Time OD Estimation Using Automatic Vehicle Identification and Traffic Count Data. Computer-Aided Civil and Infrastructure Engineering, 17(1), 7-21.
Djukic, T. (2014). Dynamic OD demand estimation and prediction for dynamic traffic management. In PhD Thesis.
Djukic, T., Barceló Bugeda, J., Bullejos, M., Montero Mercadé, L., Cipriani, E., van Lint, H., & Hoogendoorn, S. (2015). Advanced traffic data for dynamic od demand estimation: The state of the art and benchmark study. In TRB 94th Annual Meeting Compendium of Papers (pp. 1-16).
Djukic, T., Hoogendoorn, S., & Van Lint, H. (2013). Reliability assessment of dynamic OD estimation methods based on structural similarity index.
Djukic, T., Van Lint, J., & Hoogendoorn, S. (2012). Application of principal component analysis to predict dynamic origin-destination matrices. Transportation Research Record: Journal of the Transportation Research Board(2283), 81-89.
Dong, H., Wu, M., Ding, X., Chu, L., Jia, L., Qin, Y., & Zhou, X. (2015). Traffic zone division based on big data from mobile phone base stations. In Transportation Research Part C: Emerging Technologies (Vol. 58, pp. 278-291).
Elbatta, M. T., & Ashour, W. M. (2013). A dynamic method for discovering density varied clusters. Int. Journal of Signal Processing, Image Processing, and Pattern Recognition, 6(1), 123-134.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd (Vol. 96, pp. 226-231).
Fisk, C. (1989). Trip matrix estimation from link traffic counts: The congested network case. Transportation Research Part B: Methodological, 23(5), 331-336.
Fisk, C. S., & Boyce, D. E. (1983). A note on trip matrix estimation from link traffic count data. Transportation Research Part B: Methodological, 17(3), 245-250.
Florian, M., & Chen, Y. (1995). A Coordinate Descent Method for the Bi-level O–D Matrix Adjustment Problem. International Transactions in Operational Research, 2(2), 165-179.
Frederix, R., Viti, F., & Tampère, C. M. (2011). A hierarchical approach for dynamic origin-destination matrix estimation on large-scale congested networks. In 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC) (pp. 1543-1548): IEEE.
Frederix, R., Viti, F., & Tampère, C. M. (2013). Dynamic origin–destination estimation in congested networks: theoretical findings and implications in practice. Transportmetrica A: Transport Science, 9(6), 494-513.
Friedrich, M., Immisch, K., Jehlicka, P., Otterstätter, T., & Schlaich, J. (2010). Generating origin-destination matrices from mobile phone trajectories. Transportation Research Record: Journal of the Transportation Research Board(2196), 93-101.
Gan, L., Yang, H., & Wong, S. C. (2005). Traffic counting location and error bound in origin-destination matrix estimation problems. Journal of Transportation Engineering, 131(7), 524-534.
Gazis, D. C., & Knapp, C. H. (1971). On-line estimation of traffic densities from time-series of flow and speed data. Transportation Science, 5(3), 283-301.
Gong, L., Liu, X., Wu, L., & Liu, Y. (2016). Inferring trip purposes and uncovering travel patterns from taxi trajectory data. Cartography and Geographic Information Science, 43(2), 103-114.
Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A.-L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779.
Guo, D., Zhu, X., Jin, H., Gao, P., & Andris, C. (2012). Discovering spatial patterns in origin-destination mobility data. Transactions in GIS, 16(3), 411-429.
Gur, Y. J. (1980a). Estimation of an origin-destination trip table based on observed link volumes and turning movements. Executive summary.
Gur, Y. J. (1980b). Estimation of an origin-destination trip table based on observed link volumes and turning movements. Executive summary.
Hai, Y., Akiyama, T., & Sasaki, T. (1998). Estimation of time-varying origin-destination flows from traffic counts: A neural network approach. Mathematical and computer modelling, 27(9), 323-334.
Hazelton, M. L. (2000). Estimation of origin–destination matrices from link flows on uncongested networks. Transportation Research Part B: Methodological, 34(7), 549-566.
Heeringa, W. J. (2004). Measuring dialect pronunciation differences using Levenshtein distance. Citeseer.
Hensher, D. A. (1976). The structure of journeys and nature of travel patterns. In Environment and Planning A (Vol. 8, pp. 655-672).
Hollander, Y., & Liu, R. (2008). The principles of calibrating traffic microsimulation models. Transportation, 35(3), 347-362.
Hu, S. (1996). An adaptive kalman filtering algorithm for the dynamic estimation and prediction of freeway origin-destination matrices (Order No. 9725558). Available from ProQuest Dissertations & Theses Global. (304264559).
Huang, T.-q., Yu, Y.-q., Li, K., & Zeng, W.-f. (2009). Reckon the parameter of DBSCAN for multi-density data sets with constraints. In Artificial Intelligence and Computational Intelligence, 2009. AICI'09. International Conference on (Vol. 4, pp. 375-379): IEEE.
Iqbal, M. S., Choudhury, C. F., Wang, P., & González, M. C. (2014). Development of origin–destination matrices using mobile phone call data. In Transportation Research Part C: Emerging Technologies (Vol. 40, pp. 63-74).
Jiang, S., Ferreira, J., & González, M. C. (2017). Activity-based human mobility patterns inferred from mobile phone data: A case study of Singapore. In IEEE Transactions on Big Data (Vol. 3, pp. 208-219).
Jornsten, K., & Nguyen, S. (1979). On the estimation of a trip matrix from network data. Publication No. 153, Centre de Recherche sur les Transports, Université de Montréal, Montreal.
Jörnsten, K., & Wallace, S. W. (1993). Overcoming the (apparent) problem of inconsistency in origin-destination matrix estimations. Transportation science, 27(4), 374-380.
Kang, Y. (1999). Estimation and prediction of dynamic origin-destination (OD) demand and system consistency control for real-time dynamic traffic assignment operation.
Kantorovich, L. V. (1942). On the translocation of masses. In Dokl. Akad. Nauk. USSR (NS) (Vol. 37, pp. 199-201).
Khoei, A. M., Bhaskar, A., & Chung, E. (2013). Travel time prediction on signalised urban arterials by applying SARIMA modelling on Bluetooth data. In 36th Australasian transport research forum (ATRF) 2013.
Kieu, L.-M., Bhaskar, A., & Chung, E. (2015). A modified Density-Based Scanning Algorithm with Noise for spatial travel pattern analysis from Smart Card AFC data. Transportation Research Part C: Emerging Technologies, 58, 193-207.
Kieu, L. M., Bhaskar, A., & Chung, E. (2012). Bus and car travel time on urban networks: integrating bluetooth and bus vehicle identification data.
Kim, H., Baek, S., & Lim, Y. (2001). Origin-destination matrices estimated with a genetic algorithm from link traffic counts. Transportation Research Record: Journal of the Transportation Research Board(1771), 156-163.
Kim, S.-J., Kim, W., & Rilett, L. (2005). Calibration of microsimulation models using nonparametric statistical techniques. Transportation Research Record: Journal of the Transportation Research Board(1935), 111-119.
Kroeber, A. L. (1943). Structure, function and pattern in biology and anthropology. The Scientific Monthly, 56(2), 105-113.
Kwon, J., & Varaiya, P. (2005). Real-time estimation of origin-destination matrices with partial trajectories from electronic toll collection tag data. Transportation Research Record: Journal of the Transportation Research Board(1923), 119-126.
Laharotte, P.-A., Billot, R., Come, E., Oukhellou, L., Nantes, A., & El Faouzi, N.-E. (2015). Spatiotemporal analysis of Bluetooth data: Application to a large urban network. IEEE Transactions on Intelligent Transportation Systems, 16(3), 1439-1448.
using hierarchical region-based and trajectory-based clustering. Proceedings of the VLDB Endowment, 1(1), 1081-1094.
Lee, M., & Sohn, K. (2015). Inferring the route-use patterns of metro passengers based only on travel-time data within a Bayesian framework using a reversible-jump Markov chain Monte Carlo (MCMC) simulation. Transportation Research Part B: Methodological, 81, 1-17.
Lee, M. S., & McNally, M. G. (2003). On the structure of weekly activity/travel patterns. Transportation Research Part A: Policy and Practice, 37(10), 823-839.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady (Vol. 10, pp. 707-710).
Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 661-670): ACM.
Lo, H.-P., & Chan, C.-P. (2003). Simultaneous estimation of an origin–destination matrix and link choice proportions using traffic counts. Transportation Research Part A: Policy and Practice, 37(9), 771-788.
Lu, L., Xu, Y., Antoniou, C., & Ben-Akiva, M. (2015). An enhanced SPSA algorithm for the calibration of Dynamic Traffic Assignment models. Transportation Research Part C: Emerging Technologies, 51, 149-166.
Lu, Z., Rao, W., Wu, Y. J., Guo, L., & Xia, J. (2015). A Kalman filter approach to dynamic OD flow estimation for urban road networks using multi-sensor data. Journal of Advanced Transportation, 49(2), 210-227.
Lundgren, J. T., & Peterson, A. (2008a). A heuristic for the bilevel origin–destination-matrix estimation problem. Transportation Research Part B: Methodological, 42(4), 339-354.
Lundgren, J. T., & Peterson, A. (2008b). A heuristic for the bilevel origin–destination-matrix estimation problem. In Transportation Research Part B: Methodological (Vol. 42, pp. 339-354).
Ma, W., & Qian, Z. S. (2018). Statistical inference of probabilistic origin-destination demand using day-to-day traffic data. In Transportation Research Part C: Emerging Technologies (Vol. 88, pp. 227-256).
Maher, M. (1983). Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transportation Research Part B: Methodological, 17(6), 435-447. Retrieved from
Maher, M. (1998). Algorithms for logit-based stochastic user equilibrium assignment.
Transportation Research Part B: Methodological, 32(8), 539-549. Retrieved from
Maher, M. J., Zhang, X., & Van Vliet, D. (2001). A bi-level programming approach
for trip matrix estimation and traffic control problems with stochastic user equilibrium link flows. Transportation Research Part B: Methodological, 35(1), 23-40. Retrieved from
Manual, T. A. (1964). Bureau of public roads. In US Department of Commerce.
Martin, W. A., & McGuckin, N. A. (1998). Travel estimation techniques for urban
planning (Vol. 365): National Academy Press Washington, DC.
Marzano, V., Papola, A., Simonelli, F., & Papageorgiou, M. (2018). A Kalman Filter
for Quasi-Dynamic od Flow Estimation/Updating. IEEE Transactions on Intelligent Transportation Systems(99), 1-9. Retrieved from
Masip, D., Djukic, T., Breen, M., & Casas, J. (2018). Efficient OD Matrix Estimation
Based on Metamodel for Nonlinear Assignment Function. Paper presented at Australasian Transport Research Forum 2018 Proceedings, Darwin, Australia.
McNally, M. G. (2008). The four step model. Center for Activity Systems Analysis.
Retrieved from
Michau, G. (2016). Link dependent origin-destination matrix estimation: nonsmooth
convex optimisation with Bluetooth-inferred trajectories. Université de Lyon.
Michau, G., Nantes, A., Bhaskar, A., Chung, E., Abry, P., & Borgnat, P. (2017).
Bluetooth data in an urban context: Retrieving vehicle trajectories. IEEE Transactions on Intelligent Transportation Systems, 18(9), 2377-2386. Retrieved from
Michau, G., Nantes, A., & Chung, E. (2013). Towards the retrieval of accurate OD
matrices from Bluetooth data: lessons learned from 2 years of data. Retrieved from
Michau, G., Nantes, A., Chung, E., Abry, P., & Borgnat, P. (2014, 17-18 February
2014). Retrieving trip information from a discrete detectors network: The case of Brisbane Bluetooth detectors. In 32nd Conference of Australian Institutes of Transport Research (CAITR 2014).
Michau, G., Pustelnik, N., Borgnat, P., Abry, P., Nantes, A., Bhaskar, A., & Chung,
E. (2016). A Primal-Dual Algorithm for Link Dependent Origin Destination Matrix Estimation. arXiv preprint arXiv:1604.00391. Retrieved from
Michau, G., Pustelnik, N., Borgnat, P., Abry, P., Nantes, A., Bhaskar, A., & Chung, E. (2017). A primal-dual algorithm for link dependent origin destination matrix estimation. IEEE Transactions on Signal and Information Processing over Networks, 3(1), 104-113. Retrieved from
Mishalani, R. G., Coifman, B., & Gopalakrishna, D. (2002). Evaluating Real-Time
Origin-Destination Flow Estimation Using Remote Sensing Based Surveillance Data. In Proceeding of the 7th International Conference on the Applications of Advanced Technology in Transportation, ASCE, Cambridge, MA.
Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais. Histoire de
l'Académie Royale des Sciences de Paris, 177, 666-704. Retrieved from
Nanda, D. (1997). A Method to Enhance the Performance of Synthetic Origin-
Destination (OD) Trip Table Estimation Models. In Masters Thesis.
Naoki, M. (2013). Geographic Boundaries of Population Census of Japan. Retrieved
from http://ggim.un.org/meetings/2013-ISGI-NY/documents/ESA_STAT_AC.279_P20_Geographic%20Boundaries%20of%20Population%20Census%20of%20Japan02.pdf
Naveh, K. S., & Kim, J. (2018). Urban Trajectory Analytics: Day-of-Week Movement
Pattern Mining Using Tensor Factorization. IEEE Transactions on Intelligent Transportation Systems. Retrieved from
Nguyen, S. (1976). A unified approach to equilibrium methods for traffic assignment.
In Traffic equilibrium methods (pp. 148-182): Springer.
Nguyen, S. (1977). Estimating an OD Matrix from Network Data: a Network
Equilibrium Approach. Montréal: Université de Montréal, Centre de recherche sur les transports. Retrieved from
NPTEL. (2009). Data collection. I. Madras (Ed.) Retrieved from
https://nptel.ac.in/courses/105101087/06-Ltexhtml/p8/p.html
Okutani, I., & Stephanedes, Y. J. (1984). Dynamic prediction of traffic volume through
Kalman filtering theory. Transportation Research Part B: Methodological, 18(1), 1-11. Retrieved from
Oliveira-Neto, F. M., Han, L. D., & Jeong, M. K. (2012). Online license plate matching
procedures using license-plate recognition machines and new weighted edit distance. Transportation research part C: emerging technologies, 21(1), 306-320. Retrieved from
Osorio, C. (2017). High-dimensional offline OD calibration for stochastic traffic
simulators of large-scale urban networks. In Technical Report: Massachusetts Institute of Technology.
Osorio, C. (2019). Dynamic origin-destination matrix calibration for large-scale network simulators. In Transportation Research Part C: Emerging Technologies (Vol. 98, pp. 186-206).
Oxford. (Ed.) (2018) English Oxford living Dictionaries.
Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional
data: a review. Acm Sigkdd Explorations Newsletter, 6(1), 90-105. Retrieved from
Patriksson, M. (2015). The traffic assignment problem: models and methods: Courier
Dover Publications.
Perera, K., Bhattacharya, T., Kulik, L., & Bailey, J. (2015). Trajectory inference for
mobile devices using connected cell towers. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 23): ACM.
Pollard, T., Taylor, N., van Vuren, T., & MacDonald, M. (2013). Comparing the
Quality of OD Matrices in Time and Between Data Sources. In Proceedings of the European Transport Conference.
Pool, B. (2014). Brisbane Strategic Transport Model-Multi-Modal (BSTM-MM):
model improvement program. In Australian Institute of Traffic Planning and Management (AITPM) National Conference, 2014, Adelaide, South Australia, Australia.
Rakha, H., & Van Aerde, M. (1995). Statistical analysis of day-to-day variations in
real-time traffic flow data. Transportation research record, 26-34. Retrieved from
Respati, W. S., Bhaskar, A., Zheng, Z., & Chung, E. (2017). Systematic Identification
of Peak Traffic Period. Paper presented at Australasian Transport Research Forum 2017 Proceedings, Auckland, New Zealand.
RNA. (2016). RNA Annual Report. Retrieved from
https://www.rna.org.au/media/881637/2016%20rna%20annual%20report.pdf
Robillard, P. (1975). Estimating the OD matrix from observed link volumes.
Transportation Research, 9(2), 123-128. Retrieved from
Ros-Roca, X., Montero, L., Schneck, A., & Barceló, J. (2018). Investigating the
performance of SPSA in simulation-optimization approaches to transportation problems. In Transportation research procedia (Vol. 34, pp. 83-90).
Ruiz de Villa, A., Casas, J., & Breen, M. (2014). OD matrix structural similarity:
Wasserstein metric. In Transportation Research Board 93rd Annual Meeting.
SEQTS. (2010). South-East Queensland Travel Survey 2009. In Queensland
Transport and Main Roads.
Shafiei, M., Nazemi, M., & Seyedabrishami, S. (2015). Estimating time-dependent
origin–destination demand from traffic counts: extended gradient method. Transportation Letters, 7(4), 210-218. Retrieved from
Shafiei, S., Gu, Z., & Saberi, M. (2018). Calibration and validation of a simulation-
based dynamic traffic assignment model for a large-scale congested network. Simulation Modelling Practice and Theory, 86, 169-186. Retrieved from
Spall, J. C. (1992). Multivariate stochastic approximation using a simultaneous
perturbation gradient approximation. IEEE transactions on automatic control, 37(3), 332-341. Retrieved from
Spiess, H. (1987). A maximum likelihood model for estimating origin-destination
matrices. Transportation Research Part B: Methodological, 21(5), 395-412. Retrieved from
Spiess, H. (1990). A gradient approach for the OD matrix adjustment problem.
CENTRE DE RECHERCHE SUR LES TRANSPORTS PUBLICATION, 1(693), 2. Retrieved from
Stathopoulos, A., & Tsekeris, T. (2003). Framework for analysing reliability and
information degradation of demand matrices in extended transport networks. Transport Reviews, 23(1), 89-103. Retrieved from
Stathopoulos, A., & Tsekeris, T. (2005). Methodology for Validating Dynamic
Origin–Destination Matrix Estimation Models with Implications for Advanced Traveler Information Systems. Transportation Planning and Technology, 28(2), 93-112. Retrieved from
Steinbach, M., Ertöz, L., & Kumar, V. (2004). The challenges of clustering high
dimensional data. In New directions in statistical physics (pp. 273-309): Springer.
Stone, J. R., Han, Y., Khattak, A. J., Fan, Y., Huntsinger, L. F., & Bing Mei, P. (2007).
Guidelines for Developing Travel Demand Models: Medium Communities and Metropolitan Planning Organizations. Retrieved from
Tamin, O., & Willumsen, L. (1989). Transport demand model estimation from traffic
counts. Transportation, 16(1), 3-26. Retrieved from
Tavana, H. (2001). Internally-Consistent Estimation of Dynamic Network Origin-
Destination Flows from Intelligent Transportation Systems Data Using Bi-Level Optimization. Ph.D. Dissertation, The University of Texas at Austin. Retrieved from
Tavassoli, A., Alsger, A., Hickman, M., & Mesbah, M. (2016a). How close the models
are to the reality? Comparison of Transit Origin-Destination Estimates with Automatic Fare Collection Data. In Australasian Transport Research Forum (ATRF), 38th, 2016, Melbourne, Victoria, Australia.
Tavassoli, A., Alsger, A., Hickman, M., & Mesbah, M. (2016b). How close the models
are to the reality? Comparison of Transit Origin-Destination Estimates with Automatic Fare Collection Data. In Australasian Transport Research Forum 2016 Proceedings.
TMR. (2016). BSTM data. In Department of Transport Main Roads.
TMR. (2017). The Future of Transport. Retrieved from
https://blog.tmr.qld.gov.au/blog/2017/02/09/the-future-of-transport/
Toledo, T., & Kolechkina, T. (2013). Estimation of Dynamic Origin-Destination
Matrices Using Linear Assignment Matrix Approximations. IEEE Trans. Intelligent Transportation Systems, 14(2), 618-626. Retrieved from
Toledo, T., Koutsopoulos, H., Davol, A., Ben-Akiva, M., Burghout, W., Andréasson,
I., . . . Lundin, C. (2003). Calibration and validation of microscopic traffic simulation tools: Stockholm case study. Transportation Research Record: Journal of the Transportation Research Board(1831), 65-75. Retrieved from
Toole, J. L., Colak, S., Sturt, B., Alexander, L. P., Evsukoff, A., & González, M. C.
(2015). The path most traveled: Travel demand estimation using big data resources. Transportation Research Part C: Emerging Technologies, 58, 162-177. Retrieved from
Bureau of Transport and Regional Economics. (2007). Estimating urban
traffic and congestion cost trends for Australian cities. Canberra: Department of Transport and Regional Services.
Tympakianaki, A., Koutsopoulos, H. N., & Jenelius, E. (2018). Robust SPSA
algorithms for dynamic OD matrix estimation. Procedia computer science, 130(C), 57-64. Retrieved from
USCensus. (2019). 2005 Metropolitan and Micropolitan Statistical Areas (CBSAs) of
the United States and Puerto Rico. Retrieved from https://www2.census.gov/geo/maps/metroarea/us_wall/Dec2005/cbsa_us_1205.pdf?#.
Van Der Zijpp, N. (1997). Dynamic origin-destination matrix estimation from traffic
counts and automated vehicle identification data. Transportation Research Record: Journal of the Transportation Research Board(1607), 87-94. Retrieved from
Van Zuylen, H. (1978). The information minimising method: validity and applicability
to transport planning. New developments in modelling travel demand and urban systems. Retrieved from
Van Zuylen, H. J., & Willumsen, L. G. (1980). The most likely trip matrix estimated
from traffic counts. Transportation Research Part B: Methodological, 14(3), 281-293. Retrieved from
Verbas, İ., Mahmassani, H., & Zhang, K. (2011). Time-dependent origin-destination
demand estimation: Challenges and methods for large-scale networks with multiple vehicle classes. Transportation Research Record: Journal of the Transportation Research Board(2263), 45-56. Retrieved from
Villani, C. (2003). Topics in optimal transportation: American Mathematical Soc.
Vogl, T. P., Mangis, J., Rigler, A., Zink, W., & Alkon, D. (1988). Accelerating the
convergence of the back-propagation method. Biological cybernetics, 59(4-5), 257-263. Retrieved from
descendant STRONG for large-scale Stochastic Optimization. In Winter Simulation Conference (WSC), 2016 (pp. 614-625): IEEE.
Wang, Y., Ma, X., Liu, Y., Gong, K., Henrickson, K. C., Xu, M., & Wang, Y. (2016).
A Two-Stage Algorithm for Origin-Destination Matrices Estimation Considering Dynamic Dispersion Parameter for Route Choice. PloS one, 11(1), e0146850. Retrieved from
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality
assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600-612. Retrieved from
Weijermars, W., & Van Berkum, E. (2005). Analyzing highway flow patterns using
using dimensionality reduction and clustering methods. In Intelligent Transportation Systems (ITSC), 2017 IEEE 20th International Conference on (pp. 548-553): IEEE.
Yang, H. (1995). Heuristic algorithms for the bilevel origin-destination matrix
estimation problem. In Transportation Research Part B: Methodological (Vol. 29, pp. 231-242).
Yang, H., Iida, Y., & Sasaki, T. (1991). An analysis of the reliability of an origin-
destination trip matrix estimated from traffic counts. Transportation Research Part B: Methodological, 25(5), 351-363. Retrieved from
Yang, H., Sasaki, T., Iida, Y., & Asakura, Y. (1992). Estimation of origin-destination
matrices from link traffic counts on congested networks. Transportation Research Part B: Methodological, 26(6), 417-434. Retrieved from
Yujian, L., & Bo, L. (2007). A normalized Levenshtein distance metric. IEEE
transactions on pattern analysis and machine intelligence, 29(6), 1091-1095. Retrieved from
Yun, I., & Park, B. (2005). Estimation of dynamic origin destination matrix: A genetic
Zhang, A., Kang, J. E., Axhausen, K. W., & Kwon, C. (2018). Multi-day activity-
travel pattern sampling based on single-day data. In 97th Annual Meeting of the Transportation Research Board (TRB 2018): TRB Annual Meeting.
Zhou, X. (2004). Dynamic origin-destination demand estimation and prediction for
off-line and on-line dynamic traffic assignment operation. Retrieved from
Zhou, X., & Mahmassani, H. S. (2006). Dynamic origin-destination demand
estimation using automatic vehicle identification data. IEEE Transactions on intelligent transportation systems, 7(1), 105-114. Retrieved from
Zhou, X., & Mahmassani, H. S. (2007). A structural state space model for real-time
traffic origin–destination demand estimation and prediction in a day-to-day learning framework. Transportation Research Part B: Methodological, 41(8), 823-840. Retrieved from
Zhu, K. (2007). Time-dependent origin-destination estimation: Genetic algorithm-
based optimization with updated assignment matrix. KSCE Journal of Civil Engineering, 11(4), 199-207. Retrieved from
Appendices
Appendix A
Methodology to develop B-OD matrix
Knowledge of trajectories can further help in developing Bluetooth-based
OD matrices at the scanner level as well as at the zonal level. The methodology to
develop a Bluetooth-based OD matrix (B-OD) at the zonal level is explained using the
flowchart shown in Figure A1.1.
To develop a B-OD matrix, raw Bluetooth data from a particular day are spatially
and temporally matched to define individual Bluetooth vehicle trajectories, which are
further split into trips (Michau, et al., 2014). Here, the Bluetooth dataset for the study
date is downloaded from the BCC server and unique Device IDs are identified.
Records are retrieved individually for each Device ID and sorted by detection
time-stamp for further analysis. Within the record of each Device ID, the difference
in time-stamps between successive detections, δ, is used to identify unique
trips/trajectories. If successive detections are from the same scanner, a new trip is
identified when δ is at least 10 minutes. If the successive detections are from
different scanners, a new trip is identified when δ is at least 30 minutes. The
threshold values are chosen in accordance with a
similar study on Brisbane Bluetooth datasets by Michau et al. (2017). In this way, all
individual trips/trajectories of each Device ID are identified and then used
to infer OD trips at the scanner level to form the sOD matrix. The size of the sOD matrix
is 845 × 845, and it is further transformed into a B-OD matrix at either the SA2 or SA3
level. For this, the concordance between BMS locations and SA zones is obtained
from the BCC. The process is repeated over 415 days to generate a B-OD matrix
for each day.
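The trip-splitting rule described above can be sketched in Python as follows. This is a minimal illustration and not the code used in the thesis: the function name `split_into_trips`, the tuple layout of a detection and the sample record are assumptions made for the example; only the 10-minute and 30-minute thresholds come from the text.

```python
from datetime import datetime, timedelta

# Thresholds taken from the text; everything else here is illustrative.
SAME_SCANNER_GAP = timedelta(minutes=10)
DIFF_SCANNER_GAP = timedelta(minutes=30)


def split_into_trips(detections):
    """Split the detections of ONE Device ID into trips.

    detections: list of (timestamp, scanner_id) tuples (assumed layout).
    Returns a list of trips, each a time-ordered list of detections.
    """
    trips = []
    current = []
    for detection in sorted(detections):
        if current:
            prev_time, prev_scanner = current[-1]
            delta = detection[0] - prev_time
            same_scanner = detection[1] == prev_scanner
            limit = SAME_SCANNER_GAP if same_scanner else DIFF_SCANNER_GAP
            if delta >= limit:
                # Gap is long enough: close the current trip, start a new one.
                trips.append(current)
                current = []
        current.append(detection)
    if current:
        trips.append(current)
    return trips


# Hypothetical record for a single device.
base = datetime(2016, 3, 1, 7, 0)
record = [
    (base, "A"),
    (base + timedelta(minutes=5), "B"),
    (base + timedelta(minutes=20), "B"),   # same scanner, 15 min gap
    (base + timedelta(minutes=30), "C"),   # different scanner, 10 min gap
    (base + timedelta(minutes=70), "D"),   # different scanner, 40 min gap
]
trips = split_into_trips(record)
scanner_sequences = [[scanner for _, scanner in trip] for trip in trips]
print(scanner_sequences)
```

In this hypothetical record the device produces three trips: the 15-minute dwell at scanner B and the 40-minute gap before scanner D each start a new trip.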
Figure A1.1: Methodology to develop B-OD matrix at zonal level
[Flowchart. Trajectories construction: from the BCC Bluetooth dataset, select a Device ID; retrieve the detection record (R) of the Device ID and sort it based on time-stamps; select two successive detections, from the first to the last detection in record R; if the successive detections are from the same scanner and δ >= 10 mins, or from different scanners and δ >= 30 mins, record a new trip for the Device ID; repeat until the last record for the Device ID. OD matrix development: identify trip ends and add the trips of the Device ID to the OD flows of the corresponding OD pairs, using exogenous information relating scanner locations to SA2 zones; repeat until the last Device ID, then end.]
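The OD matrix development stage of the flowchart can be illustrated with a short Python sketch. It assumes that each trip is already available as an ordered list of scanner IDs and that a scanner-to-zone lookup exists; the names `trips_to_zonal_od` and `scanner_to_zone`, the zone labels, and the choice to skip single-detection trips are all assumptions for this example, not details taken from the thesis.

```python
from collections import defaultdict


def trips_to_zonal_od(trips, scanner_to_zone):
    """Count zone-to-zone trips from scanner-level trips.

    trips: list of trips, each an ordered list of scanner IDs.
    scanner_to_zone: dict mapping a scanner ID to a zone ID (assumed lookup).
    Returns {(origin_zone, destination_zone): number_of_trips}.
    """
    od_flows = defaultdict(int)
    for trip in trips:
        if len(trip) < 2:
            # Assumption: a single detection carries no origin-destination pair.
            continue
        origin = scanner_to_zone.get(trip[0])
        destination = scanner_to_zone.get(trip[-1])
        if origin is None or destination is None:
            # Scanner without a known zone: the trip cannot be assigned.
            continue
        od_flows[(origin, destination)] += 1
    return dict(od_flows)


# Hypothetical scanners and zones.
lookup = {"A": "Z1", "B": "Z1", "C": "Z2", "D": "Z3"}
example_trips = [["A", "B"], ["B", "C"], ["D"], ["A", "C"]]
zonal_od = trips_to_zonal_od(example_trips, lookup)
print(zonal_od)
```

Trips that start and end in the same zone appear on the diagonal of the resulting matrix, here as the pair (Z1, Z1).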
Appendix B
Can the structure of Bluetooth trips be a proxy for true OD?
1. Background
Although Bluetooth observations capture only a fraction of the actual OD
demand, the observed trip distribution patterns can provide some insights into the real
travel behaviour within any network. Due to this capacity, the knowledge of Bluetooth
trips seems to have the potential to contribute to the OD matrix estimation process.
However, it is important to validate the knowledge of Bluetooth trips before any
practical implementation. Since the ground truth is unknown, Bluetooth trips cannot
be validated directly. In the absence of true OD flows, confidence in the Bluetooth
trips can instead be gained using surrogate measures that are considered to be
structural properties of OD matrices (Antoniou, et al., 2016).
Because Bluetooth trips are only partial observations, they might not infer a
complete sequence of trajectories. However, at a macroscopic level, the structure of
Bluetooth trips might provide some valuable trip-related information.
In this context, a few analyses were conducted to check whether the “structure” of
Bluetooth trips preserves the integrity of the actual demand distribution and can be
used as a proxy for the actual distribution of trips. This hypothesis was tested
against the following four surrogate measures: a) screenline counts, b) the Brisbane
Strategic Transport Model (BSTM) (BSTM, 2016) travel time distributions, c) car
users (as drivers) taken from the 2016 Census (ABS, 2018), and d) BSTM OD flows.
2. Bluetooth vs Screenline counts
Screenlines divide the region into larger zones. They run along natural barriers
with few crossing points, such as rivers, or along major road
corridors/tunnels (NPTEL, 2009). They are primarily used to calibrate and validate the
base year transport models, such as BSTM (Pool, 2014). See Figure A2.1(a) for the
screenlines and the locations of screenline counts (blue coloured Google pins), and
Figure A2.1(b) for a closer look at the alignment of screenlines with the locations of
BMSs (red coloured circles) within the BCC region.
Figure A2.1: (a) Locations of screenline counts and screenlines for the BCC region; (b) closer
look at the alignment of BMS locations with the screenlines (BSTM, 2016)
A good correlation between screenline counts and the number of Bluetooth
observations from BMS scanners upstream and downstream of the screenline count
location should enhance confidence in using Bluetooth data. For the current analysis,
selected locations of the screenline survey (blue coloured Google pins) and the
corresponding BMS locations (red coloured circles) are shown in Figure A2.2. For
each selected location (both directions of flow), BMS scanners were identified
upstream and downstream, such that the detected Bluetooth data should pass through
the screenline count location. Here, eight screenline count locations were selected, and
these locations were distributed throughout the study region (see Figure A2.2). The
data for comparison were weekday traffic from the year 2016.
Figure A2.2: Selected screen line and BMS locations
Figure A2.3 presents the correlation between the two counts. An increasing trend
between Bluetooth and screenline counts was observed, with R² = 0.7594 and a
correlation coefficient (ρ) of 0.8714. This good alignment and high correlation
between the two sets of observations supports the use of Bluetooth data in
transport applications.
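For a straight-line fit of one variable on another, the coefficient of determination equals the square of the Pearson correlation coefficient, which is why the two reported values agree (0.8714² ≈ 0.7593). The sketch below checks this relation in plain Python. The helper names and the count data are hypothetical and serve only to illustrate the calculation; they are not the screenline data used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus its R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical AM-peak counts at five locations.
screenline_counts = [1500, 3000, 4500, 6000, 8000]
bluetooth_counts = [320, 540, 900, 1100, 1650]

rho = pearson(screenline_counts, bluetooth_counts)
slope, intercept, r_squared = linear_fit(screenline_counts, bluetooth_counts)
print(round(rho, 4), round(r_squared, 4), round(slope, 4))
```

With Bluetooth counts on the y-axis and screenline counts on the x-axis, the slope of such a fit can be read as an approximate penetration rate.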
The penetration rate of Bluetooth counts, that is, the ratio of Bluetooth to
screenline counts, is illustrated for the selected locations in Figure A2.4. The
average penetration rate is nearly 20%, with individual locations between 15% and
35%, which is consistent with the 12%-30% reported for Brisbane City for the year
2014 (Michau, 2016). Note that the slope of the fitted line in Figure A2.3 also
reflects the penetration rate of Bluetooth counts. Although traffic counts from either
data source do not provide any “structure” or trip distribution related information, the
consistency of the Bluetooth penetration rate between the literature and the
current study provides some initial confidence in the Bluetooth observations.
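As a quick arithmetic check, the average penetration rate can be recomputed from the eight per-location rates shown in Figure A2.4. The variable names below are illustrative; the rates are the values printed on the figure.

```python
from statistics import mean

# Per-location Bluetooth penetration rates as printed in Figure A2.4.
penetration_rates = [0.15, 0.34, 0.19, 0.19, 0.16, 0.22, 0.15, 0.19]

average_rate = mean(penetration_rates)
lowest, highest = min(penetration_rates), max(penetration_rates)
print(f"average = {average_rate:.3f}, range = {lowest:.2f} to {highest:.2f}")
```

The mean of the eight values is 0.19875, which rounds to the reported average of 0.20, and the range of 0.15 to 0.34 matches the stated spread of roughly 15% to 35%.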
Figure A2.3: Bluetooth vs screenline counts
[Scatter plot: screenline counts, AM peak (7 AM-9 AM), on the x-axis; Bluetooth counts, AM peak (7 AM-9 AM), on the y-axis. Fitted line: y = 0.1923x + 2.8692, R² = 0.7594; correlation coefficient = 0.8714.]
Figure A2.4: The penetration rate of Bluetooth counts at the selected study locations
[Bar chart: Bluetooth penetration rate (x-axis, 0.00 to 0.40) for the selected screenline locations (y-axis). Labels and values in the order extracted: Walter Taylor Bridge 0.15, Breakfast Creek Rd 0.34, William Jolly Bridge 0.19, Compton Rd 0.19, Sherwood Road 0.16, Wynnum Rd 0.22, South Pine Road 0.15, Beckett Road 0.19, average penetration 0.20.]
3. Trip length (travel time) distribution
Trip length distribution tables are generally used to compare and validate a
modelled trip distribution (such as that of a gravity model) against survey data (Stone
et al., 2007). The trip length distribution plots of existing demand models can also be
compared with distributions developed from other data sources. In this study, a
similar analysis was carried out to check the Bluetooth travel time
distribution against that of the BSTM.
First, the raw travel times from Bluetooth observations were filtered using a
median absolute deviation filter with f=2 (Kieu, Bhaskar, & Chung, 2012) and the
Bluetooth travel times were estimated for trips between SA2 zones. Similarly, the
BSTM travel times were aggregated from BSTM zonal level to SA2 level for a fair
comparison.
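The filtering step can be sketched as follows. The exact form of the filter in Kieu, Bhaskar, & Chung (2012) is not reproduced in this appendix, so the sketch assumes one common form of a median absolute deviation (MAD) filter: keep values within median ± f × 1.4826 × MAD. The consistency constant 1.4826, the function name `mad_filter` and the sample travel times are assumptions made for illustration.

```python
from statistics import median


def mad_filter(travel_times, f=2.0, scale=1.4826):
    """Keep travel times within median +/- f * scale * MAD.

    scale = 1.4826 makes the MAD comparable to a standard deviation for
    normally distributed data (an assumed convention, see the lead-in).
    """
    centre = median(travel_times)
    mad = median([abs(t - centre) for t in travel_times])
    half_width = f * scale * mad
    lower, upper = centre - half_width, centre + half_width
    return [t for t in travel_times if lower <= t <= upper]


# Hypothetical scanner-to-scanner travel times in minutes; 60 is an outlier,
# for example a vehicle that stopped along the way.
raw_times = [10, 11, 12, 11, 10, 60]
filtered_times = mad_filter(raw_times, f=2.0)
print(filtered_times)
```

Here the median is 11 minutes and the MAD is 1 minute, so the accepted band is roughly 8 to 14 minutes and only the 60-minute observation is discarded.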
The travel time distribution plots for the Bluetooth observations and the BSTM
model are shown in Figure A2.5. Here, the x-axis represents the travel time in minutes
between SA2 zonal pairs and the y-axis represents the proportion of car trips during
the AM peak period. The mean travel time of trips observed from the BSTM and
Bluetooth were 15.87 minutes and 12.96 minutes, respectively, and their
corresponding standard deviations were 19.70 minutes and 15.33 minutes,
respectively. The highest proportions of car trips (represented by peaks) in the
Bluetooth and BSTM plots were at 10 and 15 minutes, respectively. The difference
between the two plots could be due to modelling errors in the BSTM, or because the
Bluetooth travel time is measured between BMS scanner locations,
which is not consistent with the BSTM zone-to-zone travel time. Another reason
for the shift of the Bluetooth distribution towards shorter travel times is that
Bluetooth detections at the first and last signalised intersections of a trip were not
necessarily captured. Thus, proper care must be taken when
using Bluetooth data. Nevertheless, the general shape of the distribution and the values
are acceptable for the current surrogate comparison.
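A travel time distribution of this kind can be built by binning the travel time of each zonal pair and weighting it by the number of trips on that pair. The sketch below is illustrative only: the 5-minute bin width, the function names and the sample values are assumptions and are not taken from the thesis.

```python
from collections import Counter


def travel_time_distribution(times, trips, bin_width=5):
    """Return {bin_start_minutes: proportion_of_trips} for zonal pairs.

    times: travel time in minutes for each zonal pair.
    trips: number of trips for the same zonal pairs.
    """
    totals = Counter()
    for minutes, count in zip(times, trips):
        bin_start = int(minutes // bin_width) * bin_width
        totals[bin_start] += count
    all_trips = sum(totals.values())
    return {b: totals[b] / all_trips for b in sorted(totals)}


def trip_weighted_mean(times, trips):
    """Mean travel time weighted by the number of trips."""
    return sum(t * n for t, n in zip(times, trips)) / sum(trips)


# Hypothetical zonal pairs: travel time (minutes) and trips.
pair_times = [4, 7, 12, 13]
pair_trips = [10, 30, 40, 20]

distribution = travel_time_distribution(pair_times, pair_trips)
mean_time = trip_weighted_mean(pair_times, pair_trips)
print(distribution, mean_time)
```

Applying the same binning to both data sources yields two curves that can be compared in terms of peak position, mean and spread.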
Figure A2.5: BSTM vs Bluetooth travel time distribution
4. Trip productions: Bluetooth vs Census
In this section, Bluetooth trips produced from SA2 zones during the AM peak
period are compared to the 2016 Census “Method of travel to work” observations
(ABS, 2018). The following assumption was made before the comparison: since most
of the Bluetooth trips were from detections of in-built car systems, they could be
considered a proxy for car trips within the study region.
According to the 2016 Census, most work-based trips in Brisbane were made by
car (as driver) for commuting (75.3%) (ABS, 2017). Since most work-based trips are
generally observed during the AM peak period, car users (who preferred to travel to
work as drivers) from the 2016 Census data were used as a proxy for actual car trips
produced.
The comparison between trips produced by Bluetooth (x-axis) and car users (as
drivers) from the 2016 Census (y-axis) is demonstrated using a scatter plot in Figure
A2.6. Bluetooth observations were found to closely correspond to the 2016 Census
data, with a correlation coefficient (ρ) of 0.8467 and an R² value of 0.7168. Bluetooth
trips also constituted approximately 4.3% of the census car trips. Interestingly, this
observation is consistent with the average Bluetooth trips capture rate of 4.4%
Figure A2.6: Bluetooth vs 2016 Census – trip productions at SA2 level
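Trip productions are the row sums of an OD matrix, so the comparison in this section amounts to summing Bluetooth trips by origin zone and relating them to the census car-driver counts of the same zones. The following sketch uses a hypothetical three-zone matrix and hypothetical census counts; the function names are illustrative.

```python
def trip_productions(od_matrix):
    """Trips produced by each origin zone: the row sums of the OD matrix."""
    return [sum(row) for row in od_matrix]


def overall_share(sample_totals, reference_totals):
    """Share of the reference trips that the sample represents."""
    return sum(sample_totals) / sum(reference_totals)


# Hypothetical Bluetooth OD matrix for three zones (rows are origins).
bluetooth_od = [
    [0, 30, 10],
    [20, 0, 25],
    [5, 15, 0],
]
# Hypothetical census car-driver counts for the same three zones.
census_car_drivers = [900, 1100, 450]

productions = trip_productions(bluetooth_od)
share = overall_share(productions, census_car_drivers)
print(productions, round(share, 4))
```

The per-zone productions can then be plotted against the census counts, as in the scatter plot of Figure A2.6.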
5. BSTM OD flows vs Bluetooth based OD flows
In this section, BSTM OD flows are compared with Bluetooth-based OD (B-OD)
flows at the SA3 level for the AM peak period. In practice, the BSTM base year OD
was generated using extensive modelling techniques. On the other hand, the B-OD
flows were developed through the inference of vehicle trajectories (see Appendix A
for the details of the methodology adopted for developing the B-OD matrix).
Because the B-OD flows are only a fraction of the actual OD flows and BSTM
flows represent scaled-up demand, this section first analyses the variation in the
capture rates of B-OD flows with respect to BSTM OD flows, and then compares both
through R² and the correlation coefficient.
The total number of BSTM OD flows and B-OD flows to be compared were
235,556 and 56,542, respectively. This implies that Bluetooth captured almost 24% of
the total BSTM flows (this value also lies in the range of 15%-35%; i.e., the penetration
rate of Bluetooth counts in Section 2). However, it must be noted that the capture rate
of Bluetooth OD flows was different from that with respect to counts, and varied for
different OD pairs due to many factors, such as distance, socio-economic
characteristics, etc. To illustrate these variations, BSTM OD flows and B-OD flows
are compared using Pareto distribution plots (see Figure
A2.7), where the x-axis represents the ratio of the Bluetooth to BSTM OD flows,
arranged in the order of frequency; the y-axis (left) represents the
proportion of total OD pairs for different values of this ratio; and the y-axis (right)
represents the cumulative percentage of OD pairs. Interestingly, for 75% of the OD
pairs the ratio of Bluetooth to BSTM OD flows varied between 5% and 35%. Compared to
the penetration rate of Bluetooth counts (i.e. 15%-35% from Section 2), the capture
rate of the OD flows had a higher variation. Note that although BSTM flows are
modelled, for the purpose of this comparison the ratio can be considered a proxy for
the actual capture rates of the OD flows.
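The per-pair capture ratio and the share of OD pairs falling within a given band can be computed as in the sketch below. The dictionaries, the function names and the flows are hypothetical; only the 5% to 35% band is taken from the text.

```python
def capture_ratios(b_od, bstm_od):
    """Ratio of Bluetooth to BSTM flow for every OD pair with BSTM flow > 0."""
    return {
        pair: b_od.get(pair, 0) / flow
        for pair, flow in bstm_od.items()
        if flow > 0
    }


def share_within(ratios, low=0.05, high=0.35):
    """Proportion of OD pairs whose ratio lies in [low, high]."""
    inside = sum(1 for r in ratios.values() if low <= r <= high)
    return inside / len(ratios)


# Hypothetical flows keyed by (origin zone, destination zone).
bstm_flows = {(1, 2): 1000, (1, 3): 400, (2, 1): 800, (2, 3): 0, (3, 1): 200}
bod_flows = {(1, 2): 240, (1, 3): 10, (2, 1): 300, (3, 1): 50}

ratios = capture_ratios(bod_flows, bstm_flows)
print(ratios, share_within(ratios))
```

OD pairs with zero modelled flow are left out because the ratio is undefined for them.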
Figure A2.7: Pareto distribution of the ratio of Bluetooth OD to BSTM OD flows
Nevertheless, a good correlation was observed between BSTM OD flows and B-OD
flows (ρ = 0.8878 in Figure A2.8). The line of fit between both OD flows also
shows a decent alignment with R² = 0.7883, and the slope of the fit suggests that the
B-OD flows were nearly 25% of BSTM OD flows. Although the ratio of Bluetooth to
BSTM OD flows was widely spread, the good correlation with BSTM OD provides more
confidence in the structure of Bluetooth trips.
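The two aggregate percentages quoted in this section can be checked with simple arithmetic. The totals and the slope below are the values reported in the text and in the fitted line of Figure A2.8 (BSTM flow on the y-axis, B-OD flow on the x-axis); reading the inverse slope as a share ignores the intercept of the fit, so it is only an approximation.

```python
# Totals reported in the text.
total_bstm_flows = 235_556
total_bod_flows = 56_542

# Slope of the fitted line in Figure A2.8 (BSTM flow per unit of B-OD flow).
fit_slope = 4.0379

overall_capture = total_bod_flows / total_bstm_flows
implied_share = 1.0 / fit_slope  # approximate: the intercept is neglected

print(f"overall capture = {overall_capture:.1%}")
print(f"share implied by slope = {implied_share:.1%}")
```

Both figures sit close together, at about 24% and just under 25%, which matches the "almost 24%" and "nearly 25%" quoted in the text.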
Figure A2.8: B-OD flows vs BSTM OD flows
From the above comparisons of the OD matrix structural properties (over four
surrogate measures), it can be concluded that although Bluetooth observations are
partial and only constitute a sample, the structure of the Bluetooth trips is
reasonably consistent with the surrogate measures and can probably be used as a
proxy for the actual distribution of trips. However, in the
absence of the ground truth, and because the statistical and model
errors in the Bluetooth data and in the data from other sources are difficult to
disentangle, a more detailed investigation is recommended for future research.
[Figure A2.8 plot: B-OD flows on the x-axis; BSTM OD flows on the y-axis. Fitted line: y = 4.0379x + 114.47, R² = 0.7883; correlation coefficient = 0.8878.]
Appendix C
MATLAB optimisation code for B-OD/B-SP methods
clc
clear all
currentFolder = pwd;
True_OD_matrix = load(fullfile(currentFolder, 'inputs', 'OD.txt'));
W = size(True_OD_matrix,1)*size(True_OD_matrix,2); % Size of OD vector
True_transpose = True_OD_matrix';
OD_True_Vector = True_transpose(:); % True OD vector
Prior_OD_matrix = load(fullfile(currentFolder, 'inputs', 'Prior_OD_matrix.txt'));
Prior_transpose = Prior_OD_matrix';
Prior_OD_vector = Prior_transpose(:); % Prior OD vector
load(fullfile(currentFolder, 'inputs', 'zones.txt'));
load(fullfile(currentFolder, 'inputs', 'ObsCounts.txt')); % Observed link flows
load(fullfile(currentFolder, 'inputs', 'det_sec.txt')); % The IDs of loop detectors and corresponding links (sections)
y_obs = ObsCounts(:,2); % Observed link counts
OD = Prior_OD_matrix;
OD_Tranp = OD';
OD_Vector = OD_Tranp(:);
%% Set method = 1 for the B-OD method or method = 2 for the B-SP method
if method==1
    load(fullfile(currentFolder, 'inputs', 'BOD_vector.mat')); % Vector of B-OD flows
    BOD_matrix = reshape(BOD_vector, size(zones,1), size(zones,1));
    BOD_matrix = BOD_matrix'; % B-OD matrix
    % Omega is the share of connected OD pairs (excluding internal OD pairs).
    % So, for 210 OD pairs, Omega = 1 (100%); for 168 OD pairs, Omega = 0.8 (80%); and so on.
    pen = Omega*210;
    [BpenStr, OD_ind] = Bluetooth_connected_ODpairs(OD, pen); % Refer to the "Bluetooth_connected_ODpairs" function
elseif method==2
    load(fullfile(currentFolder, 'inputs', 'EndDet_Zone.txt')); % Look-up table relating BMS at trip ends with zonal IDs
    load(fullfile(currentFolder, 'inputs', 'Subpathfreq_obs.mat')); % 1st column: subpath flows; 2nd and 3rd (last) columns: origin and destination zones
    Subpathflows_obs = Subpathfreq_obs(:,1); % Vector of subpath flows
end
lambda = lambda_prior; % choose any prior step length as lambda_prior
StrOD_Prior = corr2(Prior_OD_vector, OD_True_Vector);
StrOD_BT = corr2(BOD_vector, OD_True_Vector);
Obj_ite = []; y_est_ite = []; Demand = OD_Vector; Values_Ite = [];
l_up = 1.5; l_down = 0.9; % l_up and l_down are chosen by trial and error
[GSSI_PriorOD] = GSSI_computation(Prior_OD_matrix, True_OD_matrix); % Refer to the "GSSI_computation" function
Objective = 2; % Objective=1 is the objective function of the traditional method; Objective=2 is for the B-OD/B-SP method
for ite = 1:20 % the number of iterations
    [OD_Id_Sno] = Aimsun_matrix(OD, zones);
    [terminal] = Aimsun(); % Refer to the function "Aimsun.m"
    system(terminal); % Executing "terminal"
    [extracted_data] = SQLITE(); % Refer to the function "SQLITE"
    extracted_data = cell2mat(extracted_data);
    diff = abs(OD_True_Vector - OD_Vector);
    SumdiffSq = sum((diff).*(diff));
    RMSE_OD = sqrt(SumdiffSq/size(OD_True_Vector,1));
    load('BNE.matrix'); % BNE is the output from 'AutoRun_BNE.py' saved as a text file. Refer to the Python script (AutoRun_BNE.py) in Appendix D.
    [y_est, Sections, LinkPropMat] = Assignment(det_sec, extracted_data, BNE, OD_Id_Sno); % Refer to the "Assignment" function
    diff2 = abs(y_obs - y_est);
    SumdiffSq2 = sum((diff2).*(diff2));
    RMSE_linkflows = sqrt(SumdiffSq2/size(y_obs,1));
    if method==1
        [Obj, Gradient, StrBOD] = Obj_Grad(y_obs, y_est, LinkPropMat, Objective, BpenStr, OD_Vector, BOD_vector, pen); % Refer to the "Obj_Grad" function
    elseif method==2
        BTraw = readtable("Det2DetDataALLDETECTOR.txt"); % This text file is output from Aimsun through a separately scripted API. It resembles the raw Bluetooth observations from BMSs.
        [Traj3_table] = BTpaths_secs(BTraw, det_sec, EndDet_Zone); % Refer to the "BTpaths_secs" function
        [SubTraj3_table] = Subpathsanalysis(Traj3_table); % Refer to the "Subpathsanalysis" function
        MLSPNo = 1; % Only one Most Likely Subpath per OD pair is considered
        [SubMLP, SubPathFreq] = MostLikelySubpaths(SubTraj3_table, zones, det_sec, MLSPNo); % Refer to the "MostLikelySubpaths" function
        Subpathprop = Sub_path_proportion_matrix(Subpathfreq_obs, SubPathFreq, OD_True_Vector, OD, zones);
        [Obj, Gradient, StrSP] = Obj_Grad_subpathflows(y_obs, y_est, LinkPropMat, Subpathprop, Subpathflows_obs, Subpathflows_est, Objective);
    end
    Obj_ite = [Obj_ite; Obj];
    if size(Obj_ite,1)>1
        if Obj<=Obj_ite(end-1)
            lambda = lambda*l_up;
        else
            lambda = lambda*l_down;
            % Deleting the parameter values of current iteration
            Demand(:,end) = []; Obj_ite(end) = []; y_est_ite(:,end) = []; Values_Ite(end,:) = [];
            % Setting the OD vector to previous iteration
            OD_Vector = Demand(:,end);
        end
    end
    OD_Vector = OD_Vector.*(1-lambda.*(Gradient)); % Updating OD vector
    temp5 = reshape(OD_Vector, [size(OD,1), size(OD,2)]);
    OD = temp5'; % Reshaping OD vector into matrix
    if method==1
        values = [StrBOD, RMSE_OD, RMSE_linkflows, Obj];
    elseif method==2
        values = [StrSP, RMSE_OD, RMSE_linkflows, Obj];
    end
    Values_Ite = [Values_Ite; values];
    y_est_ite = [y_est_ite, y_est];
    Demand = [Demand, OD_Vector];
    fclose(fopen('matrix.txt','w')); % deleting flow values in the text file "matrix.txt"
    delete('BNE.ang.sqlite'); % deleting the Aimsun sqlite database
    delete('BNE.ang.old'); % deleting the Aimsun back-up
    delete('BNE.matrix'); % deleting the assignment-related text file "BNE.matrix"
    if method==2
        delete('Det2DetDataALLDETECTOR.txt');
    end
end
IteNo = length(Obj_ite);
tempe2 = reshape(Demand(:,IteNo), [size(OD,1), size(OD,2)]);
Final_OD_matrix = tempe2'; % This is the final estimated OD matrix
[GSSI_OD] = GSSI_computation(Final_OD_matrix, True_OD_matrix);
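The step-length rule used in the loop above (enlarge lambda by l_up after an improving iteration, otherwise shrink it by l_down and fall back to the previous OD vector) can be illustrated on a toy problem. The Python sketch below is not the thesis code. It assumes a fixed 2-link, 3-OD-pair proportion matrix in place of the Aimsun assignment, uses the plain link-flow objective, and reuses the gradient of the last accepted point after a rejected step. All names and numbers are hypothetical.

```python
def link_flows(P, x):
    """Assign OD flows x to links with a fixed proportion matrix P (links x OD pairs)."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def objective_and_gradient(P, x, y_obs):
    """Least-squares link-flow objective and its gradient with respect to x."""
    residual = [ye - yo for ye, yo in zip(link_flows(P, x), y_obs)]
    obj = 0.5 * sum(r * r for r in residual)
    grad = [sum(P[l][w] * residual[l] for l in range(len(P))) for w in range(len(x))]
    return obj, grad


def estimate(P, x0, y_obs, lam, l_up=1.5, l_down=0.9, iterations=50):
    """Gradient descent with the multiplicative update and adaptive step length."""
    best_x = list(x0)
    best_obj, best_grad = objective_and_gradient(P, best_x, y_obs)
    history = [best_obj]
    for _ in range(iterations):
        # Multiplicative update, mirroring OD_Vector.*(1 - lambda.*Gradient)
        trial = [xi * (1 - lam * g) for xi, g in zip(best_x, best_grad)]
        obj, grad = objective_and_gradient(P, trial, y_obs)
        if obj <= best_obj:
            # Improvement: accept the step and enlarge the step length
            best_x, best_obj, best_grad = trial, obj, grad
            history.append(obj)
            lam *= l_up
        else:
            # No improvement: discard the step and shrink the step length
            lam *= l_down
    return best_x, history


# Hypothetical toy network: 2 detector links, 3 OD pairs
P = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
y_obs = [140.0, 100.0]
x_hat, history = estimate(P, [80.0, 100.0, 50.0], y_obs, lam=1e-3)
```

Because a trial point is only accepted when it does not increase the objective, the recorded objective values never increase from one accepted iteration to the next.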
Appendix D
Functions
Function-1: Bluetooth_connected_ODpairs.m
function [BpenStr, OD_ind] = Bluetooth_connected_ODpairs(OD, pen)
    diagind = [];
    for ind = 1:size(OD,1)
        diagind = [diagind; size(OD,1)*(ind-1)+ind]; % indices of diagonal elements
    end
    SNO = [1:size(OD,1)*size(OD,1)];
    filter = [~ismember(SNO, diagind)];
    OD_ind = SNO(:, filter); % Indices of all OD pairs except those on the diagonal
    BpenStr = datasample(OD_ind, pen-size(OD,1), 'Replace', false); % Indices of OD pairs that are Bluetooth connected
end
Function-2: Aimsun_matrix.m
function [OD_Id_Sno] = Aimsun_matrix(OD, zones)
    for j = 1:size(OD,2)
        m = OD;
        % Save the matrix into a .txt file compliant with Aimsun standards
        filename = strcat('matrix', '.txt');
        fid = fopen(filename, 'w');
        fprintf(fid, 'id\t');
        fprintf(fid, '%i\t', zones);
        fprintf(fid, '\n');
        fclose(fid);
        fid = fopen(filename, 'a');
        for i = 1:length(zones)
            fprintf(fid, '%i\t', zones(i));
            fprintf(fid, '%5.2f\t', m(i,:));
            fprintf(fid, '\n');
        end
        fclose(fid);
    end
    OD_Id = [];
    for i = 1:length(zones)
        for j = 1:length(zones)
            OD_Id = [OD_Id; zones(i) zones(j)];
        end
    end
    OD_Id_Sno = [[1:size(OD,1)*size(OD,1)]' OD_Id];
end
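The idea behind Function-1 (list the off-diagonal cells of the OD matrix, then draw a random subset as the Bluetooth-connected OD pairs) can be sketched in a few lines of Python. This is an illustration only: the function name, the 0-based row-major indexing and the fixed seed are assumptions, and the sample size is passed in directly rather than derived from pen.

```python
import random


def connected_od_pairs(n_zones, n_connected, seed=0):
    """Return (sample, off_diagonal): a random subset of off-diagonal cell
    indices and the full list of off-diagonal indices (0-based, row-major)."""
    off_diagonal = [o * n_zones + d
                    for o in range(n_zones)
                    for d in range(n_zones)
                    if o != d]  # skip internal (origin == destination) pairs
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    sample = sorted(rng.sample(off_diagonal, n_connected))  # without replacement
    return sample, off_diagonal


# 15 zones give 225 cells, 210 of them off-diagonal; 168 is 80% of 210
sample, off_diagonal = connected_od_pairs(n_zones=15, n_connected=168)
```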
Function-3: Aimsun.m
function [terminal] = Aimsun()
    AIMSPath = ('C:\Program Files\Aimsun\Aimsun Next 8.2\aconsole.exe');
    Autorunpath = ('C:\.....\AutoRun_BNE.py');
    Angpath = ('C:\.....\BNE.ang');
    Detpath = ('C:\.....\det_sec.txt');
    terminal = horzcat('"', AIMSPath, '"', ' -script ', '"', Autorunpath, '"', ' ', '"', Angpath, '"', ' ', '"', Detpath, '"');
end
Function-4: SQLITE.m
function [extracted_data] = SQLITE()
    Sqlitepath = ('C:\.....\BNE.ang.sqlite');
    conn = database(Sqlitepath, '', '', 'org.sqlite.JDBC', 'jdbc:sqlite:C:\......\BNE.ang.sqlite');
    sqlQuery = 'SELECT oid, did, sid, ent, countveh, speed, occupancy, density FROM MIDETEC ORDER BY oid, ent;'; % Selected fields of sqlite database
    extracted_data = fetch(conn, sqlQuery);
    close(conn);
end
Function-5: GSSI_computation.m
function [GSSI] = GSSI_computation(X, Y)
    % In this function geographical windows are created for a 15 x 15 OD matrix
    % Higher zones (hz) are created as follows:
    % hz1: Westend-Southbank-Highgate Hill, Ext5, Gabba
    % hz2: BNE Inner East, New Farm; hz3: Valley, Spring Hill, CBD i.e. 9, 14, 2
    % hz4: Newstead-Bowen Hills, Ext 2, Ext 4; hz5: Ext 1, Kelvin Grove-Herston
    % hz6: Red Hill-Milton-Auchenflower, Ext 3
    hz = [1;2;3;4;5;6]; % 6 hzs
    % hz IDs for all 15 small zones, in the order of the OD matrix loaded into Aimsun (which is not in the sequence of hz)
    Zonal_IDs = [3,2,5,4,6,4,1,3,5,2,4,6,3,1,1];
    hz = unique(Zonal_IDs);
    for i = 1:length(hz)
        for j = 1:length(hz)
            Filter_Row = [ismember(Zonal_IDs, hz(i))];
            Filter_Col = [ismember(Zonal_IDs, hz(j))]';
            X_Geo = X(Filter_Row, Filter_Col);
            Y_Geo = Y(Filter_Row, Filter_Col);
            Covariance = cov(X_Geo, Y_Geo);
            str_comp(i,j) = Covariance(1,2)/(std2(X_Geo)*std2(Y_Geo));
            SSIM(i,j) = mean_comp(i,j)*std_comp(i,j)*str_comp(i,j);
        end
    end
    GSSI = mean2(SSIM);
end
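Function-5 multiplies three components per geographical window, but the listing does not show how mean_comp and std_comp are computed. The Python sketch below fills them in with the standard SSIM mean and standard-deviation comparison terms, which is an assumption and not necessarily the form used in the thesis. The stabilising constants c1 and c2, the population (divide by n) statistics, the function names and the 4-zone example are likewise hypothetical.

```python
import math


def window_ssim(x, y, c1=1e-10, c2=1e-10):
    """SSIM of two equally long lists of OD flows (one geographical window)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    mean_comp = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)  # compares means
    std_comp = (2 * sx * sy + c2) / (vx + vy + c2)             # compares spreads
    str_comp = (cov + c2 / 2) / (sx * sy + c2 / 2)             # compares structure
    return mean_comp * std_comp * str_comp


def gssi(X, Y, zone_groups):
    """Mean window SSIM over all pairs of higher-level zones."""
    groups = sorted(set(zone_groups))
    scores = []
    for gi in groups:
        rows = [r for r, g in enumerate(zone_groups) if g == gi]
        for gj in groups:
            cols = [c for c, g in enumerate(zone_groups) if g == gj]
            x = [X[r][c] for r in rows for c in cols]
            y = [Y[r][c] for r in rows for c in cols]
            scores.append(window_ssim(x, y))
    return sum(scores) / len(scores)


# Hypothetical 4-zone example with two higher-level zones
groups = [1, 1, 2, 2]
X = [[0, 10, 5, 2], [8, 0, 3, 4], [6, 1, 0, 9], [2, 7, 11, 0]]
Y = [[2 * v for v in row] for row in X]  # same structure, doubled demand

same = gssi(X, X, groups)
scaled = gssi(X, Y, groups)
```

In this example the doubled matrix keeps a structural component of 1 in every window, so the drop in the index comes entirely from the mean and standard-deviation terms.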
Function-6: Assignment.m
function [y_est, Sections, PropMat] = Assignment(det_sec, extracted_data, BNE, OD_Id_Sno)
    Detectors = unique(extracted_data(:,1)); % as 1st column represents detectors
    y_est = []; Sections = [];
    for q = 1:length(Detectors)
        Filter = [det_sec(:,1)==Detectors(q)];
        Filter0 = [extracted_data(:,1)==Detectors(q)];
        temp0 = extracted_data(Filter0,:);
        if sum(Filter)~=0
            Sections = [Sections; det_sec(Filter,2)]; % Links equipped with detectors
            y_est = [y_est; max(temp0(:,5))]; % User equilibrium link flows
        end
    end
    PropMat = zeros(24,225); % 24 links and 225 OD pairs (including diagonals)
    for w = 1:length(Sections)
        Filter2 = [BNE(:,4)==Sections(w)];
        K = BNE(Filter2,:);
        for c = 1:size(K,1)
            for d = 1:size(OD_Id_Sno,1)
                if OD_Id_Sno(d,2)==K(c,1) && OD_Id_Sno(d,3)==K(c,2)
                    PropMat(w, OD_Id_Sno(d,1)) = K(c,7);
                end
            end
        end
    end
end
Function-7: MostLikelySubpaths.m
function [SubMLP, SubPathFreq] = MostLikelySubpaths(SubTraj3_table, zones, det_sec, MLSPNo)
    SubMLP = []; SubPathFreq = [];
    for z1 = 1:size(zones,1)
        for z2 = 1:size(zones,1)
            if z1~=z2
                Filter = [SubTraj3_table.Zorg==zones(z1) & SubTraj3_table.Zdest==zones(z2)];
                if sum(Filter)>0
                    Org_trips = SubTraj3_table.trip_det(Filter,:);
                    subpath_id = []; subpath_id_str = [];
                    for p = 1:size(Org_trips,1)
                        p1 = cell2mat(Org_trips(p));
                        subpath_str = [];
                        for j = 1:size(p1,1)
                            result = strcat(num2str(p1(j)));
                            subpath_str = [subpath_str result];
                        end
                        % [lines missing from the listing]
                    a = unique(subpath_id); % "a" gives all path IDs that are unique
                    temp2 = sortrows([a, histc(subpath_id(:),a)], 2); % temp2 gives the frequency of each unique subpath_id
                    a1 = unique(subpath_id_str);
                    MLPTab = table; MLPTab.a1 = a1; a2 = [];
                    for r = 1:size(a1,1)
                        MLPTab.a1_no(r) = str2num(char(a1(r)));
                        a2 = [a2; MLPTab.a1_no(r)];
                    end
                    MLPTab_sort = sortrows(MLPTab, {'a1_no'});
                    a3 = zeros(size(a2));
                    for i = 1:size(a2,1)
                        %Replaced a3(i) = sum(path_id(:) == a2(i));
                    end
                    if size(temp2,1)>MLSPNo
                        dp = MLSPNo;
                    else
                        dp = size(temp2,1);
                    end
                    MLPath_id = []; MLPath_freq = [];
                    for i = size(temp2,1):-1:(size(temp2,1)-dp+1)
                        % [lines missing from the listing]
                    end
                    DetIDs = det_sec(:,1);
                    Dig_DetID = numel(num2str(fix(abs(DetIDs(1))))); % Dig_DetID = number of digits in a detector ID
                    MLP_Det = []; MLPaths = [];
                    for i = 1:size(MLPath_id,1)
                        AllDet_in_Path = [];
                        filter = [MLPTab.a1_no(:)==MLPath_id(i)];
                        y = char(MLPTab.a1(filter));
                        Dig_PathID = numel(y); % Dig_PathID = number of digits in a path ID
                        for j = 1:Dig_DetID:Dig_PathID % the step length is the number of digits in each detector ID (3 here)
                            Det_in_path = sscanf(y(j:j+Dig_DetID-1), '%d');
                            AllDet_in_Path = [AllDet_in_Path Det_in_path];
                        end
                        te = struct('f1', AllDet_in_Path);
                        MLPaths = [MLPaths; [struct2cell(te) zones(z1) zones(z2)]];
                    end
                    SubMLP = [SubMLP; MLPaths];
                    SubPathFreq = [SubPathFreq; MLPath_freq];
                    MLPaths = [];
                end
            end
        end
    end
end
Function-8: Subpath proportion matrix
function Subpathprop = Sub_path_proportion_matrix(Subpathfreq_obs, SubPathFreq, OD_True_Vector, OD, zones)
    u = unique(Subpathfreq_obs(:,2:3), 'rows'); temp = [];
    for f = 1:size(u,1)
        filter1 = [Subpathfreq_obs(:,2)==u(f,1) & Subpathfreq_obs(:,3)==u(f,2)];
        filter2 = [SubPathFreq(:,2)==u(f,1) & SubPathFreq(:,3)==u(f,2)];
        p1 = Subpathfreq_obs(filter1,:);
        p2 = SubPathFreq(filter2,:);
        if size(p1,1)==size(p2,1)
            temp = [temp; p2];
        elseif size(p1,1)>size(p2,1)
            diff = size(p1,1)-size(p2,1);
            if size(p2,1)==0
                temp2 = repmat([0 p1(1,2) p1(1,3)], diff, 1);
                temp = [temp; p2; temp2];
            else
                temp2 = repmat([0 p2(1,2) p2(1,3)], diff, 1);
                temp = [temp; p2; temp2];
            end
        else % size(p1,1) < size(p2,1)
            temp = [temp; p2(1:size(p1,1),:)];
        end
    end
    Subpathfreq_est = temp;
    Subpathflows_est = Subpathfreq_est(:,1);
    OD_noDiag = []; OD_Flows_Zones = [];
    for q1 = 1:size(OD,1)
        for q2 = 1:size(OD,1)
            OD_Flows_Zones = [OD_Flows_Zones; OD(q1,q2), zones(q1), zones(q2)]; % Creating an OD vector with origin and destination IDs
            if q1~=q2
                OD_noDiag = [OD_noDiag; OD(q1,q2), zones(q1), zones(q2)];
            end
        end
    end
    % constructing subpath proportion matrix based on estimated subpath flows
    Subpathprop = zeros(size(Subpathfreq_obs,1), size(OD_True_Vector,1));
    for q3 = 1:size(Subpathprop,1)
        filter1 = [OD_Flows_Zones(:,2)==Subpathfreq_est(q3,2)];
        filter2 = [OD_Flows_Zones(:,3)==Subpathfreq_est(q3,3)];
        filter3 = filter1.*filter2;
        ODflow = OD_Flows_Zones(logical(filter3),1);
        f = find(filter3==1);
        Subpathprop(q3,f) = Subpathfreq_est(q3,1)/ODflow;
    end
end
Function-9: Obj_Grad.m
function [Obj, Gradient, StrBOD] = Obj_Grad(y_obs, y_est, PropMat, Objective, BpenStr, OD_Vector, BOD_vector, pen)
    OD_Vec_sample = OD_Vector(BpenStr',:);
    BOD_Vec_sample = BOD_vector(BpenStr',:);
    B1 = BOD_Vec_sample-mean(BOD_Vec_sample);
    OD1 = OD_Vec_sample-mean(OD_Vec_sample);
    OD_ones = repmat(1, size(OD_Vector,1), 1);
    StrBOD = corr2(OD_Vec_sample, BOD_Vec_sample);
    c1 = sum(B1.*OD1)/sum(OD1.^2);
    c2 = sqrt(sum(B1.^2)*sum(OD1.^2));
    Grad_Str_sample = (B1-c1*OD1)/c2;
    Grad_Str_x = zeros(size(OD_Vector,1),1);
    for u = 1:numel(BpenStr)
        Grad_Str_x(BpenStr(u)) = Grad_Str_sample(u);
    end
    G1 = (y_est-y_obs)*(2-StrBOD);
    G2 = (2-StrBOD)*PropMat';
    G3 = Grad_Str_x*(y_est-y_obs)';
    if Objective==1 % Link flows deviation
        Obj = 0.5*(sum(((y_est-y_obs).^2)));
        Gradient = (PropMat')*((y_est-y_obs));
        return
    elseif Objective==2 % Link flows deviations and structural deviation of OD flows
        Obj = 0.5*(sum(((y_est-y_obs)*(2-StrBOD)).^2));
        Gradient = (G2-G3)*(G1);
        return
    end
end
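Function-9 scales the link-flow residuals by (2 - StrBOD), where StrBOD is the Pearson correlation between the sampled estimated and Bluetooth OD flows, and it differentiates that correlation with respect to the OD flows as (B1 - c1*OD1)/c2. The Python sketch below checks this gradient expression against a central finite difference. It is an illustration under assumptions: the flow values are made up, the function names are hypothetical, and only the correlation term and the objective value are reproduced, not the full gradient assembly.

```python
import math


def pearson(x, b):
    """Pearson correlation between OD flows x and Bluetooth flows b."""
    n = len(x)
    mx, mb = sum(x) / n, sum(b) / n
    xo = [v - mx for v in x]
    bo = [v - mb for v in b]
    num = sum(p * q for p, q in zip(xo, bo))
    return num / math.sqrt(sum(p * p for p in xo) * sum(q * q for q in bo))


def pearson_gradient(x, b):
    """d(correlation)/dx, following (B1 - c1*OD1)/c2 from Function-9."""
    n = len(x)
    mx, mb = sum(x) / n, sum(b) / n
    od1 = [v - mx for v in x]
    b1 = [v - mb for v in b]
    c1 = sum(p * q for p, q in zip(b1, od1)) / sum(p * p for p in od1)
    c2 = math.sqrt(sum(q * q for q in b1) * sum(p * p for p in od1))
    return [(q - c1 * p) / c2 for p, q in zip(od1, b1)]


def objective(y_est, y_obs, rho):
    """Objective = 2 case: link-flow deviations weighted by (2 - rho)."""
    return 0.5 * sum(((ye - yo) * (2 - rho)) ** 2 for ye, yo in zip(y_est, y_obs))


# Hypothetical flows for five Bluetooth-connected OD pairs
x = [120.0, 40.0, 75.0, 10.0, 55.0]
b = [30.0, 12.0, 20.0, 2.0, 9.0]

analytic = pearson_gradient(x, b)

# Central finite-difference check of each partial derivative
h = 1e-4
numeric = []
for k in range(len(x)):
    up = list(x)
    down = list(x)
    up[k] += h
    down[k] -= h
    numeric.append((pearson(up, b) - pearson(down, b)) / (2 * h))

max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
```

When the correlation equals 1 the weight (2 - rho) reduces to 1 and the objective collapses to the traditional link-flow objective, so a poorer structural match inflates the penalty on the same residuals.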
        # sections_det code is to get only those sections with detectors installed
        sections_det=list()
        file=open(detectorsFileName,'r')
        if file!=None:
            for line in file.readlines():
                idDetector = line.split(";")
                det=model.getCatalog().find(int(idDetector[0]))
                if det != None:
                    sections_det.append(det.getBottomObject())
        # links code is to get all sections in the network
        links=list()
        linkType = model.getType( "GKSection" )
        for segs in model.getCatalog().getUsedSubTypesFromType( linkType ):
            for lk in segs.itervalues():
                links.append(lk)
        if replication.getExperiment().getSimulatorEngine() == GKExperiment.eMicro:
            simulator.addSimulationTask(GKSimulationTask(replication,GKReplication.eBatch))
            simulator.setGatherProportions(True,assignMatrixFileName+'.matrix',sections_det, turnings, 0 )
            simulator.simulate()
        console.close()
    else:
        console.getLog().addError( "Cannot load the network" )
        print "cannot load network"
if __name__ == "__main__":
    sys.exit(main(sys.argv))