n9351833 krishna nikhil sumanth behara thesis · 2019. 9. 3. · ORIGIN-DESTINATION MATRIX ESTIMATION USING BIG TRAFFIC DATA: A STRUCTURAL PERSPECTIVE Krishna Nikhil Sumanth Behara

ORIGIN-DESTINATION MATRIX

ESTIMATION USING BIG TRAFFIC DATA:

A STRUCTURAL PERSPECTIVE

Krishna Nikhil Sumanth Behara

Master of Civil (Transportation) Engineering, Birla Institute of Technology and Science (BITS), Pilani, India, 2012

Submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy (PhD)

School of Civil Engineering and Built Environment

Science and Engineering Faculty

Queensland University of Technology

2019

Origin-Destination Matrix Estimation using Big Traffic Data: A Structural Perspective i

Keywords

Bi-level; Bluetooth; subpaths; Brisbane city; BSTM; clustering OD matrices; DBSCAN; gradient descent; Mean Geographical window-based SSIM (GSSI); Mean Normalised Levenshtein Distance for OD matrices (NLOD); non-assignment-based; local sliding window; origin-destination (OD) matrix; OD matrix estimation; OD matrix structure; single-level formulation; statistical performance measures; structural correlations; structural consistency; structure of trips; structural proximity measures; structural similarity index (SSIM); subspace analysis; turning proportions; typical OD matrix; travel pattern; structural comparison of OD matrices; trajectories; optimum parameters of DBSCAN; under-determinacy problem.

Abstract

Origin-destination (OD) matrices and knowledge of travel patterns are key inputs to most transport models, for both long-term strategic planning and short-term traffic control and management. OD matrices are not mere collections of individual OD flows. The distribution of flows across OD pairs carries structural information inherent to the matrix, and this information cannot be neglected when comparing OD matrices. Structural knowledge of OD matrices aids in understanding and analysing travel demand patterns.

An OD matrix is generally unobserved, so it is usually estimated by solving an optimisation problem. However, optimisation models generally depend on point-based traffic counts (from loop detectors) to update an outdated prior OD matrix, and they lack the ability to describe the distribution of trips (the “structure” of travel patterns) across the network. To maintain structural consistency during OD estimation, existing methods generally rely on survey-based constraints, such as trip productions/attractions or ratios of OD flows, or extend the objective function with deviations from a target OD matrix. The major drawback is that these constraints and formulations are based on travel surveys that are generally outdated. Moreover, the most popular statistical measures, used either for general comparison of OD matrices or for comparing the quality of OD matrices estimated by different optimisation algorithms, depend on individual cell-based statistics and fail to account for the inherent structural information of OD matrices.

With advances in technology, there is growing interest in exploiting big traffic data sources, such as Bluetooth, in travel demand modelling. However, travel demand knowledge obtained from these sources may not give a detailed demographic and contextual picture of commuter trips. For instance, Bluetooth data captures only a fraction of the actual demand and provides incomplete information about trips; most importantly, the penetration rate of Bluetooth trips remains unknown because ground truth is unavailable. Nevertheless, Bluetooth data offers higher spatial and temporal resolution than travel surveys. Thus, despite the abundance of such data, more effort is required to integrate advanced data sources, such as Bluetooth, into mainstream traffic modelling.

To this end, this research emphasises the importance of structural knowledge of travel demand (OD or path flows) and makes four major contributions. First, it develops two statistical metrics for the structural comparison of OD matrices: the Mean Geographical window-based Structural Similarity Index (GSSI) and the Mean Normalised Levenshtein Distance for OD matrices (NLOD). Compared to the traditional SSIM, GSSI is computationally efficient, captures local travel patterns, and preserves geographical integrity. NLOD is a novel approach that captures the “structural” information of OD matrices through the preference of destinations and the distribution of origin flows. It is an optimisation-based metric and is computationally more efficient than another popular metric, the Wasserstein distance. Sensitivity analysis of both metrics showed that they are robust. Second, the study enhances the bi-level formulation by integrating structural knowledge of Bluetooth trips (in terms of Bluetooth subpath flows) into the existing objective function, without requiring their penetration rates to be known. Third, the study develops a novel non-assignment-based approach that estimates OD matrices from observed turning proportions and structural knowledge of Bluetooth trips. This single-level formulation does not depend on simulation-based assignment and is therefore computationally faster than the bi-level OD estimation method. Fourth, the study develops a methodological framework (a three-level approach) to cluster multi-density OD matrices using the DBSCAN algorithm. It highlights the importance of accounting for the structural information of OD matrices in the proximity measures of clustering algorithms. The methodology is tested in a real case study that identifies typical travel patterns in the Brisbane City Council (BCC) region.

Although the proposed methods were tested using Bluetooth data and demonstrated on the BCC case study, they are generic and suitable for any other emerging data source that provides similar types of measurements, on any other study network.

Table of Contents

Keywords .................................................................................................................................. i

Abstract .................................................................................................................................... ii

Table of Contents .................................................................................................................... iv

List of Figures ......................................................................................................................... ix

List of Tables ........................................................................................................................ xvi

List of Publications .............................................................................................................. xvii

Notations ............................................................................................................................... xix

Abbreviations ....................................................................................................................... xxii

Statement of Original Authorship ....................................................................................... xxiv

Acknowledgements ...............................................................................................................xxv

Chapter 1: Introduction ...................................................................................... 1

1.1 Background .....................................................................................................................1

1.1.1 Origin-Destination (OD) matrix ...........................................................................2

1.1.2 OD matrix estimation problem .............................................................................4

1.1.3 The “structure” of OD matrix/trips .......................................................................8

1.1.4 Advanced traffic data sources ............................................................................11

1.2 Research Problem .........................................................................................................14

1.2.1 Problem of under-determinacy ...........................................................................14

1.2.2 Mapping relationship between link flows and OD flows ...................................15

1.2.3 Computation cost ................................................................................................16

1.2.4 Lack of potential performance measures ............................................................17

1.2.5 The need for typical OD matrices that represent typical travel patterns ............17

1.2.6 Unknown penetration rates of trips inferred from advanced data sources .........18

1.3 Research Motivation .....................................................................................................19

1.4 Research Questions, Aim, and Objectives ....................................................................19

1.5 Research Methodology .................................................................................................20

1.5.1 Task-1 .................................................................................................................21

1.5.2 Task-2 .................................................................................................................22

1.5.3 Task-3 .................................................................................................................23

1.5.4 Task-4 .................................................................................................................23

1.6 Significance and scope .................................................................................................23

1.7 Definitions ....................................................................................................................25

1.8 Thesis Outline ...............................................................................................................26

Chapter 2: Literature Review ........................................................................... 28

2.1 Background of OD matrix estimation ...........................................................................28

2.2 Problem Formulation ....................................................................................................30

2.2.1 Static OD formulation - uncongested networks .................................................30

2.2.2 Static OD formulation - congested networks .....................................................34

2.2.3 Dynamic OD formulation ...................................................................................41

2.2.4 Quasi-Dynamic formulation ...............................................................................44

2.3 The solution algorithms ................................................................................................45

2.4 OD matrix structural information .................................................................................46

2.5 Statistical performance measures..................................................................................48

2.6 Indirect/partial measurements of OD flows ..................................................................51

2.6.1 Point sensors .......................................................................................................52

2.6.2 Point to point sensors (AVI data) .......................................................................52

2.7 Summary of literature review .......................................................................................54

Chapter 3: Development of Statistical Metrics for the Structural Comparison of OD Matrices .................... 57

3.1 Background ...................................................................................................................57

3.2 Structural Similarity (SSIM) index ...............................................................................58

3.2.1 Local sliding window .........................................................................................61

3.3 Mean Geographical window-based SSIM (GSSI) ........................................................64

3.3.1 Structural comparison of local travel patterns ....................................................67

3.3.2 Geographical window vs sliding window ..........................................................68

3.3.3 Computational efficiency ...................................................................................69

3.4 Levenshtein Distance ....................................................................................................69

3.4.1 Traditional Levenshtein distance ........................................................................70

3.4.2 Proposed Levenshtein distance for structural comparison of OD matrices ........74

3.4.3 Levenshtein vs Wasserstein distances ................................................................79

3.5 Sensitivity analysis of GSSI and NLOD.......................................................................84

3.5.1 Experimental criteria ..........................................................................................86

3.5.2 Results of uniform scaling effects ......................................................................88

3.5.3 Results of random scaling effects .......................................................................89

3.6 Summary .......................................................................................................................91

Chapter 4: Assignment-based OD Matrix Estimation: Exploiting the Structure of Bluetooth Trips .................... 93

4.1 Background ...................................................................................................................93

4.1.1 B-OD structure-based method (or B-OD method) .............................................94

4.1.2 B-SP structure-based method (or B-SP method) ................................................95

4.2 Study network and data ................................................................................................97

4.2.1 Development of observed B-OD flows ( ) ......................................................100

4.2.2 Development of observed B-SP flows ( ) .......................................................101

4.3 Bi-level Framework: Matlab - Aimsun Integration ....................................................103

4.4 B-OD method: OD matrix estimation using B-OD structure .....................................103

4.4.1 Objective function formulation ........................................................................104

4.4.2 OD matrix estimation algorithm .......................................................................106

4.4.3 Experiments – ideal and near-ideal scenarios of B-OD method.......................109

4.4.4 Results for the ideal scenario of B-OD method ................................................110

4.4.5 Results for the near-ideal scenario of B-OD method .......................................114

4.4.6 Discussion ........................................................................................................118

4.5 B-SP method: OD matrix estimation using B-SP structure ........................................119

4.5.1 Objective function formulation ........................................................................121

4.5.2 OD matrix estimation algorithm .......................................................................122

4.5.3 Experiments for B-SP method ..........................................................................123

4.5.4 Results for B-SP method ..................................................................................123

4.5.5 Discussion ........................................................................................................127

4.6 Comparison of B-OD and B-SP methods ...................................................................127

4.7 B-SP method for lower penetration rates of Bluetooth trajectories ............................129

4.7.1 Experiments for B-SP method (lower penetration rates) .................................131

4.7.2 Results for B-SP method (lower penetration rates) ..........................................131

4.7.3 Discussion ........................................................................................................133

4.8 Summary .....................................................................................................................134

Chapter 5: Non-Assignment-based OD Matrix Estimation: Exploiting Observed Turning Proportions and Structure of Bluetooth Trips .................... 136

5.1 Background .................................................................................................................136

5.2 OD matrix estimation: Traditional versus proposed approach ...................................137

5.3 Study networks ...........................................................................................................138

5.3.1 Toy network .....................................................................................................139

5.3.2 TMR network ...................................................................................................139

5.4 Concept of possible paths ...........................................................................................140

5.4.1 Possible paths in the toy network .....................................................................140

5.4.2 Possible paths in the TMR network ..................................................................141

5.5 OD matrix estimation methodology ...........................................................................142

5.5.1 Link flows estimation from turning proportion matrix ....................................142

5.5.2 The structural comparison of OD flows ...........................................................145

5.5.3 OD matrix estimation formulation ...................................................................146

5.6 Experiments and Results: Toy network ......................................................................147

5.6.1 Convergence of gradient descent algorithm .....................................................149

5.6.2 Structural consistency .......................................................................................149

5.6.3 Under-determinacy problem .............................................................................150

5.6.4 Optimal percentage of Bluetooth connectivity .................................................151

5.7 Experiments and Results: TMR network ....................................................................151

5.7.1 Non-assignment-based vs assignment-based experiments ...............................152

5.7.2 RMSE results ....................................................................................................153

5.7.3 GSSI results ......................................................................................................154

5.7.4 Computational time: Non-assignment-based vs assignment-based ..................155

5.8 Summary .....................................................................................................................156

Chapter 6: Methodology to Cluster B-OD Matrices and Identify Typical Travel Patterns: Case Study Application of the BCC region .................... 159

6.1 Background .................................................................................................................159

6.2 Methodology to cluster B-OD matrices and identify typical travel patterns ..............162

6.2.1 Traditional DBSCAN approach .......................................................................162

6.2.2 Three-level approach for identifying DBSCAN parameters ............................165

6.2.3 Distance measures for clustering B-OD matrices.............................................167

6.3 Experiments and results ..............................................................................................168

6.3.1 Experiment-1: dGSSI as proximity measure .......................................................170

6.3.2 Experiment-2: dNLOD as proximity measure......................................................171

6.3.3 Experiment-3: dRMSN as proximity measure .....................................................173

6.3.4 Typical B-OD flows .........................................................................................173

6.3.5 Discussion ........................................................................................................174

6.4 Summary .....................................................................................................................178

Chapter 7: Conclusion ..................................................................................... 179

7.1 Brief summary ............................................................................................................179

7.2 Research findings........................................................................................................181

7.3 Recommendations for future research ........................................................................182

Bibliography ........................................................................................................... 184

Appendices .............................................................................................................. 202

List of Figures

Figure 1.1: Trends in the social cost of congestion in AUD for different scenarios (Transport & Economics, 2007) .................... 1

Figure 1.2: Illustration of OD matrix for a spatial distribution of travel demand ........ 2

Figure 1.3: Statistical Areas of BCC region: SA2 (left) and SA3 (right) .................... 3

Figure 1.4: TAZs in Greater Brisbane region (BSTM, 2015) ...................................... 4

Figure 1.5: The overview of OD matrix estimation process ........................................ 5

Figure 1.6: Traditional bi-level framework .................................................................. 6

Figure 1.7: Demonstration of (a) the skeleton/structure of OD and (b) corresponding mass/OD flows .................... 9

Figure 1.8: Example of OD matrix structural dimension ............................................. 9

Figure 1.9: Location of Bluetooth Scanners within the BCC region ......................... 12

Figure 1.10: BMS and loop detectors at an intersection in Brisbane city .................. 12

Table 1.1: Sample Bluetooth data from Brisbane, Australia (Bluetooth data from Brisbane City Council, 2016) .................... 13

Figure 1.11: Comparison of turning proportions: Bluetooth vs SCATS (Chung, 2016) .................... 13

Figure 1.12: Consistency of Bluetooth trajectories during regular weekdays .................... 14

Figure 1.13: Demonstration of under-determinacy problem using (a) example network, (b) feasible solutions .................... 15

Figure 1.14: Research methodology framework ........................................................ 21

Figure 2.1: Pictorial representation of some of the widely-used sensor types in OD estimation problem .................... 52

Figure 3.1: Comparison of MR with OD matrices M1 and M2 .................... 58

Figure 3.2: (a) Comparison of Images (source Wang et al., 2004) vs (b) comparison of OD matrices (source Djukic et al., 2013) .................... 59

Figure 3.3: An example of sliding window for SSIM calculation. ............................ 61

Figure 3.4: Sensitivity of MSSIM towards local window size .................................. 63

Figure 3.5: An example to illustrate the proposed geographical window-based approach .................... 65

Figure 3.6: Splitting (a) Monday and (b) Sunday OD matrices into geographical (SA4) windows .................... 66

Figure 3.7: Insights into local travel patterns using geographical local window: (left) Brisbane South to Brisbane North and (right) Brisbane South to Brisbane West .................... 67

Figure 3.8: GSSI vs sliding windows based MSSIM for weekends .......................... 68

Figure 3.9: GSSI vs sliding windows based MSSIM for weekdays .......................... 69

Figure 3.10: Comparison of computational costs: Sliding windows based SSIM vs SSIM .................... 69

Figure 3.11: Example to demonstrate Generalised Levenshtein Distance .................... 70

Figure 3.12: Matrix demonstration of traditional Levenshtein approach (Algorithm 1) .................... 72

Figure 3.13: Comparison of strings “Monday” and “Saturday” using GLD .................... 73

Figure 3.14: Example to demonstrate Levenshtein distance application for OD matrices comparison .................... 74

Figure 3.16: Matrix demonstration of Algorithm 2 ................................................... 78

Figure 3.17: Matrix (L) demonstration for ......................................... 79

Figure 3.18: Demonstration of Wasserstein distance through an example ................ 80

Figure 3.19: (a) Sample network and (b) OD matrices XR and XQ with their corresponding paths and travel costs .................... 82

Figure 3.20: Results of uniform scaling for GSSI and NLOD .................... 88

Figure 3.21: Results of random scaling effects for (a) GSSI and (b) its structure component .................... 89

Figure 3.22: Results of random scaling effects for (a) NLOD and (b) its structure component .................... 90

Figure 4.1: Sample network (with installed BMS), paths and OD matrices .................... 95

Figure 4.2: (a) Study site installed with Bluetooth scanners and loop detectors (b) spatial structure of Brisbane City core network .................... 98

Figure 4.3: Splitting the study OD matrix into geographical windows ................... 100

Figure 4.4: Generation of for the near-ideal scenario of the B-OD method......... 101

Figure 4.5: Generation of for the B-SP method .................................................... 102

Figure 4.6: MATLAB-Aimsun integration framework ........................................... 103

Figure 4.7: (a) Traditional link counts-based method vs (b) proposed B-OD method .................... 104

Figure 4.8: RMSE w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method .................... 110

Figure 4.9: Percentage of improvement in RMSE w.r.t. Xprior for traditional and ideal scenario cases of the B-OD method .................... 111

Figure 4.10: Percentage of improvement in RMSE w.r.t. traditional method for ideal scenario case of B-OD method .................... 111

Figure 4.11: StrOD w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method .................... 112

Figure 4.12: Percentage of improvement in the StrOD w.r.t. Xprior for the traditional and ideal scenario cases of the B-OD method .................... 112

Figure 4.13: Percentage of improvement in the StrOD w.r.t. traditional method for the ideal scenario cases of the B-OD method .................... 112

Figure 4.14: GSSI w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method .................... 113

Figure 4.15: Percentage of improvement in the GSSI w.r.t. Xprior for the traditional and ideal scenario cases of the B-OD method .................... 113

Figure 4.16: Percentage of improvement in the GSSI w.r.t. traditional method for the ideal scenario cases of the B-OD method .................... 113

Figure 4.17: RMSE results w.r.t. Xtrue - near-ideal B-OD method .................... 114

Figure 4.18: The percentage of improvement in the RMSE w.r.t. Xprior for near-ideal B-OD method .................... 115

Figure 4.19: The percentage of improvement in the RMSE w.r.t. traditional method for the near-ideal B-OD method .................... 115

Figure 4.20: StrOD results w.r.t. Xtrue - near-ideal B-OD method .................... 116

Figure 4.21: The percentage of improvement in the StrOD w.r.t. Xprior for the near-ideal B-OD method .................... 116

Figure 4.22: The percentage of improvement in the StrOD w.r.t. traditional method for the near-ideal B-OD method .................... 117

Figure 4.23: GSSI results w.r.t. Xtrue - near-ideal B-OD method .................... 117

Figure 4.24: The percentage of improvement in the GSSI w.r.t. Xprior for the near-ideal B-OD method .................... 118

Figure 4.25: The percentage of improvement in the GSSI w.r.t. traditional method for the near-ideal B-OD method .................... 118

Figure 4.26: Proposed B-SP method ........................................................................ 120

Figure 4.28: RMSE w.r.t. Xtrue ,B-SP experiments ................................................. 124

Figure 4.29: Percentage of improvement in RMSE w.r.t. Xprior for the traditional

and B-SP experiments ................................................................................ 124

Figure 4.30: Percentage of improvement in the RMSE w.r.t. traditional method ... 124

Figure 4.31: StrOD w.r.t. Xtrue for the prior, traditional, and B-SP experiments .... 125

Figure 4.32: Percentage of improvement in the StrOD w.r.t. Xprior for the

traditional and B-SP experiments .............................................................. 125

Figure 4.33: Percentage of improvement in the StrOD w.r.t. traditional method .... 125

Figure 4.34: GSSI w.r.t. Xtrue for the prior, traditional, and B-SP experiments ...... 126

Figure 4.35: Percentage of improvement in GSSI w.r.t. Xprior for the traditional

and B-SP experiments ................................................................................ 126


Figure 4.36: Percentage of improvement in the GSSI w.r.t. traditional method for B-SP experiments

Figure 4.37: RMSE comparison of the B-OD (ideal, near-ideal) and B-SP methods with prior OD and traditional methods

Figure 4.38: StrOD comparison of B-OD (ideal, near-ideal) and B-SP methods

Figure 4.39: GSSI comparison of the B-OD (ideal, near-ideal) and B-SP methods

Figure 5.1: Non-assignment-based OD matrix estimation methodology

Figure 5.2: Sketch of the toy network

Figure 5.3: TMR network

Figure 5.4: Paths traversed by vehicles in simulation

Figure 5.5: Traversed paths from all origins until link, l14

Figure 5.6: Possible paths from all origins until link, l14

Figure 5.7: The number of possible paths from all origins until the detector locations of TMR network

Figure 5.8: Schematic representation of an isolated intersection and associated turning proportions

Figure 5.9: Sample network used by Bar-Gera et al. (2006)

Figure 5.10: Convergence of RMSE for all cases

Figure 5.11: Convergence of StrOD for all cases

Figure 5.12: RMSE comparison with

Figure 5.13: StrOD comparison with

Figure 5.14: RMSE results for non-assignment-based and assignment-based approaches

Figure 5.15: Percent improvement in RMSE with respect to Xprior - non-assignment vs assignment-based methods

Figure 5.16: Percent improvement in RMSE with respect to traditional method - non-assignment vs assignment-based methods


Figure 5.17: GSSI results for non-assignment-based and assignment-based approaches

Figure 5.18: Percent improvement in GSSI with respect to Xprior - non-assignment vs assignment-based methods

Figure 5.19: Percent improvement in GSSI with respect to traditional method - non-assignment vs assignment-based methods

Figure 6.1: Typical shape of sorted k-dist graph

Figure 6.2: Sample data points (left) along with kth nearest neighbour and k-dist of all points (right)

Figure 6.3: Sorted k-dist graphs for k=1, k=2 and k=3 and the resulting clusters

Figure 6.4: Demonstration of two density levels through sorted k-dist plot

Figure 6.5: Three level approach to cluster B-OD matrices

Figure 6.6: Sorted k-dist plots for experiment-1

Figure 6.9: (a) Number of clusters vs MinPts and proportion of clusters; and (b) vs for Subspace-1 of experiment-1

vs for subspace-2 of experiment-1

Figure 6.13: (a) Number of clusters vs MinPts and proportion of clusters; and (b) vs for Subspace-2 of experiment-3

Figure 6.14: Box-Whisker plot demonstrating the difference among the typical B-OD flows for OD pair – Mt. Gravatt and Brisbane CBD (results of experiment-1)

Figure 6.15: Classification of day types


Figure 6.16: Comparison of clusters resulting from all three experiments


List of Tables

Table 3.1: Comparison results using the traditional metrics

Table 3.2: GSSI and local SSIM values: Monday vs Sunday B-OD matrices

Table 3.3: Algorithm 1 for Normalised Levenshtein distance for strings comparison (see Figure 3.12)

Table 3.4: Algorithm 2 for Levenshtein distance for OD matrices (see Figure 3.16)

Table 3.5: Computation of Wasserstein distance for the example problem

Table 3.6: Structural comparison of sample OD matrices using the proposed metrics

Table 4.1: Path flows for example network

Table 4.2: Demonstrating the difference between true and Bluetooth subpath flows for the given example

Table 4.3: Comparison of Xprior with Xtrue for all three replications

Table 5.1: Demonstration of equation (62) for l14

Table 5.2: Paths and path flows for Bar-Gera et al. (2006) network

Table 5.3: Link flows at link, l5-2 estimated using the proposed approach

Table 5.4: Comparison of link flows for the selected links

Table 5.5: Comparison of OD demand flows

Table 5.6: Comparison of computational times: Non-assignment-based vs assignment-based methods


List of Publications

JOURNALS

Behara, K. N., A. Bhaskar, and E. Chung. Levenshtein distance for the

structural comparison of origin-destination matrices (Chapter 3 of thesis and under

review in Transportation Research Part C: Emerging Technologies).

Behara, K. N., A. Bhaskar, and E. Chung. Geographical window based

structural similarity index for OD matrices comparison (Chapter 3 of thesis and under

review in Journal of Intelligent Transportation Systems).

Behara, K. N., A. Bhaskar, and E. Chung. OD matrix estimation using observed

traffic counts and Bluetooth subpath flows (Chapter 4 of thesis and to be submitted to

Transportation Research Part C: Emerging Technologies by 31st July 2019).

Behara, K. N., A. Bhaskar, and E. Chung. A non-assignment-based approach to

estimate OD matrices using observed turning proportions and structural knowledge of

Bluetooth trips (Chapter 5 of thesis and to be submitted to IEEE Transactions on

Intelligent Transportation Systems by 7th Aug 2019).

Behara, K. N., A. Bhaskar, and E. Chung. Clustering multi-density OD matrices

datasets using structural proximity measures: A case study on Brisbane Bluetooth

based OD (Chapter 6 of thesis and to be submitted to a Q1 journal by 21st Aug 2019).

CONFERENCES

Behara, K. N., Bhaskar, A., & Chung, E. (2017). Insights into geographical

window based SSIM for comparison of OD matrices. In 39th Australasian Transport

Research Forum (ATRF), 27-29 November 2017, Auckland, New Zealand (abridged

version).

Behara, K. N., Bhaskar, A., & Chung, E. (2017). Classification of typical

Bluetooth OD matrices based on structural similarity of travel patterns- Case study on

Brisbane city. In Transportation Research Board 97th Annual Meeting, 7th-11th

January 2018, Washington D.C., USA.


Behara, K. N., Bhaskar, A., & Chung, E. (2018). Novel approach for OD

estimation based on observed turning proportions and Bluetooth structural

information: Proof of the concept. In 40th Australasian Transport Research Forum

(ATRF), 30-31 October 2018, Darwin Convention Centre, Darwin, Australia

(abridged version).

Behara, K. N., Bhaskar, A., & Chung, E. (2018). Levenshtein distance for the

structural comparison of OD matrices. In 40th Australasian Transport Research Forum

(ATRF), 30-31 October 2018, Darwin Convention Centre, Darwin, Australia

(abridged version).

Behara, K. N., Bhaskar, A., & Chung, E. (2019). Estimating OD matrices from

observed trajectories and link counts. In World Conference on Transport Research -

WCTR 2019, 26-31 May 2019, Mumbai, India (abridged version).


Notations

It refers to the origin zone number e.g. oth origin

Number of zones which serve as origin points

It refers to the destination zone number e.g. dth destination

Number of zones which serve as destination locations

It refers to the OD pair e.g. wth OD pair

It is the number of OD pairs in the OD matrix; w ϵ W

OD vector to be estimated

Target OD vector

True OD matrix

Prior OD matrix

OD matrix in Aimsun format

The flows of wth OD pair in

The flows of wth OD pair in

General dimensions of OD matrix whenever expressed in matrix form

Trips produced from oth zone

Origin flows vector to be estimated

Trips attracted to dth zone

It refers to the link number e.g. lth link

It is the total number of selected links in the network

It is the simulated/estimated flow on lth link

It is the observed flow on lth link

It is the estimated link flows vector of size L*1

It is the observed link flows vector of size L*1

It refers to the path number connecting wth OD pair

It refers to the path number connecting lth link with oth origin

It refers to the number of paths connecting wth OD pair

It refers to the number of possible paths connecting lth link with oth origin

It is flow on kth path

Kronecker Delta function. It is equal to 1, if lth link is present in kth path, and 0 otherwise.


Weight factor of OD flows deviation from target OD matrix in the

objective function

Weight factor of link flows deviation in the objective function

It refers to the travel cost for lth link.

It is the path cost through kth path between wth OD pair

It is the cost on the shortest route for wth OD pair

It represents the observed Bluetooth OD vector

It represents the observed Bluetooth subpath flows vector

It represents the consolidated vector of Bluetooth subpath flows observed from several days of similar travel patterns

Path flows from the model (Aimsun)

It represents the vector of OD flows that are Bluetooth connected

It represents the vector of true OD flows that are Bluetooth connected

Incidence matrix that converts X to X*

It is the proportional assignment matrix linking link flows with OD

User equilibrium assignment (link-proportion) matrix (either analytical or simulated)

User equilibrium path-proportion matrix (either analytical or simulated)

The proportion of Xw flowing in lth link

Local window ID

Number of local windows

Likelihood

Error term for the OD matrix (difference between and X)

Error term for the link flows (difference between and Y)

Error term for the link flows (difference between and AX)

It is a dispersion parameter to describe road users’ perception of travel costs

Link OD matrix

It is the incidence matrix; that is, a network-based information

Time slice

It is the trips generated from oth origin during tth time slice

It is the proportion of trips generated from oth origin to dth destination

It is the OD flow between oth origin and dth destination during time-

slice, t


Correlation coefficient between and Y

Scale factor expressed as a sum of ratios of and

compares the mean values ( ) of the group of OD pairs (i.e. x and y) from both matrices, X and Y.

compares the standard deviations ( ) of the group of OD pairs

compares the structure by computing correlation between the normalised group of OD pairs (i.e. x and y) from both matrices, X and Y.

Sequence of Levenshtein edit operations to transform strings or sorted

kth Levenshtein edit operation

The Levenshtein matrix for comparing strings

It is set including that is ith preferred destination from oth origin

It is set including that is the corresponding demand value of from oth origin

It is the sorted set of destination IDs ( ) and the corresponding demand from oth origin ( )

Penetration rate of Bluetooth inferred trips

Percentage of Bluetooth connected OD pairs

Step length at kth iteration

Objective function value

Step length parameter to scale-up by times

Step length parameter to scale-down by times

Turning Proportion matrix developed from observed turning proportions

It refers to the intersection number

It is the turning proportion observed at intersection present along (kl,o)th path

It refers to the probability of origin flows passing through (kl,o)th path and observed at lth link

It is the total probability of trips generated from oth origin observed at lth link


Abbreviations

ABS Australian bureau of statistics

ANOVA Analysis of variance

ATAP Australian transport assessment and planning

AVI Automatic vehicle identification

BCC Brisbane City Council

BMS Bluetooth media access control scanner

B-OD Bluetooth based Origin Destination matrix

BPR Bureau of Public Roads

B-SP Bluetooth based subpaths

BSTM Brisbane Strategic Transport Model

CBD Central Business District

CDA Combined distribution and assignment

DBSCAN Density-based spatial clustering of applications with noise

EBM Eigenvalue-based measure

EM Entropy maximisation

GEH Geoffrey E. Havers statistic

GPS Global positioning system

GLD Generalised Levenshtein distance

GLS Generalised least squares

GU Global Theil measure of fit

HTS Household travel survey

IM Information minimisation

ITS Intelligent transport systems

KF Kalman filter

LOD Levenshtein distance for OD matrices

LSQR Least squares

LW Long weekend

MAE% Mean absolute error percent

MAER Mean absolute error ratio

MAPE% Mean absolute percent error

GSSI Mean geographical window based structural similarity index


ML Maximum likelihood

MLPP Most likely possible paths

NLOD Mean Normalised Levenshtein distance for OD matrices

MPAE Maximum possible absolute error

MSE Mean square error

NLD Normalised Levenshtein distance

OD Origin-destination

PH Public holidays

RE Relative error

RMSE Root mean square error

RMSN Normalised root mean square error

RSD Relative standard deviation

SSIM Structural similarity index

SA Statistical area

SCATS Sydney coordinated adaptive traffic system

SEQTS South East Queensland Travel Survey

SPSA Simultaneous perturbation stochastic approximation


StrUE Strategic user equilibrium

SATR Saturdays regular

SATSH Saturdays during school holidays

SUNR Sundays regular

SUNSH Sundays during school holidays

TAZ Traffic Analysis Zone

TMR Transport and Main Roads

TDD Total demand deviation

TLAP Time lapse aerial photography

WDR Weekday regular

WDSH Weekday during school holiday


Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet

requirements for an award at this or any other higher education institution. To the best

of my knowledge and belief, the thesis contains no material previously published or

written by another person except where due reference is made.

Signature:

Date: 19/07/2019

QUT Verified Signature


Acknowledgements

I would like to thank the following people for their involvement at various stages

of my PhD journey.

My gurus – Dr Ashish Bhaskar and Professor Edward Chung. In Sanskrit, “gu” means “darkness and ignorance” and “ru” means “that which removes”; combined, the two terms form the word “guru”. This PhD journey would not have been possible without the continuous support of my gurus. They helped me to understand myself and explore my strengths. Dr Ashish, my principal supervisor, is a source of inspiration. I have learnt a lot from him, both academic and otherwise, and for this I am highly indebted. He has always been very positive and encouraged brainstorming discussions, the results of which have always been helpful for my research. I also thank Professor Edward Chung, my associate supervisor, for his continuous guidance and encouragement. I have been very lucky to receive guidance from him. I am thankful for the many thought-provoking discussions that helped shape this thesis into a quality piece of work. Special thanks to my teachers from India, Dr Shriniwas Arkatkar and Professor AK Sarkar, who were the main reasons for beginning this PhD journey.

Minh and Gabriel for being sources of inspiration. Minh was the first person I met at QUT and he has been part of my supervision team for some time. Gabriel visited QUT during his holiday to Brisbane, where we had a friendly chat and an encouraging discussion about his work and research achievements. His motivating words about looking at my PhD as just a part of life helped me to think outside the box, which helped me to achieve my aims with ease and passion.

All of my friends and colleagues, too many to list, who made my Australian experience very pleasant. However, five of them are very special to me – Umashankar, Narendra, Kiran, Mahadeesh and Yasir. Uma was my dearest and closest friend, who always encouraged me. Both of us had planned to start an IES coaching centre in India after I finished my PhD. However, I lost him in an unfortunate car accident that should never have happened. He always reminds me that life is too short and that I should make the utmost use of every moment. Narendra, Kiran and Mahadeesh are excellent beings whom I can always trust without a second thought. They have always been there


extending their support during the toughest rides of my journey. Yasir has always been supportive and ready to help at any moment. Every time we sat down for a good research discussion, I felt more empowered and confident about my skills.

Queensland University of Technology for providing the HDR tuition fee waiver,

QUT postgraduate and top-up scholarships, and providing the necessary infrastructure

that fostered the research developments. The staff are very friendly, and my special

thanks go to the HDR research support team for providing guidance during the many

phases of my research at QUT. I am also very thankful to all those I have encountered

outside QUT during my stay in Australia. People here are very friendly, kind, and

helpful. I would also like to thank professional editor, Kylie Morris, who provided

copyediting and proofreading services, according to university-endorsed guidelines

and the Australian Standards for editing research theses. I am grateful to the thesis

examiners and reviewers of my academic papers who provided valuable comments

and appreciated my work.

Many thanks to my family members - my mother, aunts, uncles, grandparents,

sisters and cousins, who have always been very supportive. Especially my uncle

(srimammu) and my grandfather (thathagaru), who have constantly encouraged and

motivated me; both of them have always been my strength. My nephew – Akki entered

my life during this PhD journey – he is very very special to me!

Finally, this acknowledgement would be incomplete without conveying my

special thanks to my dear friend, well-wisher, role-model, motivator, and guide - the

all attractive Kṛṣṇa, the Supreme God Himself. This PhD thesis is a homage to Him.

मन्मना भव मद्भक्तो मद्याजी मां नमस्कुरु |

मामेवैष्यसि सत्यं ते प्रतिजाने प्रियोऽसि मे || BG 18.65||

Always think of Me and become My devotee. Worship Me and offer your

homage unto Me. Thus, you will come to Me without fail. I promise you this because

you are My very dear friend.


Chapter 1: Introduction

This chapter discusses the background of this research (Section 1.1); the research

problem (Section 1.2); research motivation (Section 1.3); research questions, aim, and

objectives (Section 1.4); research significance and scope (Section 1.5); definitions

(Section 1.7); and finally, provides an outline of the thesis (Section 1.8).

1.1 BACKGROUND

With ever increasing urban sprawl, cities are witnessing more serious problems from

traffic congestion. Policy decisions to mitigate traffic problems can have a huge impact

on a nation’s economy, environment, and society (Australian Transport Assessment

and Planning (ATAP), 2017). For instance, the social cost of traffic congestion on

Australian roads for the year 2020 (predicted from the base year 2005 cost of $9.4

billion) is estimated to be nearly $20.4 billion (Transport & Economics, 2007). Figure

1.1 illustrates the near-linear escalation of the congestion cost (base case) for Australia.

Figure 1.1: Trends in the social cost of congestion in AUD for different scenarios (Transport & Economics, 2007)

An accurate estimation and prediction of travel demand is extremely important for strategic planning and control, and for the success of any major transport infrastructure project, as inaccurate estimates could result in huge economic losses. Thus, accurate knowledge of how, when, and where people move on the road network is important before making any policy decisions. While this sounds simple, understanding how a city moves is a highly complicated process due to challenges related to indirect, incomplete, and inaccurate measurements, and errors in modelling realistic travel patterns.

1.1.1 Origin-Destination (OD) matrix

Fundamentally, a city is a geographical entity divided into many statistical zones.

While the geographical structure of any city implies spatial distribution of urban

centres connected by transportation networks, the distribution of travel demand

between different zones defines the structure of the travel patterns in a city. In transport

planning, travel demand between zonal pairs and its distribution pattern (also referred to as structure; see Section 1.1.3) is generally represented using an origin-destination (OD) demand matrix (see Figure 1.2). The yellow-coloured cell in the OD matrix represents trips (say, by car) between the OD pair of Z1 and Z2, and similarly for the other cells of the OD matrix.

Figure 1.2: Illustration of OD matrix for a spatial distribution of travel demand

The demand for an OD pair, which is the number of trips between an origin and

a destination, is a given number (for a time interval) equal to the sum of the path flows

in the paths connecting them. These flows can change due to changes in route choice,

but the total amount remains unchanged for that time. Broadly speaking, there are two

types of OD matrices: static and dynamic. For static OD matrices, the time-period considered is sufficiently large (of the order of hours) so that the traffic observed at the

detectors is from the demand departing during the same time interval. Every trip is

assumed to be completed within a single analysis time-period. On the other hand,

dynamic OD matrices assume a shorter time-period and the traffic observed at the

detector must be assigned to different departure time intervals.
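As a minimal illustration of the relationship described above, the following Python sketch builds a two-zone OD matrix by summing the path flows of each OD pair, and then shows that shifting flow between paths leaves the OD demand unchanged. The path flow values and the helper name `od_matrix_from_path_flows` are hypothetical; they are chosen only so that the totals match the example values of Figure 1.2, and rows are assumed to be origins and columns destinations.

```python
import numpy as np

zones = ["Z1", "Z2"]

# Hypothetical path flows (trips per analysis period) for each OD pair.
path_flows = {
    ("Z1", "Z2"): [3000.0, 1500.0],
    ("Z2", "Z1"): [2000.0, 1500.0],
}


def od_matrix_from_path_flows(path_flows, zones):
    """Return the OD matrix whose cells are the sums of the path flows."""
    index = {zone: i for i, zone in enumerate(zones)}
    od = np.zeros((len(zones), len(zones)))
    for (origin, destination), flows in path_flows.items():
        od[index[origin], index[destination]] = sum(flows)
    return od


od_before = od_matrix_from_path_flows(path_flows, zones)

# A change in route choice: 500 trips move from the first to the second path.
rerouted = dict(path_flows)
rerouted[("Z1", "Z2")] = [2500.0, 2000.0]
od_after = od_matrix_from_path_flows(rerouted, zones)

print(od_before)
print("Demand unchanged after rerouting:", np.array_equal(od_before, od_after))
```

The sketch treats the matrix as static: every trip is counted in a single analysis period, so no departure-time dimension is needed.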

[Figure 1.2 content: OD matrix for a typical peak-period between zones Z1 and Z2, with entries (0, 4500; 3500, 0), alongside a spatial representation of the OD pairs and the directions of OD flows]

The zones that produce trips are referred to as origins and those that attract trips

are destinations. In Australia, two types of zones are popularly used for strategic

planning: statistical areas (SAs) and traffic analysis zones (TAZs).

1.1.1.1 Statistical Areas (SAs)

According to the Australian Bureau of Statistics (ABS) (ASGS, 2017), “A statistical geography provides the extra dimension of location to statistics”. The ABS defines the hierarchy of geographical areas for the release of statistical information. This includes statistical areas (SAs) at four levels: Statistical Area Level 1 (SA1) to Statistical Area Level 4 (SA4). SA1 has a population of between 200 and 800 persons, SA2 normally reflects the suburban level and is an aggregation of SA1s, SA3 is designed at the regional level and is an aggregation of SA2s, and SA4 reflects the labour market within each state and territory and is an aggregation of SA3s. Figure 1.3 illustrates the SA2 (left) and SA3 (right) zones of the Brisbane City Council (BCC) region (the Moreton Bay Islands are excluded from this study).

Figure 1.3: Statistical Areas of BCC region: SA2 (left) and SA3 (right)

While the term “statistical area” is popularly used in Australia, urban structures around the world have their own representations of zonal hierarchy. For instance, the zones in the US are referred to as metropolitan statistical areas (USCensus, 2019); see Naoki (2013) for the hierarchy of geographical boundaries defined for Japanese cities.


1.1.1.2 Traffic Analysis Zones (TAZs)

The geographical units generally used in transport planning models are referred

to as traffic analysis zones (TAZs). In Australia, a TAZ generally covers a population

of approximately 3,000 people. The number of trips between zonal pairs is dependent

on the size and shape of the zone. In addition to population, other factors that

differentiate TAZs from SAs are potential future alternatives to existing road

infrastructure, network details, etc. The size of the zones is smaller within the central

business district (CBD) region and larger in far-away suburbs and rural/regional areas.

Each TAZ can be a combination of SA1s and/or each SA2 can contain multiple portions of TAZs. The Brisbane Strategic Transport Model (BSTM)1 considers the TAZ boundaries of the Greater Brisbane area, which includes Brisbane City Council (key partner), Redland City Council, Logan City Council, Ipswich City Council, and Moreton Bay Regional Council, as shown in Figure 1.4 (BSTM, 2015).

Figure 1.4: TAZs in Greater Brisbane region (BSTM, 2015)

1.1.2 OD matrix estimation problem

As the complete distribution of travel demand across the network cannot be

observed directly, an OD matrix needs to be estimated. In practice, traffic demand

forecasts rely heavily on the base (reference) year OD matrix and establishing this is

critical before implementing any major transport projects (ATAP, 2016b).

1 BSTM is a multi-modal transport model for medium to long-term strategic planning for the Greater

Brisbane region. The model aids transport planners to estimate/forecast and assess travel patterns and

behaviour across the region.


Traditionally, the base year OD demand for large-scale networks is estimated using the four-step model. However, there have been many concerns over the effectiveness of the four-step modelling approach. For example, it was unable to address road congestion problems in the 1990s, which led the US Department of Transportation to heavily sponsor Travel Demand Improvement Programs with more focus on activity-based travel demand models (McNally, 2008). Although, to date, there have been no alternative frameworks to defy the theoretical construction of activity-based models, researchers are still working to enhance their predictive capabilities. The main difficulties with the activity-based approach are the combinatorics involved in multi-dimensional choice modelling at an individual (agent) level and its heavy reliance on travel surveys. On the other hand, seamless observations of traffic counts are able to provide up-to-date information related to traffic demand and thus, researchers have begun to consider OD estimation as an optimisation problem.

The following sub-sections provide insights into the overview of OD matrix

estimation, the optimisation modelling approach, the structural significance of OD

matrix/trips, and the role of advanced data sources in OD matrix estimation problems.

1.1.2.1 Overview of OD matrix estimation

An overview of the OD matrix estimation process is shown in Figure 1.5, where

the key elements are: inputs (observed link flows, target OD matrix, and any other

measurements, such as path flows, travel time, etc.), optimisation model (solution

algorithm and assignment model), outputs (OD matrix estimates, user equilibrium link

flows, and travel time, etc.), and a reliability check of the OD matrix estimates using

performance measures (e.g., root mean square error (RMSE) etc.).
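As an example of the reliability check mentioned above, the following Python sketch computes the RMSE between an estimated and a reference OD matrix. It assumes the common cell-wise definition (the square root of the mean squared difference over all OD pairs); the function name `rmse` and the numerical values are hypothetical, and the exact definition adopted later in the thesis may differ, for instance in how intra-zonal cells are treated.

```python
import numpy as np


def rmse(x_est, x_ref):
    """Root mean square error over all cells of two equally sized OD matrices."""
    x_est = np.asarray(x_est, dtype=float)
    x_ref = np.asarray(x_ref, dtype=float)
    if x_est.shape != x_ref.shape:
        raise ValueError("OD matrices must have the same dimensions")
    return float(np.sqrt(np.mean((x_est - x_ref) ** 2)))


# Hypothetical reference and estimated OD matrices for two zones.
x_reference = [[0.0, 4500.0], [3500.0, 0.0]]
x_estimate = [[0.0, 4300.0], [3700.0, 0.0]]

error = rmse(x_estimate, x_reference)
print(f"RMSE = {error:.2f} trips")
```

A lower value indicates that the individual OD flows are closer to the reference, although a cell-wise measure of this kind does not by itself describe how the flows are distributed across OD pairs.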

Figure 1.5: The overview of OD matrix estimation process

[Figure 1.5 blocks: Inputs; Optimisation model; Outputs; Reliability check using performance measures]


1.1.2.2 Optimisation model

Most studies generally adopt a bi-level framework for the OD matrix estimation

process, as represented using a flowchart in Figure 1.6. In the bi-level formulation, the

upper level minimises the objective function (generally the deviations of traffic counts) and the lower level runs traffic assignment (generally user equilibrium). The assignment and the OD matrix are interdependent and, as such, the former

plays a significant role in the OD matrix estimation process. The optimisation process

begins with a prior OD matrix that is generally developed from traffic surveys and

socio-economic data in a four-step transport modelling approach. The traffic

simulation model (which could be built in Aimsun (2019)) considers the prior OD

matrix as an input, and runs traffic assignment (either as a stochastic route choice or

dynamic user equilibrium) over the study network.

Figure 1.6: Traditional bi-level framework

The main outputs of this simulation are the user-equilibrium link flows and the
assignment matrix. Once the lower-level assignment is complete, the value of the
objective function (that is, the deviation between the user-equilibrium (estimated)
and observed link flows) is computed in the upper-level formulation. The OD matrix is
then updated using a search direction technique (such as a gradient-based method) to
obtain the OD matrix for the next iteration. In this way, the OD matrix

is constantly updated until the convergence criteria are reached. Further details about
the bi-level estimation process are provided in Chapter 4. Earlier bi-level methods
depended on analytical models for assignment. However, it is preferable to choose
simulation-based assignment for its capacity to model realistic congestion effects over
the network.

[Figure 1.6 elements: Traffic survey and socio-economic data; OD matrix (X); Upper level: minimising the deviation between observed and estimated link flows, update of X; Lower level: user equilibrium assignment (simulator or analytical), estimated link counts]
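The iterative update described above can be sketched as follows. This is only a minimal illustration under strong assumptions, not the estimation procedure developed in this thesis: a fixed link-proportion matrix `A` stands in for the lower-level assignment (in the bi-level framework the assignment depends on the OD matrix and is re-run at every iteration), the three-OD-pair, two-link network and all flow values are hypothetical, and the function names `assign` and `estimate_od`, the step size, and the tolerance are illustrative choices.

```python
# Minimal sketch of the bi-level loop with a FIXED assignment matrix (assumption).
# All numbers below are hypothetical and chosen only for illustration.

def assign(x, proportions):
    """Lower-level stand-in: map OD flows to link flows with fixed proportions."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in proportions]


def estimate_od(x_prior, y_obs, proportions, step=0.1, tol=1e-6, max_iter=10_000):
    """Upper level: gradient descent on z = 0.5 * sum((y_est - y_obs)^2)."""
    x = list(x_prior)
    z = float("inf")
    for _ in range(max_iter):
        y_est = assign(x, proportions)                    # lower level
        dev = [ye - yo for ye, yo in zip(y_est, y_obs)]   # link flow deviations
        z = 0.5 * sum(d * d for d in dev)                 # upper-level objective
        if z < tol:                                       # convergence criterion
            break
        # Gradient of z with respect to each OD flow: A^T (y_est - y_obs)
        grad = [sum(proportions[a][k] * dev[a] for a in range(len(dev)))
                for k in range(len(x))]
        # Update the OD matrix and keep the flows non-negative
        x = [max(0.0, xi - step * g) for xi, g in zip(x, grad)]
    return x, z


# Rows are links, columns are OD pairs; the second OD pair uses both links.
A = [[1, 1, 0],
     [0, 1, 1]]
x_prior = [100.0, 200.0, 300.0]   # hypothetical prior OD flows
y_obs = [330.0, 540.0]            # hypothetical observed link flows

x_est, z_final = estimate_od(x_prior, y_obs, A)
```

Because there are more OD pairs than links, the estimate returned here is only one of many OD vectors that reproduce the observed counts; which one is reached depends on the prior OD matrix used as the starting point.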

Objective function formulation

The traditional method for expressing link counts deviation within the objective
function ($Z_1$), as described by Spiess (1990), is shown in Equation (1).

$$Z_1 = \frac{1}{2}\sum_{a}\left(Y_a - \hat{Y}_a\right)^2 \qquad (1)$$

where the modelled link flows ($Y$, with $Y_a$ the flow on link $a$) for every
iteration are retrieved from the simulation, and $\hat{Y}$ is the observed link flows.

The size of the OD matrix is generally far greater than the size of the link flows
vector. As such, there is an imbalance between unknowns and knowns, and the
traditional traffic counts-based formulation leads to the problem of under-determinacy.
Thus, most previous studies (Cascetta & Postorino, 2001; Yang, 1995) have used
deviations from the target OD matrix ($\tilde{X}$) as an additional objective in the
formulation, as shown in Equation (2).

$$Z = \gamma_1 F_1(X, \tilde{X}) + \gamma_2 F_2(Y, \hat{Y}) \qquad (2)$$

where $\gamma_1$ and $\gamma_2$ are the weight factors given to the corresponding
objectives, $X$ and $\tilde{X}$ are the estimated OD matrix and target OD matrix,
respectively, $Y$ and $\hat{Y}$ are the modelled and observed link flows, and $F_1$
and $F_2$ denote the respective deviation measures.
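A small numerical sketch may help to show how the additional target-matrix term separates candidates that the traffic counts alone cannot distinguish. The example assumes squared deviations for both terms and equal weights of 0.5; the function names, the single-link network, and all flow values are hypothetical and are not taken from the cited studies.

```python
# Sketch of a two-term objective: weighted OD deviation plus weighted count deviation.
# Squared deviations and the weights are assumptions made for illustration.

def squared_deviation(a, b):
    """Sum of squared differences between two equally long vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def objective(x, x_target, y, y_obs, w_od=0.5, w_count=0.5):
    """Weighted sum of OD deviations (from the target) and link count deviations."""
    return w_od * squared_deviation(x, x_target) + w_count * squared_deviation(y, y_obs)


# Hypothetical case: two OD pairs share one counted link, so y = x1 + x2.
x_target = [60, 30]      # target (prior) OD flows
y_obs = [100]            # observed link count

candidate_1 = [70, 30]   # reproduces the count, close to the target
candidate_2 = [40, 60]   # reproduces the count, far from the target

z_1 = objective(candidate_1, x_target, [sum(candidate_1)], y_obs)
z_2 = objective(candidate_2, x_target, [sum(candidate_2)], y_obs)
```

Both candidates match the observed count exactly, so a counts-only objective would rate them equally; the target term is what ranks `candidate_1` ahead of `candidate_2`.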

Mapping relationship between link flows and OD matrix

Since the observed traffic counts are indirect measurements of the OD matrix, a

mapping relationship between observed link flows and the OD matrix must be present.

Fundamentally, traffic counts on any link are the result of the OD matrix assigned over

a network. Thus, the relationship between both is an assignment model (de Dios

Ortuzar & Willumsen, 2011). Different types of traffic assignment models have been

used in transport modelling, such as all-or-nothing assignment, incremental

assignment, capacity restraint assignment, user-equilibrium assignment, stochastic

user equilibrium assignment, system optimal assignment, etc. (Patriksson, 2015).

In general, traffic assignment is modelled using Wardrop’s (1952) user equilibrium
principle. According to this principle, users choose routes so as to minimise their
own travel cost, and it is assumed that these individual route choices create an
“equilibrium” at the network level. The link flows are said to be in equilibrium when
no user can further reduce his/her travel cost by unilaterally shifting to any other
route. This state is referred to as Wardrop’s (1952) user equilibrium. The results of
this assignment are the user-equilibrium link flows, and the matrix that represents
the proportions of OD flows passing through the selected links is referred to as
either a link-proportion matrix or, in general, an assignment matrix. The general
relationship between link flows and the OD matrix is shown in Equation (3).

$$Y = A(X)\,X \qquad (3)$$

where $Y$ is the vector of link flows, $X$ is the OD matrix (in vector form), and
$A(X)$ is the assignment matrix. Equation (3) shows that the assignment matrix is
dependent on OD flows and is generally obtained as an output of the simulation.

1.1.3 The “structure” of OD matrix/trips

The word “structure” is defined as “the arrangement of and relations between the
parts or elements of something complex” (Oxford, 2018). In general, the word is used
with respect to either a material structure (man-made or natural) or an abstract
structure. “Material structure” refers to the arrangement of physical things. In
transportation terms, the road network is an example of a man-made structure (Dandy,
Daniell, Foley, & Warner, 2017). An “abstract structure” comprises precise rules of
behaviour, such as the chords of music (Cooper, 1977) or, in transport terms, the
travel behaviour of commuters during working weekdays (Hensher, 1976).

In this study, the structure of an OD matrix is defined as “the arrangement of and
the correlations that exist between OD pairs within the OD matrix”. To avoid
ambiguity, the following terms are defined in this research:

a) Structure is the skeletal framework of the OD matrix, where the skeleton is
expressed as the preference/arrangement of the destinations from each origin. For
instance, refer to Figure 1.7, where the skeleton/structure of the OD matrix (shown at
the top) is illustrated in Figure 1.7a. Here, the columns for each row (origin) are
arranged in order of destination preference. The correlations, if any, between OD
pairs due to shared activities, geographical zones, trip productions/attractions, etc.
are referred to as structural correlations (Antoniou et al., 2016).

b) The OD flows corresponding to the structure (skeleton) of the OD matrix are termed
the mass. The corresponding mass for the structure illustrated in Figure 1.7a is
presented in Figure 1.7b.

Figure 1.7: Demonstration of (a) the skeleton/structure of OD and (b) corresponding mass/OD flows
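The skeleton and mass defined above can be derived mechanically from an OD matrix by ranking the destinations of each origin by flow. The sketch below uses the example OD matrix of Figure 1.7; the function name `skeleton_and_mass` and the dictionary layout are illustrative assumptions, and ties (which do not occur in this example) would simply keep their original column order.

```python
# Example OD matrix of Figure 1.7: rows are origins, columns are destinations.
od_matrix = {
    "O1": {"D1": 3, "D2": 4, "D3": 6, "D4": 10},
    "O2": {"D1": 7, "D2": 4, "D3": 5, "D4": 11},
    "O3": {"D1": 12, "D2": 8, "D3": 5, "D4": 6},
    "O4": {"D1": 13, "D2": 7, "D3": 9, "D4": 6},
}


def skeleton_and_mass(od):
    """Rank destinations by flow for each origin.

    Returns two dictionaries keyed by origin: the skeleton (destinations in
    order of preference) and the mass (the flows in that same order).
    """
    skeleton, mass = {}, {}
    for origin, flows in od.items():
        # Highest flow first; Python's sort is stable, so ties keep column order.
        ranked = sorted(flows.items(), key=lambda item: item[1], reverse=True)
        skeleton[origin] = [dest for dest, _ in ranked]
        mass[origin] = [flow for _, flow in ranked]
    return skeleton, mass


skeleton, mass = skeleton_and_mass(od_matrix)
```

For origin O1, for example, the skeleton is D4, D3, D2, D1 and the corresponding mass is 10, 6, 4, 3, which matches panels (a) and (b) of Figure 1.7.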

Different methods are used to quantify the similarity between two OD matrices.
If the structure of the OD matrices is also considered in the similarity estimation,
then it is termed structural similarity. Two OD matrices have perfect structural
similarity if their structures are identical and the differences in their OD flows
are zero. Perfect structural similarity is therefore possible only when the OD
matrices are exactly the same.

One way to capture the skeleton/structure of an OD matrix is through the correlation
coefficient. The relationship between the correlation coefficient and the
preference/arrangement of destinations can be explained with the example shown in
Figure 1.8, where the two OD matrices represent the distribution of trips on, for
example, a Sunday and Australia Day. The order of destination preferences is the same
on both days (that is, A, B, and C), and the correlation coefficient between the two
OD matrices takes its maximum value of one.

Figure 1.8: Example of OD matrix structural dimension

[Figure 1.7 content]

OD matrix

      D1    D2    D3    D4
O1     3     4     6    10
O2     7     4     5    11
O3    12     8     5     6
O4    13     7     9     6

(a) Skeleton/structure of OD matrix

      Dest. Choice-1   Dest. Choice-2   Dest. Choice-3   Dest. Choice-4
O1    D4               D3               D2               D1
O2    D4               D1               D3               D2
O3    D1               D2               D4               D3
O4    D1               D3               D2               D4

(b) Mass/OD flows on skeleton/structure

      Dest. Choice-1   Dest. Choice-2   Dest. Choice-3   Dest. Choice-4
O1    10               6                4                3
O2    11               7                5                4
O3    12               8                6                5
O4    13               9                7                6

[Figure 1.8 content]

Sunday OD (scaled down by 50 times to obtain the skeleton/structure)

      A     B     C
O1    200   100   50
:     :     :     :

Australia Day OD (scaled down by 40 times to obtain the skeleton/structure)

      A     B     C
O1    160   80    40
:     :     :     :

Skeleton/structure

      A     B     C
O1    4     2     1
:     :     :     :


Since the correlation coefficient operates on normalised values, consider the
skeleton/structure of the normalised flows in both OD matrices. To obtain it, the
Sunday OD and the Australia Day OD are scaled down by factors of 50 and 40,
respectively. Figure 1.8 (bottom) shows that the “skeleton” of both OD matrices is
the same. In other words, although the two OD matrices have different sets of
individual OD flows, they have the same skeleton or structure, which is reflected in
the identical destination preferences and in the correlation coefficient of one.
Note that a uniform scale factor is assumed for all OD pairs in this example for ease
of demonstration only.
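The claim that the two matrices share a correlation coefficient of one can be checked numerically for the origin row shown in Figure 1.8. The sketch below assumes that the Pearson correlation coefficient is the measure intended and uses only the O1 row, since the remaining rows are not given in the figure; the function name `pearson` is illustrative.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


# Flows from origin O1 to destinations A, B and C (Figure 1.8).
sunday = [200, 100, 50]
australia_day = [160, 80, 40]

# Scaling by the uniform factors of 50 and 40 yields the same skeleton.
sunday_skeleton = [flow / 50 for flow in sunday]
australia_day_skeleton = [flow / 40 for flow in australia_day]

r = pearson(sunday, australia_day)
```

Because one row is an exact multiple of the other, the coefficient equals one up to floating-point rounding, even though every individual OD flow differs between the two days.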

The significance of “structure” based information is that it yields classification

through patterns. For instance, in biology, the structure of organisms is analysed, and

the classification is based on the similarity of patterns defined by their structures

(Kroeber, 1943). Similarly, if there are a large number of observations from a specific

study region, there is an inherent structure attached to those observations. For example,

the structure of trips helps to classify travel patterns by analysing the demand

variations among different days or different times of a day. Because an OD matrix

defines the structure of travel patterns between different geographical locations, a

structural comparison of OD matrices also reflects structural comparison of travel

patterns.

Most transport planning models depend on the knowledge of travel patterns

expressed in terms of OD matrices for both short-term intelligent transport systems

applications, such as effective route guidance strategies, etc. (ATAP, 2016a), and long-

term strategic planning, such as transport network planning and service design. The

knowledge of travel patterns is also helpful for certain policy decisions, such as

shifting public holidays towards weekends. In Japan, as a strategic move to improve

the nation’s ailing economy, public holidays have been shifted to long weekends

(Chung, 2003).

Although the importance of OD matrix structural information is acknowledged

in the literature (see Section 2.4), the measures adopted to maintain structural

consistency during the OD estimation process are either based on traffic counts

measurements or travel surveys. Neither of these approaches is guaranteed to capture
the true structure of the OD matrix because: a) the traffic counts on any link are
only point-based observations and cannot capture the distribution (structure) of trips
over a larger spatial context; and b) the constraints, such as trip
productions/attractions, ratios of OD flows, or deviations from the target OD matrix,
are based on a target OD matrix that is generally outdated. Moreover, the most popular
performance measures are based on deviations of individual OD flows, and therefore
cannot capture the structure of the OD matrix.

1.1.4 Advanced traffic data sources

The availability of automated traffic counts has lessened the burdens of

cumbersome conventional methods of traffic modelling. Despite many issues and

challenges, the traffic counts-based approach has been widely adopted, mainly due to

the unavailability of alternative data sources at a larger spatial-temporal scale.

However, with advancements in information and communication technologies, such as
Bluetooth, GPS, smart cards, e-tags, and mobile phones, travel behaviour research
based on these data sources has garnered considerably more interest than research
based on conventional methods.

Knowledge about travel demand obtained from big data sources may not reveal a

detailed demographic and contextual picture about commuter trips, but could provide

high spatial and temporal resolution when compared to travel surveys (Toole et al.,

2015). In cities such as Brisbane, Bluetooth data sets are currently used for travel time

and speed analysis (Bhaskar & Chung, 2013). However, with a good penetration rate

and detection layout, Bluetooth observations can also be used to construct and estimate

the OD matrices for large scale networks (Barceló, Gilliéron, Linares, Serch, &

Montero, 2012; Carpenter, Fowler, & Adler, 2012; Michau, Pustelnik, et al., 2017).

Thus, the era of big data is creating new avenues for the development of alternative

methods. The following section provides some detailed insights about the Bluetooth

data from Brisbane, Australia.

1.1.4.1 Brisbane Bluetooth data

The Brisbane City Council (BCC) region is equipped with more than 1200

Bluetooth media access control scanners (BMS), the locations of which are shown in

Figure 1.9.


Figure 1.9: Location of Bluetooth Scanners within the BCC region

Most Bluetooth observations are taken from cars equipped with Bluetooth devices
(Bhaskar et al., 2015). In Brisbane City, the traffic signal box at an intersection is
generally connected to both magnetic loop detectors and a Bluetooth MAC Scanner (BMS).
Figure 1.10 provides a snapshot of a Bluetooth-equipped vehicle approaching an
intersection equipped with magnetic loop detectors and a BMS. The BMS records the
MAC ID of the device (of the Bluetooth-equipped car) and the time-stamp of detection
within a scanning range of roughly 100 metres (Bhaskar & Chung, 2013).

Figure 1.10: BMS and loop detectors at an intersection in Brisbane city

The format specifications of raw Bluetooth data for the Brisbane region (Bluetooth
data from Brisbane City Council, 2016) are shown in Table 1.1. Here, Select ID is the
record number, Device ID is the encrypted MAC ID of the Bluetooth device, Area ID is
the ID of the scanner location, and Time-stamp gives the date (7 March 2016 in
Table 1.1) and the time at which the device was detected within the communication
range of the Bluetooth scanner. The last column, Duration, represents the difference
between the times of the first and last discovery of the Bluetooth device. In this
study, the Duration data were not required for constructing trajectories and OD
matrices. Table 1.1 shows that the device with Device ID 35 travelled past the
scanners with Area IDs from 10110 to 10277 between approximately 7:30 A.M. and
7:45 A.M.

Table 1.1: Sample Bluetooth data from Brisbane, Australia (Bluetooth data from Brisbane City Council, 2016)

Select ID    Device ID    Area ID    Time-stamp             Duration (s)
25055749     35           10110      2016:03:07 07:31:21    20
25055996     35           10224      2016:03:07 07:31:41    20
…            …            …          …                      …
…            …            …          …                      …
25113737     35           10277      2016:03:07 07:43:15    58

The raw Bluetooth data shown in Table 1.1 can be used to: a) estimate travel times,
b) estimate turning proportions, c) retrieve trajectories, and d) estimate OD flows
between two locations.
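As a rough illustration of uses a) and c), the sketch below groups the three records visible in Table 1.1 by device, orders them by time-stamp to obtain a scanner sequence, and computes the elapsed time between the first and last detections. It is a simplified sketch under stated assumptions rather than the processing pipeline used in this study: the records elided in the table are left out, no trip segmentation or noise filtering is attempted, and the function name `build_trajectories` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# The three records shown in Table 1.1:
# (Select ID, Device ID, Area ID, Time-stamp, Duration in seconds)
records = [
    (25055749, 35, 10110, "2016:03:07 07:31:21", 20),
    (25055996, 35, 10224, "2016:03:07 07:31:41", 20),
    (25113737, 35, 10277, "2016:03:07 07:43:15", 58),
]


def build_trajectories(rows):
    """Group detections by device and sort each group by detection time."""
    by_device = defaultdict(list)
    for _select_id, device_id, area_id, stamp, _duration in rows:
        detected_at = datetime.strptime(stamp, "%Y:%m:%d %H:%M:%S")
        by_device[device_id].append((detected_at, area_id))
    return {device: sorted(hits) for device, hits in by_device.items()}


trajectories = build_trajectories(records)
hits = trajectories[35]

# Sequence of scanners visited by device 35 (visible records only).
scanner_sequence = [area_id for _, area_id in hits]

# Elapsed time between the first and last detection, in seconds.
travel_time_s = (hits[-1][0] - hits[0][0]).total_seconds()
```

For device 35 the visible records give the scanner sequence 10110, 10224, 10277 and an elapsed time of 714 seconds (about 12 minutes) between the first and last scanners.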

Figure 1.11 demonstrates the consistency of the Bluetooth turning proportions

as compared to Sydney Coordinated Adaptive Traffic System (SCATS) traffic counts

data from an intersection in the Milton area, Brisbane (Chung, 2016).

Figure 1.11: Comparison of turning proportions: Bluetooth vs SCATS (Chung, 2016)

Figure 1.12 illustrates the consistency of Bluetooth inferred trajectories

represented as a sequence of BMSs; that is, 1107-1168-1064-1513 (represented by


green coloured pins) over a period of four regular working weekdays in the month of

July 2014.

Figure 1.12: Consistency of Bluetooth trajectories during regular weekdays

1.2 RESEARCH PROBLEM

Several challenges are associated with OD matrix estimation. The most significant
research problems identified in this study are discussed in further detail in the
following sections.

1.2.1 Problem of under-determinacy

Because the number of OD pairs is far greater than the number of equations mapping
the relationship between OD flows and link flows, the OD matrix estimate has no
unique solution (Antoniou et al., 2016). In other words, when loaded onto the
network, many OD matrices can reproduce a similar set of link counts. For instance,
Figure 1.13 shows that the size of the OD matrix (i.e., 4×1) is greater than the size
of the link flows vector (i.e., 2×1). Due to this imbalance, more than one OD matrix
can produce the same set of link flows. Suppose that the link flows observed at
detector-1 and detector-2 are y1 and y2, and that the flows between O1-D1, O1-D2,
O2-D1 and O2-D2 are represented by x1, x2, x3 and x4, respectively. Then y1 is the
result of OD flows x1 and x2, and y2 is the result of x3 and x4. Thus, multiple
combinations of x1 and x2 can produce the same y1 flow, and the same holds for x3, x4
and the link flow y2 at the other detector. This example clearly highlights the
problem of under-determinacy in traffic counts-based OD matrix estimation.
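The example can be verified numerically with the values shown in Figure 1.13(b), where y1 = 1000 and y2 = 2000. The sketch below is illustrative only: the link-proportion matrix `A` encodes the statement that y1 = x1 + x2 and y2 = x3 + x4, and the function names `link_flows` and `feasible_od` are hypothetical.

```python
# Link-proportion matrix for the example network: y1 = x1 + x2, y2 = x3 + x4.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1]]


def link_flows(x):
    """Map the OD flows (x1, x2, x3, x4) to the detector flows (y1, y2)."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in A]


def feasible_od(a, b):
    """General feasible solution of Figure 1.13(b) for y1 = 1000 and y2 = 2000."""
    return [a, 1000 - a, b, 2000 - b]


od_matrix_1 = [300, 700, 1200, 800]    # OD matrix-1 in Figure 1.13(b)
od_matrix_2 = [600, 400, 500, 1500]    # OD matrix-2 in Figure 1.13(b)

flows_1 = link_flows(od_matrix_1)
flows_2 = link_flows(od_matrix_2)
flows_i = link_flows(feasible_od(a=125, b=1750))   # any 0 <= a <= 1000, 0 <= b <= 2000
```

All three OD vectors return the same detector flows, so the two counts leave two of the four OD flows free, which is the under-determinacy described above.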



Figure 1.13: Demonstration of under-determinacy problem using (a) example network, (b) feasible solutions

Researchers previously introduced a target OD matrix within the objective function
(Cascetta & Nguyen, 1988; Yang, 1995) to mitigate the problem of under-determinacy
and maintain structural consistency in the solution estimates. It was assumed that an
a priori (target) matrix contains important structural information; that is, the
patterns of trip distribution. Because the actual OD matrix is generally unobserved,
the structural consistency of the estimates can be preserved by minimising the
deviations between the target and estimated matrices. However, doing so biases the
solution search space towards the target OD matrix, and this may not improve the
quality of the OD estimate because the target matrix is often constructed from
outdated surveys (Yang, 1995).

1.2.2 Mapping relationship between link flows and OD flows

The mapping relationship between link flows and OD flows (or assignment

model) often relies on the assumptions of route choices between OD pairs. The

“assignment model” is in itself a broad area of research, and realistic assignment of

the traffic on the network remains a challenging research problem (Balakrishna, Ben-

Akiva, & Koutsopoulos, 2007; Ben-Akiva, Gao, Wei, & Wen, 2012; Shafiei, Gu, &

Saberi, 2018; Toledo et al., 2003). The most common issues related to the assignment

matrix formulation in OD matrix estimation methods are:

• The assignment matrix and the OD matrix are mutually dependent (see Equation (3)).
The first estimated assignment matrix generally depends on the prior OD matrix, and
both the OD and assignment matrices are updated in turn until convergence. However,
if the structure of the prior OD matrix (generally an outdated matrix) or the target
OD matrix is poor, then the convexity condition might not be satisfied and a perfect
Stackelberg condition (see Section 2.2.2.3) might not be obtained (Kim, Baek, & Lim,
2001).

• The link flows are a function of the OD flows. Their relationship (i.e.,
assignment) is non-separable and generally obtained from simulation (Antoniou et al.,
2016). As such, the bi-level method is also treated as a non-convex problem.

• The objective of most assignment-based methods is to minimise the deviations
between the observed link flows and the user-equilibrium flows. Because the observed
flows may not always represent a user-equilibrium state, minimising these deviations
might not be justified (Yang, 1995).

[Figure 1.13 content: (a) example network with origins O1 and O2, destinations D1 and D2, Detector-1 (y1) and Detector-2 (y2); (b) observed link counts y1 = 1000, y2 = 2000 and feasible solutions: OD matrix-1: x1 = 300, x2 = 700, x3 = 1200, x4 = 800; OD matrix-2: x1 = 600, x2 = 400, x3 = 500, x4 = 1500; OD matrix-i: x1 = a, x2 = 1000 - a, x3 = b, x4 = 2000 - b]

1.2.3 Computation cost

A further challenge arises from the computational cost associated with the size of
the OD optimisation problem (Osorio, 2017). For smaller networks, such as
intersections and linear networks, the assignment matrix is not complex because it
does not involve route choices. However, for large-scale networks, the assignment
matrix (either analytical or simulation-based) contributes substantially to the OD
dimensionality problem. The complexity increases further for dynamic OD matrix
computation due to the additional temporal dimension involved (Djukic, Van Lint, &
Hoogendoorn, 2012). For instance, a city like Brisbane has around 1,500 Traffic
Analysis Zones (TAZs), which correspond to around two million OD pairs to be
estimated for a static OD matrix. Estimating dynamic OD matrices for four consecutive
time periods raises the number of OD variables to around eight million, which imposes
a high computational requirement. Note that for simulation-based optimisation
algorithms, problems with dimensions of the order of 200 are generally already
considered to be high-dimensional (Wang, Wan, & Chang, 2016).
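The orders of magnitude quoted above can be checked with a short calculation. The sketch assumes exactly 1,500 zones and that every zone can act as both an origin and a destination (intra-zonal pairs included), so the results are slightly above the rounded figures given in the text; the variable names are illustrative.

```python
n_zones = 1500       # approximate number of TAZs quoted for Brisbane
n_periods = 4        # consecutive time periods in the dynamic case

# Every origin-destination combination, intra-zonal pairs included (assumption).
static_od_pairs = n_zones * n_zones

# One unknown per OD pair and time period in the dynamic problem.
dynamic_od_variables = n_periods * static_od_pairs

# Ratio to the order of 200 dimensions regarded as high-dimensional.
ratio_to_high_dimensional = static_od_pairs / 200
```

With these assumptions the static problem has 2,250,000 unknowns and the four-period dynamic problem has 9,000,000, roughly four orders of magnitude beyond the 200 dimensions at which simulation-based optimisation is already considered high-dimensional.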

The bi-level formulation is computationally expensive due to the complex
user-equilibrium assignment required at every iteration. Furthermore, the inseparable
non-linear relationship between the assignment matrix and the OD matrix is a major
hindrance to solving the upper-level objective function, and updating the OD matrix
is consequently quite challenging. Some researchers have proposed linearising the
assignment as an alternative solution. However, linearisation requires two assignment
solutions, which means two simulations per iteration and further adds to the
computational cost (Maher, Zhang, & Van Vliet, 2001).

1.2.4 Lack of potential performance measures

In the literature, little attention has been paid to developing statistical measures
for the structural comparison of OD matrix estimates. Performance measures such as
RMSE are widely used in practice because they are mathematically convenient and
simple to use. However, the major limitation of most traditional metrics is that they
compare individual cells of OD matrices and do not compute statistics on groups of OD
pairs that are correlated (Djukic, Hoogendoorn, & Van Lint, 2013). Section 1.1.3
emphasised the importance of the inherent structural information of an OD matrix.
Figure 3.1 and Table 3.1 provide an example in which the values of traditional metrics
are the same for two OD matrices, although the matrices are structurally different.
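The limitation can be illustrated with a small hypothetical example, which is not the example of Figure 3.1 and Table 3.1. The flows below are invented for one origin with four destinations, and the function names `rmse` and `preference_order` are illustrative; the point is only that two estimates can have identical RMSE against a reference while one of them changes the order of destination preferences.

```python
from math import sqrt


def rmse(estimate, reference):
    """Root mean square error between two equally long flow vectors."""
    squared = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return sqrt(squared / len(reference))


def preference_order(flows):
    """Destination indices ranked from the highest to the lowest flow."""
    return sorted(range(len(flows)), key=lambda j: flows[j], reverse=True)


# Hypothetical flows from one origin to four destinations.
reference = [10, 9, 4, 3]
estimate_a = [11, 10, 5, 4]   # every cell off by +1, ranking unchanged
estimate_b = [9, 10, 3, 4]    # every cell off by 1, ranking changed

rmse_a = rmse(estimate_a, reference)
rmse_b = rmse(estimate_b, reference)

order_reference = preference_order(reference)
order_a = preference_order(estimate_a)
order_b = preference_order(estimate_b)
```

Both estimates have an RMSE of 1.0, yet only `estimate_a` preserves the destination ranking of the reference; a cell-by-cell metric cannot distinguish the two.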

1.2.5 The need for typical OD matrices that represent typical travel patterns

Most traditional transport planning models focus on mode-specific,
trip-purpose-based, and time-of-day OD matrices, but are limited to weekday and
weekend patterns only (ATAP, 2016a). The Household Travel Survey (HTS) for South East
Queensland (SEQTS, 2010) was conducted for over 10 weeks from mid-April through late
June and in July 2009. However, the survey period avoided school/university holidays,
which means that travel patterns during those periods were unobserved. In addition,
seasonal variations in the demand patterns are generally unknown because the survey
is conducted only during a particular period of the year. To estimate typical OD
matrices, the typical travel patterns need to be identified first. In this context,
the following questions with respect to travel patterns are of interest:

• What are the major travel patterns observed other than weekdays and weekends?

• How do travel patterns on Saturdays differ from those on Sundays?

• Are travel patterns during public holidays different from those on weekends?

• Do school holidays that fall on weekdays have different patterns from regular
working weekdays?

• Are there any seasonal trends in travel patterns?

Estimating typical OD matrices that are representative of the aforementioned

travel patterns is not easy using current state-of-the-art techniques because: a)

traditional surveys are expensive, and as such, they are only conducted for weekdays

and weekends; and b) traffic counts data from loop detectors are point-based

measurements, and as such, they cannot provide the trip distribution information

necessary for travel patterns analysis.

Thus, there is a great need for advanced sources of data that can provide seamless

trip distribution related information to better understand the structural changes in travel

patterns over a network in both space and time.

1.2.6 Unknown penetration rates of trips inferred from advanced data sources

Lastly, emerging traffic data sources such as Bluetooth capture only a fraction of
the actual demand and provide incomplete information about trips. Most importantly,
the penetration rate of Bluetooth trips is unknown due to the unavailability of
ground truth (Bhaskar & Chung, 2013). Bluetooth observations are

also random due to many factors, such as the socio-economic characteristics of zones,

distance between the scanners, speed of vehicles, etc. (Michau, Nantes, & Chung,

2013).

In the past, few efforts have been made to exploit Bluetooth data, and these are

limited to travel time observations only (Antoniou et al., 2014; Barceló, Montero,

Bullejos, Serch, & Carmona, 2013). From the validation perspective of Bluetooth OD

flows, previous studies have been limited only to intersections due to the availability

of ground truth from observed entry and exit counts (Carpenter, et al., 2012; Chitturi,

Shaw, Campbell IV, & Noyce, 2014). In terms of trajectories, although Michau et al.

(2016) developed a method to estimate OD flows, the method is not practical, because

the penetration rate of Bluetooth counts is considered a proxy for the penetration of

OD flows, which is not true in general. This is because the trajectories inferred from


Bluetooth do not represent complete trip sequences. In other words, the actual origins

and destinations of the trips cannot be observed from Bluetooth. Zhou and
Mahmassani (2006) proposed a method to avoid estimating the penetration rates of
link-to-link split fractions. However, no technique has yet been proposed to
incorporate vehicle trajectory information (say, from Bluetooth) into the OD
estimation formulation without the need to estimate the unknown penetration rates.

1.3 RESEARCH MOTIVATION

Although past studies (Michau, et al., 2016) have foreseen the practical

applications, there has not yet been any direct implementation of Bluetooth flows into

OD matrix optimisation models. The transport departments in most metropolitan

cities, especially in Brisbane, Australia (Department of Transport and Main Roads

(TMR) and Brisbane City Council (BCC)), are working towards data-driven

approaches for traffic demand estimation and prediction (TMR, 2017). Both TMR and

BCC have supported transport-related research by sharing encrypted Bluetooth data

with the Queensland University of Technology under a license agreement for many

years. Brisbane is one of the few cities in the world collecting massive quantities of

big traffic data over a larger spatial and temporal context (TMR, 2017). The
challenges of OD matrix estimation, coupled with the availability of emerging data
sources such as Bluetooth, form the key motivation for exploring new perspectives on
the OD matrix estimation problem in this study. This research is focussed on
estimating vehicle trips through Bluetooth observations. Since most Bluetooth trips
are inferred from Bluetooth-equipped cars, the unit of travel demand can be
considered to be car trips.

1.4 RESEARCH QUESTIONS, AIM, AND OBJECTIVES

Based on the research problems, this study aims to answer the following research

questions:

RQ1: How can the structural comparison of OD matrices be achieved?

RQ2: How can Bluetooth data be incorporated into the existing OD matrix

estimation process?

RQ3: How can Bluetooth data be used to address the challenges of bi-level

optimisation methods?


RQ4: How can Bluetooth data be used to infer typical travel patterns of large-

scale networks?

This research aims to develop statistical metrics for the structural comparison of
OD matrices; develop methodological approaches to improve the quality of OD matrix
estimates using big traffic data (Bluetooth and loop detector); and cluster
Bluetooth-based OD matrices to identify typical travel patterns for large-scale
networks.

Corresponding to the research questions above, the objectives are:

Objective-1: To develop statistical metrics for the structural comparison of

OD matrices. While traditional metrics account for deviation of individual

OD flows, the developed metrics should account for the structure of OD

matrix/trips distribution.

Objective-2: To develop methods for incorporating structural information of

Bluetooth trips into the bi-level OD matrix estimation.

Objective-3: To advance the OD estimation methodology by relaxing the

dependence on assignment matrix using big traffic data.

Objective-4: To devise a methodological approach for clustering Bluetooth

based OD (B-OD) matrices and identify typical travel patterns for large-scale

networks using a case-study application on the BCC region.

1.5 RESEARCH METHODOLOGY

Following a comprehensive literature review in Chapter 2, the methodology was
systematically defined using four tasks that address the objectives and research
questions, as shown in Figure 1.14. These tasks include:

Task-1: This task was used to develop new statistical metrics and addresses RQ-

1 and Objective-1.

Task-2: This task was used to develop assignment-based methods and addresses

RQ-2 and Objective-2.

Task-3: This task was used to develop a non-assignment-based method and

addresses RQ-3 and Objective-3.

Task-4: This task was used to develop a detailed methodological approach to

cluster B-OD matrices, and identify typical travel patterns for large scale


networks with a case study application on the BCC region. This addresses RQ-

4 and Objective-4.

Further insights into the individual tasks are discussed in the following sections.

Figure 1.14: Research methodology framework

1.5.1 Task-1

This task focussed on developing statistical metrics for the structural comparison

of OD matrices after discussing the limitations of existing metrics. The fundamental

concepts of the proposed metrics; that is, the Mean Geographical Window based

Structural Similarity Index (GSSI) and Mean Normalised Levenshtein distance for OD

matrices (NLOD) were borrowed from other disciplines and extended to exploit the

structural information of OD matrices. Finally, the robustness of the proposed

metrics was tested through sensitivity analysis.


1.5.2 Task-2

This task aimed to develop assignment-based OD matrix estimation methods using
additional structural knowledge of Bluetooth trips. Here, the Bluetooth trips were
incorporated through the structural comparison of estimated and Bluetooth OD/path
flows within the objective function formulation. Task-2 was further divided into two
methods: the B-OD method and the B-SP method.

• Bluetooth OD (B-OD) method: Here, the objective function formulation included the
structure of Bluetooth trips expressed in terms of Bluetooth OD flows. It was further
divided into two scenarios: ideal and near-ideal. Both scenarios are based on B-OD
flows built with the assumption that the Bluetooth trip ends represent the true trip
ends, and are suitable for networks (in cities such as Brisbane) highly equipped with
Bluetooth scanners.

o The ideal scenario assumed that the structure of the Bluetooth-based OD matrix
represented the exact structure of the true OD matrix with a fixed (20%) penetration
rate of Bluetooth OD flows.

o The near-ideal scenario assumed randomness in the structure of a Bluetooth-based OD
matrix developed by randomly selecting 20% of Bluetooth trajectories.

The B-OD method was tested for different percentages of Bluetooth connected OD pairs
in a controlled environment established in Aimsun.

• Bluetooth subpath (B-SP) method: Here, the objective function formulation included
the structure of Bluetooth trips expressed in terms of subpath flows. The B-SP method
was close to reality because it included the actual Bluetooth observations, which
were only a sample, random and incomplete. Since it is based on subpath flows, this
method works even when Bluetooth trip ends do not represent true trip ends. This
method was tested for different penetration rates of Bluetooth-inferred trajectories
in a controlled environment established in Aimsun Next. This method can be applied
over networks equipped with a lower density of scanners.


1.5.3 Task-3

This task developed a non-assignment-based OD matrix estimation method.

Specifically, the task focussed on:

A methodology to replace the traditional assignment-based mapping relationship between link flows and OD flows with a relationship based on observed turning proportions.

Maintaining the structural consistency using additional knowledge of the

Bluetooth OD structure in the objective function (similar to near-ideal

scenario in the B-OD method used in Task-2).

In this part of the research, the experiments were designed for different percentages of Bluetooth connected OD pairs. The methodology was tested on a sample network with sufficient route choice options and on a real network. To validate the results, the true observations of OD flows, link flows and Bluetooth OD flows were obtained from simulation in Aimsun Next (2019) and were compared with the results of both the non-assignment-based and assignment-based methods.

1.5.4 Task-4

This task developed a methodological approach to identify typical travel patterns

by clustering multi-density B-OD matrices. The methodology to cluster B-OD

matrices specifically included:

Deploying proposed statistical metrics (explained in Chapter 3) as structural

proximity measures in a simple three-level approach. The methodology

identifies optimum DBSCAN parameters and clusters multi-density datasets

(OD matrices).

A practical demonstration of the proposed clustering methodology with a

case study application using real Brisbane Bluetooth data from 415 days.

1.6 SIGNIFICANCE AND SCOPE

This thesis contributes to the future of travel demand estimation by exploiting

the additional knowledge from emerging data sources, such as Bluetooth. The

significance of this research is two-fold:


1. Improve the current practice of OD demand estimation: State-of-the-art

techniques and practice depend greatly on observed traffic counts for traffic

demand estimation. However, accurate estimation of traffic counts close to

observed ones does not guarantee the correct estimation of traffic demand

due to under-determinacy. Thus, additional knowledge from big traffic data

should fill this gap.

2. Alternatives to demand modelling: Traffic modelling is a computationally

intensive and expensive process. There is growing interest in alternative

techniques that are mostly data-driven. With the availability of pervasive

data sets, a good blend of theoretical and empirical models should reduce

the dependence on developing complex mathematical models for traffic

simulation. Data-driven empirical models that use partial observed OD/path

flows information can relax the assumptions involved in an assignment

matrix that maps the relationship between unobserved OD flows and

observed traffic counts. This has huge computational benefits, especially in

the space of dynamic OD estimation.

While this study focusses on the major issues of state-of-the-art techniques and

practice, it has the following limitations:

1. This study is limited to static OD demand estimation only. However, it can

be extended to dynamic and quasi-dynamic models.

2. This study focusses only on improving the objective function formulation and does not focus on new solution algorithms. A classical gradient descent algorithm was used for testing the proposed approaches. However, the proposed objective function can be readily used in state-of-the-art algorithms, such as simultaneous perturbation stochastic approximation (SPSA) (see Section 2.3).

3. The non-assignment-based methodology was based on the assumption that

turning proportions are available at all intersections. The formulation was

based on Bluetooth OD flows and not on Bluetooth subpath flows.


1.7 DEFINITIONS

The definitions of a few key terms used in this study are provided below:

Assignment matrix: is the mapping relationship between the OD matrix and link

flows. Thus, it is also referred to as the link-proportion matrix because it represents the

proportion of the OD flows passing through a link.

Bluetooth subpath: is the Bluetooth trajectory represented as the sequence of

BMSs that is a part of the actual trip sequence.

Geographical window: refers to the group of OD pairs within the geographical

boundaries of a higher-order OD pair.

Link flows/traffic counts: the flows observed on any link are referred to as link

flows.

Local window: the local window from an OD matrix refers to a sub-matrix that consists of a group of OD pairs.

OD matrix: is a tableau representation of travel demand (in terms of trips)

between different origin and destination pairs. Each cell of the OD matrix represents

an OD pair and the value refers to OD pair demand or simply OD flows.

Path flows: refer to the portion of OD flows passing through any path.

Path proportion matrix: represents the proportions of OD flows passing through

a path.

Structure of the OD matrix: is defined as the arrangement of, and the correlations that exist between, OD pairs within the OD matrix.

Structure of Bluetooth trips: refers to the inherent structural information present

within a group of Bluetooth inferred trips.

Structural correlation: is the correlation that exists between groups of OD pairs or paths when they share similar activities, travel costs, or zones of similar geography, etc.

Target/Prior OD matrix: refers to the best historical estimate developed from an

outdated travel survey.

Turning proportion: refers to the ratio of the turning volumes to the approach

volumes at an intersection.


Typical OD matrix: refers to an OD matrix that represents a typical travel pattern

observed in the OD matrices belonging to a certain type (cluster).

1.8 THESIS OUTLINE

The chapters for the remainder of the thesis are outlined below:

Chapter 2: provides a comprehensive review of the studies pertaining to OD

matrix estimation, with special attention paid to the problem formulation, solution

algorithms, OD matrix structural information, statistical performance measures, and

the types of measurements widely used in OD estimation problems.

Chapter 3: discusses the development of statistical metrics for structural

comparison of OD matrices. The robustness of the proposed metrics is further tested

through sensitivity analysis.

Chapter 4: discusses the development of an assignment-based OD matrix

estimation methodology using additional structural knowledge of Bluetooth trips.

Chapter 5: discusses the development of non-assignment-based OD matrix

estimation methodology using observed turning proportions and Bluetooth structural

knowledge.

Chapter 6: proposes a methodological approach for clustering B-OD matrices

and identifying typical travel patterns with a case study application using Brisbane

Bluetooth data.

Chapter 7: provides the conclusion for this research, and also includes the future

scope and recommendations.


Chapter 2: Literature Review

This chapter provides a comprehensive review of the literature with respect to

the OD matrix estimation problem. First, the methods of OD matrix estimation are

broadly classified in Section 2.1. A comprehensive review of previous studies is then

provided from five different perspectives: problem formulations (Section 2.2), solution

algorithms (Section 2.3), OD matrix structural information (Section 2.4), statistical

performance measures (Section 2.5), and indirect/partial measurements of OD flows

(Section 2.6). Finally, a summary of the chapter is provided in Section 2.7.

2.1 BACKGROUND OF OD MATRIX ESTIMATION

Willumsen (1978) classified the methods of origin-destination matrix estimation

into three major categories: a) survey-based, b) trip-distribution model-based, and c)

traffic counts-based.

a) Survey-based: Here, the OD matrix is estimated through direct household surveys and/or roadside interviews. Such surveys are generally expensive and cumbersome, and it is almost impossible to ‘truly’ capture the entire demand, especially for larger networks. Therefore, researchers apply different sampling techniques, such as cluster sampling and geographic and demographic stratification, to collect data (generally a travel diary) on a smaller scale and use grossing-up factors to estimate a full OD matrix (Willumsen, 1978). The OD matrix from this method approximates OD patterns for a particular period and, as such, is often outdated for current practical applications.

b) Trip distribution model-based: Trip distribution models estimate the trip

interchanges between zones based on land use, transportation characteristics,

and the distance between the zones. Models used in the trip distribution stage

are generally gravity models based on the gravitational theory of Newtonian

physics (Martin & McGuckin, 1998; Wilson, 1967). The generic form of the

gravity model is shown in Equation (4).


$$X_w = P_o \, \frac{A_d \, F_{od} \, K_{od}}{\sum_{d'=1}^{D} A_{d'} \, F_{od'} \, K_{od'}} \qquad (4)$$

Where, $X_w$ is the OD demand of the wth OD pair (oth origin and dth destination); $P_o$ are the trips produced from the oth zone; $A_d$ are the trips attracted to the dth zone; $F_{od}$ is the friction factor (based on existing/estimated travel times or distances) indicating the temporal/spatial separation between the two zones; $K_{od}$ is the trip-distribution adjustment factor for the interchanges between the two zones; and $D$ refers to the number of destinations. The drawback of these models is that model calibration is expensive and they are generally not transferable over space and time.

c) Traffic counts-based: The third method, which became quite attractive in the early 1980s, is estimating OD matrices from observed traffic counts. Traffic count data, previously limited to studies of traffic control, accidents, maintenance planning, road construction and intersection improvements, were later extended to OD demand estimation by researchers such as Robillard (1975). Traffic flows are generally used in two ways to estimate the OD matrix: (i) to calibrate the parameters of demand models; and (ii) in an optimisation formulation, where the OD matrix estimation is solved as the inverse of the assignment problem (Cascetta, 1984).
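The gravity model described under item (b) can be illustrated with a minimal sketch. It assumes a production-constrained form in which each origin's trips are split over destinations in proportion to attraction times friction times adjustment factor. The function name `gravity_model`, the three-zone data and the inverse-square friction factors are hypothetical choices made only for illustration.

```python
def gravity_model(productions, attractions, friction, k_factors=None):
    """Split the trips produced at each origin over all destinations.

    X[o][d] = P[o] * A[d] * F[o][d] * K[o][d] / sum_j(A[j] * F[o][j] * K[o][j])
    """
    n_orig, n_dest = len(productions), len(attractions)
    if k_factors is None:
        # No zone-to-zone adjustment: every K factor equals 1.
        k_factors = [[1.0] * n_dest for _ in range(n_orig)]
    od = []
    for o in range(n_orig):
        weights = [attractions[d] * friction[o][d] * k_factors[o][d]
                   for d in range(n_dest)]
        total = sum(weights)
        od.append([productions[o] * w / total if total > 0 else 0.0
                   for w in weights])
    return od


# Hypothetical three-zone example; friction assumed to be 1 / travel_time**2.
travel_time = [[5.0, 10.0, 20.0], [10.0, 5.0, 10.0], [20.0, 10.0, 5.0]]
friction = [[1.0 / t ** 2 for t in row] for row in travel_time]
productions = [1000.0, 500.0, 800.0]
attractions = [700.0, 900.0, 700.0]
od_matrix = gravity_model(productions, attractions, friction)
```

In this form each row of `od_matrix` sums to the corresponding production, while the column sums are not forced to match the attractions; a doubly constrained variant would need an additional balancing step.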

Among the above-mentioned methods, the traffic counts-based method has

gained more importance, as it is cost effective and can provide seamless traffic data

that facilitates tracking of traffic evolution in a convenient and efficient way (Cascetta,

1984). Over the last three decades, almost all studies have been based on these indirect

measurements, although the quality of an OD matrix estimated based on this approach

is still being questioned (Stathopoulos & Tsekeris, 2003). Several researchers have

addressed the problem of OD estimation from different perspectives, such as temporal

dimension – static (Yang, 1995) to time-dependent/dynamic (Ashok, 1996), spatial

dimension – intersection (Cremer & Keller, 1981) to large scale networks (Osorio,

2017), traffic assignment – uncongested (Hazelton, 2000) to congested networks

(Frederix, Viti, & Tampère, 2011), optimisation algorithms – deterministic (Cascetta,

1984) to stochastic (Ma & Qian, 2018), and performance measures – simple cell-based


(Cascetta, 1984) to structure-based (Djukic, et al., 2013). Nevertheless, most

methods are still dependent on traffic counts-based formulations only.

2.2 PROBLEM FORMULATION

The OD matrix adjustment became a relevant research and practical problem

after the research community started to treat it as an optimisation problem in the early

1980s (Cascetta, 1984; Fisk & Boyce, 1983). The mathematical foundations were

initially laid by considering it a static problem and then extended to dynamic and quasi-

dynamic conditions, as further discussed in detail in the following sub-sections.

2.2.1 Static OD formulation - uncongested networks

Previously, research contributions were limited to uncongested networks assuming proportional assignment. Proportional assignment assumes that link cost (travel time) is not dependent on link flows, and is thus independent of demand (Bell, 1983; Robillard, 1975). The following sub-sections present the different mathematical models proposed for static OD demand estimation in uncongested traffic conditions.
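As a minimal sketch of proportional assignment, the example below maps OD flows to link flows through a fixed link-proportion matrix. It also shows why counts alone under-determine the OD matrix. The matrix `P`, the two OD vectors and the helper name `assign` are hypothetical illustration values, not taken from the thesis.

```python
def assign(P, X):
    """Link flows Y = P X for a fixed link-proportion matrix P (links x OD pairs)."""
    return [sum(p_lw * x_w for p_lw, x_w in zip(row, X)) for row in P]


# Two observed links, three OD pairs: fewer equations than unknowns.
P = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]

X_a = [10.0, 20.0, 30.0]
X_b = [20.0, 10.0, 40.0]

Y_a = assign(P, X_a)
Y_b = assign(P, X_b)
```

Both OD vectors reproduce the same link flows, so traffic counts alone cannot distinguish between them. The formulations that follow resolve this ambiguity with additional information such as a target matrix.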

2.2.1.1 Information Minimisation/Entropy Maximisation

In these methods, the OD matrix is estimated by minimising the measure of

distance (or maximising the entropy) from the target/historical trip matrix adhering to

the constraint that the observed flows are reproduced back on some of the links after

assigning the estimated OD demand onto the network (Van Zuylen, 1978; Van Zuylen

& Willumsen, 1980; Willumsen, 1984b).

(5)

Equation (5) minimises the information matrix, IM and the corresponding

solution is shown in Equation (5a):

(5a)

Where, is the proportion of passing through lth link and is the

corresponding weight factor.
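For orientation, a commonly quoted entropy-type formulation of this problem and its multiplicative solution can be sketched as follows. This is an illustrative form under assumed notation, not necessarily the exact expression of Equations (5) and (5a): $\hat{X}_w$ denotes the target flow, $\hat{Y}_l$ the observed count on link $l$, $p_{lw}$ the proportion of $X_w$ using link $l$, $\hat{L}$ the set of counted links, and $Z_l$ a link weight factor derived from the Lagrange multiplier $\lambda_l$ of the count constraint.

```latex
\begin{align}
\min_{X \ge 0}\; & \sum_{w} X_w \left( \ln \frac{X_w}{\hat{X}_w} - 1 \right) \\
\text{s.t.}\;    & \sum_{w} p_{lw} X_w = \hat{Y}_l \qquad \forall\, l \in \hat{L} \\
\Rightarrow\;    & X_w = \hat{X}_w \prod_{l \in \hat{L}} Z_l^{\,p_{lw}},
                   \qquad Z_l = e^{\lambda_l}
\end{align}
```

Setting the derivative of the Lagrangian with respect to $X_w$ to zero gives $\ln(X_w / \hat{X}_w) = \sum_l \lambda_l p_{lw}$, which is where the product form comes from.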


2.2.1.2 Maximum Likelihood approach

OD demands and traffic observations are random in nature and are subject to sampling and measurement errors, respectively. Therefore, statistical inference for OD matrix estimates has become relevant, with methods such as the maximum likelihood (ML) approach being developed (Spiess, 1987).

In Spiess's (1987) ML approach, the likelihood of observing the “target” OD matrix and the observed traffic counts is maximised, conditional on the estimated OD matrix. The elements of the target OD matrix are obtained from simple random sampling and are assumed to follow a multinomial distribution. For larger samples, the distribution can be Poisson. The traffic observations are also assumed to be Poisson distributed. The distributions of the target OD matrix and the traffic counts are assumed to be independent; thus, the likelihood of observing both sets is the product of the two likelihoods. The likelihood of observing the target OD demand and the observed traffic flows is expressed in Equation (6):

(6)

The solution of Equation (6) is the X that maximises the product of likelihoods. Since maximising the logarithm of Equation (6) is the same as maximising the likelihood itself, it is convenient to use a logarithmic form for estimating the solution. Assuming that

the sample OD flows follow a MNL distribution, and is the sampling

fraction for trips. For example, is obtained by observing an independent Poisson

process with a mean of , the probability of observing is

; (6a)

(6b)

The logarithm of the likelihood/probability can then be expressed by

Equation 6c,

(6c)

Similarly, if the sample of observed traffic counts is very small, then Poisson

distribution can be used to express as shown in Equation 6d,


(6d)

In Equation 6d, is the flow volume in lth link resulting after assigning the

OD flows, .

Assuming a proportional assignment where, is the proportion of Xw flowing

in lth link, the OD demand estimation problem is formulated as shown in Equations 6e,

6f, and 6g.

(6e)

(6f)

(6g)

The value of X is estimated by maximising Equation (6e) with respect to X.
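The ML objective can be sketched under the Poisson assumptions described above. The sampled target flow of each OD pair is assumed to be Poisson with mean equal to the sampling fraction times the true flow, and each link count is assumed to be Poisson with mean equal to the assigned flow. The names `log_likelihood`, `rho`, `X_target` and `Y_obs`, and all numbers, are assumptions made for this illustration; additive constants that do not depend on X are dropped.

```python
import math


def log_likelihood(X, X_target, rho, Y_obs, P):
    """Poisson log-likelihood of sampled target flows and link counts,
    up to additive constants that do not depend on X."""
    ll = 0.0
    for x_w, xt_w, rho_w in zip(X, X_target, rho):
        mean = rho_w * x_w                        # expected sampled flow
        ll += xt_w * math.log(mean) - mean
    for row, y_obs in zip(P, Y_obs):
        y_l = sum(p * x for p, x in zip(row, X))  # proportional assignment
        ll += y_obs * math.log(y_l) - y_l
    return ll


P = [[1.0, 1.0], [0.0, 1.0]]
rho = [0.1, 0.1]
X_target = [10.0, 20.0]       # sampled flows, consistent with X = [100, 200]
Y_obs = [300.0, 200.0]

ll_consistent = log_likelihood([100.0, 200.0], X_target, rho, Y_obs, P)
ll_perturbed = log_likelihood([120.0, 180.0], X_target, rho, Y_obs, P)
```

Because each Poisson term is largest when its mean equals the observed value, the OD vector that reproduces both the sample and the counts scores higher than the perturbed one.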

2.2.1.3 “Generalised Least Squares” (GLS) approach

This approach is similar to the ML approach, but assumes that the target OD

matrix and traffic volumes are generated by some sort of probability distribution

functions. By estimating the parameters of these distributions, the OD matrix is finally

estimated. The attractive feature of the GLS approach is that it allows traffic survey data and observed traffic counts to be combined to estimate an OD demand matrix, while accounting for the relative accuracies of both data sources. If the traffic counts and target OD matrix are assumed to follow multivariate normal distributions, then the GLS estimator coincides with the maximum likelihood estimator. Since the OD matrix and observed flows are probabilistic in nature, the direct estimate (target OD) of the OD matrix and of the flow vector Y can be expressed in matrix notation, as shown in Equations (7) and (7a):

(7)

(7a)

In Equation (7), the mean values of , and are assumed to be zero, with no

distributional assumptions. It is generally assumed that is same as . The


dispersion matrices of and are V and W, respectively. In Equation (7a), is

the function of and where is the proportional assignment matrix, the cells of

which express the proportions of the OD demand flowing in each of the network

links. The proportional assignment method simplifies the estimation process by

assuming that the is independent of . Thus, the other advantage of a statistical

approach is that it can accommodate sampling variance and also the temporal

fluctuation of the OD demand if it is significant (Cascetta, 1984). According to

Cascetta (1984), even “heavier” approximations of dispersion matrices can produce

results better than those of a maximum entropy estimator.

The vectors and are mutually independent, and inverse of the

corresponding dispersion matrices; that is, V and W, are used as weight factors

and , respectively, as expressed in Equation (2).

The estimate, X is obtained by solving Equation (8).

(8)

(8a)
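A minimal sketch of a GLS estimator under proportional assignment is given below. It minimises the squared deviations from the target matrix and from the counts, each weighted by the inverse of its variance, using a plain gradient descent with a non-negativity projection. Diagonal dispersion matrices are assumed for simplicity, and the names `gls_objective`, `gls_gradient_descent`, `v_var` and `w_var`, the step size and the data are all hypothetical.

```python
def gls_objective(X, X_target, Y_obs, P, v_var, w_var):
    """Weighted squared deviations from the target OD flows and the link counts."""
    od_term = sum((xt - x) ** 2 / v for x, xt, v in zip(X, X_target, v_var))
    Y = [sum(p * x for p, x in zip(row, X)) for row in P]
    link_term = sum((yo - y) ** 2 / w for y, yo, w in zip(Y, Y_obs, w_var))
    return od_term + link_term


def gls_gradient_descent(X_target, Y_obs, P, v_var, w_var, step=0.1, iters=500):
    """Minimise the GLS objective, starting from the target matrix."""
    X = list(X_target)
    for _ in range(iters):
        Y = [sum(p * x for p, x in zip(row, X)) for row in P]
        # Gradient of the target-matrix term.
        grad = [-2.0 * (xt - x) / v for x, xt, v in zip(X, X_target, v_var)]
        # Gradient of the link-count term.
        for l, row in enumerate(P):
            resid = (Y_obs[l] - Y[l]) / w_var[l]
            for w, p in enumerate(row):
                grad[w] -= 2.0 * p * resid
        # Descent step, projected onto non-negative flows.
        X = [max(0.0, x - step * g) for x, g in zip(X, grad)]
    return X


P = [[1.0, 1.0], [0.0, 1.0]]
X_target = [90.0, 210.0]      # outdated target; the flows behind Y_obs are [100, 200]
Y_obs = [300.0, 200.0]
v_var = [100.0, 100.0]        # low confidence in the target matrix
w_var = [1.0, 1.0]            # high confidence in the counts
X_est = gls_gradient_descent(X_target, Y_obs, P, v_var, w_var)
```

With these weights the estimate moves from the target towards the flows implied by the counts. Swapping the variances would keep it close to the target instead.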

2.2.1.4 Bayesian approach

Maher (1983) proposed a Bayesian approach to estimate the posterior probability

of X from the prior probability distribution, ( ) and observed traffic flows,

. In this approach, the target OD matrix is considered a prior probability

function of the estimated OD matrix (note that no distribution is assumed for prior OD

flows in MLE). If the prior OD matrix is completely reliable, then however remarkable the random observations of traffic counts are, they will not have any effect on the estimated OD matrix. Observed flows will have some impact only when there is little confidence in the prior OD matrix. Bayesian techniques can be used when there are varying degrees of belief for different OD pairs. For example, recent transport survey data can establish more confidence in the prior OD flows of a few OD pairs. Similarly, at an intersection level, the turning flows for a few approaches may be known more accurately; for example, when certain turning movements are banned or impossible.

Although the statistical properties are similar, the role played by differentiates

Bayesian from ML and GLS techniques (Maher, 1983). In ML and GLS approaches,

is the parameter of the likelihood functions, and in Bayesian technique,


is the random variable with given prior distributions, ( ). The posterior

probability of observing is conditional on the observed traffic counts. The

expressions for posterior probability and optimisation function to estimate from the

feasible solution set ℛ are shown in Equations (9) and (10), respectively.

(9)

(10)
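The role of prior confidence can be illustrated with a deliberately simplified sketch. It assumes a single OD flow with a normal prior and one normally distributed observation of that flow, which is a textbook conjugate update rather than the full formulation of Equations (9) and (10). The function name `posterior_normal` and all numbers are hypothetical.

```python
def posterior_normal(prior_mean, prior_var, observation, obs_var):
    """Posterior mean and variance of a flow given a normal prior and one
    normally distributed observation of the same flow."""
    precision = 1.0 / prior_var + 1.0 / obs_var
    mean = (prior_mean / prior_var + observation / obs_var) / precision
    return mean, 1.0 / precision


# Prior OD flow of 100 trips; a count implies 140 trips (variance 25).
confident_mean, _ = posterior_normal(100.0, 0.01, 140.0, 25.0)
vague_mean, _ = posterior_normal(100.0, 2500.0, 140.0, 25.0)
```

With a very small prior variance the posterior stays at the prior value regardless of the count, whereas a vague prior lets the observation dominate.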

2.2.2 Static OD formulation - congested networks

Although the above-mentioned models laid strong mathematical foundations for

the OD matrix estimation problem, they also suffer from a few limitations. Firstly,

traditional entropy maximisation models never consider the probabilistic nature of OD

flows and link flows. Normality also does not hold well for larger flow values. Thus,

a Bayesian approach might fail if normal distribution is considered for both OD flows

and link flows. Normal distribution is widely considered because it is compatible with

proportional assignment. Poisson distribution can replace normal distribution, but at

the cost of computational effort.

The second and major limitation is that these models are developed assuming uncongested network conditions. The strong assumption that the route choice proportions are determined independently, outside the OD demand estimation process, is unrealistic in congested networks; congestion therefore needs to be accounted for in OD flow estimation. The following sections discuss some of the seminal contributions on congested networks.

2.2.2.1 Nguyen (1976)’s approach

One way to address the issues related to proportional assignment is to assume an equilibrium assignment in the modelling framework. The first equilibrium-based approach was proposed by Nguyen in the late 1970s. According to this method, the anticipated solution matrix is one where the assigned network can reproduce travel times close to the ones that correspond to the observed link flows. The relationship between link flows and travel times is given by a link cost function, C(Y), which is assumed to be a known monotone increasing function (Nguyen, 1976, 1977). The drawback of this approach is that the objective is not strictly convex w.r.t. the OD matrix X, and it does not guarantee a unique solution. The generic form of Nguyen's (1976) approach is shown in Equation (11). The set of Equations (11a)-(11d) detail the formulations of Nguyen's (1976) equilibrium model.

(11)

(11a)

; w ϵ W (11b)

; l ϵ L (11c)

(11d)

Where, , , , , are link flows on link l , travel cost on link l,

average cost of travel for wth OD pair, set of paths for wth OD pair, path flow on path

k, and Kronecker Delta function for link l in path k, respectively.
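For readers without access to the original typeset equations, the equilibrium formulation usually attributed to Nguyen can be sketched as below. The symbols are assumptions chosen to match the quantities listed above: $y_l$ is the flow and $c_l(\cdot)$ the cost on link $l$, $\hat{u}_w$ is the observed average travel cost for OD pair $w$, $K_w$ is its set of paths, $h_k$ is the flow on path $k$, and $\delta_{lk}$ equals 1 if link $l$ lies on path $k$ and 0 otherwise.

```latex
\begin{align}
\min_{X,\,Y,\,h}\; & \sum_{l \in L} \int_{0}^{y_l} c_l(s)\, ds \;-\; \sum_{w \in W} \hat{u}_w X_w \\
\text{s.t.}\;      & \sum_{k \in K_w} h_k = X_w, \qquad w \in W \\
                   & y_l = \sum_{w \in W} \sum_{k \in K_w} \delta_{lk}\, h_k, \qquad l \in L \\
                   & h_k \ge 0, \qquad X_w \ge 0
\end{align}
```

The first term is the familiar user-equilibrium objective; the second term ties the equilibrium OD costs to the observed ones.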

At equilibrium, the path travel cost is given by Equation (11e), where, is

the equilibrium link flow on link l.

(11e)

Equation (11e) is convex in the link flows Y, but might not be strictly convex w.r.t. the OD matrix X. This implies that it might not lead to a unique solution for X. To address this, some researchers have proposed the generic form shown in Equation (12) to reduce the solution search space.

(12)

For example, Gur (1980a) (see Equation (13)) and Jornsten and Nguyen (1979) (see Equation (14)) proposed to solve for the X closest to the target matrix from the set of optimal solutions, ℛ.

(13)

(14)


2.2.2.2 Combined Distribution and Assignment

Instead of solving this as two separate equations (i.e., Equations (11) and (12)),

Fisk and Boyce (1983) proposed a combined formulation known as combined

distribution and assignment (CDA). While it is promising, it has the following

drawbacks: first, it assumes that there are no inconsistencies in the observations of

traffic counts; and second, it requires all links in the network to contribute to the

observed link flows. The generic form is shown in Equation (15) and the detailed

formulation for CDA is shown in Equation (15a).

(15)

(15a)

Where, is the cost on the shortest route for wth OD pair, and are the

weight factors for the two objectives. The rest of the terms have meanings as defined

previously.

2.2.2.3 Basic bi-level formulation

The previous approaches solved for X either using two separate formulations or through a combined formulation. In the bi-level formulation, however, the equilibrium assignment and the OD matrix estimation are solved as two mutually dependent sub-problems. This approach is similar to a Stackelberg game in game theory, where the leader moves first and estimates X, subject to its constraints, to minimise its objective function while considering the reaction of the follower; that is, the user-equilibrium assignment (Yang, Sasaki, Iida, & Asakura, 1992).

Bi-level means that the OD matrix estimation is not a straight-forward

optimisation in a single formulation. A bi-level approach can be used as an efficient

approach to estimate the OD matrix and route choice simultaneously under congested

traffic conditions (Yang, et al., 1992).

The advantages of the bi-level formulation are as follows. First, the model always results in a feasible solution, even if the traffic counts are inconsistent, although there is no guarantee that the solution is close to the ground observations. Second, the model only requires a subset of link flows; that is, not all links in the network need to contribute observed link flows. Third, the route choice proportions are determined endogenously, and equilibrium link flows and the OD matrix are determined simultaneously (Yang, 1995).

The generalised framework of the bi-level problem is shown in Equation (16).

Upper level:

(16)

Lower level: (16a)

Most studies assume a generalised least squares or entropy maximising model in

the upper-level, and equilibrium assignment as a lower-level problem.
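The interplay between the two levels can be sketched with a toy example, assuming one OD pair served by two parallel routes with BPR-type cost functions. The lower level finds the user-equilibrium split for a given demand; the upper level searches for the demand whose equilibrium flow on the counted route matches the observation. For brevity the upper level here matches counts only, with no target-matrix term. The free-flow times, capacities, search bounds and function names are all hypothetical.

```python
def bpr_cost(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """BPR-type link cost function."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


def user_equilibrium(demand, t0=(10.0, 12.0), cap=(1500.0, 1500.0)):
    """Lower level: equilibrium flow on route 1 for two parallel routes."""
    def gap(y1):
        return bpr_cost(y1, t0[0], cap[0]) - bpr_cost(demand - y1, t0[1], cap[1])

    if gap(demand) <= 0.0:   # route 1 is never worse: all trips use it
        return demand
    if gap(0.0) >= 0.0:      # route 2 is never worse: no trips use route 1
        return 0.0
    lo, hi = 0.0, demand
    for _ in range(100):     # bisection on the cost difference
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_demand(count_route1, max_demand=10000.0):
    """Upper level: demand whose equilibrium flow on route 1 matches the count."""
    lo, hi = 0.0, max_demand
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if user_equilibrium(mid) < count_route1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


true_demand = 3000.0
observed_count = user_equilibrium(true_demand)
estimated_demand = estimate_demand(observed_count)
share_low = user_equilibrium(1000.0) / 1000.0
share_high = observed_count / true_demand
```

The share of demand on route 1 changes with the demand level, which is the sense in which route choice proportions are endogenous. Every evaluation of the upper-level search requires a full lower-level solve, which is the computational burden discussed in this sub-section.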

In comparison, Nguyen (1976)’s and Fisk and Boyce (1983)’s CDA approaches

neglect the second term (i.e., is zero) of the objective function shown in Equation

(16). In contrast, Spiess (1990) considered to be zero, and assumed equilibrium link

flows.

Despite its advantages, there are a few limitations to the bi-level approach. Bi-

level programming problems are generally difficult to solve because the objective

function in the upper-level can be evaluated only after solving the optimisation

problem in the lower level. This framework is non-convex and non-differentiable, and

as such, may not lead to a global optimum solution. Because the second term is convex,

Spiess (1990) relaxed the first term by assuming that the target matrix is almost

accurate and traffic counts can be used to arrive at an OD matrix estimate as close as

possible to the target matrix. Although it is suitable for large networks, the

methodology limits the solution to a local optimum only.

To overcome the limitations of bi-level formulation, Yang (1995) incorporated

a network equilibrium model in terms of variational inequalities, and claimed that, “the

convexity of the upper level formulation is so strong that it is most likely to converge

at global optimum”. Yang (1995) proposed a GLS formulation for the upper level, as

shown in Equation (17). Note that in GLS, and are and , respectively.

(17)


; s.t. (17a)

Lower level: (17b)

subject to ; w ϵ W (17c)

(17d)

Where, is the user-optimal link flows vector; is the set of feasible link

flows solutions for OD matrix, X; and C(Y) is the vector of network link travel costs.

The meanings of the other terms were provided previously.

2.2.2.4 Stochastic bi-level formulation

Although the equilibrium assignment approach might capture congested traffic

conditions, it still lacks the ability to estimate an OD matrix X that can reproduce the

observed flows due to errors and inconsistencies of the observed link flows. Fisk

(1989) mentioned that no OD matrix X assigned to the network can satisfy the

observed link flows, because most of the models assume that traffic counts are

available from all links, and that inconsistencies are removed by certain pre-processing

techniques. To address this, some researchers have proposed a stochastic approach in

the bi-level formulations. Jörnsten and Wallace (1993) considered traffic flows to be

random variables. Because the user equilibrium-based models assume that the perception of travel costs does not vary among travellers, it is more realistic to consider randomness through stochastic user equilibrium. For instance, between-driver variability, expressed through a dispersion parameter that describes road users’ perception of travel costs (larger values of the parameter mean little between-driver variation in perceived costs), can be considered within the logit models for stochastic loading (Maher, 1998; Maher, et al., 2001). The upper-level formulation is similar to Equation (17) and the lower-level stochastic formulation is shown in Equation (18).

Lower level:

(18)

Where, $S_w$ is the satisfaction function arising from stochastic loading based on the link flows. It is calculated as shown in Equation (18a), where $c_k^w$ is the path cost through the kth path between the wth OD pair (Maher, 1998).


= (18a)
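A minimal sketch of logit-based stochastic loading for one OD pair is shown below. It assumes the standard logit form, in which path shares are proportional to exp(-theta x cost) and the satisfaction is the log-sum of these terms divided by -theta. The symbol `theta` for the dispersion parameter, the function name `logit_loading` and the path costs are assumptions for illustration.

```python
import math


def logit_loading(path_costs, theta):
    """Logit path-choice probabilities and satisfaction (expected minimum
    perceived cost) for one OD pair."""
    c_min = min(path_costs)
    # Shift by the minimum cost for numerical stability.
    weights = [math.exp(-theta * (c - c_min)) for c in path_costs]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    satisfaction = c_min - math.log(total) / theta
    return probabilities, satisfaction


costs = [10.0, 11.0, 14.0]
p_spread, s_spread = logit_loading(costs, theta=0.1)   # large perception variability
p_sharp, s_sharp = logit_loading(costs, theta=10.0)    # little perception variability
```

With a large dispersion parameter nearly all trips load onto the cheapest path and the satisfaction approaches the minimum path cost, which is the deterministic user-equilibrium limit.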

Some researchers proposed using even in the upper level formulation (Lo &

Chan, 2003; Wang et al., 2016), as shown in Equation (19). The lower level formulation

is similar to Equation (18a).

(19)

In Equation (19), is the dispersion matrix of , and the other terms have their

usual meanings.

2.2.2.5 StrUE bi-level formulation

Dixit, Gardner, and Waller (2013) introduced stochasticity into the user-

equilibrium through the concept of strategic equilibrium (StrUE), and stated that the

path travelled by each user, in a given demand scenario, is chosen regardless of the

realized travel demand on a given day. Because user-equilibrium link flows are

dependent on demand and its distribution, insensitivity to demand realisation implies

that the actual link flow observations may not be from the user-equilibrium state. In

other words, it can be considered that the user-equilibrium exists stochastically across

all demand realisations.

In the StrUE-based bi-level framework, the upper level provides the total mean

demand and its variance to the lower level (StrUE model), which in turn provides the

mean and variance of link flows to the upper level. Incorporation of higher order

variables; that is, mean and variance, facilitates the optimisation model to incorporate

the daily variations in the link flows (Wen, Cai, Gardner, Dixit, & Waller, 2014). The

upper level formulation tries to minimise the deviation between: a) the mean of the

observed link flows and the estimated mean of the link flows, and b) the standard

deviation of observed link flows and estimated standard deviation of the link flows, as

described in Equation (20) (Wen, et al., 2014).

Upper level:

(20)

Lower level:

(20a)


s.t. (20b)

Where, and are the expected and standard deviations of total demand, T;

and are the expected and standard deviations of ; that is, observed link

flows on link l; g(T) is the log-normal density function of total demand, T; pl is the

proportion of T on link, l; cl is the travel cost for link, l; is the link cost for link,

l at free flow condition; and are Bureau of Public Roads (BPR) parameters

(Manual, 1964). Note that in Equation (20), total demand T is the variable and not OD

demand, which is later estimated from proportions that are assumed to be known.
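The upper-level idea of matching both the mean and the standard deviation of link flows can be sketched as follows. The sketch assumes that the link proportions are fixed across demand realisations, so the modelled flow on link l is its proportion times the total demand T. The function name `strue_upper_objective` and the numbers are hypothetical, and the lower-level StrUE assignment that would supply the proportions is not modelled here.

```python
def strue_upper_objective(mean_T, std_T, proportions, obs_mean, obs_std):
    """Sum of squared deviations between observed and modelled link-flow
    means and standard deviations, with modelled flow y_l = p_l * T."""
    total = 0.0
    for p_l, m_obs, s_obs in zip(proportions, obs_mean, obs_std):
        total += (m_obs - p_l * mean_T) ** 2 + (s_obs - p_l * std_T) ** 2
    return total


proportions = [0.6, 0.4]
# Synthetic observations generated from a mean demand of 5000 and std of 500.
obs_mean = [p * 5000.0 for p in proportions]
obs_std = [p * 500.0 for p in proportions]

objective_true = strue_upper_objective(5000.0, 500.0, proportions, obs_mean, obs_std)
objective_biased = strue_upper_objective(5500.0, 500.0, proportions, obs_mean, obs_std)
```

The objective vanishes at the demand mean and standard deviation that generated the observations and grows as either moves away from them.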

2.2.2.6 Single-level formulation

While many efforts, as discussed above, have been made with respect to randomness, errors, and inconsistencies of observations and user-equilibrium models, the bi-level framework still depends only on traffic counts. Given the limitations of bi-level methods and the under-determinacy problem of traffic count-based formulations, the search for better methods has continued. With the availability of additional data from emerging sources such as Bluetooth trajectories, it has recently become possible to move beyond purely traffic counts-based methods.

Michau, et al. (2016) proposed a methodological framework that relaxes the need for an assignment formulation, which means that there is no need for a bi-level framework. They developed a link dependent OD matrix (Link-OD) method that directly includes assignment information through the observations of Bluetooth inferred trajectories. The observed trajectories are represented in terms of a Link-OD matrix. The CDA method proposed by Fisk and Boyce (1983) is also a single-level formulation. The key difference between the CDA and Link-OD methods is that the latter implicitly includes assignment through observed path flows, while the former estimates it through optimisation. The single-level formulation proposed by Michau, et al. (2016) is expressed as shown in Equation (21).

(21)


(21a)

(21b)

Where represents the Link OD matrix, is the portion of Bluetooth OD

flows between oth origin and dth destination flowing on link, l; and is the ratio of

Bluetooth counts to observed traffic counts on link, l. Objectives F1, F2, and F3 deal explicitly with the remaining conditions; that is, the consistency constraint, Kirchhoff’s law, and total variation, respectively. The consistency constraint ensures that the total flow is not less than the Bluetooth flow. Kirchhoff’s law (from physics) conserves the flows at an intersection, and the total variation function imposes the condition that the flows on two paths with close origins and the same destination should be similar. The estimated Link-OD is then converted to the OD matrix through the formulation of Equation (22).

(22)

Where, E1 is the incidence matrix; that is, network-based information connecting nodes to the links (see Michau, et al. (2016) for further details). Note that E1 and Q are multiplied through a Hadamard product.

However, the major limitation of this method is that the penetration rate of

Bluetooth trips is assumed to be equal to the penetration rate of Bluetooth link counts.

This assumption cannot be verified, because the penetration rate of Bluetooth trips is generally unknown

due to the unavailability of ground truth.

2.2.3 Dynamic OD formulation

The methods described in the previous sub-sections have one thing in common

– they are all based on static formulations and cannot capture the dynamics of traffic

flows, such as hourly demand variation. Dynamic expressions of OD demand

are more appropriate than static OD demand for real-time applications.

During the same period that witnessed a great surge in static model

formulations there was also growing interest towards capturing traffic dynamics from

time-varying measurements. Several researchers intended to use time-dependent


variables for estimating OD flows as another means to tackle the under-determinacy

problem (Cremer & Keller, 1981; Cremer & Keller, 1987).

The pioneering work in defining the dynamic relationships between time

dependent OD flows and traffic flows can be attributed to Cascetta, Inaudi, and

Marquis (1993), who extended the concept of GLS estimator from static to dynamic

conditions and proposed two methods – simultaneous and sequential. The

“simultaneous” method considers time-dependent link flows from all time intervals in

a single set and estimates time-dependent OD flows in one step. On the other hand, the

“sequential” method estimates the OD matrix for each time interval based on link

counts and OD flows from the previous intervals. Although the “sequential” method

provided the foundation for up-coming dynamic models, it lacks predictive

capabilities. Thus, it is not suitable for real-time applications, such as predicting OD

flows for future time-steps.

Early works on dynamic OD estimation were based on the state space modelling

framework, especially Kalman filter (KF) algorithms (Ashok, 1996). Although the

Kalman filter algorithm first appeared in the transportation field in the early 1970s, it

was limited to the estimation of traffic densities (Gazis & Knapp, 1971). Okutani and

Stephanedes (1984) extended its formulation by considering an auto-regressive

process. However, the method was not appropriate because it considered OD flows as

state variables and the Kalman filter algorithm assumes normal distribution for state

variables. Because OD flows cannot be considered normally distributed, they cannot

form state variables. Most research works (Cremer & Keller, 1981) during that period

were also constrained to small-scale (closed) networks; that is, intersections, and the

dynamic relationship between the state variables and observed traffic measurements is

not complicated because there is no role of travel time in assignment.

However, the decade from 1990 to 2000 witnessed several research works

related to the dynamic OD demand estimation for open networks (Ashok, 1996; Bell,

1991; Chang & Wu, 1994; Hai, Akiyama, & Sasaki, 1998; Hu, 1996; Kang, 1999; Van

Der Zijpp, 1997). Among them, the concept of deviations of OD flows proposed by

Ashok (1996) has been cited by many other researchers. Ashok (1996) proposed the

use of deviations of OD flows from historical estimates as state variables, and

conducted experiments on open (linear) networks. Ashok's (1996) method is an

extension of Okutani and Stephanedes (1984), but is limited to an auto-regressive

formulation.

Most of the previous works assumed the complete availability of information

related to input and output flows, which might not always be true. However, the

beginning of the 21st century witnessed several advancements in information and

communication technology that provide additional traffic data. Observed

time-dependent traffic data, such as sample OD demand (AVI), turning ratios, travel

times, and trajectories of probes have begun to find their place in the measurement

equations of state space models (Antoniou, Ben-Akiva, & Koutsopoulos, 2006;

Asakura, Hato, & Kashiwadani, 2000; Barceló Bugeda, Montero Mercadé, Marqués,

& Carmona, 2010; Dixon, 2000; Dixon & Rilett, 2002; Kwon & Varaiya, 2005;

Mishalani, Coifman, & Gopalakrishna, 2002).

Many improvements have been made with respect to KF-based methods. Zhou

and Mahmassani (2006) addressed the limitation of the KF method with respect to

auto-regression assumption (as previously used by Ashok (1996)) that considers OD

flows to be stationary. This assumption rarely holds, because prevailing

demand patterns often differ from regular patterns.

Most efforts have not focussed on extending the bi-level framework to dynamic

OD space due to lack of analytical dynamic traffic assignment models (Ashok & Ben-

Akiva, 2000). KF-based techniques are predominantly based on a fixed assignment

matrix; however, the possibility of obtaining equilibrium assignment from traffic

simulation models has encouraged some researchers to extend the bi-level framework

to dynamic OD space (Cipriani, Florian, Mahut, & Nigro, 2010, 2011; Lu, Rao, Wu,

Guo, & Xia, 2015; Tavana, 2001; Zhou & Mahmassani, 2007; Zhu, 2007).

Until the year 2010, most studies were limited to linear networks only because

it relaxed the additional complexity of route-choice dimension in the dynamic

assignment formulation and the associated computational burden. However, some

researchers proposed alternative methods that are computationally efficient for larger

networks. To exploit the sparsity of the dynamic assignment matrix, Bierlaire and Crittin

(2004) developed an LSQR algorithm as an alternative to the KF method for large-scale

problems. Verbas, Mahmassani, and Zhang (2011) tried to solve the non-linear

problem in the upper level of the bi-level formulation using the robust optimisation software

KNITRO. Barceló, Montero, Bullejos, Linares, and Serch (2013) suggested a subset


of the most likely used paths obtained as a result of dynamic user equilibrium to relax

the complexity of dynamic assignment. Their work demonstrated further improvement

in computational efficiency because the Bluetooth-based travel time facilitated a linear

Kalman filter instead of a non-linear Kalman filter. Djukic (2014) proposed the

principal component analysis method to reduce the dimensionality problem of

dynamic OD estimation. Frederix, Viti, and Tampère (2013) tried a different approach

for solving large-scale network problems, and proposed a hierarchy-based approach

where the OD demand estimation is performed on each level separately. The outputs

of higher-level estimation are used as inputs for OD demand estimation at the lower

level.

2.2.4 Quasi-Dynamic formulation

There has been a growing interest in estimating OD matrices using quasi-

dynamic conditions. Considering the dimensionality of dynamic OD matrix estimation

problem, Cascetta, Papola, Marzano, Simonelli, and Vitiello (2013) proposed the

quasi-dynamic approach to minimise the imbalance between knowns (link flows and

OD flows mapping equations) and unknowns (OD flows) by using quasi-dynamic-

based generalised least squares to estimate time-dependent OD matrices. A quasi-

dynamic condition refers to a state that lies in-between static and dynamic conditions

of traffic flows. Traffic dynamics are known to change within-the-day and day-to-day

due to different travel-activity patterns. However, the quasi-dynamic assumption states

that for a larger reference period (say, a whole day), the distribution shares of OD flows

remain constant even though the origin flows change, as shown in Equation

(23):

(23)

In Equation (23), the OD flow between oth origin and dth destination during time-

slice, t is given by ; the trips generated from oth origin during t is ; and the

proportion of trips generated from oth origin to dth destination during t is given by

. The quasi-dynamic assumption states that the factor affecting changes

inherently during the larger time-period (say within a day); however, the factors

affecting are relatively constant.
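
As a rough illustration of the quasi-dynamic assumption, the following Python sketch builds time-dependent OD flows from time-varying origin totals and a single set of distribution shares that is held constant over the reference period. The function name `quasi_dynamic_od_flows`, the variable names, and all numbers are assumptions made for this example; they are not taken from Cascetta, et al. (2013).

```python
from typing import List


def quasi_dynamic_od_flows(
    generation: List[List[float]],
    shares: List[List[float]],
) -> List[List[List[float]]]:
    """Return flows[t][o][d] = generation[t][o] * shares[o][d].

    generation[t][o] varies by time slice t, whereas shares[o][d] is
    assumed constant over the whole reference period.
    """
    for row in shares:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each origin's shares must sum to 1")
    return [
        [[g_o * p for p in shares[o]] for o, g_o in enumerate(g_t)]
        for g_t in generation
    ]


# Hypothetical data: 2 time slices, 2 origins, 3 destinations.
generation = [[100.0, 50.0], [200.0, 80.0]]
shares = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
flows = quasi_dynamic_od_flows(generation, shares)

# Number of unknowns with and without the quasi-dynamic assumption.
n_slices, n_origins, n_dests = 2, 2, 3
unknowns_dynamic = n_slices * n_origins * n_dests            # one per (t, o, d)
unknowns_quasi = n_slices * n_origins + n_origins * n_dests  # totals + shares
```

Even in this small case the number of unknowns falls when the shares are shared across time slices, and the reduction grows with the number of time slices in the reference period.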

Experiments by Cascetta, et al. (2013) showed that the quasi-dynamic

assumption yielded better results compared to the simultaneous estimator of dynamic


OD estimation. Dynamic OD matrix estimation/prediction algorithms, such as a

Kalman filter, performed better when the prior time-dependent OD matrices were

estimated using quasi-dynamic approach instead of simultaneous estimators.

Aggregating time-dependent OD matrices estimated based on quasi-dynamic approach

seemed to provide better estimates of OD matrices for larger time periods (say for a

peak-period or even daily OD matrix).

While the quasi-dynamic assumption worked well for the off-line OD matrix

estimation, some efforts have also been made to introduce this concept into online

estimation/prediction algorithms such as a Kalman filter. Marzano, Papola, Simonelli,

and Papageorgiou (2018) proposed a quasi-dynamic augmented extended Kalman

filter, with the results showing an improvement over both a simultaneous

estimator, as well as the quasi-dynamic-based generalised least squares technique.

Bauer et al. (2018) also extended the quasi-dynamic assumption to traffic

assignment, by assuming that the proportion of path flows generated from an origin in

any time period of the day remains constant across days of a specific category. Equation

(24) expresses this assumption at the path-flow level.

(24)

Here, has the same meaning as stated previously; represents the

proportion of passing through the kth path to reach the dth destination ; and represents

the path flows between the oth origin and dth destination through the kth path.

2.3 THE SOLUTION ALGORITHMS

Many solution algorithms have been proposed to solve the OD matrix estimation

problem (Antoniou, et al., 2016). It is hard to justify which algorithm is better than

another due to the unavailability of ground truth. Prior to direct implementation of any

practical applications the database of the OD matrices needs to be developed through

off-line estimation techniques. The solution algorithms used for off-line OD matrix

estimation can be broadly categorised into four types: fixed-point approaches (Cascetta

& Postorino, 2001), gradient-based techniques (Spiess, 1990), stochastic-optimisation

methods (finite difference stochastic approximations, see Spall (1992)); and

evolutionary-algorithms (genetic algorithms, see (Kim, et al., 2001)).


Among the above-mentioned methods, gradient-based techniques are quite

popular due to their computational efficiency for large-scale networks. There have

been many improvements suggested in gradient-descent frameworks, such as iterative

estimation-assignment (Yang, et al., 1992), constrained descent method (Florian &

Chen, 1995), mini-batch gradient descent (Li, Zhang, Chen, & Smola, 2014), extended

gradient method (Shafiei, Nazemi, & Seyedabrishami, 2015), projected gradient

method (Lundgren & Peterson, 2008b), and the stochastic gradient method (Masip,

Djukic, Breen, & Casas, 2018).

Gradient-based techniques fundamentally assume that the assignment is locally

constant. However due to a non-linear relationship between the assignment matrix and

OD matrix, it is not possible to directly compute the gradient of link flows deviation.

Several heuristics have been proposed to approximate the gradient, such as the

sensitivity analysis based method (Yang, 1995), simultaneous perturbation stochastic

approximation (SPSA) path search methods (Cipriani, et al., 2011), adaptive SPSA

(Cantelmo, Cipriani, Gemma, & Nigro, 2014), weighted SPSA (W-SPSA) (Lu, Xu,

Antoniou, & Ben-Akiva, 2015), and cluster-wise SPSA (c-SPSA) (Tympakianaki,

Koutsopoulos, & Jenelius, 2018). However, SPSA-based methods have several

limitations: they depend on several algorithmic parameters, so the computational

time grows with the size of the network (Bullejos, Barceló Bugeda, & Montero

Mercadé, 2014); they are sensitive to the selection of initial parameter values, and

adjusting these values is a cumbersome task (Tympakianaki, et al., 2018); they can

become trapped in local minima; and the approximated gradient values can be very noisy,

leading to convergence and stability issues (Tympakianaki, et al., 2018).

To address the challenges of SPSA-based techniques, metamodels have recently been

proposed that approximate the assignment during every iteration and minimise the

objective function through gradient computations (Osorio, 2019).
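
The basic SPSA gradient approximation referred to above can be sketched in a few lines of Python. This is a minimal sketch of the generic SPSA estimator only, not of any of the cited variants; `toy_objective` is a hypothetical quadratic standing in for the link-flow deviation, which in practice would require a traffic assignment run for every evaluation.

```python
import random
from typing import Callable, List


def spsa_gradient(
    objective: Callable[[List[float]], float],
    x: List[float],
    c: float,
    rng: random.Random,
) -> List[float]:
    """One SPSA gradient estimate using two objective evaluations."""
    # Perturb all components simultaneously with random +/-1 signs.
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = objective(x_plus) - objective(x_minus)
    return [diff / (2.0 * c * di) for di in delta]


def toy_objective(x: List[float]) -> float:
    # Hypothetical stand-in for the deviation between observed and assigned flows.
    target = [1.0, 3.0, -2.0]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


rng = random.Random(0)
x0 = [0.0, 0.0, 0.0]
n_draws = 10000
avg_gradient = [0.0, 0.0, 0.0]
for _ in range(n_draws):
    g = spsa_gradient(toy_objective, x0, 0.1, rng)
    avg_gradient = [a + gi / n_draws for a, gi in zip(avg_gradient, g)]

# The analytical gradient of toy_objective at x0 is [-2.0, -6.0, 4.0].
```

A single estimate is noisy because every component shares the same two function evaluations; only the average over many draws approaches the analytical gradient, which is consistent with the noise and stability issues noted above.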

2.4 OD MATRIX STRUCTURAL INFORMATION

Because traffic counts-based OD estimation is an under-determined problem,

most studies have emphasised methods to bind the “structure” of the OD matrix during

the OD estimation process. The structural knowledge related to OD flows has either

been used as constraints or through deviations from target OD matrix in the objective

function. For instance, Van Zuylen (1978) used the Brillouin information measure to

formulate the total OD information contained in the observed link flows and estimated


the OD matrix by minimising this information. To reduce the under-determinacy

problem, Gur (1980a) proposed using a target trip matrix to provide additional

information on the structure of OD matrix. Willumsen (1984a) proposed a method to

determine whether the structure of an estimated OD matrix is close to that of true OD.

Based on the ratios of link flows, he introduced scale factor ( ) as a proxy to assess

the structure of estimated OD matrix. The formulation is shown in Equation (25).

(25)

In Equation (25), and are the estimated and observed link flows on the lth

link of a network with L links in total. Furthermore, Yang et al. (1992) used

a correlation coefficient, between the observed and estimated link flows in

addition to the scale factor ( ) to explain the structural degradation of OD flows. For

example, X and are the estimated and target OD matrices respectively. Four different

cases are possible from the combinations of and as follows:

Case-1: If and , then both X and are

structurally similar to each other.

Case-2: If and is small, then X and are

structurally different, with random variations.

Case-3: If or and , then X and

have the same structure; however, the total demand in X is greater or

lower compared to .

Case-4: If and is small, then X and are

structurally different at a larger random scale.

However, the limitation of this approach is that the statistical indicators,

and are comparing link flows to interpret the structural variation in OD

matrices, which is generally not valid because of the one-to-many relationship between link

flows and OD flows.

Some other efforts have been made to account for the additional structural

information. Bierlaire and Toint (1995) proposed the Matrix Estimation Using Structure

Explicitly (MEUSE) method to exploit the structural information from parking surveys to


improve the structure of estimated OD matrices. Kim et al. (2001) stated that if the

structure of the target OD matrix is different from the structure of true OD matrix, then

the bi-level solution might meet a perfect Stackelberg condition. They proposed the

OD matrix structure as the ratio of OD flows to origin flows, and used it as a constraint

to preserve the structure of OD demand during OD matrix optimisation. Stathopoulos

and Tsekeris (2005) emphasised the necessity to incorporate information degradation

of demand patterns in the general OD estimation framework to account for short-term

and long-term traffic dynamics. Djukic, et al. (2013) emphasised the structural

correlations that exist between the OD pairs and the significance of accounting for

them when comparing OD matrices.

The importance of OD matrix structural information has been widely

acknowledged in the dynamic OD estimation/prediction problem. To incorporate the

structural information (spatial and temporal trip making patterns) of OD demand,

Ashok (1996) formulated state space model in terms of deviations of OD flows from

historical estimates. The structural deviations of OD flows preserve the structural

integrity and capture the uncertainties that can occur due to conditions, such as severe

weather conditions, special events, and travellers’ reactions to information

management measures. Following Ashok's (1996) work, most dynamic OD

formulations are based only on the deviations of OD flows. Djukic et al. (2013)

discussed the importance of considering the structural differences between the OD

matrices within a dynamic OD matrix estimation process and as a performance

measure to benchmark various dynamic OD estimation methods.

2.5 STATISTICAL PERFORMANCE MEASURES

The role of statistical performance measures is very significant in OD matrix

estimation. The quality of estimated OD matrices and performance of

estimation/prediction methods is assessed based on the results of statistical

performance measures (Ciuffo & Punzo, 2010; Hollander & Liu, 2008). Some notable

statistical metrics are:

root mean square error (RMSE) is the most common indicator used by

researchers (Ashok & Ben-Akiva, 2002; Barceló Bugeda, et al., 2010; Tamin

& Willumsen, 1989);


normalised root mean square error (RMSN) (Antoniou, Ben-Akiva, &

Koutsopoulos, 2004);

mean square error (MSE) (Cascetta, 1984);

mean absolute error percent (MAE%) (Nanda, 1997);

mean absolute error ratio (MAER) (Kim, Kim, & Rilett, 2005);

mean absolute percent error (MAPE) (Cools, Moons, & Wets, 2010);

global Theil measure (GU) (Barceló, Montero, Bullejos, Linares, et al., 2013);

maximum possible absolute error (MPAE) (Yang, Iida, & Sasaki, 1991);

relative error (RE) (Gan, Yang, & Wong, 2005);

total demand deviation (TDD) (Bera & Rao, 2011);

correlation coefficient ( ) (Yang, et al., 1992); and

R-squared (R2) (Tavassoli, Alsger, Hickman, & Mesbah, 2016a).

For a thorough review of the statistical measures that are widely used in

transport applications, see Ciuffo and Punzo (2010) and Hollander and Liu (2008).

The formulations of some of the metrics are described in the following equations.

Note that the comparison is made with the target OD matrix, .

(26)

(27)

(28)

(29)

(30)


(31)
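
To make the cell-by-cell nature of these metrics concrete, the sketch below computes three of them (MSE, RMSE, and MAE) in Python using their common textbook definitions. The exact normalisations used in Equations (26) to (31) may differ, and the two small matrices are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def cell_errors(estimated: Matrix, target: Matrix) -> List[float]:
    """Flatten both matrices and return the per-cell differences."""
    return [
        e - t
        for row_e, row_t in zip(estimated, target)
        for e, t in zip(row_e, row_t)
    ]


def mse(estimated: Matrix, target: Matrix) -> float:
    errors = cell_errors(estimated, target)
    return sum(d * d for d in errors) / len(errors)


def rmse(estimated: Matrix, target: Matrix) -> float:
    return math.sqrt(mse(estimated, target))


def mae(estimated: Matrix, target: Matrix) -> float:
    errors = cell_errors(estimated, target)
    return sum(abs(d) for d in errors) / len(errors)


# Hypothetical 2x2 OD matrices.
target_od = [[10.0, 20.0], [30.0, 40.0]]
estimated_od = [[12.0, 18.0], [33.0, 36.0]]

mse_value = mse(estimated_od, target_od)    # (4 + 4 + 9 + 16) / 4 = 8.25
rmse_value = rmse(estimated_od, target_od)  # sqrt(8.25)
mae_value = mae(estimated_od, target_od)    # (2 + 2 + 3 + 4) / 4 = 2.75
```

Each function reduces the matrix to a list of independent cell errors before aggregating, so the position of a cell within the matrix plays no role in the result.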

The GEH (named after Geoffrey E. Havers) statistic is widely preferred by

transport practitioners (Lu, Rao, et al., 2015). If 85% of the flow values have a GEH

of less than 5, the model is considered to perform well. Equation (32) represents the

GEH.

(32)

An expression similar to Equation (32) can be used for OD matrices comparison,


(33)

The percentage of OD pairs that have a GEH equal to or less than 5 is computed

to indicate the level of proximity between the two OD matrices.
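
A minimal Python sketch of this check is given below. It uses the commonly quoted form of the statistic, GEH = sqrt(2(M - C)^2 / (M + C)), applied to each OD pair; the function names, the treatment of cells where both flows are zero, and the example matrices are assumptions made for illustration.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def geh(estimated: float, reference: float) -> float:
    """GEH statistic for a single pair of flows."""
    total = estimated + reference
    if total == 0:
        # Assumption: two zero flows are treated as a perfect match.
        return 0.0
    return math.sqrt(2.0 * (estimated - reference) ** 2 / total)


def share_within_geh(
    estimated: Matrix, reference: Matrix, threshold: float = 5.0
) -> float:
    """Fraction of OD pairs whose GEH is at or below the threshold."""
    values = [
        geh(e, r)
        for row_e, row_r in zip(estimated, reference)
        for e, r in zip(row_e, row_r)
    ]
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical 2x2 OD matrices with empty diagonals.
reference_od = [[0.0, 100.0], [300.0, 0.0]]
estimated_od = [[0.0, 120.0], [400.0, 0.0]]
share = share_within_geh(estimated_od, reference_od)
```

In this example the pair (120, 100) has a GEH below 5, whereas the pair (400, 300) exceeds it, so three of the four cells pass once the zero cells are counted.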

Some authors have used Theil’s goodness-of-fit measure (GU) to compare the target OD

flows with estimated flows (Barceló, Montero, Bullejos, Serch, et al., 2013). Theil’s

inequality coefficient is popularly used when comparing two time series. It lies between 0 and 1.

A value of 0 means there is a perfect fit between the two time series, and a value of

1 implies maximum discrepancy. Equation (34) shows the formulation of GU for

comparing OD matrices.

(34)
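
The bounded behaviour described above can be checked with a short Python sketch. It assumes the common form of Theil's inequality coefficient, namely the root mean square difference divided by the sum of the root mean squares of the two series; Equation (34) may be written differently, and the function name `theil_u` and the example values are hypothetical.

```python
import math
from typing import Sequence


def theil_u(estimated: Sequence[float], target: Sequence[float]) -> float:
    """Theil's inequality coefficient between two flattened OD matrices."""
    n = len(estimated)
    rms_diff = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, target)) / n)
    rms_est = math.sqrt(sum(e * e for e in estimated) / n)
    rms_tgt = math.sqrt(sum(t * t for t in target) / n)
    denominator = rms_est + rms_tgt
    if denominator == 0:
        # Assumption: two all-zero inputs are treated as a perfect fit.
        return 0.0
    return rms_diff / denominator


# Hypothetical flattened OD matrices.
target_flows = [10.0, 20.0, 30.0, 40.0]
estimated_flows = [12.0, 18.0, 33.0, 36.0]

u_same = theil_u(target_flows, target_flows)          # perfect fit
u_zero = theil_u(target_flows, [0.0, 0.0, 0.0, 0.0])  # worst case
u_estimate = theil_u(estimated_flows, target_flows)   # in between
```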

There has been growing attention paid to the development of statistical

performance measures that can account for intrinsic details of OD matrix estimation

and the “structure” of OD matrices. For instance, Bierlaire (2002) proposed a total

demand scale to measure the intrinsic under-determinacy of the OD matrix estimation

problem that arises due to uncertainty in the network topology and assignment. Ruiz


de Villa, Casas, and Breen (2014) extended the Wasserstein metric (popularly used in

mass-transportation problems) to compare the structural differences between OD

matrices by accounting for the network topology. Djukic, et al. (2013) extended the

Structural Similarity index (SSIM), popularly used in the structural comparison of

images to compare OD matrices.

The SSIM is still theoretical in nature and has a few limitations with respect to

OD matrices comparison, which are further discussed in detail in Section 3.2. With

regard to the Wasserstein metric, Ruiz de Villa et al. (2014) mentioned that “One of

the main drawbacks of this method is in computing the Wasserstein distance on large

networks”. Thus, this method is generally considered impractical for large networks.

Another study that focussed on the similarity of OD matrices is the eigenvalue-

based measure (EBM) by Tavassoli, Alsger, Hickman, and Mesbah (2016b). Here, the

similarity of OD matrices is analysed by comparing their corresponding vectors of

eigenvalues. The lower the distance value, the greater the similarity.

(35)

Where, eig (X) and eig ( ) are the vectors containing the eigenvalues of square

matrices, X and , respectively.

The entropy measure (E) measures the similarity between OD matrices (Ros-

Roca, Montero, Schneck, & Barceló, 2018). The formulation for the entropy measure

is shown in Equation (36).

(36)

2.6 INDIRECT/PARTIAL MEASUREMENTS OF OD FLOWS

Limited traffic observations from a single data source generally lead to non-unique

solutions and large estimation errors. Thus, there is a great need for effective modelling

approaches that include provisions for measurements from alternative data sources

(Zhou, 2004). The different types of sensor data that are widely used in OD matrix

estimation problems are discussed below (see Figure 2.1 for a diagrammatic

representation of a few sensors that can aid in estimating flows; for example, for OD

pairs with O1 and O2 as origins, and D1 and D2 as destinations).


Figure 2.1: Pictorial representation of some of the widely-used sensor types in OD estimation problem

2.6.1 Point sensors

Point sensors are the most widely used detectors in transport models. These sensors

include inductive loop detectors generating inductive signatures, laser-based detection

systems (for vehicle-length), and video-based vehicle signatures. These technologies

assist with indirectly identifying anonymous vehicles from their physical features.

However, they do not directly assist with OD demand estimation, because they do not

provide any traffic information beyond detection points and are generally limited to

shorter road segments (Kwon & Varaiya, 2005; Mishalani, et al., 2002).

2.6.2 Point to point sensors (AVI data)

Automatic vehicle identification (AVI) data is widely used in traffic control and

management. The sensors that provide AVI data can detect vehicles at multiple

locations in a network. These are generally license plate/tag-based, mobile phone-based,

global positioning system (GPS), Bluetooth, and Wi-Fi scanners. Vehicle

detections are used in extracting the turning counts (Alibabai & Mahmassani, 2008),

travel times (Barceló, Montero, Bullejos, Linares, et al., 2013), vehicle trajectories

(Michau, Nantes, Chung, Abry, & Borgnat, 2014), and sometimes partial observations

of OD flows (Antoniou, et al., 2006; Dixon & Rilett, 2002). Further details about the

data from some of these sensors are discussed below.

2.6.2.1 License plate/tag reader–based AVI system

These systems generally consist of two CCD cameras located 5-10 kilometres

apart. The AVI camera/e-tag reader that is fixed above the lane captures a still

picture of the license plate/electronically reads the tag of the passing vehicle (Kwon &

Varaiya, 2005). The travel time of the vehicle is generally calculated by detecting the


same vehicle (license plate) between two consecutive AVI cameras (Asakura, et al.,

2000). AVI data in combination with traffic counts has been previously used for

dynamic origin-destination matrix estimation (Van Der Zijpp, 1997). Partially

observed trajectories from AVI data are another source of information that can be used

in OD estimation (Kwon & Varaiya, 2005). Zhou and Mahmassani (2006)

exploited AVI data without the need to know the market penetration rates.

2.6.2.2 Mobile data

Some recent researchers (Calabrese, Di Lorenzo, Liu, & Ratti, 2011) have

proposed ways to extract information from mobile phone location data as a source

for OD demand estimation. Mobile phone operators have access to the locations

of mobile phone devices, and they primarily use this information for management and

billing purposes. The basic geographic unit for mobile phone-based data is referred to

as a “cell”. Mobile phone-based trips are generally expressed as a sequence of cell-

IDs. As the sample of “active” mobile phones is less than the number of idle mobile

phones, another geographical unit – location area (collection of cells) is generally used.

The data associated with location area is called the location area update. The most

popular type of data available from mobile phone-based datasets is call detail records.

The main attributes of call detail records data sets are: the details of the connection

event (call, message or internet), time-stamps of start and end of the event, duration of

the event, the location of the connected tower, and the cell ID. Mobile phone-based

observations have been used in different areas of transport analysis, such as the

division of traffic analysis zones (Dong et al., 2015), identifying trip end locations

(Ahas, Silm, Järv, Saluveer, & Tiru, 2010), OD matrix estimation (Alexander, Jiang,

Murga, & González, 2015), and human mobility patterns (Jiang, Ferreira, & González,

2017).

With respect to OD matrix estimation, mobile-phone based observations have a

few challenges that need to be addressed. First, mobile phone data are

privacy-sensitive and expensive to obtain. Second, mobile phone trajectories tend to be

very coarse (Perera, Bhattacharya, Kulik, & Bailey, 2015). Third, they might not be

able to capture the actual origins and destinations of users (Iqbal, Choudhury, Wang,

& González, 2014).


2.6.2.3 Bluetooth data

Data collection in the field of transportation has become much easier with the

advent of Bluetooth scanners. The quality of the data is as good as licence plate

recognition and video captured data (Blogg, Semler, Hingorani, & Troutbeck, 2010).

Bluetooth data has a higher penetration rate compared to other technologies, such as

GPS (Gabriel, 2016).

Bluetooth is a key technology for in-car communication and infotainment

systems and has been identified as a complementary data source for transport

applications, such as travel time/speed estimation (Bhaskar, Qu, & Chung, 2015;

Khoei, Bhaskar, & Chung, 2013; Respati, Bhaskar, Zheng, & Chung, 2017), pedestrian

mobility patterns (Abedi, Bhaskar, & Chung, 2013, 2014; Abedi, Bhaskar, Chung, &

Miska, 2015), trajectories identification (Michau, Nantes, et al., 2017), and OD

demand estimation (Michau, et al., 2016). The validity of Bluetooth OD data has been

confirmed in the past by using data from other sources, such as video and automatic

license plate recognition OD data (Blogg, Semler, Hingorani, & Troutbeck, 2010) and

vehicle tracking using time lapse aerial photography (TLAP) (Chitturi, et al., 2014).

Interested readers can refer to Bhaskar and Chung (2013) for a fundamental

understanding of Bluetooth MAC scanner (BMS) data as complementary transport

data.

2.7 SUMMARY OF LITERATURE REVIEW

In summary, this comprehensive review of the literature identified the following

major research gaps:

1. Most studies have focused on developing formulations and solution

algorithms for improving the quality of OD matrix estimates.

Specifically, these studies adopted bi-level framework for OD

estimation. The focus has also shifted from static to dynamic, and

recently, to quasi-dynamic formulations. However, there has been less

focus on exploiting the higher-dimensions of OD flows; that is, the

structural information of OD matrices that cannot be neglected in either

the OD matrix estimation process or in the formulation of statistical

performance measures.


2. Most studies are entirely dependent on traffic count-based observations

because loop detectors are the dominant source of traffic data. Although

advancements in technology seem to provide additional data sources,

their integration into the existing transport models

remains challenging.

By addressing these gaps, this study aims to develop statistical methods to

exploit the structural information of OD matrices for the comparison of OD matrices

and develop methods to incorporate the structural knowledge of Bluetooth trips into

the OD matrix estimation process in the forthcoming chapters.


Chapter 3: Development of Statistical Metrics for the Structural Comparison of OD Matrices

This chapter begins with a background (Section 3.1); introduces and discusses

the limitations of SSIM (Section 3.2); develops GSSI (Section 3.3); introduces

traditional Levenshtein distance, extends its formulation for the comparison of OD

matrices (NLOD), and compares it with Wasserstein metric (Section 3.4); performs a

sensitivity analysis for the proposed GSSI and NLOD (Section 3.5); and finally

provides a summary of the chapter in Section 3.6.

3.1 BACKGROUND

Mathematical formulations of some of the widely used traditional metrics for

comparison of OD matrices were previously discussed in Section 2.5. These metrics

compare the individual cells of OD matrices and compute a single statistic value by

aggregating/averaging the deviation over individual cells. However, they lack the

ability to capture structural information about the matrices. To demonstrate this,

consider an example of comparing OD matrices M1 and M2 with a reference OD

matrix MR (Figure 3.1). Here, M1 is simply 1.1 times MR, and M2 is chosen randomly.

The results of comparing matrices M1 and M2 with MR using traditional metrics (MSE,

RMSE, GU, and MAE) are presented in Table 3.1. The first column of Table 3.1

lists the metrics, and the second and third columns give the metric values for

the two cases, respectively. When compared with the same reference matrix, the visual

representation illustrates that the demand distribution (or structure) of M1 is closer

to that of MR than that of M2. This is expected, because matrix M1 is a scaled version (1.1 times) of

the reference matrix. In this example, it is demonstrated that traditional metrics yield

the same results for both cases (Table 3.1) and fail to capture the structural differences

between OD matrices. The importance of structural comparison therefore calls for

new metrics in addition to the existing traditional ones. Addressing this need, the

Structural Similarity index (SSIM) has been applied in the literature, the details of which are

presented in the following section.

Figure 3.1: Comparison of MR with OD matrices M1 and M2

Table 3.1: Comparison results using the traditional metrics

Traditional Metrics    Comparison of (M1, MR)    Comparison of (M2, MR)

MSE                    17370                     17370

RMSE                   131.8                     131.8

GU                     0.05                      0.05

MAE                    0.10                      0.10
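
The effect reported in Table 3.1 can be reproduced in miniature with the Python sketch below. It does not use the matrices of Figure 3.1: the reference values are hypothetical, and the second matrix is built by giving each cell the same absolute deviation as the scaled matrix but with alternating signs (a deterministic stand-in for the randomly chosen M2). A Pearson correlation is added purely as an illustrative indicator that is sensitive to the pattern of the deviations.

```python
import math
from typing import List, Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical flattened 3x3 reference OD matrix.
m_ref: List[float] = [100.0, 400.0, 250.0, 50.0, 300.0, 150.0, 500.0, 200.0, 350.0]

# m1 is a uniformly scaled copy of the reference (structure preserved).
m1 = [1.1 * v for v in m_ref]

# m2 has the same absolute error in every cell, but with alternating signs.
signs = [1.0 if k % 2 == 0 else -1.0 for k in range(len(m_ref))]
m2 = [v + s * 0.1 * v for v, s in zip(m_ref, signs)]

rmse_1, rmse_2 = rmse(m1, m_ref), rmse(m2, m_ref)
mae_1, mae_2 = mae(m1, m_ref), mae(m2, m_ref)
corr_1, corr_2 = pearson(m1, m_ref), pearson(m2, m_ref)
```

By construction the two comparisons give identical RMSE and MAE values, while the correlation with the reference is 1 for the scaled matrix and lower for the sign-alternating one. Cell-wise error aggregates therefore cannot distinguish the two cases.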

3.2 STRUCTURAL SIMILARITY (SSIM) INDEX

The SSIM is borrowed from the field of image processing. Wang et al. (2004)

discussed the limitations of traditional metrics to capture structural differences in

images. They proposed the SSIM index as a quantitative measure to compare the

quality of two natural images and observed that statistical measures such as MSE may

fail to measure the structural degradation of one image with respect to another. As

shown in Figure 3.2a, two images estimated from two different algorithms, namely

gradient ascent and gradient descent, can each have the same MSE of 2,500 but

different SSIM values of 0.9337 and -0.5411, respectively.

Djukic, et al. (2013) applied the SSIM rationale in the context of OD matrices

and demonstrated that two OD matrices can have the same MSE value but different SSIM

values. For instance, the two comparisons in Figure 3.2b each have an MSE of 69, while the SSIM values

are 0.8724 and 0.9702.

Figure 3.2: (a) Comparison of Images (source Wang et al., 2004) vs (b) comparison of OD matrices (source Djukic et al., 2013)

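The formulation for the local SSIM, as provided by Djukic et al. (2013), is the product of three individual components (Equations (37), (37a) and (37b)) that compare the means, the standard deviations, and the correlation between the groups of OD pairs.

$$l(x, \hat{x}) = \frac{2\mu_x\mu_{\hat{x}} + C_1}{\mu_x^2 + \mu_{\hat{x}}^2 + C_1} \tag{37}$$

$$c(x, \hat{x}) = \frac{2\sigma_x\sigma_{\hat{x}} + C_2}{\sigma_x^2 + \sigma_{\hat{x}}^2 + C_2} \tag{37a}$$

$$s(x, \hat{x}) = \frac{\sigma_{x\hat{x}} + C_3}{\sigma_x\sigma_{\hat{x}} + C_3} \tag{37b}$$

$$\mathrm{SSIM}(x, \hat{x}) = [l(x, \hat{x})]^{\alpha}\,[c(x, \hat{x})]^{\beta}\,[s(x, \hat{x})]^{\gamma}; \quad \alpha > 0,\ \beta > 0,\ \gamma > 0 \tag{37c}$$

Assuming $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$:

$$\mathrm{SSIM}(x, \hat{x}) = \frac{(2\mu_x\mu_{\hat{x}} + C_1)(2\sigma_{x\hat{x}} + C_2)}{(\mu_x^2 + \mu_{\hat{x}}^2 + C_1)(\sigma_x^2 + \sigma_{\hat{x}}^2 + C_2)}; \quad -1 \le \mathrm{SSIM} \le 1 \tag{37d}$$

$$\mathrm{MSSIM}(X, \hat{X}) = \frac{1}{N}\sum_{n=1}^{N}\mathrm{SSIM}(x_n, \hat{x}_n); \quad -1 \le \mathrm{MSSIM} \le 1 \tag{37e}$$

Where,

$X$ and $\hat{X}$ represent the two OD matrices to be compared, while $x$ and $\hat{x}$ represent the groups of OD pairs within the nth local window in both matrices (written $x_n$ and $\hat{x}_n$ in Equation (37e)). The concept of local windows is further explained in Section 3.2.1.

$l(x, \hat{x})$ compares the mean values ($\mu_x$, $\mu_{\hat{x}}$) of the groups of OD pairs in both matrices;

$c(x, \hat{x})$ compares the standard deviations ($\sigma_x$, $\sigma_{\hat{x}}$) of the groups of OD pairs in both matrices;

$s(x, \hat{x})$ compares the structure by computing the correlation between the normalised groups of OD pairs in both matrices. The normalised $x$ and $\hat{x}$, with unit standard deviation and zero mean, are equal to $(x - \mu_x)/\sigma_x$ and $(\hat{x} - \mu_{\hat{x}})/\sigma_{\hat{x}}$, respectively, and $\sigma_{x\hat{x}}$ is the covariance between $x$ and $\hat{x}$;

$C_1$, $C_2$ and $C_3$ are constants that stabilise the result when either the mean or the standard deviation is close to zero. $C_3$ is generally assumed to be $C_2/2$. Previous studies have suggested small positive values for $C_1$ and $C_2$ (Pollard, Taylor, van Vuren, & MacDonald, 2013). For the analysis conducted in this research, the OD values in the SSIM window were not all zero; hence, both $C_1$ and $C_2$ were assumed to be zero.

The parameters $\alpha$, $\beta$ and $\gamma$ are used to adjust the relative importance of the mean, standard deviation and structural components, respectively. Generally, they are assumed to be equal to 1.

$\mathrm{SSIM}(x, \hat{x})$ is the structural similarity of the local windows from both matrices.

$\mathrm{MSSIM}(X, \hat{X})$ is the overall similarity of the OD matrices $X$ and $\hat{X}$, computed by taking the average of the SSIM values of the $N$ local windows.

The values of SSIM and MSSIM range between -1 and 1. A value of 1 implies that the matrices are the same, while values approaching -1 indicate opposite (negatively correlated) structures.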
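The simplified local SSIM of Equation (37d) can be sketched in a few lines. This is an illustrative implementation, not the code used in this research: the function name `local_ssim`, the use of population (rather than sample) variance and covariance, and the sample window values are all assumptions. With C1 = C2 = 0, as adopted above, the windows must not have zero mean or zero variance simultaneously, otherwise the ratios are undefined.

```python
import numpy as np

def local_ssim(x, x_hat, c1=0.0, c2=0.0):
    """SSIM of two equally sized windows (Equation (37d), alpha = beta = gamma = 1).

    Assumes the denominators are non-zero when c1 = c2 = 0.
    """
    x = np.asarray(x, dtype=float).ravel()
    x_hat = np.asarray(x_hat, dtype=float).ravel()
    mu_x, mu_h = x.mean(), x_hat.mean()
    var_x, var_h = x.var(), x_hat.var()           # population variances
    cov = np.mean((x - mu_x) * (x_hat - mu_h))    # population covariance
    mean_term = (2 * mu_x * mu_h + c1) / (mu_x ** 2 + mu_h ** 2 + c1)
    structure_term = (2 * cov + c2) / (var_x + var_h + c2)
    return float(mean_term * structure_term)

# Hypothetical 2 x 2 window of OD flows (illustrative values only).
window = np.array([[26.0, 54.0], [74.0, 178.0]])

print(local_ssim(window, window))                       # identical windows
print(local_ssim(window, 1.1 * window))                 # uniformly scaled window
# Mirroring about the mean is not a realistic OD window; it only
# illustrates the lower bound of the index.
print(local_ssim(window, 2 * window.mean() - window))
```

Identical windows give the upper bound, a uniformly scaled window stays close to it, and the mirrored window reaches the lower bound, which matches the range stated for Equation (37d).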


3.2.1 Local sliding window

The local window is generally a square box much smaller than the OD matrix. It is often referred to as a “local sliding window” because the traditional SSIM computes statistics on a local window (a group of pixels or OD pairs) that slides pixel by pixel, or cell by cell, over the entire image or OD matrix. Sliding was originally used for the comparison of images, where it allows SSIM to compute local statistical characteristics so that local image distortions are better accounted for (Brooks, Zhao, & Pappas, 2008). For ease of explanation, consider the example presented in Figure 3.3. Here, two 4 × 4 OD matrices, X and Y, are presented in columns one and two, respectively, and are compared using SSIM. A local sliding window of size 2 × 2 is considered and represented as coloured cells. This window slides cell by cell over the entire OD matrix and, in the current example, results in (4 - 2 + 1) × (4 - 2 + 1) = 9 pairs of sub-matrices to compare, as illustrated in Figure 3.3. The local SSIM computes the structural similarity between the sub-matrices corresponding to the windows from both OD matrices. The final SSIM value, represented as the mean SSIM (MSSIM), is computed by averaging the local SSIM values over all sliding windows. In the example, the SSIM value for the local window in Figure 3.3a is 0.5963, and the MSSIM over all 9 local SSIM values is 0.6777.

Figure 3.3: An example of sliding window for SSIM calculation.
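The sliding-window procedure described above can be sketched as follows. The sketch assumes a stride of one cell, C1 = C2 = 0 and population statistics; the function names and the 4 × 4 matrices are hypothetical and are not the matrices of Figure 3.3.

```python
import numpy as np

def local_ssim(x, x_hat, c1=0.0, c2=0.0):
    """SSIM of two equally sized windows (Equation (37d))."""
    x = np.asarray(x, dtype=float).ravel()
    x_hat = np.asarray(x_hat, dtype=float).ravel()
    mu_x, mu_h = x.mean(), x_hat.mean()
    cov = np.mean((x - mu_x) * (x_hat - mu_h))
    mean_term = (2 * mu_x * mu_h + c1) / (mu_x ** 2 + mu_h ** 2 + c1)
    structure_term = (2 * cov + c2) / (x.var() + x_hat.var() + c2)
    return float(mean_term * structure_term)

def sliding_mssim(m, m_hat, size):
    """Mean local SSIM over all size x size windows, moved one cell at a time.

    Returns the MSSIM and the number of windows that were compared.
    """
    m = np.asarray(m, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    rows, cols = m.shape
    values = [
        local_ssim(m[r:r + size, c:c + size], m_hat[r:r + size, c:c + size])
        for r in range(rows - size + 1)
        for c in range(cols - size + 1)
    ]
    return float(np.mean(values)), len(values)

# Hypothetical 4 x 4 OD matrices (illustrative values only).
x = np.arange(1.0, 17.0).reshape(4, 4)
y = 1.1 * x

score, n_windows = sliding_mssim(x, y, size=2)
print(n_windows)   # 9 windows for a 2 x 2 window on a 4 x 4 matrix
print(score)
```

Changing `size` in the call is enough to explore how the number of windows and the resulting MSSIM respond to the window dimension.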


The differences between the structural comparison in images and OD matrices

include:

• In images, nearby pixels are correlated with respect to contrast and other features. In an OD matrix, however, the correlations between OD pairs depend on many factors. Generally, OD pairs sharing similar activities, trip attractions, trip productions, distances, travel costs or geographical locations are correlated. According to Djukic (2014), correlations between OD pairs are reflected in their demand volumes (especially if volumes are high), and matrix reordering can place correlated OD pairs in the same neighbourhood; that is, all high-volume OD pairs on one side and the remaining pairs on the other side. Djukic (2014) therefore proposed re-ordering the OD matrix (i.e., sorting each row of the OD matrix in the order of OD pair volumes). However, if re-ordering leads to different arrangements of zonal IDs in the two matrices, re-ordering is avoided.

• The cell of an OD matrix is equivalent to the pixel of an image. However, pixel values range between 0 and 255 for greyscale images, whereas the range of OD flows is large and depends on many factors, such as activities and distance.

Although the formulation of SSIM seems to be holistic, its existing application

still has the following shortcomings.

First, SSIM is sensitive to the size of the local window, and as such, there is no clear consensus on the final MSSIM value. To circumvent this ambiguity, Djukic (2014) suggested computing the SSIM over the entire OD matrix without using any local window. However, doing so results in a statistic that is less sensitive to structural changes within the OD matrix. According to the law of large numbers, sample statistics become more stable as the sample size increases. Since larger window dimensions imply a greater number of OD pairs to be compared, the variance (distortion) and covariance (correlation distortion) parameters that capture structural changes within and between OD matrices respond less to local differences. In other words, SSIM is less sensitive to correlation distortions when the covariance is computed over larger window sizes.


To demonstrate the sensitivity of SSIM to window size, the mean SSIM (MSSIM) was computed using different window sizes (3 × 3 to 20 × 20) for two pairs of OD matrices constructed from the BCC data: Monday and Sunday, and Monday and Tuesday. Figure 3.4 presents the results, where the blue line is for the Monday and Sunday comparison and the orange line is for the Monday and Tuesday comparison. The x-axis represents the size of the local window and the y-axis shows the MSSIM value. The order of OD pairs is the same in the matrices for Sunday, Monday, and Tuesday. As the size of the sliding window increases, the sensitivity of SSIM to subtle differences within the OD matrix decreases, and the MSSIM values increase. Similar results were observed by Brooks et al. (2008) when comparing images using different window sizes. The MSSIM values increased less for the Monday and Tuesday pair than for the Monday and Sunday pair. This is due to similar travel patterns on Monday and Tuesday (both working days) and less similar patterns on Monday and Sunday. No clear consensus is reported in the literature regarding an acceptable sliding window size and the resulting SSIM values.

Figure 3.4: Sensitivity of MSSIM towards local window size

[Figure 3.4 data. x-axis: size of the sliding window; y-axis: MSSIM values]

Pair           3×3      6×6      8×8      15×15    20×20
Mon and Sun    0.7337   0.7892   0.807    0.8164   0.8292
Mon and Tue    0.9939   0.9975   0.9985   0.9986   0.9986

Second, the local SSIM value computed on a group of OD pairs has no physical meaning or significance unless those OD pairs are correlated. Groups of OD pairs sharing similar structural properties or travel patterns are generally correlated. Djukic (2014) tried to capture these correlations from the flow values of the OD pairs (especially if volumes are high) by matrix reordering (i.e., sorting each row of the OD matrix in the order of OD pair volumes). However, the structural properties of an OD matrix include many other underlying factors, such as the distribution of trips, geographical integrity and network topology, which, if accounted for, could capture OD structural information better.

To this end, this study develops the mean geographical window-based SSIM (GSSI) as an extension of the SSIM approach of Djukic (2014). It is discussed further in the following section.

3.3 MEAN GEOGRAPHICAL WINDOW-BASED SSIM (GSSI)

In this study, the SSIM was applied by first arranging the origins and destinations of the OD matrix in order of geographical similarity, and subsequently defining the windows for the SSIM analysis consistently with the geographical boundaries. Here, the window size varies with the geographical boundaries considered in the rearranged OD matrix. This differs from the traditional SSIM application, where the size of the window is fixed. The window associated with a geographical boundary is termed a geographical window, and the SSIM computed over the geographical windows is hereafter termed the geographical window-based SSIM. This process is explained with the help of an example from the Brisbane City Council (BCC) region, as detailed below.

The proposed geographical window has a physical significance: it ensures geographical integrity and captures spatial correlation by computing statistics on all lower zonal level OD pairs belonging to the same higher zonal level OD pair. For instance, for the BCC region the higher zonal level is SA4 and the lower level is SA3. The size and shape of a geographical window are defined by the number of SA3 zonal pairs present within the respective SA4 zonal pair. Therefore, in this approach, the local geographical window need not always be a square matrix.

In Figure 3.5, each cell of the OD matrix represents an SA3 level OD pair. The OD matrix is rearranged so that the SA3 level origins (rows) and destinations (columns) are grouped into their respective SA4 level zones. For instance, SA3 (1) to SA3 (j) from SA4 (1) are arranged together. The SA4 level boundaries now define the geographical SSIM windows. The yellow shaded region represents a window covering OD pairs from SA4 (1) to SA4 (2).


Figure 3.5: An example to illustrate the proposed geographical window-based approach

Figure 3.6 demonstrates the application of the SA4-based geographical windows for comparing SA3 (20 × 20) OD matrices of a Monday (Figure 3.6a) and a Sunday (Figure 3.6b). The SA4 zones used in designing the geographical windows are Brisbane East, Brisbane North, Brisbane South, Brisbane West, and Brisbane Inner. For example, consider the geographical window of the SA4 OD pair Brisbane East to Brisbane North, which consists of the SA3 OD pairs from 30101 to 30201, 30202, 30203 and 30204, and from 30103 to 30201, 30202, 30203 and 30204. These SA3 OD pairs are geographically correlated because they belong to the same SA4 origin (Brisbane East) and SA4 destination (Brisbane North). Here, Brisbane East and Brisbane North consist of 2 and 4 lower level (SA3) zones, respectively, so the size of the corresponding local geographical window is 2 × 4.

The local SSIM values are calculated for each geographical window separately, and the mean geographical window-based SSIM (GSSI) is the average of all local SSIM values. In the above example, the total number of geographical windows is equal to the number of higher order OD pairs, which is 5 × 5 = 25. The GSSI for the Sunday-Monday pair of matrices is 0.7231. See Table 3.2 for the local geographical window-based SSIM values and the GSSI.



Figure 3.6: Splitting (a) Monday and (b) Sunday OD matrices into geographical (SA4) windows
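The geographical windowing can be sketched by cutting the rearranged matrix at the SA4 boundaries instead of sliding a fixed window. The code below is a hedged illustration rather than the implementation used in this study: the function names are hypothetical, the matrices are synthetic, and the group sizes `[2, 4, 6, 4, 4]` are an assumption (2 and 4 zones for Brisbane East and North as stated above, 6 and 4 for Brisbane South and West as suggested by the zone codes in Figure 3.7, and 4 for Brisbane Inner so that the total is 20).

```python
import numpy as np

def local_ssim(x, x_hat, c1=0.0, c2=0.0):
    """SSIM of two equally sized windows (Equation (37d))."""
    x = np.asarray(x, dtype=float).ravel()
    x_hat = np.asarray(x_hat, dtype=float).ravel()
    mu_x, mu_h = x.mean(), x_hat.mean()
    cov = np.mean((x - mu_x) * (x_hat - mu_h))
    mean_term = (2 * mu_x * mu_h + c1) / (mu_x ** 2 + mu_h ** 2 + c1)
    structure_term = (2 * cov + c2) / (x.var() + x_hat.var() + c2)
    return float(mean_term * structure_term)

def gssi(m, m_hat, group_sizes):
    """Mean local SSIM over the windows defined by higher-level zone groups.

    group_sizes lists how many lower-level zones fall in each higher-level
    zone, in the order in which rows and columns are arranged.
    """
    m = np.asarray(m, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    edges = np.concatenate(([0], np.cumsum(group_sizes)))
    k = len(group_sizes)
    local = np.empty((k, k))
    for a in range(k):          # origin group
        for b in range(k):      # destination group
            rows = slice(edges[a], edges[a + 1])
            cols = slice(edges[b], edges[b + 1])
            local[a, b] = local_ssim(m[rows, cols], m_hat[rows, cols])
    return float(local.mean()), local

# Assumed SA3 counts per SA4 zone: East, North, South, West, Inner.
sizes = [2, 4, 6, 4, 4]

# Synthetic 20 x 20 matrices (illustrative values only).
monday = np.arange(1.0, 401.0).reshape(20, 20)
sunday = 0.8 * monday

score, local_values = gssi(monday, sunday, sizes)
print(local_values.shape)   # one local SSIM per SA4 OD pair
print(score)
```

Because the windows do not overlap, each SA3 OD pair contributes to exactly one local SSIM value, and each entry of `local_values` can be read as the similarity of one SA4 OD pair.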

Note that the afore-mentioned example is explained from the perspective of the statistical zones used in Australia. However, the proposed geographical window-based approach applies equally to any other study region with its own hierarchical zonal structure. Although the method was demonstrated using SA4 zones as geographical windows over SA3 OD pairs, any combination of higher and lower level OD pairs can be used for the same purpose; for instance, SA3 OD pairs can serve as higher level geographical windows for SA1 OD pairs. The geographical window-based SSIM approach has the following advantages over the traditional SSIM.


3.3.1 Structural comparison of local travel patterns

While the GSSI value provides the overall structural comparison, the local geographical window-based SSIM value has its own practical significance. For instance, it allows the local travel demand distributions (travel patterns) between different suburbs of a region to be compared, which a sliding local window cannot do. Figure 3.7 illustrates that Sunday travel patterns differ markedly from Monday travel patterns for the suburb pair Brisbane South to Brisbane North. This is reflected by a local SSIM value of 0.4653 (see Figure 3.7 (left) and the corresponding value in Table 3.2). On the other hand, for the suburb pair Brisbane South to Brisbane West, the Sunday travel patterns are similar (if not identical) to those of Monday, with a local SSIM value of 0.8037 (see Figure 3.7 (right) and the corresponding value in Table 3.2).

Figure 3.7: Insights into local travel patterns using geographical local window: (left) Brisbane South to Brisbane North and (right) Brisbane South to Brisbane West

[Figure 3.7 panels: Sunday and Monday SA3-level OD flows from the Brisbane South origins (30301 to 30306) to the Brisbane North destinations (30201 to 30204), local SSIM = 0.4653, and to the Brisbane West destinations (30401 to 30404), local SSIM = 0.8037]

Table 3.2: GSSI and local SSIM values: Monday vs Sunday B-OD matrices

Origin \ Destination   Brisbane East   Brisbane North   Brisbane South   Brisbane West   Brisbane Inner
Brisbane East          0.8319          0.2437           0.7650           0.9517          0.7755
Brisbane North         0.3311          0.7353           0.4034           0.7378          0.6299
Brisbane South         0.7771          0.4653           0.8062           0.8037          0.8117
Brisbane West          0.8340          0.7754           0.7562           0.8884          0.8165
Brisbane Inner         0.7716          0.6265           0.8257           0.8385          0.8750
GSSI                   0.7231
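The GSSI reported in Table 3.2 can be checked directly as the mean of the 25 local SSIM values. The short sketch below only re-uses the values printed in the table; the variable names are illustrative.

```python
import numpy as np

sa4 = ["Brisbane East", "Brisbane North", "Brisbane South",
       "Brisbane West", "Brisbane Inner"]

# Local SSIM values from Table 3.2 (rows: origin SA4, columns: destination SA4).
local_values = np.array([
    [0.8319, 0.2437, 0.7650, 0.9517, 0.7755],
    [0.3311, 0.7353, 0.4034, 0.7378, 0.6299],
    [0.7771, 0.4653, 0.8062, 0.8037, 0.8117],
    [0.8340, 0.7754, 0.7562, 0.8884, 0.8165],
    [0.7716, 0.6265, 0.8257, 0.8385, 0.8750],
])

gssi_value = float(local_values.mean())
print(round(gssi_value, 4))   # 0.7231

# SA4 OD pair whose Sunday pattern is least similar to Monday.
o, d = np.unravel_index(local_values.argmin(), local_values.shape)
print(sa4[o], "to", sa4[d], local_values[o, d])
```

Ranking the entries in this way is one simple use of the local values: it points to the SA4 OD pairs where the two days differ most.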


3.3.2 Geographical window vs sliding window

The size of a geographical window is defined by the sizes of the SA4 zones involved (i.e., the number of SA3 OD pairs present in an SA4 OD pair). Thus, the local window has a physical meaning, since it takes geographical integrity into account through physical SA4 boundaries. The proposed geographical windows are not of fixed dimensions: they have different sizes, such as 2 × 2, 2 × 4, 4 × 4 and 6 × 6, as shown in Figure 3.6. However, the GSSI values so computed are found to be comparable to MSSIM values from sliding local windows of smaller dimensions, as explained below.

The sliding window equivalence of the geographical window is demonstrated in Figure 3.8 for weekends and Figure 3.9 for weekdays. Figure 3.8 illustrates the comparison of the Monday OD matrix with 40 OD matrices from both Saturdays (Figure 3.8a) and Sundays (Figure 3.8b). A similar analysis with nearly 45 OD matrices from Tuesdays, Wednesdays, Thursdays and Fridays is illustrated in Figure 3.9. The GSSI value is shown to be equivalent to the MSSIM of a 2 × 2 sliding window for weekends and to that of a 3 × 3 sliding window for weekdays. Both Figure 3.8 and Figure 3.9 contain 12 different plots (11 of which correspond to sliding windows of sizes ranging from 2 × 2 to 20 × 20, and one of which is based on GSSI). For each plot, the x-axis corresponds to the different daily OD matrices and the y-axis shows the corresponding MSSIM or GSSI value.

Figure 3.8: GSSI vs sliding windows based MSSIM for weekends

[Figure 3.8 panels: (a) Saturdays vs typical Monday; (b) Sundays vs typical Monday. x-axis: OD matrices; y-axis: MSSIM or GSSI value (axis labelled MGeoSSIM), from 0.6 to 1. Series: sliding windows of 2×2, 3×3, 4×4, 6×6, 8×8, 10×10, 12×12, 14×14, 16×16, 18×18 and 20×20, and GeoSSIM]

Figure 3.9: GSSI vs sliding windows based MSSIM for weekdays

[Figure 3.9 panels: (a) Tuesdays, (b) Wednesdays, (c) Thursdays and (d) Fridays vs typical Monday. x-axis: OD matrices; y-axis: MSSIM or GSSI value (axis labelled MGeoSSIM), from 0.94 to 1. Series: sliding windows of 2×2, 3×3, 4×4, 6×6, 8×8, 10×10, 12×12, 14×14, 16×16, 18×18 and 20×20, and GeoSSIM]

3.3.3 Computational efficiency

Computationally, GSSI was found to be about 10 times faster than MSSIM computed using a sliding window of size 2 × 2. The test was performed on a Dell computer with an Intel(R) Core(TM) i7-4770 CPU (3.40 GHz) and 16 GB RAM. Figure 3.10 illustrates that the computational time required to compare 415 OD matrices with the Monday OD matrix (Figure 3.6a) was 3.92 seconds for GSSI and 39.5 seconds for the 2 × 2 sliding window-based MSSIM. This is because comparing 20 × 20 OD matrices via a 2 × 2 sliding window requires (20-1) × (20-1) = 361 local SSIM computations per pair of matrices, whereas GSSI is the average of only 25 local SSIM values.

Figure 3.10: Comparison of computational costs: Sliding windows based SSIM vs GSSI
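The counts behind this comparison can be written out as a short worked calculation. It assumes that the sliding window moves one cell at a time and uses only the figures reported above.

```latex
\begin{align*}
\text{sliding windows per pair of matrices} &= (20 - 2 + 1)^2 = 19^2 = 361,\\
\text{geographical windows per pair of matrices} &= 5 \times 5 = 25,\\
\text{ratio of local SSIM evaluations} &= 361 / 25 \approx 14.4,\\
\text{ratio of reported run times} &= 39.5\ \text{s} / 3.92\ \text{s} \approx 10.1.
\end{align*}
```

The run-time ratio is somewhat lower than the ratio of window counts. One plausible reading is that each geographical window covers more cells on average (400/25 = 16 cells) than a 2 × 2 sliding window (4 cells), so each local evaluation is slightly more expensive.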

3.4 LEVENSHTEIN DISTANCE

The distribution of the flows from an origin to different destinations provides insights into the structural knowledge of travel patterns. For example, the preference of destinations could differ between types of days: the destinations chosen on a Monday may differ from those chosen on a Sunday, because of the different activities and their schedules on the two days. Even if the destination choices are the same on both days, the number of trips could differ. This implies that the structure of traffic flows differs if the destination choices and the number of trips differ between the same set of OD pairs. The OD estimation problem is another example, where the target and estimated OD matrices hardly differ in structure (order of destination choices).

Comparing OD matrices from this perspective requires a statistical metric that can exploit this additional structural information. For this purpose, the traditional Levenshtein distance (details provided in Section 3.4.1) is extended into a new approach (presented in Section 3.4.2) that suits the structural comparison of OD matrices.

3.4.1 Traditional Levenshtein distance

The Levenshtein distance, developed by Levenshtein (1966), is a measure of proximity between two strings. It is mainly applied to compare sequences in the linguistics domain, for example in plagiarism detection and speech recognition, and in molecular biology for comparing sequences of macromolecules. In transport applications, the metric has been used to compare license plates (Oliveira-Neto, Han, & Jeong, 2012) and to cluster activity-travel patterns (Zhang, Kang, Axhausen, & Kwon, 2018).

The Levenshtein distance calculates the least expensive set of insertions, deletions, or substitutions required to transform one string into another. For example, when transforming the string “MONDAY” into “SATURDAY”, one of the optimal ways is to insert the letters “S” and “A” and substitute “M”, “O” and “N” with “T”, “U” and “R”, respectively, leading to a generalised Levenshtein distance (GLD) of 5 (assuming a unit cost for each operation), as shown in Figure 3.11.

Figure 3.11: Example to demonstrate Generalised Levenshtein Distance


To understand the GLD technique and its formulation, let X represent any string expressed as $X = x_1 x_2 \ldots x_q$, where $x_i$ is the ith character of X. The substring of X that includes the characters from $x_i$ to $x_j$, where 1 ≤ i ≤ j ≤ q, is represented as $X(i \ldots j)$. Its length is defined as $|X(i \ldots j)| = j - i + 1$, and a string of length 0 is termed the null string (ε). Any general edit operation for a pair of characters (a, b) is expressed as $a \to b$.

If string X is the result of applying the operation $a \to b$ to string Y, then this is written as $Y \Rightarrow X$ via $a \to b$. The notations for the three operations are as follows:

Insertion: $a \to b$ if $a = \varepsilon$ and $b \neq \varepsilon$;

Deletion: $a \to b$ if $a \neq \varepsilon$ and $b = \varepsilon$; and

Substitution: $a \to b$ if $a \neq \varepsilon$ and $b \neq \varepsilon$.

If $S = S_1, S_2, \ldots, S_k$ is defined as a sequence of edit operations that transforms $Y \Rightarrow X$, and $\gamma(S_i)$ is the cost associated with the ith edit operation, then the GLD is the minimum total cost required to transform Y into X (see Equation (38)).

$$\mathrm{GLD}(X, Y) = \min_{S}\sum_{i=1}^{k}\gamma(S_i) \tag{38}$$

The normalised Levenshtein distance (NLD) is the GLD normalised by the sum of the lengths of the two strings (Equation (39)). This metric always lies between 0 and 1 (Yujian & Bo, 2007).

$$\mathrm{NLD}(X, Y) = \frac{\mathrm{GLD}(X, Y)}{|X| + |Y|} \tag{39}$$

Algorithm 1 presents the pseudo code for computing the GLD and NLD for two strings X and Y, where X = x1…xq and Y = y1…yp (Heeringa, 2004). The lengths of strings X and Y are q and p, respectively. For ease of explanation, the matrix demonstration of Algorithm 1 is given in Figure 3.12. The computation of the GLD for two strings via Algorithm 1 (Table 3.3) is illustrated in Figure 3.13.

Table 3.3: Algorithm 1 for Normalised Levenshtein distance for strings comparison (see Figure 3.12)

1. Create an empty matrix K of size (p+1) × (q+1), where the row and column headers correspond to the characters of the strings Y and X, respectively. Rows and columns are numbered from 0.
2. Assign the values 0, …, q and 0, …, p to the first row and first column, respectively.
3. for j = 1 to q
       for i = 1 to p
           Estimate the cost as Ci,j = 0 if xj = yi, and Ci,j = 1 otherwise
           Set the cell K(i, j) = min (K(i-1, j) + 1, K(i, j-1) + 1, K(i-1, j-1) + Ci,j), where:
               o K(i-1, j) + 1 represents the cell value immediately above the current cell plus 1
               o K(i, j-1) + 1 represents the cell value immediately to the left of the current cell plus 1
               o K(i-1, j-1) + Ci,j represents the cell value immediately in diagonal above and to the left of the current cell plus the cost Ci,j
4. The GLD is the value of the last cell, in the (p+1)th row and (q+1)th column (labelled K(p+1, q+1) in Figure 3.12), and the NLD = GLD / (p + q).

The above pseudo code is explained in terms of edit operations with a matrix demonstration in Figure 3.12. There are multiple possible paths (i.e., different combinations of arrows) to arrive at the final cell K(p+1, q+1). Each path is a combination of edit operations represented as the following moves on the matrix grid: a downward movement along the diagonal is a substitution operation, an eastward movement is a deletion operation, and a vertical downward movement is an insertion operation (Oliveira-Neto et al., 2012).

Figure 3.12: Matrix demonstration of traditional Levenshtein approach (Algorithm 1)

[Figure 3.12 content: matrix of size (p+1, q+1), with the characters x1, …, xq of string X as column headers above the initial values 0, 1, …, q, and the characters y1, …, yp of string Y as row headers beside the initial values 1, …, p. Each cell is computed as K(i,j) = Min {K(i-1,j)+1, K(i,j-1)+1, K(i-1,j-1)+Ci,j}; the first computed cell is K(2,2) and the last cell is K(p+1,q+1)]


In the traditional Levenshtein approach, the numbering of the rows and columns of matrix K commences with “0”. This is done to facilitate the comparison of the first characters of strings X and Y, the result of which is stored in K(2, 2) (see Figure 3.12). The comparison is made by traversing the matrix down the rows of each column in turn until all characters in both strings are compared. Because the overall comparison of all characters ends at the last cell of the matrix, K(p+1, q+1) is chosen as the GLD value.

Figure 3.13: Comparison of strings “Monday” and “Saturday” using GLD
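Algorithm 1 can be sketched in Python as follows. The sketch assumes unit costs for all three operations and uses the normalisation described for Equation (39), that is, the GLD divided by the sum of the two string lengths; the function names `gld` and `nld` are illustrative.

```python
def gld(x, y):
    """Generalised Levenshtein distance between strings x and y (unit costs)."""
    q, p = len(x), len(y)
    # K has p + 1 rows (string y) and q + 1 columns (string x), numbered from 0.
    k = [[0] * (q + 1) for _ in range(p + 1)]
    for j in range(q + 1):
        k[0][j] = j            # first row: 0 .. q
    for i in range(p + 1):
        k[i][0] = i            # first column: 0 .. p
    for j in range(1, q + 1):
        for i in range(1, p + 1):
            cost = 0 if x[j - 1] == y[i - 1] else 1
            k[i][j] = min(k[i - 1][j] + 1,          # cell above plus 1
                          k[i][j - 1] + 1,          # cell to the left plus 1
                          k[i - 1][j - 1] + cost)   # diagonal cell plus cost
    return k[p][q]             # last cell of the matrix

def nld(x, y):
    """GLD divided by the sum of the string lengths (assumes not both empty)."""
    return gld(x, y) / (len(x) + len(y))

print(gld("SATURDAY", "MONDAY"))             # 5
print(round(nld("SATURDAY", "MONDAY"), 3))   # 0.357
```

The value of 5 agrees with the example of Figure 3.11: two insertions and three substitutions, with the common ending “DAY” left unchanged.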

In the literature, the use of the Levenshtein distance for transport applications is relatively scarce. Oliveira-Neto et al. (2012), for instance, applied this technique to compare license plates: the sequences of characters on the license plates observed at upstream and downstream stations were compared. Zhang et al. (2018) applied the Levenshtein technique to compare sequences of trip purposes and cluster activity-travel patterns. Other researchers have used similar techniques, such as the sequence alignment method (SAM), to compare activity-travel patterns (Allahviranloo & Recker, 2015) and sequences of trips (Crawford, Watling, & Connors, 2018). What these studies have in common is that they compare one-dimensional strings with a unit cost for each operation. However, OD matrices are two-dimensional arrays consisting of OD flows between different origin and destination pairs, which means that such traditional techniques cannot be applied directly. In light of this, the following section proposes a detailed methodology to extend the traditional Levenshtein distance to the structural comparison of OD matrices.


3.4.2 Proposed Levenshtein distance for structural comparison of OD matrices

As discussed in the previous section, the Levenshtein distance is an effective metric to identify differences in the order (arrangement) of the characters of a string. To apply the Levenshtein distance to the comparison of OD matrices, we propose to:

a) Consider each row of an OD matrix independently. The values in each row correspond to the flows from an origin to the different destinations. For a given origin we define a ‘string’ in which each character is a destination ID, arranged in descending order of OD flows; this string is referred to as a ‘sorted row’. To compare the structure of two OD matrices, we compare the order of destination IDs in each sorted row of the OD matrices.

b) Include OD flows in the formulation of the Levenshtein distance, the details of which are presented later.

Hereafter, the proposed modified approach is termed the Levenshtein distance for OD matrices. Before describing the proposed formulation, consider the example shown in Figure 3.14a, where two OD matrices X (reference matrix) and Y (query matrix), each of dimensions M × M, are to be compared. Here, the origin IDs are O1, O2, O3 and O4 and the destination IDs are N, E, W and S (thus, M = 4 in this example). In Figure 3.14b, the rows of each matrix are sorted individually in descending order of their OD volumes. For instance, for origin O1 (row 1) of matrix Y, the sequence of destinations in descending order of demand is S, W, N and E with 16, 12, 10 and 9 trips, respectively (refer to row O1 of matrix Y in Figure 3.14b).

Figure 3.14: Example to demonstrate Levenshtein distance application for OD matrices comparison

a) Original OD matrices: Reference (X) and Query (Y)

(X) Reference matrix
Origin \ Dest   N    E    W    S
O1              3    4    6    10
O2              7    4    5    11
O3              12   8    5    6
O4              13   7    9    6

(Y) Query matrix
Origin \ Dest   N    E    W    S
O1              10   9    12   16
O2              17   10   13   11
O3              11   14   12   18
O4              12   13   19   15

b) Row sorted Reference (X) and Query (Y) matrices, given as (Dest., Trips)

(X) Reference matrix
Origin   Choice 1   Choice 2   Choice 3   Choice 4
O1       (S,10)     (W,6)      (E,4)      (N,3)
O2       (S,11)     (N,7)      (W,5)      (E,4)
O3       (N,12)     (E,8)      (S,6)      (W,5)
O4       (N,13)     (W,9)      (E,7)      (S,6)

(Y) Query matrix
Origin   Choice 1   Choice 2   Choice 3   Choice 4
O1       (S,16)     (W,12)     (N,10)     (E,9)
O2       (N,17)     (W,13)     (S,11)     (E,10)
O3       (S,18)     (E,14)     (W,12)     (N,11)
O4       (W,19)     (S,15)     (E,13)     (N,12)
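The row sorting of Figure 3.14b can be reproduced with a few lines of Python. The data are those of Figure 3.14a; the function name `sorted_row` is illustrative, and the handling of ties (equal flows keep their original destination order) is an assumption, since the example contains no ties.

```python
dest_ids = ["N", "E", "W", "S"]

# OD flows of Figure 3.14a (rows: origins O1 to O4; columns: N, E, W, S).
x = {"O1": [3, 4, 6, 10], "O2": [7, 4, 5, 11],
     "O3": [12, 8, 5, 6], "O4": [13, 7, 9, 6]}      # reference matrix
y = {"O1": [10, 9, 12, 16], "O2": [17, 10, 13, 11],
     "O3": [11, 14, 12, 18], "O4": [12, 13, 19, 15]}  # query matrix

def sorted_row(flows, ids):
    """Return (destination, trips) pairs in descending order of trips."""
    return sorted(zip(ids, flows), key=lambda pair: pair[1], reverse=True)

x_sorted = {origin: sorted_row(flows, dest_ids) for origin, flows in x.items()}
y_sorted = {origin: sorted_row(flows, dest_ids) for origin, flows in y.items()}

print(y_sorted["O1"])   # [('S', 16), ('W', 12), ('N', 10), ('E', 9)]
print(x_sorted["O2"])   # [('S', 11), ('N', 7), ('W', 5), ('E', 4)]
```

Each sorted row keeps the destination ID together with its flow, which is the form needed when flows enter the cost of the edit operations.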


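For OD matrix Y, the sorted set of destination IDs and the corresponding demand from the nth origin is expressed as $\tilde{Y}_n = (D_n^Y, T_n^Y) = [(d_{n,1}^Y, t_{n,1}^Y), (d_{n,2}^Y, t_{n,2}^Y), \ldots, (d_{n,M}^Y, t_{n,M}^Y)]$. Here, $d_{n,i}^Y$ and $t_{n,i}^Y$ are the ith preferred destination and its corresponding demand value, respectively, from the nth origin of Y. Similarly, we express $\tilde{X}_n = (D_n^X, T_n^X)$ for matrix X. The null pair is represented as (ε, 0). The length of each of the sets $(D_n^Y, T_n^Y)$ and $(D_n^X, T_n^X)$ is M. If $(D_n^Y, T_n^Y)$ is the result of any edit operations applied to $(D_n^X, T_n^X)$, then it can be written as $(D_n^X, T_n^X) \Rightarrow (D_n^Y, T_n^Y)$.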

3.4.2.1 Proposed edit operations

Compared to the traditional Levenshtein approach, the edit operations in the proposed Levenshtein distance for OD matrices differ in the following ways:

a) The cost of each edit operation is computed in terms of flows, because OD demand is another attribute besides the destination IDs.

b) The destination IDs in both OD matrices are the same and only their order varies, so no substitution operation is needed.

c) An additional edit operation, absolute trips-difference, is proposed; it accounts for the difference in OD flows when the ith preferred destination is the same in both sorted rows.

Any edit operation in the transformation of $\tilde{X}_n$ to $\tilde{Y}_n$ can be expressed as $(d_a, t_a) \to (d_b, t_b)$. The possible operations are:

1) Absolute trips-difference, i.e., $(d, t^X) \to (d, t^Y)$: if the destination ID, $d$, is the same in both $\tilde{X}_n$ and $\tilde{Y}_n$ at the compared positions, then the associated cost is the absolute difference in demand, $|t^X - t^Y|$.

2) Insertion of trips, i.e., $(\varepsilon, 0) \to (d, t)$: here, the destination ID, $d$, is inserted in the row being transformed. The associated cost is the demand, $t$.

3) Deletion of trips, i.e., $(d, t) \to (\varepsilon, 0)$: here, the destination ID, $d$, is deleted from the row being transformed. The associated cost is the demand, $t$.

Let $S = S_1, S_2, \ldots, S_k$ be the sequence of edit operations (edit sequence) that transforms $\tilde{X}_n \Rightarrow \tilde{Y}_n$, and let the costs (in terms of trips) associated with the edit operations be $\gamma(S_1), \gamma(S_2), \ldots, \gamma(S_k)$, respectively. Then, the Levenshtein distance for OD matrices computed for the nth row (LODn) is the minimum total cost needed for $\tilde{X}_n \Rightarrow \tilde{Y}_n$ (Equation (40)). Because the minimum cost is required, this is an optimisation problem. Figure 3.15 demonstrates two possible combinations of edit operations for $\tilde{X}_2 \Rightarrow \tilde{Y}_2$ of the example shown in Figure 3.14b.

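While the LODn formulation is an absolute comparison of sorted rows, a relative comparison with respect to each row of the OD matrix is also possible. This is achieved by considering the trip productions (the sum of OD flows in a row) of both sorted rows during the comparison. This relative comparison is a normalised version of LODn and is expressed on a scale between 0 and 1. It is referred to as NLODn and is expressed as shown in Equation (41). Here, NLODn is obtained by normalising LODn by the sum of the origin flows of the nth row from both matrices. If the number of origins is N, then we have N values of LODn and NLODn.

The overall comparison between the OD matrices is obtained through the mean Levenshtein distance, LOD, which is the average of all LODn values, and the mean normalised Levenshtein distance, NLOD, which is the average of all NLODn values, as shown in Equation (42) and Equation (43), respectively.

$$\mathrm{LOD}_n(\tilde{X}_n, \tilde{Y}_n) = \min_{S}\sum_{i=1}^{k}\gamma(S_i) \tag{40}$$

$$\mathrm{NLOD}_n(\tilde{X}_n, \tilde{Y}_n) = \frac{\mathrm{LOD}_n(\tilde{X}_n, \tilde{Y}_n)}{\sum_{i=1}^{M} t_{n,i}^X + \sum_{i=1}^{M} t_{n,i}^Y} \tag{41}$$

$$\mathrm{LOD}(X, Y) = \frac{1}{N}\sum_{n=1}^{N}\mathrm{LOD}_n(\tilde{X}_n, \tilde{Y}_n) \tag{42}$$

$$\mathrm{NLOD}(X, Y) = \frac{1}{N}\sum_{n=1}^{N}\mathrm{NLOD}_n(\tilde{X}_n, \tilde{Y}_n) \tag{43}$$

Here, $\tilde{X}_n$ and $\tilde{Y}_n$ are the sorted nth rows of X and Y, $S = S_1, \ldots, S_k$ is an edit sequence with costs $\gamma(S_i)$, and $t_{n,i}^X$ and $t_{n,i}^Y$ are the flows of the sorted nth rows.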

To explain the possible combinations of the edit operations, an example is presented in Figure 3.15. Consider the sorted rows of origin O2 in the reference and query matrices of the previous example (refer to Figure 3.14b). The transformation of one into the other can be achieved by a multitude of combinations of edit operations. Two such combinations are presented in Figure 3.15. The total cost of the operations in Figure 3.15a is higher than that in Figure 3.15b: 54 trips (NLOD2 = 54/(27+51) = 0.69) against 46 trips (NLOD2 = 46/(27+51) = 0.59).
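The two totals can be traced step by step from the operations listed in Figure 3.15, using the row sums 11 + 7 + 5 + 4 = 27 and 17 + 13 + 11 + 10 = 51 for normalisation. The worked arithmetic below only restates the costs shown in the figure.

```latex
\begin{align*}
\text{(a)}\quad & \underbrace{11}_{\text{delete } S} + \underbrace{10}_{|7-17|,\ N} + \underbrace{8}_{|5-13|,\ W}
  + \underbrace{4}_{\text{delete } E} + \underbrace{11}_{\text{insert } S} + \underbrace{10}_{\text{insert } E} = 54,
  & \frac{54}{27 + 51} &= \frac{54}{78} \approx 0.69,\\
\text{(b)}\quad & \underbrace{11}_{\text{delete } S} + \underbrace{10}_{|7-17|,\ N} + \underbrace{8}_{|5-13|,\ W}
  + \underbrace{11}_{\text{insert } S} + \underbrace{6}_{|4-10|,\ E} = 46,
  & \frac{46}{27 + 51} &= \frac{46}{78} \approx 0.59.
\end{align*}
```

The saving of 8 trips in (b) comes from keeping destination E in place and paying only the trips-difference of 6, instead of deleting it (4) and inserting it again (10).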


Figure 3.15: A possible combination of edit operations vs minimum total cost of edit operations

3.4.2.2 Algorithm to compute Levenshtein distance for OD matrices

Algorithm 2, presented in Table 3.4, demonstrates the approach adopted to estimate the Levenshtein distance for comparing OD matrices Y and X, each of size M * M. LODn and NLODn are estimated for each origin (n = 1 to M) individually and are later aggregated to estimate LOD and NLOD, respectively.

Note that when the destination IDs are different, the cost (Ci,j) in Algorithm 2 is estimated as the sum of the two demands. One could ask why the cost is not the average of the two demands. The average is always lower than the sum, and to be conservative we prefer a higher cost for different destination IDs.

The matrix demonstration of Algorithm 2 is illustrated in Figure 3.16. Similar to the traditional approach, the numbering of the rows and columns of matrix L commences with 0. However, to account for the OD flows, in Algorithm 2 the first row and column are replaced with the cumulative sums of trips distributed to the destinations of the sorted reference and query rows.

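Figure 3.15 panel contents (destination IDs with trips; the cost of each edit operation is given in trips):

(a) One of the possible ways of edit operations
    Start: [S 11, N 7, W 5, E 4]
    Deletion of S (11 trips): [N 7, W 5, E 4]
    Absolute trips-difference on N (10 trips): [N 17, W 5, E 4]
    Absolute trips-difference on W (8 trips): [N 17, W 13, E 4]
    Deletion of E (4 trips): [N 17, W 13]
    Insertion of S (11 trips): [N 17, W 13, S 11]
    Insertion of E (10 trips): [N 17, W 13, S 11, E 10]
    Total cost of edit operations = 54 trips (NLOD2 = 0.69)

(b) Optimal edit operations for minimum total cost
    Start: [S 11, N 7, W 5, E 4]
    Deletion of S (11 trips): [N 7, W 5, E 4]
    Absolute trips-difference on N (10 trips): [N 17, W 5, E 4]
    Absolute trips-difference on W (8 trips): [N 17, W 13, E 4]
    Insertion of S (11 trips): [N 17, W 13, S 11, E 4]
    Absolute trips-difference on E (6 trips): [N 17, W 13, S 11, E 10]
    Total cost of edit operations = 46 trips (NLOD2 = 0.59)

Table 3.4: Algorithm 2 for Levenshtein distance for OD matrices (see Figure 3.16)

For each origin n (n = 1 to N):

1. Define the sorted reference row and the sorted query row for origin n, denoted here as X'n = [x'n,1, ..., x'n,M] and Y'n = [y'n,1, ..., y'n,M], where each entry carries a destination ID and its flow.

2. Create an empty matrix L of size (M+1)*(M+1), where the header row and header column correspond to X'n and Y'n, respectively (refer to Figure 3.16). The top-left cell is 0.

3. Assign the cumulative flows [x'n,1, x'n,1 + x'n,2, ...] and [y'n,1, y'n,1 + y'n,2, ...] to the first row and first column, respectively.

4. For j = 1 to M and for i = 1 to M:
   Estimate the cost Ci,j as |y'n,i - x'n,j| if the ith query destination and the jth reference destination have the same ID, and as y'n,i + x'n,j otherwise.
   Set the cell L(i,j) = min (L(i-1,j) + y'n,i, L(i,j-1) + x'n,j, L(i-1,j-1) + Ci,j), where:
   a) L(i-1,j) + y'n,i represents the cell value immediately above the current cell plus the insertion cost y'n,i;
   b) L(i,j-1) + x'n,j represents the cell value immediately to the left of the current cell plus the deletion cost x'n,j;
   c) L(i-1,j-1) + Ci,j represents the cell value diagonally above and to the left of the current cell plus the cost Ci,j.

5. The local Levenshtein distance LODn is the value of the last cell of L, referred to as L(M+1, M+1), and the normalised Levenshtein distance is NLODn = LODn / (sum of the flows in X'n and Y'n).

Mean Levenshtein distance values are computed as LOD = (sum of LODn over all origins)/N and NLOD = (sum of NLODn over all origins)/N.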

Figure 3.16: Matrix demonstration of Algorithm 2

[Schematic: matrix L of size (M+1, M+1). The sorted reference row for origin n runs along the top and the sorted query row for origin n runs down the left side. The top-left cell is 0 and the last cell is L(M+1, M+1). Each cell is filled as L(i+1,j+1) = Min {L(i,j+1) + insertion cost, L(i+1,j) + deletion cost, L(i,j) + Cij}.]

Similar to the traditional Levenshtein approach, there are multiple possible paths (i.e. different combinations of arrows) to arrive at the final L(M+1,M+1). Each path is a combination of editing operations represented through the following moves on the matrix grid: a downward movement along the diagonal is for the absolute trips-difference operation, an eastward movement is for the deletion operation, and a vertical downward movement is for the insertion operation.

The application of Algorithm 2 to the sorted rows of the example shown in Figure 3.14 is presented in Figure 3.17. Here, the arrows point along the optimal combination of edit operations for minimum total cost. The last cell of matrix L, i.e. L(5,5), gives LOD2 = 46 trips. This value is the same as the total cost of the operations shown in Figure 3.15b; that is, first a deletion operation (eastward arrow); then two consecutive absolute trips-difference operations (diagonal downward arrows); then one insertion operation (vertical downward arrow); and finally one more absolute trips-difference operation (diagonal downward arrow).
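The steps of Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the thesis implementation: the function name `lod_row` and the representation of a sorted row as a list of (destination ID, trips) pairs are assumptions made here, and the row along the top of L is treated as the reference row, following the labels of Figure 3.16. The example rows are those of Figure 3.15.

```python
def lod_row(reference, query):
    """Levenshtein distance between two sorted OD rows.

    Each row is a list of (destination_id, trips) pairs, already sorted.
    Returns (LODn, NLODn, L), where L is the filled matrix.
    """
    m_ref, m_qry = len(reference), len(query)
    # L has one extra row and one extra column holding cumulative flows.
    L = [[0] * (m_ref + 1) for _ in range(m_qry + 1)]
    for j in range(1, m_ref + 1):      # first row: cumulative reference flows
        L[0][j] = L[0][j - 1] + reference[j - 1][1]
    for i in range(1, m_qry + 1):      # first column: cumulative query flows
        L[i][0] = L[i - 1][0] + query[i - 1][1]

    for i in range(1, m_qry + 1):
        q_id, q_trips = query[i - 1]
        for j in range(1, m_ref + 1):
            r_id, r_trips = reference[j - 1]
            # Same destination ID: absolute trips-difference.
            # Different destination IDs: sum of the two demands.
            cost = abs(q_trips - r_trips) if q_id == r_id else q_trips + r_trips
            L[i][j] = min(
                L[i - 1][j] + q_trips,    # insertion (vertical downward move)
                L[i][j - 1] + r_trips,    # deletion (eastward move)
                L[i - 1][j - 1] + cost,   # trips-difference (diagonal move)
            )

    lod = L[m_qry][m_ref]
    total = sum(t for _, t in reference) + sum(t for _, t in query)
    nlod = lod / total if total else 0.0
    return lod, nlod, L


# Sorted rows from the example in Figure 3.15.
reference_row = [("S", 11), ("N", 7), ("W", 5), ("E", 4)]
query_row = [("N", 17), ("W", 13), ("S", 11), ("E", 10)]

lod2, nlod2, L = lod_row(reference_row, query_row)
print(lod2, round(nlod2, 2))  # 46 0.59
```

Averaging the per-origin values returned by such a function over all N origins would give LOD and NLOD.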

Figure 3.17: Matrix (L) demonstration for the example shown in Figure 3.14

                 S (11)   N (7)   W (5)   E (4)
            0    11       18      23      27
N (17)     17    28       21      26      30
W (13)     30    41       34      29      33
S (11)     41    30       37      40      44
E (10)     51    40       47      50      46

3.4.3 Levenshtein vs Wasserstein distances

The mathematical formulation of most traditional metrics is straightforward and does not involve an optimisation approach. In contrast, the NLOD comparison is based on an optimisation formulation and, as a result, is computationally expensive compared to GSSI. In the literature, the Wasserstein distance is another metric that can structurally compare OD matrices based on an optimisation formulation. Thus, this section compares the proposed Levenshtein distance with the Wasserstein distance in the context of OD matrix comparison.

3.4.3.1 Wasserstein distance

The Wasserstein distance is primarily used in mass transportation problems. It is based on the Monge-Kantorovich mass transportation technique, initially developed by the French mathematician Monge (1781), with major advances later added by the Soviet mathematician Kantorovich (1942). The Wasserstein distance is defined as the minimum cost required to optimally transfer objects from one set of locations to another set of locations. Thus, in terms of formulation, the Wasserstein distance can be expressed as follows (Equation (44)):

$$\text{Wasserstein distance}(s, h) = \min_{v} \left[ \frac{\sum_{i} \sum_{j} v_{ij}\, d_{ij}}{\sum_{i} \sum_{j} v_{ij}} \right]; \quad v_{ij} \ge 0 \tag{44}$$

For example, in Equation (44), $v_{ij}$ is the amount of sand transferred via distance $d_{ij}$ between the locations of sand (s) and holes (h), as discussed in the example shown in Figure 3.18. Here, the Wasserstein distance is used to calculate the minimum amount of work (optimum cost) required to transfer “sand” (amount in kg) into “holes” (capacity in kg). The grid lines are the paths to be traversed between the locations of the “sand” and the “holes”, and the x and y axes represent the distance in meters. The amount of sand to be transferred from locations s1, s2, and s3 is 5, 6, and 9 kg, respectively. The capacity of holes h1, h2, and h3 is 6, 3, and 11 kg, respectively.

Figure 3.18: Demonstration of Wasserstein distance through an example

In this example, if v (in kg) is the amount of sand to be transferred from its

location (si) to the hole location (hj) via distance (d) in meters, then cost (c) is computed

as v*d (kg-meters). The total minimum cost is achieved from the optimal combinations

of si and hj, as shown in Table 3.5. The Wasserstein distance is then computed as the

total cost (in kg-meters) divided by the total amount (in kg), which is equal to 33/20 =

1.65 meters.


Table 3.5: Computation of Wasserstein distance for the example problem

si     hj     v (kg)    d (meters)    c = v*d (kg-meters)
s1     h1     3         2             6
s1     h3     2         3             6
s2     h1     3         3             9
s2     h2     3         1             3
s3     h3     9         1             9

Total Wasserstein distance in kg-meters: 33
Mean Wasserstein distance in meters: 1.65
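The arithmetic of Table 3.5 can be checked with a short Python sketch. The variable names are illustrative assumptions. Because the text lists only the distances used by the chosen plan (the full set of distances is in Figure 3.18), the sketch verifies that the plan is feasible and reproduces its cost; it does not search for the optimum.

```python
# Amounts of sand (kg) and hole capacities (kg) from the example.
supply = {"s1": 5, "s2": 6, "s3": 9}
capacity = {"h1": 6, "h2": 3, "h3": 11}

# (sand location, hole, amount v in kg, distance d in meters) from Table 3.5.
plan = [
    ("s1", "h1", 3, 2),
    ("s1", "h3", 2, 3),
    ("s2", "h1", 3, 3),
    ("s2", "h2", 3, 1),
    ("s3", "h3", 9, 1),
]

# Feasibility: every location is emptied and every hole is filled exactly.
moved_out = {s: 0 for s in supply}
moved_in = {h: 0 for h in capacity}
for s, h, v, _ in plan:
    moved_out[s] += v
    moved_in[h] += v
feasible = moved_out == supply and moved_in == capacity

total_cost = sum(v * d for _, _, v, d in plan)       # kg-meters
total_amount = sum(v for _, _, v, _ in plan)         # kg
mean_distance = total_cost / total_amount            # meters

print(feasible, total_cost, mean_distance)  # True 33 1.65
```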

3.4.3.2 Wasserstein distance for OD matrices comparison

Ruiz de Villa et al. (2014) used the concept of the Wasserstein distance for the structural comparison of OD matrices by accounting for network topology in terms of travel time. This is solved as a linear programming problem (see Villani (2003) for further detail). It is defined as the minimum vehicle-minutes required to assign trips between OD pairs of the query matrix (XQ) with a distribution similar to that of the reference OD matrix (XR), and vice versa. The Wasserstein distance of matrix XQ to matrix XR is defined as shown in Equation (45).

$$\text{Wasserstein}(X_Q, X_R) = \min_{v} \left[ \frac{\sum_{(p,q)} v_{pq}\, c_{pq}}{\sum_{(p,q)} v_{pq}} \right]; \quad v_{pq} \ge 0 \tag{45}$$

Here, XQ and XR are the OD matrices to be compared; $(p, q)$ is a pair of OD pairs, with $p$ taken from XQ and $q$ from XR; and $v_{pq}$ is the volume of traffic assigned from $p$ to $q$. The travel cost between the OD pairs is given by $c_{pq}$ and is defined in terms of the mean travel time between the centroids. For example, if the origins and destinations of OD pairs $p$ and $q$ are $o_p$, $o_q$ and $d_p$, $d_q$, respectively, then $c_{pq}$ is computed as follows (Equation (46)), where $t(\cdot,\cdot)$ is the mean travel time between two centroids:

$$c_{pq} = t(o_p, o_q) + t(d_p, d_q) \tag{46}$$

Like Equation (45), the Wasserstein distance of matrix XR to matrix XQ (i.e., Wasserstein (XR, XQ)) is also computed. The minimum of Wasserstein (XQ, XR) and Wasserstein (XR, XQ) gives the final comparison between OD matrices XQ and XR.

To demonstrate the Wasserstein approach, consider the example in Figure 3.19, where O1 and O2 are origins and D1 and D2 are destinations in both matrices XR and XQ. The OD pairs considered from XQ are (O1-D2)Q and (O2-D2)Q, and those considered from XR are (O1-D1)R and (O2-D1)R. In this example, it is assumed that one vehicle corresponds to one trip, and the travel time between origin and destination is not included in the comparison of OD matrices.

Figure 3.19: (a) Sample network and (b) OD matrices XR and XQ with their corresponding paths and travel costs.

In this example, the paths traversed by vehicles in XR and XQ are shown in Figure

3.19(b). For instance, l1-l4 is the path for OD pair (O1-D2)Q. Here, the trips of XQ are

assigned using the distribution of XR in Case-1, and vice-versa in Case-2, respectively

as discussed below.

Case 1: Optimal assignment of XQ (i.e. OD flows (O1-D2)Q and (O2-D2)Q) using the distribution of XR

1) Assignment of (O1-D2)Q: Here, the 80 trips of (O1-D2)Q need to originate from O1 and travel via the paths of XR before reaching D2. The only paths used in XR are l2 and l5, with flows of 60 and 100 between OD pairs (O1-D1)R and (O2-D1)R, respectively. Thus, either all 80 trips can be assigned to l5, or 60 trips to l2 and 20 trips to l5. Since the latter option results in the optimal assignment, it is chosen. The travel cost between OD pairs (O1-D2)Q and (O1-D1)R is 10 minutes (since the travel cost between O1 and O1 is zero and that between D2 and D1 is 10 minutes), and the travel cost between (O1-D2)Q and (O2-D1)R is 2+10 = 12 minutes (here, 2 is the travel cost between O1 and O2, and 10 is that between D2 and D1). Thus, the cost associated with this assignment is 60*(10) + 20*(12) = 840 veh-minutes.

2) Assignment of (O2-D2)Q: OD pair (O2-D1)R can still accommodate 80 trips after receiving 20 trips from the above assignment. This implies that the 80 trips of (O2-D2)Q can be assigned to (O2-D1)R via l5. The travel cost between OD pairs (O2-D2)Q and (O2-D1)R is 10 minutes. Thus, the cost associated with this assignment is 80*(10) = 800 veh-minutes.

Thus, the total travel cost between XQ and XR is 840+800 = 1640 veh-minutes. In terms of travel time per trip, it is 1640/(80+80) = 10.25 minutes per trip.

Case 2: Optimal assignment of XR using the distribution of XQ

1) Assignment of (O1-D1)R: The travel cost between (O1-D1)R and (O1-D2)Q is 10 minutes, and that between (O1-D1)R and (O2-D2)Q is 2+10 = 12 minutes. Since 10 < 12, the optimum method is to assign the 60 trips of (O1-D1)R via the path of (O1-D2)Q, i.e. l1-l4. Thus, the cost associated with this assignment is 60*(10) = 600 veh-minutes.

2) Assignment of (O2-D1)R: Following the above assignment, (O1-D2)Q can only accommodate 20 more trips. Thus, amongst the 100 trips of (O2-D1)R, 20 are sent to (O1-D2)Q and the remaining 80 trips are assigned to (O2-D2)Q. The travel cost between (O2-D1)R and (O1-D2)Q is 2+10 = 12 minutes, and that between OD pairs (O2-D1)R and (O2-D2)Q is 10 minutes. Thus, the cost associated with this assignment is 20*(12) + 80*(10) = 1040 veh-minutes.

Thus, the total cost between both OD matrices in Case 2 is 600+1040 = 1640 veh-minutes. In terms of travel time per trip, it is 1640/(60+100) = 10.25 minutes per trip.

From the above two cases, the Wasserstein distance between the two OD matrices is the minimum distance value of 10.25 minutes per trip.
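The two cases can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not the linear programme of Ruiz de Villa et al. (2014): for this two-by-two case a brute-force search over integer assignments replaces the LP solver, and the function names are hypothetical. The centroid travel times (2 minutes between O1 and O2, 10 minutes between D1 and D2) are those quoted in the example.

```python
# Travel times between zone centroids stated in the example (minutes).
CENTROID_TIME = {frozenset(("O1", "O2")): 2, frozenset(("D1", "D2")): 10}


def zone_time(a, b):
    """Travel time between two zones; zero when they coincide."""
    return 0 if a == b else CENTROID_TIME[frozenset((a, b))]


def pair_cost(p, q):
    """Cost between OD pairs p and q: origin-to-origin plus destination-to-destination time."""
    return zone_time(p[0], q[0]) + zone_time(p[1], q[1])


def min_assignment_cost(sources, sinks):
    """Brute-force the cheapest integer assignment for two source and two sink OD pairs."""
    (p1, f1), (p2, f2) = sources
    (q1, g1), (q2, g2) = sinks
    assert f1 + f2 == g1 + g2, "total flows must match"
    best = None
    for a in range(min(f1, g1) + 1):      # trips sent from p1 to q1
        b, c = f1 - a, g1 - a             # p1 -> q2 and p2 -> q1
        d = f2 - c                        # p2 -> q2
        if d < 0:
            continue
        cost = (a * pair_cost(p1, q1) + b * pair_cost(p1, q2)
                + c * pair_cost(p2, q1) + d * pair_cost(p2, q2))
        best = cost if best is None else min(best, cost)
    return best


x_q = [(("O1", "D2"), 80), (("O2", "D2"), 80)]
x_r = [(("O1", "D1"), 60), (("O2", "D1"), 100)]

case1 = min_assignment_cost(x_q, x_r)   # XQ assigned using the distribution of XR
case2 = min_assignment_cost(x_r, x_q)   # XR assigned using the distribution of XQ
trips = sum(f for _, f in x_q)
print(case1, case2, min(case1, case2) / trips)  # 1640 1640 10.25
```

For matrices with more OD pairs, the enumeration would have to be replaced by a proper transportation or linear programming solver.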

Note: In this example, both cases yielded the same result. This might not hold if the two OD matrices have unequal total OD flows. For cases of different OD volumes, Ruiz de Villa et al. (2014) proposed building a virtual OD pair to equalise the total OD flows.

Although both the Levenshtein and Wasserstein metrics are optimisation-based,

they differ from each other, as discussed below.

First, LOD computes the structural differences between OD matrices in terms of

OD flows. On the other hand, the Wasserstein metric is expressed in terms of travel

cost.


Second, the Wasserstein approach ignores the travel time between the origin and

destination zones when comparing OD matrices. If the purpose is to compare OD

matrices from different solution algorithms but from the same time-period, then it

might be justified to ignore the travel time. However, if the OD matrices to be

compared belong to different time instances/days, then the travel time cannot be

ignored. This is because the travel time between two locations could differ for different

days (e.g., during Sunday and Monday or a regular weekday and weekday during a

school holiday), and even within a day, such as peak/off-peak periods. Thus, the travel

time between the zones plays a significant role and cannot be neglected. On the other

hand, LOD has no such issues, as it is not based on travel time.

Third, the Wasserstein metric is computationally expensive compared to LOD. This is because the solution search space for the Wasserstein metric (Equation (45)) is spread over the entire OD matrix. That is, the travel cost for all combinations of OD pairs needs to be checked for an optimum distance, whereas the local LOD is computed separately for each row, and as such, the solution search space is constrained to the OD pairs originating from a specific origin only.

To compare the computational efficiency of the two metrics, a Monday matrix was compared with a Sunday matrix (see Figure 3.6). As mentioned before, evaluating travel time is an issue with respect to the Wasserstein metric; thus, the experiment was conducted using the travel distance between the zones. The test was conducted on a Dell computer with an Intel(R) Core(TM) i7-4770 CPU (3.40 GHz) and 16 GB RAM, and the computation time was 0.33 seconds for LOD and 1690 seconds for the Wasserstein approach. According to Ruiz de Villa et al. (2014), sparseness in the OD matrix could reduce the computational cost of the Wasserstein method. However, if the OD matrices are not sparse (as shown in Figure 3.6), the Wasserstein approach is not computationally efficient.

3.5 SENSITIVITY ANALYSIS OF GSSI AND NLOD

The primary aim of the sensitivity analysis was to test the robustness of the proposed

metrics - GSSI and NLOD. If both metrics were well designed, then their

similarity/distance values should increase/decrease as OD matrices are structurally

closer to each other and decrease/increase otherwise. Also, the structure component of

both metrics should observe no change for uniform scaling effects.


While the sensitivity of NLOD can be tested directly from its formulation, there is no explicit representation of the “structure” component of NLOD. This is because NLOD captures both “skeleton” and “mass” together; the structural information of OD matrices (i.e. the “skeleton”), in terms of destination preferences, is implicitly considered in its formulation. However, the sensitivity of its latent “structure” component can be analysed by deploying NLOD on the normalised OD flows. To this end, two sets of experiments are designed to analyse the sensitivity of both NLOD and its latent “structure” component to different structural changes within OD matrices. The first set analyses sensitivity towards uniform scaling effects, and the second set analyses sensitivity towards random scaling effects in OD flows.

The study site, data and the design of experimental set up are briefly discussed

below.

1) Study site and data: The Brisbane City Council (BCC) region is the study area, and the data consist of Bluetooth observations collected from more than 845 Bluetooth scanners located within the study region (refer to Figure 1.9). The reference OD matrix, X, is the Bluetooth-based OD matrix (20 x 20) observed on Monday, 7th March 2016 (refer to Behara, Bhaskar, and Chung (2018)). The OD pairs are represented at the statistical area (SA)-3 level. More details about the development of the Bluetooth OD matrix from Bluetooth observations are described by Michau et al. (2014) (also refer to Appendix-A). The query OD matrices are developed specifically for the experiments and are obtained by perturbing the reference OD matrix; the details are provided in section 3.5.1.

2) Experiments: We generally encounter three possible situations while

comparing OD matrices. They are:

Situation-1: OD matrices have the same structure and different OD flows;

Situation-2: OD matrices are structurally different and have different OD

flows; and

Situation-3: OD matrices are identical.

It would be interesting to see how the structure components of both metrics perform in all these situations. The structure component of GSSI has an explicit formulation (similar to Equation (37b)); however, it is implicit in the formulation of NLOD. Thus, the performance of NLOD’s structural component is tested by deploying NLOD on the normalised OD flows. This is done to nullify the effect of mass/OD flows while comparing OD matrices. Two experiments are designed for each metric for the sensitivity analysis. They are as follows:

1) Uniform scaling effect- Here, the query matrices have the same

skeleton/structure as that of reference OD matrix while the mass/OD flows

vary. If the uniform scale factor is one, then both OD matrices are exactly

similar. Thus, abovementioned situation-1 and situation-3 are tested here.

2) Random scaling effect- Here, the skeleton/structure and mass/OD flows vary

between query and reference OD matrices. Thus, abovementioned situation-

2 is tested here.

3.5.1 Experimental criteria

3.5.1.1 Criteria for uniform scaling effects

Here, the sensitivity of GSSI and NLOD, along with their corresponding structural components, is tested for different uniform scaling factors. The reference OD matrix X is compared with Yi, where Yi is X multiplied by a uniform scaling factor chosen from [0.1, 0.2, 0.3, …, 1.9, 2.0].

1) The condition for GSSI’s structure component to be robust:
a) The value should be equal to one for any scaling factor between 0.1 and 2.

2) The conditions for GSSI to be robust:
a) It should increase with an increase in the scaling factor for 0.1 <= scaling factor < 1.
b) It should be equal to one for a scaling factor of 1.
c) It should decrease with an increase in the scaling factor for 1 < scaling factor <= 2.

3) The condition for NLOD’s structure component to be robust:
a) The results should be zero for any scaling factor, i.e. 0.1 <= scaling factor <= 2.

4) The conditions for NLOD to be robust:
a) NLOD should decrease with an increase in the scaling factor for 0.1 <= scaling factor < 1.
b) NLOD should be zero for a scaling factor of 1.
c) NLOD should increase with an increase in the scaling factor for 1 < scaling factor <= 2.


3.5.1.2 Criteria for random scaling effects

Here, the sensitivity of GSSI and NLOD, and of their structural components, is tested for four random scaling percentages, i.e. [5%, 10%, 15%, 20%], over three types of demand scenarios. These demand scenarios are generally encountered in traffic demand modelling (refer to Djukic et al. (2015)) and are as follows:

1) Outdated surveys (low demand),
2) The best historical estimates (medium demand), and
3) Congested traffic conditions (high demand).

Note that the scenarios are named low (l), medium (m), and high (h) in reference to the total daily demand on the network, and do not refer to the demands of individual OD pairs. In each demand scenario, the reference OD matrix (X) is compared with 100 replications of query OD matrices. The details of the demand scenarios are as follows:

Low demand scenario: Here, GSSI/NLOD compare X with query matrices obtained by randomly scaling X. For instance, if the random scaling percentage is 20%, the query OD flows range between 60% and 80% of X, and similarly for other values of the percentage.

Medium demand scenario: Here, GSSI/NLOD compare X with query matrices obtained by randomly scaling X. For instance, if the random scaling percentage is 20%, the query OD flows range between 80% and 100% of X, and similarly for other values of the percentage.

High demand scenario: Here, GSSI/NLOD compare X with query matrices obtained by randomly scaling X. The OD matrices for the high demand scenario represent demand during congested periods; for example, high daily demand can be witnessed during major events, such as the Commonwealth Games. For instance, if the random scaling percentage is 20%, the query OD flows range between 105% and 125% of X, and similarly for other values of the percentage.

The conditions for GSSI, NLOD and their structural components to be robust towards random effects are:

1) They should reflect the random structural differences that exist between the OD matrices. The values of GSSI (and its structural component) should decrease/increase with an increase/decrease in the magnitude of random scaling effects for all three demand scenarios, and vice versa for NLOD and its structural component.

3.5.2 Results of uniform scaling effects

The results of uniform scaling for GSSI and NLOD, along with their corresponding structural components, are shown in Figure 3.20(a) and Figure 3.20(b), respectively. The plots illustrate that GSSI and NLOD satisfy the conditions specified in section 3.5.1.1: GSSI values increased from 0.04 to 1 as the scaling factor rose from 0.1 to 1 and decreased from 1 to 0.64 as it rose from 1 to 2, while NLOD values decreased from 0.8 to 0 as the scaling factor rose from 0.1 to 1 and increased from 0 to 0.3 as it rose from 1 to 2. Similarly, the structural components of GSSI and NLOD remained unaffected (i.e. equal to 1 and equal to 0, respectively) for both the scaling-up and scaling-down cases (as required in section 3.5.1.1). Thus, the results indicate that both metrics and their structural components are robust towards uniform scaling effects.
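As a consistency check on the reported NLOD end points, the effect of uniform scaling can be worked out by hand. The sketch below is not part of the original analysis. It assumes a uniform scaling factor written here as $\alpha$ (a symbol introduced for this illustration), flows $x_{nj}$ from origin n to destination j, and a sorted destination order that is unchanged by scaling. Every edit operation costs at least as much as the change it makes to the row total, so no sequence of edits can cost less than $|1-\alpha| \sum_j x_{nj}$, and the sequence of absolute trips-difference operations along the diagonal attains that bound.

```latex
\begin{align*}
\mathrm{LOD}_n(X, \alpha X)
  &= \sum_{j} \lvert \alpha x_{nj} - x_{nj} \rvert
   = \lvert 1 - \alpha \rvert \sum_{j} x_{nj}, \\
\mathrm{NLOD}_n(X, \alpha X)
  &= \frac{\lvert 1 - \alpha \rvert \sum_{j} x_{nj}}
          {\sum_{j} x_{nj} + \alpha \sum_{j} x_{nj}}
   = \frac{\lvert 1 - \alpha \rvert}{1 + \alpha}, \\
\alpha = 0.1 &: \quad \frac{0.9}{1.1} \approx 0.82, \qquad
\alpha = 1 : \quad 0, \qquad
\alpha = 2 : \quad \frac{1}{3} \approx 0.33 .
\end{align*}
```

Since the expression does not depend on n, NLOD takes the same value as each NLODn. These values are in line with the reported range of about 0.8 at a factor of 0.1, zero at a factor of 1, and about 0.3 at a factor of 2. Under the further assumption that normalising the OD flows removes a common factor, X and $\alpha X$ become identical after normalisation, which is consistent with the structure component of NLOD staying at zero.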

Figure 3.20: Results of uniform scaling for GSSI and NLOD
(a) GSSI and its structure component plotted against the scaling factor (0.1 to 2)
(b) NLOD and its structure component plotted against the scaling factor (0.1 to 2)


3.5.3 Results of random scaling effects

The box plots shown in Figure 3.21 demonstrate that as the magnitude of random fluctuations increases, the similarity measured by both GSSI and its structural component decreases. For instance, the values of GSSI for the low demand scenario, as illustrated in Figure 3.21(a), are 0.7759, 0.7675, 0.7489 and 0.7307 for random scaling percentages of 5%, 10%, 15% and 20%, respectively. The results show a similar decreasing trend for all three demand scenarios in Figure 3.21(a) and Figure 3.21(b).

Figure 3.21: Results of random scaling effects for (a) GSSI and (b) its structure component


The plots shown in Figure 3.22 demonstrate that as the magnitude of random fluctuations increases, the distance measured by NLOD and its structural component also increases. For instance, the values of NLOD for the low demand scenario, as illustrated in Figure 3.22(a), are 0.14, 0.24, 0.31 and 0.39 for random scaling percentages of 5%, 10%, 15% and 20%, respectively. The results show a similar increasing trend for all three demand scenarios (Figure 3.22(a) and Figure 3.22(b)). Thus, the results indicate that NLOD and its structure component are robust towards random scaling effects.

Figure 3.22: Results of random scaling effects for (a) NLOD and (b) its structure component


From the results of both experiments, it can be concluded that GSSI and NLOD are sensitive to the structural differences within the OD matrices and are robust statistical measures. Following the sensitivity test, a real case study analysis can be used to demonstrate the practical application of NLOD.

To further demonstrate their potential over the limitations of traditional metrics, the same example discussed in Section 3.1 is considered. The proposed metrics, NLOD and GSSI, were deployed to see if they could account for the structural differences between M1 and M2 in comparison to MR (see Figure 3.1). The results of GSSI (considering one window of size 4*4) and NLOD are presented in Table 3.6. Both metrics identified the differences and indicate that M1 is structurally closer to MR than M2 is. Note that GSSI is a similarity value, which means that the higher the similarity, the lower the distance value.

Table 3.6: Structural comparison of sample OD matrices using the proposed metrics

Proposed metrics    M1 and MR    M2 and MR
GSSI                0.9910       0.8213
NLOD                0.0476       0.0734

3.6 SUMMARY

To summarise, the chapter began with a discussion of the limitations of traditional metrics, which are generally based on cell-by-cell comparison and often neglect OD matrix structural information within their formulations. To overcome this problem, this chapter adopts and extends two existing metrics: the Structural Similarity Index (SSIM) and the Levenshtein distance. The proposed metrics, named Mean Geographical window based SSIM (GSSI) and Mean Normalised Levenshtein Distance for OD matrices (NLOD), exploit the structure of OD matrices and provide comparison results with physical significance.

Compared to traditional SSIM, the GSSI technique is computationally efficient, can capture local travel patterns and preserves geographical integrity. Further, the proposed NLOD is an optimisation-based metric and is computationally cheaper than another popular metric, the Wasserstein distance. While GSSI computes statistics on groups of OD pairs that are geographically correlated, NLOD performs its analysis on OD pairs belonging to one specific origin.

The sensitivity of the proposed metrics is further tested towards uniform scaling and random scaling effects. The findings of the sensitivity analysis suggest that the GSSI and NLOD approaches are robust statistical metrics and have potential for practical applications involving the comparison of OD matrices.

Chapter 4: Assignment-based OD Matrix Estimation: Exploiting the Structure of Bluetooth Trips

This chapter presents the background about the issues related to traffic counts-

based OD estimation methods in Section 4.1, description of the study network in

Section 4.2, Matlab-Aimsun bi-level framework in Section 4.3, OD estimation using

the additional structural knowledge of B-OD flows in Section 4.4, using structural

knowledge of B-SP flows in Section 4.5, comparison of B-OD and B-SP methods in

Section 4.6, demonstration of the B-SP method for lower penetration rates of Bluetooth

trajectories in Section 4.7 and finally, summary of the chapter in Section 4.8.

4.1 BACKGROUND

This chapter discusses assignment-based methods for estimating OD matrices based

on the structural knowledge about Bluetooth trips and observed link counts. As

discussed in Chapter 2, most OD matrix estimation methods are dependent on traffic

counts observations only. One of the key challenges of the traffic counts-based method

is the problem of under-determinacy. In the past, several efforts (Bierlaire & Toint,

1995; Gur, 1980b; Kim, et al., 2001) have been made to minimise this problem by

maintaining structural consistency within the OD matrix estimates. This is generally

achieved by either incorporating target OD information within the objective function

formulation or using additional constraints based on trip productions and attractions in

the solution algorithms. However, this additional information is based on outdated surveys, and the resulting solution is therefore biased. Nevertheless, with the availability of additional up-to-date structural information from emerging data sources, such as Bluetooth, the problem of under-determinacy can be reduced, which may in turn improve the quality of the estimated OD matrices.


In this light, the current chapter proposes methods to incorporate this additional

information within the objective function formulation of bi-level optimisation. The

structural knowledge from Bluetooth trips can be represented in two ways: Bluetooth

OD (B-OD) flows, and Bluetooth subpath (B-SP) flows. Two methods, namely B-OD

structure-based method and B-SP structure-based method are proposed to exploit the

structural knowledge of B-OD flows and B-SP flows, respectively. The B-OD method

is applicable for networks that have a good connectivity of Bluetooth scanners. For

instance, the sub-network comprising regions in and around the Brisbane inner city

has very good connectivity of Bluetooth scanners. In situations where the penetration rate of Bluetooth trajectories is low, the B-SP method can be implemented. Both proposed methods depend on Aimsun simulation for assignment, and are therefore also referred to as assignment-based methods.

4.1.1 B-OD structure-based method (or B-OD method)

This method was designed from the structural perspective of B-OD flows and is further divided into two scenarios: an ideal scenario and a near-ideal scenario. In the ideal scenario, the Bluetooth OD demand sample rate was assumed to be η=20% of the true OD flows, and this scenario is termed “ideal” for the following reasons. First, the trip ends inferred from Bluetooth are assumed to be the actual origins and destinations of the trips. Second, the penetration rate of Bluetooth OD flows is assumed to be fixed (here, η=20%); thus, the structure of the B-OD is assumed to be an exact representation of the true OD structure.

However, the penetration rate of Bluetooth OD flows might not be same for all

OD pairs in reality. Thus, the near-ideal scenario was designed to include randomness

in the penetration rate of B-OD flows. To do this, 20% of total trajectories were

randomly selected to represent Bluetooth trips (trajectories), which were further used

to construct an observed B-OD matrix of random structure. Despite introducing

randomness in the B-OD flows, this approach is referred to as near-ideal because it is

assumed that the Bluetooth trip ends are the actual origins and destinations (in other

words it is assumed that Bluetooth trajectories infer complete paths traversed by

vehicles), and the penetration rate of Bluetooth trajectories is assumed to be known

(i.e. 20%).

Both scenarios were tested for different percentage connectivity (Ω) of OD pairs with Bluetooth. Since the ideal scenario was meant as a proof of concept, it was tested for Ω=25%, 50%, 75% and 100%, and for one random prior OD demand only. On the other hand, the near-ideal scenario was tested for Ω=20%, 40%, 60%, 80%, and 100%, and based on three random prior OD demands.

4.1.2 B-SP structure-based method (or B-SP method)

The formulation of the B-SP method is similar to that of the B-OD method; the difference is that it incorporates the structural knowledge directly from Bluetooth subpath (B-SP) flows and not from B-OD flows. The underlying concept behind the B-SP method is that the actual observations of Bluetooth trajectories might not be a complete representation of trips, and the trip ends might not be the actual ones. In other words, the Bluetooth paths are only subpaths of the actual paths traversed by vehicles. Thus, exploiting the structure of Bluetooth trips from the perspective of subpaths is more realistic.

The fundamental difference between B-OD flows and B-SP flows is further

explained with an example network, as shown in Figure 4.1a. The true OD for the

sample network is shown in Figure 4.1c. The path flows per OD pair are shown in the

Table 4.1.

Figure 4.1: Sample network (with installed BMS), paths and OD matrices

Table 4.1: Path flows for example network

OD pair    O1D1             O1D2             O2D1             O2D2
Path       P1   P2   P3     P4   P5   P6     P7   P8   P9     P10  P11  P12
Flow       20   30   50     100  75   25     100  150  50     150  150  100


Assuming Ω = 100%, all OD pairs (i.e., O1-D1 to O2-D2), and all paths (i.e.,

P1 to P12) are Bluetooth connected. This implies that the path connecting any OD pair

can be completely represented as a sequence of Bluetooth scanners. For instance, the

path P1 of O1-D1 (see Figure 4.1b) can be represented as Bv1–B5-Bv3. Here, Bv1 and

Bv3 indicate virtual Bluetooth scanners that directly connect to zonal centroids, and

thus are essential for building B-OD matrices. In other words, the B-OD method

depends on the complete sequence of Bluetooth inferred trajectories.

The ideal scenario considers B-OD to be 20% of the true OD and is shown in

Figure 4.1d, and the structures of true OD and B-OD (ideal scenario) are the same. The

B-OD for the near-ideal scenario (Figure 4.1e) is developed by randomly selecting

20% of total trajectories; that is, randomly selecting 200 out of 1,000 trips (note that

the sum of all OD flows in true OD matrix is 1,000). The random selection ensures the

structure of B-OD is random and differs from that of true OD.
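The difference between the two scenarios can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the procedure used in the thesis experiments: the true OD flows are taken as the per-OD sums of the path flows in Table 4.1, the variable names are hypothetical, and the random seed is arbitrary.

```python
import random
from collections import Counter

# True OD flows obtained by summing the path flows of Table 4.1 per OD pair.
true_od = {"O1-D1": 100, "O1-D2": 200, "O2-D1": 300, "O2-D2": 400}
eta = 0.20  # penetration rate of Bluetooth trips

# Ideal scenario: every OD pair is sampled at exactly the same rate,
# so the B-OD structure matches the true OD structure.
ideal_b_od = {od: round(eta * flow) for od, flow in true_od.items()}

# Near-ideal scenario: 20% of all individual trips are drawn at random,
# so the per-OD sample rate varies around eta.
all_trips = [od for od, flow in true_od.items() for _ in range(flow)]
rng = random.Random(0)  # arbitrary seed, for repeatability only
sampled = rng.sample(all_trips, round(eta * len(all_trips)))
near_ideal_b_od = Counter(sampled)

print(ideal_b_od)
for od, flow in true_od.items():
    print(od, near_ideal_b_od[od], f"{near_ideal_b_od[od] / flow:.1%}")
```

The ideal B-OD is a scaled copy of the true OD, whereas the sampled B-OD keeps the total of 200 trips but generally has a different rate for each OD pair.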

To explain the concept of B-SP flows, assume that the virtual scanners (that is,

Bv1, Bv2, Bv3 and Bv4) are not present, and that the scanner B5 is either unavailable or not

working. In such situations, the Bluetooth trajectories are not the complete

representation of actual trips, and as such, they can only provide trip information at the

subpath level. For instance, in Figure 4.1a, trips through paths P1, P4, P7, and P10 are

not available (due to unavailability of B5 and the virtual scanners), and the available

subpaths are only B1-B3-B4, and B1-B2-B4.

The complete paths that pass through subpaths B1-B3-B4 are P2, P5, P8, and

P11. Similarly, the paths consisting of subpaths B1-B2-B4 are P3, P6, P9, and P12.

The true subpath flows for B1-B3-B4 and B1-B2-B4 are shown in Table 4.2. Thus, the

total subpath flows are (30+75+150+150) + (50+25+50+100) = 630.

The experiments related to the B-SP method (see Section 4.5.3) were conducted for different penetration rates of Bluetooth trajectories, which can be explained with the same example. Here, a random selection of η = 10% means that 63 out of 630 sub-trajectories are selected at random (each sub-trajectory corresponds to one unit of subpath flow). Suppose this selection yields B-SP flows of 20 and 43 for subpaths B1-B3-B4 and B1-B2-B4, respectively. Because the selection is random, the penetration rate of the B-SP flows on individual subpaths is also random: it is 5% for B1-B3-B4 and 19% for B1-B2-B4, both of which differ from the overall η = 10% (see the third column of Table 4.2).

Table 4.2: Demonstrating the difference between true and Bluetooth subpath flows for the given example

Subpath     True subpath flow         B-SP flow
B1-B3-B4    30+75+150+150 = 405       20 (5% of 405)
B1-B2-B4    50+25+50+100 = 225        43 (19% of 225)
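The arithmetic behind Table 4.2 can be checked with a small sketch. The path-to-subpath mapping follows the example in the text; the random seed is arbitrary, so the sampled B-SP flows will generally differ from the 20 and 43 quoted in the example.

```python
import random
from collections import Counter

# Path flows from Table 4.1.
path_flows = {"P1": 20, "P2": 30, "P3": 50, "P4": 100, "P5": 75, "P6": 25,
              "P7": 100, "P8": 150, "P9": 50, "P10": 150, "P11": 150, "P12": 100}

# Subpath observed for each path once B5 and the virtual scanners are removed.
# Paths P1, P4, P7 and P10 have no observable subpath and are therefore absent.
subpath_of = {"P2": "B1-B3-B4", "P5": "B1-B3-B4", "P8": "B1-B3-B4", "P11": "B1-B3-B4",
              "P3": "B1-B2-B4", "P6": "B1-B2-B4", "P9": "B1-B2-B4", "P12": "B1-B2-B4"}

# True subpath flows: sum of the flows of all paths sharing the subpath.
true_sp = Counter()
for path, flow in path_flows.items():
    if path in subpath_of:
        true_sp[subpath_of[path]] += flow

# One list entry per sub-trajectory (one unit of subpath flow each).
sub_trajectories = [sp for sp, flow in true_sp.items() for _ in range(flow)]

eta = 0.10                                   # overall penetration rate
rng = random.Random(1)                       # arbitrary seed for repeatability
n_selected = round(eta * len(sub_trajectories))
b_sp = Counter(rng.sample(sub_trajectories, n_selected))

for sp, flow in true_sp.items():
    rate = 100 * b_sp[sp] / flow
    print(f"{sp}: true={flow}, B-SP={b_sp[sp]}, penetration={rate:.1f}%")
```

The per-subpath penetration rates printed by the loop vary around the overall η, which is the point the example makes.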

4.2 STUDY NETWORK AND DATA

To test the proposed methodology, the study network should have the following

properties:

1. It should be realistic and representative of the existing infrastructure;

2. It should have sufficient route choice options;

3. It should have a combination of at least two different types of road hierarchy,

that is, motorway and arterial;

4. OD pairs should have sufficient overlap between the paths;

5. It should have sufficient Bluetooth connectivity; and

6. It should have loop detectors located on major paths.

The analysis for this study was performed in the controlled traffic simulation environment of Aimsun Next (2019). A synthetic Brisbane city network was built from the OpenStreetMap data imported into Aimsun Next (Figure 4.2a); it comprised 15 centroids, 24 loop detectors (red squares in Figure 4.2a), and 51 Bluetooth scanners (blue circles in Figure 4.2a). The loop detectors were placed on major roadways such as the Pacific Motorway, Clem Jones Tunnel, Coronation Drive, Inner City Bypass, and Kelvin Grove Road. The OD matrix was designed at a zonal level equivalent to Statistical Area 2 (SA2) (ASGS, 2017) and was 15 x 15 in size. Internal trips were excluded from the analysis; thus, the total number of OD pairs considered was 15*15 - 15 = 210.


Figure 4.2: (a) Study site installed with Bluetooth scanners and loop detectors; (b) spatial structure of the Brisbane City core network

Figure 4.2b shows the spatial structure of Brisbane City, its neighbouring suburbs, and the primary transport network. The 15 zonal centroids shown are: 1) West End-South Bank-Highgate Hill; 2) Gabba; 3) Brisbane (BNE) Inner East; 4) New Farm; 5) Fortitude Valley; 6) Spring Hill; 7) Central Business District (CBD); 8) Newstead-Bowen Hills; 9) Kelvin Grove-Herston; 10) Red Hill-Milton-Auchenflower; and five external zonal centroids, that is, 11) Ext-1; 12) Ext-2; 13) Ext-3; 14) Ext-4; and 15) Ext-5.

To check the efficiency of the proposed methods, the OD matrix estimates resulting from these methods were compared with Xtrue using RMSE (Equation (47)), StrOD (Equation (48)), and GSSI (Equation (49)), as described below:

(47)

(48)

(49)

Here, the symbols in Equations (47) to (49) denote the means of the OD vectors X and Xtrue; the OD flows from a given geographical window of X and Xtrue; and the means and variances of those window flows, respectively. See the notations section for information about the other terms.
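For readers who want to experiment with the three measures, a minimal sketch follows. It is not a transcription of Equations (47) to (49): it assumes that RMSE is the usual root mean square error, that StrOD is the correlation coefficient between the two OD vectors, and that GSSI is the mean of a standard SSIM index computed per geographical window. The stabilising constants c1 and c2, the window layout, and all function names are hypothetical, and the sketch does not exclude internal trips.

```python
import numpy as np

def rmse(x, x_true):
    """Root mean square error between two OD arrays."""
    x, x_true = np.asarray(x, float), np.asarray(x_true, float)
    return float(np.sqrt(np.mean((x - x_true) ** 2)))

def str_od(x, x_true):
    """Structural similarity taken as the correlation of two OD vectors."""
    x, x_true = np.asarray(x, float), np.asarray(x_true, float)
    xc, tc = x - x.mean(), x_true - x_true.mean()
    return float(xc @ tc / (np.linalg.norm(xc) * np.linalg.norm(tc)))

def ssim(x, y, c1=1e-10, c2=1e-10):
    """Standard SSIM index for two flat windows (c1, c2 are assumed constants)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def gssi(X, X_true, windows):
    """Mean SSIM over geographical windows given as (row_ids, col_ids) pairs."""
    scores = [ssim(X[np.ix_(r, c)].ravel(), X_true[np.ix_(r, c)].ravel())
              for r, c in windows]
    return float(np.mean(scores))

# Toy 4 x 4 OD matrix split into four 2 x 2 windows (two higher-level zones).
X_true = np.arange(1.0, 17.0).reshape(4, 4)
X_est = 1.1 * X_true                       # same structure, 10% more trips
groups = ([0, 1], [2, 3])
windows = [(r, c) for r in groups for c in groups]

print(rmse(X_est, X_true), str_od(X_est.ravel(), X_true.ravel()),
      gssi(X_est, X_true, windows))
```

In this toy case the correlation-based measure stays at 1 because the estimate is a pure scaling of the true matrix, while RMSE and the window-based index both register the difference in magnitude.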

Because GSSI depends on the knowledge of higher zonal level OD pairs, the 15 statistical zones shown in Figure 4.2a were further classified into higher level zones based on their geographical proximity (see Figure 4.2b). The OD matrix split into geographical windows is illustrated in Figure 4.3 (refer to Section 3.3 for further details about the geographical window concept).


Figure 4.3: Splitting the study OD matrix into geographical windows

The study network was loaded with a true OD vector (Xtrue), and the link flows that resulted from Xtrue were taken as the observed link flows, one for each link l. The observed flows from the selected 24 links are collected in a single vector with elements l = 1, ..., L. Note that the analysis conducted in this chapter assumed no errors in the observed link flows. The prior OD matrix considered for both methods was generated using Equation (50).

Xprior = (50)

By generating three replications of Xprior from Equation (50), three random prior OD matrices were obtained, namely Xprior1, Xprior2, and Xprior3. The RMSE, StrOD, and GSSI values of the three prior OD matrices with respect to Xtrue are shown in Table 4.3.

Table 4.3: Comparison of Xprior with Xtrue for all three replications

Replication      RMSE (Xprior, Xtrue)   StrOD (Xprior, Xtrue)   GSSI (Xprior, Xtrue)
Replication-1    14.02                  0.8142                  0.7248
Replication-2    13.24                  0.8178                  0.7406
Replication-3    12.34                  0.7964                  0.7297

4.2.1 Development of observed B-OD flows

The B-OD method depends on observations of Bluetooth OD (B-OD) flows. Because this method is further categorised into ideal and near-ideal scenarios, the B-OD flows are generated differently for each. Equation (51) represents the way they are generated for the ideal scenario of the B-OD method.

[Figure 4.3 labels. Zones in matrix order: Z1 West End-South Bank-Highgate Hill; Z2 Ext-5; Z3 Gabba; Z4 BNE Inner East; Z5 New Farm; Z6 Valley; Z7 Spring Hill; Z8 CBD; Z9 Newstead-Bowen Hills; Z10 Ext-2; Z11 Ext-4; Z12 Ext-1; Z13 Kelvin Grove-Herston; Z14 Red Hill-Milton-Auchenflower; Z15 Ext-3. Higher-level zones: HZ1 to HZ6.]


= (51)

The true OD flows for the OD pairs that are Bluetooth connected form a sub-vector of Xtrue; for 100% connectivity, this sub-vector equals Xtrue. From Equation (51), it can be said that the StrOD between the B-OD flows and these true flows is 1 for the ideal scenario. However, for the near-ideal scenario, the approach described in Figure 4.4 is adopted to generate random B-OD flows. After averaging over 100 replications, the StrOD between the B-OD flows and Xtrue was observed to be 0.8778.

Figure 4.4: Generation of B-OD flows for the near-ideal scenario of the B-OD method

4.2.2 Development of observed B-SP flows

For the B-SP method, the B-SP flows are used as observations in addition to the observed link counts in the objective function. Since this method tests the effectiveness of Bluetooth SP-based structural information, the observed B-SP flows are generated for different penetration rates (η) of Bluetooth sub-trajectories, as shown in the flowchart depicted in Figure 4.5. The B-SP vector corresponding to a given penetration rate η is produced by averaging over I=5 replications.

[Figure 4.4 flowchart: Start; Xtrue is assigned in the Aimsun model to produce BMS data and Bluetooth trajectories; for each of Replication-1 to Replication-100, η = 20% of the trajectories are selected at random and developed into OD matrices, giving B-OD flows; the B-OD flows are averaged over the replications; Stop.]


Figure 4.5: Generation of B-SP flows for the B-SP method

The development of the B-SP vector is explained as follows:

1. First, assign Xtrue in the study network model in Aimsun Next. The resulting trajectories are stored as a complete sequence of scanner IDs. The first and last scanner IDs of the complete trajectory are directly linked to the actual origin and destination zones of the complete trip. The resulting number of trajectories in this study was 5,273.

2. Convert the trajectories to sub-trajectories by de-selecting the first and last scanner IDs from the complete trajectory sequence (this is done because actual Bluetooth trajectories do not always begin and end at the true trip ends). The resulting number of sub-trajectories after this process was 3,875 for this study.

3. Now, randomly select η% of the sub-trajectories. The sub-trajectories with the same sequence of BMS IDs are identified as a Bluetooth subpath, and their total number refers to the subpath flow. The vectors of such subpath flows, averaged over 5 replications, form the B-SP flows vector.
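The three steps above can be sketched as follows. The trajectory format (a tuple of scanner IDs), the rule that a trajectory with nothing left after trimming is discarded, and the function names are assumptions made for illustration; the thesis does not state how the 5,273 trajectories reduced to 3,875 sub-trajectories.

```python
import random
from collections import Counter

def trim(trajectory):
    """Drop the first and last scanner IDs; return None if nothing remains."""
    core = tuple(trajectory[1:-1])
    return core if core else None

def bsp_flows(trajectories, eta, replications=5, seed=0):
    """Average subpath flows over several random selections of sub-trajectories.

    eta is the penetration rate as a fraction (for example 0.10 for 10%).
    """
    subs = [s for s in map(trim, trajectories) if s is not None]
    n_selected = round(eta * len(subs))
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(replications):
        # Sub-trajectories with the same scanner sequence form one subpath.
        totals.update(rng.sample(subs, n_selected))
    return {subpath: count / replications for subpath, count in totals.items()}

# Toy trajectories on the example network (scanner IDs as in Figure 4.1).
toy = ([("Bv1", "B1", "B3", "B4", "Bv3")] * 3
       + [("Bv1", "B1", "B2", "B4", "Bv4")] * 2
       + [("Bv1", "Bv3")])                  # nothing left after trimming

print(bsp_flows(toy, eta=1.0))              # full penetration: exact subpath counts
print(bsp_flows(toy, eta=0.4))              # 2 of 5 sub-trajectories per replication
```

Averaging over replications yields fractional subpath flows, which is acceptable here because only the structure of the B-SP vector is used.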

[Figure 4.5 flowchart: Start; Xtrue is assigned in the Aimsun model to produce BMS data and Bluetooth trajectories; the ends of the trajectories are trimmed to form sub-trajectories; for each of Replication-1 to Replication-I, η% of the sub-trajectories are selected at random, giving B-SP flows; the B-SP flows are averaged over the replications; Stop.]


4.3 BI-LEVEL FRAMEWORK: MATLAB - AIMSUN INTEGRATION

The OD matrix estimation algorithms for both the B-OD method and the B-SP method were based on a bi-level framework in which the objective function was minimised in the upper level and the user-equilibrium assignment was performed in the lower level. The optimisation code was written in MATLAB (2017 version), and Aimsun Next (2019) was used to run the microscopic simulation. The default parameter values were used for both demand scenarios and experiments in Aimsun Next. A Python script, Autorun.py (see Appendix E), was written to integrate the optimisation model (in MATLAB) with the traffic assignment (in Aimsun Next). MATLAB is nevertheless the primary platform: it writes the OD data into the Aimsun Next OD format, runs the simulation, executes the Python script, and reads the simulation outputs for further optimisation. The integration of both platforms is shown in Figure 4.6.

Figure 4.6: MATLAB-Aimsun integration framework

4.4 B-OD METHOD: OD MATRIX ESTIMATION USING B-OD STRUCTURE

This section details the approach adopted for the B-OD method, along with the design of the experiments. The B-OD method differs from the traditional traffic counts-based approach in the way the objective function is modified using additional structural knowledge from the B-OD matrix (see Figure 4.7 for the comparison between the traditional method and the proposed B-OD method).

[Figure 4.6 flowchart: Start; the OD estimation algorithm coded in MATLAB writes the OD matrix in Aimsun format (Aimsun.m); a Python script and the Aimsun simulator produce the simulation outputs (link flows and assignment matrix), which are returned to the algorithm; if convergence is achieved, End; otherwise, repeat.]

Figure 4.7: (a) Traditional link counts-based method vs (b) proposed B-OD method

The basic difference between the two flow charts lies in the two additional input boxes, that is, the Bluetooth OD vector and the Bluetooth OD connectivity matrix IX (which represents the random Ω% of OD pairs that are Bluetooth connected), and in the objective function formulation.

4.4.1 Objective function formulation

This study proposes an approach to integrate the observed B-OD structural

information into the traditional formulation, as shown in Equation (52).

[Figure 4.7 flowcharts. Both panels: Start with k = 1 and Xprior; run the Aimsun model to obtain the simulated link flows and the assignment; compare with the observed link flows; set the step length; update the OD matrix; repeat until convergence is achieved, then End. Panel (b) additionally takes the Bluetooth OD vector and the Bluetooth OD connectivity matrix IX as inputs.]


(52)

; (52a)

Here, the B-OD vector is the vector of OD flows from Bluetooth observations. The matrix IX is an incidence matrix that selects only the random Ω% of OD pairs that are Bluetooth connected; thus, when it is multiplied with X, the vector X is transformed to X*. The formulation comprises two sub-functions: the deviation of the user-equilibrium link flows (Y) from the observed link flows, and the structural comparison of the estimated (X*) and observed Bluetooth OD flows, expressed through StrOD. See Equation (53) for the formulation of StrOD.

StrOD( , X*)= (53)

In Equation (53), the mean vector of the observed B-OD flows has the same dimensions as the B-OD vector, with each cell equal to the mean of the B-OD flows; the mean vector of X* is defined similarly. Here, the constant c is used to convert the similarity measure StrOD into a dissimilarity measure, that is, c - StrOD. The dissimilarity measure acts as a scaling factor on the main objective function, that is, the deviation of traffic counts. StrOD lies between -1 and 1, which means that this part of the objective function ranges from c+1 to c-1. For Z1 to be stable, c needs to be greater than 1. Assuming c=2, the second part of the formulation becomes 2 - StrOD. This means that when the structures of the observed B-OD flows and X* are the same, StrOD is equal to 1 and Z1 reduces to the traditional link counts deviation. This implies that the simulated trip distribution matches the actual distribution, and simply minimising the traffic counts deviations should be sufficient to estimate the OD matrix. On the other hand, when the two structures are exactly opposite, StrOD reaches its minimum value of -1 and the objective function is multiplied by (c+1)^2, that is, by 9 if c=2. This implies that the deviation between traffic counts is amplified c+1=3 times due to extreme variations in the distribution of trips. In other words, the formulation considers any structural differences between the estimated/simulated and observed trip distribution from the perspective of subpath

flows.
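The scaling behaviour described above can be illustrated numerically. The sketch assumes that the objective has the form Z1 = (c - StrOD)^2 multiplied by the sum of squared link-flow deviations, which is consistent with the factors of 1 and (c+1)^2 quoted in the text but is not a transcription of Equation (52); StrOD is taken as a correlation coefficient, and all names are illustrative.

```python
import numpy as np

def str_od(b_od, x_star):
    """Correlation-based structural similarity of two OD vectors."""
    bc, xc = b_od - b_od.mean(), x_star - x_star.mean()
    return float(bc @ xc / (np.linalg.norm(bc) * np.linalg.norm(xc)))

def z1(y_sim, y_obs, x_star, b_od, c=2.0):
    """Assumed objective: link-count deviations scaled by (c - StrOD)."""
    scale = c - str_od(b_od, x_star)
    return float(np.sum((scale * (y_sim - y_obs)) ** 2))

y_sim = np.array([10.0, 20.0])          # simulated link flows (toy values)
y_obs = np.array([12.0, 17.0])          # observed link flows (toy values)
plain = float(np.sum((y_sim - y_obs) ** 2))

x_star = np.array([30.0, 20.0, 10.0])   # estimated flows of connected OD pairs
same = 0.2 * x_star                     # B-OD with identical structure
opposite = np.array([1.0, 2.0, 3.0])    # B-OD with exactly opposite structure

print(plain, z1(y_sim, y_obs, x_star, same), z1(y_sim, y_obs, x_star, opposite))
```

With c = 2, the first case leaves the traffic-count deviation unchanged and the second multiplies it by nine, matching the two limiting cases discussed in the text.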

The advantages of the proposed formulation are two-fold. First, the Bluetooth OD observations are up-to-date. Second, the need for unknown weight factors is relaxed because StrOD is a normalised value. Note that, since Bluetooth observations are only a fraction of all trips, StrOD (which is the structure component of GSSI) is well suited because it captures the structure through a correlation coefficient and does not require Bluetooth penetration rates, which are generally unknown.

4.4.2 OD matrix estimation algorithm

This study adopted the gradient descent algorithm proposed by Spiess (1990) for the OD matrix estimation. A property of this algorithm is that it always follows the direction of the largest yield with the goal of minimising the objective function. It was coded in MATLAB and run for the different experiments of the B-OD method, which are discussed in Section 4.4.3.

The gradient descent optimisation method is based on two major factors: search direction and step size. One way to arrive at the search direction is to compute the gradient of the objective function at the current solution. The step size, in turn, determines the number of iterations required for convergence. Lower step size values ensure that the descent path is smooth, but they are computationally expensive. Higher values can instead lead to higher values of the objective function and affect convergence. Thus, both search direction and step size play a crucial role in gradient descent optimisation. For the proposed objective function formulation, the search direction and step size are discussed in detail as follows.

4.4.2.1 Search direction

The gradient of the objective function is computed using Equation (54) and its sub-equation.

(54)


(54a)

The derivative of StrOD, evaluated between the observed B-OD flows and IXX, with respect to X is explained as follows:

First, the formulation is simplified into Equation (55), and the derivative of StrOD with respect to X is given by Equation (55a).

(55)

(55a)
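As a cross-check on a derivative of this kind, the sketch below differentiates a correlation-based StrOD analytically and compares the result with central finite differences. It assumes that StrOD is the correlation coefficient between the B-OD vector b and IX X; the closed form used is the standard derivative of a correlation coefficient and is not claimed to be identical to Equation (55a). All names and toy values are hypothetical.

```python
import numpy as np

def str_od(b, x_star):
    """Correlation between the B-OD vector b and the selected OD flows x_star."""
    bc, xc = b - b.mean(), x_star - x_star.mean()
    return float(bc @ xc / (np.linalg.norm(bc) * np.linalg.norm(xc)))

def str_od_grad(b, x_star):
    """Derivative of the correlation with respect to x_star."""
    bc, xc = b - b.mean(), x_star - x_star.mean()
    nb, nx = np.linalg.norm(bc), np.linalg.norm(xc)
    r = bc @ xc / (nb * nx)
    return bc / (nb * nx) - r * xc / nx ** 2

def grad_wrt_x(b, I_x, x):
    """Chain rule through the selection x_star = I_x @ x."""
    return I_x.T @ str_od_grad(b, I_x @ x)

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 50.0, size=6)        # six OD pairs (toy values)
I_x = np.eye(6)[[0, 2, 3, 5]]             # four of them are Bluetooth connected
b = rng.uniform(1.0, 20.0, size=4)        # observed B-OD flows (toy values)

analytic = grad_wrt_x(b, I_x, x)
eps = 1e-6
numeric = np.array([(str_od(b, I_x @ (x + eps * e)) - str_od(b, I_x @ (x - eps * e)))
                    / (2 * eps) for e in np.eye(6)])
print(np.max(np.abs(analytic - numeric)))
```

OD pairs that are not Bluetooth connected receive a zero derivative, so they are adjusted only through the link-count term of the objective.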

4.4.2.2 Step-size

After determining the search direction, the step size needs to be defined for updating the OD matrix Xk to Xk+1 for the next iteration, k+1. The updating step is performed using Equation (56). Here, Z1 and X in the gradient term refer to the values corresponding to iteration k.

= (56)

(56a)

Because OD flows are always non-negative, the optimum step size can be derived by solving Equation (57) subject to the constraint shown in Equation (56a).

(57)

However, in the current study, the bold-driver technique is proposed to adapt the step size to the value of the objective function. This technique is commonly used in annealing the learning rate (Battiti, 1989; Vogl, Mangis, Rigler, Zink, & Alkon, 1988). According to this approach, a prior value of the step size is chosen and then modified in every iteration based on the values of the objective function in consecutive iterations. If the value of the objective function in the current step k does not exceed that of the (k-1)th step (i.e., Z1(k) <= Z1(k-1)), the step size is multiplied by a growth factor. Otherwise, the optimisation parameters (i.e., Xk) are reset to those of the (k-1)th iteration (i.e., Xk=Xk-1) and the step size is multiplied by a reduction factor. The values of the two factors are generally chosen based on an examination of convergence.

For the experiments discussed in this chapter, the growth factor was chosen as either 1.25 or 1.5, and the reduction factor was tested at values of 0.7, 0.8 and 0.9. In the current study, the maximum number of iterations was predetermined to be 20, which represents the termination criterion for the optimisation problem (in the past, researchers such as Bullejos et al. (2014) also ran the optimisation for 20 iterations).

The sequence of steps for the proposed B-OD method is explained below:

Step 1: Choose the prior OD demand (Xprior), the observed B-OD flows, and the observed link flows.

Step 2: Set k=1, Xk = Xprior, and the step size to its prior value.

Step 3: Run the Aimsun_matrix.m function (refer to Function 2, Appendix D) in MATLAB, which converts Xk to an OD matrix in Aimsun format and then loads the study network in Aimsun with this demand, followed by dynamic user equilibrium (DUE) assignment. After executing Aimsun.m (see Function 3, Appendix D), the following simulation outputs are obtained: a) an SQLite database of link flows (see Function 4, Appendix D); and b) an assignment matrix text file (see Function 6, Appendix D).

Step 4: Aggregate the link flows (Yk) from the SQLite database for one hour (note that the OD matrix input is also for one hour).

Step 5: For the Bluetooth connectivity rate of Ω%, estimate X* by multiplying IX with Xk.

Step 6: Compute Z1 using Equation (52), and calculate the gradient of Z1 using Equation (54).

Step 7: For k>1, if Z1(k) <= Z1(k-1), increase the step size and go to Step 8; else, reduce the step size, set Xk=Xk-1, and go to Step 3.

Step 8: Set k=k+1 and update the demand (Xk) for the next iteration using Equation (56).

Step 9: Check the termination criterion. If it is not met, go to Step 3; otherwise, terminate the optimisation, and the value of Xk is the final estimated OD matrix (Xest).

Step 10: Check the quality of the estimated OD matrix, Xest, against Xtrue using

RMSE, StrOD and GSSI.
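The overall loop of Steps 1 to 9 can be sketched on a toy problem. Several simplifications are assumed and should not be read as the thesis implementation: the Aimsun assignment is replaced by a fixed linear assignment matrix A (so that Y = A X), the objective is taken as (c - StrOD)^2 times the squared link-flow deviation, a plain projected gradient step stands in for Equation (56), and a trial step is accepted or rejected immediately, which mirrors the reset in Step 7. All names and values are hypothetical.

```python
import numpy as np

def str_od(b, x):
    bc, xc = b - b.mean(), x - x.mean()
    return float(bc @ xc / (np.linalg.norm(bc) * np.linalg.norm(xc)))

def str_od_grad(b, x):
    bc, xc = b - b.mean(), x - x.mean()
    nb, nx = np.linalg.norm(bc), np.linalg.norm(xc)
    return bc / (nb * nx) - (bc @ xc / (nb * nx)) * xc / nx ** 2

def objective_and_gradient(x, A, y_obs, I_x, b_od, c=2.0):
    """Assumed objective Z = (c - StrOD)^2 * ||A x - y_obs||^2 and its gradient."""
    dev = A @ x - y_obs
    sq_dev = float(dev @ dev)
    s = c - str_od(b_od, I_x @ x)
    grad = (2 * s ** 2 * (A.T @ dev)
            - 2 * s * sq_dev * (I_x.T @ str_od_grad(b_od, I_x @ x)))
    return s ** 2 * sq_dev, grad

def estimate_od(x_prior, A, y_obs, I_x, b_od,
                step=1e-3, grow=1.25, shrink=0.8, max_iter=20):
    """Gradient descent with a bold-driver step size and non-negative flows."""
    x = np.asarray(x_prior, float).copy()
    z, grad = objective_and_gradient(x, A, y_obs, I_x, b_od)
    history = [z]
    for _ in range(max_iter):
        x_trial = np.maximum(x - step * grad, 0.0)      # keep OD flows >= 0
        z_trial, grad_trial = objective_and_gradient(x_trial, A, y_obs, I_x, b_od)
        if z_trial <= z:                                # improvement: accept, grow
            x, z, grad = x_trial, z_trial, grad_trial
            step *= grow
        else:                                           # worse: keep old x, shrink
            step *= shrink
        history.append(z)
    return x, history

rng = np.random.default_rng(0)
x_true = rng.uniform(10.0, 100.0, size=6)      # six OD pairs
A = rng.uniform(0.0, 1.0, size=(3, 6))         # three counted links
y_obs = A @ x_true                             # error-free observed link flows
I_x = np.eye(6)[[0, 1, 3, 4]]                  # four Bluetooth-connected pairs
b_od = 0.2 * (I_x @ x_true)                    # ideal-scenario B-OD flows
x_prior = x_true * rng.uniform(0.6, 1.4, size=6)

x_est, history = estimate_od(x_prior, A, y_obs, I_x, b_od)
print(history[0], history[-1])
```

Because rejected trial steps leave the current solution unchanged, the recorded objective values never increase; how quickly they fall depends on the initial step size and the two bold-driver factors.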

4.4.3 Experiments – ideal and near-ideal scenarios of B-OD method

The experiments for the B-OD method were divided into ideal and near-ideal

scenarios and compared with the traditional traffic counts-based formulation, as

discussed below:

Traditional case: Here, the deviation between the observed and user-equilibrium

link flows was minimised.

The ideal scenario of the B-OD method was further tested for four different cases

(Case-1 to Case-4) where the structural comparison of the B-OD flows with OD matrix

estimates was deployed in the formulation for different values of Ω. The four different

cases were:

B-OD-ideal Case-1: Here, Ω was 25%. Thus, 53 OD pairs were randomly selected to provide B-OD structural knowledge.

B-OD-ideal Case-2 and Case-3: Here, Ω was 50% and 75%, respectively.

B-OD-ideal Case-4: Here, Ω was 100%. Thus, all 210 OD pairs contributed towards B-OD structural information.

The experiments for the ideal scenario were tested for only one prior OD matrix;

that is, Xprior1.

The near-ideal scenario of the B-OD method was further divided into five different cases (Case-1 to Case-5) for different values of Ω.

B-OD-near-ideal, case-1: Here, Ω was 20%. Thus, structural information was used from only 42 randomly selected OD pairs.

B-OD-near-ideal, case-2: Here, Ω was 40%. Thus, only 84 OD pairs were randomly selected.

B-OD-near-ideal, case-3: Here, Ω was 60%. Thus, only 126 OD pairs were randomly selected.

B-OD-near-ideal, case-4: Here, Ω was 80%. Thus, only 168 OD pairs were randomly selected.

B-OD-near-ideal, case-5: Here, Ω was 100%. Thus, all 210 OD pairs were selected.

The experiments for the near-ideal scenario were tested for three prior OD

matrices: Xprior1, Xprior2, and Xprior3.

4.4.4 Results for the ideal scenario of B-OD method

In this section, the estimated OD matrices from all four cases of the ideal

scenario of B-OD method, traditional method, and prior OD (Xprior1) are compared

using performance measures – RMSE, StrOD, and GSSI, as further discussed in the

following sections.

4.4.4.1 RMSE results

Figure 4.8 shows that the RMSE results of the ideal-scenario cases were better

than those of Xprior (14.02) and the traditional approach (12.34). The percentage

improvement with respect to Xprior (as shown in Figure 4.9) was 11.98% for the

traditional method. On the other hand, the improvement increased from 20.11% to

31.38% as Ω increased from 25% to 100%, respectively for the ideal scenario cases.

Figure 4.8: RMSE w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method

[Figure 4.8 data, RMSE by experiment: Prior 14.02; Traditional 12.34; Ω=25% 11.2; Ω=50% 10.41; Ω=75% 9.83; Ω=100% 9.62]


Figure 4.9: Percentage of improvement in RMSE w.r.t. Xprior for traditional and ideal scenario cases of the B-OD method

Compared to the traditional method, Figure 4.10 illustrates that the results for

the cases of the ideal scenario showed improvement in RMSE from 9.24% to 22.04%

as Ω increased from 25% to 100%, respectively.

Figure 4.10: Percentage of improvement in RMSE w.r.t. traditional method for ideal scenario case of B-OD method
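The percentage improvements quoted for RMSE follow directly from the RMSE values reported for Figure 4.8. The short sketch below recomputes a few of them, assuming the improvement is the relative reduction with respect to the reference value; the function name is illustrative.

```python
def pct_improvement(reference, value):
    """Relative reduction of an error measure with respect to a reference, in %."""
    return 100.0 * (reference - value) / reference

rmse_prior, rmse_traditional = 14.02, 12.34
rmse_ideal_25, rmse_ideal_100 = 11.2, 9.62

print(round(pct_improvement(rmse_prior, rmse_traditional), 2))        # w.r.t. Xprior
print(round(pct_improvement(rmse_prior, rmse_ideal_100), 2))          # w.r.t. Xprior
print(round(pct_improvement(rmse_traditional, rmse_ideal_25), 2))     # w.r.t. traditional
print(round(pct_improvement(rmse_traditional, rmse_ideal_100), 2))    # w.r.t. traditional
```

The four printed values correspond to the 11.98%, 31.38%, 9.24% and 22.04% stated in the text.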

4.4.4.2 StrOD results

On the other hand, the results based on the StrOD measure demonstrated the

level of structural consistency maintained within the OD matrix estimates for the

traditional, as well as ideal scenario experiments. Figure 4.11 shows that the StrOD

(Xprior, Xtrue) was 0.8142. Although the RMSE results showed improvement (11.98%)

in the previous section, the quality (in terms of structure) of the OD matrix estimated

from the traditional method did not show any improvement. Instead, there was a

decrease in the value from 0.8142 to 0.8107 in Figure 4.11. This could be attributed to

the fact that the traditional link counts-based method is highly under-specified, and as

such, although there was improvement in the RMSE, the structure of the matrix could

not be improved. The percentage degradation of the OD matrix structure was 0.43%

for the traditional method (see Figure 4.12).

[Figure 4.9 data, % improvement in RMSE w.r.t. prior OD: Traditional 11.98; Ω=25% 20.11; Ω=50% 25.75; Ω=75% 29.89; Ω=100% 31.38]

[Figure 4.10 data, % improvement in RMSE w.r.t. traditional OD: Ω=25% 9.24; Ω=50% 15.64; Ω=75% 20.34; Ω=100% 22.04]


On the other hand, the results showed an increase in the quality of OD estimates

as Ω increased from 25% to 100% for the ideal scenario cases from 0.8440 to 0.8880,

respectively (see Figure 4.11). The percentage improvement in the structure of the OD

matrix with respect to Xprior (except the traditional case), and with respect to traditional

method are shown in Figure 4.12 and Figure 4.13, respectively.

Figure 4.11: StrOD w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method

Figure 4.12: Percentage of improvement in the StrOD w.r.t. Xprior for the traditional and ideal scenario cases of the B-OD method

Figure 4.13: Percentage of improvement in the StrOD w.r.t. traditional method for the ideal scenario cases of the B-OD method

[Figure 4.11 data, StrOD by experiment: Prior 0.8142; Traditional 0.8107; Ω=25% 0.844; Ω=50% 0.8716; Ω=75% 0.8805; Ω=100% 0.888]

[Figure 4.12 data, % improvement in StrOD w.r.t. prior OD: Traditional -0.43; Ω=25% 3.66; Ω=50% 7.05; Ω=75% 8.14; Ω=100% 9.06]

[Figure 4.13 data, % improvement in StrOD w.r.t. traditional OD: Ω=25% 4.11; Ω=50% 7.51; Ω=75% 8.61; Ω=100% 9.53]


4.4.4.3 GSSI results

Figure 4.14 demonstrates that GSSI (Xprior, Xtrue) was 0.7248, which improved

to 0.7556 (with 4.25% improvement in Figure 4.15) using the traditional method. The

value of GSSI and its percentage improvement was even better for other cases of the

ideal scenario, as shown in Figure 4.14 and Figure 4.15, respectively. Figure 4.16

shows that the ideal scenario cases outperformed the traditional method with a higher

percentage improvement of 10.35% for the Ω=100% scenario.

Figure 4.14: GSSI w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method

Figure 4.15: Percentage of improvement in the GSSI w.r.t. Xprior for the traditional and ideal scenario cases of the B-OD method

Figure 4.16: Percentage of improvement in the GSSI w.r.t. traditional method for the ideal scenario cases of the B-OD method

[Figure 4.14 data, GSSI by experiment (axis labelled MGeoSSIM): Prior 0.7248; Traditional 0.7556; Ω=25% 0.8026; Ω=50% 0.8248; Ω=75% 0.8269; Ω=100% 0.8338]

[Figure 4.15 data, % improvement in GSSI w.r.t. prior OD: Traditional 4.25; Ω=25% 10.73; Ω=50% 13.80; Ω=75% 14.09; Ω=100% 15.04]

[Figure 4.16 data, % improvement in GSSI w.r.t. traditional OD: Ω=25% 6.22; Ω=50% 9.16; Ω=75% 9.44; Ω=100% 10.35]


4.4.5 Results for the near-ideal scenario of B-OD method

This section presents the comparison of the estimated OD matrices from all five cases of the near-ideal scenario of the B-OD method, the traditional method, and Xprior using the performance measures of RMSE, StrOD, and GSSI. These experiments were conducted for three different replications of Xprior, that is, Xprior1, Xprior2, and Xprior3, as listed in Table 4.3. Similar to the ideal scenario observations, the near-ideal scenario experiments also showed improvement with respect to the traditional method, as the RMSE, StrOD, and GSSI results discussed below demonstrate.


For experiments with Xprior1, RMSE was reduced from 14.02 (prior) to 10.71 (Ω=100%) (see Figure 4.17). Similarly, the results for the experiments initiated with Xprior2 and Xprior3 also showed improvement.

Figure 4.17: RMSE results w.r.t. Xtrue- Near-ideal, B-OD method

The percentage improvements in RMSE with respect to the prior OD matrices Xprior1, Xprior2, and Xprior3 are illustrated in Figure 4.18 for the traditional method and all near-ideal experiments. The percentage improvements for Ω=20% (13.34, 12.46, and 6.23) and Ω=40% (13.41, 18.28, and 8.17) were better than those for the traditional method (11.98, 10.88, and 3.88). However, a marked increase in improvement was observed at Ω=60% (18.90, 21.22, and 11.49), and the improvement remained relatively stable for Ω=80% (22.40, 21.37, and 13.03) and Ω=100% (23.61, 22.05, and 13.03).

[Figure 4.17 data, RMSE (X, Xtrue)]

          Prior   Traditional   Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   14.02   12.34         12.15   12.14   11.37   10.88   10.71
Xprior2   13.24   11.8          11.59   10.82   10.43   10.41   10.32
Xprior3   12.36   11.88         11.59   11.35   10.94   10.75   10.75


Figure 4.18: The percentage of improvement in the RMSE w.r.t. Xprior for near-ideal B-OD method

The percentage improvements for RMSE with respect to the traditional method

are shown in Figure 4.19 for all near-ideal cases. The percentage improvement in

RMSE was significant at Ω=60% (for Xprior1 and Xprior3) and 40% (for Xprior2),

respectively.

Figure 4.19: The percentage of improvement in the RMSE w.r.t. traditional method for the near-ideal B-OD method


For experiments with Xprior1, the StrOD value improved from 0.8142 to 0.8604

(see Figure 4.20). Similarly, the results for experiments initiated with Xprior2 and Xprior3

also witnessed improvements.

[Figure 4.18 data, % improvement in RMSE w.r.t. prior OD]

          Traditional   Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   11.98         13.34   13.41   18.90   22.40   23.61
Xprior2   10.88         12.46   18.28   21.22   21.37   22.05
Xprior3   3.88          6.23    8.17    11.49   13.03   13.03

[Figure 4.19 data, % improvement in RMSE w.r.t. traditional method]

          Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   1.54    1.62    7.86    11.83   13.21
Xprior2   1.78    8.31    11.61   11.78   12.54
Xprior3   2.44    4.46    7.91    9.51    9.51


Figure 4.20: StrOD results w.r.t. Xtrue- near-ideal B-OD method

The StrOD plots in Figure 4.21 showed a sudden rise in the structural

improvements at Ω=60% for Xprior1 (3.97) and Xprior3 (5.04), and at Ω=40% for Xprior2

(3.91), respectively.

Figure 4.21: The percentage of improvement in the StrOD w.r.t. Xprior for the near-ideal B-OD method

With respect to the results of the traditional method, the percentage improvements for StrOD are shown in Figure 4.22 for all near-ideal cases. A significant structural enhancement was seen for Ω greater than or equal to 60% for Xprior1 and Xprior3, and for Ω greater than or equal to 40% for Xprior2.

[Figure 4.20 data, StrOD (X, Xtrue)]

          Prior    Traditional   Ω=20%    Ω=40%    Ω=60%    Ω=80%    Ω=100%
Xprior1   0.8142   0.8107        0.8219   0.824    0.8465   0.8567   0.8604
Xprior2   0.8178   0.8196        0.8302   0.8498   0.8599   0.8608   0.863
Xprior3   0.7964   0.8054        0.8172   0.8244   0.8365   0.8413   0.842

[Figure 4.21 data, % improvement in StrOD w.r.t. prior OD]

          Traditional   Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   -0.43         0.95    1.20    3.97    5.22    5.67
Xprior2   0.22          1.52    3.91    5.15    5.26    5.53
Xprior3   1.13          2.61    3.52    5.04    5.64    5.73


Figure 4.22: The percentage of improvement in the StrOD w.r.t. traditional method for the near-ideal, B-OD method


For experiments with Xprior1, GSSI improved from 0.7248 to 0.7969 (see Figure 4.23), and a similar improvement was observed for experiments with Xprior2 and Xprior3.

Figure 4.23: GSSI results w.r.t. Xtrue- near-ideal B-OD method

The comparison results with Xprior show that the near-ideal experiments

performed better than the traditional method (see Figure 4.24). The improvement was

more significant for Ω>=40% for Xprior2 and for Ω>=60% for Xprior1 and Xprior3,

respectively.


[Figure 4.23 data, GSSI (X, Xtrue)]

          Prior    Traditional   Ω=20%    Ω=40%    Ω=60%    Ω=80%    Ω=100%
Xprior1   0.7248   0.7545        0.7698   0.7729   0.7847   0.7946   0.7969
Xprior2   0.7406   0.7705        0.7781   0.794    0.7946   0.7998   0.8033
Xprior3   0.7297   0.7536        0.7615   0.7692   0.7820   0.7829   0.7842


Figure 4.24: The percentage of improvement in the GSSI w.r.t. Xprior for the near-ideal B-OD method

The percentage improvements for GSSI with respect to the traditional method are shown in Figure 4.25. The results illustrate that the rate of improvement increased from Ω=40% for Xprior1 and Xprior2, and was almost stable after Ω=60% for Xprior3.

Figure 4.25: The percentage of improvement in the GSSI w.r.t. traditional method for the near-ideal B-OD method

4.4.6 Discussion

The experiments conducted for the ideal and near-ideal scenarios of the B-OD

method demonstrate the ability of Bluetooth trips (in the form of B-OD structure) to

improve the quality of OD estimates.

Although the ideal scenarios showed significant improvement, it is unlikely that the structure of the observed B-OD flows is error free. To account for this, the near-ideal experiments were conducted based on a random B-OD structure. Despite the introduced randomness, the near-ideal scenarios demonstrated significant improvement in the quality of OD estimates as the percentage of Bluetooth connectivity increased from Ω=20% to Ω=100%. For instance, RMSE reduced from 14.02 to 9.62 for Ω=100% in the ideal scenario (see Figure 4.8), and the random B-OD structure performed fairly well by comparison: RMSE improved from 14.02 to 10.71 for Ω=100% in the near-ideal scenario for Xprior1 (see Figure 4.17).

[Figure 4.24 data, % improvement in GSSI w.r.t. prior OD]

          Traditional   Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   4.25          6.21    6.64    8.26    9.63    9.95
Xprior2   4.04          5.06    7.21    7.29    7.99    8.47
Xprior3   3.28          4.36    5.41    7.17    7.29    7.47

Moreover, the randomness in the B-OD reduced the value of StrOD ( , Xtrue) from 1 (ideal scenario) to 0.8778 (near-ideal scenario). Thus, the StrOD of the estimated OD matrices could reach at most 0.8778 in the near-ideal scenario and 1 in the ideal scenario. For the same reason, the StrOD values for Ω=100% could not improve beyond 0.8778 in the near-ideal experiments (StrOD = 0.8604, 0.863, and 0.8420 for the experiments based on Xprior1, Xprior2, and Xprior3, respectively), whereas the StrOD reached 0.8880 for the ideal scenario (Figure 4.11). Similar improvements were observed for GSSI as well.

The percentage improvement in the RMSE, StrOD, and GSSI was higher for Ω≥50% in the ideal scenario. Similarly, for the near-ideal experiments, sudden improvements were observed at Ω=40% and Ω=60% among the three different replications. This shows that even though the Bluetooth observations were random, a significant improvement in the quality of OD estimates could be achieved for a Bluetooth connectivity rate of between 40% and 60%. In other words, the overall quality of the OD matrix estimate could be significantly improved if at least 84 to 126 OD pairs out of a total of 210 OD pairs were randomly connected by Bluetooth sensors.

4.5 B-SP METHOD: OD MATRIX ESTIMATION USING B-SP STRUCTURE

This section discusses the development of the B-SP method followed by the set

of experiments and results to demonstrate its efficiency compared to the traditional

method (Figure 4.7a). The proposed B-SP method incorporated the observed subpath

flows in the form of the B-SP vector ( ) within the objective function formulation, as

shown in Figure 4.26.


Figure 4.26: Proposed B-SP method

The differences between the B-SP method and B-OD method are outlined below:

While the B-OD method assumes a complete sequence of trajectories, the

B-SP method computes statistics on sub-trajectories that are randomly

selected to develop Bluetooth subpath flows. Because the B-SP method

depends on subpath flows, the percentage connectivity (Ω) of Bluetooth

connected OD pairs is not relevant.

Because the penetration rate of Bluetooth trips is low, random and unknown, the experiments in the B-SP method are performed for different penetration rates of Bluetooth trajectories (η = 10% to 50%), while the B-OD method is based on a fixed penetration rate of 20%. In reality, a penetration rate of Bluetooth trajectories greater than 20% is very unlikely. However, for demonstration purposes the maximum value of η is chosen to be 50%. Refer to Section 4.7 for the B-SP method based on lower penetration rates.

In the B-OD method, the structural consistency in Xk (i.e., X for the kth iteration) is maintained by minimising the structural deviation between X*k and B* in every iteration. However, in the B-SP method, it is the structural deviation between the estimated subpath flows (Pk) and the observed subpath flows that is minimised in every iteration, k. Note that Pk is the result of assigning Xk over the network.

Finally, in the B-OD method, the maximum improvement in the quality of OD estimates is limited by the quality of the B-OD used in the objective function; that is, StrOD ( , X*true). On the other hand, in the B-SP method, the maximum improvement in the final OD estimates is controlled by StrSP ( , Ptrue) for different values of ƞ (as shown in Figure 4.27).

[Figure 4.26 flowchart elements: Start (k=1, initial OD = Xprior); Aimsun model; estimated link flows, assignment and path-proportion matrix; observed link flows; Bluetooth subpath flows vector; step length; update OD matrix; Is convergence achieved? (No/Yes); End]

Figure 4.27: StrSP ( , Ptrue) for different ƞ%.

4.5.1 Objective function formulation

The objective function (Equation (58)) is expressed in terms of the deviation

between the observed and estimated link flows and the structural comparison of the

estimated B-SP flows (P) with the observed B-SP flows ( ) expressed as StrSP ( , P).

(58)

; (58a)

StrSP ( , )= (58b)

The path proportion matrix in Equation (58a) maps OD flows to path flows. In Equation (58b), each mean vector has the same dimensions as its subpath flow vector, with every cell equal to the mean of that vector; one is defined for the observed B-SP flows and one for the estimated B-SP flows. The value of StrSP lies between -1 and 1. The stability of this combined formulation can be explained in a similar fashion to that of the B-OD method. When the structure of the observed and estimated B-SP flows is the same, StrSP is equal to 1, and the objective function reduces to the traditional link count deviation. On the other hand, when StrSP reaches its minimum value of -1, the objective function takes its other limiting form (assuming c=2), a physical interpretation of which was explained in Section 4.4.2.
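The description of Equation (58b) (mean-centred vectors, values between -1 and 1) reads like a Pearson correlation between the observed and estimated subpath flow vectors. The following Python sketch works under that assumption; the function name `str_sp` and the flow values are hypothetical and are not taken from the thesis.

```python
import math


def str_sp(observed, estimated):
    """Correlation-style structural similarity of two subpath flow vectors.

    Assumption: StrSP is the Pearson correlation. Each vector is centred
    on its own mean, so the result lies in [-1, 1].
    """
    if len(observed) != len(estimated):
        raise ValueError("vectors must have the same length")
    mean_obs = sum(observed) / len(observed)
    mean_est = sum(estimated) / len(estimated)
    dev_obs = [o - mean_obs for o in observed]
    dev_est = [e - mean_est for e in estimated]
    numerator = sum(a * b for a, b in zip(dev_obs, dev_est))
    denominator = math.sqrt(sum(a * a for a in dev_obs)) * math.sqrt(
        sum(b * b for b in dev_est)
    )
    if denominator == 0.0:
        raise ValueError("a constant vector has no structure to compare")
    return numerator / denominator


# Hypothetical subpath flows, for illustration only.
observed_flows = [12.0, 30.0, 7.0, 21.0]
same_structure = [2.0 * f for f in observed_flows]   # scaled copy
opposite_structure = [-f for f in observed_flows]    # mirrored pattern

print(round(str_sp(observed_flows, same_structure), 6))      # 1.0
print(round(str_sp(observed_flows, opposite_structure), 6))  # -1.0
```

Under this reading, a uniformly scaled copy of the observed flows still scores 1, which is consistent with the measure comparing structure rather than magnitude.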

4.5.2 OD matrix estimation algorithm

This study also adopted a gradient descent algorithm for the B-SP method. The search direction was defined by computing the gradient of the objective function, Z2 (Equation (58)), at the current solution. The prior step-size ( ) was chosen as 0.005 and was adjusted at every iteration using the bold-driver technique, with its two parameters set to 1.05 and 0.9, respectively. The OD matrix adjustment was terminated after 20 iterations.
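The step-size rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis implementation: 1.05 is taken to be the growth factor and 0.9 the shrink factor, a non-improving move is discarded (a common bold-driver variant), and the quadratic objective is a hypothetical stand-in for Z2.

```python
def bold_driver_descent(objective, gradient, x0, step=0.005,
                        grow=1.05, shrink=0.9, iterations=20):
    """Gradient descent with bold-driver step adaptation.

    The step grows after an iteration that lowers the objective and
    shrinks after one that does not; a non-improving move is discarded
    (an assumption of this sketch).
    """
    x = list(x0)
    best = objective(x)
    history = [best]
    for _ in range(iterations):
        g = gradient(x)
        candidate = [xi - step * gi for xi, gi in zip(x, g)]
        value = objective(candidate)
        if value < best:
            x, best = candidate, value
            step *= grow
        else:
            step *= shrink
        history.append(best)
    return x, history, step


# Hypothetical stand-in objective: squared deviation from target values.
target = [40.0, 25.0, 60.0]


def toy_objective(x):
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def toy_gradient(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_final, history, final_step = bold_driver_descent(
    toy_objective, toy_gradient, x0=[30.0, 30.0, 30.0])
```

With the fixed budget of 20 iterations, `history` holds the objective value before the first update and after each iteration, which makes it easy to plot convergence for different starting matrices.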

The gradient of the objective function, Z2 was computed using Equation (59)

and its subsets.

(59)

(59b)

The derivative of StrSP ( , APX) with respect to X is obtained as follows. First, Equation (59a) is simplified into Equation (60), and the derivative of StrSP ( , APX) with respect to X is then given by Equation (60a).

(60)

(60a)
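As an illustration of such a derivative, the Python sketch below assumes that StrSP is the Pearson correlation between the observed subpath flows and the estimated flows APX. It derives the gradient with respect to X by the chain rule and checks it against central finite differences. The matrix, flows, and function names are hypothetical, and the expression is not claimed to match Equation (60a) term by term.

```python
import math


def centre(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def mat_vec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def str_sp(observed, estimated):
    a, b = centre(observed), centre(estimated)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def str_sp_gradient(observed, path_matrix, od_flows):
    """Analytic d StrSP(observed, A_P X) / dX under the correlation assumption."""
    a = centre(observed)
    b = centre(mat_vec(path_matrix, od_flows))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    dot = sum(x * y for x, y in zip(a, b))
    # Derivative with respect to the estimated subpath flows P.
    d_dp = [(x - dot / norm_b ** 2 * y) / (norm_a * norm_b)
            for x, y in zip(a, b)]
    # Chain rule: P = A_P X, so dStrSP/dX = A_P^T dStrSP/dP.
    return [sum(path_matrix[i][j] * d_dp[i] for i in range(len(d_dp)))
            for j in range(len(od_flows))]


# Hypothetical example: 4 subpaths, 3 OD pairs.
A_P = [[0.6, 0.0, 0.2],
       [0.4, 0.5, 0.0],
       [0.0, 0.5, 0.3],
       [0.0, 0.0, 0.5]]
X = [100.0, 80.0, 60.0]
P_obs = [15.0, 20.0, 9.0, 5.0]

analytic = str_sp_gradient(P_obs, A_P, X)

# Central finite differences as an independent check.
h = 1e-4
numeric = []
for j in range(len(X)):
    up, down = X[:], X[:]
    up[j] += h
    down[j] -= h
    numeric.append((str_sp(P_obs, mat_vec(A_P, up))
                    - str_sp(P_obs, mat_vec(A_P, down))) / (2 * h))
```

Comparing `analytic` with `numeric` is a quick way to catch sign or indexing mistakes before a gradient is used inside an iterative OD update.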

After determining the search direction (Equation (59)), the updating step is

performed using Equation (61). Here, Z2 and X in refer to the values

corresponding to iteration k.


= (61)

(61a)

4.5.3 Experiments for B-SP method

This section discusses the experiments for the B-SP method designed for different penetration rates (η) of Bluetooth trajectories, which are then compared with the traditional traffic counts-based approach. The experiments for the B-SP method were divided into the following five cases:

B-SP case-1: Here, η = 10% with five replications. Thus, only 388 out of 3875 sub-trajectories were randomly selected in each replication.

B-SP case-2: Here, η = 20% with five replications. Thus, only 775 out of 3875 sub-trajectories were randomly selected in each replication.

B-SP case-3: Here, η = 30% with five replications. Thus, only 1163 out of 3875 sub-trajectories were randomly selected in each replication.





The experiments for the B-SP method were tested for three prior OD matrices

i.e. Xprior = Xprior1, Xprior2, and Xprior3.
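The random selection in these cases can be sketched in Python as follows. Rounding halves up is an assumption inferred from the reported counts (388 and 1163); the function names and the seed are hypothetical.

```python
import random
from decimal import Decimal, ROUND_HALF_UP


def sample_size(pool_size, penetration_percent):
    """Number of sub-trajectories to draw, rounding halves up."""
    exact = Decimal(pool_size) * Decimal(str(penetration_percent)) / Decimal(100)
    return int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def draw_replications(pool_size, penetration_percent, replications, seed):
    """Return one random index sample (without replacement) per replication."""
    rng = random.Random(seed)
    k = sample_size(pool_size, penetration_percent)
    return [rng.sample(range(pool_size), k) for _ in range(replications)]


POOL = 3875  # sub-trajectories available, as stated in the text
sizes = {eta: sample_size(POOL, eta) for eta in (10, 20, 30)}
samples = draw_replications(POOL, 10, replications=5, seed=1)

print(sizes)  # {10: 388, 20: 775, 30: 1163}
```

Each entry of `samples` is a list of indices into the pool of sub-trajectories; a different seed gives a different set of five replications.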

4.5.4 Results for B-SP method

The results (Xest) of all five cases were compared with the OD estimates from

the traditional method and the prior OD (Xprior) using RMSE, StrOD, and GSSI and

are discussed in the following sections.


Figure 4.28 shows a gradual improvement in RMSE from 14.02 (prior) to 11.29 (for η=50%) for Xprior1. The experiments initiated with Xprior2 and Xprior3 also showed improvement.

Figure 4.28: RMSE w.r.t. Xtrue, B-SP experiments

The rate of improvement with respect to Xprior (Figure 4.29) began to rise at η = 10%, and a slight further improvement was observed for η > 30% for all three prior ODs. Similar observations were found with respect to the traditional method (Figure 4.30).
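The percentage improvements can be recomputed from the RMSE values. The Python sketch below uses the values reported for Xprior1 (prior 14.02, traditional 12.34, η=10% 11.84); the function name is hypothetical, and the formula (relative reduction of an error measure) is an assumption that reproduces the plotted percentages.

```python
def improvement_in_error(reference, estimate):
    """Percentage reduction of an error measure (lower is better), e.g. RMSE."""
    return 100.0 * (reference - estimate) / reference


# RMSE values reported for Xprior1 in the B-SP experiments.
rmse_prior, rmse_traditional, rmse_eta10 = 14.02, 12.34, 11.84

wrt_prior_traditional = improvement_in_error(rmse_prior, rmse_traditional)
wrt_prior_eta10 = improvement_in_error(rmse_prior, rmse_eta10)
wrt_traditional_eta10 = improvement_in_error(rmse_traditional, rmse_eta10)

print(f"{wrt_prior_traditional:.2f}")   # 11.98
print(f"{wrt_prior_eta10:.2f}")         # 15.55
print(f"{wrt_traditional_eta10:.2f}")   # 4.05
```

For similarity measures such as StrOD and GSSI, where higher is better, the sign of the numerator would be reversed.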

Figure 4.29: Percentage of improvement in RMSE w.r.t. Xprior for the traditional and B-SP experiments

Figure 4.30: Percentage of improvement in the RMSE w.r.t. traditional method

Plot data for the RMSE figures (B-SP experiments):

RMSE w.r.t. Xtrue
          Prior   Traditional  ƞ=10%  ƞ=20%  ƞ=30%  ƞ=40%  ƞ=50%
Xprior1   14.02   12.34        11.84  11.76  11.68  11.41  11.29
Xprior2   13.24   11.80        11.34  11.26  11.16  11.08  10.93
Xprior3   12.36   11.88        11.42  11.31  11.30  11.22  11.17

% improvement in RMSE w.r.t. prior OD
          Traditional  ƞ=10%  ƞ=20%  ƞ=30%  ƞ=40%  ƞ=50%
Xprior1   11.98        15.55  16.12  16.69  18.62  19.47
Xprior2   10.88        14.32  14.94  15.71  16.28  17.46
Xprior3   3.88         7.58   8.47   8.56   9.25   9.64

% improvement in RMSE w.r.t. traditional OD
          ƞ=10%  ƞ=20%  ƞ=30%  ƞ=40%  ƞ=50%
Xprior1   4.05   4.70   5.35   7.54   8.51
Xprior2   3.87   4.56   5.42   6.07   7.39
Xprior3   3.84   4.77   4.87   5.59   5.99



The StrOD results in Figure 4.31 show that the quality of the structure improved as η increased from 10% to 50%. With respect to the prior OD, the rate of improvement was better than that of the traditional method even for a penetration rate of η=10% (Figure 4.32), with the next rise in the rate observed for η≥30%. With respect to the traditional method, a reasonable improvement was observed at η=10%, and it became significant for η≥30% (Figure 4.33).

Figure 4.31: StrOD w.r.t. Xtrue for the prior, traditional, and B-SP experiments

Figure 4.32: Percentage of improvement in the StrOD w.r.t. Xprior for the traditional and B-SP experiments

Figure 4.33: Percentage of improvement in the StrOD w.r.t. traditional method


Plot data: % improvement in StrOD w.r.t. prior OD

          Traditional  ƞ=10%  ƞ=20%  ƞ=30%  ƞ=40%  ƞ=50%
Xprior1   -0.43        2.03   2.27   3.23   3.35   3.64
Xprior2   0.22         2.50   2.60   2.77   3.23   3.54
Xprior3   1.28         3.46   3.57   4.13   4.23   4.58


The GSSI results in Figure 4.34 also demonstrate a reasonable improvement as η increased from 10% to 50%. The results were better than those of the traditional method even for a penetration rate of η=10%. The rate of improvement also appeared to be stable for η≥30% (see Figure 4.35 and Figure 4.36).

Figure 4.34: GSSI w.r.t. Xtrue for the prior, traditional, and B-SP experiments

Figure 4.35: Percentage of improvement in GSSI w.r.t. Xprior for the traditional and B-SP experiments

Figure 4.36: Percentage of improvement in the GSSI w.r.t. traditional method for B-SP experiments


Plot data: % improvement in GSSI w.r.t. prior OD

          Traditional  ƞ=10%  ƞ=20%  ƞ=30%  ƞ=40%  ƞ=50%
Xprior1   4.10         6.18   6.80   7.71   7.73   7.80
Xprior2   4.04         5.06   5.78   6.01   6.15   6.19
Xprior3   3.28         4.86   5.12   5.91   6.06   6.08


4.5.5 Discussion

The results of the B-SP method indicate that the quality of OD estimates improved as the penetration rate (η) of Bluetooth trajectories increased. The experiments in the B-SP method were more realistic than those of the B-OD method, because the Bluetooth-inferred trips were not complete sequences of trips (thus, they were termed subpaths), with the actual trip ends generally being unobserved. The knowledge of Bluetooth subpath flows improved the estimates even for η=10% (i.e., 388 out of 3875 trips). This suggests that a few samples of Bluetooth trajectories, for example through key corridors such as major arterials and motorways, should be enough to enhance the quality of OD estimates for large scale urban networks. From the results of the experiments, significant improvement with respect to both the prior OD and the traditional method was observed for η > 30% in the RMSE and StrOD comparisons, while in the GSSI comparison the improvement was almost stable for η ≥ 30%.

4.6 COMPARISON OF B-OD AND B-SP METHODS

Although the designs of the experiments for the B-OD (based on Ω) and B-SP (based on η) methods were different, the two could be compared when Ω=100% for the B-OD method and η=20% for the B-SP method. The B-OD method at Ω=100% and a 20% trajectory penetration rate implies that 1,054 out of 5,273 trajectories (of complete length) were used. On the other hand, η=20% for the B-SP method implies that 775 sub-trajectories were used. Intuitively, the B-OD method should therefore yield better results than the B-SP method because the B-OD matrices were developed from complete trajectories.

Comparisons of B-OD-ideal Case-4 (i.e., Ω=100%), B-OD-near-ideal case-5

(i.e., Ω=100%) with B-SP case-2 (i.e., η =20%) for Xprior1 are shown in terms of

RMSE, StrOD, and GSSI in Figure 4.37, Figure 4.38, and Figure 4.39, respectively.

The B-OD-ideal case-4 (i.e., Ω=100%) results were superior to the results from the other methods because the structure of the B-OD was an exact representation of the true OD. The results from B-OD-near-ideal case-5 (i.e., Ω=100%) ranked next, despite the random B-OD structure. The results from B-SP case-2 (i.e., η=20%) followed, and then the traditional method and the prior OD. The computational time required for both the B-OD and the B-SP methods was roughly 15 minutes per experiment.

Figure 4.37: RMSE comparison of the B-OD (ideal, near-ideal) and B-SP methods with prior OD and traditional methods

Figure 4.38: StrOD comparison of B-OD (ideal, near-ideal) and B-SP methods

Figure 4.39: GSSI comparison of the B-OD (ideal, near-ideal) and B-SP methods

Plot data for Figures 4.37 to 4.39 (bar values as extracted):

RMSE:   14.02, 12.34, 11.18, 10.71, 9.62
StrOD:  0.8142, 0.8107, 0.8491, 0.8604, 0.888
GSSI:   0.7248, 0.7556, 0.7841, 0.7969, 0.8338


4.7 B-SP METHOD FOR LOWER PENETRATION RATES OF BLUETOOTH TRAJECTORIES

Since the penetration rate of Bluetooth trajectories (ƞ) is generally very low and random, the B-SP vector ( ) can have low flow values for most subpaths. One way to address the low sample rate is to generate a B-SP vector by combining B-SP flows observed over several days with similar travel patterns. For instance, the observed B-SP flows from “D” regular working Mondays can be used to develop a consolidated vector of observed B-SP flows (denoted by ) for a typical working Monday.

Thus, in a controlled environment, consolidating observations from several days with similar travel patterns can be achieved through the following step-by-step procedure (refer to Figure 4.40).

Step-1: Develop a database of “n=5” OD matrices that are structurally similar to

each other. One of these matrices would be the base OD matrix (Xtrue) and the rest

are generated by randomly perturbing Xtrue with a standard deviation of 5%; and

set i=1.

Step-2: Load the Aimsun network with the ith OD matrix (Xtrue,i) and run r=5 replications. This implies n*r = 25 simulations in total. The resulting trajectories from each replication are stored as a complete sequence of scanner IDs. The first and last scanner IDs of a complete trajectory are directly linked to the actual origin and destination zones of the complete trip. The total number of trajectories is 5,273 for this study.

Step-4: Convert the trajectories to sub-trajectories by de-selecting a few scanner IDs from the beginning and end of the complete trajectory sequence (this is done because actual Bluetooth trajectories do not always begin and end at the true trip ends). From the resulting sub-trajectories, identify the unique subpaths. In this study, the maximum number of unique subpaths identified is 113.

Step-5: Since the penetration rate (ƞ%) of Bluetooth trajectories is very low, the B-SP vector for each simulation is generated for ƞ% = 2.5%, 5%, 7.5% and 10%. To mimic the randomness of a real-world scenario, ƞ% of the Bluetooth trajectories are randomly selected from the total pool of sub-trajectories (3,875 for this study) and are used to generate the B-SP vector for the ith OD matrix and rth replication ( ) of size 113*1. Note that a random selection of ƞ% might not cover all subpaths, so some subpaths can have zero flow.

Step-6: Combine the subpath flows from all r=5 replications to create a consolidated vector of B-SP flows for the ith OD matrix ( ). Each B-SP vector can be considered a representation of Bluetooth subpath flows (ƞ%) from a particular day. In other words, we have a database from D=25 different days with similar traffic patterns for a particular ƞ%.

Step-7: Set i=i+1. If i is less than or equal to n=5, then GO TO Step-2. Else GO TO Step-8.

Step-8: Combine the B-SP flow vectors from each OD matrix to develop the consolidated vector of observed B-SP flows ( ) and terminate the simulation.

Step-9: Repeat Step-1 to Step-8 for the rest of ƞ values.
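The consolidation loop above can be sketched in Python as follows. This is a sketch under stated assumptions only: the Aimsun runs and the perturbation of Xtrue are replaced by a hypothetical random pool of subpath indices, "combine" is read as an element-wise average (as the labels of Figure 4.40 suggest), and all function names are illustrative.

```python
import random
from collections import Counter

N_SUBPATHS = 113       # unique subpaths reported in the text
POOL_SIZE = 3875       # sub-trajectories available per simulation
N_OD, N_REP = 5, 5     # n OD matrices times r replications = 25 "days"


def observed_flows(pool, eta_percent, rng):
    """Subpath flow vector from a random eta% sample of the pool."""
    k = int(len(pool) * eta_percent / 100.0 + 0.5)
    counts = Counter(rng.sample(pool, k))
    return [counts.get(s, 0) for s in range(N_SUBPATHS)]


def average(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def consolidated_vector(eta_percent, seed=0):
    rng = random.Random(seed)
    per_od = []
    for _ in range(N_OD):                 # loop over OD matrices (index i)
        reps = []
        for _ in range(N_REP):            # loop over replications (index r)
            # Stand-in for one simulation: each entry is the index of the
            # subpath that one sub-trajectory follows.
            pool = [rng.randrange(N_SUBPATHS) for _ in range(POOL_SIZE)]
            reps.append(observed_flows(pool, eta_percent, rng))
        per_od.append(average(reps))      # one vector per OD matrix
    return average(per_od)                # consolidated vector


vector_2_5 = consolidated_vector(2.5)
```

Calling `consolidated_vector` once per penetration rate (2.5, 5, 7.5 and 10) mirrors the final repetition step of the procedure.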

Figure 4.40: Generation of consolidated B-SP flows vector, ( ) for a particular ƞ%

[Figure 4.40 flowchart elements: Start (i=1); Aimsun model loaded with Xtrue,i; replications Rep-1 to Rep-r; trajectories Traj-1 to Traj-r; sub-trajectories Subtraj-1 to Subtraj-r; unique subpaths; select ƞ% of sub-trajectories; B-SP flows per replication; average B-SP flows per OD type; check i<=n (Yes: repeat); average B-SP flows of “n” OD types with “r” replications each; Stop]


4.7.1 Experiments for B-SP method (lower penetration rates)

Here, the experiments are conducted using the traditional traffic counts-based approach and four different penetration rates (η) for each Xprior. Thus, 5 cases for 3 prior OD scenarios imply 15 experiments in total. The 5 cases are as follows:

1. Traditional case: Only observed link flows are used for OD estimation.

2. B-SP case-1: Observed link flows and observed Bluetooth subpath flows at ƞ = 2.5% are used in OD estimation.

3. B-SP case-2: Observed link flows and observed Bluetooth subpath flows at ƞ = 5% are used in OD estimation.

4. B-SP case-3: Observed link flows and observed Bluetooth subpath flows at ƞ = 7.5% are used in OD estimation.

5. B-SP case-4: Observed link flows and observed Bluetooth subpath flows at ƞ = 10% are used in OD estimation.

4.7.2 Results for B-SP method (lower penetration rates)

The quality of the OD estimates (Xest) from all experiments is assessed using the goodness-of-fit criteria described in the following sections.

The plot in Figure 4.41 shows a gradual improvement in RMSE from 14.02 (prior) to 11.34 (for η=10%) for the set of experiments initiated with Xprior1. Similarly, the results for the experiments initiated with Xprior2 and Xprior3 also demonstrated improvement.

Figure 4.41: RMSE for all experiments compared with prior OD


Based on the average of the RMSE results from the three prior OD scenarios, the percentage improvements in RMSE for all 5 cases are illustrated in Figure 4.42. The traditional case (based on link flows only) showed an 8.89% improvement in RMSE, which increased significantly to 13.41% for ƞ=2.5% (consolidation of 25 days). The results showed a gradual improvement for the remaining penetration rates.

Figure 4.42: Average percentage of improvement in RMSE w.r.t. Xprior for all cases


The results shown in Figure 4.43 demonstrate a structural improvement in the OD estimates as η increased from 2.5% to 10%. Figure 4.43 also highlights that the traditional traffic counts-based approach could not bring any significant structural improvement in the OD estimates unless additional information from Bluetooth subpath flows was introduced.

Figure 4.43: StrOD for all experiments compared with prior OD

Plot data: average % RMSE improvement by case (25 days consolidated for each ƞ)

Traditional   ƞ=2.5%   ƞ=5%    ƞ=7.5%   ƞ=10%
8.89          13.41    13.87   14.29    15.26


Based on the average of the StrOD results from the three prior OD scenarios, the percentage improvement in StrOD for all 5 cases is illustrated in Figure 4.44. The rates of improvement for ƞ=2.5% to ƞ=10% are better than that of the traditional method. The traditional method achieved only a 0.36% improvement in StrOD, which increased sharply to 2.72% for ƞ=2.5%, and then to 3.09%, 3.15%, and 3.64% for ƞ=5%, 7.5%, and 10%, respectively.

Figure 4.44: Average percentage improvement in StrOD w.r.t. Xprior for all cases

4.7.3 Discussion

The goodness-of-fit measures, namely RMSE and StrOD, showed significant improvement with respect to both the prior OD and the traditional method even at lower penetration rates of Bluetooth trips (i.e., ƞ=2.5% observed over 25 days). The results for ƞ > 2.5% were better than those for ƞ=2.5%. However, in practice, ƞ=2.5% is more likely than ƞ=10%, and the significant improvement at ƞ=2.5% demonstrates the practical significance of the proposed methodology. For instance, a few samples of Bluetooth trajectories through key corridors that serve higher traffic demand, such as major arterials and motorways, should be enough to enhance the quality of OD estimates for large scale urban networks.

The trend of improvement in both RMSE and StrOD is the same for the cases based on Bluetooth subpath flows. However, the traditional method did not show any significant structural enhancement (see Figure 4.43), although the RMSE measure improved (see Figure 4.41). This shows that preserving the structure of the OD matrix using additional path-based information from Bluetooth short trips (referred to here as subpath flows) helps to direct OD convergence towards a better solution instead of ‘getting stuck’ in a local optimum.

4.8 SUMMARY

This chapter presented methods for integrating structural information about Bluetooth trips into the objective function of the bi-level formulation, with the purpose of improving the quality of OD matrix estimates. To achieve this, the study proposed two methods: B-OD and B-SP. The B-OD method is applicable to networks (such as Brisbane city) that have good connectivity of Bluetooth scanners. However, when the penetration rate of Bluetooth trajectories is low (for instance, if the OD needs to be estimated for a sub-network in the outer suburbs of Brisbane city), the B-SP method is more practical.

The proof of the concept was first tested by assuming that the structure of

Bluetooth trips exactly represented the structure of the true OD through an ideal

scenario of the B-OD method. Having achieved considerable improvements in the

results, randomness was then introduced into the structure of the B-OD flows in the

near-ideal scenario of the B-OD method. The experiments for the near-ideal scenario

of the B-OD method demonstrated significant improvements in the quality of the OD

matrix estimate for different rates (Ω) of Bluetooth connected OD pairs.

The B-SP method was specifically designed to closely represent the realistic

observations of Bluetooth trajectories through the concept of subpath flows. The

experiments for the B-SP method also demonstrated significant improvements in the

quality of OD matrix estimates as measured through RMSE, StrOD, and GSSI. While

the results of the B-SP method were not superior to those of the B-OD methods, it

must be understood that: a) the results of the B-SP method were far better than those

of the traditional method, and b) the B-SP method was more practically applicable to

real Bluetooth observations compared to the B-OD methods.

The results of the experiments across all methods suggest another interesting finding with respect to the Bluetooth connectivity rate and the penetration rate of Bluetooth trips/trajectories. For significant improvement in the results, the ideal scenario suggested a Bluetooth connectivity of 50% of OD pairs, and the near-ideal scenario suggested 40%-60% (with a 20% penetration rate of Bluetooth trajectories). The B-SP experiments showed that even a minimum penetration rate of 10% of Bluetooth trajectories (subpaths) would result in considerable improvements compared to the traditional method, and that a penetration rate of 30% would achieve greater improvement in the quality of the OD estimates.

The study also showed that the proposed B-SP method is robust at a lower sample rate (i.e., ƞ=2.5%) of random Bluetooth observations combined from several days of similar travel patterns. The Brisbane City Council (BCC) and the Department of Transport and Main Roads (TMR) have been recording Bluetooth observations on a continuous basis, so it is possible to build a database of traffic observations from several days representing similar travel patterns (Behara, et al., 2018). Thus, the B-SP method is ready for practical implementation on real-world networks with databases of trajectories and loop counts.

Chapter 5: Non-Assignment-based OD Matrix Estimation: Exploiting Observed Turning Proportions and Structure of Bluetooth Trips

This chapter presents the background on the issues related to assignment-based OD estimation methods in Section 5.1, and compares traditional bi-level models with the proposed approach in Section 5.2. The study networks are described in Section 5.3, the underlying concept of possible paths is outlined in Section 5.4, and the OD matrix estimation methodology is discussed in Section 5.5. The experiments and results for the two networks are described in Sections 5.6 and 5.7, and the chapter is summarised in Section 5.8.

5.1 BACKGROUND

The design of a bi-level OD estimation framework is such that the dependence

on “assignment” has become crucially important. Since both are unknown, the OD

matrix and “assignment” are mutually estimated until convergence to obtain the final

estimate of the OD matrix. The non-separable relationship between both (see Equation

(3)) makes the bi-level problem non-convex and non-differentiable.

While the mapping relationship between link flows and OD flows is well-established, there are several problems (as discussed in Chapter 1) associated with the accuracy of the assignment models and, most importantly, with the computational costs of the bi-level framework.

Several researchers have proposed alternative methods/heuristics to simplify the

problem’s complexity, especially related to the assignment formulation (either

analytical/simulation). Minimising the use of an assignment matrix has been the prime

focus of most recent studies involved with OD matrix estimation. For example,



Cheung, Wong, and Tong (2006) demonstrated the method of successive averages

(MSA) to approximate a simulated assignment. However, these methods ignore the

discrepancy that exists between a fixed assignment matrix and the updated OD matrix

and its corresponding assignment matrix. Some researchers have proposed an update

to the assignment matrix during the OD matrix estimation iterative process (Yun &

Park, 2005; Zhu, 2007), while others have suggested linearizing the assignment;

however, this requires two simulations per iteration (Lundgren & Peterson, 2008a;

Maher, et al., 2001). Some researchers have suggested one simulation per iteration but

to linearize the assignment using first order Taylor expansion (Toledo & Kolechkina,

2013), while others have suggested a weighted average of previous assignments

(Masip, et al., 2018). Osorio (2019) recently proposed a metamodel to derive the

analytical formulation of the simulated link counts as a function of OD flows so that

gradient-based algorithms could be easily employed. However, there is always a trade-

off between the computational cost and the accuracy of the OD matrix estimates due

to assignment approximations.

On the other hand, the integration of different big traffic data sources could provide more opportunities for a good blend of empirical (data-driven) and theoretical methodologies, such as relaxing the complete dependence on an explicit assignment formulation in OD optimisation. As confidence in these data sources grows, non-assignment-based OD estimation methods may soon become achievable. Building on one such data-driven idea, this chapter develops a non-assignment-based method to estimate OD matrices from observed turning proportions and the structure of Bluetooth OD flows. The complexity of bi-level optimisation is thereby reduced to a single-level formulation.

Further discussion about this proposed methodology is outlined in the following

sections.

5.2 OD MATRIX ESTIMATION: TRADITIONAL VERSUS PROPOSED APPROACH

The flowchart in Figure 5.1 illustrates the proposed non-assignment-based OD matrix estimation method. This method can be compared with the traditional bi-level method shown in Figure 1.6. The major differences between the two methods are:

The proposed approach treats link flows as proportions of origin flows, not as proportions of OD flows (as in traditional models);

The traditional approach is based only on observed link counts, while the proposed approach depends on observed turning proportions and the structure of Bluetooth OD flows in addition to observed link counts; and

The traditional approach is a bi-level process in which the OD matrix and the assignment matrix are optimised in the upper and lower levels, respectively. The proposed approach, however, is independent of the assignment matrix and is therefore a single-level approach. The mapping relationship is derived directly from observed turning counts; thus, only the OD matrix is updated.

Figure 5.1: Non-assignment-based OD matrix estimation methodology

Before the proposed methodology (Figure 5.1) is explained in detail in Section 5.5, the study networks and the key difference between the traversed and possible paths are described in Sections 5.3 and 5.4, respectively.

5.3 STUDY NETWORKS

The proposed methodology was tested on both a toy network with sufficient route choice options between OD pairs to represent realistic traffic behaviour (see Figure 5.2 for a sketch of the network) and a realistic network of Brisbane city (see Figure 5.3). Both networks are described below.

[Figure 5.1 flowchart elements: Start (k=1); prior OD flows, Xprior; prior step length; Bluetooth OD vector; Bluetooth OD connectivity, Ix; OD flows; origin flows; all possible sub-paths; observed turning proportions; turning proportion matrix, S; incidence matrix; estimated link flows; observed link flows; step length; update OD matrix; Is convergence achieved? (No/Yes); End]



5.3.1 Toy network

The toy network had two origins and three destinations, giving a total of 6 OD pairs. There were 23 links in total, of which L = 6 were equipped with detectors (d2, d5, d9, d14, d15 and d16). Figure 5.2 shows the origins, destinations, nodes, links, detector locations, and observed turning proportions at all intersections.

Figure 5.2: Sketch of the toy network

5.3.2 TMR network

A medium-scale network was developed from the part of the Brisbane City road network that is under the control of the Department of Transport and Main Roads (TMR, 2016). It had 9 zones, each acting as both origin and destination. Ignoring internal trips, there were 9*9-9=72 OD pairs. Among the nine zones, only the two zones corresponding to the CBD and Garden City were chosen as internal, while the remaining seven zones were chosen as external zones. The network had 12 loop detector count locations, and turning proportions were observed at all 35 intersections. Because each intersection was equipped with a BMS unit, there were 35 Bluetooth scanners. The study network is shown in Figure 5.3.

To facilitate the computation of GSSI, the OD matrices were further split into geographical windows using the knowledge of higher-level zones (see the geographical window concept in 3.3 for further details). The higher-level zones (hz) were formed as combinations of lower-level zones: hz1 included Z1 and Z8; hz2 included Z6 and Z5; hz3 included Z4, Z3 and Z7; and hz4 included Z9 and Z2.
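The splitting of a 9 x 9 OD matrix into geographical windows can be sketched as follows. This is only an illustration of the grouping step: the function name, the dictionary layout and the sample OD values are assumptions, and the GSSI computation itself (Section 3.3) is not shown.

```python
# Higher-level zones as listed in the text.
HIGHER_ZONES = {
    "hz1": ["Z1", "Z8"],
    "hz2": ["Z6", "Z5"],
    "hz3": ["Z4", "Z3", "Z7"],
    "hz4": ["Z9", "Z2"],
}


def split_into_windows(od, zones, higher_zones):
    """Return {(hz_origin, hz_destination): sub-matrix} for a square OD matrix."""
    index = {zone: i for i, zone in enumerate(zones)}
    windows = {}
    for hz_o, origins in higher_zones.items():
        for hz_d, destinations in higher_zones.items():
            windows[(hz_o, hz_d)] = [
                [od[index[o]][index[d]] for d in destinations] for o in origins
            ]
    return windows


zones = [f"Z{i}" for i in range(1, 10)]
# Hypothetical OD matrix with a zero diagonal (internal trips ignored).
od = [[0 if i == j else 10 * (i + 1) + (j + 1) for j in range(9)] for i in range(9)]
windows = split_into_windows(od, zones, HIGHER_ZONES)
```

With four higher-level zones, the sketch yields 4 x 4 = 16 windows that together cover all 81 cells of the matrix.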

Figure 5.3: TMR network

5.4 CONCEPT OF POSSIBLE PATHS

The proposed methodology attempted to estimate link flows from the origin

flows and the turning proportion (S) matrix, and not from the OD flows and assignment

matrix. The turning proportions, as the critical construct of the matrix S, were observed

from the sequence of intersections that connected the link count locations with all

origins. In this study, the turning proportions were assumed to be known. The

sequences of intersections from each origin represented the set of all possible paths

among which the paths traversed by vehicles were only a subset (see Section 5.4.2 for

further details).

5.4.1 Possible paths in the toy network

For the toy network shown in Figure 5.2, the total number of possible paths between all OD pairs is K=54, while the number of paths traversed by vehicles is 8 (see Figure 5.4 for all eight paths traversed by vehicles in simulation).



Figure 5.4: Paths traversed by vehicles in simulation

Using the same concept, the paths leading to any link can be categorised as

traversed and possible paths. To demonstrate this, consider the traversed paths (Figure

5.5) and possible paths (Figure 5.6) until link, l14.

Figure 5.5: Traversed paths from all origins until link, l14

Figure 5.6: Possible paths from all origins until link, l14

5.4.2 Possible paths in the TMR network

Although numerous possible paths are feasible from each origin to each counting location (detector), the study chose only a few key corridors to limit the complexity. The design parameter, the number of most-likely possible paths (MLPP), was set to a maximum of 8 per OD pair. Figure 5.7 illustrates the number of possible paths chosen from each origin zone (Zi) to each detector location.

Figure 5.7: The number of possible paths from all origins until the detector locations of TMR network

5.5 OD MATRIX ESTIMATION METHODOLOGY

The proposed method updated the prior OD matrix by iteratively minimising the

objective function until convergence. The objective function formulation was based

on the deviations of the link flows (observed and estimated) and structural comparison

of the OD flows (observed Bluetooth and estimated), which were expressed as a

function of the OD matrix. The estimated link flows were obtained through the new

mapping relationship based on the turning proportion matrix explained in Section

5.5.1. The structural comparison of OD flows is explained in Section 5.5.2. Finally,

the proposed objective function formulation is described in Section 5.5.3.

5.5.1 Link flows estimation from turning proportion matrix

Turning proportions (or probabilities) refer to the ratio of the turning volume to the approach volume at an intersection. Figure 5.8 represents an isolated intersection with its turning movements and associated proportions. The flows on link l5 are diverted towards l12, l15 and l11 via left, through and right movements, respectively. Thus, the number of turning movements is m=3, and their corresponding turning proportions are 0.076, 0.836 and 0.088, respectively.
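As a minimal sketch of this definition, the proportions can be computed from turning counts on one approach. The counts below are hypothetical values chosen only so that they reproduce the proportions quoted for link l5; the function name and dictionary keys are likewise assumptions.

```python
def turning_proportions(turn_volumes):
    """Divide each turning volume by the total approach volume."""
    approach_volume = sum(turn_volumes.values())
    if approach_volume == 0:
        raise ValueError("approach volume must be positive")
    return {move: vol / approach_volume for move, vol in turn_volumes.items()}


# Hypothetical turning counts on approach l5 (not taken from the thesis).
counts_l5 = {"left_to_l12": 76, "through_to_l15": 836, "right_to_l11": 88}
proportions_l5 = turning_proportions(counts_l5)
```

By construction, the proportions of all movements leaving one approach sum to 1.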

[Figure 5.7 chart residue: bar chart of the number of possible paths until each detector location (axis range 0 to 40) for detectors D1 to D12, with one bar segment per origin zone Z1 to Z9; individual segment labels range from 0 to 8.]



Figure 5.8: Schematic representation of an isolated intersection and associated turning proportions

The turning proportion (S) matrix is a tabular representation of the proportions of origin flows that pass through the selected links. It has dimensions of $L \times N_O$, where $N_O$ is the number of origins, and each cell value is represented by $S(l,o) = s_{l,o}$ (see Equation (62)).

For example, the $k_{l,o}$th path connects the $l$th link with the $o$th origin, and the total number of such possible paths is $K_{l,o}$. The turning proportion at the $i$th intersection along the $k_{l,o}$th path is denoted by $p_{i}^{k_{l,o}}$. There are $n_{k_{l,o}}$ intersections along the $k_{l,o}$th path.

$$ s_{l,o} = \sum_{k_{l,o}=1}^{K_{l,o}} P_{k_{l,o}} \qquad (62) $$

$$ P_{k_{l,o}} = \prod_{i=1}^{n_{k_{l,o}}} p_{i}^{k_{l,o}} \qquad (62a) $$

The product of the turning proportions along the $k_{l,o}$th path yields the probability of origin flows passing through the $k_{l,o}$th path and being observed at the $l$th link, and is represented by $P_{k_{l,o}}$ (see Equation (62a)).

Summing the probabilities along all $K_{l,o}$ paths connecting the $o$th origin to the $l$th link gives the total probability of trips generated from the $o$th origin being observed at the $l$th link. This total probability with respect to the $o$th origin is represented by $s_{l,o}$, as shown in Equation (62).

The link flow generated from the $o$th origin is given by multiplying $s_{l,o}$ by the total trips produced from the $o$th origin (i.e., $G_o$), as shown in Equation (63), and the total link flow that results from the flows produced from all origins is given by Equation (64).

$$ y_{l,o} = s_{l,o}\, G_o \qquad (63) $$

$$ y_{l} = \sum_{o=1}^{N_O} s_{l,o}\, G_o \qquad (64) $$

To demonstrate the above formulations, consider all possible paths from each origin until link l14 (Figure 5.6). Table 5.1 shows the computations used to estimate the link flow at l14.

Table 5.1: Demonstration of Equation (62) for l14

| Possible path from O1 | Product of turning proportions | Possible path from O2 | Product of turning proportions |
|---|---|---|---|
| l1-l2-l14 | 0.25*1 = 0.25 | l8-l7-l4-l2-l14 | 0.59*0*0.25*1 = 0 |
| l1-l3-l5-l12-l14 | 0.75*1*0.076*1 = 0.057 | l8-l7-l5-l12-l14 | 0.59*1*0.076*1 = 0.045 |
| l1-l3-l6-l9-l10-l12-l14 | 0.75*0*1*0*0*1 = 0 | l8-l9-l10-l12-l14 | 0.41*0*0*1 = 0 |
| Sum over paths | 0.25+0.057+0 = 0.307 | Sum over paths | 0+0.045+0 = 0.045 |

If the numbers of trips generated from O1 and O2 are G1 = 425 and G2 = 490, then the link flows from O1 and O2 at l14 are 425*0.307 = 130.48 and 490*0.045 = 22.05, respectively. Thus, the total flow on l14 is 130.48+22.05 = 152.53, or approximately 153, which matches the flow of 153 observed in the simulation.
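The computation in Table 5.1 can be sketched in a few lines of Python. The function and variable names are illustrative assumptions; the turning proportions and origin flows are those quoted in Table 5.1 and in the text above.

```python
from math import prod

# Turning proportions along each possible path until link l14 (Table 5.1).
PATHS_TO_L14 = {
    "O1": [
        [0.25, 1.0],                        # l1-l2-l14
        [0.75, 1.0, 0.076, 1.0],            # l1-l3-l5-l12-l14
        [0.75, 0.0, 1.0, 0.0, 0.0, 1.0],    # l1-l3-l6-l9-l10-l12-l14
    ],
    "O2": [
        [0.59, 0.0, 0.25, 1.0],             # l8-l7-l4-l2-l14
        [0.59, 1.0, 0.076, 1.0],            # l8-l7-l5-l12-l14
        [0.41, 0.0, 0.0, 1.0],              # l8-l9-l10-l12-l14
    ],
}
ORIGIN_FLOWS = {"O1": 425, "O2": 490}


def s_entry(paths):
    """Sum over paths of the product of turning proportions."""
    return sum(prod(turns) for turns in paths)


def link_flow(paths_by_origin, origin_flows):
    """Sum over origins of the S entry times the origin flow."""
    return sum(
        s_entry(paths) * origin_flows[origin]
        for origin, paths in paths_by_origin.items()
    )


s_l14 = {origin: s_entry(paths) for origin, paths in PATHS_TO_L14.items()}
flow_l14 = link_flow(PATHS_TO_L14, ORIGIN_FLOWS)
print(s_l14, round(flow_l14, 1))
```

Without the intermediate rounding of the path probabilities used in Table 5.1, the sketch gives a flow of about 152.4, which is still close to the simulated count of 153.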

The proof of this concept is further demonstrated using the network (see Figure

5.9) considered by Bar-Gera et al. (2006) in their study.

Figure 5.9: Sample network used by Bar-Gera et al. (2006)

The paths and path flows for the sample network shown in Figure 5.9 are

described in Table 5.2.

Table 5.2: Paths and path flows for the Bar-Gera et al. (2006) network

| Origin | Path | Path flow |
|---|---|---|
| A | [A,1,4,5,6,D] | 10 |
| A | [A,1,2,3,6,D] | 5 |
| B | [B,3,2,1,4,C] | 6 |
| B | [B,3,6,5,4,C] | 4 |
| C | [C,4,5,2,3,B] | 8 |
| C | [C,4,5,6,D] | 20 |
| D | [D,6,5,2,1,A] | 12 |
| D | [D,6,5,4,C] | 1 |

From Table 5.2, the total traffic count observed on link l5-2 is 8+12=20. The possible paths contributing to the flow on link l5-2 are shown in Table 5.3.

Table 5.3: Link flows at link l5-2 estimated using the proposed approach

| Origin | Origin flow | Possible path | Product of origin flow and turning proportions along the path until link l5-2 |
|---|---|---|---|
| A | 15 | A-1-4-5-2 | 15*0.667*0.625*0.211 = 1.316 |
| A | 15 | A-1-2-3-6-5-2 | 15*0.333*1*0.385*0.444*0.706 = 0.603 |
| B | 10 | B-3-6-5-2 | 10*0.4*0.556*0.706 = 1.569 |
| B | 10 | B-3-2-1-4-5-2 | 10*0.6*1*0.667*0.625*0.285 = 0.714 |
| C | 28 | C-4-5-2 | 28*1*0.211 = 5.895 |
| D | 13 | D-6-5-2 | 13*1*0.8 = 10.400 |
| Estimated link flow at link l5-2 | | | 20.497 |

From Table 5.3, the estimated flow on link l5-2 is 20.497, which is close to the actual flow of 20. This supports the proposed approach as a valid alternative to assignment-based estimation of link flows.

While the proposed turning proportions-based approach seems promising, the large number of paths can add to the complexity of the problem for realistic medium- to large-scale networks. In such situations, it is recommended to treat the number of most-likely possible paths (MLPP) as a design parameter; for instance, MLPP could be limited to 10 or fewer. The TMR network described in Section 5.3.2 was designed specifically to demonstrate that this approach performs well with MLPP as a design parameter. Note: The analysis conducted in this chapter assumed no errors in the observed turning proportions and link flows.
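One possible way to apply an MLPP limit is sketched below. It assumes, purely for illustration, that paths are ranked by the product of their turning proportions and only the top MLPP are kept; the text does not state the selection rule actually used, and the function name and data layout are hypothetical. The path data are those from O1 to link l14 in Table 5.1.

```python
from math import prod


def most_likely_paths(paths, mlpp):
    """Keep the mlpp paths with the largest product of turning proportions."""
    ranked = sorted(paths, key=lambda p: prod(p["turns"]), reverse=True)
    return ranked[:mlpp]


paths_o1_to_l14 = [
    {"links": "l1-l2-l14", "turns": [0.25, 1.0]},
    {"links": "l1-l3-l5-l12-l14", "turns": [0.75, 1.0, 0.076, 1.0]},
    {"links": "l1-l3-l6-l9-l10-l12-l14", "turns": [0.75, 0.0, 1.0, 0.0, 0.0, 1.0]},
]

kept = most_likely_paths(paths_o1_to_l14, mlpp=2)
s_full = sum(prod(p["turns"]) for p in paths_o1_to_l14)
s_kept = sum(prod(p["turns"]) for p in kept)
```

In this small example the discarded path has zero probability, so truncation leaves the S entry unchanged; in general, dropping low-probability paths trades a small loss of accuracy for a smaller path set.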

5.5.2 The structural comparison of OD flows

Another goal of the proposed objective function formulation is to minimise the

structural deviation (or maximise the structural similarity) between the estimated and



observed Bluetooth OD flows expressed using the formulation described in Equation

(53). The concept of incorporating “structural comparison of OD flows” was discussed

in Section 4.4.1.

5.5.3 OD matrix estimation formulation

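The OD vector, X, is related to the origin flows, G, through an incidence matrix, I, as shown in Equation (65).

$$ G = I\,X \qquad (65) $$

For the study network shown in Figure 5.2, the incidence matrix is represented by Equation (66).

$$ I = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \qquad (66) $$

In Equation (66), the first and second rows correspond to the origins O1 and O2, and the columns correspond to the OD pairs O1-D1, O1-D2, O1-D3, O2-D1, O2-D2 and O2-D3, respectively. A value of “1” indicates that the OD pair of that column belongs to the origin of that row, and the value is “0” otherwise.

Based on Equation (66), Equation (65) can be written as Equation (67) for the study network.

$$ \begin{bmatrix} G_1 \\ G_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \\ X_5 \\ X_6 \end{bmatrix} \qquad (67) $$

where X1 to X6 are the OD flows corresponding to the OD pairs O1-D1, O1-D2, O1-D3, O2-D1, O2-D2 and O2-D3; and G1 and G2 are the origin flows of O1 and O2 of the study network.

The relationship between the link flows vector (Y) and the OD vector (X) is shown in Equation (68). Here, S is the turning proportion matrix whose entries are defined in Equation (62). For the toy network experiments, the size of S was 6 x 2; thus, the size of Y was 6 x 1.

$$ Y = S\,G = S\,I\,X \qquad (68) $$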
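A small sketch of how the incidence matrix could be built from an ordered list of OD pairs, and how it maps OD flows to origin flows, is given below. The helper name is an assumption; the OD flows are the true flows of the toy network reported in Table 5.5.

```python
def build_incidence(origins, od_pairs):
    """Row o, column j is 1 when OD pair j starts at origin o, else 0."""
    return [[1 if pair[0] == origin else 0 for pair in od_pairs] for origin in origins]


origins = ["O1", "O2"]
od_pairs = [
    ("O1", "D1"), ("O1", "D2"), ("O1", "D3"),
    ("O2", "D1"), ("O2", "D2"), ("O2", "D3"),
]
incidence = build_incidence(origins, od_pairs)

# True OD flows of the toy network (Table 5.5), in the same order as od_pairs.
x_true = [107, 91, 227, 199, 92, 199]

# Origin flows: each row of the incidence matrix sums the OD flows of one origin.
origin_flows = [sum(i * x for i, x in zip(row, x_true)) for row in incidence]
print(origin_flows)
```

The resulting origin flows are the row sums of the OD matrix, which is why the formulation needs only origin totals, and not individual OD flows, to estimate link flows.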



The formulation of the objective function (Z3) consisting of the deviations of

link flows and the structural comparison of OD flows is shown in Equation (69).

(69)

The notations and the stability of this combined formulation are explained in

Section 4.4.1 and the value of c is assumed to be 2 for entire analysis discussed in this

chapter.

5.5.3.1 Gradient (search direction) computation

The gradient descent algorithm explained in Section 4.4.2 was adopted for this method. The search direction was defined by computing the gradient of the objective function (Z3) with respect to the OD vector (X), as explained through Equations (70a)-(70b). The derivative of the structural comparison term with respect to X is already explained in Equation (55).

(70)

(70a)

Where, (70b)

5.5.3.2 Updating step of OD matrix

The OD matrix X is updated using the formulation shown in Equation (71). Here, $X^{k}$ and $X^{k+1}$ are the OD matrices for the current (kth) and next ((k+1)th) iterations, respectively. The optimum step length is obtained using Equation (57).

= (71)

Where, (71a)

The termination criterion for the optimisation problem in this study was 100 iterations.
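The iterative scheme can be illustrated with a deliberately simplified sketch. It is not the objective function of Equation (69): it assumes a plain least-squares link-count term only (no structural term), uses a single detector (l14, with the S row from Table 5.1), takes the step length from an exact line search for a quadratic objective rather than from Equation (57), and clips OD flows at zero. All function names are hypothetical.

```python
# Toy data: one row of S (link l14, Table 5.1) and the toy incidence matrix.
S = [[0.307, 0.045]]
INCIDENCE = [[1, 1, 1, 0, 0, 0],
             [0, 0, 0, 1, 1, 1]]
Y_OBS = [153.0]                                   # observed count on l14
X_PRIOR = [101.0, 40.0, 50.0, 80.0, 36.0, 90.0]   # prior OD flows (Table 5.5)


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def link_flows(x):
    """OD flows -> origin flows -> estimated link flows."""
    return matvec(S, matvec(INCIDENCE, x))


def gradient(x):
    """Gradient of 0.5 * ||Y_obs - S I X||^2 with respect to X (assumed form)."""
    residual = [obs - est for obs, est in zip(Y_OBS, link_flows(x))]
    back = matvec(transpose(INCIDENCE), matvec(transpose(S), residual))
    return [-g for g in back]


def step_length(grad):
    """Exact line search for the quadratic objective (an assumption)."""
    denom = sum(v * v for v in link_flows(grad))
    return sum(g * g for g in grad) / denom if denom > 0 else 0.0


def gradient_descent(x0, iterations=100):
    x = list(x0)
    for _ in range(iterations):
        grad = gradient(x)
        lam = step_length(grad)
        # Clip at zero so that OD flows stay non-negative (an assumption).
        x = [max(xi - lam * gi, 0.0) for xi, gi in zip(x, grad)]
    return x


x_est = gradient_descent(X_PRIOR)
```

With link counts alone, the update can only shift the OD flows along directions that change the origin totals, which is the behaviour the structural term is meant to complement.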

5.6 EXPERIMENTS AND RESULTS: TOY NETWORK

True OD flows (Xtrue), observed Bluetooth OD flows, observed link counts, and turning proportions for every turning movement at each intersection of the study network (Figure 5.2) were synthesized using Aimsun Next. The Bluetooth OD matrix was generated using 20% of the true OD flows of the OD pairs that are Bluetooth connected, with random fluctuations of +/- 5%, as shown in Equation (72), where the rand() function chooses a value between 0 and 1.

) (72)
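One possible reading of this sampling rule is sketched below. It assumes that the +/- 5% fluctuation is relative to the 20% sample, so that each Bluetooth flow is 0.20 times the true flow scaled by a uniform factor between 0.95 and 1.05. The function name, the parameter names and the choice of connected OD pairs are hypothetical.

```python
import random


def synthesize_bluetooth_od(x_true, connected, penetration=0.20,
                            fluctuation=0.05, rng=None):
    """Sample Bluetooth OD flows for the Bluetooth-connected OD pairs only.

    Assumes a relative fluctuation: each flow is scaled by
    penetration * (1 +/- fluctuation), drawn uniformly via rng.random().
    """
    rng = rng or random.Random()
    sample = {}
    for pair in connected:
        noise = 1.0 + fluctuation * (2.0 * rng.random() - 1.0)
        sample[pair] = penetration * x_true[pair] * noise
    return sample


# True OD flows of the toy network (Table 5.5).
x_true = {"O1-D1": 107, "O1-D2": 91, "O1-D3": 227,
          "O2-D1": 199, "O2-D2": 92, "O2-D3": 199}
connected = ["O1-D1", "O2-D3"]   # hypothetical choice of 2 out of 6 OD pairs
bluetooth = synthesize_bluetooth_od(x_true, connected, rng=random.Random(0))
```

Passing a seeded random generator keeps the synthetic sample reproducible across runs.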

The experiments performed in this study consisted of six different cases, as

discussed below:

Case 1: Here, the objective function only minimised the deviation of link

counts.

Case 2: In this case, the objective function minimised the deviation of link

counts and maintained structural consistency using the structure of

Bluetooth OD flows from Ω=33% of OD pairs (i.e., only 2 out of 6 OD pairs

were Bluetooth connected).

Case 3: Here, the objective function minimised the deviation of link counts

and maintained structural consistency. Here, Ω=50% (i.e., only 3 out of 6

OD pairs were Bluetooth connected).

Case 4: The objective function minimised both deviation of link counts and

maintained structural consistency. Here, Ω=67% (i.e., only 4 out of 6 OD

pairs were Bluetooth connected).

Case 5: The objective function minimised both link counts deviation and maintained structural consistency using Ω=83% (i.e., only 5 out of 6 OD pairs were Bluetooth connected).


Case 6: The objective function minimised both link counts deviation and

maintained structural consistency using Ω=100% (i.e., all 6 OD pairs were

Bluetooth connected).

The results of the above-mentioned six cases were compared using two statistical performance measures, RMSE and StrOD, as previously discussed in Equation (47) and Equation (49), respectively.
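As a quick check of the RMSE measure, the sketch below assumes the standard root-mean-square error over OD pairs and applies it to the true and prior OD flows of the toy network reported in Table 5.5. The function name is an assumption; StrOD is not reproduced here.

```python
from math import sqrt


def rmse(x_est, x_true):
    """Root-mean-square error between two OD vectors of equal length."""
    if len(x_est) != len(x_true):
        raise ValueError("OD vectors must have the same length")
    return sqrt(sum((e - t) ** 2 for e, t in zip(x_est, x_true)) / len(x_true))


X_TRUE = [107, 91, 227, 199, 92, 199]    # Table 5.5
X_PRIOR = [101, 40, 50, 80, 36, 90]      # Table 5.5
print(round(rmse(X_PRIOR, X_TRUE), 2))   # 102.59
```

The value agrees with the prior OD error of 102.59 quoted in Section 5.6.2.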



5.6.1 Convergence of gradient descent algorithm

The convergence of the gradient descent algorithm is demonstrated here by plotting the RMSE and StrOD over consecutive iterations for all six cases, as shown in Figure 5.10 and Figure 5.11.

Figure 5.10: Convergence of RMSE for all cases

Figure 5.11: Convergence of StrOD for all cases

Figure 5.10 shows that the RMSE converged for all six cases, with the highest

value for Case-1 and lowest for Case-6. Similarly, in Figure 5.11, the similarity of the

structures between estimated and true OD improved from Case-1 to Case-6.

5.6.2 Structural consistency

Although the chosen prior OD had a high error value (102.59) and a poorer structure (0.2782) compared with the true OD, Figure 5.12 demonstrates a significant reduction in the RMSE value from 102.59 to 67.83 in both Case-1 and Case-2. However, Figure 5.13 shows little improvement in structure (StrOD = 0.3104 and 0.3105 for Case-1 and Case-2, respectively). This is because no additional structural knowledge of Bluetooth OD flows was used in Case-1, and only two OD pairs were Bluetooth connected in Case-2. It is also clear from Table 5.5 that Case-1 overestimated the OD flow for O1-D1 (224.7) and underestimated the OD flow for O1-D3 (111.3). Thus, if the quality of the prior OD is poor, relying only on the deviations of link counts might not improve the quality of the OD matrix estimates. However, with the availability of additional structural knowledge, the quality of the OD estimates can be enhanced by maintaining structural consistency, despite starting with a poor prior OD. The other cases in Figure 5.12 and Figure 5.13 show that the improvement increased with Ω%.

Figure 5.12: RMSE comparison with the true OD

Figure 5.13: StrOD comparison with the true OD

5.6.3 Under-determinacy problem

The results also highlight the under-determinacy problem of traffic counts-based OD matrix estimation. In other words, many different OD matrices can reproduce the same set of link counts. For instance, the link flows estimated in Case-1 exactly matched the true link counts (see Case-1 in Table 5.4). In fact, the link flows estimated in all six cases (Table 5.4) exactly matched the true flows, while the corresponding OD matrix estimates differed from each other (Table 5.5). In Table 5.4, YPrior shows the initial estimate of link counts from the prior OD matrix.

[Figure 5.12 data labels, RMSE by OD from different cases: Prior OD 102.59; Case1 67.83; Case2 67.83; Case3 51.63; Case4 26.23; Case5 22.08; Case6 14.36.]

[Figure 5.13 data labels, structural comparison by OD from different cases: Prior OD 0.2782; Case1 0.3104; Case2 0.3105; Case3 0.6155; Case4 0.9507; Case5 0.9539; Case6 0.9729.]

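Table 5.4: Comparison of link flows for the selected links

| Links | Y (true) | YPrior | case 1 | case 2 | case 3 | case 4 | case 5 | case 6 |
|---|---|---|---|---|---|---|---|---|
| l2 | 107 | 48 | 107 | 107 | 107 | 107 | 107 | 107 |
| l5 | 609 | 265 | 609 | 609 | 609 | 609 | 609 | 609 |
| l9 | 199 | 84 | 199 | 199 | 199 | 199 | 199 | 199 |
| l14 | 153 | 68 | 153 | 153 | 153 | 153 | 153 | 153 |
| l15 | 509 | 222 | 509 | 509 | 509 | 509 | 509 | 509 |
| l16 | 253 | 107 | 253 | 253 | 253 | 253 | 253 | 253 |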

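Table 5.5: Comparison of OD demand flows

| OD pairs | Xtrue | XPrior | X case 1 | X case 2 | X case 3 | X case 4 | X case 5 | X case 6 |
|---|---|---|---|---|---|---|---|---|
| O1-D1 | 107 | 101 | 225 | 225 | 142 | 106 | 108 | 115 |
| O1-D2 | 91 | 40 | 89 | 89 | 128 | 72 | 84 | 93 |
| O1-D3 | 227 | 50 | 111 | 111 | 155 | 247 | 233 | 217 |
| O2-D1 | 199 | 80 | 190 | 190 | 151 | 169 | 174 | 224 |
| O2-D2 | 92 | 36 | 86 | 86 | 68 | 76 | 74 | 72 |
| O2-D3 | 199 | 90 | 214 | 214 | 272 | 246 | 242 | 194 |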
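The under-determinacy can be checked directly from Table 5.5. Because the estimated link flows depend on the OD flows only through the origin flows, any OD matrices with the same origin totals reproduce the same link counts. The sketch below (names are illustrative) sums the tabulated OD flows by origin for each case.

```python
# OD flows from Table 5.5, ordered O1-D1, O1-D2, O1-D3, O2-D1, O2-D2, O2-D3.
OD_ESTIMATES = {
    "true":   [107, 91, 227, 199, 92, 199],
    "case 1": [225, 89, 111, 190, 86, 214],
    "case 2": [225, 89, 111, 190, 86, 214],
    "case 3": [142, 128, 155, 151, 68, 272],
    "case 4": [106, 72, 247, 169, 76, 246],
    "case 5": [108, 84, 233, 174, 74, 242],
    "case 6": [115, 93, 217, 224, 72, 194],
}


def origin_totals(x):
    """Origin flows for O1 (first three OD pairs) and O2 (last three)."""
    return sum(x[:3]), sum(x[3:])


totals = {name: origin_totals(x) for name, x in OD_ESTIMATES.items()}
for name, (g1, g2) in totals.items():
    print(name, g1, g2)
```

Up to the rounding of the tabulated values, every case returns the same origin totals as the true OD matrix, even though the individual OD flows differ considerably.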

5.6.4 Optimal percentage of Bluetooth connectivity

The results show a sharp improvement followed by stabilisation beyond a certain percentage of Ω: increasing the Bluetooth OD connectivity beyond 67% (i.e., Case-4) yielded only a marginal further improvement in both RMSE and StrOD values. The network considered in this study is simple, but for realistic networks, knowledge of the optimum Bluetooth connectivity has significant financial implications, such as a reduction in the installation and maintenance costs of the infrastructure.

5.7 EXPERIMENTS AND RESULTS: TMR NETWORK

True OD flows (Xtrue), Bluetooth OD flows, observed link counts, and turning proportions for every turning movement at each intersection of the study network (Figure 5.3) were synthesized in Aimsun. The Bluetooth OD matrix was randomly generated with a mean of 0.1 times the true OD flows and a standard deviation of +/- 10%, as shown in Equation (73), where the rand() function chooses a value between 0 and 1.



(73)

The true OD flows used here are those of the OD pairs that are Bluetooth connected. To compare the performance of the proposed non-assignment-based approach with a bi-level approach, two sets of experiments, non-assignment-based and assignment-based, were designed. Both sets were further divided into five cases, which are discussed in the following sections. Note that Xprior was generated as described in Equation (50). The quality of Xprior can be expressed through RMSE (Xprior, Xtrue) = 64.8 and GSSI (Xprior, Xtrue) = 0.6956.

5.7.1 Non-assignment-based vs assignment-based experiments

Case 1: Here, the objective function in both experiments was based only on the deviation of link counts, without any additional knowledge of Bluetooth OD. Note that, in the traditional method for the non-assignment-based approach, the estimated link flows were obtained from the proposed turning proportions-based formulation discussed in Section 5.5.1, whereas in the assignment-based method they were obtained from simulation.

Case 2: In this case, the objective functions in both experiments included

the deviation of link counts and structural comparison of estimated OD

flows with Bluetooth OD flows for Ω=25% of OD pairs (i.e., only 18 out of

72 OD pairs were Bluetooth connected).

Case 3: This is similar to Case-2, with only a difference in the number of Bluetooth connected OD pairs; that is, Ω=50% (i.e., only 36 out of 72 OD pairs were Bluetooth connected).


Case 4: This is similar to Case-2 except that Ω=75% (i.e., only 54 out of 72

OD pairs were Bluetooth connected).

Case 5: This is similar to Case-2 except that Ω=100% (i.e., all 72 OD pairs

were Bluetooth connected).

The results of the above-mentioned five cases were compared using two statistical performance measures, RMSE and GSSI, as previously discussed in Equation (47) and Equation (49), respectively. These are further discussed in the following sections.



5.7.2 RMSE results

The RMSE results of both non-assignment-based and assignment-based methods

showed that the OD matrices estimated from all cases were better than Xprior (Figure

5.14).

Figure 5.14: RMSE results for non-assignment-based and assignment-based approaches

The percentage improvement with respect to Xprior increased from 11.11% to 32.72% for the non-assignment-based method and from 8.95% to 27.93% for the assignment-based method (Figure 5.15).

Similarly, the percentage improvement with respect to the traditional method (i.e., Case-1) increased from 6.25% to 24.31% for the non-assignment-based method and from 5.85% to 20.85% for the assignment-based method (Figure 5.16).
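The reported percentages follow from the RMSE values by a simple relative reduction. The sketch below assumes that definition and uses the non-assignment-based RMSE values shown as data labels in Figure 5.14; the function and variable names are illustrative.

```python
def percent_improvement(reference, value):
    """Relative reduction of an error measure with respect to a reference."""
    return 100.0 * (reference - value) / reference


RMSE_PRIOR = 64.8
# Non-assignment-based RMSE for Case-1 (traditional) to Case-5 (Figure 5.14).
RMSE_NON_ASSIGNMENT = [57.6, 54.0, 49.6, 47.4, 43.6]

wrt_prior = [round(percent_improvement(RMSE_PRIOR, v), 2)
             for v in RMSE_NON_ASSIGNMENT]
wrt_traditional = [round(percent_improvement(RMSE_NON_ASSIGNMENT[0], v), 2)
                   for v in RMSE_NON_ASSIGNMENT[1:]]
print(wrt_prior)
print(wrt_traditional)
```

For a similarity measure such as GSSI, where larger values are better, the sign of the numerator would be reversed.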

Figure 5.15: Percent improvement in RMSE with respect to Xprior - non-assignment vs assignment-based methods

[Figure 5.14 data labels, RMSE for the prior OD, Case-1 (traditional), Ω=25%, Ω=50%, Ω=75% and Ω=100%: non-assignment-based 64.8, 57.60, 54.00, 49.60, 47.40, 43.60; assignment-based 64.8, 59.0, 55.6, 53.8, 50.2, 46.7.]

[Figure 5.15 data labels, % improvement in RMSE w.r.t. prior OD for traditional, Ω=25%, Ω=50%, Ω=75% and Ω=100%: non-assignment-based 11.11, 16.67, 23.46, 26.85, 32.72; assignment-based 8.95, 14.27, 16.98, 22.53, 27.93.]




Figure 5.16: Percent improvement in RMSE with respect to traditional method- non-assignment vs assignment-based methods

5.7.3 GSSI results

The GSSI results of both the non-assignment-based and assignment-based methods showed that the OD matrices estimated in all cases were better than Xprior (Figure 5.17).

Figure 5.17: GSSI results for non-assignment-based and assignment-based approaches

The percentage improvement with respect to Xprior increased from 1.81% to 19.67% for the non-assignment-based method and from 0.92% to 17.80% for the assignment-based method (Figure 5.18).

Similarly, the percentage improvement with respect to the traditional method (i.e., Case-1) increased from 4.12% to 17.54% for the non-assignment-based method and from 5.33% to 16.72% for the assignment-based method (Figure 5.19).

[Figure 5.16 data labels, % improvement in RMSE w.r.t. the traditional method for Ω=25%, Ω=50%, Ω=75% and Ω=100%: non-assignment-based 6.25, 13.89, 17.71, 24.31; assignment-based 5.85, 8.81, 14.92, 20.85.]

[Figure 5.17 data labels, GSSI for the prior OD, Case-1 (traditional), Ω=25%, Ω=50%, Ω=75% and Ω=100%: non-assignment-based 0.6956, 0.7082, 0.7374, 0.7738, 0.7948, 0.8324; assignment-based 0.6956, 0.7020, 0.7394, 0.7639, 0.7822, 0.8194.]



Figure 5.18: Percent improvement in GSSI with respect to Xprior - non-assignment vs assignment-based methods

Figure 5.19: Percent improvement in GSSI with respect to traditional method - non-assignment vs assignment-based methods

5.7.4 Computational time: Non-assignment-based vs assignment-based

While both methods performed fairly well compared with the prior OD (Xprior) and the traditional method, there were two key differences between the two approaches. First, the RMSE values showed that the non-assignment-based method performed better than the assignment-based method, which could be due to a) no errors being considered in the observed turning proportions, and b) modelling errors in the traffic assignment. Second, the bi-level (assignment-based) method was computationally expensive compared with the non-assignment-based method. While the assignment-based method took 11.70 to 15.14 minutes for 20 iterations, the non-assignment-based method required only around 0.17 to 0.24 minutes for 100 iterations (see Table 5.6).

[Figure 5.18 data labels, % improvement in GSSI w.r.t. prior OD for traditional, Ω=25%, Ω=50%, Ω=75% and Ω=100%: non-assignment-based 1.81, 6.01, 11.24, 14.26, 19.67; assignment-based 0.92, 6.30, 9.82, 12.45, 17.80.]

[Figure 5.19 data labels, % improvement in GSSI w.r.t. the traditional method for Ω=25%, Ω=50%, Ω=75% and Ω=100%: non-assignment-based 4.12, 9.26, 12.23, 17.54; assignment-based 5.33, 8.82, 11.42, 16.72.]




Table 5.6: Comparison of computational times: Non-assignment-based vs assignment-based methods

| Case | Assignment-based method (minutes) | Non-assignment-based method (minutes) |
|---|---|---|
| Case-1 | 15.14 | 0.17 |
| Case-2 | 14.00 | 0.24 |
| Case-3 | 14.01 | 0.19 |
| Case-4 | 11.70 | 0.20 |
| Case-5 | 15.11 | 0.24 |

5.8 SUMMARY

This chapter discussed a novel approach for estimating OD matrices using

observed turning proportions and the structure of Bluetooth OD flows. The

contribution of this methodology is twofold.

Firstly, the observations of turning proportions relax the dependence on

conventional assignment-based models. This implies that there is no longer a bi-level

framework and the issues associated with it, especially the computational cost, are

therefore minimised.

Secondly, the structural knowledge from observed Bluetooth OD flows was used

to maintain structural consistency in OD matrix estimates. This implies that a better

estimate can be obtained even with a poor structure of prior OD matrix.

A few methods have been proposed in the past to minimise the dependency on

the assignment; for instance, Nie, Zhang, and Recker (2005) and Barceló, Montero,

Bullejos, Serch, et al. (2013) proposed methods to estimate OD from estimated path

flows. While Nie, Zhang, and Recker (2005) proposed to decouple the user-

equilibrium based OD estimation problem through K-Shortest path ranking procedure,

Barceló, Montero, Bullejos, Serch, et al. (2013) considered path flows as the state

variables and used travel times from Bluetooth for mapping the link flows to origin

flows. However, the differences between their approaches and the one proposed in this thesis are as follows:

1. While both expressed the objective function in terms of the path flows, in this research the objective function is expressed directly in terms of OD flows.

2. In both approaches, link flows are estimated from the estimated path flows. In this thesis, link flows are estimated from the origin flows and observed turning proportions.

While the study demonstrated an alternate non-assignment-based approach, it

has a few limitations. The following points discuss such limitations and plausible

solutions to address them.

While the study demonstrates that turning proportions observed at every intersection are required to replace the need for assignment, in practice they might not be easy to obtain. For instance, the presence of shared lanes might make it difficult to estimate turning movements from loop detectors. However, in such situations, the turning proportions from the intersections of critical paths connecting higher-level zones can be used to estimate the OD matrix at a higher zonal level (say SA4) using the proposed non-assignment-based approach. This higher zonal level OD can further be used as an additional constraint for estimating the OD at a lower zonal level (say SA2 or SA3) using assignment-based approaches (such as the one proposed in Chapter 4). In this way, the proposed non-assignment-based approach can act as a higher-order constraint in OD optimisation.

In this chapter, the methodology was demonstrated using the structure of

Bluetooth OD flows. As discussed in Section 4.5, the actual observations

from Bluetooth do not provide the complete sequence of trajectories and are

therefore not true trip ends. To address this, the proposed methodology can

be re-formulated using the structural knowledge of Bluetooth subpath flows

instead of Bluetooth OD flows.




Chapter 6: Methodology to Cluster B-OD Matrices and Identify Typical Travel Patterns: Case Study Application of the BCC region

The previous chapters focussed on the development of methods primarily from

the structural perspective of OD matrices, such as: a) development of statistical metrics

for the structural comparison of OD matrices, and b) integrating the structural

information of Bluetooth trips into OD matrix estimation methods (both assignment

and non-assignment-based methods). However, this chapter focusses on the practical

application of the structural knowledge of B-OD matrices and the proposed statistical

metrics i.e. GSSI and NLOD. Both metrics are deployed independently as structural

proximity measures for a clustering algorithm to identify typical travel patterns and

the corresponding typical OD matrices from real Bluetooth data of the BCC region.

Due to the lack of a large-scale database of loop counts, the travel patterns from B-OD

matrices constructed from 415 days of Bluetooth data are analysed in this study.

The outline of this chapter is as follows: first, background about the travel

patterns is provided with a detailed review of similar studies in Section 6.1; the

proposed clustering-based methodology is discussed in Section 6.2; the experiments

and results based on structural proximity metrics - GSSI, NLOD, and a traditional

metric -RMSN are discussed in Section 6.3; and finally the summary to the chapter is

provided in Section 6.4.

6.1 BACKGROUND

A pattern means the “repeated or regular way in which something happens”

(Dictionary, 2018). A travel pattern can be defined as a repeated travel behaviour

related to various features, such as the origin and destination (OD) of travel (Kieu,



Bhaskar, & Chung, 2015), mode selection (De Haas, 2016), route selection (Lee &

Sohn, 2015), and activity (Lee & McNally, 2003). The focus of this chapter is OD-related travel patterns, and the term “travel pattern” is used in that sense throughout.

Previous studies on the analysis of travel patterns can be categorized, based on the level of detail they provide, into three types: traffic counts-based, travel time/speed-based, and trajectories-based patterns.

The traffic counts/travel times-based studies analyse patterns from time-series plots. For instance, a few researchers analysed travel patterns by classifying traffic volume time series (Weijermars & Van Berkum, 2005; Wild, 1997) and travel time series (Chung, 2003). With the availability of location-aware technologies such as GPS, mobile phones and Bluetooth, more studies have focussed on mobility patterns in the form of trajectories. Based on varying space-time characteristics, Guo, Zhu, Jin,

Gao, and Andris (2012) classified trajectories-based information into three types:

point-based trajectories, point-based OD pairs and area-based OD pairs. Among these

types, most studies focused on spatial clustering of trajectories that have common

attributes such as spatial contiguity (Guo, et al., 2012), similar sub-trajectories (Lee,

Han, Li, & Gonzalez, 2008), and link flows/speeds (Laharotte et al., 2015). With regard to their application, some studies inferred spatio-temporal patterns of activities (Gong, Liu, Wu, & Liu, 2016) or origin-destination hotspots (Gonzalez, Hidalgo, & Barabasi, 2008). While the trajectory-based information provides more mobility detail than

that of OD flows, the latter is computationally effective for analysing larger spatio-

temporal dimensions of travel patterns (say daily mobility of any large-scale city)

(Guo, et al., 2012).

Few studies in the literature address the classification of days based on traffic data such as speed/occupancy (Rakha & Van Aerde, 1995); travel time series (Chung, 2003); traffic load profiles (Friedrich, Immisch, Jehlicka, Otterstätter, & Schlaich, 2010) and OD flows (Andrienko, Andrienko, Fuchs, & Wood, 2017; Guo, et al., 2012; Yang, Yan, & Xu, 2017). With respect to OD flow-related patterns, graph partitioning methods have become more popular. For example, Guo, et al. (2012) applied dynamic graph partitioning to represent day-of-the-week patterns using smart card and Bluetooth data; and Naveh and Kim (2018) used trip ends of taxi trajectory data to spatially cluster the GPS points and analyse their patterns across space and



time. However, representation of OD flows in dynamic graphs is computationally

expensive due to huge spatio-temporal dimensions of OD. Addressing this, previous

studies have proposed dimensionality reduction methods such as Principal Component

Analysis (PCA), Singular Value Decomposition (SVD) (Yang, et al., 2017); Non-

Negative Tensor Factorization methods (Guo, et al., 2012); and spatial abstraction

methods (converting graphs into multi-dimensional vectors) (Andrienko, et al., 2017). While these methods can capture most of the OD flow information, they might miss the subtle differences within the underlying patterns. For instance, PCA and SVD may not be appropriate if the data points lie in different subspaces/density regimes (Steinbach, Ertöz, & Kumar, 2004); and in spatial abstraction methods, the discretization of flows and distances might place different values within the same class.

With regard to exploiting the hidden structure of OD matrices, Laharotte, et al. (2015) presented a Latent Dirichlet Allocation (LDA) approach to identify temporal patterns of the Brisbane network based on different LDA templates, such as high level of traffic, even peak (high) or leisure. While Laharotte, et al. (2015) reduced the B-OD matrices into LDA (B-OD) templates and clustered those B-OD pairs, the present study proposes to cluster daily B-OD matrices to identify day-to-day variations. Although

past studies (Djukic, et al., 2013; Ruiz de Villa, et al., 2014) proposed structural

similarity metrics to compare OD matrices, clustering of daily OD matrices and

identifying typical OD matrices based on their structural proximity have not been

addressed before.

With respect to the travel patterns, many questions were raised in Section 1.2.5.

To answer these questions, this chapter explores a clustering-based approach to

classify an individual B-OD matrix into specific groups, where OD matrices within

the same group should have similar travel patterns. Raw Bluetooth data from 845

BMSs (Figure 1.9) were obtained for 415 days (June, July, August and December

months of 2015 and all months excluding April of 2016). In the following section, a detailed methodology is discussed to cluster high-dimensional and multi-density B-OD matrices (Osorio (2017) emphasizes that a dimension of 200 is generally considered high) and to identify typical OD matrices of typical travel patterns.



6.2 METHODOLOGY TO CLUSTER B-OD MATRICES AND

IDENTIFY TYPICAL TRAVEL PATTERNS

The following sections discuss the traditional DBSCAN approach followed by

the proposed three-level approach and distance measures for the clustering algorithm.

6.2.1 Traditional DBSCAN approach

A density-based spatial clustering of applications with noise (DBSCAN)

algorithm was selected for the current application. The algorithm, originally proposed

by Ester, Kriegel, Sander, and Xu (1996), is widely used to cluster data points based on

their density. The advantage of a DBSCAN algorithm is that it does not require any

predetermined number of clusters and the size of a cluster is not fixed (Kieu, et al.,

2015). The following sections provide a conceptual framework for the algorithm,

where the data point in the current application should be read as a B-OD matrix.

The algorithm first marks all of the data points as “non-visited”, starting with an

arbitrary selection of a “non-visited” point and identifying all other data points within

ε distance (distance threshold). These data points, if any, are termed as neighbourhood

points. If the number of neighbourhood points is at least MinPts (size threshold) then

the data point under consideration becomes the first point of a new cluster where the

neighbourhood points are part of the same cluster; otherwise, the data point is labelled

as noise. In either case, the data point is now marked as “visited”. If a cluster is

identified, then the above process for defining neighbourhood points is repeated for all

of the new points identified as neighbourhoods in the current cluster and the number

of points in the cluster is extended. Thereafter, a new “non-visited” point is selected,

and the process is repeated until all of the points are marked as “visited”. This leads to

each point being either assigned to a cluster or marked as noise.
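The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function name `dbscan_precomputed` and the `NOISE` constant are hypothetical, the input is assumed to be a pre-computed symmetric distance matrix (one row per B-OD matrix), and a point is assumed to count as its own neighbour, which the description above does not state explicitly.

```python
from collections import deque

NOISE = -1  # hypothetical label for noise points


def dbscan_precomputed(dist, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE.

    dist    : symmetric n x n list of lists of pairwise distances
    eps     : distance threshold
    min_pts : size threshold
    Assumption: a point is included in its own eps-neighbourhood.
    """
    n = len(dist)
    labels = [None] * n  # None plays the role of "non-visited"
    cluster_id = -1

    def neighbours(i):
        return [j for j in range(n) if dist[i][j] <= eps]

    for p in range(n):
        if labels[p] is not None:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may still be absorbed later as a border point
            continue
        cluster_id += 1
        labels[p] = cluster_id
        queue = deque(seeds)
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster_id  # border point joins the cluster
            if labels[q] is not None:
                continue  # already visited
            labels[q] = cluster_id
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # extend the cluster
    return labels
```

In the present application, `dist` would hold the pairwise distances between daily B-OD matrices, so each label corresponds to one day.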

From the above, it is clear that the algorithm does not require a pre-determined

number of clusters, as in k-NN (Altman, 1992), and is able to define clusters with

varying density. It also identifies outliers as noise. However, as the algorithm is

sensitive to the setting of its parameters (ε and MinPts), the algorithm does not perform

well for multi-density data sets (Huang, Yu, Li, & Zeng, 2009). Moreover, in the

current application, where data points are high dimensional matrices, a relevant

indicator is required to define the ε. To address these needs, the following sub-sections



discuss setting DBSCAN parameters and distance measures for B-OD clustering.

Following this, the experiments and results are discussed.

6.2.1.1 Setting DBSCAN parameters

The optimum DBSCAN parameters in the traditional approach are identified using

a simple and interactive heuristic proposed by Ester et al. (1996), as discussed below

(see Figure 6.1):

Step 1: First, a k-dist function is defined to map each data point, p, to the distance value (k-dist (p)) corresponding to its kth-nearest neighbour.

Step 2: For a given value of k, compute the k-dist value of every point in the database and plot the points (x-axis) in descending order of their k-dist values (y-axis). The resulting graph is referred to as the sorted k-dist graph.

Step 3: The shape of the sorted k-dist graph helps to identify the threshold point. The parameter MinPts is set to k, and ε is chosen as the k-dist value corresponding to the valley of the sorted k-dist graph. The valley points are identified through visual observation, and as such, this technique is an interactive approach. All points on the left side of the threshold point (i.e., with higher k-dist values) are considered noise and the remaining points are assigned to some cluster.
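Steps 1 and 2 above can be sketched as follows. The function names `k_dist` and `sorted_k_dist` are hypothetical, and the input is assumed to be a pre-computed distance matrix; the valley of Step 3 is still read off the resulting curve by the analyst.

```python
def k_dist(dist, k):
    """Step 1: distance from every point to its kth-nearest neighbour.

    dist is an n x n list of lists of pairwise distances; the point
    itself is excluded when ranking its neighbours.
    """
    values = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        values.append(others[k - 1])
    return values


def sorted_k_dist(dist, k):
    """Step 2: (point index, k-dist) pairs in descending order of k-dist.

    Plotting the second element against its position in the list
    gives the sorted k-dist graph.
    """
    pairs = list(enumerate(k_dist(dist, k)))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Because the ranking is recomputed for every k, the order of points along the x-axis generally differs between the 1-dist, 2-dist and 3-dist graphs.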

Figure 6.1: Typical shape of sorted k-dist graph

For ease of explanation, the above technique is presented with an example. Figure 6.2 (left) shows five data points (P1, P2, P3, P4, and P5) that need to be clustered using the k-dist values corresponding to their kth nearest neighbour. Here, the value shown on each link joining two points is the distance between them. The kth nearest neighbour and k-dist (within brackets) of all points are shown in Figure 6.2 (right) for k=1, k=2, and k=3. For instance, the 1st, 2nd, and 3rd nearest neighbours of P3 are P2, P4, and P1, respectively. The sorted k-dist plots for k=1, k=2, and k=3, along with the corresponding valley points, are illustrated in Figure 6.3. Here, the y-axis represents k-dist values and the x-axis shows the order of points. It should be noted that the order of points changes as k changes. After setting MinPts equal to k, the optimal ε values are the k-dist values corresponding to the valley points of the sorted k-dist plots. For instance, the optimal ε is 3 for MinPts=1, 3.2 for MinPts=2, and 7 for MinPts=3. The points on the left side of the valley correspond to noise and the rest form clusters, as shown in Figure 6.3. As can be seen, for MinPts=1, clusters can be formed using points that are in proximity (i.e., P1, P2, P3, and P4) while considering one point (P5) as noise. Similarly, for MinPts=2 and MinPts=3, clusters can be formed using P1, P3, and P4 while considering P2 and P5 as noise. Alternatively, it can also be observed from Figure 6.2 that P2 and P5 are slightly away from the rest of the points. Thus, they are more likely to be labelled as noise than the others.

Ester et al. (1996) identified that k-dist graphs for k > 4 did not significantly

differ from the 4-dist graph. Thus, they fixed MinPts to be 4 and identified the

threshold corresponding to the valley of 4-dist graph.

Figure 6.2: Sample data points (left) along with kth nearest neighbour and k-dist of all points (right)

Figure 6.3: Sorted k-dist graphs for k=1, k=2 and k=3 and the resulting clusters

[Figure 6.3 panels: sorted 1-dist, 2-dist and 3-dist graphs (y-axis: k-dist; x-axis: order of the points), each marking the valley, the noise region and the clusters region. Order of the points: k=1: P5, P2, P3, P1, P4; k=2: P2, P5, P1, P3, P4; k=3: P2, P5, P4, P1, P3.]



A traditional DBSCAN algorithm performs poorly if the data points are of varied

density (multi-density data sets). To address this, some researchers have suggested

dividing datasets into different density levels (referred to as subspaces) prior to the

clustering process (Elbatta & Ashour, 2013; Parsons, Haque, & Liu, 2004). The

difference in density levels can be observed from sorted k-dist plots. For instance,

Figure 6.4 shows a typical sorted k-dist plot if there are two density levels in the data

points. Thus, the decision to consider subspace clustering is made based on the density

distribution. As such, major subspaces/clusters are initially identified within the

datasets and the clustering process is then performed within the subspaces.

Figure 6.4: Demonstration of two density levels through sorted k-dist plot

6.2.2 Three-level approach for identifying DBSCAN parameters

This section discusses the methodology developed to identify the optimum

DBSCAN parameters using a three-level approach. Figure 6.5 illustrates the overview

of the three-level approach based on DBSCAN clustering algorithm. It describes the

methodology adopted to estimate typical OD matrices of typical travel patterns by

clustering β (in the study β= 415) B-OD matrices. The step by step approach is

described as follows.



Figure 6.5: Three level approach to cluster B-OD matrices

First level: Identify the possible subspaces

o Step 1: First, the density distribution of data points is observed from sorted k-dist plots for k=1 to k=15. Based on the experiments in this study, for k>15 the number of clusters formed was less than or equal to 2. Thus, an upper limit of k=15 was selected. If the plots show v distinct valleys, then it is a v-density dataset, and the data points are split into v subspaces for subspace clustering. If the plots show only one valley, then no subspace clustering is undertaken.

Second level: Identifying the initial set of DBSCAN parameters

o Step 2: Unlike the approach adopted by Ester, et al. (1996), in which the threshold is inferred visually from the valley of sorted k-dist plots, this study proposes the shortest distance from origin criterion to identify the initial set of DBSCAN parameters, represented by $[(\varepsilon_1, \mathit{MinPts}_1), \ldots, (\varepsilon_k, \mathit{MinPts}_k)]$. According to this criterion, the valley of a sorted k-dist graph corresponds to the point at the shortest distance from the origin of the axes formed by k-dist values on the y-axis and sorted data points (OD matrices) on the x-axis.

Third level: Identifying the optimum set of DBSCAN parameters and the resulting clusters.

o Step 3: DBSCAN clustering is now performed using the set of parameters identified in the second level.

o Step 4: Although a good number of clusters is required, unimportant clusters are not wanted. Thus, only those parametric combinations of ε and MinPts that result in c homogeneous clusters, where cl <= c <= cu, are selected. The lower and upper limits are at the analyst’s discretion, and in this study 3 <= c <= 6 is considered. The selected parameters form the optimum set, and the rest of the parametric combinations are ignored. The homogeneous clusters belonging to these parametric combinations are the final clusters.
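The shortest distance from origin criterion of the second level can be sketched as below. The function names are hypothetical, and the sketch assumes that distances to the origin are measured in the raw plot units (rank of the point on the x-axis, k-dist value on the y-axis); the text above does not specify whether the axes are rescaled, and a different scaling could move the selected point.

```python
import math


def valley_by_shortest_distance(sorted_k_dists):
    """Return (rank, eps) of the point on a sorted k-dist graph that is
    closest to the origin.

    sorted_k_dists : k-dist values in descending order (y-axis); the
                     x-coordinate of the r-th value is its rank r = 1..n.
    Assumption: raw axis units are used without normalisation.
    """
    best_rank, best_eps, best_d = None, None, math.inf
    for rank, value in enumerate(sorted_k_dists, start=1):
        d = math.hypot(rank, value)
        if d < best_d:
            best_rank, best_eps, best_d = rank, value, d
    return best_rank, best_eps


def initial_parameter_set(sorted_k_dists_by_k):
    """Build the initial list of (eps, MinPts) pairs, one per k, with
    MinPts set to k and eps taken at the valley of the k-dist graph."""
    return [
        (valley_by_shortest_distance(values)[1], k)
        for k, values in sorted(sorted_k_dists_by_k.items())
    ]
```

The criterion replaces the visual reading of the valley with a rule that can be applied automatically to every k from 1 to 15.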

Section 6.3 explains the above process further, with an example from the real

data.
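The selection rule of the third level (Step 4) can be sketched as a simple filter over the clustering results. The names `count_clusters` and `select_optimum_parameters` are hypothetical, and the sketch only checks the number of clusters against the limits cl and cu; checking that the retained combinations yield the same (homogeneous) clusters is left to the analyst.

```python
def count_clusters(labels, noise=-1):
    """Number of distinct clusters in a labelling, ignoring noise."""
    return len({label for label in labels if label != noise})


def select_optimum_parameters(results, c_lower=3, c_upper=6):
    """Keep the (eps, MinPts) combinations whose clustering has between
    c_lower and c_upper clusters (3 and 6 in this study).

    results : dict mapping (eps, min_pts) -> list of cluster labels,
              e.g. the output of DBSCAN for each initial combination.
    """
    return {
        params: labels
        for params, labels in results.items()
        if c_lower <= count_clusters(labels) <= c_upper
    }
```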

6.2.3 Distance measures for clustering B-OD matrices

The two statistical metrics proposed in Chapter 3; that is, the GSSI and NLOD,

were deployed as the structural proximity measures for comparison of OD matrices.

In this research, the applicability of these metrics was independently tested as distance

measures for clustering B-OD matrices. First, the formulations and characteristics of

both statistical metrics are discussed and thereafter the distance measures are defined.

Since the DBSCAN algorithm considers a distance matrix for clustering process,

GSSI values are initially converted into distance values (dGSSI) using Equation (74).

The pre-computed 415*415 GSSI matrix is multiplied by 1,000 so that the distance

value is close to one decimal place.

(74)



The NLOD in itself is a distance value; thus, it requires no further conversion.

However, to be consistent with GSSI, the 415*415 NLOD matrix is multiplied by 1,000, as shown in Equation (75).


dNLOD = 1000*(NLOD( )) (75)

To compare the results of experiments based on the structural proximity measures with a traditional metric that does not account for the structural information of OD matrices, the normalized root mean square error (RMSN) is chosen. The formulation for RMSN is taken from Antoniou, et al. (2004) and is shown in Equation (76). To be consistent with the other distance measures, the equivalent distance measure for RMSN is obtained by multiplying Equation (76) by 1,000, as shown in Equation (77).

RMSN ( ) =

(76)

dRMSN = 1000*(RMSN ) (77)
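The three distance measures can be illustrated with the short sketch below. It rests on stated assumptions: the conversion of GSSI to a distance is assumed to be 1 − GSSI before scaling (Equation (74) is not reproduced here, so this form is a guess at a common similarity-to-distance conversion), RMSN is written in the form commonly attributed to Antoniou, et al. (2004) with the first matrix treated as the reference, and all function names are hypothetical.

```python
import math

SCALE = 1000  # scaling factor applied to all three measures


def d_gssi(gssi):
    """Distance from a GSSI similarity value.
    Assumption: similarity is converted as (1 - GSSI) before scaling."""
    return SCALE * (1.0 - gssi)


def d_nlod(nlod):
    """NLOD is already a distance, so it is only scaled."""
    return SCALE * nlod


def rmsn(x, y):
    """Normalised root mean square error between two OD matrices:
    sqrt(N * sum((x - y)^2)) / sum(x), with N the number of OD pairs.
    Assumption: x is the reference matrix in the denominator."""
    flat_x = [value for row in x for value in row]
    flat_y = [value for row in y for value in row]
    n = len(flat_x)
    squared_error = sum((a - b) ** 2 for a, b in zip(flat_x, flat_y))
    return math.sqrt(n * squared_error) / sum(flat_x)


def d_rmsn(x, y):
    """Scaled RMSN distance between two OD matrices."""
    return SCALE * rmsn(x, y)
```

Note that RMSN written this way is not symmetric in its two arguments, so the choice of reference matrix matters when it is used as a clustering distance.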

6.3 EXPERIMENTS AND RESULTS

This section details the experiments using dGSSI (Experiment-1) and dNLOD (Experiment-2) as proximity measures; their results are compared against Experiment-3, which is based on dRMSN.

The initial observations from sorted k-dist plots indicated a possibility of two

different density regimes in the datasets for all three experiments (Figure 6.6, Figure

6.7 and Figure 6.8). Thus, based on Step-1 of the three-level approach (Section 6.2.2),

all data points were first divided into two different subspaces. It was observed that the

first 129 points (in the order shown by x-axis) defined subspace-1 and consisted of

Saturdays, Sundays, public holidays, and long weekends. The rest of the data points

were pre-classified as subspace-2, which consisted of regular weekdays (WDR) and

weekday school holidays (WDSH). The experiments for individual subspaces are

described in the following subsections.



Figure 6.6: Sorted k-dist plots for experiment-1





6.3.1 Experiment-1: dGSSI as proximity measure

6.3.1.1 Subspace-1 analysis

Here, the analysis was performed on the 129 data points of subspace-1. The initial set of DBSCAN parameters was identified based on the shortest distance from origin criterion. Figure 6.9a presents the number of clusters formed for different MinPts. The pie-chart represents a consistent proportion of clusters (homogeneous clusters) for MinPts=4 to MinPts=9. The relationship between the optimum parameters (ε and MinPts) was observed to be linear, with an R² value of 0.8932 (see Figure 6.9b).
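The linearity between the optimum ε and MinPts can be checked with an ordinary least-squares fit. The sketch below is illustrative only: the function name `linear_fit_r2` is hypothetical, and any values passed to it in an example would be placeholders rather than the parameters obtained in the experiments.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line y = intercept + slope * x and its R^2.

    xs : MinPts values
    ys : corresponding optimum eps values
    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

An R² close to 1 indicates that the retained (ε, MinPts) combinations lie close to a straight line.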

The clusters of subspace-1 from Experiment-1 were:

Cluster-1 (C1) included weekends, public holidays and long weekends of January to June 2016;

Cluster-2 (C2) included Sundays of spring and summer, 2016;

Cluster-3 (C3) included Saturdays of spring and summer, 2016;

Cluster-4 (C4) included Sundays of winter, 2015; and

Cluster-5 (C5) included Saturdays of winter, 2015.

Figure 6.9: (a) Number of clusters vs MinPts and proportion of clusters (C1 48%, C2 15%, C3 12%, C4 13%, C5 12%); and (b) ε vs MinPts (R² = 0.8932) for subspace-1 of experiment-1

6.3.1.2 Subspace-2 analysis

Similar to the last analysis, Figure 6.10a indicates the number of clusters formed for different MinPts, and Figure 6.10b indicates the linear relationship (with R² = 0.94) between the optimal DBSCAN parameters. The following are the observed clusters:

Cluster-1 (C1) included regular weekdays of 2016 except summer;

Cluster-2 (C2) included regular weekdays, 2015;

Cluster-3 (C3) included weekday school holidays, 2015 and 2016; and

Cluster-4 (C4) included regular weekdays of November 2016.

Figure 6.10: (a) Number of clusters vs MinPts and proportion of clusters; and (b) ε vs MinPts for subspace-2 of experiment-1

6.3.2 Experiment-2: dNLOD as proximity measure


6.3.2.1 Subspace-1 analysis

The relationship between MinPts and the number of clusters is illustrated in Figure 6.11a, while Figure 6.11b shows the linear relationship between the optimal DBSCAN parameters (R² = 0.8977) that resulted in the following clusters:

Cluster-1 (C1) included weekends, public holidays and long weekends of January to June 2016;

Cluster-2 (C2) included Sundays of Winter, 2015; and

Cluster-3 (C3) included Saturdays of Winter, 2015.

[Figure 6.10 panels: (A) number of clusters vs MinPts, with proportion of clusters C1 44%, C2 24%, C3 24%, C4 8%; (B) ε vs MinPts, R² = 0.94.]





6.3.2.2 Subspace-2 analysis

The relationship between MinPts and the number of clusters is shown in Figure 6.12a. The relationship between ε and MinPts for subspace-2 of experiment-2 was also found to be linear, with an R² value of 0.9716 (Figure 6.12b). The clusters resulting from this analysis were:

Cluster-1 (C1) included regular weekdays of 2016 except Summer;

Cluster-2 (C2) included regular weekdays, 2015;

Cluster-3 (C3) included weekday school holidays, 2015 and 2016; and

Cluster-4 (C4) included regular weekdays of November 2016.


Figure 6.11: (a) Number of clusters vs MinPts and proportion of clusters (C1 37%, C2 33%, C3 30%); and (b) ε vs MinPts (R² = 0.8977) for subspace-1 of experiment-2

Figure 6.12: (a) Number of clusters vs MinPts and proportion of clusters (C1 46%, C2 23%, C3 22%, C4 9%); and (b) ε vs MinPts (R² = 0.9716) for subspace-2 of experiment-2



6.3.1 Experiment-3: dRMSN as proximity measure

6.3.1.1 Subspace-1 analysis:

The distance measure dRMSN resulted in only one major cluster for subspace-1. It included all Saturdays, Sundays and public holidays of 2015 and 2016, except the Saturdays of spring and summer, 2016, which were considered to be noise.

6.3.1.2 Subspace-2 analysis:

A total of 4 homogeneous clusters were formed for MinPts ranging from 4 to 13. The relationship between MinPts and the number of clusters is illustrated in Figure 6.13(a), and Figure 6.13(b) shows the linear relationship between the optimal DBSCAN parameters (R² = 0.9832) that resulted in the following clusters:

Cluster-1 (C1) includes WDR of 2016 except summer;

Cluster-2 (C2) includes WDR, 2015;

Cluster-3 (C3) includes WDSH, 2015 and 2016; and

Cluster-4 (C4) includes WDR of November 2016.

Figure 6.13: (a) Number of clusters vs MinPts and proportion of clusters (C1 41%, C2 24%, C3 23%, C4 12%); and (b) optimum ε vs MinPts (R² = 0.9832) for subspace-2 of experiment-3

6.3.2 Typical B-OD flows

One way to derive typical B-OD matrices and typical OD flows for individual OD pairs is to take the average of all B-OD matrices within each cluster. To give an example of the difference among the typical OD flows, the OD flows for the OD pair Mt. Gravatt and Brisbane CBD are shown in the Box-Whisker plot (Figure 6.14). The plot is shown for the clusters resulting from experiment-1, where the first 5 clusters on the x-axis correspond to C-1 to C-5 of subspace-1 and the last 4 clusters correspond to C-1 to C-4 of subspace-2, respectively. The y-axis represents the OD flow values.

Figure 6.14: Box-Whisker plot demonstrating the difference among the typical B-OD flows for OD pair – Mt. Gravatt and Brisbane CBD (results of experiment-1)
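The averaging of B-OD matrices within each cluster can be sketched as follows. The function name `typical_od_matrices` is hypothetical, matrices are assumed to be equally sized lists of lists, and days labelled as noise are assumed to be excluded from the averages.

```python
def typical_od_matrices(matrices, labels, noise=-1):
    """Element-wise mean of the OD matrices in each cluster.

    matrices : list of OD matrices (lists of lists of equal shape),
               one per day
    labels   : cluster label of each day; noise days are skipped
    Returns a dict mapping cluster label -> typical OD matrix.
    """
    groups = {}
    for matrix, label in zip(matrices, labels):
        if label == noise:
            continue
        groups.setdefault(label, []).append(matrix)

    typical = {}
    for label, members in groups.items():
        n_rows, n_cols = len(members[0]), len(members[0][0])
        typical[label] = [
            [sum(m[i][j] for m in members) / len(members) for j in range(n_cols)]
            for i in range(n_rows)
        ]
    return typical
```

The typical OD flow of a single OD pair, such as the one plotted in Figure 6.14, is then one cell of the corresponding typical matrix.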

6.3.3 Discussion

Since the ground truth is unknown, one way to compare the clusters resulting from all three experiments is to see how well they are able to reproduce the pre-classified day types. The number of days in each category of day type is shown in Figure 6.15. (Refer to Figure 6.16 or the notations section for the expansion of the terms used in Figure 6.15).
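A comparison of this kind amounts to a cross-tabulation of cluster labels against pre-classified day types. The following sketch shows one way to build such a table; the function name `cross_tabulate` is hypothetical, and any labels used with it in an example are placeholders rather than results from the experiments.

```python
from collections import Counter, defaultdict


def cross_tabulate(cluster_labels, day_types):
    """Count how many days of each pre-classified day type fall in
    each cluster.

    cluster_labels : cluster label of each day (e.g. -1 for noise)
    day_types      : pre-classified day type of each day (e.g. "WDR")
    Returns a dict mapping cluster label -> {day type: number of days}.
    """
    table = defaultdict(Counter)
    for cluster, day_type in zip(cluster_labels, day_types):
        table[cluster][day_type] += 1
    return {cluster: dict(counts) for cluster, counts in table.items()}
```

A cluster dominated by a single day type reproduces the pre-classification well, whereas a cluster that mixes several day types points either to similar travel patterns or to a measure that cannot separate them.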

Figure 6.15: Classification of day types (number of days: SATR 39; SATSH 18; SUNR 40; SUNSH 16; WDR 219; WDSH 67; PH 5; LW 11)

While the comparison in Figure 6.16 shows that PH (Public Holidays), LW (Long Weekends) and School Holidays during Saturdays and Sundays could not form standalone clusters, both GSSI (9 clusters) and NLOD (7 clusters) could represent the pre-classification better than RMSN (5 clusters). The similarities in the clusters resulting from GSSI and NLOD are explained in detail below.



Both metrics were able to differentiate weekday and weekend patterns. In

fact, there was no typical weekend travel pattern because travel patterns

during Saturday and Sunday were found to differ from each other.

Both metrics observed seasonal trends in travel patterns. For instance,

Saturdays during the Australian Winter, 2015 were observed to have

different travel patterns compared to the rest of the Saturdays. A similar

observation was noted for the Sundays of Winter, 2015.

Both metrics identified a group of Saturdays and Sundays during the

school holiday season that shared similar travel patterns with a few well

noted public holidays of Australia.

The classification of subspace-2 (i.e., WDR and WDSH) was the same in

both experiments. This identified that the travel patterns during WDSH

differed from those of the WDR. Interestingly, WDSH from both 2015 and

2016 were grouped into one single cluster by both metrics.

Both metrics identified that WDR travel patterns during November 2016

differed from those of other regular working weekdays. The difference in

travel patterns during November 2016 could be attributed to major events

held in that month. The annual report published by Royal National

Agricultural and Industrial Association of Queensland (RNA, 2016)

estimated that, in 2016, the Brisbane Showgrounds attracted almost a

million people by hosting more than 250 events, with an increase of 20%

compared to 2015. The month of November was the busiest month of

2016, due to hosting a total of 35 events.

However, the only difference between them is that GSSI separated the Saturdays and Sundays of the Australian spring and summer of 2016 into two individual clusters, which NLOD failed to differentiate. The lower sensitivity of NLOD in this regard can be attributed to the fact that it computes statistics on OD pairs belonging to one specific origin, whereas GSSI computes statistics on groups of OD pairs belonging to more than one origin. Due to this, GSSI is able to capture subtle structural differences in travel patterns during the afore-mentioned days.

On the other hand, the clusters produced from experiment-3 (based on RMSN) demonstrated seasonal trends in subspace-2 travel patterns and were similar to the results of the other experiments. However, RMSN failed to distinguish the differences among the daily travel patterns during Saturdays, Sundays and Public Holidays. Resulting in one major cluster, it was unable to recognize seasonal variations within the other days in subspace-1. This is because RMSN is based on deviations of individual OD flows, due to which it could not identify the structural differences within the respective B-OD matrices.

The typical OD flows (see section 6.3.2 for results of experiment-1) from each cluster demonstrated typical travel patterns of Brisbane city and are better than the observations from a similar study by Guo, et al. (2012) conducted on Brisbane city over the same time period. Guo, et al. (2012) could identify only three types of travel patterns, namely Saturday, Sunday and Weekday patterns. This is perhaps because the travel patterns were analysed on dimensionally reduced OD matrices. However, the present study is able to identify other patterns, highlighting the strength of structural proximity measures in identifying more typical travel patterns.

For travel demand modelling, the knowledge of travel patterns can be used for estimating typical OD matrices using bi-level solution algorithms. Moreover, the knowledge of travel patterns is important for effective policy decisions; for example, shifting public holidays with similar travel patterns towards weekends can create more long weekends (Chung, 2003). This would encourage the public to spend more during the holidays, thus boosting the nation’s economy. Further, the knowledge of the seasonal distribution of travel patterns helps transport planners to schedule travel surveys across the study network over any period. For instance, the Household Travel Survey (HTS) for South East Queensland (SEQTS, 2010) was conducted for over 10 weeks from mid-April through late-June and in July in 2009. However, the survey period avoided the days during School/University holidays. Since the study showed that travel patterns are different during school holidays and during different seasons, distributing the survey period over a year based on the knowledge of Bluetooth travel patterns can better capture the travel patterns of any study region. There are also short-term ITS applications of identifying typical OD matrices. For instance, developing a database of typical historical time-sliced OD matrices can improve the performance of OD prediction algorithms (such as the Kalman Filter) for real-time traffic management and decision-making tools such as Aimsun Live (Aimsunlive, 2017).

Figure 6.16: Comparison of clusters resulted from all three experiments

[Figure 6.16 layout: rows list the clusters of each experiment; columns list the pre-classified day types for 2015 and 2016, grouped as Weekdays (Regular Weekdays (WDR); School Holidays during weekdays (WDSH)), Public Holidays (Normal Public Holidays (PH); Long Weekends (LW)) and Weekends (Saturdays: During School Holidays (SATSH), Regular (SATR); Sundays: During School Holidays (SUNSH), Regular (SUNR)).

Experiment-1: GSSI. Subspace-1: (1) Weekends, PH and LW, Jan-Jun 2016; (2) Sundays, Spring and summer 2016; (3) Saturdays, Spring and summer 2016; (4) Sundays, Winter 2015; (5) Saturdays, Winter 2015. Subspace-2: (6) WDR, 2016 except summer; (7) WDR, 2015; (8) WDSH, 2015 and 2016; (9) WDR, November 2016.

Experiment-2: NLOD. Subspace-1: (1) Weekends, PH and LW, Jan-Jun 2016; (2) Sundays, Winter 2015; (3) Saturdays, Winter 2015. Subspace-2: (4) WDR, 2016 except summer; (5) WDR, 2015; (6) WDSH, 2015 and 2016; (7) WDR, November 2016.

Experiment-3: RMSN. Subspace-1: (1) Weekends, PH and LW, 2015 and 2016. Subspace-2: (2) WDR, 2015; (3) WDSH, 2015 and 2016; (4) WDR, November 2016; (5) WDR, 2016 except summer.]



6.4 SUMMARY

Although the DBSCAN clustering algorithm is not new, the study makes two major contributions:

Firstly, clustering multi-density OD matrices based on structural proximity measures to identify typical daily travel patterns of large-scale networks has not been addressed in the literature.

Secondly, the proposed three-level clustering approach is simple and effective in

identifying the OD clusters. The prior identification of subspaces addresses the

incapacity of classical DBSCAN with respect to multi-density datasets. Identification

of the set of optimum DBSCAN parameters demonstrates that different parametric

combinations can produce homogeneous clusters and their relationship is nearly linear.

The clustering results demonstrated many typical travel patterns for the BCC region. All three experiments showed that there were seasonal variations in the travel patterns for weekdays, and that the travel patterns during weekday school holidays and November 2016 were unique. The experiments based on structural proximity measures could identify the seasonal variations even among the travel patterns during Saturdays and Sundays. On the other hand, RMSN failed to identify any unique travel patterns within subspace-1 because of its incapacity to capture the subtle structural differences within those patterns. This highlights the importance of accounting for the structural information of OD matrices, with many practical benefits for both long-term strategic and short-term transport planning applications.


Chapter 7: Conclusion

This chapter contains the conclusions, limitations, and recommendations related

to the research. First, a summary of this thesis is provided in Section 7.1. Second, the

findings of the study and their connection to the research questions raised in Chapter

1 are reflected upon in Section 7.2. Lastly, based on the understanding gained in this

research, new and pertinent questions for future research are discussed in Section 7.3.

7.1 BRIEF SUMMARY

Estimating OD matrices has been a subject of transport modelling research for more than three decades. Ever since traffic counts began to be treated as indirect observations of OD flows, “matrix estimation” has been considered an optimisation problem. Since then, many methods have been proposed and implemented with respect to solution algorithms, assignment models, rule-based heuristics, objective function formulations, measurements from alternate data sources, and statistical performance measures. While most of the methods developed thus far fall under the bi-level modelling framework, many challenges are yet to be resolved. First, a traffic count-based bi-level method is an under-determined problem, and to address this most methods still depend on an outdated target OD matrix to maintain structural consistency in OD matrix estimation. Second, assignment models remain challenging due to modelling errors and their inseparable dependency on the OD matrix. Third, bi-level methods are computationally challenging due to the dimensionality of an OD matrix and the lower-level user-equilibrium assignment problem. Fourth, most existing statistical performance measures do not account for the structural information of OD matrices. Fifth, there is a great need to identify typical travel patterns and their corresponding typical OD matrices in demand modelling. The last challenge is related to bridging the gap between the availability of massive amounts of big traffic data and their direct implementation into transport models, especially tackling the issue of unknown market penetration rates of trips inferred from advanced data sources.

This research is an attempt to review the literature, understand the state-of-the-

art techniques, and propose methods to address some of the challenges. Specifically,


this study proposes methods to exploit the additional structural knowledge available

from other big data sources, such as Bluetooth, to maintain structural consistency and

address the problem of under-determinacy, develop alternate methodology to the

existing bi-level-based framework, develop new statistical performance measures for

the structural comparison of OD matrices, and propose a methodological approach to

cluster B-OD matrices and identify typical travel patterns based on the structural

proximity measures using a case study application on real Bluetooth datasets from

the BCC region.

The Brisbane City network is already equipped with several Bluetooth scanners.

This Bluetooth data is a good source of travel related information in both spatial and

temporal contexts. While the current applications are only limited to travel time

estimation, the unexplored potential of trip-related information formed the strong

motivation for the current research. Taking one step beyond the existing

implementation, the current study investigated the potential of Bluetooth data and

proposed new methods for improving the quality of OD matrix estimates using

additional knowledge (either the “structure of trips” and/or turning proportions) of

Bluetooth observations. A few analyses were conducted as part of this research (see

Appendix B) to add more confidence in the structural knowledge of real Bluetooth

observations from the BCC region. However, in the absence of ground truth,

simulation-based experiments are the only way to strengthen the argument that the

“structure” of Bluetooth trips could improve the quality of OD estimates. Although

the current research is based on Bluetooth observations and applied to the BCC region,

the methodology is applicable to data from any similar data source that can

provide additional information related to the structure of trips over any other study

network.

Overall, the entire study is based on enhancing the existing research with respect

to OD matrices comparison (through structural similarity measures); OD matrix

estimation (through the knowledge of Bluetooth trips/turning proportions), and

identification of typical travel patterns and typical OD matrices (through structural

proximity-based clustering method).


7.2 RESEARCH FINDINGS

The study identified major research gaps, which led to the development of four

research questions (Chapter 1) following a comprehensive review of the literature

(Chapter 2). In conjunction with the research questions, the research findings are

discussed as follows:

The sensitivity analysis results from Chapter 3 demonstrated that GSSI and

NLOD are robust statistical performance measures that have enough

potential to structurally compare OD matrices, which answered the first

research question (RQ-1).

The findings of Chapter 4 answered RQ-2, as follows:

o The B-OD method demonstrated that the additional structural

knowledge of Bluetooth OD flows can improve the quality of OD

matrix estimates. The B-OD method is suitable for networks (such

as the BCC region) that have good connectivity of Bluetooth scanners.

Although the B-OD method assumes that the trip ends are exactly

known, the methodology still holds well for observations from any

other emerging data sources that can provide more confidence about

trip ends compared to Bluetooth.

o The B-SP method suits situations in which the penetration rate of

Bluetooth trajectories is low. This method demonstrated the

applicability of Bluetooth subpath flows. The quality of the OD matrix

estimates was found to be better than that of the traditional traffic counts-based

approach even for a 2.5% penetration rate of Bluetooth trips.

o Since the core of both methods is based on structural information of

Bluetooth trips, the need to estimate unknown penetration rates of

Bluetooth trips is relaxed.

The findings of Chapter 5 answered RQ-3, as follows:

o The results demonstrated the ability of the proposed turning-proportion-based

technique to serve as an alternative to the assignment-based

models.

o The improvement in the quality of the OD matrix estimates through

additional knowledge of Bluetooth trips strengthened the proposed


single-level formulation. In fact, knowledge about traffic assignment

was implicitly considered in the observed turning proportions and

Bluetooth trips.

The core of Chapter 6 was to develop a methodological approach to cluster a multi-

density database of B-OD matrices and identify typical travel patterns with a real

case study application on the BCC region. This chapter addressed RQ-4. The

major findings of clustering analysis were:

o The clusters resulting from experiment-1 and experiment-2

demonstrated the ability of the proposed statistical metrics, GSSI and

NLOD, as potential structural proximity measures for the DBSCAN

clustering algorithm.

o The clusters from experiment-3, which is based on RMSN, failed to

distinguish travel patterns during the weekends and public holidays.

This is because most traditional metrics do not account for the structure

of OD matrices in their mathematical formulation and therefore

could not identify the subtle structural differences in the aforementioned

travel patterns.
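The idea of comparing OD matrices window by window, rather than cell by cell, can be sketched as follows. This is only an illustrative sketch, not the GSSI formulation of Chapter 3: it applies the standard SSIM expression to each block of OD pairs defined by a grouping of zones and averages the block scores. The zone grouping, the stabilising constants c1 and c2, the function names and the two toy matrices are all assumptions made for the example.

```python
def _mean(values):
    return sum(values) / len(values)


def _cov(x, y):
    """Population covariance of two equally long flow vectors."""
    mx, my = _mean(x), _mean(y)
    return _mean([(a - mx) * (b - my) for a, b in zip(x, y)])


def ssim(x, y, c1=1e-4, c2=1e-4):
    """Standard SSIM expression for two flow vectors.

    c1 and c2 are small stabilising constants (assumed values).
    """
    mx, my = _mean(x), _mean(y)
    vx, vy, cxy = _cov(x, x), _cov(y, y), _cov(x, y)
    numerator = (2 * mx * my + c1) * (2 * cxy + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def mean_windowed_ssim(od_a, od_b, zone_groups):
    """Average SSIM over all (origin group, destination group) windows."""
    scores = []
    for origins in zone_groups:
        for destinations in zone_groups:
            x = [od_a[i][j] for i in origins for j in destinations]
            y = [od_b[i][j] for i in origins for j in destinations]
            scores.append(ssim(x, y))
    return _mean(scores)


# Hypothetical 4-zone example with two geographical groups of zones.
zone_groups = [[0, 1], [2, 3]]
od_reference = [[0, 10, 2, 1], [8, 0, 1, 2], [3, 1, 0, 20], [1, 2, 15, 0]]
# Same total demand, but flows rearranged inside the first window.
od_rearranged = [[10, 0, 2, 1], [0, 8, 1, 2], [3, 1, 0, 20], [1, 2, 15, 0]]

same_score = mean_windowed_ssim(od_reference, od_reference, zone_groups)
diff_score = mean_windowed_ssim(od_reference, od_rearranged, zone_groups)
print(same_score, diff_score)
```

In this toy case the two matrices carry the same total demand, yet the rearranged window pulls the averaged score well below the score of a matrix compared with itself, which is the kind of structural difference the window-based measures are meant to expose.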

7.3 RECOMMENDATIONS FOR FUTURE RESEARCH

This section discusses the future research directions and some pertinent

questions:

Although introducing randomness in Bluetooth flows demonstrated

improvement in the quality of OD flow estimates, to achieve more realistic

modelling, the experiments could include errors and inconsistencies in the

observed traffic counts and turning proportions.

In this study, Bluetooth subpaths were created by trimming the first and last

IDs of BMS from the complete sequence of trips. However, as shown in

Figure 1.12, there could be mis-detections within the Bluetooth trajectories.

Accounting for these mis-detections before incorporating them into the

optimisation model would be even more realistic.

Future studies could test state-of-the-art solution algorithms,

such as versions of SPSA (Tympakianaki, et al., 2018) or metamodels


(Osorio, 2019), and these could be compared with other solution algorithms,

such as a genetic algorithm (Kim, et al., 2001), etc., over a benchmark

network. More improvements could be made with respect to the parameters

of gradient-based algorithms. For instance, in the present study, the prior

step-size was chosen through trial and error. However, the sensitivity of OD

flows to different values of step-sizes and the rate of change of step-sizes

need to be investigated. The step-sizes could also be sensitive to the OD

flow values; that is, higher and lower flow values. Convergence criteria

could also be examined in future investigations.

The current research focussed only on utilising the knowledge of Bluetooth

trips in the objective function formulation. As vehicle trajectories can be

inferred from Bluetooth observations, they could be used to calibrate the

assignment model in future research.

This study can be extended to dynamic OD space. Current state-of-the-art

techniques to estimate better quality time-dependent OD matrices use quasi-

dynamic approaches. Thus, the methods proposed in this research could

incorporate a quasi-dynamic assumption with respect to the distribution of

origin flows and estimate better time-dependent offline OD matrices. Quasi-

dynamic Kalman filter algorithms could then be investigated with additional

measurements from Bluetooth observed flows for real-time estimation of

OD flows.
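The step-size sensitivity raised in the recommendations above can be illustrated with a small sketch. It is not the objective function or the algorithm used in this thesis: it minimises only a least-squares deviation between modelled and observed link counts, with a projection that keeps OD flows non-negative. The link-use matrix, the counts, the prior flows and the candidate step-sizes are hypothetical values chosen for the example.

```python
def link_flows(x, link_use):
    """Map OD flows x to link flows through a fixed link-use matrix."""
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in link_use]


def loss(x, link_use, counts):
    """Half the squared deviation between modelled and observed counts."""
    modelled = link_flows(x, link_use)
    return 0.5 * sum((y - c) ** 2 for y, c in zip(modelled, counts))


def projected_gradient_descent(x0, link_use, counts, step, iterations=200):
    """Fixed-step gradient descent with projection onto non-negative flows."""
    x = list(x0)
    n_links = len(link_use)
    for _ in range(iterations):
        modelled = link_flows(x, link_use)
        residual = [y - c for y, c in zip(modelled, counts)]
        grad = [
            sum(link_use[i][j] * residual[i] for i in range(n_links))
            for j in range(len(x))
        ]
        x = [max(0.0, x_j - step * g_j) for x_j, g_j in zip(x, grad)]
    return x


# Hypothetical toy problem: 3 OD pairs observed on 2 links (under-determined).
link_use = [[1, 1, 0], [0, 1, 1]]
counts = [30.0, 50.0]
prior = [10.0, 10.0, 10.0]

results = {}
for step in (0.1, 0.5, 0.9):
    estimate = projected_gradient_descent(prior, link_use, counts, step)
    results[step] = loss(estimate, link_use, counts)
    print(step, results[step])
```

On this toy problem the smaller step-sizes drive the count deviation towards zero, whereas the largest one overshoots and never settles, which is why a systematic study of step-sizes and their rate of change is suggested rather than trial and error.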

Bibliography 184

Bibliography

Abedi, N., Bhaskar, A., & Chung, E. (2013). Bluetooth and Wi-Fi MAC address based crowd data collection and monitoring: benefits, challenges and enhancement.

Abedi, N., Bhaskar, A., & Chung, E. (2014). Tracking spatio-temporal movement of

human in terms of space utilization using Media-Access-Control address data. Applied Geography, 51, 72-81.

Abedi, N., Bhaskar, A., Chung, E., & Miska, M. (2015). Assessment of antenna

characteristic effects on pedestrian and cyclists travel-time estimation based on Bluetooth and WiFi MAC addresses. Transportation Research Part C: Emerging Technologies, 60, 124-141.

ABS. (2017). More than two in three drive to work, Census

reveals. Retrieved from http://www.abs.gov.au/ausstats/abs@.nsf/mediareleasesbyReleaseDate/7DD5DC715B608612CA2581BF001F8404?OpenDocument

ABS. (2018). Census of Population and Housing: Community Profile, DataPack and

TableBuilder Templates, Australia, 2016. Retrieved from http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/2079.0Main%20Features42016?opendocument&tabname=Summary&prodno=2079.0&issue=2016&num=&view=

Ahas, R., Silm, S., Järv, O., Saluveer, E., & Tiru, M. (2010). Using mobile positioning

data to model locations meaningful to users of mobile phones. In Journal of urban technology (Vol. 17, pp. 3-27).

Aimsun. (2019). Aimsun Next 8.4 User's Manual. Aimsun, Barcelona, Spain.

Retrieved from https://www.aimsun.com/

Aimsunlive. (2017). Gold Coast: Predictive Solutions Trial. Retrieved from

https://www.aimsun.com/gold-coast-predictive-solutions-trial/

Alexander, L., Jiang, S., Murga, M., & González, M. C. (2015). Origin–destination

trips by purpose and time of day inferred from mobile phone data. In Transportation research part c: emerging technologies (Vol. 58, pp. 240-250).


Alibabai, H., & Mahmassani, H. (2008). Dynamic origin-destination demand estimation using turning movement counts. Transportation Research Record: Journal of the Transportation Research Board(2085), 39-48.

Allahviranloo, M., & Recker, W. (2015). Mining activity pattern trajectories and

allocating activities in the network. In Transportation (pp. 1-19).

Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric

regression. The American Statistician, 46(3), 175-185.

Andrienko, G., Andrienko, N., Fuchs, G., & Wood, J. (2017). Revealing patterns and

trends of mass mobility through spatial and temporal abstraction of origin-destination movement data. IEEE Transactions on Visualization & Computer Graphics(1), 1-1.

Antoniou, C., Barceló, J., Breen, M., Bullejos, M., Casas, J., Cipriani, E., . . . Marzano,

V. (2016). Towards a generic benchmarking platform for origin–destination flows estimation/updating algorithms: Design, demonstration and validation. Transportation Research Part C: Emerging Technologies, 66, 79-98.

Antoniou, C., Ben-Akiva, M., & Koutsopoulos, H. (2004). Incorporating automated

vehicle identification data into origin-destination estimation. Transportation Research Record: Journal of the Transportation Research Board(1882), 37-44.

Antoniou, C., Ben-Akiva, M., & Koutsopoulos, H. N. (2006). Dynamic traffic demand

prediction using conventional and emerging data sources. In IEE Proceedings-Intelligent Transport Systems (Vol. 153, pp. 97-104): IET.

Antoniou, C., Ciuffo, B., Montero, L., Casas, J., Barcelò, J., Cipriani, E., . . . Bullejos,

M. (2014). A framework for the benchmarking of OD estimation and prediction algorithms. In 93rd Transportation Research Board Annual Meeting.

Asakura, Y., Hato, E., & Kashiwadani, M. (2000). Origin-destination matrices

estimation model using automatic vehicle identification data and its application to the Han-Shin expressway network. Transportation, 27(4), 419-438.

ASGS. (2017). Australian Statistical Geography Standard (ASGS). Retrieved from

http://www.abs.gov.au/websitedbs/D3310114.nsf/home/Australian+Statistical+Geography+Standard+(ASGS)

Ashok, K. (1996). Estimation and prediction of time-dependent origin-destination

flows. In Doctoral Dissertation.

Ashok, K., & Ben-Akiva, M. E. (2000). Alternative approaches for real-time

estimation and prediction of time-dependent origin–destination flows. Transportation Science, 34(1), 21-36.


Ashok, K., & Ben-Akiva, M. E. (2002). Estimation and prediction of time-dependent

origin-destination flows with a stochastic mapping to path flows and link flows. Transportation Science, 36(2), 184-198.

ATAP. (2016a). Australian Transport Assessment and Planning Guidelines. Retrieved

from https://atap.gov.au/tools-techniques/travel-demand-modelling/files/T1_Travel_Demand_Modelling.pdf

ATAP. (2016b). Overview of transport modelling. Retrieved from

https://atap.gov.au/tools-techniques/travel-demand-modelling/2-overview.aspx

Australian Transport Assessment and Planning (ATAP). (2017). Retrieved from

https://atap.gov.au/tools-techniques/travel-demand-modelling/1-introduction.aspx

Balakrishna, R., Ben-Akiva, M., & Koutsopoulos, H. (2007). Offline calibration of

dynamic traffic assignment: simultaneous demand-and-supply estimation. Transportation Research Record: Journal of the Transportation Research Board(2003), 50-58.

Bar-Gera, H., Mirchandani, P. B., & Wu, F. (2006). Evaluating the assumption of

independent turning probabilities. Transportation Research Part B: Methodological, 40(10), 903-916.

Barceló Bugeda, J., Montero Mercadé, L., Marqués, L., & Carmona, C. (2010). A

Kalman-filter approach for dynamic OD estimation in corridors based on bluetooth and Wi-Fi data collection. In 12th World Conference on Transportation Research WCTR, 2010.

Barceló, J., Gilliéron, F., Linares, M., Serch, O., & Montero, L. (2012). Exploring link

covering and node covering formulations of detection layout problem. Transportation Research Record: Journal of the Transportation Research Board(2308), 17-26.

Barceló, J., Montero, L., Bullejos, M., Linares, M., & Serch, O. (2013). Robustness

and Computational Efficiency of Kalman Filter Estimator of Time-Dependent Origin-Destination Matrices: Exploiting Traffic Measurements from Information and Communications Technologies. Transportation Research Record: Journal of the Transportation Research Board(2344), 31-39.

Barceló, J., Montero, L., Bullejos, M., Serch, O., & Carmona, C. (2013). A Kalman

filter approach for exploiting bluetooth traffic data when estimating time-dependent OD matrices. Journal of Intelligent Transportation Systems, 17(2), 123-141.

Battiti, R. (1989). Accelerated backpropagation learning: Two optimization methods.

Complex systems, 3(4), 331-342.


Bauer, D., Richter, G., Asamer, J., Heilmann, B., Lenz, G., & Kölbl, R. (2018). Quasi-

Dynamic Estimation of OD Flows From Traffic Counts Without Prior OD Matrix. IEEE Transactions on Intelligent Transportation Systems, 19(6), 2025-2034.

Behara, K. N., Bhaskar, A., & Chung, E. (2018, 7-11 January). Classification of

typical Bluetooth OD matrices based on structural similarity of travel patterns-Case study on Brisbane city. In Transportation Research Board 97th Annual Meeting.

Bell, M. G. (1983). The estimation of an origin-destination matrix from traffic counts.

Transportation Science, 17(2), 198-217.

Bell, M. G. (1991). The estimation of origin-destination matrices by constrained

generalised least squares. Transportation Research Part B: Methodological, 25(1), 13-22.

Ben-Akiva, M. E., Gao, S., Wei, Z., & Wen, Y. (2012). A dynamic traffic assignment

model for highly congested urban networks. Transportation research part C: emerging technologies, 24, 62-82.

Bera, S., & Rao, K. (2011). Estimation of origin-destination matrix from traffic counts:

the state of the art. European Transport - Trasporti Europei, 49, 2-23.

Bhaskar, A., & Chung, E. (2013). Fundamental understanding on the use of Bluetooth

scanner as a complementary transport data. Transportation Research Part C: Emerging Technologies, 37, 42-72.

Bhaskar, A., Qu, M., & Chung, E. (2015). Bluetooth vehicle trajectory by fusing

bluetooth and loops: motorway travel time statistics. IEEE Transactions on Intelligent Transportation Systems, 16(1), 113-122.

Bhaskar, A., Qu, M., Nantes, A., Miska, M., & Chung, E. (2015). Is bus

overrepresented in Bluetooth MAC scanner data? Is MAC-ID really unique? International Journal of Intelligent Transportation Systems Research, 13(2), 119-130.

Bierlaire, M. (2002). The total demand scale: a new measure of quality for static and

dynamic origin–destination trip tables. In Transportation Research Part B: Methodological (Vol. 36, pp. 837-850).

Bierlaire, M., & Crittin, F. (2004). An efficient algorithm for real-time estimation and

prediction of dynamic OD tables. Operations Research, 52(1), 116-127.

Bierlaire, M., & Toint, P. L. (1995). Meuse: An origin-destination matrix estimator

that exploits structure. Transportation Research Part B: Methodological, 29(1), 47-60.


Blogg, M., Semler, C., Hingorani, M., & Troutbeck, R. (2010). Travel time and origin-

destination data collection using Bluetooth MAC address readers. In Australasian Transport Research Forum (pp. 1-15).

Bluetooth data from Brisbane City Council. (2016).

Brooks, A. C., Zhao, X., & Pappas, T. N. (2008). Structural similarity quality metrics

in a coding context: Exploring the space of realistic distortions. IEEE Transactions on image processing, 17(8), 1261-1273.

BSTM (Cartographer). (2015). Traffic Analysis Zonal network on Google Earth.

BSTM. (2016). Brisbane Strategic Transport Demand Model.

Bullejos, M., Barceló Bugeda, J., & Montero Mercadé, L. (2014). A DUE based bilevel

optimization approach for the estimation of time sliced OD matrices. In Proceedings of the International Symposia of Transport Simulation (ISTS) and the International Workshop on Traffic Data Collection and its Standardisation (IWTDCS), ISTS'14 and IWTCDS'14.

Calabrese, F., Di Lorenzo, G., Liu, L., & Ratti, C. (2011). Estimating origin-

destination flows using mobile phone location data. IEEE Pervasive Computing, 10(4), 36-44.

Cantelmo, G., Cipriani, E., Gemma, A., & Nigro, M. (2014). An adaptive bi-level

gradient procedure for the estimation of dynamic traffic demand. IEEE Transactions on Intelligent Transportation Systems, 15(3), 1348-1361.

Carpenter, C., Fowler, M., & Adler, T. (2012). Generating route-specific origin-

destination tables using Bluetooth technology. Transportation Research Record: Journal of the Transportation Research Board(2308), 96-102.

Cascetta, E. (1984). Estimation of trip matrices from traffic counts and survey data: a

generalized least squares estimator. Transportation Research Part B: Methodological, 18(4-5), 289-299.

Cascetta, E., Inaudi, D., & Marquis, G. (1993). Dynamic estimators of origin-

destination matrices using traffic counts. Transportation science, 27(4), 363-373.

Cascetta, E., & Nguyen, S. (1988). A unified framework for estimating or updating

origin/destination matrices from traffic counts. Transportation Research Part B: Methodological, 22(6), 437-455.

Cascetta, E., Papola, A., Marzano, V., Simonelli, F., & Vitiello, I. (2013). Quasi-

dynamic estimation of o–d flows from traffic counts: Formulation, statistical


validation and performance analysis on real data. Transportation Research Part B: Methodological, 55, 171-187.

Cascetta, E., & Postorino, M. N. (2001). Fixed point approaches to the estimation of

O/D matrices using traffic counts on congested networks. Transportation science, 35(2), 134-147.

Chang, G.-L., & Wu, J. (1994). Recursive estimation of time-varying origin-

destination flows from traffic counts in freeway corridors. Transportation Research Part B: Methodological, 28(2), 141-160.

Cheung, W., Wong, S., & Tong, C. (2006). Estimation of a time‐dependent origin‐

destination matrix for congested highway networks. Journal of advanced transportation, 40(1), 95-117.

Chitturi, M. V., Shaw, J. W., Campbell IV, J. R., & Noyce, D. A. (2014). Validation

of Origin–Destination Data from Bluetooth Reidentification and Aerial Observation. Transportation Research Record, 2430(1), 116-123.

Chung, E. (2003). Classification of traffic pattern. In Proc. of the 11th World Congress

on ITS (pp. 687-694).

Chung, E. (2016). Use of Bluetooth and Wifi for Measuring Vehicles and People

Movements, PATREC. Retrieved from http://www.patrec.uwa.edu.au/announcements/use-of-bluetooth-and-wifi-for-measuring-vehicles-and-people-movements

Cipriani, E., Florian, M., Mahut, M., & Nigro, M. (2010). Investigating the efficiency

of a gradient approximation approach for the solution of dynamic demand estimation problems. Chapters.

Cipriani, E., Florian, M., Mahut, M., & Nigro, M. (2011). A gradient approximation

approach for adjusting temporal origin–destination matrices. Transportation Research Part C: Emerging Technologies, 19(2), 270-282.

Ciuffo, B., & Punzo, V. (2010). Verification of traffic micro-simulation model

calibration procedures: Analysis of goodness-of-fit measures. In Proceeding of the 89th Annual Meeting of the Transportation Research Record, Washington, DC.

Cools, M., Moons, E., & Wets, G. (2010). Assessing the quality of origin-destination

matrices derived from activity travel surveys: Results from a Monte Carlo experiment. Transportation Research Record: Journal of the Transportation Research Board(2183), 49-59.

Cooper, R. (1977). Abstract Structure and the Indian Rāga System. In

Ethnomusicology (pp. 1-32).


Crawford, F., Watling, D. P., & Connors, R. D. (2018). Identifying road user classes based on repeated trip behaviour using Bluetooth data. Transportation research part A: policy and practice, 113, 55-74.

Cremer, M., & Keller, H. (1981). Dynamic identification of flows from traffic counts

at complex intersections. In Proc., 8th International Symposium on Transportation and Traffic Theory (pp. 121-142): University of Toronto Press, Canada.

Cremer, M., & Keller, H. (1987). A new class of dynamic methods for the

identification of origin-destination flows. Transportation Research Part B: Methodological, 21(2), 117-132.

Dandy, G., Daniell, T., Foley, B., & Warner, R. (2017). Planning and design of

engineering systems: CRC Press.

de Dios Ortuzar, J., & Willumsen, L. G. (2011). Modelling transport: John Wiley &

Sons.

De Haas, M. (2016). Travel pattern transitions: A study on the effects of life events on

changes in travel patterns.

Dictionary. (Ed.) (2018) Cambridge online dictionary. Cambridge, UK.

Dixit, V., Gardner, L. M., & Waller, S. T. (2013). Strategic User Equilibrium

Assignment Under Trip Variability. In Transportation Research Board 92nd Annual Meeting (Vol. 9).

Dixon, M. P. (2000). Incorporation of automatic vehicle identification data into the

synthetic OD estimation process. Ph.D. thesis, Texas A&M University, College Station, TX.

Dixon, M. P., & Rilett, L. (2002). Real‐Time OD Estimation Using Automatic Vehicle

Identification and Traffic Count Data. Computer‐Aided Civil and Infrastructure Engineering, 17(1), 7-21.

Djukic, T. (2014). Dynamic OD demand estimation and prediction for dynamic traffic

management. In PhD Thesis.

Djukic, T., Barceló Bugeda, J., Bullejos, M., Montero Mercadé, L., Cipriani, E., van

Lint, H., & Hoogendoorn, S. (2015). Advanced traffic data for dynamic od demand estimation: The state of the art and benchmark study. In TRB 94th Annual Meeting Compendium of Papers (pp. 1-16).

Djukic, T., Hoogendoorn, S., & Van Lint, H. (2013). Reliability assessment of dynamic

OD estimation methods based on structural similarity index.

Djukic, T., Van Lint, J., & Hoogendoorn, S. (2012). Application of principal

component analysis to predict dynamic origin-destination matrices.


Transportation Research Record: Journal of the Transportation Research Board(2283), 81-89.

Dong, H., Wu, M., Ding, X., Chu, L., Jia, L., Qin, Y., & Zhou, X. (2015). Traffic zone

division based on big data from mobile phone base stations. In Transportation Research Part C: Emerging Technologies (Vol. 58, pp. 278-291).

Elbatta, M. T., & Ashour, W. M. (2013). A dynamic method for discovering density

varied clusters. Int. Journal of Signal Processing, Image Processing, and Pattern Recognition, 6(1), 123-134.

Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for

discovering clusters in large spatial databases with noise. In Kdd (Vol. 96, pp. 226-231).

Fisk, C. (1989). Trip matrix estimation from link traffic counts: The congested network

case. Transportation Research Part B: Methodological, 23(5), 331-336.

Fisk, C. S., & Boyce, D. E. (1983). A note on trip matrix estimation from link traffic

count data. Transportation Research Part B: Methodological, 17(3), 245-250.

Florian, M., & Chen, Y. (1995). A Coordinate Descent Method for the Bi‐level O–D

Matrix Adjustment Problem. International Transactions in Operational Research, 2(2), 165-179.

Frederix, R., Viti, F., & Tampère, C. M. (2011). A hierarchical approach for dynamic

origin-destination matrix estimation on large-scale congested networks. In 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC) (pp. 1543-1548): IEEE.

Frederix, R., Viti, F., & Tampère, C. M. (2013). Dynamic origin–destination

estimation in congested networks: theoretical findings and implications in practice. Transportmetrica A: Transport Science, 9(6), 494-513.

Friedrich, M., Immisch, K., Jehlicka, P., Otterstätter, T., & Schlaich, J. (2010).

Generating origin-destination matrices from mobile phone trajectories. Transportation Research Record: Journal of the Transportation Research Board(2196), 93-101.

Gan, L., Yang, H., & Wong, S. C. (2005). Traffic counting location and error bound

in origin-destination matrix estimation problems. Journal of Transportation Engineering, 131(7), 524-534.

Gazis, D. C., & Knapp, C. H. (1971). On-line estimation of traffic densities from time-

series of flow and speed data. Transportation Science, 5(3), 283-301.


Gong, L., Liu, X., Wu, L., & Liu, Y. (2016). Inferring trip purposes and uncovering travel patterns from taxi trajectory data. Cartography and Geographic Information Science, 43(2), 103-114.

Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A.-L. (2008). Understanding individual

human mobility patterns. Nature, 453(7196), 779.

Guo, D., Zhu, X., Jin, H., Gao, P., & Andris, C. (2012). Discovering spatial patterns

in origin‐destination mobility data. Transactions in GIS, 16(3), 411-429.

Gur, Y. J. (1980a). Estimation of an origin-destination trip table based on observed

link volumes and turning movements. Executive summary.

Gur, Y. J. (1980b). Estimation of an origin-destination trip table

based on observed link volumes and turning movements. Executive summary.

Hai, Y., Akiyama, T., & Sasaki, T. (1998). Estimation of time-varying origin-

destination flows from traffic counts: A neural network approach. Mathematical and computer modelling, 27(9), 323-334.

Hazelton, M. L. (2000). Estimation of origin–destination matrices from link flows on

uncongested networks. Transportation Research Part B: Methodological, 34(7), 549-566.

Heeringa, W. J. (2004). Measuring dialect pronunciation differences using

Levenshtein distance. Citeseer.

Hensher, D. A. (1976). The structure of journeys and nature of travel patterns. In

Environment and Planning A (Vol. 8, pp. 655-672).

Hollander, Y., & Liu, R. (2008). The principles of calibrating traffic microsimulation

models. Transportation, 35(3), 347-362.

Hu, S. (1996). An adaptive kalman filtering algorithm for the dynamic estimation and

prediction of freeway origin-destination matrices (Order No. 9725558). Available from ProQuest Dissertations & Theses Global. (304264559).

Huang, T.-q., Yu, Y.-q., Li, K., & Zeng, W.-f. (2009). Reckon the parameter of

DBSCAN for multi-density data sets with constraints. In Artificial Intelligence and Computational Intelligence, 2009. AICI'09. International Conference on (Vol. 4, pp. 375-379): IEEE.

Iqbal, M. S., Choudhury, C. F., Wang, P., & González, M. C. (2014). Development of

origin–destination matrices using mobile phone call data. In Transportation Research Part C: Emerging Technologies (Vol. 40, pp. 63-74).


Jiang, S., Ferreira, J., & González, M. C. (2017). Activity-based human mobility patterns inferred from mobile phone data: A case study of Singapore. In IEEE Transactions on Big Data (Vol. 3, pp. 208-219).

Jornsten, K., & Nguyen, S. (1979). On the estimation of a trip matrix from network

data. Publication No. 153, Centre de Recherche sur les Transports, Université de Montréal, Montreal.

Jörnsten, K., & Wallace, S. W. (1993). Overcoming the (apparent) problem of

inconsistency in origin-destination matrix estimations. Transportation science, 27(4), 374-380.

Kang, Y. (1999). Estimation and prediction of dynamic origin-destination (OD)

demand and system consistency control for real-time dynamic traffic assignment operation.

Kantorovich, L. V. (1942). On the translocation of masses. In Dokl. Akad. Nauk. USSR

(NS) (Vol. 37, pp. 199-201).

Khoei, A. M., Bhaskar, A., & Chung, E. (2013). Travel time prediction on signalised

urban arterials by applying SARIMA modelling on Bluetooth data. In 36th Australasian transport research forum (ATRF) 2013.

Kieu, L.-M., Bhaskar, A., & Chung, E. (2015). A modified Density-Based Scanning

Algorithm with Noise for spatial travel pattern analysis from Smart Card AFC data. Transportation Research Part C: Emerging Technologies, 58, 193-207.

Kieu, L. M., Bhaskar, A., & Chung, E. (2012). Bus and car travel time on urban

networks: integrating bluetooth and bus vehicle identification data.

Kim, H., Baek, S., & Lim, Y. (2001). Origin-destination matrices estimated with a

genetic algorithm from link traffic counts. Transportation Research Record: Journal of the Transportation Research Board(1771), 156-163.

Kim, S.-J., Kim, W., & Rilett, L. (2005). Calibration of microsimulation models using

nonparametric statistical techniques. Transportation Research Record: Journal of the Transportation Research Board(1935), 111-119.

Kroeber, A. L. (1943). Structure, function and pattern in biology and anthropology.

The Scientific Monthly, 56(2), 105-113.

Kwon, J., & Varaiya, P. (2005). Real-time estimation of origin-destination matrices

with partial trajectories from electronic toll collection tag data. Transportation Research Record: Journal of the Transportation Research Board(1923), 119-126.

Laharotte, P.-A., Billot, R., Come, E., Oukhellou, L., Nantes, A., & El Faouzi, N.-E.

(2015). Spatiotemporal analysis of Bluetooth data: Application to a large urban


network. IEEE Transactions on Intelligent Transportation Systems, 16(3), 1439-1448.

Lee, J.-G., Han, J., Li, X., & Gonzalez, H. (2008). TraClass: trajectory classification

using hierarchical region-based and trajectory-based clustering. Proceedings of the VLDB Endowment, 1(1), 1081-1094.

Lee, M., & Sohn, K. (2015). Inferring the route-use patterns of metro passengers based

only on travel-time data within a Bayesian framework using a reversible-jump Markov chain Monte Carlo (MCMC) simulation. Transportation Research Part B: Methodological, 81, 1-17.

Lee, M. S., & McNally, M. G. (2003). On the structure of weekly activity/travel

patterns. Transportation Research Part A: Policy and Practice, 37(10), 823-839.

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and

reversals. In Soviet physics doklady (Vol. 10, pp. 707-710).

Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for

stochastic optimization. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 661-670): ACM.

Lo, H.-P., & Chan, C.-P. (2003). Simultaneous estimation of an origin–destination

matrix and link choice proportions using traffic counts. Transportation Research Part A: Policy and Practice, 37(9), 771-788.

Lu, L., Xu, Y., Antoniou, C., & Ben-Akiva, M. (2015). An enhanced SPSA algorithm

for the calibration of Dynamic Traffic Assignment models. Transportation Research Part C: Emerging Technologies, 51, 149-166. Retrieved from

Lu, Z., Rao, W., Wu, Y. J., Guo, L., & Xia, J. (2015). A Kalman filter approach to

dynamic OD flow estimation for urban road networks using multi‐sensor data. Journal of Advanced Transportation, 49(2), 210-227. Retrieved from

Lundgren, J. T., & Peterson, A. (2008a). A heuristic for the bilevel origin–destination-

matrix estimation problem. Transportation Research Part B: Methodological, 42(4), 339-354. Retrieved from

Lundgren, J. T., & Peterson, A. (2008b). A heuristic for the bilevel origin–destination-

matrix estimation problem. In Transportation Research Part B: Methodological (Vol. 42, pp. 339-354).

Ma, W., & Qian, Z. S. (2018). Statistical inference of probabilistic origin-destination

demand using day-to-day traffic data. In Transportation Research Part C: Emerging Technologies (Vol. 88, pp. 227-256).


Maher, M. (1983). Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transportation Research Part B: Methodological, 17(6), 435-447.

Maher, M. (1998). Algorithms for logit-based stochastic user equilibrium assignment. Transportation Research Part B: Methodological, 32(8), 539-549.

Maher, M. J., Zhang, X., & Van Vliet, D. (2001). A bi-level programming approach for trip matrix estimation and traffic control problems with stochastic user equilibrium link flows. Transportation Research Part B: Methodological, 35(1), 23-40.

Manual, T. A. (1964). Bureau of public roads. In US Department of Commerce.

Martin, W. A., & McGuckin, N. A. (1998). Travel estimation techniques for urban planning (Vol. 365): National Academy Press Washington, DC.

Marzano, V., Papola, A., Simonelli, F., & Papageorgiou, M. (2018). A Kalman Filter for Quasi-Dynamic od Flow Estimation/Updating. IEEE Transactions on Intelligent Transportation Systems(99), 1-9.

Masip, D., Djukic, T., Breen, M., & Casas, J. (2018). Efficient OD Matrix Estimation Based on Metamodel for Nonlinear Assignment Function. Paper presented at Australasian Transport Research Forum 2018 Proceedings, Darwin, Australia.

McNally, M. G. (2008). The four step model. Center for Activity Systems Analysis.

Michau, G. (2016). Link dependent origin-destination matrix estimation: nonsmooth convex optimisation with Bluetooth-inferred trajectories. Université de Lyon.

Michau, G., Nantes, A., Bhaskar, A., Chung, E., Abry, P., & Borgnat, P. (2017). Bluetooth data in an urban context: Retrieving vehicle trajectories. IEEE Transactions on Intelligent Transportation Systems, 18(9), 2377-2386.

Michau, G., Nantes, A., & Chung, E. (2013). Towards the retrieval of accurate OD matrices from Bluetooth data: lessons learned from 2 years of data.

Michau, G., Nantes, A., Chung, E., Abry, P., & Borgnat, P. (2014, 17-18 February). Retrieving trip information from a discrete detectors network: The case of Brisbane Bluetooth detectors. In 32nd Conference of Australian Institutes of Transport Research (CAITR 2014).

Michau, G., Pustelnik, N., Borgnat, P., Abry, P., Nantes, A., Bhaskar, A., & Chung, E. (2016). A Primal-Dual Algorithm for Link Dependent Origin Destination Matrix Estimation. arXiv preprint arXiv:1604.00391.


Michau, G., Pustelnik, N., Borgnat, P., Abry, P., Nantes, A., Bhaskar, A., & Chung, E. (2017). A primal-dual algorithm for link dependent origin destination matrix estimation. IEEE Transactions on Signal and Information Processing over Networks, 3(1), 104-113.

Mishalani, R. G., Coifman, B., & Gopalakrishna, D. (2002). Evaluating Real-Time Origin-Destination Flow Estimation Using Remote Sensing Based Surveillance Data. In Proceeding of the 7th International Conference on the Applications of Advanced Technology in Transportation, ASCE, Cambridge, MA.

Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris, 177, 666-704.

Nanda, D. (1997). A Method to Enhance the Performance of Synthetic Origin-Destination (OD) Trip Table Estimation Models. In Masters Thesis.

Naoki, M. (2013). Geographic Boundaries of Population Census of Japan. Retrieved from http://ggim.un.org/meetings/2013-ISGI-NY/documents/ESA_STAT_AC.279_P20_Geographic%20Boundaries%20of%20Population%20Census%20of%20Japan02.pdf

Naveh, K. S., & Kim, J. (2018). Urban Trajectory Analytics: Day-of-Week Movement Pattern Mining Using Tensor Factorization. IEEE Transactions on Intelligent Transportation Systems.

Nguyen, S. (1976). A unified approach to equilibrium methods for traffic assignment. In Traffic equilibrium methods (pp. 148-182): Springer.

Nguyen, S. (1977). Estimating an OD Matrix from Network Data: a Network Equilibrium Approach. Montréal: Université de Montréal, Centre de recherche sur les transports.

NPTEL. (2009). Data collection. I. Madras (Ed.) Retrieved from https://nptel.ac.in/courses/105101087/06-Ltexhtml/p8/p.html

Okutani, I., & Stephanedes, Y. J. (1984). Dynamic prediction of traffic volume through Kalman filtering theory. Transportation Research Part B: Methodological, 18(1), 1-11.

Oliveira-Neto, F. M., Han, L. D., & Jeong, M. K. (2012). Online license plate matching procedures using license-plate recognition machines and new weighted edit distance. Transportation Research Part C: Emerging Technologies, 21(1), 306-320.

Osorio, C. (2017). High-dimensional offline OD calibration for stochastic traffic simulators of large-scale urban networks. In Technical Report: Massachusetts Institute of Technology.


Osorio, C. (2019). Dynamic origin-destination matrix calibration for large-scale network simulators. In Transportation Research Part C: Emerging Technologies (Vol. 98, pp. 186-206).

Oxford. (Ed.) (2018) English Oxford living Dictionaries.

Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorations Newsletter, 6(1), 90-105.

Patriksson, M. (2015). The traffic assignment problem: models and methods: Courier Dover Publications.

Perera, K., Bhattacharya, T., Kulik, L., & Bailey, J. (2015). Trajectory inference for mobile devices using connected cell towers. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 23): ACM.

Pollard, T., Taylor, N., van Vuren, T., & MacDonald, M. (2013). Comparing the Quality of OD Matrices in Time and Between Data Sources. In Proceedings of the European Transport Conference.

Pool, B. (2014). Brisbane Strategic Transport Model-Multi-Modal (BSTM-MM): model improvement program. In Australian Institute of Traffic Planning and Management (AITPM) National Conference, 2014, Adelaide, South Australia, Australia.

Rakha, H., & Van Aerde, M. (1995). Statistical analysis of day-to-day variations in real-time traffic flow data. Transportation Research Record, 26-34.

Respati, W. S., Bhaskar, A., Zheng, Z., & Chung, E. (2017). Systematic Identification of Peak Traffic Period. Paper presented at Australasian Transport Research Forum 2017 Proceedings, Auckland, New Zealand.

RNA. (2016). RNA Annual Report. Retrieved from https://www.rna.org.au/media/881637/2016%20rna%20annual%20report.pdf

Robillard, P. (1975). Estimating the OD matrix from observed link volumes. Transportation Research, 9(2), 123-128.

Ros-Roca, X., Montero, L., Schneck, A., & Barceló, J. (2018). Investigating the performance of SPSA in simulation-optimization approaches to transportation problems. In Transportation Research Procedia (Vol. 34, pp. 83-90).

Ruiz de Villa, A., Casas, J., & Breen, M. (2014). OD matrix structural similarity: Wasserstein metric. In Transportation Research Board 93rd Annual Meeting.

SEQTS. (2010). South-East Queensland Travel Survey 2009. In Queensland Transport and Main Roads.


Shafiei, M., Nazemi, M., & Seyedabrishami, S. (2015). Estimating time-dependent origin–destination demand from traffic counts: extended gradient method. Transportation Letters, 7(4), 210-218.

Shafiei, S., Gu, Z., & Saberi, M. (2018). Calibration and validation of a simulation-based dynamic traffic assignment model for a large-scale congested network. Simulation Modelling Practice and Theory, 86, 169-186.

Spall, J. C. (1992). Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 37(3), 332-341.

Spiess, H. (1987). A maximum likelihood model for estimating origin-destination matrices. Transportation Research Part B: Methodological, 21(5), 395-412.

Spiess, H. (1990). A gradient approach for the OD matrix adjustment problem. Centre de Recherche sur les Transports Publication, 1(693), 2.

Stathopoulos, A., & Tsekeris, T. (2003). Framework for analysing reliability and information degradation of demand matrices in extended transport networks. Transport Reviews, 23(1), 89-103.

Stathopoulos, A., & Tsekeris, T. (2005). Methodology for Validating Dynamic Origin–Destination Matrix Estimation Models with Implications for Advanced Traveler Information Systems. Transportation Planning and Technology, 28(2), 93-112.

Steinbach, M., Ertöz, L., & Kumar, V. (2004). The challenges of clustering high dimensional data. In New directions in statistical physics (pp. 273-309): Springer.

Stone, J. R., Han, Y., Khattak, A. J., Fan, Y., Huntsinger, L. F., & Bing Mei, P. (2007). Guidelines for Developing Travel Demand Models: Medium Communities and Metropolitan Planning Organizations.

Tamin, O., & Willumsen, L. (1989). Transport demand model estimation from traffic counts. Transportation, 16(1), 3-26.

Tavana, H. (2001). Internally-Consistent Estimation of Dynamic Network Origin-Destination Flows from Intelligent Transportation Systems Data Using Bi-Level Optimization. Ph.D. Dissertation, The University of Texas at Austin.

Tavassoli, A., Alsger, A., Hickman, M., & Mesbah, M. (2016a). How close the models are to the reality? Comparison of Transit Origin-Destination Estimates with Automatic Fare Collection Data. In Australasian Transport Research Forum (ATRF), 38th, 2016, Melbourne, Victoria, Australia.


Tavassoli, A., Alsger, A., Hickman, M., & Mesbah, M. (2016b). How close the models are to the reality? Comparison of Transit Origin-Destination Estimates with Automatic Fare Collection Data. In Australasian Transport Research Forum 2016 Proceedings.

TMR. (2016). BSTM data. In Department of Transport Main Roads.

TMR. (2017). The Future of Transport. Retrieved from https://blog.tmr.qld.gov.au/blog/2017/02/09/the-future-of-transport/

Toledo, T., & Kolechkina, T. (2013). Estimation of Dynamic Origin-Destination Matrices Using Linear Assignment Matrix Approximations. IEEE Transactions on Intelligent Transportation Systems, 14(2), 618-626.

Toledo, T., Koutsopoulos, H., Davol, A., Ben-Akiva, M., Burghout, W., Andréasson, I., . . . Lundin, C. (2003). Calibration and validation of microscopic traffic simulation tools: Stockholm case study. Transportation Research Record: Journal of the Transportation Research Board (1831), 65-75.

Toole, J. L., Colak, S., Sturt, B., Alexander, L. P., Evsukoff, A., & González, M. C. (2015). The path most traveled: Travel demand estimation using big data resources. Transportation Research Part C: Emerging Technologies, 58, 162-177.

Transport, B. o., & Economics, R. (2007). Estimating urban traffic and congestion cost trends for Australian cities. Canberra: Department of Transport and Regional Services.

Tympakianaki, A., Koutsopoulos, H. N., & Jenelius, E. (2018). Robust SPSA algorithms for dynamic OD matrix estimation. Procedia Computer Science, 130(C), 57-64.

USCensus. (2019). 2005 Metropolitan and Micropolitan Statistical Areas (CBSAs) of the United States and Puerto Rico. Retrieved from https://www2.census.gov/geo/maps/metroarea/us_wall/Dec2005/cbsa_us_1205.pdf?#.

Van Der Zijpp, N. (1997). Dynamic origin-destination matrix estimation from traffic counts and automated vehicle identification data. Transportation Research Record: Journal of the Transportation Research Board (1607), 87-94.

Van Zuylen, H. (1978). The information minimising method: validity and applicability to transport planning. New developments in modelling travel demand and urban systems.

Van Zuylen, H. J., & Willumsen, L. G. (1980). The most likely trip matrix estimated from traffic counts. Transportation Research Part B: Methodological, 14(3), 281-293.


Verbas, İ., Mahmassani, H., & Zhang, K. (2011). Time-dependent origin-destination demand estimation: Challenges and methods for large-scale networks with multiple vehicle classes. Transportation Research Record: Journal of the Transportation Research Board (2263), 45-56.

Villani, C. (2003). Topics in optimal transportation: American Mathematical Soc.

Vogl, T. P., Mangis, J., Rigler, A., Zink, W., & Alkon, D. (1988). Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59(4-5), 257-263.

Wang, W., Wan, H., & Chang, K.-H. (2016). Randomized block coordinate descendant STRONG for large-scale Stochastic Optimization. In Winter Simulation Conference (WSC), 2016 (pp. 614-625): IEEE.

Wang, Y., Ma, X., Liu, Y., Gong, K., Henrickson, K. C., Xu, M., & Wang, Y. (2016). A Two-Stage Algorithm for Origin-Destination Matrices Estimation Considering Dynamic Dispersion Parameter for Route Choice. PloS one, 11(1), e0146850.

Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600-612.

Weijermars, W., & Van Berkum, E. (2005). Analyzing highway flow patterns using cluster analysis. In Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE (pp. 308-313): IEEE.

Wen, T., Cai, C., Gardner, L., Dixit, V., & Waller, S. T. (2014). A Least Squares Method For Origin-Destination Estimation Incorporating Variability of Day-to-Day Travel Demand.

Wild, D. (1997). Short-term forecasting based on a transformation and classification of traffic volume time series. International Journal of Forecasting, 13(1), 63-72.

Willumsen, L. (1984a). Estimating time-dependent trip matrices from traffic counts. In Ninth International Symposium on Transportation and Traffic Theory (pp. 397-411): VNU Science Press Utrecht.

Willumsen, L. (1984b). Estimating time-dependent trip matrices from traffic counts. In Ninth International Symposium on Transportation and Traffic Theory, VNU Science Press (pp. 397-411).

Willumsen, L. G. (1978). Estimation of an OD Matrix from Traffic Counts–A Review.

Wilson, A. G. (1967). A statistical theory of spatial distribution models. Transportation Research, 1(3), 253-269.


Yang, C., Yan, F., & Xu, X. (2017). Daily metro origin-destination pattern recognition using dimensionality reduction and clustering methods. In Intelligent Transportation Systems (ITSC), 2017 IEEE 20th International Conference on (pp. 548-553): IEEE.

Yang, H. (1995). Heuristic algorithms for the bilevel origin-destination matrix estimation problem. In Transportation Research Part B: Methodological (Vol. 29, pp. 231-242).

Yang, H., Iida, Y., & Sasaki, T. (1991). An analysis of the reliability of an origin-destination trip matrix estimated from traffic counts. Transportation Research Part B: Methodological, 25(5), 351-363.

Yang, H., Sasaki, T., Iida, Y., & Asakura, Y. (1992). Estimation of origin-destination matrices from link traffic counts on congested networks. Transportation Research Part B: Methodological, 26(6), 417-434.

Yujian, L., & Bo, L. (2007). A normalized Levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 1091-1095.

Yun, I., & Park, B. (2005). Estimation of dynamic origin destination matrix: A genetic algorithm approach. In Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE (pp. 522-527): IEEE.

Zhang, A., Kang, J. E., Axhausen, K. W., & Kwon, C. (2018). Multi-day activity-travel pattern sampling based on single-day data. In 97th Annual Meeting of the Transportation Research Board (TRB 2018): TRB Annual Meeting.

Zhou, X. (2004). Dynamic origin-destination demand estimation and prediction for off-line and on-line dynamic traffic assignment operation.

Zhou, X., & Mahmassani, H. S. (2006). Dynamic origin-destination demand estimation using automatic vehicle identification data. IEEE Transactions on Intelligent Transportation Systems, 7(1), 105-114.

Zhou, X., & Mahmassani, H. S. (2007). A structural state space model for real-time traffic origin–destination demand estimation and prediction in a day-to-day learning framework. Transportation Research Part B: Methodological, 41(8), 823-840.

Zhu, K. (2007). Time-dependent origin-destination estimation: Genetic algorithm-based optimization with updated assignment matrix. KSCE Journal of Civil Engineering, 11(4), 199-207.


Appendices

Appendix A

Methodology to develop B-OD matrix

Knowledge of trajectories can further help in developing Bluetooth-based OD matrices at the scanner level as well as at the zonal level. The methodology to develop a Bluetooth-based OD matrix (B-OD) at the zonal level is explained using the flowchart shown in Figure A1.1.

To develop a B-OD matrix, raw Bluetooth data from a particular day are spatially and temporally matched to define individual Bluetooth vehicle trajectories, which are further split into trips (Michau et al., 2014). Here, the Bluetooth dataset for the study date is downloaded from the BCC server and the unique Device IDs are identified. Records are retrieved individually for each Device ID and sorted by detection time-stamp. Within the record of each Device ID, the difference in time-stamps between successive detections, δ, is used to identify unique trips/trajectories. If successive detections are from the same scanner, a new trip is recorded when δ is at least 10 minutes. If the successive detections are from different scanners, a new trip is recorded when δ is at least 30 minutes. The threshold values are chosen in accordance with a similar study on Brisbane Bluetooth datasets by Michau et al. (2017). In this way, all individual trips/trajectories of each Device ID are identified and then used to infer OD trips at the scanner level to form the sOD matrix. The sOD matrix is of size 845 × 845 and is further transformed into a B-OD matrix at either the SA2 or the SA3 level, using the concordance between BMS locations and SA zones obtained from the BCC. The process is repeated over 415 days to generate a B-OD matrix for each day.
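The trip-splitting rule and the scanner-to-zone aggregation described above can be sketched as follows. This is a minimal Python illustration, not the thesis implementation: the function names, scanner IDs, zone labels and detection times are hypothetical, and a trip's origin and destination are assumed to be the zones of its first and last detections.

```python
from collections import Counter
from datetime import datetime, timedelta

SAME_SCANNER_GAP = timedelta(minutes=10)  # threshold when both detections share a scanner
DIFF_SCANNER_GAP = timedelta(minutes=30)  # threshold when the scanners differ


def split_trips(detections):
    """Split one device's time-sorted (timestamp, scanner_id) detections into trips."""
    trips = []
    current = []
    for stamp, scanner in detections:
        if current:
            prev_stamp, prev_scanner = current[-1]
            delta = stamp - prev_stamp
            limit = SAME_SCANNER_GAP if scanner == prev_scanner else DIFF_SCANNER_GAP
            if delta >= limit:
                # The gap is long enough: close the current trip and start a new one.
                trips.append(current)
                current = []
        current.append((stamp, scanner))
    if current:
        trips.append(current)
    return trips


def zonal_od(trips, scanner_to_zone):
    """Count trips between the zones of each trip's first and last scanner (assumed trip ends)."""
    od = Counter()
    for trip in trips:
        origin = scanner_to_zone[trip[0][1]]
        destination = scanner_to_zone[trip[-1][1]]
        od[(origin, destination)] += 1
    return od


# Hypothetical record for one device; scanner IDs and zone names are made up.
t0 = datetime(2016, 3, 1, 7, 0)
record = [
    (t0, "S1"),
    (t0 + timedelta(minutes=4), "S2"),
    (t0 + timedelta(minutes=9), "S3"),
    (t0 + timedelta(minutes=50), "S3"),  # 41 minutes at the same scanner: new trip
    (t0 + timedelta(minutes=58), "S1"),
]
concordance = {"S1": "ZoneA", "S2": "ZoneA", "S3": "ZoneB"}

trips = split_trips(record)
od_flows = zonal_od(trips, concordance)
```

With this record the device contributes one trip from ZoneA to ZoneB and one from ZoneB to ZoneA. Summing such counters over all Device IDs would give the zonal OD flows for the day.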


Figure A1.1: Methodology to develop B-OD matrix at zonal level

[Flowchart in two stages.

Trajectories construction: (1) Start from the BCC Bluetooth dataset. (2) Select a Device ID. (3) Retrieve the detection record (R) of the Device ID and sort it based on time-stamps. (4) Select two successive detections, from the first till the last detection in record R. (5) If the successive detections are from the same scanner, check whether δ >= 10 mins; otherwise, check whether δ >= 30 mins. If yes, record a new trip for the Device ID. (6) If it is not the last record for the Device ID, return to step 4.

OD matrix development: (7) Identify trip ends and add the trips of the Device ID into the OD flows for the corresponding OD pairs, using exogenous information relating scanner locations to SA2 zones. (8) If it is not the last Device ID, return to step 2; otherwise, end.]


Appendix B

Can the structure of Bluetooth trips be a proxy for true OD?

1. Background

Although Bluetooth observations capture only a fraction of the actual OD demand, the observed trip distribution patterns can provide some insights into the real travel behaviour within a network. The knowledge of Bluetooth trips therefore has the potential to contribute to the OD matrix estimation process. However, it is important to validate this knowledge before any practical implementation. Since the ground truth is unknown, Bluetooth trips cannot be validated directly. In the absence of true OD flows, confidence in the Bluetooth trips can instead be gained using surrogate measures that are considered to reflect the structural properties of OD matrices (Antoniou et al., 2016).

Because Bluetooth trips are only partial observations, they might not yield a complete sequence of trajectories. However, at a macroscopic level, the structure of Bluetooth trips might provide some valuable trip-related information.

In this context, a few analyses were conducted to check whether the “structure” of Bluetooth trips preserves the integrity of the actual demand distribution and can be used as a proxy for the actual distribution of trips. This hypothesis was tested against the following four surrogate measures: a) screenline counts, b) the Brisbane Strategic Transport Model (BSTM) (BSTM, 2016) travel time distributions, c) car users (as drivers) taken from the 2016 Census (ABS, 2018), and d) BSTM OD flows.

2. Bluetooth vs Screenline counts

Screenlines divide the region into larger zones, running along natural barriers with few crossing points, such as rivers, or along major road corridors/tunnels (NPTEL, 2009). They are primarily used to calibrate and validate base year transport models, such as the BSTM (Pool, 2014). See Figure A2.1(a) for the screenlines and the locations of screenline counts (blue Google pins), and Figure A2.1(b) for a closer look at the alignment of the screenlines with the locations of BMSs (red circles) within the BCC region.

Figure A2.1: (a) Locations of screenline counts and screenlines for the BCC region; (b) closer look at the alignment of BMS locations with the screenlines (BSTM, 2016)

A good correlation between screenline counts and the number of Bluetooth observations from BMS scanners upstream and downstream of the screenline count location should enhance confidence in using Bluetooth data. For the current analysis, the selected locations of the screenline survey (blue Google pins) and the corresponding BMS locations (red circles) are shown in Figure A2.2. For each selected location (both directions of flow), BMS scanners were identified upstream and downstream such that the detected Bluetooth devices must pass through the screenline count location. Eight screenline count locations were selected, distributed throughout the study region (see Figure A2.2). The data used for comparison were weekday traffic counts from the year 2016.

Figure A2.2: Selected screen line and BMS locations

Figure A2.3 presents the correlation between the two counts. An increasing trend between Bluetooth and screenline counts was observed, with R² = 0.7594 and a correlation coefficient (ρ) of 0.8714. This reasonable alignment and the high correlation between the two sets of observations support the suitability of Bluetooth data for transport applications.

The penetration rate of Bluetooth counts, that is, the ratio of Bluetooth to screenline counts, is illustrated for the selected locations in Figure A2.4. The average penetration rate is nearly 20%, with values spread between 15% and 35%, which is consistent with the 12%-30% reported for Brisbane City for the year 2014 (Michau, 2016). Note that the slope of the fitted line in Figure A2.3 also reflects the penetration rate of Bluetooth counts. Although traffic count observations from the two data sources do not provide any “structure” or trip distribution related information, the consistency of the Bluetooth penetration rate between the literature and the current study provides some initial confidence in the Bluetooth observations.
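The quantities used in this comparison (per-location penetration rate, correlation coefficient, fitted slope and R²) can be computed as in the minimal Python sketch below. The counts are hypothetical values chosen for illustration, not the thesis data, and the helper names are assumptions. For a one-variable least-squares fit, R² equals ρ², which agrees with the reported pair to within rounding (0.8714² ≈ 0.7593 against R² = 0.7594).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical AM-peak counts at four locations (illustrative only).
screenline = [2000, 4000, 6000, 8000]
bluetooth = [420, 760, 1250, 1500]

# Penetration rate per location: Bluetooth count divided by screenline count.
rates = [b / s for b, s in zip(bluetooth, screenline)]
mean_rate = sum(rates) / len(rates)

rho = pearson(screenline, bluetooth)
slope, intercept = linear_fit(screenline, bluetooth)
r_squared = rho ** 2  # valid for a simple linear fit with an intercept
```

The fitted slope is an aggregate estimate of the penetration rate, while `mean_rate` averages the per-location ratios. The two need not coincide when the intercept is non-zero.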

Figure A2.3: Bluetooth vs screenline counts

[Scatter plot. x-axis: Screenline counts, AM peak (7 AM-9 AM); y-axis: Bluetooth counts, AM peak (7 AM-9 AM). Fitted line: y = 0.1923x + 2.8692, R² = 0.7594. Correlation coefficient = 0.8714.]

Figure A2.4: The penetration rate of Bluetooth counts at the selected study locations

[Bar chart. Horizontal axis: Bluetooth penetration rate (0.00 to 0.40). Vertical axis: selected screenline locations (Walter Taylor Bridge, Breakfast Creek Rd, William Jolly Bridge, Compton Rd, Sherwood Road, Wynnum Rd, South Pine Road, Beckett Road) and the average penetration. Bar labels: 0.15, 0.34, 0.19, 0.19, 0.16, 0.22, 0.15, 0.19; average 0.20.]


3. Trip length (travel time) distribution

Trip length distribution tables are generally used to compare and validate a modelled trip distribution (such as that of a gravity model) against survey data (Stone et al., 2007). The trip length distribution plots of existing demand models can also be compared with distributions developed from other data sources. In this study, a similar analysis was carried out to check the validity of the Bluetooth travel time distribution against the BSTM’s distribution.

First, the raw travel times from Bluetooth observations were filtered using a

median absolute deviation filter with f=2 (Kieu, Bhaskar, & Chung, 2012) and the

Bluetooth travel times were estimated for trips between SA2 zones. Similarly, the

BSTM travel times were aggregated from BSTM zonal level to SA2 level for a fair

comparison.
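A median absolute deviation (MAD) filter of the kind mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the cited filter itself: the symmetric window around the median, the consistency constant 1.4826 and the sample travel times are all assumptions, and only the multiplier f = 2 comes from the text.

```python
from statistics import median


def mad_filter(values, f=2.0, scale=1.4826):
    """Keep values within median +/- f * scale * MAD (assumed form of the filter)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    half_width = f * scale * mad
    return [v for v in values if abs(v - med) <= half_width]


# Hypothetical travel times in minutes for one zone pair.
travel_times = [11.0, 12.5, 12.0, 13.0, 11.5, 45.0, 12.2, 2.0]
filtered = mad_filter(travel_times)
```

Here the median is 12.1 minutes and the MAD is 0.75 minutes, so the 45.0 and 2.0 minute observations fall outside the window and are dropped before travel times are aggregated.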

The travel time distribution plots for the Bluetooth observations and the BSTM model are shown in Figure A2.5. Here, the x-axis represents the travel time in minutes between SA2 zonal pairs and the y-axis represents the proportion of car trips during the AM peak period. The mean travel times of trips observed from the BSTM and Bluetooth were 15.87 minutes and 12.96 minutes, with standard deviations of 19.70 minutes and 15.33 minutes, respectively. The highest proportions of car trips (the peaks) for the Bluetooth and BSTM plots were at 10 and 15 minutes, respectively. The difference between the two plots could be due to modelling errors in the BSTM, or because the Bluetooth travel time is measured between BMS scanner locations, which is not consistent with the BSTM zone-to-zone travel time. Another reason for the negative shift is that Bluetooth detections at the first and last signalised intersections of a trip are not necessarily captured. Thus, proper care must be taken when using Bluetooth data. Nevertheless, the general shape of the distribution and the values are acceptable for the current surrogate comparison.


Figure A2.5: BSTM vs Bluetooth travel time distribution

4. Trip productions: Bluetooth vs Census

In this section, Bluetooth trips produced from SA2 zones during the AM peak period are compared with the 2016 Census “Method of travel to work” observations (ABS, 2018). The following assumption was made before the comparison: since most of the Bluetooth trips come from detections of in-built car systems, they can be considered a proxy for car trips within the study region.

According to the 2016 Census, most work-based trips in Brisbane (75.3%) were made by car (as driver) (ABS, 2017). Since most work-based trips are generally observed during the AM peak period, car users (who travelled to work as drivers) from the 2016 Census data were used as a proxy for the actual car trips produced.

The comparison between trips produced by Bluetooth (x-axis) and car users (as drivers) from the 2016 Census (y-axis) is shown as a scatter plot in Figure A2.6. Bluetooth observations were found to correspond closely to the 2016 Census data, with a correlation coefficient (ρ) of 0.8467 and an R² value of 0.7168. Bluetooth trips constituted approximately 4.3% of the census car trips. Interestingly, this observation is consistent with the average Bluetooth trip capture rate of 4.4% validated by Chitturi et al. (2014).


Figure A2.6: Bluetooth vs 2016 Census – trip productions at SA2 level

5. BSTM OD flows vs Bluetooth based OD flows

In this section, BSTM OD flows are compared with Bluetooth-based OD (B-OD) flows at the SA3 level for the AM peak period. In practice, the BSTM base year OD was generated using extensive modelling techniques. On the other hand, the B-OD flows were developed through the inference of vehicle trajectories (see Appendix A for the details of the methodology adopted for developing the B-OD matrix).

Because the B-OD flows are only a fraction of the actual OD flows and the BSTM flows represent scaled-up demand, this section first analyses the variation in the capture rates of B-OD flows with respect to BSTM OD flows, and then compares the two through R² and the correlation coefficient.

The total BSTM OD flows and B-OD flows to be compared were 235,556 and 56,542, respectively. This implies that Bluetooth captured almost 24% of the total BSTM flows (this value also lies in the range of 15%-35%, i.e., the penetration rate of Bluetooth counts in Section 2). However, it must be noted that the capture rate of Bluetooth OD flows is different from that of counts, and varies between OD pairs due to many factors, such as distance and socio-economic characteristics. To illustrate this variation, the comparison between BSTM OD flows and B-OD flows is shown using a Pareto distribution plot (see Figure A2.7), where the x-axis represents the ratio of Bluetooth to BSTM OD flows, arranged in the order of frequency; the y-axis (left) represents the proportion of total OD pairs for different values of this ratio; and the y-axis (right) represents the cumulative percentage of OD pairs. Interestingly, 75% of the OD pairs had a ratio between 5% and 35%. Compared to the capture rate of Bluetooth counts (i.e., 15%-35% from Section 2), the capture rate of the OD flows had a higher variation. Note that although the BSTM flows are modelled, for the purpose of understanding, this ratio can be considered a proxy for the actual capture rates of the OD flows.

Figure A2.7: Pareto distribution of the ratio of Bluetooth OD to BSTM OD flows

Nevertheless, a good correlation was observed between BSTM OD flows and B-OD flows (ρ = 0.8878 in Figure A2.8). The line of fit between the two sets of OD flows also shows a decent alignment, with R² = 0.7883, and the slope of the fit (4.0379, with BSTM OD flows on the y-axis) suggests that the B-OD flows were nearly 25% of BSTM OD flows. Although the ratio of Bluetooth to BSTM OD flows varied widely across OD pairs, the good correlation with BSTM OD provides more confidence in the structure of Bluetooth trips.
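The two capture-rate figures quoted above can be reproduced with simple arithmetic. The following Python sketch is illustrative only: the variable names are hypothetical, and it assumes that the fitted line in Figure A2.8 regresses BSTM OD flows (y) on B-OD flows (x), so that the reciprocal of the slope approximates the share of BSTM flow observed by Bluetooth.

```python
# Totals reported in the text (BSTM OD flows and Bluetooth B-OD flows).
bstm_total = 235556
bod_total = 56542

# Aggregate capture rate: share of the BSTM flow observed by Bluetooth.
aggregate_capture = bod_total / bstm_total

# Fitted line from Figure A2.8, assumed to be BSTM = slope * B-OD + intercept.
slope = 4.0379
capture_from_slope = 1.0 / slope

print(f"aggregate capture rate: {aggregate_capture:.1%}")
print(f"capture rate implied by the slope: {capture_from_slope:.1%}")
```

Both values come out close to one quarter, which is consistent with the "almost 24%" and "nearly 25%" statements in the text.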


Figure A2.8: B-OD flows vs BSTM OD flows

From the above comparisons of the OD matrix structural properties (over four surrogate measures), it can be concluded that although Bluetooth observations are partial and only constitute a sample, the structure of the Bluetooth trips is reasonable and can probably be used as a proxy for the actual distribution of trips. However, in the absence of ground truth, and because the discrepancies due to statistical and model errors in the Bluetooth data and in the data from other sources are difficult to disentangle, a more detailed investigation is recommended for future research.

[Figure A2.8 plot: BSTM OD flows (y-axis) against B-OD flows (x-axis); fitted line y = 4.0379x + 114.47, R² = 0.7883; correlation coefficient = 0.8878]

Appendix C

MATLAB optimisation code for B-OD/B-SP methods

clc
clear all
currentFolder = pwd;
True_OD_matrix = load(fullfile(currentFolder, 'inputs', 'OD.txt'));
W = size(True_OD_matrix,1)*size(True_OD_matrix,2); % Size of OD vector
True_transpose = True_OD_matrix';
OD_True_Vector = True_transpose(:); % True OD vector
Prior_OD_matrix = load(fullfile(currentFolder, 'inputs', 'Prior_OD_matrix.txt'));
Prior_transpose = Prior_OD_matrix';
Prior_OD_vector = Prior_transpose(:); % Prior OD vector
load(fullfile(currentFolder, 'inputs', 'zones.txt'));
load(fullfile(currentFolder, 'inputs', 'ObsCounts.txt')); % Observed link flows
load(fullfile(currentFolder, 'inputs', 'det_sec.txt')); % The IDs of loop detectors and corresponding links (sections)
y_obs = ObsCounts(:,2); % Observed link counts
OD = Prior_OD_matrix;
OD_Tranp = OD';
OD_Vector = OD_Tranp(:);

%% method=1 for B-OD method and method=2 for B-SP method
% ("case" is a reserved MATLAB keyword, so the selector is named "method"; set it before running)
if method == 1
    load(fullfile(currentFolder, 'inputs', 'BOD_vector.mat')); % Vector of B-OD flows
    BOD_matrix = reshape(BOD_vector, size(zones,1), size(zones,1));
    BOD_matrix = BOD_matrix'; % B-OD matrix
    pen = Omega*210; % Omega is the fraction of connected OD pairs (excluding internal OD pairs). So, for 210 OD pairs, Omega = 1 (100%); for 168 OD pairs, Omega = 0.8 (80%); and so on.
    [BpenStr, OD_ind] = Bluetooth_connected_ODpairs(OD, pen); % Refer to "Bluetooth_connected_ODpairs" function
elseif method == 2
    load(fullfile(currentFolder, 'inputs', 'EndDet_Zone.txt')); % Look-up table relating BMS at trip ends with zonal IDs
    load(fullfile(currentFolder, 'inputs', 'Subpathfreq_obs.mat')); % The 1st column is for subpath flows; 2nd and 3rd (last) columns for origin and destination zones
    Subpathflows_obs = Subpathfreq_obs(:,1); % Vector of subpath flows
end

lambda = lambda_prior; % choose any prior step length as lambda_prior
StrOD_Prior = corr2(Prior_OD_vector, OD_True_Vector);
StrOD_BT = corr2(BOD_vector, OD_True_Vector);
Obj_ite = []; y_est_ite = []; Demand = OD_Vector; Values_Ite = [];
l_up = 1.5; l_down = 0.9; % choose l_up and l_down by trial and error
[GSSI_PriorOD] = GSSI_computation(Prior_OD_matrix, True_OD_matrix); % Refer to "GSSI_computation" function
Objective = 2; % Objective=1 corresponds to the obj. function of traditional method and Objective=2 is for B-OD/B-path method

for ite = 1:20 % the number of iterations
    [OD_Id_Sno] = Aimsun_matrix(OD, zones);
    [terminal] = Aimsun(); % Refer to the function "Aimsun.m"
    system(terminal); % Executing "terminal"
    [extracted_data] = SQLITE(); % Refer to the function "SQLITE"
    extracted_data = cell2mat(extracted_data);
    diff = abs(OD_True_Vector - OD_Vector);
    SumdiffSq = sum(diff.*diff);
    RMSE_OD = sqrt(SumdiffSq/size(OD_True_Vector,1));
    load('BNE.matrix'); % BNE is the output from 'AutoRun_BNE.py' saved as a text file. Refer to the Python script (AutoRun_BNE.py) in Appendix E.
    [y_est, Sections, LinkPropMat] = Assignment(det_sec, extracted_data, BNE, OD_Id_Sno); % Refer to the "Assignment" function
    diff2 = abs(y_obs - y_est);
    SumdiffSq2 = sum(diff2.*diff2);
    RMSE_linkflows = sqrt(SumdiffSq2/size(y_obs,1));

    if method == 1
        [Obj, Gradient, StrBOD] = Obj_Grad(y_obs, y_est, LinkPropMat, Objective, BpenStr, OD_Vector, BOD_vector, pen); % Refer to the "Obj_Grad" function
    elseif method == 2
        BTraw = readtable('Det2DetDataALLDETECTOR.txt'); % This text file is output from Aimsun through a separately scripted API. It resembles the raw Bluetooth observations from BMSs.
        [Traj3_table] = BTpaths_secs(BTraw, det_sec, EndDet_Zone); % Refer to "BTpaths_secs" function
        [SubTraj3_table] = Subpathsanalysis(Traj3_table); % Refer to "Subpathsanalysis" function
        MLSPNo = 1; % Only one Most Likely Subpath per OD pair is considered
        [SubMLP, SubPathFreq] = MostLikelySubpaths(SubTraj3_table, zones, det_sec, MLSPNo); % Refer to "MostLikelySubpaths" function
        Subpathprop = Sub_path_proportion_matrix(Subpathfreq_obs, SubPathFreq, OD_True_Vector, OD, zones);
        % Subpathflows_est (estimated subpath flows) is computed inside Sub_path_proportion_matrix and must be available in the workspace here
        [Obj, Gradient, StrSP] = Obj_Grad_subpathflows(y_obs, y_est, LinkPropMat, Subpathprop, Subpathflows_obs, Subpathflows_est, Objective);
    end
    Obj_ite = [Obj_ite; Obj];

    if size(Obj_ite,1) > 1
        if Obj <= Obj_ite(end-1)
            lambda = lambda*l_up;
        else
            lambda = lambda*l_down;
            % Deleting the parameter values of current iteration
            Demand(:,end) = []; Obj_ite(end) = []; y_est_ite(:,end) = []; Values_Ite(end,:) = [];
            % Setting the OD vector to previous iteration
            OD_Vector = Demand(:,end);
        end
    end

    OD_Vector = OD_Vector.*(1 - lambda.*Gradient); % Updating OD vector
    temp5 = reshape(OD_Vector, [size(OD,1), size(OD,2)]);
    OD = temp5'; % Reshaping OD vector into matrix

    if method == 1
        values = [StrBOD, RMSE_OD, RMSE_linkflows, Obj];
    elseif method == 2
        values = [StrSP, RMSE_OD, RMSE_linkflows, Obj];
    end

    Values_Ite = [Values_Ite; values];
    y_est_ite = [y_est_ite, y_est];
    Demand = [Demand, OD_Vector];
    fclose(fopen('matrix.txt','w')); % deleting flow values in the text file "matrix.txt"
    delete 'BNE.ang.sqlite'; % deleting the Aimsun sqlite database
    delete 'BNE.ang.old'; % deleting the Aimsun back-up
    delete 'BNE.matrix'; % deleting the assignment related text file, "BNE.matrix"
    if method == 2
        delete 'Det2DetDataALLDETECTOR.txt';
    end
end

IteNo = length(Obj_ite);
tempe2 = reshape(Demand(:,IteNo), [size(OD,1), size(OD,2)]);
Final_OD_matrix = tempe2'; % This is the final estimated OD matrix
[GSSI_OD] = GSSI_computation(Final_OD_matrix, True_OD_matrix);
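The step-length rule used in the script above (scale lambda by l_up after an improving iteration, scale it by l_down and fall back to the previous OD vector otherwise) can be illustrated in isolation. The Python sketch below is a simplified variant under stated assumptions: a small fixed proportion matrix P stands in for the Aimsun assignment, only the link-flow objective (Objective = 1) is used, and a rejected trial reuses the gradient of the last accepted point. All function names and the toy data are hypothetical.

```python
def link_flows(x, P):
    """Assign OD flows x to links using a fixed proportion matrix P."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]


def objective_and_gradient(x, P, y_obs):
    """Least-squares link-flow objective and its gradient P' (y_est - y_obs)."""
    resid = [est - obs for est, obs in zip(link_flows(x, P), y_obs)]
    obj = 0.5 * sum(r * r for r in resid)
    grad = [sum(P[l][w] * resid[l] for l in range(len(P))) for w in range(len(x))]
    return obj, grad


def adaptive_descent(x0, P, y_obs, step=1e-3, l_up=1.5, l_down=0.9, iterations=20):
    """Multiplicative update x <- x * (1 - step * grad) with an adaptive step."""
    x = list(x0)
    obj, grad = objective_and_gradient(x, P, y_obs)
    history = [obj]
    for _ in range(iterations):
        trial = [v * (1.0 - step * g) for v, g in zip(x, grad)]
        trial_obj, trial_grad = objective_and_gradient(trial, P, y_obs)
        if trial_obj <= obj:
            # Improvement: accept the trial point and lengthen the step.
            x, obj, grad = trial, trial_obj, trial_grad
            history.append(obj)
            step *= l_up
        else:
            # Deterioration: keep the previous point and shorten the step.
            step *= l_down
    return x, history


# Toy example: 2 links, 3 OD pairs.
P = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y_obs = [30.0, 50.0]
x_final, history = adaptive_descent([10.0, 10.0, 10.0], P, y_obs)
print(history[0], history[-1])
```

Because rejected trials are discarded, the recorded objective values never increase, which mirrors the way the script deletes the records of a deteriorating iteration.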


Appendix D

Functions

Function-1: Bluetooth_connected_ODpairs.m

function [BpenStr, OD_ind] = Bluetooth_connected_ODpairs(OD, pen)
diagind = [];
for ind = 1:size(OD,1)
    diagind = [diagind; size(OD,1)*(ind-1)+ind]; % indices of diagonal elements
end
SNO = 1:size(OD,1)*size(OD,1);
filter = ~ismember(SNO, diagind);
OD_ind = SNO(:, filter); % Indices of all OD pairs except those on the diagonal
BpenStr = datasample(OD_ind, pen-size(OD,1), 'Replace', false); % Indices of OD pairs that are Bluetooth connected
end

Function-2: Aimsun_matrix.m

function [OD_Id_Sno] = Aimsun_matrix(OD, zones)
for j = 1:size(OD,2)
    m = OD;
    % Save the matrix into a .txt file compliant with AIMSUN standards
    filename = strcat('matrix', '.txt');
    fid = fopen(filename, 'w');
    fprintf(fid, 'id\t');
    fprintf(fid, '%i\t', zones);
    fprintf(fid, '\n');
    fclose(fid);
    fid = fopen(filename, 'a');
    for i = 1:length(zones)
        fprintf(fid, '%i\t', zones(i));
        fprintf(fid, '%5.2f\t', m(i,:));
        fprintf(fid, '\n');
    end
    fclose(fid);
end
OD_Id = [];
for i = 1:length(zones)
    for j = 1:length(zones)
        OD_Id = [OD_Id; zones(i) zones(j)];
    end
end
OD_Id_Sno = [[1:size(OD,1)*size(OD,1)]' OD_Id];
end

Function-3: Aimsun.m

function [terminal] = Aimsun()
AIMSPath = ('C:\Program Files\Aimsun\Aimsun Next 8.2\aconsole.exe');
Autorunpath = ('C:\.....\AutoRun_BNE.py');
Angpath = ('C:\.....\BNE.ang');
Detpath = ('C:\.....\det_sec.txt');
terminal = horzcat('"', AIMSPath, '"', ' -script ', '"', Autorunpath, '"', ' ', '"', Angpath, '"', ' ', '"', Detpath, '"');
end

Function-4: SQLITE.m

function [extracted_data] = SQLITE()
Sqlitepath = ('C:\.....\BNE.ang.sqlite');
conn = database(Sqlitepath, '', '', 'org.sqlite.JDBC', 'jdbc:sqlite:C:\......\BNE.ang.sqlite');
sqlQuery = 'SELECT oid, did, sid, ent, countveh, speed, occupancy, density FROM MIDETEC ORDER BY oid, ent;'; % Selected fields of sqlite database
extracted_data = fetch(conn, sqlQuery);
close(conn);
end

Function-5: GSSI_computation.m

function [GSSI] = GSSI_computation(X, Y)
% In this function geographical windows are created for a 15 x 15 OD matrix
% Higher zones (hz) are created as follows:
% hz1: Westend-Southbank-Highgate Hill, Ext5, Gabba
% hz2: BNE Inner East, New Farm; hz3: Valley, Spring Hill, CBD i.e. 9, 14, 2
% hz4: Newstead-Bowen Hills, Ext 2, Ext 4; hz5: Ext 1, Kelvin Grove-Herston
% hz6: Red Hill-Milton-Auchenflower, Ext 3
hz = [1;2;3;4;5;6]; % 6 hzs
Zonal_IDs = [3,2,5,4,6,4,1,3,5,2,4,6,3,1,1]; % hz IDs for all 15 small zones, in the order of the OD matrix loaded into Aimsun (which is not in hz sequence)
hz = unique(Zonal_IDs);
for i = 1:length(hz)
    for j = 1:length(hz)
        Filter_Row = ismember(Zonal_IDs, hz(i));
        Filter_Col = ismember(Zonal_IDs, hz(j))';
        X_Geo = X(Filter_Row, Filter_Col);
        Y_Geo = Y(Filter_Row, Filter_Col);
        mean_comp(i,j) = 2*mean2(X_Geo)*mean2(Y_Geo)/(mean2(X_Geo)^2 + mean2(Y_Geo)^2);
        std_comp(i,j) = 2*std2(X_Geo)*std2(Y_Geo)/(std2(X_Geo)^2 + std2(Y_Geo)^2);
        Covariance = cov(X_Geo, Y_Geo);
        str_comp(i,j) = Covariance(1,2)/(std2(X_Geo)*std2(Y_Geo));
        SSIM(i,j) = mean_comp(i,j)*std_comp(i,j)*str_comp(i,j);
    end
end
GSSI = mean2(SSIM);
end
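To make the window-based computation in GSSI_computation easier to follow, the Python sketch below reproduces the same three components (mean, standard deviation and structure) for each geographical window and averages the resulting SSIM values. It is only an illustration: the function names, the 4 x 4 matrices and the two-group zoning are hypothetical, and, as in the MATLAB function, no stabilising constants are added, so a window with zero mean or zero variance would cause a division by zero.

```python
from statistics import mean, stdev


def window(matrix, rows, cols):
    """Flatten the sub-matrix selected by the given row and column indices."""
    return [matrix[r][c] for r in rows for c in cols]


def ssim_window(x, y):
    """SSIM of two equally sized windows, without stabilising constants."""
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample (N-1) standard deviations
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    mean_comp = 2 * mx * my / (mx ** 2 + my ** 2)
    std_comp = 2 * sx * sy / (sx ** 2 + sy ** 2)
    str_comp = cov_xy / (sx * sy)
    return mean_comp * std_comp * str_comp


def gssi(x_mat, y_mat, zone_groups):
    """Mean SSIM over all windows defined by pairs of higher-zone groups."""
    groups = sorted(set(zone_groups))
    index = {g: [i for i, z in enumerate(zone_groups) if z == g] for g in groups}
    scores = [
        ssim_window(window(x_mat, index[gi], index[gj]),
                    window(y_mat, index[gi], index[gj]))
        for gi in groups for gj in groups
    ]
    return mean(scores)


# Hypothetical 4 x 4 OD matrices with two higher zones of two small zones each.
X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
Y = [[2 * v for v in row] for row in X]
groups = [1, 1, 2, 2]
print(gssi(X, X, groups), gssi(X, Y, groups))
```

In this toy case, doubling every flow leaves the structure component at 1 but lowers the mean and standard deviation components to 0.8 each, so the index drops from 1 to 0.64.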


Function-6: Assignment.m

function [y_est, Sections, PropMat] = Assignment(det_sec, extracted_data, BNE, OD_Id_Sno)
Detectors = unique(extracted_data(:,1)); % as 1st column represents detectors
y_est = []; Sections = [];
for q = 1:length(Detectors)
    Filter = [det_sec(:,1)==Detectors(q)];
    Filter0 = [extracted_data(:,1)==Detectors(q)];
    temp0 = extracted_data(Filter0,:);
    if sum(Filter) ~= 0
        Sections = [Sections; det_sec(Filter,2)]; % Links equipped with detectors
        y_est = [y_est; max(temp0(:,5))]; % User equilibrium link flows
    end
end
PropMat = zeros(24,225); % 24 links and 225 OD pairs (including diagonals)
for w = 1:length(Sections)
    Filter2 = [BNE(:,4)==Sections(w)];
    K = BNE(Filter2,:);
    for c = 1:size(K,1)
        for d = 1:size(OD_Id_Sno,1)
            if OD_Id_Sno(d,2)==K(c,1) && OD_Id_Sno(d,3)==K(c,2)
                PropMat(w, OD_Id_Sno(d,1)) = K(c,7);
            end
        end
    end
end
end

Function-7: MostLikelySubpaths.m

function [SubMLP, SubPathFreq] = MostLikelySubpaths(SubTraj3_table, zones, det_sec, MLSPNo)
SubMLP = []; SubPathFreq = [];
for z1 = 1:size(zones,1)

    for z2 = 1:size(zones,1)
        if z1 ~= z2
            Filter = [SubTraj3_table.Zorg==zones(z1) & SubTraj3_table.Zdest==zones(z2)];
            if sum(Filter) > 0
                Org_trips = SubTraj3_table.trip_det(Filter,:);
                subpath_id = []; subpath_id_str = [];
                for p = 1:size(Org_trips,1)
                    p1 = cell2mat(Org_trips(p));
                    subpath_str = [];
                    for j = 1:size(p1,1)
                        result = strcat(num2str(p1(j)));
                        subpath_str = [subpath_str result];
                    end
                    subpath_id = [subpath_id; str2num(subpath_str)];
                    subpath_id_str = [subpath_id_str; {subpath_str}];
                    subpath_str = [];
                end
                a = unique(subpath_id); % "a" gives all path IDs that are unique
                temp2 = sortrows([a, histc(subpath_id(:),a)], 2); % temp2 gives the frequency of each unique subpath_id
                a1 = unique(subpath_id_str);
                MLPTab = table; MLPTab.a1 = a1; a2 = [];
                for r = 1:size(a1,1)
                    MLPTab.a1_no(r) = str2num(char(a1(r)));
                    a2 = [a2; MLPTab.a1_no(r)];
                end
                MLPTab_sort = sortrows(MLPTab, {'a1_no'});
                a3 = zeros(size(a2));
                for i = 1:size(a2,1)
                    a3(i) = sum(subpath_id(:) == a2(i));
                end
                if size(temp2,1) > MLSPNo
                    dp = MLSPNo;
                else
                    dp = size(temp2,1);
                end
                MLPath_id = []; MLPath_freq = [];
                for i = size(temp2,1):-1:(size(temp2,1)-dp+1)
                    MLPath_id = [MLPath_id; temp2(i,1)];
                    MLPath_freq = [MLPath_freq; [temp2(i,2) zones(z1) zones(z2)]];
                end
                DetIDs = det_sec(:,1);
                Dig_DetID = numel(num2str(fix(abs(DetIDs(1))))); % Dig_DetID = No of digits in a detector ID
                MLP_Det = []; MLPaths = [];
                for i = 1:size(MLPath_id,1)
                    AllDet_in_Path = [];
                    filter = [MLPTab.a1_no(:)==MLPath_id(i)];
                    y = char(MLPTab.a1(filter));
                    Dig_PathID = numel(y); % Dig_PathID = No of digits in a path ID
                    for j = 1:Dig_DetID:Dig_PathID % here a step length of 3 is taken because the no of digits in each detector ID is 3
                        Det_in_path = sscanf(y(j:j+Dig_DetID-1), '%d');
                        AllDet_in_Path = [AllDet_in_Path Det_in_path];
                    end
                    te = struct('f1', AllDet_in_Path);
                    MLPaths = [MLPaths; [struct2cell(te) zones(z1) zones(z2)]];
                end
                SubMLP = [SubMLP; MLPaths];
                SubPathFreq = [SubPathFreq; MLPath_freq];
                MLPaths = [];
            end
        end
    end

end
end

Function-8: Subpath proportion matrix

function [Subpathprop] = Sub_path_proportion_matrix(Subpathfreq_obs, SubPathFreq, OD_True_Vector, OD, zones)
u = unique(Subpathfreq_obs(:,2:3), 'rows'); temp = [];
for f = 1:size(u,1)
    filter1 = [Subpathfreq_obs(:,2)==u(f,1) & Subpathfreq_obs(:,3)==u(f,2)];
    filter2 = [SubPathFreq(:,2)==u(f,1) & SubPathFreq(:,3)==u(f,2)];
    p1 = Subpathfreq_obs(filter1,:);
    p2 = SubPathFreq(filter2,:);
    if size(p1,1) == size(p2,1)
        temp = [temp; p2];
    elseif size(p1,1) > size(p2,1)
        diff = size(p1,1) - size(p2,1);
        if size(p2,1) == 0
            temp2 = repmat([0 p1(1,2) p1(1,3)], diff, 1);
            temp = [temp; p2; temp2];
        else
            temp2 = repmat([0 p2(1,2) p2(1,3)], diff, 1);
            temp = [temp; p2; temp2];
        end
    else % size(p1,1) < size(p2,1)
        temp = [temp; p2(1:size(p1,1),:)];
    end
end
Subpathfreq_est = temp;
Subpathflows_est = Subpathfreq_est(:,1);
OD_noDiag = []; OD_Flows_Zones = [];
for q1 = 1:size(OD,1)
    for q2 = 1:size(OD,1)
        OD_Flows_Zones = [OD_Flows_Zones; OD(q1,q2), zones(q1), zones(q2)]; % Creating an OD vector with origin and destination IDs
        if q1 ~= q2
            OD_noDiag = [OD_noDiag; OD(q1,q2), zones(q1), zones(q2)];
        end
    end
end
% constructing subpath proportion matrix based on estimated subpath flows
Subpathprop = zeros(size(Subpathfreq_obs,1), size(OD_True_Vector,1));
for q3 = 1:size(Subpathprop,1)
    filter1 = [OD_Flows_Zones(:,2)==Subpathfreq_est(q3,2)];
    filter2 = [OD_Flows_Zones(:,3)==Subpathfreq_est(q3,3)];
    filter3 = filter1.*filter2;
    ODflow = OD_Flows_Zones(logical(filter3),1);
    f = find(filter3==1);
    Subpathprop(q3,f) = Subpathfreq_est(q3,1)/ODflow;
end
end

Function-9: Obj_Grad.m

function [Obj, Gradient, StrBOD] = Obj_Grad(y_obs, y_est, PropMat, Objective, BpenStr, OD_Vector, BOD_vector, pen)
OD_Vec_sample = OD_Vector(BpenStr',:);
BOD_Vec_sample = BOD_vector(BpenStr',:);
B1 = BOD_Vec_sample - mean(BOD_Vec_sample);
OD1 = OD_Vec_sample - mean(OD_Vec_sample);
OD_ones = repmat(1, size(OD_Vector,1), 1);
StrBOD = corr2(OD_Vec_sample, BOD_Vec_sample);
c1 = sum(B1.*OD1)/sum(OD1.^2);
c2 = sqrt(sum(B1.^2)*sum(OD1.^2));
Grad_Str_sample = (B1 - c1*OD1)/c2; % gradient of the correlation w.r.t. the sampled OD flows
Grad_Str_x = zeros(size(OD_Vector,1),1);
for u = 1:numel(BpenStr)
    Grad_Str_x(BpenStr(u)) = Grad_Str_sample(u);
end
G1 = (y_est-y_obs)*(2-StrBOD);
G2 = (2-StrBOD)*PropMat';
G3 = Grad_Str_x*(y_est-y_obs)';
if Objective == 1 % Link flows deviation
    Obj = 0.5*(sum(((y_est-y_obs).^2)));
    Gradient = (PropMat')*((y_est-y_obs));
    return
elseif Objective == 2 % Link flows deviations and structural deviation of OD flows
    Obj = 0.5*(sum(((y_est-y_obs)*(2-StrBOD)).^2));
    Gradient = (G2-G3)*(G1);
    return
end
end
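The gradient expression used for Objective = 2 follows from differentiating the correlation-weighted link-flow objective, and it can be checked numerically. The Python sketch below is an illustrative check, not the thesis implementation: it assumes that link flows are a linear function of the OD vector through a fixed proportion matrix (the MATLAB gradient likewise treats PropMat as fixed), it computes the correlation over all OD pairs rather than over a Bluetooth-connected sample, and all names and toy values are hypothetical.

```python
from math import sqrt


def centred(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def pearson(x, b):
    """Pearson correlation between the OD vector x and the Bluetooth vector b."""
    xc, bc = centred(x), centred(b)
    return sum(p * q for p, q in zip(xc, bc)) / sqrt(
        sum(p * p for p in xc) * sum(q * q for q in bc))


def pearson_gradient(x, b):
    """d(rho)/dx_i = (b~_i - c1 * x~_i) / c2, with ~ denoting centred values."""
    xc, bc = centred(x), centred(b)
    sxx = sum(p * p for p in xc)
    c1 = sum(p * q for p, q in zip(xc, bc)) / sxx
    c2 = sqrt(sum(q * q for q in bc) * sxx)
    return [(q - c1 * p) / c2 for p, q in zip(xc, bc)]


def residuals(x, P, y_obs):
    """Estimated minus observed link flows, assuming y_est = P x."""
    return [sum(p * v for p, v in zip(row, x)) - obs for row, obs in zip(P, y_obs)]


def objective(x, b, P, y_obs):
    """0.5 * sum(((y_est - y_obs) * (2 - rho))^2)."""
    weight = 2.0 - pearson(x, b)
    return 0.5 * sum((r * weight) ** 2 for r in residuals(x, P, y_obs))


def gradient(x, b, P, y_obs):
    """Analytic gradient, element-wise form of (G2 - G3) * G1."""
    resid = residuals(x, P, y_obs)
    weight = 2.0 - pearson(x, b)
    drho = pearson_gradient(x, b)
    return [
        sum((weight * P[l][w] - drho[w] * resid[l]) * (weight * resid[l])
            for l in range(len(P)))
        for w in range(len(x))
    ]


def numerical_gradient(x, b, P, y_obs, h=1e-5):
    """Central finite differences of the objective."""
    grads = []
    for w in range(len(x)):
        up, down = list(x), list(x)
        up[w] += h
        down[w] -= h
        grads.append((objective(up, b, P, y_obs) - objective(down, b, P, y_obs)) / (2 * h))
    return grads


# Toy data: 2 links, 3 OD pairs.
x = [10.0, 20.0, 15.0]
b = [3.0, 4.0, 5.0]
P = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y_obs = [30.0, 30.0]
analytic = gradient(x, b, P, y_obs)
numeric = numerical_gradient(x, b, P, y_obs)
print(analytic, numeric)
```

Agreement between the two gradients on such a toy case gives some reassurance that the (G2 - G3) * G1 form is the derivative of the stated objective under the fixed-proportion assumption.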


Function-10: Obj_Grad_subpathflows.m

function [Obj, Gradient, StrSP] = Obj_Grad_subpathflows(y_obs, y_est, PropMat, pathprop, Subpathflows_obs, Subpathflows_est, Objective)
Pathflowsdiff = Subpathflows_obs - mean(Subpathflows_obs);
Estpathflowsdiff = Subpathflows_est - mean(Subpathflows_est);
StrSP = corr2(Subpathflows_est, Subpathflows_obs);
c1 = sum(Pathflowsdiff.*Estpathflowsdiff)/sum(Estpathflowsdiff.^2);
c2 = sqrt(sum(Pathflowsdiff.^2)*sum(Estpathflowsdiff.^2));
Grad_Str_x = pathprop'*(Pathflowsdiff - c1*Estpathflowsdiff)/c2;
G1 = (y_est-y_obs)*(2-StrSP);
G2 = (2-StrSP)*PropMat';
G3 = Grad_Str_x*(y_est-y_obs)';
if Objective == 1 % Link flows
    Obj = 0.5*(sum(((y_est-y_obs).^2)));
    Gradient = (PropMat')*((y_est-y_obs));
    return
elseif Objective == 2 % Link flows deviations and structural deviation of subpath flows
    Obj = 0.5*(sum(((y_est-y_obs)*(2-StrSP)).^2));
    Gradient = (G2-G3)*(G1);
    return
end
end

Appendix E

Python script – AutoRun_BNE.py

from __future__ import division
import sys, os
import sqlite3
from PyANGAimsun import *
from PyANGBasic import *
from PyANGConsole import *
from PyANGKernel import *
from PyMesoPlugin import *

IdRep = [486954]     # Simulation replication ID
IdScenario = 479272  # Simulation scenario ID
IdDemand = 479284    # IdDemand


def getExternalMatrices(model):
    matrix = []
    objType = model.getType("GKODMatrix")
    for types in model.getCatalog().getUsedSubTypesFromType(objType):
        for obj in types.itervalues():
            matrix.append(obj)
    matrix.sort()
    return matrix


def main(argv):
    for i in range(len(IdRep)):
        if len(argv) < 3:
            print "usage: %s ANG_FILE_NAME DETECTORS_FILE_NAME" % argv[0]
            return -1
        angFileName = argv[1]
        angAbsName = os.path.basename(angFileName)
        angName = os.path.splitext(angAbsName)[0]  # Motorway is [0] and .ang is [1]
        assignMatrixFileName = os.path.dirname(angFileName) + os.sep + angName
        detectorsFileName = argv[2]
        console = ANGConsole()
        if console.open(argv[1]):
            model = console.getModel()
            # Create a backup
            console.save(argv[1] + ".old")
            for matrix in getExternalMatrices(model):
                newPath = os.path.dirname(str(model.getDocumentFileName())) + "\\matrix.txt"
                matrix.setLocation(newPath)
                matrix.restoreExternalMatrix()
            plugin = GKSystem.getSystem().getPlugin("GGetram")
            scenario = model.getCatalog().find(IdScenario)
            demand = model.getCatalog().find(IdDemand)
            simulator = plugin.getCreateSimulator(model)
            replication = model.getCatalog().find(IdRep[i])
            if simulator.isBusy() == False:
                print "simulator.isBusy() == False"
                # sections_det code is to get only those sections with detectors installed
                sections_det = list()
                file = open(detectorsFileName, 'r')
                if file != None:
                    for line in file.readlines():
                        idDetector = line.split(";")
                        det = model.getCatalog().find(int(idDetector[0]))
                        if det != None:
                            sections_det.append(det.getBottomObject())
                # links code is to get all sections in the network
                links = list()
                linkType = model.getType("GKSection")
                for segs in model.getCatalog().getUsedSubTypesFromType(linkType):
                    for lk in segs.itervalues():
                        links.append(lk)
                if replication.getExperiment().getSimulatorEngine() == GKExperiment.eMicro:
                    simulator.addSimulationTask(GKSimulationTask(replication, GKReplication.eBatch))
                    # turnings: list of turning objects (not constructed in this listing)
                    simulator.setGatherProportions(True, assignMatrixFileName + '.matrix', sections_det, turnings, 0)
                    simulator.simulate()
            console.close()
        else:
            console.getLog().addError("Cannot load the network")
            print "cannot load network"


if __name__ == "__main__":
    sys.exit(main(sys.argv))
