An Observational Study of the Characteristics of Taxi Floating Car Data Compared to Radar Sensor Data

Tony Karlsson

Master of Science Thesis Stockholm, Sweden 2012

DD221X, Master’s Thesis in Computer Science (30 ECTS credits)
Degree Progr. in Computer Science and Engineering 270 credits
Royal Institute of Technology year 2012
Supervisor at CSC was Jens Lagergren
Examiner was Anders Lansner

TRITA-CSC-E 2012:051
ISRN-KTH/CSC/E--12/051--SE
ISSN-1653-5715

Royal Institute of Technology
School of Computer Science and Communication
KTH CSC
SE-100 44 Stockholm, Sweden
URL: www.kth.se/csc

Abstract

In Stockholm, each taxi from a studied taxi company is equipped with a Global Positioning System (GPS) device that transmits its GPS position approximately once every minute. With the help of a set of road segments representing the road network and a map-matching algorithm, the GPS positions are map-matched to the road segments to determine the route each taxi is driving on the road network. With the help of the map-matching algorithm, the speed for each road segment on the taxi's route is calculated. The goal of this study is to investigate the relationship between the speed calculated from the taxis and the speed generated from the radar sensors located on parts of the E4 motorway in Stockholm. The problem is that radar sensors measure the speed at a fixed point on the road network, whereas the speed from the taxis is measured between two points on the road network, so the two are not directly comparable. The data from the radar sensors and taxis are first analyzed, and then aggregated into 5-minute periods for 7 days in July 2010. To investigate the relationship between the two data sets, the average difference between the speed from the taxis and the speed from the radar sensors is analyzed under different conditions: during different times of the day, with different levels of traffic congestion, and for different numbers of taxis passing each radar sensor. Some statistical relationships between the speed calculated from the taxis and the speed from the radar sensors were found, and several new factors that could have an impact on the results were identified. For radar sensor and taxi speeds below 100 km/h, a statistical relationship between the speed from the taxis and the speed from the radar sensors was identified. The study conducted was an observational study; the certainty of the results is therefore unknown, and further studies are needed to identify the factors that can affect the results. A controlled experimental study needs to be conducted to be able to draw any conclusions about a causal relationship between the two data sets. The flow from the taxis at the radar sensors is also compared to the estimated flow from the radar sensors, and the penetration rate of taxis passing the radar sensors was calculated as 0.5%.

Referat

An observational study of the characteristics of traffic data generated by taxis compared with radar sensor data

Each taxi from a studied taxi company in Stockholm transmits its Global Positioning System (GPS) position approximately every minute. With the help of links that represent the road network and an algorithm, it is possible to match the GPS positions to the links and determine which road each taxi drove on. It is then possible to calculate the speed for each link that each taxi drove on during its trip. The goal of this study was to investigate the relationship between the speed of the links, calculated with the help of the taxis, and the speed from radar sensors placed on parts of, among other roads, the E4 in Stockholm. The problem is that radar sensors measure the speed at a point on the road, unlike the taxis, where the speed is calculated for a link consisting of a stretch between two points on the road network. First, data from the radar sensors and the taxis were analyzed; then data from seven days in July 2010 were aggregated into 5-minute intervals to facilitate the comparison between the two data sets. To investigate the relationship between the speed of the links, calculated with the help of taxis, and the speed from radar sensors, the study examined how the speed difference between them changed under different traffic conditions. The speed difference was examined during different times of the day, under different degrees of congestion, and depending on how many taxis passed a radar sensor in a given aggregation interval. Some relationships between the speed from the taxis and the speed from the radar sensors could be found, but it turned out that there were other factors that could affect the result. For speed measurements below 100 km/h, there appeared to be a statistical relationship between the speed from the taxis and the radar sensors. The size and reliability of this statistical relationship could not be determined. Possible factors that can affect the speed comparison need to be identified, and a controlled experiment then needs to be carried out to see how the different factors affect the result. The number of taxis from the specific taxi company, compared with the estimated number of vehicles that passed the radar sensors, was calculated as approximately 0.5%.

Contents

1 Introduction
    1.1 Background
    1.2 Questions
    1.3 Goal and Objective

2 Background Theory
    2.1 Traffic Sensor Technologies
        2.1.1 Fixed Point Sensors
        2.1.2 Floating Car Data
    2.2 Comparison Methodology
        2.2.1 Complete Roadway
        2.2.2 Road Segment by Road Segment
        2.2.3 Fixed Point Sensor and Fixed Point Sensor
        2.2.4 Fixed Point Sensor and Road Segment
        2.2.5 Summary

I Data Preparation

3 Traffic Data Sources
    3.1 Stockholm Motorway Control System
        3.1.1 Radar Sensor Data Preparation
        3.1.2 Radar Sensor Data Analysis and Characteristics
    3.2 Road Segments
    3.3 Taxi Floating Car Data
        3.3.1 Floating Car Data Analysis and Characteristics

4 Comparison Preparation
    4.1 Aggregation Methodology
        4.1.1 Aggregation of Radar Sensor Data
        4.1.2 Aggregation of Floating Car Data
    4.2 Association of Road Segments and Radar Sensors
        4.2.1 Characteristics of the Association

II Comparison

5 Statistical Background
    5.1 Regression Analysis
    5.2 Causality
    5.3 Observational Study

6 Comparison Methodology
    6.1 Average Taxi Penetration Rate
    6.2 Taxi and Traffic Average Speed Comparison
        6.2.1 Certainty on all the Traffic
        6.2.2 Certainty during Time of the Day
        6.2.3 Certainty during Congestion

7 Results
    7.1 Taxi Penetration Rate
    7.2 Certainty of Speed Measurement of Taxis
    7.3 Certainty of Speed Measurement during Different Hours of the Day
    7.4 Certainty of Speed Measurement of Taxis during Congestion

8 Discussion and Conclusion
    8.1 Discussion
    8.2 Conclusion
    8.3 Future Work

Bibliography

Appendices

A List of Gantries

B Taxi Penetration Rate

C Certainty of Speed Measurement

Chapter 1

Introduction

The study, conducted as part of a master’s thesis, was initiated by the Division of Traffic and Logistics of the Department of Transport Science at the School of Architecture and the Built Environment at the Royal Institute of Technology (KTH). The Division of Traffic and Logistics conducts research on Intelligent Transportation Systems (ITS).

1.1 Background

In the field of ITS there are various methods to measure the average speed and flow of traffic on the road network. The purpose is to determine the level of congestion, travel time and average speed at fixed points or on parts of the road network. One measurement method uses fixed point detectors, such as loop detectors and radar sensors, which measure the average flow and average speed of the traffic at fixed points on the road network. Another measurement method is based on floating car data collected from a vehicle fleet, such as a taxi fleet or a bus fleet. In the case of a taxi fleet, the floating car data is collected from the dispatch system of the taxis, which send their Global Positioning System (GPS) positions to the central dispatch of the taxi company. Since taxis have to adapt their speed to the general traffic, the taxis can be seen as floating along with the traffic and can therefore be used as an approximation of the general traffic.

In Stockholm, each taxi from a studied taxi company is equipped with a dispatch system including a GPS device, which sends its GPS position and timestamp to central dispatch approximately once every minute. Central dispatch uses the GPS positions of all the taxis to locate the nearest taxi when a customer calls for taxi service. The Division of Traffic and Logistics receives a copy of this data as consecutive GPS positions of each taxi. The consecutive GPS positions for each taxi are map-matched to a digital road network using a map-matching algorithm, producing the most likely route each taxi drove on the road network.

As a part of the Motorway Control System in Stockholm there are a number of fixed point radar sensors located on the E4 motorway in Stockholm. Each radar sensor measures the average speed of the vehicles passing by each minute. The radar sensors also measure the flow, that is, the number of vehicles passing the radar sensor each minute. The Division of Traffic and Logistics receives a copy of the radar sensor data as well. Before this study was started, the radar sensor data had not been analyzed at the Division of Traffic and Logistics and its characteristics were unknown.

The radar sensor data and the floating car data are two different kinds of data sets. Each radar sensor measures the average speed of vehicles passing a fixed point on the road network, whereas the taxi floating car data measures the speed between two points on the road network. The radar sensor data and floating car data are therefore not directly comparable.

1.2 Questions

The questions this study will try to answer are the following:

• What is the fraction of taxis compared to the general traffic? Per day? Per hour?

• How many taxis are needed to have certainty in the speed measurement from the taxis?

• Does the number of taxis needed to have certainty in the speed measurement change during the time of the day?

• Does the number of taxis needed to have certainty in the speed measurement change when there is congestion?

The main source of traffic information, such as the level of congestion, average speed and travel times on parts of the road network, is data from fixed point sensors such as radar sensors. In Stockholm the fixed point radar sensors are located on parts of the main arterial roads. By answering the questions specified above, it will be possible to determine the reliability of the taxi floating car data. If the speed measurements from the taxi floating car data are reliable, then the taxi floating car data can be used as a measurement method to determine the level of congestion, average speed and travel times on parts of the road network where there are no fixed point radar sensors.

1.3 Goal and Objective

The goal of this study is to learn more about the relationship between the radar sensor data generated by the radar sensors and the floating car data generated by the taxi fleet of the studied taxi company. The studied taxi company was chosen since the taxi floating car data from this taxi company was available at the Division of Traffic and Logistics. The report for the study is divided into three parts.

Background Theory: To find out how different kinds of traffic sensors are compared and validated with one another, a literature study is conducted on the subject of comparing one type of traffic sensor data with another kind of traffic sensor data. The background theory will present some of the technologies used in ITS.

Data Preparation: Since the characteristics of the radar sensor data are unknown, a first step is to insert the radar sensor data into a database and analyze its characteristics. The purpose is to identify erroneous data as well as to find a suitable period of time for comparison. After the radar sensor data has been analyzed, the radar sensor data and floating car data are prepared for comparison. Before the radar sensor data can be compared to the taxi floating car data, the radar sensors are associated with the road segments of the E4 motorway. The second step is to aggregate the taxi floating car data and the radar sensor data into a suitable period of time, after which a method of comparison has to be found with support from the literature.

Comparison: Once the radar sensor data and the taxi floating car data have been aggregated into a suitable period of time, the two data sets are compared with one another. To answer the research questions of this study, the comparison focuses on finding out how many taxis are needed to have certainty in the taxi speed measurements compared to the speed measurements from the radar sensors. The comparison part will present the methodology used during the comparison and how the results are generated from the comparison.

Once the radar sensor data has been compared with the taxi floating car data, the results from the comparison are presented. The presentation of the results is grouped by the research questions, with one section for each question. To answer the research questions, the results are presented in the form of a statistical analysis. With the help of the literature study, the results from the comparison are discussed and conclusions are presented.

Chapter 2

Background Theory

This chapter presents and explains the various traffic sensor technologies found in the literature. Both the traffic sensor technologies based on floating car data and those based on fixed point sensors are presented. The traffic data is used by the ITS, but generated by the various traffic sensors. This chapter also explains how the traffic sensors generate traffic data such as the travel time between two points on the road network, how they measure and estimate the average speed of the traffic travelling on a part of the road, and how they measure and estimate the number of vehicles passing a certain point on the road network. The literature on the subject of comparing one traffic sensor technology with another is presented and discussed.

2.1 Traffic Sensor Technologies

For the purposes of traffic surveillance, effective management of the road network, congestion reduction and travel time estimation, different kinds of technologies are used to collect traffic data. The road network can be divided into road segments linked together, where each road segment is a part of the road between two points. A road segment can vary in length and be either a straight line or an arc with a slight curve. Traffic data can be collected using fixed point sensors or from floating car data, and can be used to calculate the average speed and flow at a fixed point or on a road segment.

2.1.1 Fixed Point Sensors

The most common type of technology for traffic surveillance has been fixed point sensors. A fixed point sensor is a sensor installed at a fixed position on the road network. In most cases a fixed point sensor functions by aggregating the speed of all vehicles passing the sensor in a specific time period, such as 5 minutes, and reporting the aggregated average speed to the ITS. The aggregated average speed for all vehicles passing a fixed point sensor in a specific time period will here be called the fixed point average speed. The aggregated number of vehicles that pass the fixed point sensor in a specific time period will from here on be called the fixed point traffic flow. In most cases the fixed point traffic flow is also reported to the ITS for 5-minute periods. Examples of fixed point sensors that will be explained in more detail are loop detectors, radar sensors and license plate recognition system sensors.
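The two aggregated quantities defined above can be illustrated with a minimal Python sketch. It assumes that the individual vehicle passes are available as (timestamp, speed) pairs, which is a simplification for illustration; the function name `aggregate_fixed_point` and the example values are hypothetical and are not taken from any sensor described in this study.

```python
from collections import defaultdict

PERIOD_SECONDS = 5 * 60  # the 5-minute period used as an example in the text


def aggregate_fixed_point(passes, period=PERIOD_SECONDS):
    """Aggregate individual vehicle passes into per-period values.

    passes: iterable of (timestamp_seconds, speed_kmh), one entry per vehicle.
    Returns {period_start: (fixed_point_average_speed, fixed_point_traffic_flow)}.
    """
    speeds_by_period = defaultdict(list)
    for timestamp, speed in passes:
        # Assign each pass to the period in which it occurred.
        period_start = int(timestamp // period) * period
        speeds_by_period[period_start].append(speed)
    return {
        start: (sum(speeds) / len(speeds), len(speeds))
        for start, speeds in sorted(speeds_by_period.items())
    }


# Hypothetical passes: three vehicles in the first period, two in the second.
example_passes = [(10, 80.0), (130, 90.0), (290, 100.0), (310, 60.0), (450, 70.0)]
aggregated = aggregate_fixed_point(example_passes)
```

In this sketch the average is an arithmetic mean over the vehicles in each period; whether a real sensor averages in exactly this way is not stated in the text.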

Loop detectors, also known as magnetic loop detectors or inductive loop detectors, are the most common type of fixed point sensor. A loop detector consists of a coil or loop of wire buried near the surface of the road [1, 10]. When a vehicle passes the loop detector, a change of inductance can be detected in the wire. The change of inductance can then be used to identify individual vehicles and calculate the speed of each vehicle, which in turn can be used to calculate both the fixed point average speed and the fixed point traffic flow. Loop detectors have been in use for several decades and have up until recently been the main source of traffic information [11, 6].

Radar sensors are another example of fixed point sensors that are used in traffic surveillance. A radar sensor can measure the speed of an individual vehicle with the help of a radar beam that is reflected off the vehicle and back to the radar sensor. Another technology used in traffic surveillance is the automatic license plate recognition (ALPR) system, consisting of two cameras placed at two distinct points of the road network. With the help of image processing, the ALPR system can calculate the travel time and average speed of an individual vehicle between the two cameras by identifying the license plate of the vehicle.

Since a loop detector is a magnetic detector buried in the ground, the cost of maintenance and installation is high [4, 7]. Because of this high cost, the use of loop detectors is limited to the main arterial roads of the network, and the coverage of the traffic surveillance is therefore limited [1, 8].

A general problem with fixed point sensors is that the estimation of traffic conditions on the complete network is based on the fixed point average speed and fixed point traffic flow reported from a small number of sensors located at fixed points on the road network [6]. An example of this problem is when a fixed point sensor located just after a major intersection is used to estimate the average speed of a longer part of a road. Some vehicles have to stop just before passing the intersection and can then increase their speed once they have passed the intersection. If the fixed point sensor located just after the intersection is used to estimate the average speed of a longer part of the road, starting just after the intersection, the average speed will most likely be underestimated [7]. The ALPR system shares some disadvantages with the fixed point sensors, such as high installation and maintenance costs as well as limited coverage.

2.1.2 Floating Car Data

Using floating car data is a more recent technology in traffic surveillance, and it can be used to estimate the average speed and traffic flow of road segments on the road network. Floating car data is measured from a subset of vehicles in the traffic, which can be seen as vehicles that float with the traffic stream and therefore capture the characteristics of the general traffic [2]. Floating car data can be collected using various methods. One such method is using a taxi fleet in which each taxi is equipped with a Global Positioning System (GPS) device that can record and transmit its route as consecutive GPS positions while driving in the traffic [2]. With the help of a digital road network stored as road segments, each GPS position can be matched to a position on a road segment using a map-matching algorithm. An increasing number of fleet operators, such as taxi fleet operators or bus fleet operators, use GPS technology to keep track of their vehicles, as well as to direct them. As a side effect, the increasing use of GPS technology produces floating car data at almost zero additional cost [2]. With the help of a vehicle fleet, the segment average speed and the segment taxi flow can be calculated. The segment average speed is the average speed of the floating cars driving on the road segment during a specific time period. The segment taxi flow is the number of floating taxi cars driving on the road segment during a specific time period. An example of the map-matching of the floating car data is presented in Figure 2.1.

Figure 2.1. An example of how floating car data is map-matched to the road segments of a road network. The figure shows the same taxi at three distinct points in time. For each point in time the taxi is transmitting its GPS position and timestamp to a central computer. The figure also shows five road segments as dotted lines, linked together at the intersections, representing the road network. The central computer uses a map-matching algorithm to determine the route the taxi was driving as the sequence: road segment A, road segment C, road segment E. With the help of the timestamps, the average speed for each road segment can be computed.
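The speed computation described for Figure 2.1 can be sketched in Python as follows. The text does not specify how the speed is distributed over the road segments of a route, so this sketch makes a simplifying assumption: the taxi is taken to drive at a uniform speed between two consecutive GPS fixes, and that speed is assigned to every road segment traversed. The function name `segment_speeds`, the segment lengths and the timestamps are hypothetical.

```python
def segment_speeds(route, t_start, t_end):
    """Assign one average speed to every segment driven between two GPS fixes.

    route: list of (segment_id, metres_driven_on_segment) from the map-matching.
    t_start, t_end: timestamps of the two GPS fixes, in seconds.
    Returns {segment_id: speed_kmh}.
    Assumption: uniform speed between the two fixes.
    """
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing")
    total_metres = sum(length for _, length in route)
    speed_kmh = total_metres / elapsed * 3.6  # m/s to km/h
    return {segment_id: speed_kmh for segment_id, _ in route}


# Hypothetical route A, C, E (as in Figure 2.1): 1000 m driven in 60 s.
route = [("A", 400.0), ("C", 500.0), ("E", 100.0)]
speeds = segment_speeds(route, t_start=0, t_end=60)  # roughly 60 km/h per segment
```

Averaging these per-taxi values over all taxis that drove on a segment during a time period would then give the segment average speed, and counting them would give the segment taxi flow.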

The floating cars are only a subset of all the vehicles driving in the traffic, and therefore the floating cars can only be used to estimate the average speed and flow of the general traffic.

With the increasing use of cell phones, the possibility of using them as traffic sensors has increased [1, 8]. One such technology uses the anonymous information exchanged between the cell phones and the cell phone Radio Base Stations (RBS) [11]. When the communication of a cell phone has been handed over from one RBS to another RBS, the segment average speed and segment taxi flow can be calculated with the help of a digital road network and a map-matching algorithm in the same way as for taxi floating car data.

An advantage of using floating car data compared to fixed point sensors is the wide coverage, making it possible to collect traffic information for the complete road network [7]. A possible problem with this technology is the lack of reliability of the data received when there is a low penetration rate of vehicles collecting the floating car data [5]. The penetration rate of vehicles is the number of floating cars as a percentage of the total number of vehicles driving on the road network. Several studies mentioned in the literature concluded that a penetration rate of at least 5% of cell phones in the vehicles is needed to arrive at a good estimation of travel times for the traffic [4, 8]. In contrast, one study mentioned in the literature concluded that a penetration rate of only 1% of vehicles is required to be able to calculate reliable information from the floating car data [7].
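As a small illustration of the penetration rate definition above, the following Python sketch computes the rate from two vehicle counts. The function name `penetration_rate` and the counts in the example are hypothetical and do not come from any of the cited studies.

```python
def penetration_rate(floating_cars, total_vehicles):
    """Return the number of floating cars as a percentage of all vehicles."""
    if total_vehicles <= 0:
        raise ValueError("total_vehicles must be positive")
    return 100.0 * floating_cars / total_vehicles


# Hypothetical counts: 12 floating cars among 400 vehicles in total.
rate = penetration_rate(12, 400)  # 3.0 percent
```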

With the increasing use of built-in GPS devices in cell phones, the possibility of using them as a means to obtain traffic information has increased [8]. A problem with using the GPS device in cell phones is the privacy concern. Even when consecutive GPS positions are recorded anonymously, the travel pattern and home address of the user can be found by analyzing them [8]. The solution to this problem proposed in the literature is to use virtual trip lines, which are geographical points preprogrammed in the software of the cell phone and spread out on roads with high traffic [8]. When the user passes a virtual trip line, the speed and travel time are calculated and sent anonymously to a server. Privacy is protected since speed measurements are only taken at certain points on the roadways with high traffic. A general problem with using cell phones to obtain traffic data is that the cell phone might not always be travelling in a car. Therefore, the transportation mode of the cell phones is identified first. Then the cell phones travelling on trains, or on buses travelling in special bus lanes, which are not suited for obtaining traffic information for the general traffic, are filtered out. A penetration rate of 2–3% of cell phones in vehicles is estimated to be enough to obtain good results from measurements of traffic speed from the cell phones [8].

2.2 Comparison Methodology

The traffic data generated from the traffic sensor technologies that can be of interest in traffic surveillance are:

• The segment taxi flow

• The segment average speed

• The fixed point average speed

• The fixed point traffic flow

• The travel time between two points of a road

Traffic data can be generated from various fixed point sensors and floating car data sources, each with its own characteristics. Characteristics that can be of interest are:

• The accuracy of the speed measurements from a fixed point sensor.

• The number of valid sensor readings of the fixed point sensor.

• The accuracy of the segment average speed calculated from the floating car data.

• How the segment average speed changes with the number of floating vehicles compared to ground truth data.

• How the segment average speed changes with the size of the segment compared to ground truth data.

• How the segment average speed changes for different parts of the road network compared to ground truth data.

To be able to see the difference in characteristics of the various traffic sensor technologies, the traffic data generated from one traffic sensor technology has to be compared to the data from another traffic sensor technology. To be able to compare data from two distinct traffic sensor technologies, the data has to be comparable. Some methods of comparison show the characteristics in greater detail than others.

2.2.1 Complete Roadway

In several studies in the literature, traffic data from two distinct traffic sensor technologies were compared over a complete roadway [1, 3, 2, 16, 11], either by computing the average speed over the complete roadway from the two technologies, or by comparing the travel time from the start of the roadway to the end of the roadway computed from the two technologies.

One study, conducted in 2005, compared travel times for the complete Ayalon freeway in Israel, which consists of 4 to 5 lanes. The travel times computed from cell phone RBS information were compared with the travel times computed from the fixed point average speed reported by loop detectors located on the same freeway [1]. To calculate the travel times from the cell phones, the roadway is first split up into road segments, and then the travel time for each road segment is calculated using a map-matching algorithm. The travel time for the complete roadway is then calculated by summing up the travel time for each road segment. To calculate the travel times from the loop detectors, the distance between two consecutive loop detectors is divided by two. To create a road segment for each loop detector, the first half of the distance is associated with the first loop detector, and the second half of the distance is associated with the second loop detector. Then, for each road segment, the travel time is calculated by dividing the length of the road segment the loop detector is located on by the fixed point average speed from the loop detector. The travel time for the complete roadway is then compared for every aggregated 5-minute time period. By comparing the travel times for the complete roadway, the study concluded that there is a good agreement during non-congested conditions. During congested conditions, however, when the travel time for the complete roadway is calculated as 18 minutes, the difference in travel time calculated from the cell phones and the loop detectors is 3–4 minutes, which the study states is quite acceptable [1].
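The loop detector part of this calculation can be sketched in Python as follows. This is a minimal sketch of the halving procedure described above, not the implementation used in the cited study [1]. How the first and last detectors are handled is an assumption: here their segments are taken to extend to the start and end of the roadway. The function name `roadway_travel_time_minutes` and all positions and speeds are hypothetical.

```python
def roadway_travel_time_minutes(detectors, road_start_km, road_end_km):
    """Estimate the travel time over a roadway from loop detector speeds.

    detectors: list of (position_km, fixed_point_average_speed_kmh),
               sorted by position along the roadway.
    Each detector is associated with the segment reaching halfway to its
    neighbours; the outermost segments extend to the roadway ends (assumption).
    """
    positions = [position for position, _ in detectors]
    # Segment boundaries: roadway start, midpoints between detectors, roadway end.
    boundaries = [road_start_km]
    boundaries += [(a + b) / 2 for a, b in zip(positions, positions[1:])]
    boundaries.append(road_end_km)

    total_hours = 0.0
    for (_, speed), seg_start, seg_end in zip(detectors, boundaries, boundaries[1:]):
        if speed <= 0:
            raise ValueError("average speed must be positive")
        # Travel time of a segment = segment length / average speed.
        total_hours += (seg_end - seg_start) / speed
    return total_hours * 60


# Hypothetical 6 km roadway with three detectors; the middle one is congested.
detectors = [(1.0, 60.0), (3.0, 30.0), (5.0, 60.0)]
minutes = roadway_travel_time_minutes(detectors, 0.0, 6.0)  # about 8 minutes
```

With one such value per 5-minute period from each data source, the two travel time series can be compared period by period.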

Another study compared the travel times of a complete roadway computed from floating car data collected from taxis in Hamburg during 2006 with test drives of two vehicles [3]. Each of the two test vehicles was equipped with a GPS logger which recorded the position of the vehicle every 5 s. To calculate travel times from both the taxi floating car data and the consecutive GPS positions from the test drives, the GPS positions are map-matched to a digital road network using a map-matching algorithm. From the comparison it could be concluded that there is a good agreement between the travel times calculated from the taxi floating car data and the travel times calculated from the test drives. As could be seen in a previous study, however, the travel times had more errors during congested conditions [1]. The current travel times were also compared with the historical travel times, and the study concluded that they agree most of the time, hinting that the historical travel times can be used as a complement when traffic data is missing [3].

Another similar study was conducted in the city of Nuremberg, Germany in 2005, comparing the travel times calculated from floating car data from taxis with travel times calculated from an ALPR system [2]. To compare the traffic data, the travel times during the whole day are aggregated into 15-minute periods. The study concluded that overall the average travel times for the complete roadway are calculated reliably. Traffic congestion lasting for a longer period of time was successfully detected by the floating car data produced by taxis, but traffic congestion that lasted for a shorter period of time was not always detected.

Another study was conducted on two roadways in the city center of Düsseldorf, Germany during 2006, where the average speed calculated from taxi floating car data is evaluated by comparing it to the speed calculated from an ALPR system [16]. In contrast to previous studies, the estimated average speed from floating car data was much lower than the average speed calculated from the ALPR system for the whole period [1, 2]. The trend of the average speed was nevertheless similar, following the same pattern. The study explains that origin-destination data is used for the floating car data in the comparison, and that waiting time at the origin and destination could therefore affect the average speed calculations [16]. The average error between the average speed calculated from the taxi floating car data and the average speed calculated from the ALPR system is 12 km/h. Historical average speed values generated from taxi floating car data are also compared, and the historical values are found to follow the current estimated values closely.

A study was also conducted on a short roadway in the city of Antwerp, Belgium, where floating car data generated from cell phone RBS information was compared with traffic data from loop detectors in a similar way as in the study conducted on the Ayalon freeway in Israel [11]. Travel time for each road segment of the roadway was calculated from the RBS information and then summed up to obtain the travel time for the complete roadway. Travel times derived from the loop detectors were calculated by inverting the speed on the road segment the loop detector was located on. The travel times calculated from the RBS information for the complete roadway are then compared with the travel times calculated from the loop detectors. The study concluded that there is a good agreement between the average speeds from the two sources of traffic data, but that the average speeds from the RBS information are slightly lower and fluctuate more than the average speeds from the loop detectors [11].

All five studies which compared data over a complete roadway presented their comparison as a graph showing the difference in speed or travel time over the time of day on the studied roadway, making it possible to distinguish variations during the day. One study presented some further statistical analyses and a regression analysis [1]. Because the comparisons are conducted on a complete roadway, it is difficult to see how much the structure of the road and the traffic conditions on different parts of the roadway affected the results of the comparison.

2.2.2 Road Segment by Road Segment

In two studies, the method of comparing the average segment speed from one traffic sensor technology with the average segment speed from another traffic sensor technology is used as a means to analyse the characteristics of the traffic data [10, 19].

In the first study, floating car data produced by 4 000 taxis in the city of Shenzhen, China during 7 days in December 2006 was evaluated [10]. The segment average speed for each road segment was measured with the help of a stopwatch and compared with the segment average speed computed from floating car data. The comparison was conducted on a roadway consisting of 116 km of urban high-speed road and 138 km of thoroughfare, and was presented as the error distribution of the difference in segment average speed of the road segments in percent. 36.6% of the road segments had less than 10% error, 23.6% had between 10–20% error, 20.3% had between 20–30% error, and 19.5% had more than 30% error. To present the results of the comparison, the road segments were divided into groups of high-speed road and thoroughfare. The results show that the high-speed roads have smaller errors than the thoroughfares.

In the second study, conducted in the city of Cheongju, Korea for 31 days in 2004, the segment average speed calculated from RBS hand over information from cell phones was compared pairwise with the segment average speed from floating car data generated by 10 probe vehicles [19]. In a similar way as in the previous study conducted in the city of Shenzhen, China, the comparison was conducted by presenting the average error of the segment average speed for each road segment. The conclusion of the study in Cheongju was that cell phones can be used as an alternative for estimating the segment average speeds. The study also concluded that the map-matching can affect the accuracy of the result, and that when a cell phone is communicating with an RBS located far away, the map-matching becomes less accurate, which affects the accuracy of the estimated segment average speed.

The comparisons in both of these studies are presented in a similar way. The first study presented the error distribution across road segments in percent, and the other study presented the average errors for all the road segments used in the study. Presenting the results using the error distribution across road segments makes it possible, to some extent, to see how the comparison is affected by the characteristics of distinct road segments. But the method does not demonstrate how the results vary during the day and to which extent that affects the comparison.

2.2.3 Fixed Point Sensor and Fixed Point Sensor

In a study conducted on freeway I-880 in Union City, California, USA during 8 hours in February, 2008, the speed measured at virtual trip lines from 100 vehicles, each carrying a GPS-enabled Nokia N95 cell phone, was compared with the fixed point average speed from loop detectors [8]. The 17 virtual trip lines were placed at the same locations as the 17 loop detectors, and the comparison was presented as two velocity fields, the first constructed from the 17 virtual trip lines and the second from the 17 loop detectors. In the velocity field, each loop detector represents the road segment it is located on, and the length of that road segment depends on the distance to the next and previous loop detector. During the comparison with the speed measurements from the virtual trip lines, differences in the speed measurements for some of the loop detectors were noticed [8]. One reason for the speed difference is that the segment average speeds from the loop detectors and the segment average speeds from the virtual trip lines are calculated using different methods. Another reason is suggested in the literature: “Virtual trip lines collect their velocity from a proportion of all vehicles crossing that location, while loop detectors collect data from all vehicles. If this proportion is too small, it might not be statistically representative of the entire population” [8].

2.2.4 Fixed Point Sensor and Road Segment

Several studies from the literature used the method of comparing traffic data calculated from a fixed point sensor with traffic data from some other sensor technology, such as floating car data, when doing their comparison [1, 4, 6, 15].

In a study conducted on the 14 km long Ayalon freeway in Israel, consisting of 4 lanes in each direction, the road was divided into road segments of length 300 m to 1 000 m [1]. This study was mentioned in a previous subsection of this report, in connection with comparisons of travel time for a complete roadway. The freeway has 10 interchanges and 60 loop detectors placed approximately 500 m apart. RBS hand over information from cell phones was used to calculate the travel time for each road segment of the road. The study discussed the methodology for comparing traffic data from a fixed point sensor with traffic data based on RBS hand over information. It concluded that, since the two data sets have different characteristics, the results from a comparison can only be used as a starting point for evaluating the data, and that the comparison is not sufficient to draw any conclusions. For that reason a graphical representation in the form of a velocity field was used in the study as a means to compare the speeds from the two data sets along the Ayalon freeway. The velocity field for the loop detector data was compared to the velocity field of the floating car data from the RBS hand over information. The segment average speed of each road segment was used to calculate the velocity field of the floating car data. For the loop detector velocity field, the speed was calculated for a road segment of 500 m centered at the detector's location, aggregating the speed of all lanes from that detector during the last 5 minutes.

In a study conducted on a 9 km part of the Gardiner expressway in Toronto, Canada, containing 14 loop detectors, data from 14 days in 2008 was compared with floating car data from RBS hand over information [4]. The RBS hand over information was used to calculate the segment average speed for each road segment of the roadway. In contrast to the study conducted in Israel [1], the study in Toronto compared the 5-minute aggregated average speed for each loop detector with the 5-minute aggregated average speed of the road segment the loop detector was located on. The comparison of the segment average speeds with the fixed point average speeds from the loop detectors was carried out by first calculating the average speed difference and presenting it in a table, and then presenting a graph for one loop detector showing the fixed point average speed of the loop detector compared with the segment average speed over the time of day. Both the study conducted in Israel and the study conducted in Toronto conclude that the overall speed pattern of the two data sets is similar [1, 4]. The study conducted in Toronto noticed that the speeds calculated from the RBS hand over information were slightly lower than the speeds measured by the loop detectors, and mentioned that this is expected since speeds from RBS hand over information are calculated across a larger distance, whereas loop detectors measure the speed at a fixed point. Both studies mentioned that the floating car data from the RBS hand over information was noisier. The study conducted in Israel mentioned that the noise in the floating car data was larger on the first section of the road, where an on-ramp is connected, and suggested the on-ramp as a possible explanation. The study conducted in Toronto mentioned that there was noise in the floating car data in the early morning and late night periods, and explained the noise by low traffic and a low penetration rate of RBS hand over information during those periods.

In another study the segment average speeds calculated from preprogrammed pseudo bus stops were compared with the speeds calculated from loop detectors, with the purpose of testing different loop detector data collection methods [6]. Each pseudo bus stop was placed at an equal distance from two consecutive loop detectors, placing each loop detector near the center of a road segment. When the calculated average segment speeds were compared with the fixed point average speeds reported by the loop detectors, the results showed a slight variation. However, the study concluded that aggregating the fixed point average speed from the loop detectors over 5 minutes is better than aggregating over 1 minute.

In a study conducted on the I-4 freeway in Florida, USA, the road segment travel time calculated from the data of a vehicle equipped with a GPS was compared with the travel time computed from the 5-minute aggregated fixed point average speed measurements of a loop detector [15]. The comparison showed that the two data sets are highly consistent but, as the study noted, only 6 observations were made from the GPS-equipped vehicle, which is a relatively small number.

When fixed point sensor data was compared with floating car data, the validations were presented in a couple of different ways. In two studies from the literature, a graphical representation was used for presenting the comparison over a distance of the road and time of the day [4, 1]. This made it easier to see how the characteristics of the roadway and the sensors at different locations and times affected the results. In contrast, two other studies used a regression analysis to present their comparison, which better manages to show the overall results of the comparison [6, 15] but does not show any details of the characteristics of the results.

2.2.5 Summary

If the traffic data from one traffic sensor technology is compared to the traffic data from another traffic sensor technology with different characteristics, the comparison may produce some errors. One study from the literature compared the average segment speeds calculated from floating car data generated by a bus fleet with fixed point average speeds generated from loop detectors and stated, “Clearly the transformation of a fixed point parameter to report conditions over an inhomogeneous segment carry with it potential for error, but the magnitude of this error is not well understood” [6]. Another study from the literature pointed this out as well and stated, “In practice, a new speed measurement technology can only be compared to another technology for measuring speeds and, as each technology has its own characteristics in terms of latency, smoothing, errors and so forth, the two data sets are rarely directly comparable” [4]. After the data from virtual trip lines and loop detectors were compared, one study noted that the loop detectors reported lower speeds and stated, “Because of the previous considerations, loop detector measurements are not considered as ground truth in this study. A data analysis is carried out only to observe the main features of both type of measurements, and not to determine the accuracy of measurement” [8]. For the comparisons in the literature presented as a velocity field or a graph, more details of the variation of speed during the day for each of the loop detectors in the study were visible. This made it easier to draw conclusions on how the characteristics of the traffic data affected the comparison, in contrast to comparisons made on a complete roadway, where it was difficult to draw any conclusions on how the characteristics of the traffic sensors and the road affected the comparison. Since only a small number of floating cars are used for representing the flow of traffic on the complete road network, the floating cars will most likely not be able to cover the complete road network during all times of the day. Therefore several studies suggest that floating car data should be complemented with historical floating car data for each road segment where the penetration rate of floating cars is low [3, 2, 10, 5].

Part I

Data Preparation

Chapter 3

Traffic Data Sources

This chapter explains the two kinds of traffic sensor technologies that are used for answering the research questions specified for this study. It also explains how the traffic data from the two traffic sensor technologies was obtained, and presents an analysis of the traffic data. One traffic data source is the fixed point radar sensors which are part of the Stockholm Motorway Control System, placed on parts of the Stockholm road network. The data generated by the fixed point radar sensors will from here on be referred to as the radar sensor data. The other source of traffic data is the taxis from a studied taxi company driving on the streets of Stockholm, generating taxi floating car data. A digital representation of the road network in the form of road segments was provided to this study, where each road segment represents either a complete road or a part of a road.

3.1 Stockholm Motorway Control System

One of the traffic data sources used is the microwave radar sensors, which are part of the Stockholm Motorway Control System, based on the MTM-2 (Motorway Traffic Management) system used in the Netherlands. The Stockholm Motorway Control System was installed on parts of six different roads during the period 1996–2004.

The radar sensors are placed on gantries (see figure 3.1) over each lane in each direction of the road. For most of the radar sensors there is also a variable message sign located above the radar sensor on the gantry, showing the recommended speed or lane control signs when necessary. The number of gantries and radar sensors on each of the six different roads is presented in table 3.1.

Figure 3.2 shows the location of the six different roads that have radar sensors installed in relation to the city of Stockholm.

Each radar sensor continuously collects traffic data and sends a radar sensor reading to the Stockholm Motorway Control System each minute. A radar sensor reading consists of the fixed point average speed and the fixed point traffic flow during the last minute. The purpose of the Stockholm Motorway Control System is to increase the capacity of the road network and increase safety by using the variable message signs for queue warnings, incident management, and speed reduction during congestion [13]. The system is controlled by a traffic control center and an automatic incident detection system which uses the data from the radar sensors to display different speed and lane control signs on the variable message signs when necessary.

Figure 3.1. An example of a gantry with a variable message sign above each lane. There is one radar sensor for each lane, all mounted on the same gantry.

Table 3.1. Stockholm Motorway Control System

Road                               Radar Sensors   Gantries
E4                                 548             179
E4 (excluding on- and off-ramp)    511             155
Riksväg 75 (Södra länken)          289             145
Riksväg 73 (Nynäsvägen)            33              22
Länsväg 222 (Värmdöleden)          12              6
Länsväg 226 (Huddingevägen)        29              18
Länsväg 265                        111             64

Since the Stockholm Motorway Control System is installed on six different roads, traffic conditions such as traffic intensity, number of lanes, and the number of taxis travelling on each road may vary. Therefore only the radar sensors located on the E4 motorway are used in this study. Radar sensors that are located on an on- or off-ramp are not used, since the fixed point average speed from those radar sensors is more difficult to compare to the taxi floating car data. The part of the E4 motorway in Stockholm that has radar sensors is approximately 26 km long, and the distance between the gantries is on average 333 m. The shortest distance between two gantries is 65 m and the longest distance is 690 m. There are in total 155 gantries placed on the E4 motorway in Stockholm: 74 in the northbound direction and 81 in the southbound direction. The 155 gantries have in total 511 radar sensors, where each gantry has either 2, 3, 4 or 5 radar sensors, one for each lane. At positions on the E4 motorway where there is an on- or off-ramp, the gantry is in most cases placed either right before or right after the on- or off-ramp.

Figure 3.2. The location of the roads with radar sensors in relation to the city of Stockholm. The solid line is the E4 motorway, and the dashed lines are the parts of the 5 other roads with radar sensors.

The radar sensor data is sent to a central system and then forwarded as an eXtensible Markup Language (XML) message to the Division of Transport and Logistics at KTH. The Division of Transport and Logistics at KTH has for some time been storing the XML messages by concatenating them into files stored on a server. The radar sensor data was provided to this study in the form of concatenated XML files.

The location of each radar sensor is available in a Keyhole Markup Language (KML) file which can be viewed using Google Earth (http://earth.google.com), showing the placement of each radar sensor together with a satellite image of the surrounding area.

3.1.1 Radar Sensor Data Preparation

The radar sensor data from the Stockholm Motorway Control System was forwarded to the Division of Transport and Logistics at KTH as XML messages. The XML messages were then concatenated and stored in files. The XML messages contained either radar sensor reading related data, or traffic related messages such as incident reports. To make it easier to analyze the radar sensor data, the relevant attributes of the XML messages were identified, and the data for those attributes was extracted and stored in a database. With the help of an XML Schema Definition (XSD) file explaining the structure of the XML messages, and a document from the manufacturers of the Motorway Control System explaining the meaning of each attribute in the XML messages, the relevant attributes of a radar sensor reading were identified as:

• Unique name of the gantry the radar sensor was placed on

• Lane number

• Timestamp

• Status of the radar sensor and the radar sensor reading

• Flow of traffic during the last minute

• Average speed during the last minute

Since the radar sensor data was going to be stored in a database, tables that could efficiently store the radar sensor data were created in a PostgreSQL database. To extract all the distinct radar sensor readings from the files of concatenated XML messages, a program was implemented in C++ on Ubuntu Linux, using the tinyxml library (http://www.grinninglizard.com/tinyxml/) to parse each XML message and the libpq library (http://www.postgresql.org/docs/8.1/static/libpq.html) to handle the communication with the PostgreSQL database. Code was written to extract each separate XML message from each file of concatenated XML messages by doing consecutive searches for the beginning and end of an XML declaration. Once each separate XML message could be extracted, code was written to parse the XML message using the tinyxml library and extract the relevant data for each radar sensor reading. When the parsing of the XML messages and the extraction of the radar sensor readings had been verified to be correct, code to insert the radar sensor readings into the database was written using the libpq library. Radar sensor data from July, August, September and October 2010 was inserted into the database. Approximately 50% of the radar sensor readings stored in the XML files were duplicates, but each duplicate was filtered out, and no duplicates were inserted into the database.
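The extraction and duplicate filtering steps can be sketched as follows. This is an illustration in Python rather than the C++ program used in the study, and it makes several assumptions: the element name `reading` and the attribute names are hypothetical (the real XML schema is not reproduced in this report), and a reading is assumed to be identified by its gantry, lane and timestamp.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = "<?xml"


def split_concatenated_xml(text):
    """Yield each XML message from a string of concatenated messages."""
    start = text.find(XML_DECLARATION)
    while start != -1:
        # The next declaration marks the end of the current message.
        end = text.find(XML_DECLARATION, start + len(XML_DECLARATION))
        yield text[start:end if end != -1 else len(text)].strip()
        start = end


def extract_readings(text):
    """Parse every message and return the distinct radar sensor readings."""
    seen = set()
    readings = []
    for message in split_concatenated_xml(text):
        root = ET.fromstring(message)
        for node in root.iter("reading"):  # hypothetical element name
            reading = (
                node.get("gantry"),        # hypothetical attribute names
                int(node.get("lane")),
                node.get("timestamp"),
                node.get("status"),
                int(node.get("flow")),
                int(node.get("speed")),
            )
            key = reading[:3]  # ASSUMPTION: gantry, lane, timestamp are unique
            if key not in seen:
                seen.add(key)
                readings.append(reading)
    return readings
```

Keeping the keys in an in-memory set is the simplest way to filter duplicates; for several months of data, a unique constraint in the database would be an alternative.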

3.1.2 Radar Sensor Data Analysis and Characteristics

After the radar sensor data had been inserted into the database, the data was analyzed. The purpose of the analysis was to identify valid and invalid radar sensor readings, find the reasons for invalid radar sensor readings, and determine how many radar sensor readings are missing. The analysis covered the radar sensor data collected during July, August, September and October 2010 that had been inserted into the database. The radar sensor data was analyzed together with a document describing the meaning of the attributes for each radar sensor reading.

After the radar sensor data was analyzed, each radar sensor reading was classifiedas one of the following:

• Positive reading

• Zero reading

• Error reading

A positive reading is a radar sensor reading which is considered valid and has a positive fixed point traffic flow value. A zero reading is a radar sensor reading with a fixed point traffic flow value of zero. A zero reading can occur when there are no vehicles passing the radar sensor, or when there is congestion and the vehicles are not moving forward. An error reading is a radar sensor reading that is neither a positive reading nor a zero reading. An error reading occurs when the radar sensor is turned off, when there is an error with the radar sensor, or when there is a fault in the communication with the radar sensor. For some of the positive readings the fixed point average speed value is negative. The reason is an overflow: the value is sent as an unsigned byte but stored as a signed byte.
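The classification, and one way the negative speed values could be recovered, can be sketched as below. The sketch assumes that the status attribute can be reduced to a single boolean and that the speed fits in one byte; both function names are hypothetical, and the report does not state whether such a correction was applied in the study.

```python
def classify_reading(sensor_ok, flow):
    """Classify a radar sensor reading as 'positive', 'zero' or 'error'.

    ASSUMPTION: sensor_ok summarizes the status attribute of the reading.
    """
    if not sensor_ok or flow < 0:
        return "error"
    return "positive" if flow > 0 else "zero"


def as_unsigned_byte(stored_speed):
    """Reinterpret a signed byte (-128..127) as an unsigned byte (0..255)."""
    return stored_speed & 0xFF


# A speed of 140 km/h sent as an unsigned byte reads as -116 when signed.
recovered = as_unsigned_byte(-116)
```

Values that were never negative pass through `as_unsigned_byte` unchanged, so the correction can be applied to every stored speed.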

During July, August, September and October of 2010 there were in total 38 167 664 (42.8% of optimal) radar sensor readings including error readings. During the same period there were 34 768 696 (39.0% of optimal) valid sensor readings. The optimal percentage is 100% if there is a radar sensor reading for each minute, for each radar sensor, during the four months. Table 3.2 presents the number of valid radar sensor readings together with the percent of optimal for each of the four months.

Table 3.2. Valid radar sensor readings

Month (2010)   Valid sensor readings   Optimal percentage (%)
July           9 263 163               41.3
August         9 177 674               40.9
September      8 044 170               37.0
October        8 283 689               36.9

A suitable period of time was chosen during which the radar sensor data and the taxi floating car data could be compared. After looking at the graphs presented in figure 3.3 and figure 3.4, the radar sensor data from July was chosen for the comparison. The graphs show the number of valid radar sensor readings on the E4 motorway excluding on- and off-ramps. The number of radar sensor readings during July and August is more stable than during September and October. According to table 3.2, July has a higher optimal percentage of radar sensor readings than August, and therefore July was chosen over August.

Figure 3.3. Number of valid radar sensor readings during July and August, represented by the solid line. The dashed line represents the optimal number of radar sensor readings. Axes: date (horizontal) and radar sensor reading count (vertical).

Figure 3.4. Number of valid radar sensor readings during September and October. The dashed line represents the optimal number of radar sensor readings. Axes: date (horizontal) and radar sensor reading count (vertical).

3.2 Road Segments

A digital representation of the road network of Stockholm was provided to this study by the Division of Traffic and Logistics, in the form of connecting road segments of different lengths, where each road segment represents a small part of the road network. The length, coordinates and speed limit for each road segment were available in a database. Only road segments representing the 26 km part of the E4 motorway in Stockholm where the radar sensors are placed are used in this study, and their average length is 289 m. The longest road segment is 2 574 m and the shortest road segment is 0.9250 m. A connection between two road segments occurs at every point on the E4 motorway where there is an on- or off-ramp. At some points the connection between road segments occurs at random positions, limiting the length of the road segments. There is never one road segment per lane unless the lane is used as a separate road, such as lanes occurring by an on- or off-ramp.

The road segments are stored in a PostgreSQL database, each with a unique identifier along with its length, speed limit and coordinates. The road segments are also stored in a KML file, which can be opened in Google Earth, making it possible to view the road segments on top of a satellite image of the surrounding area.

3.3 Taxi Floating Car Data

The taxi floating car data used in this study is generated by taxis of a studied taxi company, with approximately 1 500 taxis operating in the Stockholm area [14]. In comparison, there were in total 5 639 active taxis registered in Stockholm County on December 31, 2009 [17]. The taxi dispatch system of each taxi continuously sends its GPS position, timestamp and status (occupied with passenger or unoccupied) to the taxi dispatch center with an average period of 110 s, depending on the status of the taxi. The GPS position, timestamp and status sent to the taxi dispatch center are then forwarded to the Division of Transport and Logistics at KTH.

The route of a taxi can be seen as a sequence of consecutive pairs, where each pair consists of a GPS position and a timestamp:

$$\mathit{sequence}_{\mathit{route}} = ((pos_1, ts_1), \ldots, (pos_n, ts_n)) \qquad (3.1)$$

An example of a route of a taxi can be seen in figure 3.5. The sequence of $n$ consecutive pairs in the route, $\mathit{sequence}_{\mathit{route}}$, can be divided into $n-1$ route pairs, where each route pair, $((pos_i, ts_i), (pos_{i+1}, ts_{i+1}))$, consists of two consecutive pairs from $\mathit{sequence}_{\mathit{route}}$. The process of generating the segment average speed for each road segment is presented in figure 3.6. The map-matching algorithm map-matches the route pairs to the road segments, producing the map-matched route pairs. To produce the segment average speed for each road segment, the map-matched route pairs are aggregated into a suitable period of time. The method of aggregating the map-matched route pairs into segment average speeds will be explained in the next chapter.
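The division of a route into route pairs can be illustrated with a short sketch. The representation of a position as a (latitude, longitude) tuple and of a timestamp as seconds is an assumption made for the example, as are the coordinate values.

```python
def route_pairs(route):
    """Split a route of n (position, timestamp) pairs into n - 1 route pairs."""
    return list(zip(route, route[1:]))


# Hypothetical route with three GPS positions, as in figure 3.5.
route = [
    ((59.3600, 18.0000), 0),
    ((59.3650, 18.0050), 75),
    ((59.3700, 18.0100), 160),
]
pairs = route_pairs(route)
```

Each element of `pairs` holds two consecutive entries of the route, so a route with three entries gives two route pairs.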

Figure 3.5. Example of a route of a taxi, sending its GPS position and timestamp three times: (GPS1, ts1), (GPS2, ts2) and (GPS3, ts3).

Figure 3.6. The process of generating the segment average speed from the road segments and the route pairs. The map-matching algorithm takes the road segments and the route pairs as input and produces the map-matched route pairs, which are aggregated into the segment average speed and the segment taxi flow.

The content of each map-matched route pair is:

• Route pair start timestamp

• Route pair end timestamp

• Sequence of road segments

• Offset for the first road segment

• Offset for the last road segment
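The content listed above could be represented as in the following sketch. It is not the format used in the study: the field names are hypothetical, and the interpretation of an offset as the distance in metres from the start of a road segment to the GPS position is an assumption. Under that assumption the distance driven, and from it an average speed for the route pair, follows directly.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MapMatchedRoutePair:
    start: datetime           # route pair start timestamp
    end: datetime             # route pair end timestamp
    segment_lengths_m: list   # lengths of the road segments, in driving order
    first_offset_m: float     # ASSUMED: distance from start of first segment
    last_offset_m: float      # ASSUMED: distance from start of last segment

    def distance_m(self):
        # Total segment length, minus the part before the first position
        # and the part after the last position.
        unused_tail = self.segment_lengths_m[-1] - self.last_offset_m
        return sum(self.segment_lengths_m) - self.first_offset_m - unused_tail

    def average_speed_kmh(self):
        seconds = (self.end - self.start).total_seconds()
        return self.distance_m() / seconds * 3.6


pair = MapMatchedRoutePair(
    start=datetime(2010, 7, 1, 8, 0, 0),
    end=datetime(2010, 7, 1, 8, 1, 0),
    segment_lengths_m=[600, 800, 500],
    first_offset_m=200,
    last_offset_m=300,
)
```

With these hypothetical values the taxi covers 1 500 m in 60 s. How this speed is distributed over the individual road segments is a separate question that this sketch does not address.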

A problem with floating car data can be the occasional low accuracy of the GPS positions and the existence of blind spots where a GPS position cannot be received. Another problem is that when the time period between the two GPS positions in a route pair is long, the probability of errors in the map-matching increases. Because of these problems, and since the characteristics of the map-matching algorithm used are unknown, the map-matching could be a source of error.

3.3.1 Floating Car Data Analysis and Characteristics

Map-matched route pairs for the 7 days presented in table 3.3 were provided to this study in a file by the Division of Traffic and Logistics. For these 7 days, only map-matched route pairs in the time period 06:00–22:00 were made available, except for Thursday 2010-07-01, where data is only available during 08:00–22:00.

Table 3.3. Taxi readings

Day of week   Date         Time period    Distinct taxis
Thursday      2010-07-01   08:00–22:00    1 226
Monday        2010-07-06   06:00–22:00    1 106
Tuesday       2010-07-07   06:00–22:00    1 136
Wednesday     2010-07-08   06:00–22:00    1 133
Tuesday       2010-07-13   06:00–22:00    928
Wednesday     2010-07-14   06:00–22:00    945
Thursday      2010-07-15   06:00–22:00    984

To make it easier to analyze and to prepare the map-matched route pairs for aggregation, a program was written with the help of the libpq library. The program parsed and extracted the map-matched route pairs from the file and inserted them into a database. As can be seen in table 3.3, the number of distinct active taxis during these days was found to vary between 928 and 1 226. After the map-matched route pairs had been analyzed, the average time between the start timestamp and the end timestamp was calculated as 81 s.

From the literature, one study compared the speed of taxis travelling with passengers and without passengers, and concluded that there is a significant difference in travel speed between the two cases [19]. The reason for the difference is that a taxi without passengers sometimes drives at a lower speed while looking for passengers, or stands still for moments. In this study, floating car data from both taxis travelling with passengers and taxis travelling without passengers is used. The reason is that the comparison is conducted on the E4 motorway, and there is no way for a taxi to pick up passengers along the E4 motorway.

Chapter 4

Comparison Preparation

To answer the questions of this study, it is necessary to compare the fixed point average speed generated by the radar sensors with the segment average speed generated by the taxis. It is also necessary to compare the fixed point traffic flow generated by the radar sensors with the segment taxi flow generated by the taxis. This chapter will explain the method used when aggregating the map-matched route pairs and the radar sensor readings. For each group of radar sensors on a gantry, the fixed point average speed is compared with the segment average speed for the road segment the gantry is located on. For that reason, each gantry is associated with a road segment, and this will be explained later in this chapter.

4.1 Aggregation Methodology

The radar sensor data, stored as radar sensor readings, consists of the fixed point average speed and fixed point traffic flow aggregated into 1-minute periods, with the timestamp for each radar sensor reading at the start of a minute. The floating car data, stored as map-matched route pairs, contains a start timestamp, an end timestamp and the sequence of road segments the taxi was driving on between the start timestamp and the end timestamp. The start timestamp is in most cases not at the start of a minute, and the duration of the map-matched route pairs is on average 81 s, but varies from a few seconds up to a few hundred seconds.

The radar sensor data stored as radar sensor readings and the floating car data stored as map-matched route pairs are not directly comparable. To make the radar sensor readings and the map-matched route pairs comparable, the two data sets have to be aggregated into a suitable period of time. For this study a time period of 5 minutes was chosen, since it was the most commonly used time period for aggregation found in the literature. A 5-minute time period will from here on be referred to as an aggregation period. In figure 4.1, the process of aggregating the radar sensor data and floating car data is illustrated. Since the segment taxi flow and the fixed point traffic flow are compared for different purposes than the segment average speed and the fixed point average speed, they are aggregated separately.
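Assigning a timestamp to its aggregation period amounts to rounding the timestamp down to the nearest multiple of 5 minutes. The following is a minimal sketch with a hypothetical function name. It handles a single timestamp only; which timestamp of a map-matched route pair decides its aggregation period is not covered here.

```python
from datetime import datetime, timedelta


def aggregation_period_start(ts, minutes=5):
    """Return the start of the aggregation period that contains ts."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


# A timestamp at 08:13:42 belongs to the period starting at 08:10:00.
period = aggregation_period_start(datetime(2010, 7, 1, 8, 13, 42))
```

A radar sensor reading, whose timestamp is already at the start of a minute, is assigned in the same way as any other timestamp.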


[Figure 4.1: flow diagram. Map-matched route pairs pass through a 5-minute aggregation into segment average speed and segment taxi flow; radar sensor readings pass through a 5-minute aggregation into fixed point average speed and fixed point traffic flow.]

Figure 4.1. The process of aggregating the map-matched route pairs and the radar sensor readings into 5-minute periods.

4.1.1 Aggregation of Radar Sensor Data

The radar sensor readings were stored in a database, and the timestamp of each radar sensor reading is at exactly the beginning of a minute. The radar sensor readings from all the radar sensors on a gantry in each direction were aggregated together into 5-minute aggregation periods and stored in a new database table. A database query was written to do this for each of the 7 days. For the 7 days, only approximately 40% of the radar sensor readings are available, and the remaining 60% are either missing or classified as error readings.

Fixed Point Average Speed

To calculate the fixed point average speed for each aggregation period, the average speed from the approximately 40% of radar sensor readings that are available is used. To simplify the study, no compensation is made for the missing 60% of radar sensor readings. The fixed point average speed is calculated for each gantry, using radar sensor readings from all the radar sensors located on the same gantry.

Fixed Point Traffic Flow

The fixed point traffic flow for a gantry in an aggregation period should represent the total number of vehicles passing the gantry during that 5-minute period. Since only approximately 40% of the radar sensor readings are available, the fixed point traffic flow will not represent the total number of vehicles passing each gantry during an aggregation period. For each gantry and aggregation period, the number of available radar sensor readings is saved in a database together with the fixed point traffic flow, making it possible to compensate for the missing radar sensor readings when the fixed point traffic flow is compared to the segment taxi flow.
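The radar aggregation described above can be sketched as follows. This is a minimal illustration, not the database query used in the study: the tuple layout, the function names and the use of an unweighted mean over the available readings are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

def period_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute aggregation period."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)

def aggregate_radar(readings):
    """Aggregate valid 1-minute readings per gantry and aggregation period.

    readings: iterable of (gantry, timestamp, speed_kmh, vehicle_count)
              (hypothetical layout; error readings are assumed to be
              filtered out beforehand).
    Returns {(gantry, period_start): (avg_speed, flow, num_readings)}.
    """
    groups = defaultdict(list)
    for gantry, ts, speed, count in readings:
        groups[(gantry, period_start(ts))].append((speed, count))

    result = {}
    for key, rows in groups.items():
        # Assumption: unweighted mean of the available readings.
        avg_speed = sum(speed for speed, _ in rows) / len(rows)
        flow = sum(count for _, count in rows)
        # The number of available readings is kept so that missing
        # readings can be compensated for later.
        result[key] = (avg_speed, flow, len(rows))
    return result
```

Keeping the reading count next to the flow mirrors the text: the flow alone understates traffic whenever readings are missing.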

4.1.2 Aggregation of Floating Car Data

The map-matched route pairs had previously been generated and stored in a database, where each map-matched route pair contains a start timestamp, an end timestamp, a sequence of road segments, and offsets:

• Route pair start timestamp, tsi

• Route pair end timestamp, tsj

• Sequence of road segments, (rs1, rs2, ..., rsn)

• Offset from the start of the first road segment in the sequence, offseti

• Offset from the start of the last road segment in the sequence, offsetj

The taxi floating car data stored as map-matched route pairs was aggregated into 5-minute aggregation periods. When aggregating the map-matched route pairs, two cases can occur for each map-matched route pair. In the first case, both tsi and tsj are in the same aggregation period. In the second case, tsi and tsj are in distinct aggregation periods and the map-matched route pair is therefore split up into two new map-matched route pairs. The second case is illustrated in figure 4.2. When tsi and tsj are in distinct aggregation periods, the map-matched route pair is split up by first finding the breakpoint in time, tsbp, which is the start timestamp of the aggregation period that contains tsj.

[Figure 4.2: timeline with aggregation period boundaries at 12:15:00, 12:20:00 and 12:25:00. A map-matched route pair starts at tsi in the first aggregation period and ends at tsj in the second; tsbp lies on the boundary at 12:20:00.]

Figure 4.2. An example of when tsi and tsj for a map-matched route pair are in distinct aggregation periods. The map-matched route pair is used for aggregation in both aggregation periods, and the map-matched route pair is split up into two new map-matched route pairs.

The ratio from equation (4.1) is multiplied by the total length of the map-matched route pair to find the breakpoint of the distance of the map-matched route pair.

(tsbp − tsi)/(tsj − tsi) (4.1)


The sequence of road segments from the map-matched route pair is then split up into two groups according to the breakpoint of the distance of the map-matched route pair. New offsets are calculated and the road segments are added to their respective aggregation periods.
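The split at the breakpoint can be sketched as below. The function names are hypothetical, and the sketch only computes the breakpoint in time, the ratio from equation (4.1) and the breakpoint in distance; assigning road segments and new offsets to each half is left out. It also assumes, as the text does, that a route pair spans at most two aggregation periods.

```python
from datetime import datetime

def period_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute aggregation period."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)

def split_route_pair(ts_i: datetime, ts_j: datetime, total_length_m: float):
    """Return (ts_bp, ratio, breakpoint_distance_m), or None if no split is needed."""
    ts_bp = period_start(ts_j)  # start of the aggregation period containing ts_j
    if period_start(ts_i) == ts_bp:
        return None  # first case: both timestamps are in the same period
    ratio = (ts_bp - ts_i) / (ts_j - ts_i)  # equation (4.1)
    return ts_bp, ratio, ratio * total_length_m
```

Using the time ratio as a distance ratio implicitly treats the taxi's speed as constant over the route pair, which is consistent with assigning one average speed to the whole pair.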

Segment Average Speed

For each map-matched route pair, the segment average speed, vavg, at which the taxi travelled from the start timestamp, tsi, to the end timestamp, tsj, is calculated as in equation (4.2), and each road segment in the map-matched route pair is associated with this average speed.

vavg = (len(rs1) + len(rs2) + ... + len(rsn−1) − offseti + offsetj)/(tsj − tsi) (4.2)

To calculate the segment average speed for a road segment in an aggregation period from the map-matched route pairs, a possible method is to take the average of all the speed values associated with that road segment in that aggregation period. However, the first road segment of a map-matched route pair also occurs as the last road segment in the preceding map-matched route pair, and the last road segment of a map-matched route pair also occurs as the first road segment in the subsequent map-matched route pair. Those road segments would then affect the aggregated average speed with a weight of 2. For that reason, a weight is associated with each road segment of each map-matched route pair together with the speed. If the road segment is neither the first nor the last road segment of a map-matched route pair, it is associated with weight 1. If the road segment is the first in the sequence of road segments in a map-matched route pair, it is associated with a weight calculated as in equation (4.3). If the road segment is the last in the sequence of road segments, it is associated with a weight calculated as in equation (4.4).

(len(rs1)− offseti)/len(rs1) (4.3)

offsetj/len(rsn) (4.4)

After the association of average speed and weight values, the sequence of road segments for each map-matched route pair consists of ((rs1, vavg, w1), ..., (rsn, vavg, wn)), where vavg is the average speed for the map-matched route pair and wi is the weight associated with road segment rsi. The road segments from all the map-matched route pairs can then be grouped into lists according to their aggregation period, ap, and road segment name, rsname: (ap, rsname, v1, w1), ..., (ap, rsname, vn, wn). The segment average speed for each road segment and aggregation period is calculated as in equation (4.5).

(v1 ∗ w1 + ...+ vn ∗ wn)/(w1 + ...+ wn) (4.5)
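Equations (4.2) to (4.5) can be put together in a short sketch. The function names are hypothetical, lengths and offsets are taken to be in metres and durations in seconds, and the handling of a route pair that lies within a single road segment is an assumption, since the text does not cover that case.

```python
def route_pair_speed(lengths, offset_i, offset_j, duration_s):
    """Equation (4.2): average speed (m/s) over one map-matched route pair."""
    distance = sum(lengths[:-1]) - offset_i + offset_j
    return distance / duration_s

def segment_weights(lengths, offset_i, offset_j):
    """Weights per road segment: equation (4.3) for the first segment,
    equation (4.4) for the last, and 1 for every segment in between."""
    if len(lengths) == 1:
        # Assumption: a route pair within one segment is weighted by the
        # fraction of that segment actually travelled.
        return [(offset_j - offset_i) / lengths[0]]
    weights = [1.0] * len(lengths)
    weights[0] = (lengths[0] - offset_i) / lengths[0]   # equation (4.3)
    weights[-1] = offset_j / lengths[-1]                # equation (4.4)
    return weights

def weighted_average_speed(observations):
    """Equation (4.5): observations is a list of (speed, weight) pairs for
    one road segment in one aggregation period."""
    total_weight = sum(w for _, w in observations)
    return sum(v * w for v, w in observations) / total_weight
```

For example, a route pair over segments of 100 m, 200 m and 300 m with offseti = 40 m and offsetj = 150 m covers 410 m and yields the weights 0.6, 1 and 0.5.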


Segment Taxi Flow

To calculate the segment taxi flow for each road segment in each aggregation period from the map-matched route pairs, the aggregated flow needs to represent the number of taxis passing the road segments associated to a gantry during the aggregation period. A possible method is to count each occurrence of the road segment in the map-matched route pairs in that aggregation period as a taxi passing the gantry. The problem with this simple approach is that a taxi can be counted twice. One such case is when the taxi sends its GPS position while driving on the road segment. The road segment will then be present as the last road segment in one map-matched route pair and as the first road segment in the next map-matched route pair, and the taxi will be counted twice. When the road segment is very long or the traffic along the road segment is moving slowly, it is also possible that the taxi sends its GPS position several times while driving on the same road segment. This results in several map-matched route pairs for the same road segment, and the taxi will be counted several times. To solve this problem, the weight explained earlier is used to represent the flow of a taxi. The flow of a road segment for an aggregation period, calculated for a taxi, is equal to the ratio of the distance the taxi travelled on that road segment in that aggregation period to the length of the road segment (see the example in figure 4.3). By having the weight represent the flow, each map-matched route pair contributes only the distance travelled on the road segment in that map-matched route pair divided by the total length of the road segment.

[Figure 4.3: a road segment with a gantry and three GPS positions, GPS1, GPS2 and GPS3, with timestamps 13:04:30, 13:05:00 and 13:05:40. Map-matched route pair 1 covers 0.6 of the road segment and map-matched route pair 2 covers 0.4.]

Figure 4.3. Example of how the flow is calculated from two map-matched route pairs, based on three consecutive GPS positions from the same taxi. The first GPS position is sent on a road segment not included in the association between gantry and road segment. The second GPS position is sent on the road segment associated to the gantry, and the third GPS position is sent on a road segment not associated to the gantry. The two map-matched route pairs are in two different aggregation periods, and the calculated flow for the gantry will therefore be different for the two aggregation periods for this taxi. In this example the flow for the taxi in the aggregation period 13:00:00–13:05:00 is 0.6 and the flow for the taxi in the aggregation period 13:05:00–13:10:00 is 0.4.
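The weight-based taxi flow can be sketched as a sum of weights per aggregation period and road segment. The function name and the tuple layout are hypothetical; the weights are assumed to have been computed per map-matched route pair as in equations (4.3) and (4.4).

```python
from collections import defaultdict

def segment_taxi_flow(observations):
    """Sum the weights per (aggregation period, road segment).

    observations: iterable of (period, segment, weight), one entry per
                  occurrence of a road segment in a map-matched route pair.
    """
    flow = defaultdict(float)
    for period, segment, weight in observations:
        flow[(period, segment)] += weight
    return dict(flow)

# The taxi from figure 4.3 contributes 0.6 to the first period and 0.4 to
# the second, so it adds up to one passing taxi instead of two.
example = segment_taxi_flow([
    ("13:00", "rs", 0.6),
    ("13:05", "rs", 0.4),
])
```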


4.2 Association of Road Segments and Radar Sensors

To be able to compare the fixed point average speed generated from the radar sensors with the segment average speed generated by the taxi floating car data, it is necessary to know which road segment each group of radar sensors on a gantry is located on. This section explains how each group of radar sensors located on a gantry is associated to the road segment the gantry is located on. To associate the radar sensors with the road segments on the E4 motorway in Stockholm, the KML-file containing the location of the road segments and the KML-file containing the location of the radar sensors are used. The KML-file with the radar sensors contains a yellow marker for each radar sensor, together with the name of the gantry and the lane the radar sensor is located on. The radar sensors are placed on gantries, with one radar sensor for each lane on the E4 motorway. Since there is only one road segment for all the lanes in a direction on the E4 motorway, but one radar sensor for each lane, the association is conducted between a road segment and a gantry.

The KML-file with the road segments contains a blue curve for each road segment on the road network near the E4 motorway. When the KML-file containing the road segments and the KML-file containing the radar sensors are opened in Google Earth simultaneously, the blue curves representing the road segments and the yellow markers representing the radar sensors are displayed above the satellite image of Stockholm. This makes it possible to distinguish which road segments are on the E4 motorway. The association between the gantries and the road segments is conducted in Google Earth, by associating each gantry with the road segment it is located on. Of the 155 gantries on the E4 motorway, it is possible to associate 152 to road segments. For 3 of the gantries the radar sensors are either not present in the KML-file or their positions are incorrect, and these gantries are therefore not included in the association. For some of the associations the gantry is located at the beginning or the end of a road segment. To make the association more uniform, additional road segments are included in the association where possible, so that the gantry is located in the middle of the distance of the road segments included in the association. The association between road segments and gantries is extended with 8 additional road segments for this purpose. After the association is completed, each pair of road segment and gantry is inserted into a database.

4.2.1 Characteristics of the Association

The association between road segments and gantries consists of 160 pairs in total, 76 pairs in the northbound direction and 84 pairs in the southbound direction. The total length of the road segments included in the association is 45 056 m, compared to a total length of 52 124 m for all the road segments in both the southbound and the northbound direction of the E4 motorway with radar sensors. The association thus covers 86.4% of the part of the E4 motorway which has radar sensors.


The road segments that are associated to a gantry will here be referred to as an association segment.

Table 4.1. Distribution of gantries grouped by the length of its association segment

Length (m)     Gantry count
0–100          12
100–200        26
200–300        13
300–400        9
400–500        12
500–600        9
600–700        3
700–800        9
800–900        17
900–1 000      8
1 000–1 500    19
1 500–2 000    11
>2 000         4

Table 4.2. Distribution of the number of gantries per association segment

Gantry count    Distribution
1               64
2               17
3               10
4               2
5               2
6               1

The length of each association segment varies between 36 m and 2 574 m, and table 4.1 presents the distribution of the gantries grouped by the length of their association segments. As can be seen, the lengths of the association segments are not uniform. Table 4.2 presents the distribution of the number of gantries per association segment. As can be seen, for some of the association segments there are several gantries associated to the same association segment. Appendix A presents a detailed list of the gantries and the lengths of the association segments. The road segments used in this study were not created for the purpose of comparing radar sensor data with floating car data. Therefore the length and placement of road segments in relation to gantries vary from association segment to association segment. Since some of the association segments are longer than 2 000 m, several gantries are associated to the same association segment. As is pointed out in the literature, the comparison of data from a fixed point sensor such as a radar sensor with floating car data will potentially produce some errors, since the conditions in the measurement from the two technologies are not uniform [6]. A potential source of error when comparing the radar sensor data with the floating car data is the fact that there is a large variation in the length of the association segments, and that there are several gantries associated to the same association segment.

Figure 4.4. A) A gantry located near the end of its association segment. B) A gantry located in the middle of its association segment. C) A gantry located near the beginning of its association segment. D) A gantry associated with two road segments to make the conditions more uniform. E) For some of the association segments there are several gantries associated to the same association segment.

The location of each gantry in relation to its association segment varies from gantry to gantry: some gantries are located at the beginning of the association segment, some in the middle and some at the end. This variation is illustrated in figure 4.4. The fact that the location of each gantry in relation to its association segment varies makes the conditions non-uniform, and this can be a potential source of error.

Table 4.3. Association segments grouped by their speed limit

Speed limit (km/h)    Count
30                    1
70                    66
90                    34
110                   3

Table 4.3 presents the different speed limits for the association segments and their distribution across the association segments. A speed limit of 70 km/h is the most common speed limit for the association segments, followed by 90 km/h.


Part II

Comparison


Chapter 5

Statistical Background

The statistical theory in this chapter is based on theory from the literature [9, 12, 18]. If X = {x1, ..., xn} is a set of values, the mean of the set is calculated as in equation (5.1).

$$\bar{X} = \frac{x_1 + \dots + x_n}{n} \tag{5.1}$$

To measure the variability of the set X, the standard deviation can be calculated as in equation (5.2).

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^2} \tag{5.2}$$

The standard deviation measures how much the values of the set vary from the mean value. The coefficient of variation is the ratio between the standard deviation and the mean of the set, and can be calculated as in equation (5.3).

$$CV = \frac{\sigma}{|\bar{X}|} \tag{5.3}$$

The coefficient of variation is useful when comparing the degree of variation from one set to another.
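As a small worked illustration of equations (5.1) to (5.3), the three quantities can be computed as below. The function names and the sample values are made up for the example; the standard deviation uses the population form with divisor n, as in equation (5.2).

```python
import math

def mean(xs):
    """Equation (5.1): arithmetic mean."""
    return sum(xs) / len(xs)

def std_dev(xs):
    """Equation (5.2): population standard deviation (divisor n)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def coeff_of_variation(xs):
    """Equation (5.3): standard deviation relative to the absolute mean."""
    return std_dev(xs) / abs(mean(xs))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not study data
# mean(values) is 5.0, std_dev(values) is 2.0, coeff_of_variation(values) is 0.4
```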

5.1 Regression Analysis

Regression is the analysis of the relationship between two variables X and Y. With n pairs of observations, (x1, y1), ..., (xn, yn), the goal of a regression analysis is to analyze how the values of X impact the values of Y. Y is usually called the response or dependent variable, and X is called the explanatory, predictor or independent variable. In a simple linear regression model there is a single explanatory variable and a single response variable, and the response variable is modeled by equation (5.4), where β0 is the intercept, β1 is the slope and ε is the random error component.

Y = β0 + β1X + ε (5.4)


In a scatter plot graph, the explanatory variable of each observation is plotted against the response variable of each observation. The simple linear regression model can be plotted as a straight line, which is then called the regression line. To model the n pairs of observations with the regression model, the model has to be fitted to the observations, which can be done by estimating the values of β0 and β1 from the n observations. For each observation (xi, yi) the deviation from the regression model can be calculated as in equation (5.5), and it can be seen as the signed vertical distance from the point to the regression line.

yi − (β0 + β1xi) (5.5)

To estimate the parameters β0 and β1, the method of least squares can be used, where the sum of the squared deviations for each observation is considered.

$$Q = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 \tag{5.6}$$

The method of least squares estimates the values of β0 and β1 by choosing those values that minimize the sum Q.
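For simple linear regression, minimizing Q has a well-known closed-form solution: the slope is the sum of the products of the centred x and y values divided by the sum of the squared centred x values, and the intercept follows from the means. A minimal sketch, with a hypothetical function name and made-up sample data:

```python
def least_squares_fit(xs, ys):
    """Return (beta0, beta1) minimizing the sum Q in equation (5.6)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = s_xy / s_xx               # slope
    beta0 = y_mean - beta1 * x_mean   # intercept
    return beta0, beta1

# Points lying exactly on y = 1 + 2x give intercept 1 and slope 2.
beta0, beta1 = least_squares_fit([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
```

The sketch requires at least two distinct x values; otherwise the denominator is zero and no unique line exists.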

5.2 Causality

A response variable Y is said to depend causally on the explanatory variable X if a cause-and-effect pattern can be implied. Finding a statistical relationship between the response variable, Y, and the explanatory variable, X, is not enough to imply that Y depends causally on X. Additional analyses are needed to obtain a better understanding of the causal relationship between the explanatory variable and the response variable, and to exclude the possibility of confounding factors. A confounding factor, also called a lurking variable, is a variable not included in the statistical analysis, but affecting the results of the statistical analysis by falsely implying a relationship between the response variable and the explanatory variable.

5.3 Observational Study

Observational studies are used to observe and model the relationship between different variables of data in a study. For an observational study, it is only possible to establish a statistical relationship between variables in the data, but usually not possible to infer a causal relationship between them. A causal relationship can only be inferred if the potential confounding factors can be identified and analyzed to rule out the possibility of alternative causal relationships between factors. With observational studies there is a risk of observing an outlier which could be the result of an error in the data. In contrast to observational studies, where the values of the explanatory variable can only be observed, the values of an explanatory variable in an experimental study can be controlled by altering them and then observing the changes in the response variable. For this reason, a causal relationship between the explanatory and the response variable can usually only be inferred in experimental studies.


Chapter 6

Comparison Methodology

This chapter explains the methodology for the comparison used to generate the results presented in the next chapter. Previous chapters explained how the segment average speed and segment taxi flow were calculated from the taxi floating car data, and how the fixed point average speed and fixed point traffic flow were calculated from the radar sensor readings. To answer the questions of this study, the segment average speed is compared to the fixed point average speed, and the segment taxi flow is compared to the fixed point traffic flow. Only data from the time periods of the 7 days presented in table 6.1 is used in the comparison, since taxi floating car data was provided to this study only for these 7 days.

Table 6.1. Time periods for comparison

Day of week    Date                       Comparison points
Thursday       2010-07-01 08:00–22:00     14 908
Monday         2010-07-06 06:00–22:00     15 624
Tuesday        2010-07-07 06:00–22:00     16 544
Wednesday      2010-07-08 06:00–22:00     16 755
Tuesday        2010-07-13 06:00–22:00     14 110
Wednesday      2010-07-14 06:00–22:00     15 593
Thursday       2010-07-15 06:00–22:00     15 370

6.1 Average Taxi Penetration Rate

This section explains the methodology for comparing the fixed point traffic flow, calculated from the radar sensors, with the segment taxi flow, calculated from the floating car data. To compare the fixed point traffic flow with the segment taxi flow, the estimated average taxi penetration rate during the time periods for the 7 days is calculated. The taxi penetration rate is here defined as the percentage of taxis from the studied taxi company passing the gantries, compared to the estimated total number of vehicles passing the gantries during the time periods of the 7 days.


The total number of taxis of the studied taxi company passing the gantries during the time periods of the 7 days will from here on be referred to as the taxi flow. For each association segment and each aggregation period, the segment taxi flow is available. The taxi flow is calculated by summing the segment taxi flow for all association segments available for the time periods of the 7 days. The total number of vehicles passing the gantries calculated from the available radar sensor readings will from here on be referred to as the available traffic flow. As previously mentioned, there are missing radar sensor readings, and it is therefore not possible to calculate the total number of vehicles passing the gantries during the time periods of the 7 days directly. But since the optimal percentage, the share of the optimal number of radar sensor readings that is available, is known, the total number of vehicles passing the gantries during the time periods of the 7 days can be estimated. This estimate will from here on be referred to as the estimated traffic flow.

Table 6.2 presents the available traffic flow of the radar sensor readings included in the aggregation for each day, together with the optimal percentage. If there is a radar sensor reading for each radar sensor on each gantry for all aggregation periods, the optimal percentage is 100%. But since there are missing and erroneous radar sensor readings for some of the aggregation periods, the optimal percentage of the radar sensor readings is not 100% for any of the 7 days. Table 6.2 shows that the optimal percentage varies across the 7 days.

Table 6.2. Available and estimated flow from the radar sensors including percentage of optimal radar sensor readings.

Date                       Available traffic flow    Estimated traffic flow    Optimal percentage (%)
2010-07-01 08:00–22:00     5 380 510                 5 442 270                 66.3
2010-07-06 06:00–22:00     2 451 420                 5 712 904                 33.9
2010-07-07 06:00–22:00     2 586 050                 5 794 535                 35.7
2010-07-08 06:00–22:00     2 943 060                 5 941 406                 40.3
2010-07-13 06:00–22:00     2 484 510                 5 118 310                 39.1
2010-07-14 06:00–22:00     2 613 000                 5 367 541                 40.5
2010-07-15 06:00–22:00     2 512 940                 5 236 555                 39.1

To calculate the average taxi penetration rate for the 7 days, the estimated traffic flow has to be estimated from the available traffic flow, since the optimal percentage for the 7 days is not 100%. To calculate the estimated traffic flow, it is assumed that for each gantry, in each aggregation period, the number of vehicles passing the gantry is equal for each lane. The estimated total flow, flest, for each gantry in each aggregation period is calculated as in equation (6.1), using the optimal number of radar sensor readings for the aggregation period, opt, the available number of radar sensor readings for the aggregation period, num, and the flow from the aggregation period, fl. The optimal number of radar sensor readings, opt, for a gantry in an aggregation period is 5 radar sensor readings for each lane.

flest = (opt/num) ∗ fl (6.1)

The estimated traffic flow is only based on the fixed point traffic flow values from a gantry in an aggregation period that are calculated from at least 1 radar sensor reading. The estimated traffic flow from the radar sensor readings is assumed to be correct. The impact of this assumption is unknown, and it could be a possible source of error. The estimated traffic flow for each day is presented in table 6.2.
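Equation (6.1) can be illustrated with a short sketch. The function name, the parameter names and the numbers in the example are assumptions for illustration; the only facts taken from the text are that opt is 5 readings per lane and that at least 1 reading is required.

```python
def estimated_flow(flow, num_readings, lanes, readings_per_lane=5):
    """Equation (6.1): scale the available flow by opt / num.

    flow:          fixed point traffic flow from the available readings (fl)
    num_readings:  number of available readings in the period (num)
    lanes:         number of lanes (radar sensors) on the gantry
    """
    if num_readings < 1:
        raise ValueError("at least 1 radar sensor reading is required")
    opt = readings_per_lane * lanes  # optimal number of readings
    return opt / num_readings * flow

# Illustrative numbers: a 3-lane gantry with 6 of 15 readings available
# and 120 counted vehicles is scaled up to 300 vehicles.
example_flow = estimated_flow(flow=120, num_readings=6, lanes=3)
```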

To compute the taxi flow for each of the 7 days, the previously calculated weight values for each association segment in a map-matched route pair are summed. Since radar sensor readings are not available for all aggregation periods and gantries, the segment taxi flow for all aggregation periods and association segments cannot be used. The taxi flow and the estimated traffic flow are only calculated for the aggregation periods and gantries which have a fixed point traffic flow value.

The association between gantries and road segments is extended to make the conditions as uniform as possible, and for some of the gantries there are several road segments associated to the same gantry. The association segments from the extended association are not used when calculating the taxi flow, to prevent individual taxis from being counted twice. The segment taxi flow from association segments that have several gantries associated to them is only counted once for each distinct association segment, to prevent the same taxi from being counted several times.

The average taxi penetration rate for the 7 days is calculated as the taxi flow divided by the estimated traffic flow. The average taxi penetration rate for the separate days and the separate hours of each of the 7 days is calculated as well.

6.2 Taxi and Traffic Average Speed Comparison

For every 5-minute aggregation period during 06:00–22:00 for the 7 days, the fixed point average speed and fixed point traffic flow from a gantry are combined with the segment average speed and segment taxi flow for the gantry's association segment in the same aggregation period. The data sets are combined using the association segments, creating comparison points. If the fixed point traffic flow for a comparison point is 0, meaning that all the radar sensor readings are either missing or erroneous, that comparison point is not included. If there is no segment average speed or segment taxi flow for an aggregation period and association segment, that comparison point is not included either. For the 7 days there are in total 108 904 comparison points, which is approximately 53% of the optimal number of comparison points for the 152 gantries included in the association. The distribution of the comparison points for the 7 days is presented in table 6.1. Each comparison point consists of the following information:

• Gantry name

• Road segment name


• Timestamp of the aggregation period

• Fixed point average speed

• Fixed point traffic flow

• Segment average speed

• Segment taxi flow
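The construction of comparison points can be sketched as a join between the two aggregated data sets through the gantry association. The dictionary layouts, key names and function name below are hypothetical; the two exclusion rules follow the description above.

```python
def build_comparison_points(radar, taxi, association):
    """Join radar and taxi aggregates into comparison points.

    radar:       {(gantry, period): (avg_speed, flow)}
    taxi:        {(segment, period): (avg_speed, flow)}
    association: {gantry: segment}
    """
    points = []
    for (gantry, period), (v_radar, fl_radar) in radar.items():
        if fl_radar == 0:
            continue  # all radar readings missing or erroneous
        segment = association.get(gantry)
        taxi_obs = taxi.get((segment, period))
        if taxi_obs is None:
            continue  # no taxi data for this segment and period
        v_taxi, fl_taxi = taxi_obs
        points.append({
            "gantry": gantry,
            "segment": segment,
            "period": period,
            "v_radar": v_radar,
            "fl_radar": fl_radar,
            "v_taxi": v_taxi,
            "fl_taxi": fl_taxi,
        })
    return points
```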

To answer the question of how many taxis are needed to have certainty in the speed measurement from the taxis, it is possible either to calculate the penetration rate of taxis in each comparison point and use that as a basis, or to use the segment taxi flow in each comparison point as a basis for investigating the relationship. Figure 6.1 presents the distribution of comparison points grouped by the segment taxi flow. It can be seen that 85.5% of the comparison points had a segment taxi flow of 5 or less, and 98.7% of the comparison points had a segment taxi flow of 10 or less. The segment taxi flow for each comparison point is low, and if the taxi penetration rate is calculated, the fixed point traffic flow can have a large impact on the size of the taxi penetration rate in each comparison point. For that reason, the segment taxi flow in the comparison points is used instead of the penetration rate of taxis in each comparison point for investigating the relationship between the number of taxis and the certainty in the speed measurement.

[Figure 6.1: bar chart titled "Distribution of distinct segment taxi flow". X-axis: segment taxi flow, 1 to 10. Y-axis: comparison points, 0 to 40 000. Share of comparison points per segment taxi flow: 1: 29.7%, 2: 22.6%, 3: 15.2%, 4: 10.6%, 5: 7.5%, 6: 5.1%, 7: 3.4%, 8: 2.3%, 9: 1.4%, 10: 0.9%.]

Figure 6.1. Distribution of comparison points grouped by the segment taxi flow per comparison point with the percentage of the comparison points above each bar.


The comparison points are grouped by the segment taxi flow in each comparison point. The comparison speed difference, the difference between the fixed point average speed previously computed from the radar sensor readings and the segment average speed previously computed from the taxi floating car data, is computed for each comparison point. The comparison speed difference, vd, for each comparison point is calculated as in equation (6.2), using the fixed point average speed, vradar, and the segment average speed, vtaxi, in each comparison point.

vd = vtaxi − vradar (6.2)
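A sketch of how equation (6.2) could be averaged per group of segment taxi flow is given below. The function name and the input layout are hypothetical, and the caller is assumed to supply the group key, since the way fractional segment taxi flow values are mapped to distinct groups is not specified here.

```python
from collections import defaultdict

def mean_speed_difference_by_flow(points):
    """Average v_d = v_taxi - v_radar per segment taxi flow group.

    points: iterable of (flow_group, v_taxi, v_radar).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for flow_group, v_taxi, v_radar in points:
        sums[flow_group] += v_taxi - v_radar  # equation (6.2)
        counts[flow_group] += 1
    return {group: sums[group] / counts[group] for group in sums}
```

A negative average means that the taxis were on average slower than the speed measured by the radar sensors for that group.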

6.2.1 Certainty on all the Traffic

A graph is created to present the relationship between the segment taxi flow and the comparison speed difference. To further understand the relationship between the segment taxi flow and the comparison speed difference in a comparison point, a scatter plot graph is created for each of the distinct segment taxi flow values. Each scatter plot graph contains the segment average speed plotted against the fixed point average speed for each comparison point using a density function, where the color is darker where there is a high concentration of comparison points and lighter where there is a lower concentration. In each graph the fitted regression line is plotted for each data set, using the least squares estimator to estimate the slope and intercept of the regression line.

6.2.2 Certainty during Time of the Day

To answer the question of how the certainty of the speed of the taxis changes during the day, the comparison points are grouped not only by the segment taxi flow but also by each of the 16 hours of the day during 06:00–22:00. A graph is created for each of the 16 hours, and in each graph the comparison speed difference is plotted for each distinct segment taxi flow value for that hour.

6.2.3 Certainty during Congestion

To investigate the certainty of the speed measured from taxis during congestion, the comparison points are divided into groups classified as either congested or free-flow. The distribution of the speed limits across the association segments presented earlier showed that the majority of the association segments have a speed limit of either 70 km/h or 90 km/h, and that a few association segments have a speed limit of 110 km/h. For that reason, the traffic conditions for the comparison points are grouped into three groups: congested, free-flow and high-speed free-flow. Which group a comparison point belongs to is based on the fixed point average speed:

• If the fixed point average speed of a comparison point is below 70 km/h, the comparison point belongs to the group congested.

• If the fixed point average speed of a comparison point is between 70 km/h and 110 km/h, the comparison point belongs to the group free-flow.

• If the fixed point average speed of a comparison point is above 110 km/h, the comparison point belongs to the group high-speed free-flow.
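The three rules can be expressed as a small classification function. The function name is hypothetical, and treating speeds of exactly 70 km/h and exactly 110 km/h as free-flow is an assumption, since the rules above do not state how the boundary values are handled.

```python
def traffic_condition(v_radar_kmh: float) -> str:
    """Classify a comparison point by its fixed point average speed."""
    if v_radar_kmh < 70:
        return "congested"
    if v_radar_kmh <= 110:  # assumption: both boundaries count as free-flow
        return "free-flow"
    return "high-speed free-flow"
```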

Table 6.3 presents the distribution of the comparison points into the three groups based on the fixed point average speed of each comparison point. The majority of the comparison points are part of the free-flow group, followed by the congestion group.

Table 6.3. Distribution of comparison points into groups of traffic conditions

Group                    Count
Congestion              14 108
Free-flow               92 396
High-speed free-flow     2 400

To present the relationship between the segment taxi flow and the comparison speed difference, a graph is created for each of the three groups of traffic conditions. Figure 6.2 presents the distribution of comparison points over the segment taxi flow in each of the three groups. The distributions for the congested and free-flow groups follow the same pattern as the distribution in figure 6.1.

[Figure 6.2: three bar charts with the panel titles Congestion, Free-flow and High-speed free-flow; x-axis: distinct segment taxi flow; y-axis: comparison points.]

Figure 6.2. Distributions of comparison points grouped by the distinct segment taxi flow, with one graph for each of the three groups of traffic conditions. Note the difference in the scales between the three graphs.

To further understand the relationship between the segment taxi flow in a comparison point and the comparison speed difference, a scatter plot graph is created for each distinct segment taxi flow value. Each scatter plot graph presents the relationship between the segment average speed and the fixed point average speed. Together with the scatter plot graph for each distinct segment taxi flow value, the regression line is calculated and plotted for each of the traffic condition groups.


Chapter 7

Results

In this chapter the results are presented, based on the comparison methodology explained in the previous chapters. The results are presented with a section for each of the research questions specified for this study. The results are based on the taxi floating car data and radar sensor data during 06:00–22:00 for the 7 days in July listed in table 7.1, except for Thursday 2010-07-01, where data is only available during 08:00–22:00.

Figures and tables are presented to explain the relationship between the taxi floating car data and the radar sensor data. Previous chapters explained how the radar sensor readings and the taxi floating car data were prepared and aggregated into 5-minute periods, and how they were combined into comparison points. To answer the questions regarding the certainty of the speed measured from taxis, a total of 108 904 comparison points were used.

7.1 Taxi Penetration Rate

The taxi penetration rate can be interpreted as the average percentage of taxis from the studied taxi company passing the gantries on the E4 motorway, in relation to the estimated number of vehicles passing the gantries. With the help of tables and graphs, this section presents the results that make it possible to answer the following research question specified in the beginning of this report:

• What is the fraction of taxis compared to the general traffic? Per day? Per hour?

The average taxi penetration rate for the traffic measured at the gantries during the time periods for the 7 days is 0.52%. Table 7.1 presents the taxi flow calculated from the taxi floating car data and the estimated traffic flow calculated from the radar sensor data for each of the 7 days.

Table 7.1. Flow of traffic

Date         Taxi flow   Estimated traffic flow   Taxi penetration rate (%)   Comparison points
2010-07-01      32 991                5 442 270                        0.61              14 908
2010-07-06      28 707                5 712 904                        0.50              15 624
2010-07-07      30 741                5 794 535                        0.53              16 544
2010-07-08      33 908                5 941 406                        0.57              16 755
2010-07-13      22 909                5 118 310                        0.45              14 110
2010-07-14      24 055                5 367 541                        0.45              15 593
2010-07-15      25 838                5 236 555                        0.49              15 370
Total          199 149               38 613 521                        0.52             108 904
Average         28 450                5 516 217                           -                   -

The lowest taxi penetration rate is measured as 0.45% on Tuesday 2010-07-13 and Wednesday 2010-07-14. The highest taxi penetration rate is measured as 0.61% on Thursday 2010-07-01. As can be seen in figure 7.1, the taxi flow in relation to the average taxi flow for the 7 days varies more than the estimated traffic flow in relation to the average estimated traffic flow for the 7 days. The standard deviation for the taxi flow is σ_taxi = 4 332 and the coefficient of variation is CV_taxi = 15.2%. The standard deviation for the estimated traffic flow is σ_traffic = 305 790 and the coefficient of variation is CV_traffic = 5.5%.
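The figures in this section can be recomputed from the daily values in table 7.1. The sketch below is only an illustration: the variable names are hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, chosen because it appears to agree closely with the reported values.

```python
from statistics import mean, stdev

# Daily values copied from table 7.1, in date order.
taxi_flow = [32991, 28707, 30741, 33908, 22909, 24055, 25838]
traffic_flow = [5442270, 5712904, 5794535, 5941406, 5118310, 5367541, 5236555]

# Taxi penetration rate over all 7 days, in percent.
total_rate = 100.0 * sum(taxi_flow) / sum(traffic_flow)

# Sample standard deviation (assumed) and coefficient of variation in percent.
sd_taxi = stdev(taxi_flow)
sd_traffic = stdev(traffic_flow)
cv_taxi = 100.0 * sd_taxi / mean(taxi_flow)
cv_traffic = 100.0 * sd_traffic / mean(traffic_flow)

print(f"penetration rate: {total_rate:.2f} %")
print(f"taxi flow:    sd = {sd_taxi:.0f}, CV = {cv_taxi:.1f} %")
print(f"traffic flow: sd = {sd_traffic:.0f}, CV = {cv_traffic:.1f} %")
```

The coefficient of variation divides the standard deviation by the mean, which is what allows the variation of the taxi flow to be compared with that of the much larger estimated traffic flow.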

[Figure 7.1: x-axis: date (01/07 to 15/07); y-axis: taxi penetration rate (%).]

Figure 7.1. Taxi penetration rates for each of the 7 days represented by a dot, and the average taxi penetration rate for all the 7 days represented by a dashed line.

Figure 7.2 shows how the taxi penetration rate varies over the 16 hours. From figure 7.2 it is difficult to see any relationship between the taxi penetration rate and the hour of the day, except that the taxi penetration rate varies more during 21–22 than during the previous hours.

[Figure 7.2: x-axis: hour; y-axis: taxi penetration rate (%).]

Figure 7.2. Taxi penetration rates grouped by the hour of the day. Taxi penetration rates for each of the 7 days are represented by a dashed line, and the average taxi penetration rate during the 7 days is represented by a solid line.

7.2 Certainty of Speed Measurement of Taxis

This section presents the results that help answer the following research question specified in the beginning of this report:

• How many taxis are needed to have certainty in the speed measurement from the taxis?

In figure 7.3 the comparison points are grouped by the segment taxi flow value, and the figure presents the average comparison speed difference for each of the distinct segment taxi flow values. As can be seen from the figure, the segment average speed calculated from the taxi floating car data is between 1.5 km/h and 8.3 km/h higher than the fixed point average speed calculated from the radar sensor data. For groups of comparison points with a segment taxi flow value from 1 to 6, figure 7.3 indicates that the average comparison speed difference increases as the segment taxi flow increases. For groups of comparison points with a segment taxi flow value from 7 to 10, figure 7.3 indicates that the comparison speed difference stabilizes at approximately 8 km/h.
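The grouping behind figure 7.3 can be sketched as follows. This is a minimal illustration and not the software used in the study: the function name and the sample comparison points are hypothetical, and the comparison speed difference is assumed to be the segment average speed minus the fixed point average speed, so that a positive value means the taxis are faster than the radar sensor readings.

```python
from collections import defaultdict
from statistics import mean


def mean_speed_difference_by_flow(comparison_points):
    """Average comparison speed difference per distinct segment taxi flow.

    Each comparison point is a tuple of
    (segment_taxi_flow, segment_avg_speed_kmh, fixed_point_avg_speed_kmh).
    """
    groups = defaultdict(list)
    for flow, segment_speed, fixed_point_speed in comparison_points:
        # Assumed sign convention: segment speed minus fixed point speed.
        groups[flow].append(segment_speed - fixed_point_speed)
    return {flow: mean(diffs) for flow, diffs in sorted(groups.items())}


# Hypothetical comparison points, not taken from the study's data.
points = [
    (1, 82.0, 80.0),
    (1, 91.0, 90.0),
    (2, 88.0, 84.0),
    (2, 76.0, 70.0),
]
result = mean_speed_difference_by_flow(points)
print(result)
```

The same grouping can be repeated per hour of the day or per traffic condition group by filtering the comparison points before calling the function.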

[Figure 7.3: x-axis: segment taxi flow (1 to 10); y-axis: speed difference (km/h).]

Figure 7.3. Average comparison speed difference for comparison points grouped by the segment taxi flow.

Figure 7.4 presents the density and spread when the segment average speed is plotted against the fixed point average speed. The figure contains one scatter plot for each of a selection of the distinct segment taxi flow values in the comparison points. If the segment average speed in each comparison point were equal to the fixed point average speed in the same comparison point, all the points would lie on the perfect regression line, that is, the line where the two speeds are equal. The scatter plots show a higher variation in the graphs with a lower segment taxi flow value in the comparison points. For all the scatter plot graphs in figure 7.4 there is a higher concentration of points in the range from 70 km/h to 100 km/h. Figure C.1 in Appendix C presents a scatter plot graph for each of the distinct segment taxi flow values.

[Figure 7.4: four scatter plot panels, the first titled "Segment taxi flow = 1"; x-axis: fixed point average speed (km/h); y-axis: segment average speed (km/h).]

Figure 7.4. A scatter plot for some of the distinct segment taxi flow values, presenting the relationship between the segment average speed and the fixed point average speed from the comparison points. The solid line represents the perfect regression line and the dashed line represents the fitted regression line.

[Figure 7.5: 16 panels titled Hour 06–07 to Hour 21–22; x-axis: segment taxi flow; y-axis: speed difference (km/h).]

Figure 7.5. The comparison points are grouped into the 16 hours of the day, with one graph for each hour. Each graph presents the comparison speed difference for each of the distinct segment taxi flow values during the hour.

7.3 Certainty of Speed Measurement during Different Hours of the Day

This section presents the results that help answer the following research question specified in the beginning of this report:

• Does the number of taxis needed to have certainty in the speed measurement change during the time of the day?

In figure 7.5 the comparison speed difference is plotted against the distinct segment taxi flow values for each of the 16 hours. For each of the 16 hours, the comparison speed difference for a segment taxi flow value of 1 to 5 follows the same pattern as presented in figure 7.3: the comparison speed difference is approximately 2 km/h for a segment taxi flow equal to 1 and then increases as the segment taxi flow value increases, but at different rates. For distinct segment taxi flow values above 7, a higher variation can be seen across the 16 hours. For some of the graphs in figure 7.5 the comparison speed difference is approximately 8 km/h, but for other graphs it is slightly higher or much lower.

7.4 Certainty of Speed Measurement of Taxis during Congestion

This section presents the results that help answer the following research question specified in the beginning of this report:

• Does the number of taxis needed to have certainty in the speed measurement change when there is congestion?

Figure 7.6 shows that the average comparison speed difference differs between the three classifications of traffic. The average comparison speed difference for free-flow traffic conditions is closest to the average comparison speed difference for all the traffic presented in figure 7.3. The average comparison speed difference for traffic conditions classified as either congested or high-speed free-flow differs from the one for all the traffic presented in figure 7.3. For congested conditions the segment average speed is between 6 km/h and 12 km/h higher than the fixed point average speed. For the traffic classified as high-speed free-flow, the segment average speed is between 18 km/h and 29 km/h lower than the fixed point average speed.

[Figure 7.6: three panels titled Congestion, Free-flow and High-speed free-flow; x-axis: segment taxi flow; y-axis: speed difference (km/h).]

Figure 7.6. Average comparison speed difference grouped by the distinct segment taxi flow, with one graph for each of the three classifications of traffic conditions: congested, free-flow and high-speed free-flow. Note the different scale for the last graph.

In figure 7.7 the scatter plots show that the comparison points with a fixed point average speed above 110 km/h are clearly different from those below 110 km/h. For the comparison points with a fixed point average speed between 100 km/h and 250 km/h, the majority of the comparison points have a segment average speed of approximately 100 km/h. The regression line for the high-speed free-flow classification is horizontal, which reflects this statistical relationship. Figure C.2 in Appendix C presents a scatter plot for each of the distinct segment taxi flow values.

[Figure 7.7: four scatter plot panels for the segment taxi flow values 1, 2, 9 and 10; x-axis: fixed point average speed (km/h); y-axis: segment average speed (km/h).]

Figure 7.7. A scatter plot for some of the distinct segment taxi flow values, where each graph has a solid line representing the perfect regression, a dotted line for the regression of the congested traffic, a dashed line for free-flow traffic and a dot-dashed line for high-speed free-flow traffic. The two dotted vertical lines are the dividers between the three different classifications.

Chapter 8

Discussion and Conclusion

In this chapter the results will be discussed and the conclusions will be presented.

8.1 Discussion

The study conducted, which investigates the relationship between the radar sensor data and the floating car data, is an observational study that looks at data from the radar sensors and taxis in retrospect. Since it is an observational study and not a controlled experiment, it is only possible to find statistical relationships between the data, and not possible to draw any conclusions about causal relationships. It was also planned to investigate the certainty of the speed measurement of the taxis during different days of the week. This research question was not included in the results, since data from only 7 days were available and no results could therefore be obtained for it.

From the literature it is clear that the fixed point radar sensors and the taxi floating car data are two different kinds of technology, each with its own characteristics. It is therefore difficult to compare the segment taxi flow with the fixed point traffic flow, and the segment average speed with the fixed point average speed. Some of the factors that can be a problem when comparing the fixed point radar sensor data with the taxi floating car data are:

• The length of the association segments associated with the gantries varies. The longest association segment is 2 574 m and the shortest is 36 m.

• Because some association segments are very long, there are in some cases several gantries associated with the same association segment.

• In relation to the association segment, the gantries are sometimes placed at the beginning, sometimes at the end, and sometimes in the middle of the association segment.

• When the fixed point traffic flow and fixed point average speed at a gantry are calculated, the radar sensor readings from each lane are aggregated together. It is possible that the fixed point traffic flow and fixed point average speed of the radar sensor readings from a single radar sensor at the gantry differ from the aggregated fixed point traffic flow and fixed point average speed for all the radar sensors at the gantry.

• On average for the 7 days, only approximately 40 % of the radar sensor readings are available. For some gantries there are no radar sensor readings available for some of the aggregation periods. For other aggregation periods and gantries, radar sensor readings are missing from some of the radar sensors on the gantry, and the fixed point traffic flow for that gantry and time period has to be estimated. The impact on the results of the estimated fixed point traffic flow and of the missing data for some time periods for some of the gantries is unknown.

A goal of this study is to find out how many taxis are needed to have certainty in the speed measurement from the taxis. Since there are several other potential factors that can affect the certainty of the speed measurements, no conclusions on the number of taxis can be made. As can be seen in Appendix A, the comparison speed difference in particular, but also the segment taxi flow, varies from gantry to gantry. The reason for the variation in the comparison speed difference between the gantries is unknown.

Another goal of this study is to find out how the certainty of the speed changes between different days of the week. Since only 7 days are used when comparing the radar sensor data and the taxi floating car data, it is not possible to investigate this relationship. When comparing the certainty of the speed measurement for distinct segment taxi flow values for different hours of the day, there is a large variation in the results when the segment taxi flow comes close to 10. There is a low number of comparison points for these segment taxi flow values, which may be the reason for the higher variation in the average comparison speed difference.

8.2 Conclusion

Conclusions that can be made from this study are:

• Even though the fixed point traffic flow is estimated for some of the aggregation periods and is not available for all aggregation periods, no outliers in the taxi penetration rate can be found.

• The taxi penetration rate, in relation to the estimated number of vehicles passing the gantries during the available aggregation periods, is approximately 0.5%.

• There is a statistical relationship between the fixed point average speed calculated from the radar sensor data and the segment average speed calculated from the taxi floating car data.

• The statistical relationship between the fixed point average speed and the segment average speed in the comparison points classified as high-speed free-flow is different from the statistical relationship for comparison points classified as congested or free-flow. For comparison points classified as high-speed free-flow, the segment average speed is not higher than approximately 100 km/h while the fixed point average speed reaches up to 250 km/h.

• To be able to draw any conclusions about how many taxis are needed to have certainty in the speed measurement, a controlled experiment needs to be conducted to rule out all the potential factors which could affect the speed measurement.

• To be able to find more detailed statistical relationships, such as the certainty of speed measurements of taxis during different hours of the day, or the taxi penetration rate during different hours of the day, data from a longer period of time needs to be used.

8.3 Future Work

The road segments representing the road network used in this study were not created for the purpose of comparing radar sensor data with floating car data. One possibility for making the conditions more uniform between the gantries and the association segments could be to adjust the length and placement of the road segments before the map-matching of the GPS positions to the road segments is conducted.

In this study, factors such as the time of the day and the segment taxi flow per comparison point are studied, but there are unknown factors which can have an impact on the results. Further studies are needed to identify these factors and their impact on the calculated fixed point average speed and segment average speed.

To be able to draw any conclusions about causal relationships between the data, a controlled experiment needs to be conducted.



Appendix A

List of Gantries

Tables A.1 to A.5 present each gantry used when comparing the taxi floating car data with the data from the radar sensors. For each gantry, the tables present the number of radar sensors, the number of gantries associated with the same road segment, the length of the association segment associated with the gantry, the comparison speed difference, and the segment taxi flow. The data presented in tables A.1 to A.5 are based on the taxi floating car data and radar sensor readings from the 7 days in July used in this study. For some of the gantries there is no data available, in which case the cell in the table is marked Not Available (NA).

Tables A.1 to A.5. List of gantries

Gantry       Radar  Shared  Association  Comparison    Segment
            sensor              segment       speed  taxi flow
             count           length (m)  difference
                                             (km/h)
E4N 47,465       3       3        1 301        -7.7      1 069
E4N 47,800       3       3        1 301       -11.4      1 069
E4N 48,290       3       3        1 301       -14.2      1 066
E4N 48,620       3       1          165        -6.7      1 174
E4N 48,935       3       1          326        -4.4      1 189
E4N 49,370       3       1          343        -7.6      1 467
E4N 49,770       3       1          343        -9.9        382
E4N 50,165       3       2          440       -10.8        388
E4N 50,395       3       2          440         4.1        388
E4N 50,570       3       1          101         0.1        392
E4N 50,890       3       1          226        11.7        490
E4N 51,085       2       1          138        23.0        312
E4N 51,370       2       1          140         2.2         21
E4N 51,630       2       1          400        17.4        346
E4N 51,895       2       1          194        19.4        355
E4N 52,220       4       1          468        -0.5      1 011
E4N 52,535       4       1           78        -6.2      1 017
E4N 52,745       4       1          177        -6.9      1 010
E4N 53,120       4       2          938          NA         NA
E4N 53,590       4       2          938        -5.8      1 142
E4N 53,955       4       1           62        -3.5      1 231
E4N 54,225       4       2          572         3.2        954
E4N 54,630       4       2          572        -5.8        953
E4N 55,030       4       1          158         2.1        769
E4N 55,185       4       2          386         6.2        744
E4N 55,330       4       2          386         4.7        740
E4N 55,505       4       1           97         8.4        734
E4N 55,650       4       1          197        11.3        736
E4N 55,885       3       2          444         3.7        747
E4N 56,165       3       2          444         6.4        749
E4N 56,575       4       2          559        -1.4        221
E4N 56,780       4       2          559         3.7        872
E4N 57,010       3       1           87         4.5        966
E4N 57,165       3       2          547        14.5      1 079
E4N 57,320       3       2          547         7.0      1 080
E4N 57,430       3       2          547         2.9      1 080
E4N 57,690       2       1          133        -0.9      1 065
E4N 58,010       2       2          722         5.6      1 135
E4N 58,205       2       2          722         3.6      1 134
E4N 58,570       2       1          101         2.7      1 426
E4N 58,940       3       1           67          NA         NA
E4N 59,125       2       5          831         2.1      2 995
E4N 59,260       2       5          831         4.4      2 995
E4N 59,335       2       5          831         9.4      2 996
E4N 59,440       2       5          831         8.8      2 996
E4N 59,530       2       5          831         7.2      2 990
E4N 59,980       2       1          100         7.8      3 066
E4N 60,490       3       1          106         2.5      5 458
E4N 60,990       3       1          495         4.0      5 196
E4N 61,270       4       1          254        17.3      5 195
E4N 61,605       4       1           64         7.3      5 779
E4N 62,070       4       1          548         6.1      5 819
E4N 62,410       3       1          320         4.4      5 417
E4N 63,040       4       1          147         1.4      5 321
E4N 63,580       4       2        1 377         1.6      5 879
E4N 64,090       4       2        1 377        -1.5      5 880
E4N 64,650       4       2          863          NA         NA
E4N 65,000       4       2          863        11.3      5 836
E4N 65,420       4       1          359        13.1      5 752
E4N 65,815       4       1           69        11.4      5 703
E4N 66,270       3       1          855         9.8      5 151
E4N 67,230       4       3        1 625         4.6      5 563
E4N 67,740       4       3        1 625         7.3      5 550
E4N 68,340       4       3        1 625         3.6      5 534
E4N 68,960       4       1          829        14.6      3 494
E4N 69,430       4       5        1 913         7.0      4 528
E4N 69,690       4       5        1 913         6.1      4 526
E4N 70,520       4       5        1 913         8.5      4 550
E4N 70,830       4       5        1 913        19.1      4 550
E4N 71,200       4       1          169        13.1      2 396
E4N 71,440       3       1          350        -8.1        845
E4Z 46,895       4       1          166        -2.5        152
E4Z 47,255       4       1           81         7.4        274
E4Z 47,635       4       3        1 251        -4.1        681
E4Z 47,980       4       3        1 251        -1.2        681
E4Z 48,385       4       3        1 251        -2.6        678
E4Z 48,935       4       1          195       -11.3        986
E4Z 49,220       4       1          253         2.1        714
E4Z 49,530       4       1          411        -1.0      1 289
E4Z 49,710       4       1          285        -3.3      1 238
E4Z 50,225       3       1          437        -0.4        963
E4Z 50,660       3       1           97        -1.7        981
E4Z 51,040       3       1          225         0.1      1 201
E4Z 51,425       2       2          481          NA         NA
E4Z 51,670       2       2          481         5.7        760
E4Z 51,890       2       1          142         2.2        759
E4Z 52,030       4       3          659        -1.0      1 434
E4Z 52,215       4       3          659        -4.0      1 435
E4Z 52,535       4       3          659        -3.1      1 433
E4Z 52,720       5       1          157        -2.6      1 500
E4Z 53,115       4       2          960          NA         NA
E4Z 53,595       4       2          960        -3.4      1 677
E4Z 53,870       4       1          204        -0.1      1 686
E4Z 54,145       4       2          217         0.3      1 776
E4Z 54,295       4       2          217         2.6      1 771
E4Z 54,630       4       4          956         0.5      1 747
E4Z 54,880       4       4          956         1.0      1 746
E4Z 55,210       4       4          956         0.7      1 746
E4Z 55,400       4       4          956         3.2      1 745
E4Z 55,620       3       1          281         2.2        936
E4Z 55,975       3       1          147        -0.8        715
E4Z 56,160       3       1          198        -2.8        618
E4Z 56,490       3       2          716        -1.2        373
E4Z 56,780       3       2          716         5.2        374
E4Z 57,055       3       2          220          NA         NA
E4Z 57,140       3       2          220          NA         NA
E4Z 57,275       2       3          763        21.1          6
E4Z 57,435       2       3          763        19.3          6
E4Z 57,820       2       3          763        18.6          6
E4Z 58,140       2       1           36         3.1         51
E4Z 58,590       2       1          200        -4.4        341
E4Z 58,865       2       1          152         9.2         30
E4Z 59,155       2       6          896         3.5      1 036
E4Z 59,255       2       6          896        -4.0      1 034
E4Z 59,360       2       6          896        -1.0      1 034
E4Z 59,425       2       6          896         1.7      1 029
E4Z 59,530       2       6          896         1.1      1 028
E4Z 59,835       2       6          896        -2.1      1 028
E4Z 60,060       4       1          168         0.9      2 930
E4Z 60,265       4       1          320        -0.9      3 146
E4Z 60,645       4       1          221         3.2      3 153
E4Z 61,000       3       1          418         0.9      2 045
E4Z 61,395       4       1          160        -2.3      1 370
E4Z 61,860       4       2          713          NA         NA
E4Z 62,220       4       2          713          NA         NA
E4Z 62,645       3       1          578         3.0        117
E4Z 63,040       4       1          167         2.4        298
E4Z 63,215       3       3        1 319          NA         NA
E4Z 63,580       3       3        1 319          NA         NA
E4Z 64,090       3       3        1 319         0.4      1 863
E4Z 64,650       3       3        1 027          NA         NA
E4Z 64,970       3       3        1 027        13.6      4 046
E4Z 65,420       3       3        1 027         2.2      4 046
E4Z 65,815       3       1           70         4.6      4 710
E4Z 66,310       3       2          854         4.8      4 388
E4Z 66,311       3       2          854         5.1      4 375
E4Z 66,710       4       1           48         6.3      4 863
E4Z 67,400       4       3        1 609         0.9      5 114
E4Z 67,910       4       3        1 609        -5.4      5 114
E4Z 68,330       4       3        1 609         2.8      5 116
E4Z 68,800       3       2        1 078         2.6      4 834
E4Z 69,390       4       2        1 078         9.9      4 834
E4Z 69,820       4       3        1 432        14.7      5 345
E4Z 70,070       4       3        1 432         6.1      5 353
E4Z 70,560       4       3        1 432        10.1      5 346
E4Z 70,960       4       1          162        21.1      5 054
E4Z 71,400       4       1          118         8.7      4 924
E4Z 71,660       4       4        2 574         8.8      3 272
E4Z 72,010       3       4        2 574         5.4      3 255
E4Z 72,330       3       4        2 574         5.1      3 251
E4Z 72,890       3       4        2 574        -3.7      3 248

Appendix B

Taxi Penetration Rate

Figures B.1 to B.10 present the taxi penetration rate for 10 gantries, with 5 graphs for each direction. Each graph shows the taxi penetration rate at the gantry for each of the 16 hours on Wednesday 2010-07-07 during 06:00–22:00. The name of each gantry includes a kilometer reference, which starts at 46.895 km for the gantry located farthest south and ends at 72.890 km for the gantry located farthest north on the E4 motorway in Stockholm. As can be seen in the figures, the taxi penetration rate varies between the gantries, especially for the gantries located on the northern part of the E4 motorway.

● ● ● ●● ● ● ●

● ● ●● ● ● ● ●

Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12

Figure B.1. Gantry E4N_49.370 during 2010-07-07 06:00–22:00.

61

●●

●● ● ● ●

● ● ● ● ●● ● ●

●

Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12


● ● ●●

● ●●

●● ● ● ● ●

●●

●

Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12


●

●

● ● ●

● ●

●

●

● ●●

●

● ●

●

Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12


62

●

●●

●

●●

●

●

●

●●

●●

●●

●

Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12


● ●

● ● ● ● ● ● ●●

● ● ●●

●

●

Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12

Figure B.6. Gantry E4S_48.935 during 2010-07-07 06:00–22:00.

● ●● ● ● ● ● ● ●

● ●●

● ● ●●

Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12


63

● ● ●●

● ●● ● ● ●

●

● ● ● ● ●Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12


● ●●

● ●● ●

● ●●

●● ● ●

●

●

Taxi

pen

etra

tion

rate

(%

)

Hour

7 9 11 13 15 17 19 21 23

0

2

4

6

8

10

12


[Figure B.10. Plot of taxi penetration rate (%), axis range 0 to 12, against hour, axis range 7 to 23. The caption with the gantry name is missing from the source.]


Appendix C

Certainty of Speed Measurement

Figure C.1 presents a scatter plot for each of the distinct segment taxi flow values. Figure C.2 also presents a scatter plot for each of the distinct segment taxi flow values, where each plot has a solid line representing the perfect regression, a dotted line for the regression of the congested traffic, a dashed line for free-flow traffic and a dot-dashed line for high-speed free-flow traffic.

[Ten scatter-plot panels; the first is titled "Segment taxi flow = 1". Axis label: Segment average speed (km/h), with one panel labeled Taxi speed (km/h). Tick labels: 0 to 250 and 0 to 200.]

Figure C.1. A scatter plot for each of the distinct segment taxi flow values, presenting the relationship between the segment average speed and the fixed point average speed from the comparison points. The solid line represents the perfect regression line and the dashed line represents the fitted regression line.

[Ten scatter-plot panels, titled 1 to 10. Axis label: Segment average speed (km/h). Tick labels: 0 to 250 and 0 to 200.]

Figure C.2. A scatter plot for each of the distinct segment taxi flow values, where each plot has a solid line representing the perfect regression, a dotted line for the regression of the congested traffic, a dashed line for free-flow traffic and a dot-dashed line for high-speed free-flow traffic. The two dotted vertical lines are the dividers between the three different classifications.
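The per-class regression lines described in the caption can be sketched as follows. This is a minimal illustration, not the procedure used in the thesis: it assumes ordinary least squares, assumes that the classification is made on the horizontal-axis speed (taken here to be the fixed point average speed), and uses the hypothetical thresholds `congested_below` and `high_speed_above`, whose actual values are not given in this appendix.

```python
def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def classify(speed, congested_below, high_speed_above):
    """Assign one of the three traffic classes (thresholds are assumptions)."""
    if speed < congested_below:
        return "congested"
    if speed > high_speed_above:
        return "high-speed free-flow"
    return "free-flow"


def fit_by_class(points, congested_below, high_speed_above):
    """Fit one regression line per traffic class.

    points: list of (fixed point average speed, segment average speed) pairs.
    Classes with fewer than two distinct x values are skipped, because a
    line cannot be fitted to them.
    """
    groups = {}
    for x, y in points:
        name = classify(x, congested_below, high_speed_above)
        groups.setdefault(name, []).append((x, y))
    return {
        name: fit_line(pts)
        for name, pts in groups.items()
        if len({x for x, _ in pts}) >= 2
    }


# Made-up example values in km/h, not data from the study.
sample = [(10, 10), (20, 20), (30, 30), (60, 55), (80, 75), (90, 85), (120, 110)]
lines = fit_by_class(sample, congested_below=50, high_speed_above=100)
```

A fitted line close to the perfect regression line (intercept 0, slope 1) would indicate that the segment average speed and the fixed point average speed agree for that traffic class.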

