Real-Time Trip Information Service for a Large Taxi Fleet
Rajesh Krishna Balan, Nguyen Xuan Khoa, and Lingxiao Jiang
MobiSys 2011
Jan 19, 2016
Introduction
• Real-time trip information system that provides passengers with the expected fare and trip duration of the taxi ride they are planning to take.
• 15,000 taxis, 21 months, 250 million data records in Singapore
• Large-scale implementation and evaluation
Motivation
• Unscrupulous drivers who take longer routes
• Passengers can estimate trip time and fare by themselves
• Failed solution: Google Maps
  – Latency
  – Trip fare
  – Not accurate: 35% time error
Taxi Network
• Taxis are cheap
• Taxis are common and found everywhere
• Most pickups are street pickups
• Taxis are used for all activities
Taxi locations in one day
Challenges
• Large amount of data
• Real-time query requirement
• Various time-related factors
• How much data is sufficient?
• How to filter the data?
Service requirements
• Accuracy
  – Time
  – Fares
• Real-time capability
• Low computational requirements
• Easy to deploy operationally
Method design
• Partition
  – Time
  – Location
• Prediction
  – Hash table
  – KNN
Time partition
• Hour of day (HR)
• Day of week (DoW)
• Hourly DoW
  – 24 x 7 = 168 hourly slots
• Peak period (PEAK)
  – Weekday 7am-10am and 5pm-8pm: peak, +35% surcharge
  – Weekday 6am-7am and 10am-5pm: non-peak
  – Weekend 6am-12am (midnight): non-peak
  – Night, 12am-6am: +50% surcharge
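The time partitions above can be sketched as one feature function that maps a trip's start time to its HR, DoW, hourly-DoW and PEAK values. This is a minimal illustration, not the authors' code: the name `time_features` is hypothetical, ranges are treated as half-open (7am-10am means 7:00 up to but not including 10:00), public holidays are ignored, and weekday 8pm-12am, which the slide does not classify, is assumed to be non-peak.

```python
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Map a trip start time to the HR, DoW, DoW x HR and PEAK partitions.

    Hypothetical helper; ranges are half-open, e.g. 7am-10am means 7:00 <= t < 10:00.
    """
    hr = ts.hour              # 0..23 -> 24 hourly partitions
    dow = ts.weekday()        # Monday = 0 ... Sunday = 6 -> 7 partitions
    is_weekday = dow < 5

    if hr < 6:
        peak = "night"        # 12am-6am every day (+50% surcharge)
    elif is_weekday and (7 <= hr < 10 or 17 <= hr < 20):
        peak = "peak"         # weekday morning and evening peaks (+35% surcharge)
    else:
        # Weekday 6am-7am and 10am-5pm, and weekend 6am-12am, are non-peak.
        # Assumption: weekday 8pm-12am (not listed on the slide) is non-peak too.
        peak = "non-peak"

    return {
        "HR": hr,
        "DoW": dow,
        "DoWxHR": dow * 24 + hr,  # 0..167 -> 168 hourly-DoW partitions
        "PEAK": peak,
    }


# Usage: Monday 6 September 2010, 8:30am falls in the weekday morning peak.
print(time_features(datetime(2010, 9, 6, 8, 30)))
```

Keeping all four partitions in one dict lets every time-based predictor read its own key from the same precomputed features.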
Location partition
• Static zones
  – Area: 25 km x 50 km
  – Divided into square zones from 50 m x 50 m up to 500 m x 500 m
• Dynamic zones
  – Zone size adjusted for each trip
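A static zone can be computed by snapping each GPS point to a square grid cell. The sketch below is illustrative only: `zone_of` and the grid origin are hypothetical, and it uses a flat-earth (equirectangular) approximation to convert degrees to meters, which should be adequate over an area as small as 25 km x 50 km.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def zone_of(lat, lon, zone_m, origin_lat, origin_lon):
    """Return the (row, col) of the zone_m x zone_m static zone containing a point.

    Hypothetical helper: (origin_lat, origin_lon) is the south-west corner of the
    grid; an equirectangular approximation converts degrees to meters.
    """
    north_m = (lat - origin_lat) * METERS_PER_DEG_LAT
    east_m = (lon - origin_lon) * METERS_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    return int(north_m // zone_m), int(east_m // zone_m)


# Usage: with 50 m zones, a 25 km x 50 km area has 500 x 1000 zones.
origin = (1.20, 103.60)  # illustrative grid corner, not taken from the paper
print(zone_of(1.201, 103.601, 50, *origin))   # roughly 111 m north and east of origin
print(zone_of(1.201, 103.601, 500, *origin))  # same point, coarser zones
```

Because the zone id is a plain tuple, a trip's (start zone, end zone) pair can be used directly as part of a hash-table key.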
Prediction
• Input: start time, start GPS, end GPS
• Static
  – Average the fare, duration, and distance of similar historical trips
  – Indexed with a hash table
• Dynamic
  – KNN
  – Data set: (start time, S_long, S_lat, E_long, E_lat)
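The static method amounts to a hash table from a discretized trip description to averages of the matching historical trips. A minimal sketch, assuming the key is a tuple such as (start zone, end zone, time partition); the class `StaticPredictor` and its methods are hypothetical, and a production system would more likely store running sums than every trip.

```python
from collections import defaultdict
from statistics import mean


class StaticPredictor:
    """Hash table: key -> historical trips; predictions are per-key averages."""

    def __init__(self):
        # key -> list of (fare, duration_s, distance_km)
        self._table = defaultdict(list)

    def add_trip(self, key, fare, duration_s, distance_km):
        self._table[key].append((fare, duration_s, distance_km))

    def predict(self, key):
        """Return (mean fare, mean duration, mean distance), or None on a miss."""
        trips = self._table.get(key)  # .get() avoids creating empty entries
        if not trips:
            return None
        fares, durations, distances = zip(*trips)
        return mean(fares), mean(durations), mean(distances)


# Usage: key = (start zone, end zone, PEAK partition).
p = StaticPredictor()
p.add_trip(((2, 2), (40, 90), "peak"), 12.0, 900, 8.0)
p.add_trip(((2, 2), (40, 90), "peak"), 14.0, 1100, 9.0)
print(p.predict(((2, 2), (40, 90), "peak")))
print(p.predict(((2, 2), (40, 90), "night")))  # no history for this key
```

A lookup is a single hash probe, which fits the real-time and low-computation requirements; the price is that a query whose key has no history gets no prediction.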
Evaluation
• Set 1: 20 subsets for training
  – 2010/8
  – 2010/7-2010/8
  – ...
  – 2009/1-2010/8
• Set 2: 1 subset for testing (queries)
  – 2010/9
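The 20 nested training sets all end in August 2010 and each reaches back one month further, so the largest covers January 2009 to August 2010 (20 months). A small sketch enumerating them as (year, month) pairs; the function names are hypothetical.

```python
def month_back(year, month, k):
    """Return the (year, month) that lies k months before (year, month)."""
    idx = year * 12 + (month - 1) - k
    return idx // 12, idx % 12 + 1


def training_subsets(end=(2010, 8), count=20):
    """Subset n (1-based) holds the n most recent months up to and including `end`."""
    return [
        [month_back(*end, k) for k in range(n - 1, -1, -1)]  # oldest month first
        for n in range(1, count + 1)
    ]


TEST_MONTH = (2010, 9)  # Set 2: queries come from September 2010

subsets = training_subsets()
print(subsets[0], subsets[1], subsets[-1][0], len(subsets[-1]))
```

Since no training subset contains the test month, every query is answered purely from earlier history.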
Evaluation
• LOC: start and end location
• PEAK: peak period
• DoW: day of week
• HR: hour of day (24 hours)
• DoW x HR: 168 hourly slots
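Each predictor can be read as a different choice of hash-table key. A hedged sketch, assuming that every time-based predictor also keys on the LOC pair (start zone, end zone) and that the time features are already available as a dict; `predictor_key` is hypothetical, and `DoWxHR` stands for DoW x HR.

```python
PREDICTORS = ("LOC", "PEAK", "DoW", "HR", "DoWxHR")


def predictor_key(name, start_zone, end_zone, time_feats):
    """Build the hash-table key used by predictor `name`.

    Assumption: every time-based predictor also keys on the LOC pair.
    time_feats is a dict such as {"HR": 8, "DoW": 0, "DoWxHR": 8, "PEAK": "peak"}.
    """
    if name not in PREDICTORS:
        raise ValueError(f"unknown predictor: {name}")
    loc = (start_zone, end_zone)
    return loc if name == "LOC" else loc + (time_feats[name],)


feats = {"HR": 8, "DoW": 0, "DoWxHR": 8, "PEAK": "peak"}
print(predictor_key("LOC", (2, 2), (40, 90), feats))
print(predictor_key("DoWxHR", (2, 2), (40, 90), feats))
```

Finer time partitions (168 slots versus 3 PEAK categories) spread the same history over more keys, so each key holds fewer trips.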
Fare and duration error with static zones
• Fare error: $0.87 to $2.53
• Duration error: 2 to 4 min
Hit rate with static zones
• Hit rate: % of test trips having a non-empty entry in the prediction table
• Hit rate with static zones ranges from 17% to 58%
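Given that definition, the hit rate is the fraction of test keys whose table entry exists and is non-empty. A minimal sketch over a plain dict; `hit_rate` is a hypothetical name.

```python
def hit_rate(table, test_keys):
    """Fraction of test trips whose key maps to a non-empty list of historical trips."""
    if not test_keys:
        return 0.0
    hits = sum(1 for key in test_keys if table.get(key))  # missing or [] counts as a miss
    return hits / len(test_keys)


table = {("A", "B", "peak"): [(12.0, 900, 8.0)], ("A", "C", "peak"): []}
queries = [("A", "B", "peak"), ("A", "C", "peak"), ("X", "Y", "night"), ("A", "B", "peak")]
print(hit_rate(table, queries))
```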
Fare and duration error with dynamic zones
• Fare error: $1.05 to $1.25
• Duration error: under 3 min
• K = 25 is the optimal choice
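The dynamic method replaces the fixed zone lookup with a K-nearest-neighbor search over (start time, S_long, S_lat, E_long, E_lat) vectors. A brute-force sketch with K = 25 by default; it assumes the features are already numeric and scaled so that Euclidean distance is meaningful (the slides do not specify the metric or the scaling), and `knn_predict` is hypothetical. A deployed system would likely use a spatial index instead of a linear scan.

```python
import heapq
import math
from statistics import mean


def knn_predict(query, history, k=25):
    """Average fare and duration of the k historical trips nearest to `query`.

    query:   feature tuple, e.g. (start_time, s_long, s_lat, e_long, e_lat),
             assumed pre-scaled so Euclidean distance is meaningful
    history: list of (features, fare, duration_s)
    """
    nearest = heapq.nsmallest(k, history, key=lambda rec: math.dist(query, rec[0]))
    if not nearest:
        return None
    return mean(r[1] for r in nearest), mean(r[2] for r in nearest)


history = [
    ((0.0, 0.0, 0.0, 1.0, 1.0), 10.0, 600),
    ((0.1, 0.0, 0.0, 1.0, 1.1), 12.0, 660),
    ((5.0, 3.0, 3.0, 9.0, 9.0), 50.0, 3000),
]
print(knn_predict((0.0, 0.0, 0.0, 1.0, 1.0), history, k=2))
```

Unlike the static table, a KNN query always returns an answer when any history exists, because the neighborhood grows until it contains K trips.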
PEAK predictor with various K
• Fare error reduced by at most 15 cents
• Duration error reduced by at most 15 s
Radius of dynamic zone
• Mean: 375 m
• Std. dev.: 741 m
Speed and memory
• Static zones are more efficient than dynamic zones
• Dynamic zones use much more memory
Accuracy analysis
• Predictions are still not very accurate using only the three basic features
• Why?
  – Indirect routing
  – Traffic conditions
Accuracy analysis
• PEAK predictor with 200 m zones
• Trips with the same start time, start point, and end point
• Distance error
  – Up to 6 km
• Duration error
  – Up to 1000 s
Filter design
• Filter 1:
  – Trip distance > 2 x the straight-line distance between start and end
• Filter 2:
  – Average speed < 20 km/h or > 100 km/h
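The two filters can be sketched as a predicate that keeps a historical trip only if neither filter catches it. Assumptions: the straight-line distance is the haversine (great-circle) distance between the start and end GPS points, trip distance is the recorded distance in km, and `keep_trip` and `haversine_km` are hypothetical names.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance between two GPS points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def keep_trip(trip_km, duration_s, start, end):
    """Return False if the trip is caught by Filter 1 or Filter 2."""
    if duration_s <= 0:
        return False  # no valid average speed (extra sanity check, not on the slide)
    straight_km = haversine_km(*start, *end)
    if trip_km > 2 * straight_km:          # Filter 1: likely indirect routing
        return False
    speed_kmh = trip_km / (duration_s / 3600)
    if speed_kmh < 20 or speed_kmh > 100:  # Filter 2: implausible average speed
        return False
    return True


start, end = (1.30, 103.80), (1.30, 103.85)  # about 5.6 km apart
print(keep_trip(6.0, 900, start, end))    # 24 km/h over a near-direct route
print(keep_trip(15.0, 1800, start, end))  # more than twice the straight-line distance
```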
Results of applying the filters
• Fare error reduced by 25 cents
• Duration error reduced by 30 s
Traffic conditions
• Rainfall is severe
• Fare error reduced by 10 cents
• Duration error reduced by 60 s
Future work
• Different zone sizes for different locations
• Zone size determined by the radius of the dynamic zones
Conclusion
• Reducing the data size through aggregation and smart filtering is essential
• Real-world data needs to be cleaned before use
• Deploying a research prototype into a real production environment requires far more work than we naively expected
Contributions
• Detailed description of the steps to build such a real-time taxi system
• Method for identifying real-time patterns, applicable to other transportation networks
• Principled approach to balancing the tradeoffs between accuracy and real-time performance
• KNN method to produce an accurate predictor
• Insight into the challenges of moving from a prototype to an operational environment