Knowledge Discovery And Data Mining Predicting The Availability of Parking Spaces in Ljubljana Luis Rei [email protected] http://luisrei.com Report, slides and code available online.

Predicting The Availability of Parking Spaces in Ljubljana

Nov 28, 2015


Luis Rei

Presentation for my Jožef Stefan International Postgraduate School data mining course assignment.
Predicts the availability of parking spaces in Ljubljana car parks at 30 min, 1 h, 2 h and 3 h intervals.
Transcript
Page 1: Predicting The Availability of Parking Spaces in Ljubljana

Knowledge Discovery And Data Mining

Predicting The Availability of Parking Spaces in Ljubljana

Luis Rei [email protected]

http://luisrei.com

Report, slides and code available online.

Page 2: Predicting The Availability of Parking Spaces in Ljubljana

Parking Spaces

• City of Ljubljana (http://www.lpt.si)
• Available via http://opendata.si/
• 11 Car Parks
• Park Name (and id)
• Number of Free Spaces
• Total Spaces Available
• Price
• Coordinates
• Timestamp
• Updated Every 5min
• From 2011-09-12 to 2013-11-18

• Test starts: 2013-08-19

Page 3: Predicting The Availability of Parking Spaces in Ljubljana

The Parks

Park                 Total Spaces*
PH Kozolec           248
Tivoli I             360
Mirje                110
Trg MDB              40
Gospodarsko raz.     550
Bežigrad             62
Trg preko. brigad    98
Kranjčeva            118
Žale II              80
Petkovskovo II       85
PH Kongresni trg     720

Page 4: Predicting The Availability of Parking Spaces in Ljubljana

Buyer Beware: Cleanup

• Missing data
  • Collection failed: entire months, weeks, days missing
    • All parks
  • Sensor/communication failed: missing entries
    • Some parks
• Invalid data
  • Negative free spaces
  • (A lot) more free spaces than the total
  • Null values
• Strategies
  • Interpolating
  • Replacing with the mean (window variables)
  • Removing (target variable)
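The invalid-data checks above can be sketched in a few lines. This is only an illustration, assuming pandas and a per-park series of free-space counts; the function name `clean_free_spaces` and the sample readings are hypothetical and not taken from the author's code.

```python
import pandas as pd


def clean_free_spaces(free: pd.Series, total_spaces: int) -> pd.Series:
    """Turn invalid readings into missing values, then interpolate them."""
    cleaned = free.astype("float64")
    # Invalid data: negative counts, or more free spaces than the park holds.
    invalid = (cleaned < 0) | (cleaned > total_spaces)
    cleaned = cleaned.mask(invalid)  # invalid readings become NaN
    # Strategy "interpolating": linear interpolation between valid neighbours.
    return cleaned.interpolate(method="linear")


# Hypothetical readings for a park with 248 spaces (None is a null value).
raw = pd.Series([100, -3, 110, None, 9999, 130])
cleaned = clean_free_spaces(raw, total_spaces=248)
print(cleaned.tolist())
```

Here the negative reading is replaced by 105.0, the midpoint of its two valid neighbours, and the null and oversized readings are filled on the line between 110 and 130.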

Page 5: Predicting The Availability of Parking Spaces in Ljubljana

Time Series Resampling

2011-01-01 00:00:00  1
2011-01-01 00:45:00  2
2011-01-02 01:30:00  2
2011-01-02 02:15:00  4
2011-01-03 03:00:00  3
2011-01-03 06:00:00  1

Interval: Daily

How: Mean
2011-01-01  1.5
2011-01-02  3.0
2011-01-03  2.0

How: Min
2011-01-01  1
2011-01-02  2
2011-01-03  1

How: Last
2011-01-01  2
2011-01-02  4
2011-01-03  1

Question (Goal): At the end of the next time period, how many free spaces will be available in this park?

How: Last

Intervals: {30, 60, 120, 180} min
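The resampling example on this slide can be reproduced with a short sketch. It assumes pandas, which the slides do not name; the helper `resample_last` is a hypothetical name for the "How: Last" choice applied at one of the listed intervals.

```python
import pandas as pd

# The six readings from the slide.
readings = pd.Series(
    [1, 2, 2, 4, 3, 1],
    index=pd.to_datetime([
        "2011-01-01 00:00:00",
        "2011-01-01 00:45:00",
        "2011-01-02 01:30:00",
        "2011-01-02 02:15:00",
        "2011-01-03 03:00:00",
        "2011-01-03 06:00:00",
    ]),
)

daily = readings.resample("D")   # Interval: Daily
daily_mean = daily.mean()        # How: Mean -> 1.5, 3.0, 2.0
daily_min = daily.min()          # How: Min  -> 1, 2, 1
daily_last = daily.last()        # How: Last -> 2, 4, 1


def resample_last(series: pd.Series, minutes: int) -> pd.Series:
    """Keep the last reading of every `minutes`-long interval."""
    return series.resample(f"{minutes}min").last()


# One of the slide's prediction intervals; empty intervals come out as NaN.
last_per_30min = resample_last(readings, 30)
```

Taking the last value of each interval matches the question being asked: the number of free spaces at the end of the period, not its average over the period.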

Page 6: Predicting The Availability of Parking Spaces in Ljubljana

Sliding Windows

              w-2         w-1         w              Target
              past state  past state  current state  future state
Interval t-2  170         180         190            200
Interval t-1  180         190         200            210
Interval t    190         200         210            220

window size = 4
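The table above can be generated from a flat series with a small helper. This is a minimal sketch in plain Python; `sliding_windows` and its `width` parameter are hypothetical names, and treating the last of the four values in each row as the target is an assumption read off the slide.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    values: Sequence[float], width: int
) -> List[Tuple[List[float], float]]:
    """Split a series into (states, target) rows.

    Each row covers `width` consecutive values: all but the last are the
    past and current states, and the last one is the target (future state).
    """
    if width < 2:
        raise ValueError("width must be at least 2 (one state plus the target)")
    rows = []
    for start in range(len(values) - width + 1):
        window = list(values[start:start + width])
        rows.append((window[:-1], window[-1]))
    return rows


series = [170, 180, 190, 200, 210, 220]
rows = sliding_windows(series, width=4)
for states, target in rows:
    print(states, "->", target)
```

Each row becomes one training example: the states are the input features and the target is the value to predict.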

Page 7: Predicting The Availability of Parking Spaces in Ljubljana

Baselines & Models

• Baselines
  • Mean
  • Previous Value
• Models
  • Linear Regression
  • Regression Tree
  • Random Forest
• Bonus Models
  • Global Random Forest
  • Incremental Linear Regression
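The two baselines are simple enough to sketch in plain Python, together with the root mean squared error used to score them. The function names and the toy numbers are hypothetical, and computing the mean over the training data is an assumption; the slides do not say which mean is used.

```python
import math
from statistics import mean
from typing import List, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def mean_baseline(train: Sequence[float], n_predictions: int) -> List[float]:
    """Always predict the mean of the training values (assumed definition)."""
    return [mean(train)] * n_predictions


def previous_value_baseline(
    last_train_value: float, test: Sequence[float]
) -> List[float]:
    """Predict each value as the one observed one interval earlier."""
    return [last_train_value] + list(test[:-1])


train = [170, 180, 190, 200]
test = [210, 220, 230]

mean_rmse = rmse(test, mean_baseline(train, len(test)))                # about 35.9
previous_rmse = rmse(test, previous_value_baseline(train[-1], test))   # 10.0
```

A trained model is only worth using if it beats both numbers on the same test period.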

Page 8: Predicting The Availability of Parking Spaces in Ljubljana

Results: Average Root Mean Squared Error

Method              30 min  60 min  120 min  180 min
Mean                41.2    41.4    41.6     41.3
Previous Value      10.1    16.3    26.6     33.9
Linear Regression   3.5     4.2     4.8      4.7
Regression Tree     0.5     0.8     0.4      0.5
Random Forest       0.4     0.5     0.6      0.5

Page 9: Predicting The Availability of Parking Spaces in Ljubljana

Results: RMSE for each park for 120 min intervals

Page 10: Predicting The Availability of Parking Spaces in Ljubljana

One Week at the Car Park & the Trouble with Missing Values

PH Kongresni trg, resampled to 120 min intervals

Page 11: Predicting The Availability of Parking Spaces in Ljubljana

The Effect of Missing Values: The Sliding Window Revisited

              w-2         w-1         w                             Target
              past state  past state  current state                 future state
Interval t-2  170         160         140                           100
Interval t-1  160         140         100                           Missing (not predicted)
Interval t    140         100         Missing (replaced with mean)  ?? = 10

E.g. mean = 150, very different from the missing value (e.g. 60).

Missing Values

Percentage of test set        0.7%
Percentage of error (RMSE)    71%
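The way a single missing reading propagates through the windows can be sketched as follows. This is an illustration in plain Python, with `None` standing in for a missing reading; `build_rows` is a hypothetical helper, and the fill value of 150 is the example mean from the slide.

```python
from typing import List, Optional, Sequence, Tuple


def build_rows(
    values: Sequence[Optional[float]], width: int, fill_value: float
) -> List[Tuple[List[float], float]]:
    """Build (states, target) rows, handling missing readings (None).

    Rows with a missing target are removed, so nothing is predicted for them.
    Missing state values are replaced with `fill_value` (e.g. the mean).
    """
    rows = []
    for start in range(len(values) - width + 1):
        *states, target = values[start:start + width]
        if target is None:
            continue  # missing target: row removed
        filled = [fill_value if v is None else v for v in states]
        rows.append((filled, target))
    return rows


# The series from the slide, with one missing reading between 100 and 10.
series = [170, 160, 140, 100, None, 10]
rows = build_rows(series, width=4, fill_value=150)
for states, target in rows:
    print(states, "->", target)
```

The last row ends up with a current state of 150 where the true value was probably near 60, so the model sees a rising trend just before the count drops to 10. That is consistent with the slide's observation that a small share of the test set accounts for most of the error.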

Page 12: Predicting The Availability of Parking Spaces in Ljubljana

Note: for window_size = 1, RMSE = 21 (not shown, for the sake of clarity).

RF Average RMSE vs Window Size

Page 13: Predicting The Availability of Parking Spaces in Ljubljana

Future Work

• Better handling of missing values

• Time-based interpolation of some of the missing data within a limited max time interval

• Use model to predict the missing data

• Crawl more data

• Test with a full year

• Evaluate “classical” autoregressive models

• with smoothing

• Predict further into the future

• Additional data: weather, holidays, soccer, social…

• Get the average error down to zero, keep maximum error small
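One possible shape for the time-based interpolation item above, as a sketch only: it assumes pandas, a series on a `DatetimeIndex` with `NaN` for missing readings, and a gap limit counted in consecutive samples. The function name `interpolate_short_gaps` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(series: pd.Series, max_gap: int) -> pd.Series:
    """Fill gaps of at most `max_gap` consecutive missing samples.

    Values are interpolated proportionally to elapsed time; longer gaps
    are left missing.
    """
    missing = series.isna()
    # Give every run of consecutive missing / present samples its own id.
    run_id = missing.astype(int).diff().ne(0).cumsum()
    run_length = missing.groupby(run_id).transform("count")
    fillable = missing & (run_length <= max_gap)
    # Interpolate everything between valid readings, then keep short gaps only.
    filled = series.interpolate(method="time", limit_area="inside")
    return series.where(~fillable, filled)


index = pd.to_datetime([
    "2013-08-19 00:00", "2013-08-19 00:05", "2013-08-19 00:20",
    "2013-08-19 00:25", "2013-08-19 00:30", "2013-08-19 00:35",
    "2013-08-19 00:40", "2013-08-19 00:45",
])
raw = pd.Series([100, np.nan, 160, np.nan, np.nan, np.nan, np.nan, 170], index=index)
result = interpolate_short_gaps(raw, max_gap=2)
print(result)
```

The single missing reading at 00:05 is filled with about 115, a quarter of the way from 100 to 160 because 5 of the 20 minutes have elapsed. The four-sample gap stays missing, since a straight line across it would be a guess.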