A Topic Model for Traffic Speed Data Analysis

Tomonari MASADA, Nagasaki University

masada@nagasaki-u.ac.jp

Real-Time Traffic Speed Data | NYC Open Data

https://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa

Speed measurements at hundreds of sensors

(Regrettably, the data set seems to be no longer maintained.)

Problem

• Traffic speed data show periodicity with a one-day period.

• However, the data vary widely not only between periods but also within each period.

• How can we analyze it?

Solution

• We take intuition from topic models

in text mining.

Topic models for documents

• We can assume that each document contains

multiple topics.

• That is, each document is modeled

– not as a single word probability distribution,

– but as a mixture of word probability distributions.

Latent Dirichlet Allocation (LDA)

• LDA [Blei et al. 03]

topic <-> word probability distribution

document <-> mixing proportions of topics

• LDA models each document as follows:

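[Figure: each topic (t1, t2, t3) is a probability distribution over the words v1 to v4, with entries such as φ11, φ21, φ23, φ24, φ31; document j mixes the topics with proportions θj1, θj2, ...]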
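The generative view of LDA can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the talk: the topic matrix `phi`, the Dirichlet parameter `alpha`, and the function names are hypothetical, and only the standard library is used.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw mixing proportions from a Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(phi, alpha, n_words, rng):
    """Generate one document; phi[k][v] is the probability of word v under topic k."""
    theta = sample_dirichlet(alpha, rng)      # per-document topic proportions
    topic_ids = list(range(len(phi)))
    word_ids = list(range(len(phi[0])))
    words = []
    for _ in range(n_words):
        k = rng.choices(topic_ids, weights=theta)[0]   # pick a topic
        v = rng.choices(word_ids, weights=phi[k])[0]   # pick a word from that topic
        words.append(v)
    return theta, words

# Hypothetical toy setting: 3 topics over a 4-word vocabulary (v1..v4).
rng = random.Random(0)
phi = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.40, 0.40],
    [0.25, 0.25, 0.25, 0.25],
]
theta, words = generate_document(phi, [0.5, 0.5, 0.5], 20, rng)
```

Each word is drawn by first choosing a topic according to `theta` and then choosing a word from that topic's distribution, which is what "a mixture of word probability distributions" means operationally.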

An important difference

• Words are discrete entities.

– Therefore, LDA uses multinomial distributions for

modeling per-topic word distributions.

• Speeds (in mph) are continuous entities.

– We can’t use multinomial distributions.

Gamma distribution
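As a reminder of why the Gamma distribution suits speed data: its support is the positive reals, and its mean is shape / rate. The sketch below evaluates the log density with the standard library. The shape-rate parameterization and the example values are assumptions made for illustration; the slides do not state which parameterization Patchy uses.

```python
import math

def gamma_logpdf(x, shape, rate):
    """Log density of Gamma(shape, rate) at x > 0.

    density(x) = rate**shape / Gamma(shape) * x**(shape - 1) * exp(-rate * x)
    """
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

# Hypothetical patch centered near 40 mph: mean = shape / rate = 20 / 0.5.
shape, rate = 20.0, 0.5
log_density_at_mean = gamma_logpdf(40.0, shape, rate)
log_density_far_away = gamma_logpdf(5.0, shape, rate)
```

A speed near the mean receives a much higher density than one far from it, so each Gamma component can act as a soft "speed range".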

Comparing LDA with Patchy

• LDA <-> Patchy

– Word <-> Speed observation (in mph)

– Topic (multinomial) <-> Patch (Gamma)

– Document <-> Roll (one day of observations, from 0 AM to 12 midnight)
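Read by analogy with LDA, the correspondence above suggests the following generative sketch for a roll. It is an assumption-laden illustration, not the model definition from the talk: the Dirichlet prior on the mixing proportions, the shape-rate parameterization, and all names and values are hypothetical.

```python
import random

def generate_roll(patches, alpha, n_obs, rng):
    """Generate one roll of speed observations.

    patches: list of (shape, rate) pairs, one Gamma distribution per patch.
    alpha:   Dirichlet parameters for the per-roll mixing proportions (assumed prior).
    """
    # Per-roll mixing proportions of patches (Dirichlet via normalized Gamma draws).
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    theta = [d / total for d in draws]

    patch_ids = list(range(len(patches)))
    speeds = []
    for _ in range(n_obs):
        k = rng.choices(patch_ids, weights=theta)[0]  # choose a patch
        shape, rate = patches[k]
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        speeds.append(rng.gammavariate(shape, 1.0 / rate))
    return theta, speeds

# Hypothetical patches with means of 20 mph and 50 mph.
rng = random.Random(1)
theta, speeds = generate_roll([(20.0, 1.0), (50.0, 1.0)], [1.0, 1.0], 24, rng)
```

The only change from the LDA sketch is the emission step: a continuous Gamma draw replaces the multinomial draw over words.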

Full joint distribution of Patchy

• We estimate the parameters by variational Bayesian inference.

Variational Bayesian inference

• The posterior parameters are estimated by maximizing the ELBO.

– ELBO = evidence lower bound, a lower bound on the log evidence
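For reference, the general identity behind the ELBO can be written as follows. The notation is generic and assumed here for illustration (x for the observations, z for all latent variables and parameters, q for the variational posterior); it is not the specific bound derived for Patchy.

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\bigl[\log p(x, z)\bigr]
      - \mathbb{E}_{q(z)}\bigl[\log q(z)\bigr]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\bigl(q(z) \,\|\, p(z \mid x)\bigr)
  \;\ge\; \mathrm{ELBO}(q)
```

Since the KL term is non-negative, maximizing the ELBO over q both tightens the bound on the log evidence and moves q closer to the true posterior.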

Context dependency

Observations of the same mph

are assigned to different patches.

Context dependency

• Context = mixing proportions of patches

– Which patch is dominant?

• Context-dependency

– Observations of the same speed can be assigned to different patches depending on their contexts.
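Context dependency can be illustrated with a small sketch: the probability that an observation belongs to patch k is proportional to the mixing proportion of k times the Gamma density of k at that speed. The code below uses plug-in point estimates rather than the actual variational update, and the patch parameters and proportions are hypothetical values chosen for illustration.

```python
import math

def gamma_logpdf(x, shape, rate):
    """Log density of Gamma(shape, rate) at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def patch_responsibilities(speed, theta, patches):
    """Posterior probability of each patch for one observation.

    Proportional to theta[k] * Gamma(speed; shape_k, rate_k), computed in log space.
    """
    log_w = [math.log(t) + gamma_logpdf(speed, a, b)
             for t, (a, b) in zip(theta, patches)]
    m = max(log_w)                                # subtract max for stability
    w = [math.exp(lw - m) for lw in log_w]
    s = sum(w)
    return [x / s for x in w]

# Two hypothetical patches with means of 40 mph and 50 mph.
patches = [(20.0, 0.5), (30.0, 0.6)]
speed = 45.0

# Same speed, two different contexts (mixing proportions of the roll).
resp_context_a = patch_responsibilities(speed, [0.9, 0.1], patches)
resp_context_b = patch_responsibilities(speed, [0.1, 0.9], patches)
```

With these values, the same 45 mph observation is attributed mainly to the first patch in context A and mainly to the second patch in context B, because the dominant patch of the roll tips the balance.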

Context dependency

On May 27, this purple patch is dominant.

On May 28, this yellow patch is

dominant.

Evaluation

• Binary classification

– Weekdays / Weekends (Sat, Sun)

• Data

– Training: May 27 ~ June 16 (three weeks)

– Test: July 23 ~ August 5 (two weeks)

Comparison

• Nearest neighbor

– Measures similarity by Euclidean distance

– Requires timestamps

• Patchy

– Measures similarity by predictive probability

– Requires no timestamps
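The timestamp requirement can be made concrete with a sketch. Euclidean distance compares rolls position by position, so observations must be aligned in time; a mixture likelihood sums over observations and ignores their order. The slides do not spell out how the predictive probability is computed, so the plug-in mixture log-likelihood below is only one plausible reading, and all names and values are hypothetical.

```python
import math

def euclidean_distance(u, v):
    """Position-wise distance; assumes u[i] and v[i] share a timestamp."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbor_label(query, train_rolls, train_labels):
    """1-nearest-neighbor classification by Euclidean distance."""
    best = min(range(len(train_rolls)),
               key=lambda i: euclidean_distance(query, train_rolls[i]))
    return train_labels[best]

def gamma_logpdf(x, shape, rate):
    """Log density of Gamma(shape, rate) at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def roll_log_likelihood(speeds, theta, patches):
    """Order-invariant score: sum of log mixture densities over observations."""
    total = 0.0
    for x in speeds:
        terms = [math.log(t) + gamma_logpdf(x, a, b)
                 for t, (a, b) in zip(theta, patches)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

# Hypothetical toy rolls with four observations each.
train_rolls = [[50.0, 48.0, 20.0, 22.0], [55.0, 54.0, 53.0, 52.0]]
train_labels = ["weekday", "weekend"]
predicted = nearest_neighbor_label([49.0, 47.0, 21.0, 23.0], train_rolls, train_labels)

# Reordering a roll changes the Euclidean distance but not the mixture score.
roll = [50.0, 48.0, 20.0, 22.0]
reordered = [20.0, 22.0, 50.0, 48.0]
patches, theta = [(20.0, 0.5), (30.0, 0.6)], [0.5, 0.5]
distance_after_reordering = euclidean_distance(roll, reordered)
score_gap = abs(roll_log_likelihood(roll, theta, patches)
                - roll_log_likelihood(reordered, theta, patches))
```

Under this reading, a roll could be classified by comparing its score under weekday-like and weekend-like mixing proportions, without knowing when each observation was taken.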

Classification results

Nearest neighbor

Summary

• We proposed a topic model for traffic data analysis.

• Patchy can assign observations of the same traffic speed to different groups in a context-dependent manner.

• Patchy achieved a classification accuracy comparable to that of nearest neighbor (NN) without using timestamps.

Future work

• Model timestamps
