A Topic Model for Traffic Speed Data Analysis Tomonari MASADA Nagasaki University masada@nagasaki-u.ac.jp

A Topic Model for Traffic Speed Data Analysis

May 21, 2015


Tomonari Masada

http://link.springer.com/chapter/10.1007%2F978-3-319-07467-2_8
Transcript
Page 1: A Topic Model for Traffic Speed Data Analysis

A Topic Model for Traffic Speed Data Analysis

Tomonari MASADA

Nagasaki University

masada@nagasaki-u.ac.jp

Page 2: A Topic Model for Traffic Speed Data Analysis

Real-Time Traffic Speed Data | NYC Open Data

https://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa

Speed measurements at hundreds of sensors

(Regrettably, the data set no longer seems to be maintained.)

Page 3: A Topic Model for Traffic Speed Data Analysis

Problem

• Traffic speed data show a periodicity with a one-day period.

• However, the data vary widely not only between periods but also within each period.

• How can we analyze it?

Page 4: A Topic Model for Traffic Speed Data Analysis

Solution

• We borrow intuition from topic models in text mining.

Page 5: A Topic Model for Traffic Speed Data Analysis

Topic models for documents

• We can assume that each document contains multiple topics.

• That is, each document is modeled

– not as a single word probability distribution,

– but as a mixture of word probability distributions.

Page 6: A Topic Model for Traffic Speed Data Analysis

Latent Dirichlet Allocation (LDA)

• LDA [Blei et al. 03]

topic <-> word probability distribution

document <-> mixing proportions of topics

• LDA models each document as follows:

Page 7: A Topic Model for Traffic Speed Data Analysis

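[Figure: LDA generative process. Each topic t1, t2, t3 is a word probability distribution (φk1, φk2, φk3, φk4 for topic k) over the vocabulary v1, v2, v3, v4. Document j has topic mixing proportions θj1, θj2, θj3, and its words (v3, v1, v3, v2, v2) are drawn from these topics.]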
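The generative process that LDA assumes can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the toy sizes (3 topics, 4 words), the symmetric Dirichlet priors, and the names `phi`, `generate_document`, and `alpha` are all assumptions chosen to mirror the symbols φ and θ used above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 3 topics over a 4-word vocabulary.
vocab = ["v1", "v2", "v3", "v4"]
num_topics = 3

# phi[k] is the word probability distribution of topic k (each row sums to 1).
# Drawing it from a flat Dirichlet is an assumption made for illustration.
phi = rng.dirichlet(np.ones(len(vocab)), size=num_topics)


def generate_document(num_words, alpha=1.0):
    """Draw one document: mixing proportions first, then one topic per word."""
    # theta holds the mixing proportions of topics for this document.
    theta = rng.dirichlet(alpha * np.ones(num_topics))
    words = []
    for _ in range(num_words):
        k = rng.choice(num_topics, p=theta)    # choose a topic for this word
        w = rng.choice(len(vocab), p=phi[k])   # choose a word from that topic
        words.append(vocab[w])
    return theta, words


theta_j, doc_j = generate_document(5)
```

Each call produces a different `theta_j`, so two documents built from the same topics can still have very different word frequencies.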

Page 8: A Topic Model for Traffic Speed Data Analysis

An important difference

• Words are discrete entities.

– Therefore, LDA uses multinomial distributions to model per-topic word distributions.

• Speeds (in mph) are continuous entities.

– We can’t use multinomial distributions.

Page 9: A Topic Model for Traffic Speed Data Analysis

Gamma distribution
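For reference, the standard Gamma density over positive reals is given below in the shape-rate parameterization. Whether the slides use shape-rate or shape-scale is not stated here, so the symbols a (shape) and b (rate) are assumptions.

```latex
\mathrm{Gamma}(x \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-b x},
\qquad x > 0,\; a > 0,\; b > 0,
\qquad \mathbb{E}[x] = \frac{a}{b},\quad \mathrm{Var}[x] = \frac{a}{b^{2}}
```

Because its support is the positive reals, the Gamma distribution is a natural candidate for speeds measured in mph.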

Page 10: A Topic Model for Traffic Speed Data Analysis

Comparing LDA with Patchy

• LDA <-> Patchy

– Word <-> Speed observation (in mph)

– Topic (multinomial) <-> Patch (Gamma)

– Document <-> Roll (one day of observations, from 0:00 to 24:00)
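The correspondence above suggests a generative sketch analogous to LDA, with a Gamma distribution per patch in place of a multinomial per topic. The code below is an illustration under assumptions: the Dirichlet prior on mixing proportions, the patch parameters, and the names `generate_roll`, `shape`, and `rate` are hypothetical and not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

num_patches = 3
# Hypothetical Gamma parameters per patch in the shape-rate form.
# The patch means shape / rate are 15, 35 and 55 mph.
shape = np.array([30.0, 70.0, 110.0])
rate = np.array([2.0, 2.0, 2.0])


def generate_roll(num_obs, alpha=1.0):
    """Draw one roll: mixing proportions, then a patch and a speed per observation."""
    # theta plays the role of the per-document topic proportions in LDA.
    theta = rng.dirichlet(alpha * np.ones(num_patches))
    # z[i] is the patch assigned to the i-th observation.
    z = rng.choice(num_patches, size=num_obs, p=theta)
    # NumPy's gamma sampler takes a scale parameter, i.e. 1 / rate.
    speeds = rng.gamma(shape[z], 1.0 / rate[z])
    return theta, z, speeds


theta, z, speeds = generate_roll(10)
```

Nothing in this sketch uses the time of day: a roll is treated as an unordered collection of speed observations.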

Page 11: A Topic Model for Traffic Speed Data Analysis

Full joint distribution of Patchy

• We estimate the parameters by variational Bayesian inference.

Page 12: A Topic Model for Traffic Speed Data Analysis

Variational Bayesian inference

• The posterior parameters are estimated by maximizing the ELBO.

– ELBO = evidence lower bound, a lower bound on the log evidence
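In generic form, the bound can be written as follows. Here x stands for the observations, z for all latent variables and parameters, and q for the variational distribution; these symbols are assumptions introduced for illustration and are not the specific factorization used for Patchy.

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x, z)\right]
      - \mathbb{E}_{q(z)}\!\left[\log q(z)\right]}_{\text{ELBO}}
    + \mathrm{KL}\!\left(q(z)\,\|\,p(z \mid x)\right)
  \;\ge\; \text{ELBO}
```

Since the KL term is nonnegative, maximizing the ELBO over q is equivalent to making q as close as possible to the true posterior.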

Page 13: A Topic Model for Traffic Speed Data Analysis
Page 14: A Topic Model for Traffic Speed Data Analysis

Context dependency

Observations of the same speed (in mph) are assigned to different patches.

Page 15: A Topic Model for Traffic Speed Data Analysis

Context dependency

• Context = mixing proportions of patches

– Which patch is dominant?

• Context-dependency

– Observations of the same speed can be assigned to different patches depending on their contexts.
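The effect can be sketched numerically: under a mixture model, the probability that an observation belongs to patch k is proportional to the mixing proportion of patch k times the Gamma density of the speed under patch k. The code below is a minimal illustration; the two patches, their parameters, and the function names are hypothetical, and the exact assignment rule used by Patchy's variational inference may differ.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log density of a Gamma distribution in the shape-rate form."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def responsibilities(x, theta, shapes, rates):
    """Probability of each patch for speed x, given mixing proportions theta."""
    logs = [math.log(t) + gamma_logpdf(x, a, b)
            for t, a, b in zip(theta, shapes, rates)]
    m = max(logs)                          # subtract the max for stability
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical overlapping patches with means 40 mph and 50 mph.
shapes = [16.0, 25.0]
rates = [0.4, 0.5]

# The same 45 mph observation in two different contexts.
resp_a = responsibilities(45.0, [0.8, 0.2], shapes, rates)  # patch 0 dominant
resp_b = responsibilities(45.0, [0.2, 0.8], shapes, rates)  # patch 1 dominant
```

With these toy values, the 45 mph observation is most probably assigned to patch 0 in the first context and to patch 1 in the second, even though the speed is identical.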

Page 16: A Topic Model for Traffic Speed Data Analysis

Context dependency

On May 27, this purple patch is dominant.

On May 28, this yellow patch is dominant.

Page 17: A Topic Model for Traffic Speed Data Analysis

Evaluation

• Binary classification

– Weekdays / Weekends (Sat, Sun)

• Data

– Training: May 27 to June 16 (three weeks)

– Test: July 23 to August 5 (two weeks)

Page 18: A Topic Model for Traffic Speed Data Analysis

Comparison

• Nearest neighbor

– Measures similarity by Euclidean distance

– Requires timestamps

• Patchy

– Measures similarity by predictive probability

– Requires no timestamps
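The difference between the two similarity measures can be sketched as follows. Euclidean distance compares rolls position by position, so observations must be aligned by timestamp. A mixture likelihood sums over observations and ignores their order. This is an illustration under assumptions: scoring a test roll by its log-likelihood under each training roll's mixing proportions is one plausible reading of "predictive probability", and all names and toy values below are hypothetical.

```python
import math
import numpy as np


def euclidean_nn(train_rolls, train_labels, test_roll):
    """1-NN on rolls stored as equal-length vectors aligned by timestamp."""
    dists = [np.linalg.norm(r - test_roll) for r in train_rolls]
    return train_labels[int(np.argmin(dists))]


def gamma_logpdf(x, shape, rate):
    """Elementwise log density of a Gamma distribution (shape-rate form)."""
    x = np.asarray(x, dtype=float)
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * np.log(x) - rate * x)


def roll_loglik(speeds, theta, shapes, rates):
    """Log-likelihood of a roll under a Gamma mixture; order-independent."""
    comp = np.stack([math.log(t) + gamma_logpdf(speeds, a, b)
                     for t, a, b in zip(theta, shapes, rates)])
    m = comp.max(axis=0)  # log-sum-exp over patches, per observation
    return float(np.sum(m + np.log(np.exp(comp - m).sum(axis=0))))


def mixture_nn(train_thetas, train_labels, test_speeds, shapes, rates):
    """Label of the training roll whose mixing proportions fit the test roll best."""
    scores = [roll_loglik(test_speeds, th, shapes, rates) for th in train_thetas]
    return train_labels[int(np.argmax(scores))]


# Hypothetical patches with means 20 mph and 50 mph, and two training rolls.
shapes, rates = [16.0, 100.0], [0.8, 2.0]
train_thetas = [np.array([0.7, 0.3]), np.array([0.1, 0.9])]
train_labels = ["weekday", "weekend"]

test_speeds = np.array([52.0, 18.0, 49.0, 51.0, 47.0, 55.0])
reordered = test_speeds[::-1].copy()  # same observations, timestamps lost

label_ordered = mixture_nn(train_thetas, train_labels, test_speeds, shapes, rates)
label_reordered = mixture_nn(train_thetas, train_labels, reordered, shapes, rates)
shift = float(np.linalg.norm(test_speeds - reordered))  # nonzero after reordering
```

Reordering the observations leaves the mixture score unchanged, whereas the Euclidean distance between the original and reordered roll is nonzero, which is why the nearest neighbor baseline depends on timestamps.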

Page 19: A Topic Model for Traffic Speed Data Analysis

Classification results

Nearest neighbor

Page 20: A Topic Model for Traffic Speed Data Analysis

Summary

• We proposed a topic model for traffic speed data analysis.

• Patchy can assign observations of the same traffic speed to different groups in a context-dependent manner.

• Patchy achieved classification accuracy comparable to that of nearest neighbor (NN) without using timestamps.

Page 21: A Topic Model for Traffic Speed Data Analysis

Future work

• Model timestamps