Traffic Speed Data Investigation with Hierarchical Modeling Tomonari MASADA Nagasaki University [email protected]

FDSE2015

Apr 13, 2017

Transcript
Page 1: FDSE2015

Traffic Speed Data Investigation with

Hierarchical Modeling

Tomonari MASADA

Nagasaki University

[email protected]

Page 2: FDSE2015

Real-Time Traffic Speed Data | NYC Open Data

https://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa

Traffic speed measurements at 128 streets

(Regrettably, no longer maintained)

Page 3: FDSE2015
Page 4: FDSE2015
Page 5: FDSE2015

Problem 1

• Traffic speed data show a clear one-day periodicity.

• However, many different traffic speed distribution patterns can also be observed within each period.

Page 6: FDSE2015

Solution 1 [Masada+ 14]

• We take intuition from topic models in text mining.

– The data set of each day should be modeled as a mixture of many different speed distributions.

Page 7: FDSE2015

Latent Dirichlet Allocation (LDA) [Blei+ 03]

• LDA clusters at the word token level, not at the document level.

• Each document is modeled as a mixture of many different word probability distributions.

topic <-> word probability distribution

document <-> topic probability distribution

Page 8: FDSE2015

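[Figure: LDA illustration. Topics t1, t2, t3 are word probability distributions (φk1, φk2, φk3, φk4) over the words v1, v2, v3, v4; document j has topic probabilities θj1, θj2, θj3; example word tokens: v3, v1, v3, v2, v2.]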

Page 9: FDSE2015

An important difference

• Words are discrete entities.

– LDA uses a multinomial distribution to model the per-topic word distribution.

• Speeds (in mph) are continuous quantities.

– Our model uses a gamma distribution.

Page 10: FDSE2015

gamma distribution
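
The formula on this slide did not survive in the transcript. For reference, a gamma density over a speed x > 0 is commonly written with a shape α and a rate β; which parameterization the slide used is not recoverable, so the shape-rate form below is an assumption.

```latex
\mathrm{Gamma}(x \mid \alpha, \beta)
  = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x},
  \qquad x > 0,\ \alpha > 0,\ \beta > 0,
\qquad
\mathbb{E}[x] = \frac{\alpha}{\beta},
\quad
\mathrm{Var}[x] = \frac{\alpha}{\beta^{2}}.
```

Under this form, a topic with α = 36 and β = 1.2 would have mean speed 30 mph and standard deviation 5 mph.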

Page 11: FDSE2015

Comparison with LDA

• word token <-> speed measurement (in mph)

• topic (multinomial) <-> topic (gamma)

• document <-> document (24 hrs from midnight)
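
The correspondence above can be read as a generative process: each speed measurement first picks a topic, then draws a speed from that topic's gamma distribution. The following is a minimal sketch of that reading, not the authors' code; the function name `sample_day`, the three topics, and all parameter values are illustrative assumptions.

```python
import random

def sample_day(n_measurements, topic_probs, shapes, rates, rng):
    """Draw one 'document' (24 hours of speeds) from a mixture of gamma topics."""
    speeds, topics = [], []
    for _ in range(n_measurements):
        # Pick a topic for this measurement, as LDA does for a word token.
        k = rng.choices(range(len(topic_probs)), weights=topic_probs)[0]
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        speeds.append(rng.gammavariate(shapes[k], 1.0 / rates[k]))
        topics.append(k)
    return speeds, topics

rng = random.Random(0)
# Hypothetical topics with mean speeds of 10, 30, and 50 mph (mean = shape / rate).
shapes = [4.0, 36.0, 100.0]
rates = [0.4, 1.2, 2.0]
theta_d = [0.2, 0.5, 0.3]  # illustrative per-day topic probabilities
speeds, topics = sample_day(1000, theta_d, shapes, rates, rng)
```

Because the topic is chosen per measurement, a single day can mix slow and fast regimes, which is the token-level clustering that the LDA analogy is meant to carry over.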

Page 12: FDSE2015

Full joint distribution

• We estimated the parameters by variational Bayesian inference. [Masada+ 14]

Page 13: FDSE2015

Problem 2

• Traffic speed data may be similar at the same time point of day.

• Traffic speed data may be similar for streets whose locations are close to one another.

Page 14: FDSE2015

Solution 2 [Masada+ FDSE15]

• We use metadata in topic models.

–time points

–geographic locations

Page 15: FDSE2015

TRINH = TRaffic speed INvestigation with Hierarchical modeling

• Make topic probabilities dependent on time points and on locations

– Probability that the speed measured by sensor s at time point t is assigned to topic k:

$$\theta_{dtk} \equiv \frac{\exp(m_{dk} + \lambda_{ks} + \tau_{kt})}{\sum_{k'} \exp(m_{dk'} + \lambda_{k's} + \tau_{k't})}$$
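
The definition above is a softmax over topics of the summed document, sensor, and time factors. A minimal sketch of that computation for one fixed (d, s, t) follows; the function name `topic_probs` and the numeric inputs are illustrative assumptions, not values from the talk.

```python
import math

def topic_probs(m_d, lam_s, tau_t):
    """Softmax over topics k of m_dk + lambda_ks + tau_kt for fixed d, s, t."""
    scores = [m + l + t for m, l, t in zip(m_d, lam_s, tau_t)]
    top = max(scores)  # subtract the max so exp() cannot overflow
    weights = [math.exp(x - top) for x in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Three topics; each list holds one value per topic (hypothetical numbers).
theta = topic_probs(m_d=[0.5, -0.2, 0.0],
                    lam_s=[0.1, 0.3, -0.4],
                    tau_t=[-1.0, 0.8, 0.2])
```

Subtracting the maximum score leaves the result unchanged, since the common factor cancels between numerator and denominator.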

Page 16: FDSE2015

Parameters

• 𝑚𝑑𝑘

– How often topic k appears in document d

• 𝜆𝑘𝑠

– How often topic k appears at sensor s

• 𝜏𝑘𝑡

– How often topic k appears at time point t (of day)

Page 17: FDSE2015

Priors for parameters ("hierarchical")

• 𝑚𝑑𝑘

–K Gaussian priors

• 𝜆𝑘𝑠

–K Gaussian process priors

• 𝜏𝑘𝑡

–K Gaussian process priors
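
A Gaussian process prior makes the values of a factor at nearby inputs correlated. The sketch below draws one function per topic over the hours of a day. The transcript does not state the kernel or its hyperparameters, so the squared-exponential kernel, the length scale, the hourly grid, and the helper names are all assumptions; the sketch also ignores that time of day wraps around at midnight.

```python
import math
import random

def rbf_kernel(points, length_scale, variance, jitter=1e-6):
    """Squared-exponential covariance matrix (kernel choice is an assumption)."""
    n = len(points)
    return [[variance * math.exp(-0.5 * ((points[i] - points[j]) / length_scale) ** 2)
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sample_gp(points, length_scale, variance, rng):
    """One draw from a zero-mean GP evaluated at the given points."""
    low = cholesky(rbf_kernel(points, length_scale, variance))
    z = [rng.gauss(0.0, 1.0) for _ in points]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(points))]

rng = random.Random(1)
hours = [float(h) for h in range(24)]  # illustrative grid of time points
K = 3                                  # number of topics (illustrative)
tau = [sample_gp(hours, length_scale=3.0, variance=1.0, rng=rng) for _ in range(K)]
```

The same construction would apply to the sensor factor, with distances between sensor locations in place of differences between hours.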

Page 18: FDSE2015

Full joint distribution

Page 19: FDSE2015

Inference by MCMC

• Sample from the posterior distribution

– Slice sampling for the topic probability parameters 𝑚𝑑𝑘, 𝜆𝑘𝑠, and 𝜏𝑘𝑡

–Metropolis-Hastings for hyperparameters
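
As a reminder of how slice sampling updates a single scalar, here is a minimal sketch of the univariate procedure with stepping out and shrinkage. It is a generic illustration, not the talk's sampler: the function name `slice_sample`, the interval width, and the standard normal target (standing in for the conditional posterior of one parameter) are assumptions.

```python
import math
import random

def slice_sample(x0, log_density, width, rng, max_steps=50):
    """One slice-sampling update of a scalar, with stepping out and shrinkage."""
    # Draw the slice height below the (unnormalized) density at x0.
    log_y = log_density(x0) + math.log(1.0 - rng.random())
    # Place an interval of the given width at random around x0.
    left = x0 - width * rng.random()
    right = left + width
    # Step out, splitting the step budget at random between the two sides.
    j = int(rng.random() * max_steps)
    k = max_steps - 1 - j
    while j > 0 and log_density(left) > log_y:
        left -= width
        j -= 1
    while k > 0 and log_density(right) > log_y:
        right += width
        k -= 1
    # Sample uniformly from the interval, shrinking it after each rejection.
    while True:
        x1 = left + (right - left) * rng.random()
        if log_density(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

def log_target(v):
    """Stand-in target: unnormalized log density of a standard normal."""
    return -0.5 * v * v

rng = random.Random(0)
x, draws = 0.0, []
for _ in range(5000):
    x = slice_sample(x, log_target, width=2.0, rng=rng)
    draws.append(x)
```

Only the unnormalized log density is needed, and no proposal distribution has to be tuned beyond the initial width.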

Page 20: FDSE2015
Page 21: FDSE2015

Context dependency

Measurements of the same speed (in mph) are assigned to different topics.

Page 22: FDSE2015

Context dependency

On May 27, this topic is dominant.

On May 28, this topic is dominant.

Page 23: FDSE2015

Comparison experiment

• Log likelihood per measurement

– Larger is better.

• Data

– May 27 to June 16, 2013 (three weeks)

• Data files were downloaded every minute.

– 20% of the measurements were held out for testing.
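
The evaluation measure above can be sketched as the average, over held-out speeds, of the log of the mixture density. The sketch below simplifies by using one fixed vector of topic probabilities for all test measurements, whereas in the model they vary with document, sensor, and time point; the function names, the shape-rate parameterization, and the numbers are assumptions.

```python
import math

def gamma_logpdf(x, shape, rate):
    """Log density of Gamma(shape, rate) at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def log_likelihood_per_measurement(speeds, topic_probs, shapes, rates):
    """Average of log sum_k theta_k * Gamma(x | shape_k, rate_k) over the speeds."""
    total = 0.0
    for x in speeds:
        terms = [math.log(p) + gamma_logpdf(x, a, b)
                 for p, a, b in zip(topic_probs, shapes, rates) if p > 0.0]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total / len(speeds)

# Hypothetical held-out speeds (mph) and a hypothetical fitted two-topic model.
test_speeds = [9.0, 10.0, 11.0, 48.0, 52.0]
score = log_likelihood_per_measurement(
    test_speeds, topic_probs=[0.6, 0.4], shapes=[100.0, 100.0], rates=[10.0, 2.0])
```

Dividing by the number of test measurements makes scores comparable across test sets of different sizes.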

Page 24: FDSE2015
Page 25: FDSE2015

Prior as regularization

Too strong?

Page 26: FDSE2015

What we achieved

• We derived an MCMC inference for a topic model whose topic probabilities are defined by combining multiple factors.

• The factors are correlated via Gaussian priors.

– Our model can also be applied to other types of metadata indicating intrinsic similarity of data.

Page 27: FDSE2015

Summary

• We proposed a topic model for traffic data analysis.

• Sensor locations and measurement timestamps affect topic assignment.

• TRINH achieves better likelihood in earlier iterations.

• However, TRINH gives worse likelihood in later iterations.

Page 28: FDSE2015

Future work

• Control the strength of regularization

– e.g. by weighting the factors.

$$\theta_{dtk} \equiv \frac{\exp(m_{dk} + \lambda_{ks} + \tau_{kt})}{\sum_{k'} \exp(m_{dk'} + \lambda_{k's} + \tau_{k't})}$$

• Look for other data sets

– Location information should be more relevant.
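
One possible reading of "weighting the factors" in the first item is to scale the sensor and time factors inside the softmax. The weights w_λ and w_τ below are hypothetical symbols that do not appear on the slides, and this form is only one of several ways the idea could be realized.

```latex
\theta_{dtk} \equiv
  \frac{\exp\!\left(m_{dk} + w_{\lambda}\,\lambda_{ks} + w_{\tau}\,\tau_{kt}\right)}
       {\sum_{k'} \exp\!\left(m_{dk'} + w_{\lambda}\,\lambda_{k's} + w_{\tau}\,\tau_{k't}\right)},
  \qquad w_{\lambda},\, w_{\tau} \ge 0 .
```

Setting both weights to 1 recovers the original definition, and setting both to 0 leaves only the per-document factor.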