It's about time. Choosing Distance Measures for Mining Time Series Data Spencer Schnier 2/22/11.


Major Time Series Data Mining Tasks

• Indexing
• Clustering
• Classification
• Prediction
• Summarization
• Anomaly Detection
• Segmentation

Indexing and clustering make explicit use of a distance measure. The other tasks make implicit use of one.

(Ratanamahatana et al., 2010)

Popular Distance Measures

• Lock-step Measure (one-to-one)
  o Minkowski Distance: L1 norm (Manhattan Distance), L2 norm (Euclidean Distance), L∞ norm (Supremum Distance)
• Elastic Measure (one-to-many/one-to-none)
  o Dynamic Time Warping (DTW)
  o Edit distance based measure: Longest Common SubSequence (LCSS), Edit Distance on Real Sequence (EDR)
• Threshold-based Measure
  o Threshold query based similarity search (TQuEST)
• Pattern-based Measure
  o Spatial Assembling Distance (SpADe)

(Ding et al., 2008)

Minkowski Distance

h = 1: Manhattan (city block, L1 norm) distance

E.g., the Hamming distance: the number of bits that are different between two binary vectors

h = 2: (L2 norm) Euclidean distance

h → ∞: “supremum” (Lmax norm, L∞ norm) distance

This is the maximum difference between any component (attribute) of the vectors.

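Manhattan (h = 1): $d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$

Euclidean (h = 2): $d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$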
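For reference, both cases are instances of the general Minkowski distance of order h. The slide text does not preserve the general formula, so the block below states the standard definition in the same notation (p attributes, objects i and j), together with the supremum limit.

```latex
d(i,j) = \left( \sum_{k=1}^{p} |x_{ik} - x_{jk}|^{h} \right)^{1/h},
\qquad
\lim_{h \to \infty} d(i,j) = \max_{1 \le k \le p} |x_{ik} - x_{jk}|
```

Setting h = 1 gives the Manhattan sum and h = 2 gives the Euclidean square root.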


Minkowski Distance Examples

point  attribute 1  attribute 2
x1     1            2
x2     3            5
x3     2            0
x4     4            5

[Figure: scatter plot of the points x1 to x4, with both axes running from 0 to 4]

Dissimilarity Matrices

Manhattan (L1)

L1   x1    x2    x3    x4
x1   0
x2   5     0
x3   3     6     0
x4   6     1     7     0

Euclidean (L2)

L2   x1    x2    x3    x4
x1   0
x2   3.61  0
x3   2.24  5.1   0
x4   4.24  1     5.39  0

Supremum (L∞)

L∞   x1    x2    x3    x4
x1   0
x2   3     0
x3   2     5     0
x4   3     1     5     0
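The matrices in this example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the names `minkowski` and `dissimilarity_matrix` are illustrative, and rounding to two decimals is an assumption made to match the values shown on the slide.

```python
import math

# The four example points (attribute 1, attribute 2) from the slide.
points = {"x1": (1, 2), "x2": (3, 5), "x3": (2, 0), "x4": (4, 5)}

def minkowski(a, b, h):
    """Minkowski distance of order h; h = math.inf gives the supremum distance."""
    diffs = [abs(u - v) for u, v in zip(a, b)]
    if math.isinf(h):
        return max(diffs)
    return sum(d ** h for d in diffs) ** (1.0 / h)

def dissimilarity_matrix(pts, h):
    """Return {(name_a, name_b): distance} for every pair, rounded to 2 decimals."""
    names = list(pts)
    return {(p, q): round(minkowski(pts[p], pts[q], h), 2)
            for p in names for q in names}

if __name__ == "__main__":
    for label, h in (("L1", 1), ("L2", 2), ("Linf", math.inf)):
        matrix = dissimilarity_matrix(points, h)
        print(label, matrix[("x2", "x1")], matrix[("x4", "x3")])
```

The same function covers all three norms, which makes it easy to see how the ranking of nearest neighbours can change with h.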

What’s wrong with Euclidean Distance?

• Similar sequences may be shifted and have different scales

• What if a sequence is stretched or compressed along the time axis?

(Goldin and Kanellakis, 1995)

Normalize the time series before measuring the distance between them:

$x_i' = \frac{x_i - \mu}{\sigma}$

where μ and σ are the mean and standard deviation of the series.
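A small sketch of this normalization step, assuming the population standard deviation (dividing by n) and mapping a constant series to all zeros; both choices are assumptions, not something the slide specifies. The helper names are illustrative.

```python
import math

def z_normalize(series):
    """Subtract the mean and divide by the standard deviation of the series."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # Assumption: a constant series carries no shape, so return zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]

def euclidean(a, b):
    """Lock-step Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0, 2.0, 1.0]
b = [12.0, 14.0, 16.0, 14.0, 12.0]  # same shape as a, scaled by 2 and shifted by 10

if __name__ == "__main__":
    print("raw distance:", euclidean(a, b))
    print("normalized distance:", euclidean(z_normalize(a), z_normalize(b)))
```

After normalization the two series coincide up to floating-point error, so the shift and scale no longer dominate the distance. Stretching along the time axis is not addressed by this step.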

Dynamic Time Warping

• Sequences are similar but accelerate differently along the time axis

• Enforcing a temporal constraint δ on the warping window size improves computational efficiency and can improve accuracy

• Application: Speech recognition

(Berndt and Clifford, 1996)
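The dynamic program behind DTW, with an optional warping window, might look like the sketch below. It assumes a squared point-wise cost with a final square root and a band constraint of the form |i - j| ≤ δ; the slide does not fix either choice, and the function name `dtw_distance` is illustrative.

```python
import math

def dtw_distance(c, q, window=None):
    """DTW distance between series c and q.

    window is the maximum allowed |i - j| (the constraint delta);
    None means unconstrained warping.
    """
    m, n = len(c), len(q)
    # The band must be at least |m - n| wide, or no warping path exists.
    w = max(m, n) if window is None else max(window, abs(m - n))
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            d = (c[i - 1] - q[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # c advances, q repeats
                                 cost[i][j - 1],      # q advances, c repeats
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[m][n])

if __name__ == "__main__":
    slow = [0, 0, 1, 2, 1, 0]
    fast = [0, 1, 2, 1, 0]
    print(dtw_distance(fast, slow))            # unconstrained
    print(dtw_distance(fast, slow, window=1))  # banded
```

With `window=0` on equal-length series only the diagonal is reachable, so the result reduces to the Euclidean distance. A narrow band also cuts the number of cells filled from m·n to roughly (2δ + 1)·m.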


Longest Common Subsequence Similarity

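Dissimilarity: $LCSS(C, Q) = \frac{m + n - 2 \cdot l}{m + n}$

where m and n are the lengths of C and Q, and l is the length of their longest common subsequence.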

Tolerance: two elements count as a match when their values differ by no more than a tolerance threshold.

LCS length table L[i,j] (rows: elements of C, columns: elements of Q)

        2  5  4  5  3  1  8
   1    0  0  0  0  0  1  1
   2    1  1  1  1  1  1  1
   3    1  1  1  1  2  2  2
   4    1  1  2  2  2  2  2
   5    1  2  2  3  3  3  3
   1    1  2  2  3  3  4  4
   7    1  2  2  3  3  4  4

• Match 2 sequences by allowing some elements to be unmatched

• For C = {1,2,3,4,5,1,7} and Q = {2,5,4,5,3,1,8}, the longest common subsequence is {2,4,5,1}

• Application: Bioinformatics

(Vlachos et al., 2002)


Longest Common Subsequence Similarity

L[i,0] := 0 for all i; L[0,j] := 0 for all j
for i := 1..m
    for j := 1..n
        if C[i] = Q[j]
            L[i,j] := L[i-1,j-1] + 1
        else
            L[i,j] := max(L[i,j-1], L[i-1,j])
return L[m,n]

• Input sequences C[1..m] and Q[1..n]
• Compute the LCS between C[1..i] and Q[1..j] for all 1 ≤ i ≤ m and 1 ≤ j ≤ n
• Store its length in L[i,j]
• L[m,n] = length of the LCS
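A runnable version of this dynamic program is sketched below. The `epsilon` parameter, which relaxes the equality test to a tolerance for real-valued series, is an assumption added for illustration (the default of 0 reproduces exact matching); the function names are also illustrative. The dissimilarity uses the (m + n - 2l)/(m + n) form given earlier and assumes at least one sequence is non-empty.

```python
def lcss_length(c, q, epsilon=0.0):
    """Length of the longest common subsequence of c and q.

    Two elements match when |c[i] - q[j]| <= epsilon (assumed tolerance rule).
    """
    m, n = len(c), len(q)
    # Row 0 and column 0 stay at 0: an empty prefix has no common subsequence.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if abs(c[i - 1] - q[j - 1]) <= epsilon:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i][j - 1], L[i - 1][j])
    return L[m][n]

def lcss_dissimilarity(c, q, epsilon=0.0):
    """(m + n - 2l) / (m + n), where l is the LCS length; needs m + n > 0."""
    m, n = len(c), len(q)
    return (m + n - 2 * lcss_length(c, q, epsilon)) / (m + n)

C = [1, 2, 3, 4, 5, 1, 7]
Q = [2, 5, 4, 5, 3, 1, 8]

if __name__ == "__main__":
    print(lcss_length(C, Q), lcss_dissimilarity(C, Q))
```

For the example sequences C and Q the table ends with L[m,n] = 4, the length of {2,4,5,1}.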


(Vlachos et al., 2002)

Edit Distance on Real Sequence

• Similar to LCSS

• Uses a threshold parameter ε to quantize the distance between a pair of points to 0 or 1

• Seeks the minimum number of edit operations to change one sequence into another

• Assigns penalties to the unmatched segments according to the lengths of the gaps

• Application: Trajectories of moving objects

(Chen et al., 2005)
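The bullets above can be illustrated with a short sketch of the EDR recurrence as it is commonly stated: a substitution costs 0 when two points are within ε and 1 otherwise, and each gap element costs 1. The version below assumes scalar-valued series rather than multi-dimensional trajectories, and the name `edr` is illustrative.

```python
def edr(r, s, epsilon):
    """Edit Distance on Real sequence between scalar series r and s."""
    m, n = len(r), len(s)
    # D[i][j] = edit cost between the first i points of r and the first j of s.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i  # all i points of r are unmatched
    for j in range(n + 1):
        D[0][j] = j  # all j points of s are unmatched
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Point distance quantized to 0 (match) or 1 (mismatch).
            subcost = 0 if abs(r[i - 1] - s[j - 1]) <= epsilon else 1
            D[i][j] = min(D[i - 1][j - 1] + subcost,  # match or substitute
                          D[i - 1][j] + 1,            # gap in s
                          D[i][j - 1] + 1)            # gap in r
    return D[m][n]

if __name__ == "__main__":
    print(edr([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 4.0], epsilon=0.1))
```

Because every mismatch costs exactly 1 regardless of its magnitude, a single large outlier contributes no more than a small deviation just above ε.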

TQuEST (Assfalg et al., 2006)

• Uses a threshold parameter τ to transform a time series into a sequence of threshold-crossing intervals (the points within each interval have a value greater than a given τ)
• Each interval is treated as a 2D point: x = starting time, y = ending time
• The similarity between two time series is then defined as the Minkowski sum of the two sequences of time interval points

SpADe (Chen et al., 2007)

• A pattern-based similarity measure for time series
• Finds matching segments called patterns by allowing shifting and scaling
• Then finds the most similar set of matching patterns
• Disadvantage: requires many parameters (temporal and amplitude scale factor, pattern length, sliding step size, etc.)
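The first step of TQuEST, turning a series into threshold-crossing intervals, can be sketched as follows. The sketch assumes sample indices serve as time stamps and that an interval is a maximal run of values strictly greater than τ; it covers only the transformation, not the distance between the resulting interval sets. The function name is illustrative.

```python
def threshold_crossing_intervals(series, tau):
    """Return (start, end) index pairs of maximal runs with value > tau."""
    intervals = []
    start = None  # index where the current above-threshold run began
    for t, value in enumerate(series):
        if value > tau and start is None:
            start = t
        elif value <= tau and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        # The series ends while still above the threshold.
        intervals.append((start, len(series) - 1))
    return intervals

if __name__ == "__main__":
    print(threshold_crossing_intervals([0, 2, 3, 0, 0, 4, 0], tau=1))
```

Each returned pair corresponds to one 2D point (starting time, ending time) in the interval representation.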

Comparison of Distance Measures

(Ding et al., 2008)

Comparison of Distance Measures

1. The accuracy of elastic measures converges with that of Euclidean distance as the size of the training set increases. On small data sets, elastic measures can be significantly more accurate than lock-step measures.

2. Constraining the warping window size for elastic measures can reduce the computation cost and increase accuracy.

3. The accuracy of edit distance based similarity measures is very close to that of DTW. Only EDR is potentially slightly better than DTW.

4. The accuracy of several newer similarity measures, such as TQuEST and SpADe, is in general inferior to that of elastic measures.

5. To improve accuracy of a similarity measure, get more training data.

6. If you can’t get more data, trying the other measures might help; however, be careful to avoid overfitting.

(Ding et al., 2008)

ELKI 0.2

(Achtert et al., 2009)

• Software for visualization and performance evaluation of distance measures for time series

www.dbs.ifi.lmu.de/research/KDD/ELKI/


Research Questions

Is distance measure performance related to some intrinsic properties of the data set?

If so, can those properties be used to identify the most appropriate distance measure?

References

• Achtert, E., T. Bernecker, H.-P. Kriegel, E. Schubert, and A. Zimek. 2009. “ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series.” SSTD 2009.
• Aßfalg, J., H.-P. Kriegel, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz. 2006. “Similarity Search on Time Series Based on Threshold Queries.” EDBT 2006.
• Berndt, D., and J. Clifford. 1996. “Finding Patterns in Time Series: A Dynamic Programming Approach.” Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Menlo Park, CA. pp. 229-248.
• Chen, L., M. Ozsu, and V. Oria. 2005. “Robust and Fast Similarity Search for Moving Object Trajectories.” SIGMOD ’05.
• Chen, Y., M. Nascimento, B. Ooi, and A. Tung. 2007. “SpADe: On Shape-based Pattern Detection in Streaming Time Series.” ICDE 2007.
• Ding, H., G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh. 2008. “Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures.” VLDB ’08.
• Goldin, D., and P. Kanellakis. 1995. “On Similarity Queries for Time-Series Data: Constraint Specification and Implementation.” Proceedings of the 1st International Conference on the Principles and Practice of Constraint Programming. pp. 137-153.
• Ratanamahatana, C., J. Lin, D. Gunopulos, E. Keogh, M. Vlachos, and G. Das. 2010. “Mining Time Series Data.” Data Mining and Knowledge Discovery Handbook. Part 6, pp. 1049-1077.
• Vlachos, M., D. Gunopulos, and G. Kollios. 2002. “Discovering Similar Multidimensional Trajectories.” ICDE 2002.
