Efficient Query Filtering for Streaming Time Series
Li Wei, Eamonn Keogh, Helga Van Herle, Agenor Mafra-Neto
Computer Science & Engineering Dept., University of California – Riverside, Riverside, CA 92521, {wli, eamonn}@cs.ucr.edu
David Geffen School of Medicine, University of California – Los Angeles, Los Angeles, CA 90095, [email protected]
ISCA Technologies, Riverside, CA 92517, [email protected]
ICDM '05
ICDM '05. Outline of Talk: Introduction to time series; Time series filtering; Wedge-based approach; Experimental results; Conclusions.
What are Time Series? Time series are collections of observations made sequentially in time.
Transcript
Efficient Query Filtering for Streaming Time Series
Li Wei Eamonn Keogh Helga Van Herle Agenor Mafra-Neto
Time Series Data Mining Tasks
Time Series Filtering
[Figure: a candidate set of 12 sequences, numbered 1 to 12]
Given a time series T, a set of candidates C, and a distance threshold r, find all subsequences in T that are within distance r of any of the candidates in C.
[Figure: a time series with a subsequence labeled "Matches Q11"]
Filtering vs. Querying
[Figure, two panels. Filtering: Queries 1 to 12, Database, "Matches Q11". Querying: Query (template), Database of sequences 1 to 10, "Best match".]
Euclidean Distance Metric
Given two time series Q = q1…qn and C = c1…cn, the Euclidean distance between them is defined as:
$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$
[Figure: sequences Q and C plotted over time points 0 to 100]
Early Abandon
During the computation, if the current sum of the squared differences between each pair of corresponding data points exceeds r^2, we can safely stop the calculation.
[Figure: sequences Q and C plotted over time points 0 to 100, with the label "calculation abandoned at this point"]
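The early abandoning rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `early_abandon_distance` and the convention of returning `None` when the threshold is exceeded are assumptions made for this example.

```python
import math


def early_abandon_distance(q, c, r):
    """Euclidean distance between q and c, or None once it must exceed r.

    Hypothetical helper for illustration; the name and the None
    convention are not from the slides.
    """
    if len(q) != len(c):
        raise ValueError("sequences must have equal length")
    limit = r * r          # compare squared sums so no sqrt is needed in the loop
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > limit:  # the running sum already exceeds r^2: stop early
            return None
    return math.sqrt(total)
```

Because the running sum never decreases, stopping as soon as it passes r^2 cannot discard a sequence whose true distance is within r.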
Classic Approach
[Figure: the candidate set of 12 sequences, numbered 1 to 12, and a time series]
Individually compare each candidate sequence to the query using the early abandoning algorithm.
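As a rough sketch of the classic approach, the following Python slides a window over the time series and compares it to every candidate with early abandoning. It assumes all candidates have the same length and that no normalization is applied, neither of which the slides state; the name `classic_filter` and the `(start, candidate_index)` result format are made up for this example.

```python
def classic_filter(stream, candidates, r):
    """Return (start, candidate_index) pairs whose distance is at most r.

    Illustrative sketch: assumes equal-length candidates and raw
    (unnormalized) values.
    """
    n = len(candidates[0])
    limit = r * r
    matches = []
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        for idx, cand in enumerate(candidates):
            total = 0.0
            for x, c in zip(window, cand):
                total += (x - c) ** 2
                if total > limit:   # early abandon this candidate
                    break
            else:
                # loop finished without abandoning: distance <= r
                matches.append((start, idx))
    return matches
```

Every window is compared to every candidate, so the worst-case cost per incoming point grows with the number of candidates times their length.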
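Wedge
[Figure: candidate sequences C1 and C2; their upper and lower envelopes U and L, which form the wedge W; a query Q compared against W]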
Having candidate sequences C1, …, Ck, we can form two new sequences U and L:
$$U_i = \max(C_{1i}, \ldots, C_{ki}), \qquad L_i = \min(C_{1i}, \ldots, C_{ki})$$
They form the smallest possible bounding envelope that encloses sequences C1, …, Ck.
We call the combination of U and L a wedge, and denote a wedge as W. W = {U, L}
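The definition of U and L translates directly into a pointwise maximum and minimum. The sketch below assumes equal-length candidates stored as Python lists; `build_wedge` is a hypothetical name chosen for this example.

```python
def build_wedge(candidates):
    """Return (U, L), the pointwise max and min over all candidates."""
    # zip(*candidates) yields, for each position i, the tuple (C1_i, ..., Ck_i)
    upper = [max(column) for column in zip(*candidates)]
    lower = [min(column) for column in zip(*candidates)]
    return upper, lower
```

For example, the candidates `[1, 5, 3]` and `[2, 4, 6]` give U = `[2, 5, 6]` and L = `[1, 4, 3]`.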
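A lower bounding measure between an arbitrary query Q and the entire set of candidate sequences contained in a wedge W:
$$LB\_Keogh(Q, W) = \sqrt{\sum_{i=1}^{n} \begin{cases} (q_i - U_i)^2 & \text{if } q_i > U_i \\ (q_i - L_i)^2 & \text{if } q_i < L_i \\ 0 & \text{otherwise} \end{cases}}$$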
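A minimal Python sketch of the LB_Keogh measure follows, assuming the wedge is given as two equal-length lists `upper` and `lower`; the function name and argument layout are illustrative choices, not taken from the slides.

```python
import math


def lb_keogh(q, upper, lower):
    """Lower bound on the distance from q to any sequence inside the wedge."""
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:                 # query point lies above the envelope
            total += (qi - ui) ** 2
        elif qi < li:               # query point lies below the envelope
            total += (qi - li) ** 2
        # points inside the envelope contribute nothing
    return math.sqrt(total)
```

Since every candidate in the wedge lies between L and U at each position, each term is no larger than the corresponding term of the Euclidean distance to that candidate, which is why the result is a lower bound.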
Generalized Wedge
• Use W(1,2) to denote that a wedge is built from sequences C1 and C2.
• Wedges can be hierarchically nested. For example, W((1,2),3) consists of W(1,2) and C3.
[Figure: C1 (or W1), C2 (or W2), C3 (or W3); W(1, 2); W((1, 2), 3)]
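One way to picture the nesting is to treat a single sequence C as the degenerate wedge with U = L = C, and to merge two wedges by taking the pointwise maximum of their upper envelopes and the pointwise minimum of their lower envelopes. The helper names `as_wedge` and `merge_wedges` below are assumptions for this sketch.

```python
def as_wedge(sequence):
    """A single sequence viewed as a wedge whose envelopes coincide."""
    return list(sequence), list(sequence)


def merge_wedges(w1, w2):
    """Smallest wedge enclosing both w1 and w2, each given as (upper, lower)."""
    (u1, l1), (u2, l2) = w1, w2
    upper = [max(a, b) for a, b in zip(u1, u2)]
    lower = [min(a, b) for a, b in zip(l1, l2)]
    return upper, lower


# W(1,2) from C1 and C2, then W((1,2),3) from W(1,2) and C3
c1, c2, c3 = [1, 5, 3], [2, 4, 6], [0, 7, 4]
w12 = merge_wedges(as_wedge(c1), as_wedge(c2))
w123 = merge_wedges(w12, as_wedge(c3))
```

Merging in this way gives the same envelope as building the wedge from C1, C2, and C3 at once, because max and min are associative.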
Wedge Based Approach
[Figure: the candidate set of 12 sequences, numbered 1 to 12, and a time series]
• Compare the query to the wedge using LB_Keogh
• If the LB_Keogh function early abandons, we are done
• Otherwise individually compare each candidate sequence to the query using the early abandoning algorithm
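The three steps above can be sketched as follows, using a single wedge built from the whole candidate set. This is a simplified illustration under assumptions not stated in the slides (equal-length candidates, no normalization, one flat wedge rather than a hierarchy); `wedge_filter` is a hypothetical name.

```python
def wedge_filter(stream, candidates, r):
    """Return (start, candidate_index) pairs whose distance is at most r."""
    n = len(candidates[0])
    limit = r * r
    # One wedge W = {U, L} enclosing every candidate.
    upper = [max(column) for column in zip(*candidates)]
    lower = [min(column) for column in zip(*candidates)]
    matches = []
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        # Step 1: squared LB_Keogh against the wedge, with early abandoning.
        bound = 0.0
        for x, u, l in zip(window, upper, lower):
            if x > u:
                bound += (x - u) ** 2
            elif x < l:
                bound += (x - l) ** 2
            if bound > limit:
                break  # Step 2: no candidate in the wedge can match
        else:
            # Step 3: the wedge was not pruned, so check each candidate.
            for idx, cand in enumerate(candidates):
                total = 0.0
                for x, c in zip(window, cand):
                    total += (x - c) ** 2
                    if total > limit:
                        break
                else:
                    matches.append((start, idx))
    return matches
```

When the lower bound abandons early, one pass over the wedge replaces a pass over every candidate; when it does not, the wedge comparison is extra work on top of the classic approach.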
Examples of Wedge Merging
[Figure: a query Q compared against W(1,2), built from C1 (or W1) and C2 (or W2), and against W((1,2),3), built from W(1,2) and C3 (or W3)]
Hierarchical Clustering
[Figure: hierarchical clustering of C1 (or W1), C4 (or W4), C2 (or W2), C5 (or W5), C3 (or W3), with the wedge set at each level]
K = 5: W1, W2, W3, W4, W5
K = 4: W(2,5), W1, W3, W4
K = 3: W(2,5), W(1,4), W3
K = 2: W((2,5),3), W(1,4)
K = 1: W(((2,5),3), (1,4))
Which wedge set to choose?
Which Wedge Set to Choose?
• Test all k wedge sets on a representative sample of data
• Choose the wedge set which performs the best
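A possible way to run this selection is sketched below. It assumes that "performs the best" means the fewest point comparisons on the sample, and that each wedge set is given as a list of groups of candidate sequences; both are assumptions for illustration, as are the names `count_steps` and `choose_wedge_set`.

```python
def count_steps(window, wedges, r):
    """Count point comparisons needed to filter one window."""
    limit = r * r
    steps = 0
    for upper, lower, members in wedges:
        bound, pruned = 0.0, False
        for x, u, l in zip(window, upper, lower):
            steps += 1
            if x > u:
                bound += (x - u) ** 2
            elif x < l:
                bound += (x - l) ** 2
            if bound > limit:
                pruned = True
                break
        # For a one-sequence wedge the bound is already the exact distance.
        if pruned or len(members) == 1:
            continue
        for cand in members:
            total = 0.0
            for x, c in zip(window, cand):
                steps += 1
                total += (x - c) ** 2
                if total > limit:
                    break
    return steps


def choose_wedge_set(sample, wedge_sets, r):
    """Return the index of the wedge set that costs the fewest steps."""
    n = len(wedge_sets[0][0][0])  # length of one candidate sequence
    costs = []
    for groups in wedge_sets:
        wedges = [([max(col) for col in zip(*g)],
                   [min(col) for col in zip(*g)], g) for g in groups]
        costs.append(sum(count_steps(sample[s:s + n], wedges, r)
                         for s in range(len(sample) - n + 1)))
    return costs.index(min(costs))
```

In this sketch, merging near-identical candidates lowers the step count because one wedge comparison prunes them together, while merging dissimilar candidates produces a loose envelope that rarely prunes and so raises the count.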
Upper Bound on Wedge Based Approach
• Wedge based approach seems to be efficient when comparing a set of time series to a large batch dataset.
• But, what about streaming time series?
– Streaming algorithms are limited by their worst case.
– Being efficient on average does not help.
• We measure the number of computational steps used by the following methods:
– Brute force
– Brute force with early abandoning (classic)
Audio Dataset
• Batch time series
– 37,583,512 data points (one hour's sound)
• Candidate set
– 68 time series with length 51
– 3 species of harmful mosquitoes
Conclusions
• We introduce the problem of time series filtering.
• Combining similar sequences into a wedge is a quite promising idea.
• We have provided an upper bound on the cost of the algorithm, from which we can compute the fastest arrival rate we can guarantee to handle.