Page 1
Themis Palpanas 1VLDB - Aug 2004
Fair Use Agreement
This agreement covers the use of all slides on this CD-ROM; please read carefully.
• You may freely use these slides for teaching, if
  • you send me an email telling me the class number/university in advance;
  • my name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).
• You may freely use these slides for a conference presentation, if
  • you send me an email telling me the conference name in advance;
  • my name appears on each slide you use.
• You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.
(c) Eamonn Keogh, [email protected]
Page 2
Indexing Large Human-Motion Databases
Eamonn Keogh, Themis Palpanas, Victor B. Zordan, Dimitrios Gunopulos
University of California, Riverside
Marc Cardle
University of Cambridge
Page 3
Motion Capture
records motion data from live actors
Page 4
Motion Capture
records motion data from live actors
used for data-driven animation
Page 5
Motion Capture in Games Industry
Street NBA
Madden
Page 6
Motion Capture in Movie Industry
Troy
Lord of the Rings
Page 7
Motivation
• motion capture data are segmented into short sequences and stored in motion libraries
• the short sequences are composed to create long, realistic motion sequences
• important to find similar sequences: from the pool of similar sequences, choose the most promising one to continue the motion
Page 8
Motivation
Dynamic Time Warping (DTW)
• considers only local adjustments in time to match two time series
• however, sometimes global adjustments are required
DTW is used extensively; uniform scaling is complementary
• the combination of both techniques offers a rich, high-quality result set
[Figure: two panels, "DTW" and "Uniform Scaling"]
Page 9
Uniform Scaling
• time series query, Q, length n
• candidate, C, length m (m > n)
[Figure: plots of C and Q, time axis 0 to 400]
Page 10
Uniform Scaling
• time series query, Q, length n
• candidate, C, length m (m > n)
• stretch Q to length p (n ≤ p ≤ m), giving $Q^p$:
  $Q^p_j = Q_{\lceil j \cdot n / p \rceil}$, 1 ≤ j ≤ p
• scaling factor, sf = p/n
• max scaling factor, sfmax = m/n
[Figure: plots of C and Q, and of Q and its stretched version Qp, time axis 0 to 400]
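The stretching rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `stretch` is hypothetical, and the code maps the slide's 1-based indices onto 0-based Python lists.

```python
def stretch(q, p):
    """Uniformly stretch sequence q (length n) to length p >= n.

    Follows the slide's rule Qp_j = Q_ceil(j*n/p) for 1 <= j <= p.
    The formula is 1-based; Python lists are 0-based, hence the "- 1".
    """
    n = len(q)
    if p < n:
        raise ValueError("p must be at least len(q)")
    # (j*n + p - 1) // p is an integer ceiling of j*n/p, which avoids
    # floating-point rounding surprises.
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


# Stretching a length-3 query to length 6 repeats every point twice.
print(stretch([1, 2, 3], 6))
```

With p = n the function returns the query unchanged, which corresponds to a scaling factor of 1.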
Page 11
Problem Statement
given
• time series, Q
• database of candidate time series, {D}
find $\arg\min_p \{ dist(Q^p, \{D\}) \}$
• $dist(Q^p, \{D\})$ = Euclidean distance between time series
Page 12
Problem Statement
given
• time series, Q
• database of candidate time series, {D}
find $\arg\min_p \{ dist(Q^p, \{D\}) \}$
• $dist(Q^p, \{D\})$ = Euclidean distance between time series
challenges
• quickly solve the problem for two time series
• extend the solution to scale up to large time series databases
Page 13
Outline
• Speeding Up Search
• Scaling Up To Large Databases
• Experimental Evaluation
• Related Work
• Conclusions
Page 14
Best Uniform Scaling Match
brute force algorithm:
  for each time series in {D}
    for each sf, 1 ≤ sf ≤ sfmax
      compute distance between the two time series
  find the best overall match
time complexity: O(|D|(m-n))
extremely expensive!
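The brute force search above might look like the following sketch. Several details are assumptions, not taken from the slides: scaling factors are enumerated through the integer lengths p = n, ..., m; the stretched query of length p is compared against the first p points of the candidate; and all function names are hypothetical.

```python
import math


def stretch(q, p):
    """Stretch q to length p using Qp_j = Q_ceil(j*n/p) (1-based)."""
    n = len(q)
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_uniform_scaling_match(q, database):
    """Try every candidate and every stretched length p = n..m.

    Returns (distance, candidate_index, p) for the best match.
    Assumption: Qp is compared with the length-p prefix of the candidate.
    """
    n = len(q)
    best = (math.inf, None, None)
    for idx, c in enumerate(database):
        for p in range(n, len(c) + 1):
            d = euclidean(stretch(q, p), c[:p])
            if d < best[0]:
                best = (d, idx, p)
    return best


query = [0, 1, 0]
db = [[5, 5, 5, 5, 5, 5], [0, 0, 1, 1, 0, 0]]
print(best_uniform_scaling_match(query, db))
```

Every (candidate, p) pair triggers a full distance computation, which is the cost that the lower bounding technique on the following slides aims to avoid.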
Page 15
Lower Bounding Uniform Scaling
lower bound the distance between two time series, for any sf, 1 ≤ sf ≤ sfmax
desiderata:
• fast to compute
• tight bound
results in fast pruning of candidates that are guaranteed not to belong to the solution
• compute distance only for time series not pruned by the lower bound
Page 16
Lower Bounding Uniform Scaling
assume:
• candidate C, length 100
• query Q, length 80
wish to find best match for any scaling of Q between 80-100
[Figure: candidate C, m = 100, time axis 0 to 100]
Page 17
Lower Bounding Uniform Scaling
assume:
• candidate C, length 100
• query Q, length 80
wish to find best match for any scaling of Q between 80-100
build envelopes, length 80:
$U_i = \max( C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n} )$
$L_i = \min( C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n} )$
[Figure: envelopes U and L, n = 80, time axis 0 to 100]
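The envelope construction can be sketched as below. The slide does not say how the fractional indices (i-1)*m/n + 1 and i*m/n are rounded when m/n is not an integer; this sketch assumes floor for the first index and ceiling for the last one. The function name `build_envelope` is hypothetical, and the code only mirrors the slide's index ranges without verifying the bounding property.

```python
def build_envelope(c, n):
    """Build upper/lower envelopes of length n for a candidate c of length m.

    For each i in 1..n, take the max and min of c over the 1-based range
    [floor((i-1)*m/n) + 1, ceil(i*m/n)].  The rounding is an assumption.
    """
    m = len(c)
    if n > m:
        raise ValueError("n must not exceed len(c)")
    upper, lower = [], []
    for i in range(1, n + 1):
        start = (i - 1) * m // n + 1       # floor((i-1)*m/n) + 1
        end = (i * m + n - 1) // n         # ceil(i*m/n)
        window = c[start - 1:end]          # convert to a 0-based slice
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower


U, L = build_envelope([1, 2, 3, 4, 5], 4)
print(U, L)
```

For the slide's example (m = 100, n = 80) each window covers two or three consecutive points of C, so the envelopes stay close to the candidate itself.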
Page 18
Lower Bounding Uniform Scaling
assume:
• candidate C, length 100
• query Q, length 80
wish to find best match for any scaling of Q between 80-100
build envelopes, length 80:
$U_i = \max( C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n} )$
$L_i = \min( C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n} )$
[Figure: query Q, time axis 0 to 100]
Page 19
Lower Bounding Uniform Scaling
assume:
• candidate C, length 100
• query Q, length 80
wish to find best match for any scaling of Q between 80-100
build envelopes, length 80:
$U_i = \max( C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n} )$
$L_i = \min( C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n} )$
[Figure: time axis 0 to 100]
Page 20
Lower Bounding Uniform Scaling
assume:
• candidate C, length 100
• query Q, length 80
wish to find best match for any scaling of Q between 80-100
compute lower bound:
$$LB\_Keogh(Q, C) = \sqrt{ \sum_{i=1}^{n} \begin{cases} (Q_i - U_i)^2 & \text{if } Q_i > U_i \\ (Q_i - L_i)^2 & \text{if } Q_i < L_i \\ 0 & \text{otherwise} \end{cases} }$$
[Figure: time axis 0 to 100]
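The lower bound on this slide sums, for each query point, the squared distance to the nearest envelope boundary whenever the point falls outside the envelope. A minimal sketch, assuming the envelopes U and L are already available as lists of the same length as Q (the name `lb_keogh` is chosen here for illustration):

```python
import math


def lb_keogh(q, upper, lower):
    """Distance from q to the band between the lower and upper envelopes.

    Points above the upper envelope contribute (q_i - U_i)^2, points
    below the lower envelope contribute (q_i - L_i)^2, and points inside
    the band contribute nothing.
    """
    if not (len(q) == len(upper) == len(lower)):
        raise ValueError("q, upper and lower must have equal length")
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
    return math.sqrt(total)


# Only the middle point (5 > 3) lies outside the band.
print(lb_keogh([0, 5, 2], [1, 3, 3], [-1, 1, 1]))
```

The computation is a single pass over n points, which is what makes it cheap compared with evaluating every scaling of the query.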
Page 21
Envelope Indexing
dimensionality of envelopes is high
[Figure: envelopes, 80 points, time axis 0 to 100]
Page 22
Envelope Indexing
dimensionality of envelopes is high
reduce dimensionality by approximating them
• Piecewise Constant Approximation
[Figure: envelopes U and L with their approximations, 8 points, time axis 0 to 100]
Page 23
Envelope Indexing
dimensionality of envelopes is high
reduce dimensionality by approximating them
• Piecewise Constant Approximation
assume query Q, length 80
[Figure: query Q, time axis 0 to 100]
Page 24
Envelope Indexing
dimensionality of envelopes is high
reduce dimensionality by approximating them
• Piecewise Constant Approximation
assume query Q, length 80
• we approximate it with 8 points
[Figure: query Q and its approximation, time axis 0 to 100]
Page 25
Envelope Indexing
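dimensionality of envelopes is high
reduce dimensionality by approximating them
• Piecewise Constant Approximation
assume query Q, length 80
• approximated with 8 points
compute approximation of lower bound:
$$MINDIST(Q, \hat{R}) = \sqrt{ \sum_{i=1}^{N} \frac{n}{N} \begin{cases} (\bar{Q}_i - \hat{U}_i)^2 & \text{if } \bar{Q}_i > \hat{U}_i \\ (\bar{Q}_i - \hat{L}_i)^2 & \text{if } \bar{Q}_i < \hat{L}_i \\ 0 & \text{otherwise} \end{cases} }$$
where $\bar{Q}_i$, $\hat{U}_i$, and $\hat{L}_i$ are the i-th points of the approximated query and envelopes
[Figure: time axis 0 to 100]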
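A sketch of the reduced-dimensionality bound follows. The slides do not spell out how the approximations are built, so the following are assumptions: the query is approximated by the mean of each of N equal-width segments, the upper envelope by the maximum and the lower envelope by the minimum of each segment (so the approximated band still encloses the original one), and n is divisible by N. All function names are hypothetical.

```python
import math


def segment_means(series, N):
    """Approximate a series by the mean of each of N equal segments."""
    w = len(series) // N
    return [sum(series[k * w:(k + 1) * w]) / w for k in range(N)]


def reduce_envelope(upper, lower, N):
    """Piecewise constant envelopes: segment max of U, segment min of L."""
    w = len(upper) // N
    u_hat = [max(upper[k * w:(k + 1) * w]) for k in range(N)]
    l_hat = [min(lower[k * w:(k + 1) * w]) for k in range(N)]
    return u_hat, l_hat


def mindist(q_bar, u_hat, l_hat, n):
    """Approximate lower bound computed on N points instead of n."""
    N = len(q_bar)
    total = 0.0
    for qi, ui, li in zip(q_bar, u_hat, l_hat):
        if qi > ui:
            total += (n / N) * (qi - ui) ** 2
        elif qi < li:
            total += (n / N) * (qi - li) ** 2
    return math.sqrt(total)


q = [4, 6, 0, 0]
U, L = [1, 2, 3, 3], [-1, 0, -1, -2]
q_bar = segment_means(q, 2)
u_hat, l_hat = reduce_envelope(U, L, 2)
print(mindist(q_bar, u_hat, l_hat, len(q)))
```

In this toy example the full-resolution bound is 5.0, and the approximate bound is smaller, as expected from a coarser representation.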
Page 26
Algorithms for Secondary Storage
use a multidimensional index
• VA-file -> FastScan algorithm
• R-tree -> RtreeProbe algorithm
2-pass algorithms:
1. scan approximated envelopes, prune search space
2. find exact answer using original series
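The two-pass idea can be illustrated with a generic filter-and-refine loop. This is not the FastScan or RtreeProbe algorithm, whose details are not given on the slides; it is a sketch assuming two caller-supplied functions (hypothetical names `lower_bound` and `exact_distance`) where the first never exceeds the second.

```python
import math


def two_pass_search(candidates, lower_bound, exact_distance):
    """Generic filter-and-refine nearest-neighbour search.

    Pass 1: compute the cheap lower bound for every candidate.
    Pass 2: visit candidates in increasing order of lower bound and
            compute the exact distance only while the bound is still
            below the best exact distance found so far.
    Returns (best_index, best_distance, number_of_exact_computations).
    """
    bounds = sorted((lower_bound(c), i) for i, c in enumerate(candidates))
    best_dist, best_idx, exact_calls = math.inf, None, 0
    for lb, i in bounds:
        if lb >= best_dist:
            break  # every remaining candidate has an even larger bound
        exact_calls += 1
        d = exact_distance(candidates[i])
        if d < best_dist:
            best_dist, best_idx = d, i
    return best_idx, best_dist, exact_calls


# Toy example: candidates are numbers, the query is 2, and the
# "lower bound" is simply half of the true distance.
data = [10, 3, 7, 1]
print(two_pass_search(data, lambda c: abs(c - 2) / 2, lambda c: abs(c - 2)))
```

In the toy run only two of the four candidates reach the exact distance computation; the others are pruned by their bounds alone.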
Page 27
Outline
• Speeding Up Search
• Scaling Up To Large Databases
• Experimental Evaluation
• Related Work
• Conclusions
Page 28
Datasets Used
• motion capture: data from 124 sensors placed on human actors
• mixed bag: time series coming from medicine, manufacturing, environmental monitoring, economics, sensor data
• experimented with time series databases of:
  • size 5,000 – 80,000 time series
  • length 64 – 1,024 points
Page 29
Main Memory Experiments
assume database fits in memory
measure pruning power:
• fraction of times each approach calls distance function
our technique:
• 1 order of magnitude faster than CD-criterion
[Figure: pruning power (axis 0 to 0.25) of LB_Keogh and CD-criterion; axis tick labels 64, 128, 256 and 1.05, 1.10, 1.20]
Page 30
Main Memory Experiments
assume database fits in memory
measure pruning power:
• fraction of times each approach calls distance function
our technique:
• 1 order of magnitude faster than CD-criterion
• 3 orders of magnitude faster than brute force
[Figure: pruning power (axis 0 to 0.25) of LB_Keogh, CD-criterion, and brute force; axis tick labels 64, 128, 256 and 1.05, 1.10, 1.20]
Page 31
Disk-Based Experiments
comparison of:
• brute force
• FastScan
• RtreeProbe
[Figure: running time in seconds (axis 0 to 25) of LinearScan, FastScan, and RtreeProbe; axis tick labels 64, 128, 256 and 1.05, 1.10, 1.20]
Page 32
Disk-Based Experiments
comparison of:
• FastScan
• RtreeProbe
[Figure: running time in seconds (axis 0 to 80) of LinearScanLB, FastScan, RtreeBF, and RtreeProbe; axis tick labels 64, 128, 256, 512, 1024]
Page 33
Disk-Based Experiments
comparison of:
• FastScan
• RtreeProbe
[Figure: running time in seconds (axis 0 to 0.18) of LinearScanLB, FastScan, RtreeBF, and RtreeProbe; axis tick labels 5000, 10000, 20000, 40000, 80000]
Page 34
Case Study
video
Page 35
Outline
• Speeding Up Search
• Scaling Up To Large Databases
• Experimental Evaluation
• Related Work
• Conclusions
Page 36
Related Work
• Dynamic Time Warping (DTW): [Yi & Faloutsos’00][Keogh’02][Zhu & Shasha’03][Fung & Wong’03]
• Longest Common SubSequence (LCSS): [Das et al.’97][Vlachos et al.’03]
• uniform scaling: [Argyros & Ermopoulos’03]
Page 37
Outline
• Speeding Up Search
• Scaling Up To Large Databases
• Experimental Evaluation
• Related Work
• Conclusions
Page 38
Conclusions
• studied the utility of uniform scaling similarity matching
  • applications in: motion capture libraries, music retrieval, historical handwritten archives
• introduced the first lower bounding technique for uniform scaling
• proposed an indexing method for the bounding envelopes
  • suitable for very large time series databases
• experimentally evaluated the efficiency of the technique
• demonstrated the quality of results with real motion capture data
Page 39
Outline
Page 40
Lower Bounding Uniform Scaling
assume:
• candidate C, length 100
• query Q, length 80
wish to find best match for any scaling of Q between 80-100
build envelopes, length 80:
$U_i = \max( C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n} )$
$L_i = \min( C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n} )$
[Figure: time axis 0 to 100]