Talk Outline
• Ubiquity of time series
• What are time series motifs?
• Rare Motif Discovery
• Conclusions
Time Series Are Ubiquitous
[Figure: Example time series from many sources: unstructured audio stream, sensors on machines, shapes, handwriting, motion capture, human speech, web clicks, electrocardiogram, insect wingbeat sound]
What are time series motifs?
- Approximately repeated subsequences
- An example: Activity Recognition
[Figure: Activity-recognition time series (x axis 0 to 1000) with segments labeled walking (several occurrences), stretching, and vacuuming]
Motifs are useful as a subroutine for:
- Classification
- Clustering
- Rule Discovery
- Anomaly Detection
Rare Motif Discovery
• Motivation
• Algorithms
– Brute Force
– Limited cache
• Performance Improvement
– Changing Data Representation
– Sticky Cache
• Experiments
Situations where current motif-finding algorithms can perform poorly or fail:
• Far apart in space (motif occurrences fall in different data chunks)
• Infrequent (computationally expensive!)
Rare Motifs: A Real-Life Example
[Figure: Solar panel current (mA, 0 to 40) in windows labeled 131, 129, and 127 days ago and 3 days ago, 2 days ago, and now, with four months omitted in between]
A never-ending time series stream from a weather station’s solar panel, only a fraction of which we can buffer. A pattern we are observing now seems to have also occurred about four months ago.
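Brute Force Approach
Stream of objects: I1, I2, I3, I4, …, Ik, …
Report “current item is a motif pattern” if we find D(k + 1, j) < T for some j < k + 1, where D(k + 1, j) is the distance between the current item Ik+1 and an earlier item Ij, and T is the distance threshold.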
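The brute-force rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each object is a fixed-length numeric vector and that D is the Euclidean distance, and the name `brute_force_motif` is hypothetical. Every object ever seen is kept, which is what makes the approach impractical on a never-ending stream.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Assumed distance D: Euclidean distance between equal-length objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_motif(stream: Sequence[Vector], T: float) -> Optional[int]:
    """Return the 1-based position k + 1 of the first object whose distance to
    some earlier object j < k + 1 is below T, or None if there is none.
    Memory grows without bound because every object is stored."""
    seen: List[Vector] = []
    for pos, item in enumerate(stream, start=1):
        if any(euclidean(item, prev) < T for prev in seen):
            return pos  # "current item is a motif pattern"
        seen.append(item)
    return None


# Usage: the 4th object nearly repeats the 1st, so success is reported at position 4.
print(brute_force_motif([[0, 0], [5, 5], [9, 1], [0.1, 0]], T=0.5))  # 4
```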
• Brute Force with Limited Memory
– A cache of fixed size w
• Success metric: the expected number of objects we see before we report success
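With limited memory, each new object is compared only against a cache of size w. The sketch below assumes a first-in, first-out eviction order (the slides only say the cache has fixed size w); the function name is hypothetical. It returns the number of objects seen when success is first reported, so averaging that value over many streams would estimate the success metric. The usage shows the failure mode: once the two occurrences are farther apart than the cache can span, the motif is missed.

```python
import math
from collections import deque
from typing import Deque, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Assumed distance D: Euclidean distance between equal-length objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def limited_cache_motif(stream: Sequence[Vector], T: float, w: int) -> Optional[int]:
    """Compare each new object only against the w most recently cached objects.
    Returns the number of objects seen when success is first reported,
    or None if the motif is never detected."""
    cache: Deque[Vector] = deque(maxlen=w)  # assumed FIFO: oldest object drops out
    for pos, item in enumerate(stream, start=1):
        if any(euclidean(item, prev) < T for prev in cache):
            return pos
        cache.append(item)
    return None


# The two motif occurrences are at positions 1 and 5.
stream = [[0, 0], [10, 0], [20, 0], [30, 0], [0.1, 0]]
print(limited_cache_motif(stream, T=0.5, w=4))  # 5: object 1 is still cached
print(limited_cache_motif(stream, T=0.5, w=3))  # None: object 1 was evicted
```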
Changing Data Representation
Emulating a virtually larger cache:
- Downsampling the data
- Reducing the dimensionality of the data
- Reducing the cardinality of the data
[Figure: Expected number of objects processed before success (0 to 12,000) vs. virtual cache size, for dimensionality reduction, cardinality reduction, and downsampling]
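The three representation changes can be sketched as follows, along with how each one raises the number of objects a fixed memory budget can hold. This is an illustration under assumptions that do not come from the slides: dimensionality reduction is shown with Piecewise Aggregate Approximation (PAA), cardinality reduction with uniform quantization to 2**bits levels, and the memory budget and all function names are hypothetical.

```python
from typing import List, Sequence


def downsample(x: Sequence[float], factor: int) -> List[float]:
    """Downsampling: keep every factor-th sample."""
    return list(x[::factor])


def paa(x: Sequence[float], segments: int) -> List[float]:
    """Dimensionality reduction (assumed: PAA), the mean of each equal-length segment."""
    n = len(x)
    if segments <= 0 or n % segments:
        raise ValueError("len(x) must be a positive multiple of segments")
    step = n // segments
    return [sum(x[i:i + step]) / step for i in range(0, n, step)]


def reduce_cardinality(x: Sequence[float], bits: int, lo: float, hi: float) -> List[int]:
    """Cardinality reduction (assumed: uniform quantization) of values in [lo, hi]
    to one of 2**bits integer levels."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    levels = 2 ** bits
    out = []
    for v in x:
        v = min(max(v, lo), hi)                      # clip to the assumed range
        level = int((v - lo) / (hi - lo) * levels)
        out.append(min(level, levels - 1))           # v == hi maps to the top level
    return out


def virtual_cache_size(memory_bits: int, values_per_object: int, bits_per_value: int) -> int:
    """Number of objects that fit in a fixed memory budget."""
    return memory_bits // (values_per_object * bits_per_value)


# Hypothetical budget: room for 16 raw objects of 128 64-bit values each.
budget = 16 * 128 * 64
print(virtual_cache_size(budget, 128, 64))  # 16  (raw data)
print(virtual_cache_size(budget, 32, 64))   # 64  (downsample by 4)
print(virtual_cache_size(budget, 16, 64))   # 128 (PAA to 16 segments)
print(virtual_cache_size(budget, 128, 4))   # 256 (4-bit cardinality)
```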
Rare Motif Discovery
• Motivation
• Why is the problem hard?
• Algorithms
– Brute Force
– Limited cache
• Performance Improvement
– Changing Data Representation
– Sticky Cache
• Experiments
Sticky Cache
[Figure: Probability of success (0.4 to 1) vs. number of objects seen before success (0 to 1500) for P100, P50, and P1]
P100 = Probability of discarding an element from R is 100 times greater
P50 = Probability of discarding an element from R is 50 times greater
• A magic cache where potential motif patterns tend to remain longer
• Biased cache replacement policy
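One way to realize a biased cache replacement policy is weighted random eviction. In the sketch below, which is an assumption rather than the authors' implementation, each cached entry is tagged as a potential motif or as a random pattern (the set R), and a random-tagged entry is `bias` times more likely to be chosen as the victim; `bias=100` mirrors the P100 setting and `bias=1` gives uniform random replacement. The class name `StickyCache` is hypothetical.

```python
import random
from typing import Any, List, Optional, Tuple


class StickyCache:
    """Fixed-size cache (capacity >= 1) with biased random replacement.

    Entries are tagged as potential motif patterns or as random patterns (R).
    When the cache is full, a victim is drawn at random, with each element of
    R `bias` times more likely to be discarded than a potential motif."""

    def __init__(self, capacity: int, bias: float, rng: Optional[random.Random] = None):
        self.capacity = capacity
        self.bias = bias
        self.rng = rng or random.Random()
        self.items: List[Tuple[Any, bool]] = []  # (payload, is_potential_motif)

    def add(self, payload: Any, is_potential_motif: bool) -> None:
        if len(self.items) >= self.capacity:
            # Weight 1 for potential motifs, `bias` for random patterns.
            weights = [1.0 if motif else self.bias for _, motif in self.items]
            victim = self.rng.choices(range(len(self.items)), weights=weights, k=1)[0]
            self.items.pop(victim)
        self.items.append((payload, is_potential_motif))

    def payloads(self) -> List[Any]:
        return [p for p, _ in self.items]


# Usage: with bias=100, a potential motif tends to outlive the random entries.
cache = StickyCache(capacity=3, bias=100, rng=random.Random(0))
cache.add("motif", True)
for i in range(20):
    cache.add(f"random-{i}", False)
print("motif" in cache.payloads())
```

Weighted random eviction keeps some chance of discarding a potential motif, so a falsely tagged entry cannot occupy the cache forever.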
Sticky Cache
Algorithm for detecting potential motifs:
– Discretize each time series subsequence
– Query the Bloom filter for the instance in question
• If the Bloom filter has seen the instance before, tag it as a potential motif pattern
• Otherwise, tag it as a random pattern
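The detection steps above can be sketched with a SAX-style discretization and a small Bloom filter. Several details are assumptions, not taken from the slides: the discretization (z-normalization, 4 segments, a 4-letter alphabet with standard-normal quartile breakpoints), the Bloom filter sizing and its salted SHA-256 hashing, and inserting each word into the filter after querying it. All names are hypothetical.

```python
import bisect
import hashlib
import statistics
from typing import Iterator, Sequence

# Assumed SAX-style discretization: 4-letter alphabet, N(0, 1) quartile breakpoints.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def discretize(x: Sequence[float], segments: int = 4) -> str:
    """z-normalize, average into `segments` equal pieces, map each mean to a letter.
    Trailing samples that do not fill a whole segment are ignored."""
    if len(x) < segments:
        raise ValueError("subsequence shorter than the number of segments")
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x) or 1.0          # flat subsequence: avoid dividing by zero
    z = [(v - mu) / sd for v in x]
    step = len(z) // segments
    means = [statistics.fmean(z[i * step:(i + 1) * step]) for i in range(segments)]
    return "".join(ALPHABET[bisect.bisect(BREAKPOINTS, m)] for m in means)


class BloomFilter:
    """Minimal Bloom filter with m bits and k salted SHA-256 positions per key."""

    def __init__(self, m: int = 8192, k: int = 4):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def is_potential_motif(subsequence: Sequence[float], seen: BloomFilter) -> bool:
    """True: tag as a potential motif pattern. False: tag as a random pattern.
    Adding the word after the query is an assumption."""
    word = discretize(subsequence)
    hit = word in seen
    seen.add(word)
    return hit


bf = BloomFilter()
ramp = [0, 1, 2, 3, 4, 5, 6, 7]
print(discretize(ramp))                                    # abcd
print(is_potential_motif(ramp, bf))                        # False: first sighting
print(is_potential_motif([10 * v + 5 for v in ramp], bf))  # True: same shape after z-normalization
```

A Bloom filter never gives false negatives, so a repeated pattern is always tagged. Its occasional false positives only cause a random pattern to be kept longer than necessary.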
Which approach is best?
Comparison of all approaches on a commensurate scale
[Figure: Expected number of elements seen before success (100 to 1000) vs. virtual cache size (0 to 350) for downsampling, dimensionality reduction, cardinality reduction, and cardinality reduction with sticky cache]
Rare Motif Discovery
• Motivation
• Algorithms
– Brute Force
– Limited cache
• Performance Improvement
– Changing Data Representation
– Sticky Cache
• Case Studies
[Figure: Dishwasher motif (x axis ×10^4, 7.88 to 8.0), with panels labeled Day 11 and Day 19 and a section omitted in between]
TS: Dishwasher + Refrigerator; Motif Length: 160 (2 hrs 40 mins); Sampling Rate: 0.017 Hz
[Figure: Ground truth vs. motifs detected, Day 70 to Day 350]
Time Series Length: 2,245,824 (10 hours); Sampling Frequency: 62.3 Hz; Motif Length: 188 (3 sec)
White-crowned Sparrow (Zonotrichia leucophrys)
[Figure: Timeline labeled 37 minutes to 140 minutes, with a section omitted in between]
[Figure: Three motif panels (x axis 0 to 200): A, labeled 36 min 54 sec and 2.3 hours; B, labeled 1 min 57 sec; C, labeled 31 min 27 sec]
Dataset: NPR August 01, 2013; Time Series Length: 29 hr 21 min 57 sec; MFCC space length: 6,596,741 (about 6.6 million); Sampling Frequency: 62.4 Hz; Motif Length: 4 sec
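The dataset figures above can be cross-checked with simple arithmetic, since duration = number of samples / sampling rate. This sketch uses only numbers stated on the slides. The one assumption is that the dishwasher's 0.017 Hz is 1/60 Hz (one sample per minute) rounded to three decimals.

```python
def samples_to_seconds(n_samples: float, rate_hz: float) -> float:
    """Duration covered by n_samples at a sampling rate of rate_hz."""
    return n_samples / rate_hz


# White-crowned Sparrow: 2,245,824 samples at 62.3 Hz, motif of 188 samples
sparrow_hours = samples_to_seconds(2_245_824, 62.3) / 3600
sparrow_motif_sec = samples_to_seconds(188, 62.3)

# NPR: 29 hr 21 min 57 sec at 62.4 Hz should give the stated MFCC length
npr_seconds = 29 * 3600 + 21 * 60 + 57
npr_samples = npr_seconds * 62.4

# Dishwasher: assumes 0.017 Hz is 1/60 Hz rounded, i.e. one sample per minute
dish_motif_minutes = samples_to_seconds(160, 1 / 60) / 60

print(round(sparrow_hours, 1))       # 10.0
print(round(sparrow_motif_sec, 1))   # 3.0
print(round(npr_samples))            # 6596741
print(round(dish_motif_minutes))     # 160 (2 hrs 40 mins)
```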
Conclusions
• We address the problem of detecting rare motifs
– Changing Data Representation
– Sticky Cache
• All the code and data for this paper are publicly available!