STAGGER: Periodicity Mining of Data Streams using Expanding Sliding Windows Mohamed G. Elfeky Walid G.Aref Ahmed K. Elmagarmid ICDM 2006 2007/10/02 1 Chen Yi-Chun

Dec 15, 2015


STAGGER: Periodicity Mining of Data Streams using Expanding Sliding Windows

Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid

ICDM 2006

2007/10/02 Chen Yi-Chun


Outline

• Motivation
• Previous Approach
  – SPD algorithm
  – Max-Subpattern Tree
• Approximate Incremental Technique
• Conclusion


Motivation

• Example stream: abcabcabcabcabc… (p = 3)

• Single sliding window of width w
  – Smaller w: real-time output is supported
  – Larger w: long periods can be found

• Goal: real-time output and long periods found

• Multiple sliding windows are proposed
  – [Figure: several windows over the stream; two of them report p = 3 with patterns abc, *b*, a**, …, and one reports p = 3, 5 with patterns abc, *b*, a**, …]

• Period detection: the SPD algorithm is used
• Pattern mining: the max-subpattern tree is used


Periodicity Detection

• $\pi_{p,l}(S) = e_l, e_{l+p}, e_{l+2p}, \ldots, e_{l+(\lceil (n-l)/p \rceil - 1)p}$: the projection of a data stream S according to a period p starting from position l, where n is the length of S.

• Ex. If S = abcabbabdb
  – $\pi_{4,1}(S) = bbb$
  – $\pi_{3,0}(S) = aaab$ (the trailing b is an outlier)
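The projection can be checked with a few lines of Python. This is a minimal sketch, assuming 0-based positions and a stream held as a string; the function name `projection` is illustrative and not taken from the paper.

```python
def projection(stream: str, p: int, l: int) -> str:
    """Return the symbols at positions l, l+p, l+2p, ... of `stream` (0-based)."""
    return stream[l::p]


S = "abcabbabdb"
print(projection(S, 4, 1))  # bbb
print(projection(S, 3, 0))  # aaab
```

The slice `stream[l::p]` has exactly ceil((n - l) / p) symbols, which matches the index of the last element in the definition.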


Cont.

• $F_2(s, S)$: the number of times the symbol s occurs in two consecutive positions in the data stream S

• Ex. If S = abbaaabaa, then $F_2(a, S) = 3$ and $F_2(b, S) = 1$

• $\dfrac{F_2(s, \pi_{p,l}(S))}{\lceil (n-l)/p \rceil - 1}$ indicates how often the symbol s occurs every p timestamps in a data stream S
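The count of consecutive occurrences can be reproduced as follows. This is a small sketch; the name `f2` simply mirrors the slide's F2 and is not an API from the paper.

```python
def f2(symbol: str, stream: str) -> int:
    """Count positions i where stream[i] and stream[i + 1] both equal `symbol`."""
    return sum(1 for x, y in zip(stream, stream[1:]) if x == y == symbol)


S = "abbaaabaa"
print(f2("a", S), f2("b", S))  # 3 1
```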


Cont.

• If a data stream S of length n contains a symbol s and $\dfrac{F_2(s, \pi_{p,l}(S))}{\lceil (n-l)/p \rceil - 1} \ge \tau$

• Then s is said to be periodic in S with a period of length p at position l with respect to the periodicity threshold $\tau$

• Ex. S = abcabbabdb, $\dfrac{F_2(a, \pi_{3,0}(S))}{\lceil (10-0)/3 \rceil - 1} = \dfrac{2}{3}$

  – The symbol a is periodic with a period of length 3 at position 0 with respect to a periodicity threshold $\tau \le 2/3$

  – The pattern a * * is a frequent single periodic pattern of length 3
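The periodicity test on this slide can be sketched end to end in Python. The helper names (`f2`, `periodicity`, `is_periodic`) and the use of `Fraction` for exact comparison are assumptions made for illustration; positions are 0-based.

```python
from fractions import Fraction


def f2(symbol: str, stream: str) -> int:
    """Count positions where `symbol` occurs twice in a row."""
    return sum(1 for x, y in zip(stream, stream[1:]) if x == y == symbol)


def periodicity(symbol: str, stream: str, p: int, l: int) -> Fraction:
    """F2(symbol, projection) / (ceil((n - l) / p) - 1), as an exact fraction."""
    proj = stream[l::p]                       # projection for period p, position l
    denom = (len(stream) - l + p - 1) // p - 1  # integer ceiling, minus one
    return Fraction(f2(symbol, proj), denom) if denom > 0 else Fraction(0)


def is_periodic(symbol: str, stream: str, p: int, l: int, tau: Fraction) -> bool:
    """True when the ratio reaches the periodicity threshold tau."""
    return periodicity(symbol, stream, p, l) >= tau


S = "abcabbabdb"
print(periodicity("a", S, 3, 0))                 # 2/3
print(is_periodic("a", S, 3, 0, Fraction(2, 3)))  # True
```

Returning zero when the projection has fewer than two symbols is a choice made here to avoid division by zero; the slide does not cover that case.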


SPD-algorithm

• To detect the symbols that are periodic with period length p within S

• Shift S by p positions, denoted as $S^{(p)}$

• Ex. If S = a b c a b b a b c b

• $S^{(3)}$ = * * * a b c a b b a
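The shift-and-compare idea can be illustrated with a short sketch. It assumes that `*` is a padding character that never occurs in the stream, and it counts matches per symbol over all positions together, which is a simplification: the slide shows only the shift itself, and the per-position bookkeeping of the SPD algorithm is not reproduced here.

```python
from collections import Counter


def shift(stream: str, p: int, pad: str = "*") -> str:
    """Shift `stream` right by p positions, padding the front and keeping the length."""
    return (pad * p + stream)[:len(stream)]


def matches(stream: str, p: int) -> Counter:
    """Count, per symbol, the positions where the stream equals its p-shifted copy."""
    return Counter(x for x, y in zip(stream, shift(stream, p)) if x == y)


S = "abcabbabcb"
print(shift(S, 3))    # ***abcabba
print(matches(S, 3))  # a and b each match at two positions
```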


SPD algorithm in Time-Series

• Symbol encoding: a:001 b:010 c:100

• Example series: (a c c c a b b)

• [Figure: the encoded series compared against itself for P = 1 through P = 4]

Reference: “Periodicity Detection in Time Series Databases” [TKDE05]


Single Window with SPD

[Figure: a single window over the binary-encoded series (a c c c a b b), annotated “Shift 1” and “slide 2”]


Multi Windows with SPD

[Figure: multiple sliding windows over the stream, each producing its own output]

• Smaller w: real-time output is supported
• Larger w: long periods can be found


Max-Subpattern Tree

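References:
“Incremental, Online, and Merge Mining of Partial Periodic Patterns in Time-Series Databases” [TKDE04]
“Efficient Mining of Partial Periodic Patterns in Time Series Database” [ICDE99]

• Series: abdeacdfabdjacdsabdxakdy, for p = 4

• F1 = {a***, *b**, *c**, **d*}

• Cmax = a{b,c}d*

• [Figure: max-subpattern tree rooted at Cmax; the edges to its children are labeled with the removed letters c and b, and each node carries a count]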
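The example's F1 and Cmax can be reproduced by counting symbols per position across the period segments. The sketch below assumes a minimum count of 2 occurrences, which is not stated on the slide but is the largest value that yields the F1 shown. The function names are illustrative, and the hit counts are stored in a flat `Counter` rather than in an actual tree.

```python
from collections import Counter


def segments(series: str, p: int) -> list[str]:
    """Split the series into consecutive, complete segments of length p."""
    return [series[i:i + p] for i in range(0, len(series) - p + 1, p)]


def max_pattern(series: str, p: int, min_count: int) -> list[set[str]]:
    """For each position, collect the symbols seen at least `min_count` times."""
    counts = Counter((j, seg[j]) for seg in segments(series, p) for j in range(p))
    cmax = [set() for _ in range(p)]
    for (j, symbol), count in counts.items():
        if count >= min_count:
            cmax[j].add(symbol)
    return cmax


def hit(segment: str, cmax: list[set[str]]) -> str:
    """Largest subpattern of Cmax that the segment matches ('*' is don't-care)."""
    return "".join(ch if ch in cmax[j] else "*" for j, ch in enumerate(segment))


series = "abdeacdfabdjacdsabdxakdy"
cmax = max_pattern(series, 4, min_count=2)  # min_count=2 is an assumed threshold
hits = Counter(hit(seg, cmax) for seg in segments(series, 4))
print(cmax)  # position-wise sets for a{b,c}d*
print(hits)  # abd*: 3, acd*: 2, a*d*: 1
```

Each non-empty position set of `cmax` contributes its symbols to F1, so the four 1-patterns a***, *b**, *c**, and **d* can be read directly from it.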


Approximate Incremental Tech.

Streaming data => maintain the max-subpattern tree over the new data

Q = a{b,c}d*, Q’ = a{b,e}df

The intersection of Q and Q’ is abd* (equal to Q without c)

The difference between Q’ and abd* is e and f (Q’ equals abd* with e and f added)

The approximation happens in the insertion step
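The intersection and difference in this example can be checked with set operations per position. The representation (one set of letters per position, an empty set for `*`) and the helpers `parse` and `show` are assumptions for illustration. How the tree uses these results during insertion is not shown on the slide and is not modeled here.

```python
def parse(text: str) -> list[set[str]]:
    """Parse a pattern such as 'a{b,c}d*' into one set of letters per position."""
    pattern, i = [], 0
    while i < len(text):
        if text[i] == "{":
            j = text.index("}", i)
            pattern.append(set(text[i + 1:j].split(",")))
            i = j + 1
        else:
            pattern.append(set() if text[i] == "*" else {text[i]})
            i += 1
    return pattern


def show(pattern: list[set[str]]) -> str:
    """Render a pattern back to text, using '*' for an empty position."""
    parts = []
    for letters in pattern:
        if not letters:
            parts.append("*")
        elif len(letters) == 1:
            parts.append(next(iter(letters)))
        else:
            parts.append("{" + ",".join(sorted(letters)) + "}")
    return "".join(parts)


def intersect(q, r):
    return [a & b for a, b in zip(q, r)]


def difference(q, r):
    return [a - b for a, b in zip(q, r)]


q, q2 = parse("a{b,c}d*"), parse("a{b,e}df")
common = intersect(q, q2)
print(show(common))                  # abd*
print(show(difference(q, common)))   # *c**
print(show(difference(q2, common)))  # *e*f
```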


Hysteresis Threshold

• With a single threshold, a pattern q loses all its history information as soon as it becomes infrequent. When q becomes frequent again, it is treated as a newly appeared frequent pattern.

• With two thresholds, a pattern is
  – Frequent, i.e., its frequency is above the higher threshold
  – Infrequent, i.e., its frequency is below the lower threshold
  – Patterns whose frequencies are above the lower threshold are kept in the tree.
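The two-threshold rule can be sketched as a small tracker. The class name, the per-pattern history list standing in for the tree, and the treatment of frequencies exactly equal to a threshold are all assumptions made for illustration; the slide specifies only the above/below behavior.

```python
class HysteresisTracker:
    """Keep pattern history while the frequency stays at or above the lower threshold."""

    def __init__(self, low: float, high: float) -> None:
        if low > high:
            raise ValueError("low must not exceed high")
        self.low, self.high = low, high
        self.history: dict[str, list[float]] = {}  # stands in for the tree

    def update(self, pattern: str, frequency: float) -> str:
        if frequency < self.low:
            # Infrequent: the pattern and its history are discarded.
            self.history.pop(pattern, None)
            return "dropped"
        # At or above the lower threshold: history is preserved.
        self.history.setdefault(pattern, []).append(frequency)
        return "frequent" if frequency >= self.high else "kept"


tracker = HysteresisTracker(low=0.3, high=0.6)
print(tracker.update("a**", 0.7))  # frequent
print(tracker.update("a**", 0.4))  # kept
print(tracker.update("a**", 0.2))  # dropped
```

A pattern that dips between the two thresholds stays tracked, so if it rises above the higher threshold again it is not treated as new.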


Conclusion

• Discover potential periodicity rates in data streams

• Use an incremental tree structure to mine periodic patterns

• Use two thresholds to preserve the history of candidate frequent patterns
