DATA MINING IN TIME RELATED DATA

Time Series Data Mining - mimuw.edu.plson/datamining/DM2008/W11... · Time Series Data Mining Data mining concepts to analyzing time series data Revels hidden patterns that are characteristic

Jun 10, 2020


dariahiddleston

DATA MINING IN TIME RELATED DATA


Time Series Data Mining

Applies data mining concepts to the analysis of time series data

Reveals hidden patterns that are characteristic and predictive of time series events

Traditional time series analysis cannot identify complex (non-periodic, irregular, chaotic) characteristics


Time series

"a sequence of observed data, usually ordered in time"

$X = (x_t,\ t = 1, \dots, N)$


Example 1: seismic time series

Diamonds = observations, e.g. seismic activity

Squares = important observations = events, e.g. earthquakes

Goal: to characterize when peaks occur


Example 2: welding time series

Diamonds: measured stickout length of droplet (in pixels)

Squares: droplet releases (chaotic, noisy and irregular in nature; impossible to predict using traditional methods)

Goal: predict the release of the metal droplet


Example 3: stock prices

Diamonds: daily open price

Squares: days when the price increases by more than 5%

Goal: to find hidden patterns that provide the desired trading edge


Event = important occurrence

Ex1: earthquake

Ex2: release of the droplet

Ex3: sharp rise (fall) of stock price


Temporal pattern

Hidden structure in a time series that is characteristic and predictive of events

Temporal pattern $p \in \mathbb{R}^Q$: a real vector of length $Q$


Temporal pattern cluster

A temporal pattern usually does not match the time series exactly

A temporal pattern cluster (TPC) is the set of all points within distance $\delta$ of the temporal pattern: $P = \{a \in \mathbb{R}^Q : d(p, a) \le \delta\}$
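The membership test behind a temporal pattern cluster can be sketched in a few lines. This is only an illustration: the function names `euclidean` and `in_cluster` are made up here, and the Euclidean metric is an assumption, since the slide leaves the distance $d$ unspecified.

```python
import math

def euclidean(p, a):
    """Euclidean distance between two vectors of equal length (assumed metric d)."""
    return math.sqrt(sum((pi - ai) ** 2 for pi, ai in zip(p, a)))

def in_cluster(p, delta, a, d=euclidean):
    """True if point a lies within distance delta of the temporal pattern p."""
    return d(p, a) <= delta

# A pattern at the origin with radius 5 contains the point (3, 4),
# because its distance from the origin is exactly 5.
print(in_cluster((0.0, 0.0), 5.0, (3.0, 4.0)))
```

Any other metric can be passed in through the `d` argument without changing the cluster definition.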


Phase space

$Q$-dimensional metric space in which the time series is embedded

Maps each set of $Q$ observations of the time series, spaced by the time delay $\tau$, into the vector $x_t = (x_{t-(Q-1)\tau}, \dots, x_{t-2\tau}, x_{t-\tau}, x_t)$
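As a rough sketch of this mapping, the function below builds the embedded vectors from a plain Python list. The name `embed`, the 0-based indexing and the default delay of 1 are choices made for this illustration, not part of the slides.

```python
def embed(x, Q, tau=1):
    """Time-delay embedding of the series x into Q-dimensional phase space.

    Returns one tuple (x[t-(Q-1)*tau], ..., x[t-tau], x[t]) for every
    0-based index t that has enough history behind it.
    """
    start = (Q - 1) * tau  # first index with Q delayed observations available
    return [
        tuple(x[t - (Q - 1 - k) * tau] for k in range(Q))
        for t in range(start, len(x))
    ]

print(embed([1, 2, 3, 4], Q=2))          # consecutive pairs
print(embed([7, 7, 7, 7], Q=2))          # a constant series collapses to one point
print(embed(list(range(7)), Q=3, tau=2)) # delay of two steps between coordinates
```

A constant series maps every index to the same phase-space point, which is what the constant example illustrates.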


Phase space example - constant

$X = \{x_t = c : t = 1, \dots, N\}$

$\tau = 1$, $Q = 2$


Phase space example - seismic


Phase space example - welding


Phase space example – stock open price


Event characterization function

Represents the value of future "eventness" for the current time index

Addresses the specific goal

Examples: $g(t) = x_{t+1}$; $g(t) = x_{t+3}$; $g(t) = \max\{x_{t+1}, x_{t+2}, x_{t+3}\}$

Welding: $g(t) = y_{t+1}$

Stock price change: $g(t) = (x_{t+1} - x_t) / x_t$
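The example functions translate almost directly into code. The sketch below assumes the series is a Python list with 0-based indices; the function names and the sample prices are invented for illustration.

```python
def g_next(x, t):
    """Eventness = the next observation, g(t) = x[t+1]."""
    return x[t + 1]

def g_max3(x, t):
    """Eventness = the largest of the next three observations."""
    return max(x[t + 1], x[t + 2], x[t + 3])

def g_relative_change(x, t):
    """Eventness = relative change to the next observation, (x[t+1] - x[t]) / x[t]."""
    return (x[t + 1] - x[t]) / x[t]

prices = [100.0, 105.0, 103.0, 110.0]  # made-up daily open prices
print(g_next(prices, 0), g_max3(prices, 0), g_relative_change(prices, 0))
```

Each function looks ahead of `t`, so it is only defined for indices that leave enough future observations in the series.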


Augmented Phase space

$(Q+1)$-dimensional space formed by extending the phase space with $g(\cdot)$: the space of vectors $\langle x_t, g(t) \rangle \in \mathbb{R}^{Q+1}$
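One possible way to build the augmented phase space is to pair each embedded vector with its eventness value. In this sketch the name `augmented_phase_space`, the `horizon` parameter (how many future steps $g$ needs) and the 0-based indexing are all assumptions made for illustration.

```python
def augmented_phase_space(x, Q, g, tau=1, horizon=1):
    """Return a list of (x_t, g(t)) pairs, i.e. points of the (Q+1)-dimensional space.

    x       : the time series as a list
    g       : event characterization function called as g(x, t)
    horizon : number of future observations g looks at (assumed known by the caller)
    """
    start = (Q - 1) * tau
    points = []
    for t in range(start, len(x) - horizon):
        x_t = tuple(x[t - (Q - 1 - k) * tau] for k in range(Q))
        points.append((x_t, g(x, t)))
    return points

# With g(t) = x[t+1], every embedded pair is labelled with the next value.
print(augmented_phase_space([1, 2, 3, 4], Q=2, g=lambda x, t: x[t + 1]))
```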


Augmented Phase space example - seismic


Augmented Phase space example - welding


Augmented Phase space example - stock open price


Objective function

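Measures how well a temporal pattern cluster characterizes events

$M$ ($\tilde{M}$): set of all time indices $t$ at which $x_t$ is within (outside) the temporal pattern cluster $P$

$M = \{t : x_t \in P,\ t \in \Lambda\}$, where $\Lambda$ is the set of time indices of the embedded points

$\tilde{M} = \{t : x_t \notin P,\ t \in \Lambda\}$

$\mu_M = \dfrac{1}{\mathrm{card}(M)} \sum_{t \in M} g(t)$

$\sigma_M^2 = \dfrac{1}{\mathrm{card}(M)} \sum_{t \in M} \left(g(t) - \mu_M\right)^2$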


Objective function

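t test for the difference between two independent means (favours clusters with statistically significant and high average eventness)

$f(P) = \dfrac{\mu_M - \mu_{\tilde{M}}}{\sqrt{\dfrac{\sigma_M^2}{\mathrm{card}(M)} + \dfrac{\sigma_{\tilde{M}}^2}{\mathrm{card}(\tilde{M})}}}$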
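The t-test objective can be evaluated directly on a list of augmented points. The sketch below makes several assumptions that are not in the slides: the Euclidean metric (via `math.dist`, available from Python 3.8), the function names, and the handling of degenerate cases (an empty inside or outside set, or zero variance).

```python
import math

def mean_var(values):
    """Mean and population variance (divides by card(M))."""
    n = len(values)
    mu = sum(values) / n
    return mu, sum((v - mu) ** 2 for v in values) / n

def objective_t(points, p, delta):
    """t-statistic comparing eventness inside and outside the cluster (p, delta).

    points: list of (x_t, g_t) pairs from the augmented phase space.
    """
    inside = [g for x_t, g in points if math.dist(p, x_t) <= delta]
    outside = [g for x_t, g in points if math.dist(p, x_t) > delta]
    if not inside or not outside:
        return float("-inf")  # assumption: such clusters are never preferred
    mu_in, var_in = mean_var(inside)
    mu_out, var_out = mean_var(outside)
    denom = math.sqrt(var_in / len(inside) + var_out / len(outside))
    diff = mu_in - mu_out
    if denom == 0.0:
        # assumption: zero variance on both sides gives an unbounded statistic
        return 0.0 if diff == 0.0 else math.copysign(float("inf"), diff)
    return diff / denom

points = [((0, 0), 1.0), ((0, 1), 3.0), ((5, 5), 0.0), ((6, 5), 2.0)]
print(objective_t(points, p=(0, 0), delta=1.5))
```

In the toy data the two points near the origin have mean eventness 2 and the two far points have mean 1, both with variance 1, so the statistic comes out as 1.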


Objective function

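$t_p = \mathrm{card}(\{x_t : \exists P_i \in C,\ x_t \in P_i \wedge g(t) = 1\})$

$f_p = \mathrm{card}(\{x_t : \exists P_i \in C,\ x_t \in P_i \wedge g(t) = 0\})$

$t_n = \mathrm{card}(\{x_t : \forall P_i \in C,\ x_t \notin P_i \wedge g(t) = 0\})$

$f_n = \mathrm{card}(\{x_t : \forall P_i \in C,\ x_t \notin P_i \wedge g(t) = 1\})$

$f(C) = \dfrac{t_p + t_n}{t_p + t_n + f_p + f_n}$

• Used when every event is required to be predicted by a temporal pattern

• $g(\cdot)$ is binary

• $C$: collection of temporal pattern clusters

• Ratio of correct predictions to all predictions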
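A small sketch of this accuracy-style objective follows. It reads the four counts in the usual confusion-matrix sense, which is an assumption: a point is "predicted" when it falls in at least one cluster, true negatives are unpredicted points with $g(t)=0$, and false negatives are unpredicted points with $g(t)=1$. The Euclidean metric and the function name are also assumptions.

```python
import math

def objective_accuracy(points, clusters):
    """Ratio of correct predictions to all predictions for a cluster collection C.

    points   : list of (x_t, g_t) with binary g_t (1 = event, 0 = no event)
    clusters : list of (p, delta) pairs, one per temporal pattern cluster
    """
    tp = fp = tn = fn = 0
    for x_t, event in points:
        predicted = any(math.dist(p, x_t) <= delta for p, delta in clusters)
        if predicted and event == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif event == 0:
            tn += 1
        else:
            fn += 1
    return (tp + tn) / (tp + tn + fp + fn)

points = [((0, 0), 1), ((0, 1), 0), ((5, 5), 0), ((9, 9), 1)]
print(objective_accuracy(points, [((0, 0), 1.0)]))
print(objective_accuracy(points, [((0, 0), 1.0), ((9, 9), 0.5)]))
```

Adding the second cluster captures the event at (9, 9), which turns a false negative into a true positive and raises the score.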


Optimization problem

$\max_{p,\delta} f(P)$

Genetic algorithm

Chromosome consists of $Q+1$ genes: the $Q$ coordinates of the pattern and the radius $\delta$

E.g. $Q = 2$: $(x_{t-1}, x_t, \delta)$
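To make the chromosome encoding concrete, here is a deliberately simplified evolutionary search. It is not the genetic algorithm used in the slides, whose details are not given: this version uses only truncation selection and Gaussian mutation, with no crossover, and the function name, parameters and defaults are all hypothetical.

```python
import random

def evolve(fitness, Q, bounds, pop_size=30, generations=40, sigma=0.3, seed=0):
    """Search for a chromosome (p_1, ..., p_Q, delta) that maximizes fitness.

    fitness : function taking a list of Q+1 genes and returning a number
    bounds  : (lo, hi) range used to initialize the pattern coordinates
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    lo, hi = bounds

    def random_chromosome():
        # Q pattern coordinates followed by one non-negative radius gene
        return [rng.uniform(lo, hi) for _ in range(Q)] + [rng.uniform(0.0, hi - lo)]

    def mutate(chromosome):
        child = [gene + rng.gauss(0.0, sigma) for gene in chromosome]
        child[-1] = abs(child[-1])  # keep delta non-negative
        return child

    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the better half survives unchanged
        children = [mutate(rng.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)

# Toy fitness with its maximum at (1, 1, 1); a real run would decode the
# chromosome into (p, delta) and return the objective f(P) instead.
best, score = evolve(lambda ch: -sum((g - 1.0) ** 2 for g in ch),
                     Q=2, bounds=(-5.0, 5.0))
print(len(best), score)
```

Because the better half of each generation is kept unchanged, the best fitness found never decreases from one generation to the next.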


Seismic example


DISCOVERY OF FREQUENT EPISODES IN EVENT SEQUENCES


Events, event sequences

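event: $(A, t)$, $A \in E$ (an event of type $A$ occurring at time $t$)

event sequence $s$ on $E$: $(s, T_s, T_e)$

$s = \langle (A_1, t_1), (A_2, t_2), \dots, (A_n, t_n) \rangle$

window on $s$: $w = (w, t_s, t_e)$, $t_s < T_e$, $t_e > T_s$

$\mathrm{width}(w) = t_e - t_s$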
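These definitions can be mirrored with plain tuples. The sketch below assumes integer time stamps and half-open windows $[t_s, t_e)$, and slides the window one time unit at a time; windows are allowed to extend past either end of the sequence, as the conditions $t_s < T_e$ and $t_e > T_s$ permit. The function name `windows` is invented here.

```python
def windows(events, Ts, Te, win):
    """Yield (ts, te, events_in_window) for every window of width win on the sequence.

    events : list of (event_type, time) pairs with Ts <= time < Te
    A window covers times ts <= t < te and must overlap [Ts, Te).
    """
    for ts in range(Ts - win + 1, Te):
        te = ts + win
        yield ts, te, [(A, t) for A, t in events if ts <= t < te]

events = [("A", 31), ("B", 32)]
for ts, te, content in windows(events, Ts=30, Te=34, win=2):
    print(ts, te, content)
```

With this convention a sequence spanning $[T_s, T_e)$ has $T_e - T_s + win - 1$ windows of width $win$.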

[Figure: example event sequence on a time axis t from 30 to 70: E D F A B C E F C D B A D C E F C B E A E C F]


Episodes

Collection of events occurring together

serial, parallel, non-serial & non-parallel

Episode $\alpha = (V, \le, g)$

$V$: set of nodes

$\le$: partial order on $V$

$g : V \to E$: mapping associating each node with an event type

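[Figure: three example episodes: a parallel episode (A and B in either order), a non-serial, non-parallel episode $\gamma$ (A and B in either order, followed by C), and a serial episode (E followed by F)]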


Occurrence of episodes

w=(w,37,44)

[Figure: the example episodes shown above the example event sequence, on a time axis t from 30 to 70]


Frequency of an episode

$W(s, win)$: all windows on $s$ of width $win$

$fr(\alpha, s, win) = \dfrac{\mathrm{card}(\{w \in W(s, win) : \alpha \text{ occurs in } w\})}{\mathrm{card}(W(s, win))}$
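The frequency can be computed by brute force: enumerate every window and count those in which the episode occurs. The sketch below covers only parallel and serial episodes and assumes integer time stamps and half-open windows that slide by one time unit; all function names are invented for illustration.

```python
from collections import Counter

def occurs_parallel(episode, window_events):
    """Parallel episode: every listed event type is present, in any order."""
    have = Counter(A for A, _ in window_events)
    need = Counter(episode)
    return all(have[A] >= n for A, n in need.items())

def occurs_serial(episode, window_events):
    """Serial episode: the event types occur in the given order, at increasing times."""
    last = float("-inf")
    for A in episode:
        later = [t for B, t in window_events if B == A and t > last]
        if not later:
            return False
        last = min(later)  # taking the earliest match leaves the most room for the rest
    return True

def frequency(occurs, episode, events, Ts, Te, win):
    """Fraction of windows of width win in which the episode occurs."""
    total = hits = 0
    for ts in range(Ts - win + 1, Te):
        in_window = [(A, t) for A, t in events if ts <= t < ts + win]
        total += 1
        hits += occurs(episode, in_window)
    return hits / total

events = [("A", 31), ("B", 32)]
print(frequency(occurs_parallel, ("A", "B"), events, 30, 34, 2))
print(frequency(occurs_serial, ("B", "A"), events, 30, 34, 2))
```

In the toy sequence only one of the five windows of width 2 contains both A and B, and no window contains B before A.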


Goal

Given (1) a frequency threshold min_fr and (2) a window width win, discover all episodes $\alpha$ (from a given class of episodes) such that

$fr(\alpha, s, win) \ge min\_fr$


Episode rule generation algorithm

INPUT: event sequence s, win, min_fr, confidence threshold min_conf

OUTPUT: Episode rules that hold in s with respect to win, min_fr, min_conf

/* find all frequent episodes */
compute $F(s, win, min\_fr)$
/* generate rules */
for all $\alpha \in F(s, win, min\_fr)$ do
    for all subepisodes $\beta \prec \alpha$ do
        if $fr(\alpha) / fr(\beta) \ge min\_conf$ then
            output the rule $\beta \to \alpha$ and the confidence $fr(\alpha) / fr(\beta)$
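The rule generation loop might look as follows for the simplest case. This sketch makes two assumptions that go beyond the slides: episodes are parallel episodes represented as `frozenset`s of event types (so a subepisode is a proper, non-empty subset), and the frequent episodes are handed in as a dictionary mapping each episode to its frequency.

```python
from itertools import combinations

def episode_rules(freq, min_conf):
    """Return (beta, alpha, confidence) for every rule beta -> alpha that holds.

    freq : dict mapping each frequent episode (a frozenset) to its frequency
    """
    rules = []
    for alpha, fr_alpha in freq.items():
        for k in range(1, len(alpha)):
            for beta in map(frozenset, combinations(sorted(alpha), k)):
                if beta not in freq:
                    continue  # frequency of this subepisode was not supplied
                confidence = fr_alpha / freq[beta]
                if confidence >= min_conf:
                    rules.append((beta, alpha, confidence))
    return rules

freq = {frozenset("AB"): 0.042, frozenset("ABC"): 0.040}
for beta, alpha, conf in episode_rules(freq, min_conf=0.9):
    print(sorted(beta), "->", sorted(alpha), round(conf, 3))
```

In a complete run every subepisode of a frequent episode is itself frequent, so the `beta not in freq` check only matters for partial inputs such as the toy dictionary above.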


Example

If we know that $\beta$ (A and B) occurs in 4.2% of the windows and $\gamma$ (A and B followed by C) in 4.0%, we can estimate that after seeing a window with A and B there is a chance of about 0.95 that C follows in the same window.

[Figure: the example episodes and the example event sequence on a time axis t from 30 to 70]
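The estimate above is simply the ratio of the two window frequencies. Written out, with $\beta$ standing for the episode "A and B" and $\gamma$ for "A and B followed by C" (labels assumed here for readability):

```latex
\[
\mathrm{conf}(\beta \Rightarrow \gamma)
  = \frac{fr(\gamma, s, win)}{fr(\beta, s, win)}
  = \frac{0.040}{0.042}
  \approx 0.952
\]
```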


Frequent episode generation algorithm

INPUT: event sequence s, win, min_fr

OUTPUT: Collection F(s,win,min_fr) of frequent episodes

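compute $C_1 = \{\alpha \in \mathcal{E} : |\alpha| = 1\}$, where $\mathcal{E}$ is the given class of episodes
$l = 1$
while $C_l \ne \emptyset$ do
    compute $F_l = \{\alpha \in C_l : fr(\alpha, s, win) \ge min\_fr\}$
    $l = l + 1$
    compute $C_l = \{\alpha \in \mathcal{E} : |\alpha| = l$ and for all $\beta \in \mathcal{E}$ such that $\beta \prec \alpha$ and $|\beta| < l$ we have $\beta \in F_{|\beta|}\}$
for all $l$ do output $F_l$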
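The level-wise loop above can be sketched for one restricted case. The version below handles only parallel episodes without repeated event types (represented as `frozenset`s), assumes integer time stamps with half-open windows sliding by one time unit, and recomputes frequencies by brute force rather than incrementally; the function name is hypothetical.

```python
from itertools import combinations

def frequent_parallel_episodes(events, Ts, Te, win, min_fr):
    """Return {l: F_l}, where F_l maps each frequent episode of size l to its frequency."""
    # Set of event types seen in each window of width win.
    window_types = [
        {A for A, t in events if ts <= t < ts + win}
        for ts in range(Ts - win + 1, Te)
    ]

    def fr(alpha):
        return sum(alpha <= types for types in window_types) / len(window_types)

    candidates = {frozenset([A]) for A, _ in events}  # C_1: all single event types
    frequent = {}
    l = 1
    while candidates:
        F_l = {alpha: fr(alpha) for alpha in candidates if fr(alpha) >= min_fr}
        frequent[l] = F_l
        l += 1
        # C_l: size-l episodes all of whose (l-1)-subepisodes are frequent.
        types = sorted(set().union(*F_l))
        candidates = {
            frozenset(c) for c in combinations(types, l)
            if all(frozenset(sub) in F_l for sub in combinations(c, l - 1))
        }
    return frequent

events = [("A", 31), ("B", 32)]
result = frequent_parallel_episodes(events, Ts=30, Te=34, win=2, min_fr=0.15)
for size, episodes in result.items():
    print(size, {"".join(sorted(a)): f for a, f in episodes.items()})
```

Checking only the subepisodes of size $l-1$ is enough here, because each of those was itself admitted only after its own subepisodes were found frequent.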