Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy
Nurjahan Begum, Liudmila Ulanova, Jun Wang (1) and Eamonn Keogh
University of California, Riverside; (1) UT Dallas
Presented By: Nurjahan Begum

TADPole_Nurjahan Begum

Jan 23, 2017

Transcript
Page 1: TADPole_Nurjahan Begum

Nurjahan Begum, Liudmila Ulanova, Jun Wang (1) and Eamonn Keogh
University of California, Riverside; (1) UT Dallas

Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy

Presented By: Nurjahan Begum

Page 2: TADPole_Nurjahan Begum

Talk Overview
• Motivation of Dynamic Time Warping (DTW) Clustering
• Density Peaks (DP) Algorithm
• TADPole: Our Proposed Algorithm
    Novel Pruning Strategies
• Experimental Results
• Case Studies
• Conclusions

Page 4: TADPole_Nurjahan Begum

Motivation of DTW Clustering

[Figure: time series for the hashtags #kanyewest, #Michael, #MichaelJackson and #taylorswift; x-axis: hours (0 to 120)]

Page 5: TADPole_Nurjahan Begum

Motivation of DTW Clustering

[Figure: time series for the hashtags #kanyewest, #Michael, #MichaelJackson and #taylorswift; x-axis: hours (0 to 120)]

Synonym Discovery?

Page 6: TADPole_Nurjahan Begum

Motivation of DTW Clustering

[Figure: time series for the hashtags #kanyewest, #Michael, #MichaelJackson and #taylorswift; x-axis: hours (0 to 120)]

Synonym Discovery?

Association Discovery?

“I’mma let you finish”

Page 7: TADPole_Nurjahan Begum

Two Questions…

[Figure: Query Q shown alongside series labeled Black-Faced leafhopper and Beet leafhopper; x-axis: 0 to 1200]

How do we define similar?
• We need to be invariant to noise, amplitude, linear drift, scaling, warping…
• Dozens of measures have been claimed, many with dubious empirical work (cherry-picking datasets, crippling rival methods, training on the test data…)

Nothing significantly beats a 40-year-old technique called Dynamic Time Warping (DTW).
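
To make the contrast with Euclidean distance concrete, here is a minimal sketch of the textbook DTW recurrence next to Euclidean distance. It is not taken from the talk: the function names, the optional `window` parameter (a Sakoe-Chiba band half-width) and the toy sequences are all assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b, window=None):
    """DTW distance; `window` (assumed parameter) limits how far the
    alignment may stray from the diagonal. None means unconstrained."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])

# Toy data (made up): the same bump, shifted by one time step.
q = [0, 0, 1, 2, 1, 0, 0, 0]
c = [0, 0, 0, 1, 2, 1, 0, 0]
print(euclidean(q, c), dtw(q, c), dtw(q, c, window=0))
```

With a zero-width window the alignment is forced onto the diagonal, so the sketch reduces to Euclidean distance; without the constraint, the shifted bump is aligned at no cost.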

Page 8: TADPole_Nurjahan Begum

Comparison Between DTW and ED

[Figure: two clusterings of Bos taurus, Hyperoodon ampullatus and Talpa europaea, one under DTW and one under ED (Euclidean distance); group label: Cetartiodactyla]

Page 9: TADPole_Nurjahan Begum

Why is DTW Clustering Hard?

Observation 1: DTW and Euclidean distance results converge as data size increases.

Observation 2: Lower-bounding pruning becomes more effective as data size increases.

Neither of these two observations helps!

[Figure: 1-NN error rate (0.01 to 0.07) vs. size of training set (0 to 2000), for Euclidean and DTW]

[Figure: Rand Index (0.6 to 0.9) vs. dataset size (0 to 2000), for DTW and Euclidean]

Page 10: TADPole_Nurjahan Begum

Why is Existing Work not the Answer?

Scalability Issue: DTW is not a metric, and is therefore very difficult to index
Quality Issue: We need a clustering algorithm that is insensitive to outliers

[Figure: 13 numbered items shown in two panels; annotations: "Outlier" and "Mislabeled by k-means"]

Page 11: TADPole_Nurjahan Begum

Talk Overview
• Motivation of Dynamic Time Warping (DTW) Clustering
• Density Peaks (DP) Algorithm
• TADPole: Our Proposed Algorithm
    Novel Pruning Strategies
• Experimental Results
• Case Studies
• Conclusions

Page 12: TADPole_Nurjahan Begum

Density Peaks (DP) Algorithm
• Why?
    Parameter-lite
    Can handle arbitrary shape clusters
    Insensitive to noise/outliers
• 3 Steps
    Density Calculation
    NN within Higher Density List Calculation
    Cluster Assignment
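
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' code: it takes a precomputed distance matrix `D`, a cutoff `dc` and a cluster count `k` from the caller, breaks density ties by index, and picks centers as the `k` items with the largest density × NN distance. All names are hypothetical.

```python
def density_peaks(D, dc, k):
    """Cluster n items given a full symmetric distance matrix D (list of lists)."""
    n = len(D)
    # Step 1: local density = number of other items closer than dc.
    rho = [sum(1 for j in range(n) if j != i and D[i][j] < dc) for i in range(n)]
    # Densest first; ties broken by index (an assumption of this sketch).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: pos for pos, i in enumerate(order)}
    # Step 2: nearest neighbor among the items that come earlier in `order`.
    delta = [0.0] * n
    nn = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(D[i])  # densest item: use its farthest distance
            continue
        j = min(order[:pos], key=lambda j: D[i][j])
        nn[i], delta[i] = j, D[i][j]
    # Centers: largest rho * delta; the densest item always qualifies.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:k]
    # Step 3: every other item inherits the label of its higher-density NN.
    label = [-1] * n
    for c, i in enumerate(centers):
        label[i] = c
    for i in order:  # densest first, so nn[i] is already labeled
        if label[i] == -1:
            label[i] = label[nn[i]]
    return rho, delta, label

# Toy data (made up): two groups of points on a line.
xs = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in xs] for a in xs]
print(density_peaks(D, dc=1.5, k=2))
```

Because labels propagate from dense items to sparse ones in a single pass, a stray outlier simply inherits a label at the end and cannot pull a cluster center toward itself.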

Page 13: TADPole_Nurjahan Begum

Density Peaks (DP) Algorithm: Density

[Figure: 13 numbered items; the cutoff distance dc is marked at item 1]

Item   1   2   3   4   5   6   7   8   9   10   11   12   13
ρ      4   3   6   4   5   3   1   3   1   1    2    2    2

Page 14: TADPole_Nurjahan Begum

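Density Peaks (DP) Algorithm: Nearest Neighbor (NN) from Higher Density List

[Figure: 13 numbered items; the cutoff distance dc is marked at item 1]

Item   1   2   3   4   5   6   7   8   9   10   11   12   13
ρ      4   3   6   4   5   3   1   3   1   1    2    2    2

Elements with higher density (shown for item 1): 3 and 5, with the values 4.2 and 6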

Page 15: TADPole_Nurjahan Begum

Density Peaks (DP) Algorithm: Cluster Assignment

[Figure: 13 numbered items; the cutoff distance dc is marked at item 1]

Item   1   2   3   4   5   6   7   8   9   10   11   12   13
ρ      4   3   6   4   5   3   1   3   1   1    2    2    2

Elements with higher density (shown for item 1): 3 and 5, with the values 4.2 and 6

Item 1’s cluster label = item 3’s cluster label

Page 16: TADPole_Nurjahan Begum

Plot of the values from step 1 (density, X axis) against the values from step 2 (NN distance, Y axis)

Page 17: TADPole_Nurjahan Begum

Talk Overview
• Motivation of Dynamic Time Warping (DTW) Clustering
• Density Peaks (DP) Algorithm
• TADPole: Our Proposed Algorithm
    Novel Pruning Strategies
• Experimental Results
• Case Studies
• Conclusions

Page 18: TADPole_Nurjahan Begum

TADPole: Pruning During Local Density Computation

[Figure: four cases, A) to D), for a pair of items (i, j), showing LBMatrix(i,j), the distance Dij and UBMatrix(i,j) relative to the cutoff dc. Case A): Dij = 0. One case is annotated "Calculate distance!"]
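
One plausible reading of this slide, sketched below, is that the expensive distance is only computed when the bounds cannot decide whether a pair lies within dc. The mapping of the slide's four cases onto the branches is an assumption, as are the function name, the `dist` callback and the requirement that `lb[i][j] <= true distance <= ub[i][j]`; the slide does not say how LBMatrix and UBMatrix are built.

```python
def pruned_density(n, lb, ub, dc, dist):
    """Local densities from bound matrices; dist(i, j) is the expensive
    distance (e.g. DTW) and is called only when the bounds straddle dc."""
    rho = [0] * n
    calls = 0
    for i in range(n):
        for j in range(i + 1, n):   # i == j has distance 0 and is skipped
            if ub[i][j] < dc:       # certainly within dc: count it, no call
                within = True
            elif lb[i][j] >= dc:    # certainly outside dc: skip it, no call
                within = False
            else:                   # undecided: the distance must be computed
                calls += 1
                within = dist(i, j) < dc
            if within:
                rho[i] += 1
                rho[j] += 1
    return rho, calls

# Toy data (made up): absolute difference stands in for DTW, and the
# bounds are simply 80% and 120% of the true distance.
xs = [0, 1, 2, 10, 11, 12]
true = [[abs(a - b) for b in xs] for a in xs]
lb = [[0.8 * d for d in row] for row in true]
ub = [[1.2 * d for d in row] for row in true]
print(pruned_density(len(xs), lb, ub, dc=2.2, dist=lambda i, j: true[i][j]))
```

In this toy run only 2 of the 15 pairs need a real distance computation, and the densities match what brute force would give, which is what makes the pruning admissible.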

Page 19: TADPole_Nurjahan Begum

TADPole: Pruning During NN Distance Calculation From Higher Density List

[Figure: four panels, A) to D), each showing item i, one candidate from the higher density list (j1 to j4), the distance to it (D1 to D4) and the corresponding bounds LBMatrix(i,j1..j4) and UBMatrix(i,j1..j4)]
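
The figure suggests that candidates in the higher density list can be discarded by comparing their bounds. The sketch below shows one admissible way to do that, not necessarily the authors' exact procedure: the smallest upper bound caps the NN distance, so any candidate whose lower bound exceeds that cap is dropped. The early stop inside the loop, the function name and the `dist` callback are additions made for this illustration.

```python
def pruned_nn_distance(i, higher, lb, ub, dist):
    """Nearest neighbor of item i among `higher` (a non-empty list of items
    with higher density). Returns (nn, distance, number of dist() calls)."""
    # The true NN distance can never exceed the smallest upper bound.
    best_ub = min(ub[i][j] for j in higher)
    # A candidate whose lower bound exceeds that cap cannot be the NN.
    candidates = [j for j in higher if lb[i][j] <= best_ub]
    nn, best, calls = None, float("inf"), 0
    # Try the most promising candidates first.
    for j in sorted(candidates, key=lambda j: lb[i][j]):
        if lb[i][j] >= best:  # no remaining candidate can beat `best`
            break
        calls += 1
        d = dist(i, j)
        if d < best:
            nn, best = j, d
    return nn, best, calls

# Toy data (made up): bounds are 80% and 120% of the true distance.
xs = [0, 1, 2, 10, 11, 12]
true = [[abs(a - b) for b in xs] for a in xs]
lb = [[0.8 * d for d in row] for row in true]
ub = [[1.2 * d for d in row] for row in true]
print(pruned_nn_distance(0, [1, 4], lb, ub, lambda i, j: true[i][j]))
```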

Page 20: TADPole_Nurjahan Begum

Talk Overview
• Motivation of Dynamic Time Warping (DTW) Clustering
• Density Peaks (DP) Algorithm
• TADPole: Our Proposed Algorithm
    Novel Pruning Strategies
• Experimental Results
• Case Studies
• Conclusions

Page 21: TADPole_Nurjahan Begum

How Effective is TADPole’s Pruning?

[Figure: distance calculations vs. number of objects (0 to 3500), in two panels: "Absolute Number" (y-axis up to 7 × 10^6) and "Percentage" (y-axis 0 to 100); curves: Brute force and TADPole]

DP: 9 hours. TADPole: 9 minutes.

Page 22: TADPole_Nurjahan Begum

Distance Computation Ordering: Anytime TADPole

[Figure: Rand Index (0.4 to 1) vs. distance computation percentage (0 to 100%) for Oracle Order, TADPole Order and Random Order, with Euclidean Distance as a reference]

[Figure: zoom-in of the above figure on the 0 to 10% range, for Oracle Order, Random Order and TADPole Order]

This reflects the 90% of DTW calculations that were admissibly pruned

This reflects the 10% of DTW calculations that were calculated in anytime ordering
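
The quality measure on the y-axis of these plots is the Rand Index. As a reminder of what it measures, here is a small sketch of the standard pair-counting definition; the function name and the toy labelings are made up for illustration and are not from the talk.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two clusterings agree: both place
    the pair in the same cluster, or both place it in different clusters."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[a] == labels_true[b]) == (labels_pred[a] == labels_pred[b])
        for a, b in pairs
    )
    return agree / len(pairs)

# Cluster names do not matter, only which items are grouped together.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

A value of 1 means the two clusterings group the items identically, so an anytime curve that approaches 1 early is reaching the final clustering with few distance computations.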

Page 23: TADPole_Nurjahan Begum

How ‘good’ are TADPole Clusters?

Dataset          TADPole DTW (TADPole ED)   k-means (DTW version)   Hierarchical (DTW version)   DBSCAN (DTW version)   Spectral (DTW version)
CBF              1 (0.66)                   0.78                    0.73                         0.77                   0.76
FacesUCR         0.92 (0.86)                0.87                    0.85                         0.77                   0.94
MedicalImages    0.66 (0.67)                0.67                    0.62                         0.65                   0.69
Symbols          0.98 (0.81)                0.93                    0.78                         0.91                   0.95
uWaveGesture_Z   0.86 (0.84)                0.85                    0.83                         0.8                    0.86

Page 24: TADPole_Nurjahan Begum

Talk Overview
• Motivation of Dynamic Time Warping (DTW) Clustering
• Density Peaks (DP) Algorithm
• TADPole: Our Proposed Algorithm
    Novel Pruning Strategies
• Experimental Results
• Case Studies
• Conclusions

Page 25: TADPole_Nurjahan Begum

Case Study 1: Electromagnetic Articulograph

[Figure: Y and Z traces (x-axis: 0 to 150), with labels 1 to 7]

[Figure: Rand Index (0.84 to 1) vs. distance computation percentage for Oracle Order, TADPole Order and Random Order, with Euclidean Distance as a reference]

Pruning: 94%

Page 26: TADPole_Nurjahan Begum

Case Study 2: Pulsus Dataset

[Figure: oximeter schematic (LED, photo detector, vein, artery) and PPG traces labeled Healthy, Suspected Pulsus and Severe Pulsus]

[Figure: panels A) to F), power spectral density vs. frequency (0 to 60), marking normalized respiration rate and normalized heart rate, including Patient 639, Patient 523, Patient 618 and Patient 2975918]

[Figure: PPG traces (x-axis: 200 to 1800) for Non-Severe Pulsus and Severe Pulsus]

Pruning: 88%

Page 27: TADPole_Nurjahan Begum

Talk Overview
• Motivation of Dynamic Time Warping (DTW) Clustering
• Density Peaks (DP) Algorithm
• TADPole: Our Proposed Algorithm
    Novel Pruning Strategies
• Experimental Results
• Case Studies
• Conclusions

Page 28: TADPole_Nurjahan Begum

Conclusions
• Proposed a robust DTW clustering algorithm, TADPole
    Exploits both upper and lower bounds
    Computes the clustering in an anytime fashion
• Demonstrated the utility of our algorithm on diverse domains
    Electromagnetic Articulograph
    Pulsus Dataset

Page 29: TADPole_Nurjahan Begum

Thanks to NSF!