Classifying Multivariate Time Series Scalably Ashfaq Munshi, Saeed Bidhendi, Faramarz Munshi November 10, 2017
Ashfaq Munshi, ML7 Fellow, Pepperdata

Jan 22, 2018

MLconf
Page 1: Ashfaq Munshi, ML7 Fellow, Pepperdata

Classifying Multivariate Time Series Scalably

Ashfaq Munshi, Saeed Bidhendi, Faramarz Munshi

November 10, 2017

Page 2: Overview

• Background and Motivation
• Univariate Time Series (UTS)
• Multivariate Time Series (MTS)
• Conclusion

© Pepperdata, Inc.

Page 3: Background

Page 4: [slide with no transcribed text]

Page 5: Pepperdata Telemetry Data Scale

Example production deployment:

• 570 nodes
• 20 tasks / node
• 300 metrics / task
• 5-second sampling
• 41 million points / minute
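
As a quick check, the per-minute figure on this slide follows from multiplying the other four numbers together. The variable names below are illustrative only.

```python
# Back-of-the-envelope check of the telemetry volume quoted on the slide.
nodes = 570
tasks_per_node = 20
metrics_per_task = 300
sampling_interval_s = 5

series = nodes * tasks_per_node * metrics_per_task   # concurrent metric streams
samples_per_minute = 60 // sampling_interval_s       # samples per stream per minute
points_per_minute = series * samples_per_minute

print(f"{series:,} series, {points_per_minute:,} points per minute")
# prints "3,420,000 series, 41,040,000 points per minute"
```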

Page 6: Our Big Data About Production Big Data

• 300 trillion performance data points collected
• 22 thousand production nodes
• 50 million jobs / year

Page 7: Example Time Series

[Figure: example time series]

Page 8: Characteristics of our TS

• Highly variable in length
  • 10 data points to 10K+ data points
• Missing data
• Extremely noisy

Page 9: Problem

Classify this collection of time series to give operators a better understanding of resource utilization on their clusters and to enable a scheduler to better optimize cluster resources.

Page 10: Univariate Time Series

Page 11: Approaches and Data Set

• Two recent approaches from the literature
  • Transform the TS into an image, then use a tiled CNN [Wang & Oates 2015]
  • Transform the TS into a bag of patterns [Schäfer & Leser 2017]
• Data set is the UCR time series archive
  • 82 time series data sets
  • Number of series < 10K
  • Data points per series < 2K

Page 12: Time Series and Images

• Map the time series into
  • Gramian Angular Summation Fields
  • Gramian Angular Difference Fields
  • Markov Transition Fields
• Feed images into a tiled CNN for classification

[Wang & Oates, 2015]
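
The Markov Transition Field is the one encoding on this slide that is not expanded later in the deck, so a minimal sketch may help. It assumes the usual construction: quantile-bin the series, estimate a first-order transition matrix between bins, then look up the transition probability for every pair of time steps. The function name and the default of 8 bins are illustrative choices, not taken from the slides.

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Sketch of an MTF image for a 1-D series (n_bins is an assumed default)."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, so each bin holds roughly the same number of points.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)                  # bin index in 0..n_bins-1
    # Count transitions between consecutive points, then row-normalize.
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (bins[:-1], bins[1:]), 1.0)
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Pixel (i, j) holds the transition probability from bin(x_i) to bin(x_j).
    return W[bins[:, None], bins[None, :]]

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=100))            # synthetic random walk
mtf = markov_transition_field(walk)
print(mtf.shape)
```

The image side equals the series length, so long series would typically be downsampled or block-averaged before being fed to a CNN.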

Page 13: Gramian Angular Fields

• Normalize the time series into [-1,1]
• Transform to Polar Coordinates

[Wang & Oates, 2015]
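
The two steps on this slide can be sketched as follows. This assumes the standard construction from the cited work: after rescaling, each value becomes an angle via arccos, and the summation and difference fields are cos(φ_i + φ_j) and sin(φ_i − φ_j). The function names are illustrative.

```python
import numpy as np

def rescale(x):
    """Min-max rescale a 1-D series into [-1, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:                       # constant series: map everything to 0
        return np.zeros_like(x)
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def gramian_angular_fields(x):
    """Return (GASF, GADF) images for a 1-D series."""
    x_tilde = np.clip(rescale(x), -1.0, 1.0)     # guard against round-off
    phi = np.arccos(x_tilde)                     # angular coordinate per point
    gasf = np.cos(phi[:, None] + phi[None, :])   # summation field
    gadf = np.sin(phi[:, None] - phi[None, :])   # difference field
    return gasf, gadf

series = np.sin(np.linspace(0, 4 * np.pi, 64))
gasf, gadf = gramian_angular_fields(series)
print(gasf.shape, gadf.shape)   # prints "(64, 64) (64, 64)"
```

The radial coordinate (the time index) does not enter the image values; it only fixes the row and column order.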

Page 14: Example GADF Image

[Figure: example GADF image]

[Wang & Oates, 2015]

Page 15: Time Series and Bag of Patterns

• Divide TS into windows
• Fourier Transform TS in window
• Apply low-pass filter
• Quantize the Fourier coefficients
• Map window to words
• Extract features from sentences
• Use Logistic Regression classifier

[Schäfer & Leser 2017]
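
A simplified sketch of the first six steps on this slide is given below. It is not the cited authors' implementation: the quantization bins are taken from quantiles of the series itself rather than learned from training data, the DC coefficient is dropped by assumption, and the function name, window size, coefficient count and alphabet are all illustrative. The resulting word counts are the kind of feature vector a logistic regression classifier would then consume.

```python
import numpy as np
from collections import Counter

def bag_of_patterns(x, window=16, n_coefs=2, alphabet="abcd"):
    """Turn a 1-D series into a histogram of Fourier 'words' (simplified)."""
    x = np.asarray(x, dtype=float)
    # 1. Divide the series into overlapping windows (stride 1).
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    # 2. Fourier transform each window.
    spectrum = np.fft.rfft(windows, axis=1)
    # 3. Low-pass filter: keep the first few coefficients (DC dropped by assumption).
    low = spectrum[:, 1:1 + n_coefs]
    feats = np.concatenate([low.real, low.imag], axis=1)
    # 4. Quantize each coefficient into len(alphabet) quantile bins.
    qs = np.linspace(0, 1, len(alphabet) + 1)[1:-1]
    edges = np.quantile(feats, qs, axis=0)        # shape (n_edges, n_feats)
    symbols = np.stack(
        [np.digitize(feats[:, k], edges[:, k]) for k in range(feats.shape[1])],
        axis=1,
    )
    # 5. Map each window to a word; 6. count the words.
    words = ["".join(alphabet[s] for s in row) for row in symbols]
    return Counter(words)

series = np.sin(np.linspace(0, 20, 200))
bag = bag_of_patterns(series)
print(bag.most_common(3))
```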

Page 16: Our “Off the Shelf” Approach (PD)

• Convert TS into image (GADF)
• Use Google’s pre-trained CNN (Inception v3)
• Embed into 2,048-dimensional vector space
• Train MLP
  • 2 hidden layers (50 nodes each)
  • ReLU activation
  • Dropout for regularization (.1, .2)
  • Softmax final layer
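
To make the classifier head concrete, here is a minimal NumPy sketch of the forward pass of such an MLP. Several details are assumptions, since the slide does not give them: the dropout rates .1 and .2 are applied to the first and second hidden layer respectively, the number of classes is set to 10, the weights are randomly initialized, and random vectors stand in for the 2,048-dimensional CNN embeddings. Training (backpropagation) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in=2048, n_hidden=50, n_classes=10):
    """Random initialization for an MLP with two hidden layers."""
    sizes = [n_in, n_hidden, n_hidden, n_classes]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x, drop_rates=(0.1, 0.2), training=False):
    """Return class probabilities for a batch of embeddings."""
    h = x
    for (W, b), rate in zip(params[:-1], drop_rates):
        h = np.maximum(h @ W + b, 0.0)             # ReLU hidden layer
        if training:                                # inverted dropout
            mask = rng.random(h.shape) >= rate
            h = h * mask / (1.0 - rate)
    W, b = params[-1]
    logits = h @ W + b
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)  # softmax

embeddings = rng.normal(size=(4, 2048))   # stand-in for CNN embeddings
probs = forward(init_mlp(), embeddings)
print(probs.shape)   # prints "(4, 10)"
```

Because the CNN stays frozen, only this small head needs training.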

Page 17: Accuracies for a subset of UCR

[Bar chart; accuracy axis from 0% to 100%]

• BOSS (91.1)
• PD (89.8)
• GADF+GASF+MTF (86.4)

Page 18: Accuracy on a subset of UCR

[Bar chart; accuracy axis from 68% to 86%. Methods compared: WEASEL, 1-NN DTW CV, 1-NN DTW, BOSS, Learning Shapelet (LS), TSBF, ST, EE (PROP), COTE (ensemble), PD]

Page 19: Training Time Comparison

[Chart: training time comparison, including PD]

Page 20: Multivariate Time Series

Page 21: Approaches and Data Set

• Two recent approaches from the literature
  • Use an ESN (“Echo State Network”) to map MTS into state clouds [Wang, Wang, Liu 2015]
  • Use Dynamic Time Warping with Mahalanobis distance metric [Mei, Liu, Wang, Gao 2016]
• Data set is from UCI, a small subset of UCR, and others
  • Number of series ~ 10K
  • Data points per series ~ 200
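
For readers unfamiliar with the second approach, the sketch below shows DTW over multivariate series with a Mahalanobis local distance. It is an illustration only: the matrix M is simply the inverse covariance of the pooled data, whereas the cited work may obtain it differently (for example by learning it from labeled data). The function name and synthetic inputs are hypothetical.

```python
import numpy as np

def mahalanobis_dtw(a, b, M):
    """DTW cost between MTS a (n, d) and b (m, d) with local cost
    (u - v)^T M (u - v)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a[:, None, :] - b[None, :, :]                  # (n, m, d)
    local = np.einsum("nmd,de,nme->nm", diff, M, diff)    # squared Mahalanobis
    n, m = local.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]

rng = np.random.default_rng(0)
a = rng.normal(size=(30, 3))     # 30 time steps, 3 variables
b = rng.normal(size=(40, 3))     # different length is fine for DTW
M = np.linalg.inv(np.cov(np.vstack([a, b]), rowvar=False))  # assumed choice of M
print(mahalanobis_dtw(a, b, M))
```

The double loop is quadratic in series length, which is one reason this family of methods is hard to scale to series with thousands of points.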

Page 22: Our “Off the Shelf” Approach (PD)

• Make TS for each variable the same length by zero padding
• Convert each TS into a GADF image
• Interpolate any missing data points in the image using linear interpolation on the image
• Stack the images for the five variables
• Use the same process as before for univariate time series
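
The first four steps on this slide might look roughly like the sketch below. The slide leaves several details open, so the following are assumptions: missing points are represented as NaN, zero padding is appended at the end of each series, image interpolation is done row-wise and then column-wise, and "stacking" means one image per variable along a leading axis. All function names are hypothetical.

```python
import numpy as np

def gadf(x):
    """GADF image of a 1-D series; NaNs (missing points) propagate."""
    lo, hi = np.nanmin(x), np.nanmax(x)
    scaled = np.zeros_like(x) if hi == lo else 2.0 * (x - lo) / (hi - lo) - 1.0
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))
    return np.sin(phi[:, None] - phi[None, :])

def fill_rows(img):
    """Linearly interpolate NaNs along each row that has some valid pixels."""
    out = img.copy()
    cols = np.arange(out.shape[1])
    for row in out:                      # each row is a view into `out`
        bad = np.isnan(row)
        if bad.any() and not bad.all():
            row[bad] = np.interp(cols[bad], cols[~bad], row[~bad])
    return out

def mts_to_stack(channels):
    """channels: list of 1-D arrays, one per variable, NaN = missing."""
    length = max(len(c) for c in channels)
    images = []
    for c in channels:
        padded = np.zeros(length)
        padded[: len(c)] = c             # zero padding at the end
        img = fill_rows(gadf(padded))    # repairs NaN columns
        img = fill_rows(img.T).T         # repairs the remaining NaN rows
        images.append(img)
    return np.stack(images, axis=0)      # (n_variables, T, T)

cpu = np.array([0.2, 0.5, np.nan, 0.9, 0.4, 0.3])   # one missing sample
mem = np.array([1.0, 1.2, 1.1])                     # shorter series
stack = mts_to_stack([cpu, mem])
print(stack.shape)   # prints "(2, 6, 6)"
```

A missing sample at time k blanks out row k and column k of the GADF image, which is why two interpolation passes are used here.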

Page 23: 5-Fold Cross Validation Error

[Bar chart; error axis from 0 to 30. Data sets: Robot failure LP1, LP2, LP3, LP4, LP5. Series: MDDTW Best, PD 5-fold]

Page 24: 10-Fold Cross Validation Error

[Bar chart; error axis from 0 to 30. Data sets: Robot failure LP1, LP2, LP3, LP4, LP5. Series: Echo Network Best, PD 10-fold]

Page 25: PD Data

• Four variables:
  • CPU, Virtual Memory, HDFS reads, Network Ops
• Each time series collected over one week
  • 10 data points to 10K+ data points
  • Missing data
  • Extremely noisy
• For periods longer than a week, data is much larger
• Sampling rate is the same for all TS

Page 26: Accuracy per Label on PD Dataset G

[Bar chart; accuracy (0 to 100) for each of labels 1 to 17]

• Number of TS = 3092
• Lengths per TS = 5 to 8500
• Average Accuracy = 78.14%
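
The slide does not say whether the average is taken over series or over labels. The sketch below, on made-up labels, computes per-label accuracy and both kinds of average so the distinction is clear. The function name and the toy data are illustrative.

```python
import numpy as np

def per_label_accuracy(y_true, y_pred):
    """Fraction of series of each true label that were predicted correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        int(label): float(np.mean(y_pred[y_true == label] == label))
        for label in np.unique(y_true)
    }

y_true = [1, 1, 1, 2, 2, 3]      # toy ground-truth labels
y_pred = [1, 1, 2, 2, 2, 1]      # toy predictions
acc = per_label_accuracy(y_true, y_pred)

overall = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))  # over series
macro = float(np.mean(list(acc.values())))                          # over labels
print(acc, round(overall, 3), round(macro, 3))
```

With imbalanced labels the two averages can differ noticeably, as they do in this toy example.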

Page 27: Accuracy per Label on PD Dataset R

[Bar chart; accuracy axis from 0 to 120, for each of labels 1 to 48]

• Number of TS = 6715
• Lengths per TS = 5 to 9400
• Average Accuracy = 75.95%

Page 28: Summary

Our “Off the Shelf” approach is as good as the best approaches for both UTS and MTS. And the methodology is the same for both types of TS.

Page 29: Thank You