Clustering Seasonality Patterns in the Presence of Errors. Advisor: Dr. Hsu. Graduate: You-Cheng Chen. Authors: Mahesh Kumar, Nitin R. Patel, Jonathan Woo.

Dec 30, 2015

Clustering Seasonality Patterns in the Presence of Errors

Advisor: Dr. Hsu
Graduate: You-Cheng Chen
Authors: Mahesh Kumar, Nitin R. Patel, Jonathan Woo

Outline

Motivation
Objective
Introduction
Seasonality Estimation
Distance Function
Experimental results
Conclusions
Personal opinion

Motivation

Most traditional clustering algorithms assume that the data are provided without measurement error.

Objective

To present a clustering method that incorporates the information contained in the error estimates of the data, together with a new distance function that is based on the distribution of errors in the data.

Introduction

Defining a good distance or dissimilarity function is a critical step in any distance-based clustering method.

Problem: Most traditional clustering methods assume that the data are free of error, but errors are natural in any data measurement.

Example: a sample average, which is only an estimate of the underlying mean and therefore carries a standard error.

Introduction

This study and its results focus on time-series clustering in the retail industry.

This study assumes that each point comes from a multidimensional Gaussian distribution.

Seasonality Estimation (1/4)

Seasonality is defined as the normalized underlying demand of a group of similar merchandise as a function of the time of year, after taking into account other factors that impact sales, such as discounts, inventory, promotions and random effects.

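$$\mathrm{Sale}_{it} = f_I(I_{it}) \cdot f_P(P_{it}) \cdot f_Q(Q_{it}) \cdot f_R(R_{it}) \cdot \mathrm{PLC}_i(t - t_i^0) \cdot \mathrm{Seas}_{it} \qquad (1)$$

After removing the effects of all these nonseasonal factors from (1):

$$\mathrm{Sale}_{it} = \mathrm{PLC}_i(t - t_i^0) \cdot \mathrm{Seas}_{it}$$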

Seasonality Estimation (2/4)

Let S be a set of items that follow a similar seasonality. S consists of items with a variety of PLCs (product life cycles) that differ in shape and duration.

Theorem 1:

Seasonality Estimation (3/4)

If we take the average of the weekly sales of all items in S, the effect of the PLCs is nullified, as suggested by the following equations.

Seasonality Estimation (4/4)

Seasonality values, $\mathrm{Seas}_t$, can be estimated by appropriately scaling the weekly sales average, $\mathrm{Sale}_t$.

The above procedure provides us with a large number of seasonal patterns, one for each set S, along with estimates of associated errors.
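
The estimation procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `estimate_seasonality`, the choice to scale the indices so that they average to 1 over the year, and the use of the standard error of the weekly mean as the error estimate are all assumptions made for this example. It also assumes that nonseasonal effects have already been removed from the sales figures, and the data are made up.

```python
import math

def estimate_seasonality(sales):
    """Estimate one seasonality pattern and its errors for a set S.

    sales: list of items, each a list of T weekly sales figures.
    Returns (seas, se): two lists of length T.
    """
    n_items = len(sales)
    n_weeks = len(sales[0])
    # Weekly sales average over all items in S.
    weekly_mean = [sum(item[t] for item in sales) / n_items
                   for t in range(n_weeks)]
    # Standard error of each weekly average: sample std / sqrt(n_items).
    weekly_se = []
    for t in range(n_weeks):
        var = sum((item[t] - weekly_mean[t]) ** 2 for item in sales) / (n_items - 1)
        weekly_se.append(math.sqrt(var / n_items))
    # Assumed scaling: divide by the overall mean so the indices average to 1.
    # The scale is treated as a fixed constant when propagating the errors.
    scale = sum(weekly_mean) / n_weeks
    seas = [m / scale for m in weekly_mean]
    se = [s / scale for s in weekly_se]
    return seas, se

# Hypothetical data: three items observed over four weeks.
sales = [
    [10.0, 20.0, 30.0, 20.0],
    [12.0, 18.0, 33.0, 17.0],
    [8.0, 22.0, 27.0, 23.0],
]
seas, se = estimate_seasonality(sales)
```

Each pair `(seas[t], se[t])` then plays the role of one point of a seasonality pattern together with its associated error.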

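Distance Function (1/4)

Consider two seasonalities, each given as a sequence of estimates with their associated errors:

$$A_i = \{(x_{i1}, \sigma_{i1}), (x_{i2}, \sigma_{i2}), \ldots, (x_{iT}, \sigma_{iT})\}$$

$$A_j = \{(x_{j1}, \sigma_{j1}), (x_{j2}, \sigma_{j2}), \ldots, (x_{jT}, \sigma_{jT})\}$$

We define the similarity between two seasonalities through the null hypothesis $H_0: A_i \sim A_j$ (the two seasonalities are the same): the similarity between $A_i$ and $A_j$ is the probability of accepting $H_0$.

The distance $d_{ij}$ between $A_i$ and $A_j$ is defined as (1 - similarity), which is the probability of rejecting $H_0$.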

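Distance Function (2/4)

Consider the t-th samples of both seasonalities, $A_{it} = (x_{it}, \sigma_{it})$ and $A_{jt} = (x_{jt}, \sigma_{jt})$.

$$x_{it} - x_{jt} \sim N\left(\mu_{it} - \mu_{jt},\ (\sigma_{it}^2 + \sigma_{jt}^2)^{1/2}\right) \qquad (1)$$

If $A_i \sim A_j$ then $\mu_{it} = \mu_{jt}$, and consequently the statistic

$$\frac{x_{it} - x_{jt}}{\sqrt{\sigma_{it}^2 + \sigma_{jt}^2}}$$

follows a t-distribution.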

Distance Function (3/4)

Finally, the distance is

$$d_{ij} = \chi^2_{T-1}\left(\sum_{t=1}^{T} \frac{(x_{it} - x_{jt})^2}{\sigma_{it}^2 + \sigma_{jt}^2}\right)$$

Comparison with Euclidean Distance

$d_{ij}$ is monotonically increasing with respect to

$$\sum_{t=1}^{T} \frac{(x_{it} - x_{jt})^2}{\sigma_{it}^2 + \sigma_{jt}^2}$$
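
The distance described on these slides can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that the distance is the cumulative distribution function of a chi-square distribution with T - 1 degrees of freedom, evaluated at the sum of squared standardized differences. The helper names `chi2_cdf` and `error_distance` and the sample data are hypothetical. The CDF is computed with a plain series expansion of the regularized lower incomplete gamma function so that the example needs only the standard library; it is meant for moderate values of the statistic.

```python
import math

def chi2_cdf(x, k):
    """CDF of a chi-square distribution with k degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    # Series: P(a, z) = z^a e^(-z) / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total and n < 100000:
        term *= z / (a + n)
        total += term
        n += 1
    log_p = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return min(1.0, math.exp(log_p))

def error_distance(a, b):
    """Distance between two seasonalities given as lists of (x, sigma) pairs."""
    n_points = len(a)
    stat = sum((xa - xb) ** 2 / (sa ** 2 + sb ** 2)
               for (xa, sa), (xb, sb) in zip(a, b))
    # Assumed degrees of freedom: T - 1.
    return chi2_cdf(stat, n_points - 1)

# Hypothetical seasonalities with T = 3 points each.
a = [(1.0, 0.1), (1.2, 0.1), (0.8, 0.1)]
b = [(1.1, 0.1), (1.2, 0.1), (0.9, 0.1)]
c = [(1.3, 0.1), (1.2, 0.1), (1.1, 0.1)]
d_ab = error_distance(a, b)
d_ac = error_distance(a, c)
```

Because the CDF is monotonically increasing, a larger sum of standardized squared differences always gives a larger distance, so `d_ac` exceeds `d_ab` here.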

Distance Function (4/4)

Comparison with Euclidean Distance

If all σ's were the same and equal to σ, then the rank order of $d_{ij}$ would become the rank order of (1), which is the same as the rank order of the Euclidean distance, (2).

$$\frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_{it} - x_{jt})^2 \qquad (1)$$

$$\sum_{t=1}^{T} (x_{it} - x_{jt})^2 \qquad (2)$$
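
A small numerical check can make the comparison concrete. The sketch below uses made-up data and hypothetical helper names (`euclidean_sq`, `error_statistic`). It compares the sum of squared differences with the error-weighted sum that the distance increases with. Since the distance is monotone in that sum, comparing the sums is enough to compare rank orders.

```python
def euclidean_sq(a, b):
    """Sum of squared differences, ignoring the errors."""
    return sum((xa - xb) ** 2 for (xa, _), (xb, _) in zip(a, b))

def error_statistic(a, b):
    """Sum of squared differences, each divided by the summed error variances."""
    return sum((xa - xb) ** 2 / (sa ** 2 + sb ** 2)
               for (xa, sa), (xb, sb) in zip(a, b))

# Case 1: all sigmas equal, so both measures rank the neighbours identically.
s = 0.2
p = [(1.0, s), (1.0, s)]
q = [(1.3, s), (1.3, s)]
r = [(1.5, s), (1.5, s)]
same_order = ((euclidean_sq(p, q) < euclidean_sq(p, r))
              == (error_statistic(p, q) < error_statistic(p, r)))

# Case 2: c is measured much less precisely than b.
a = [(1.0, 0.1), (1.0, 0.1)]
b = [(1.3, 0.1), (1.3, 0.1)]
c = [(1.5, 1.0), (1.5, 1.0)]
euclid_prefers_b = euclidean_sq(a, b) < euclidean_sq(a, c)
error_prefers_c = error_statistic(a, c) < error_statistic(a, b)
```

In the second case the Euclidean measure puts `b` closer to `a`, while the error-based measure puts `c` closer, because the large errors on `c` make its difference from `a` statistically unremarkable.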

Clustering Algorithm

Experimental Results (1/6)

Simulated Data

Figure 5: Individual (prior to clustering) seasonality estimates with associated errors

Experimental Results (2/6)

Figure 6: Seasonalities obtained by hError

Experimental Results (3/6)

Figure 7: Seasonalities obtained by kmeans and Ward’s method using Euclidean distances

Experimental Results (4/6)

Clustering Method    Average # misclassifications    Average Estimation Error
hError               0.87                            2.0182
Ward’s method        2.63                            4.7021
kmeans               2.94                            5.0337

Table 1: Average # misclassifications and Average Estimation Error for different clustering methods

Experimental Results (5/6)

$$\mathrm{ForecastError} = \frac{\sum_{t=1}^{T} \left|\mathrm{ActualSale}_t - \mathrm{ForecastSale}_t\right|}{\sum_{t=1}^{T} \mathrm{ActualSale}_t}$$

Clustering Method    Average Forecast Error (%)
hError               18.7
Ward’s               23.9
Kmeans               24.2
No clustering        31.5

Table 2: Average Forecast Error (Retailer Data)
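
For reference, a forecast error of this kind can be computed as below. This is a hedged sketch with made-up numbers: it assumes the measure is the total absolute deviation between actual and forecast sales divided by total actual sales, and the function name `forecast_error` is hypothetical.

```python
def forecast_error(actual, forecast):
    """Total absolute forecast deviation relative to total actual sales."""
    # Assumption: deviations are taken in absolute value before summing.
    total_abs_dev = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_abs_dev / sum(actual)

# Hypothetical weekly figures.
actual = [100.0, 120.0, 80.0]
forecast = [90.0, 130.0, 85.0]
err_pct = 100.0 * forecast_error(actual, forecast)
```

Multiplying by 100 expresses the result as a percentage, the unit used in Table 2.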

Experimental Results (6/6)

Conclusions

The distance function $d_{ij}$ is invariant to the scale of the data, and in the reported experiments the clustering method based on it (hError) obtains better clusters than Ward’s method and kmeans with Euclidean distances.
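
One way to read the scale-invariance claim is that multiplying an estimate and its error by the same factor leaves each standardized squared difference unchanged. The sketch below checks this numerically on made-up data; the helper names `error_statistic`, `euclidean_sq` and `rescale` are hypothetical, and the error-weighted sum stands in for the distance, which depends on the data only through that sum.

```python
def error_statistic(a, b):
    """Sum of squared differences, each divided by the summed error variances."""
    return sum((xa - xb) ** 2 / (sa ** 2 + sb ** 2)
               for (xa, sa), (xb, sb) in zip(a, b))

def euclidean_sq(a, b):
    """Sum of squared differences, ignoring the errors."""
    return sum((xa - xb) ** 2 for (xa, _), (xb, _) in zip(a, b))

def rescale(series, factors):
    """Multiply each (x, sigma) pair by the factor for its time index."""
    return [(x * c, s * c) for (x, s), c in zip(series, factors)]

a = [(1.0, 0.1), (1.2, 0.2), (0.8, 0.1)]
b = [(1.1, 0.2), (0.9, 0.1), (1.0, 0.3)]
factors = [10.0, 0.5, 3.0]  # a different unit for each time index

stat_before = error_statistic(a, b)
stat_after = error_statistic(rescale(a, factors), rescale(b, factors))
euclid_before = euclidean_sq(a, b)
euclid_after = euclidean_sq(rescale(a, factors), rescale(b, factors))
```

Here `stat_before` and `stat_after` agree up to rounding, whereas the two Euclidean values differ.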

Personal Opinion

The concept of incorporating information about errors in the distance function is very good and can be used in many other clustering applications.