Adaptive Load Shedding for Mining Frequent Patterns from Data Streams, Xuan Hong Dang, Wee-Keong Ng, and Kok-Leong Ong (DaWaK 2006). 2008/3/19, Yi-Chun Chen

Dec 18, 2015


Gillian Hunt

Adaptive Load Shedding for Mining Frequent Patterns from Data Streams

Xuan Hong Dang, Wee-Keong Ng, and Kok-Leong Ong

(DaWaK 2006)

2008/3/19 Yi-Chun Chen


Outline

• Motivation

• Objective

• Definition

• Adaptive Load Shedding in Data Streams

• Performance Results

• Conclusion


Motivation

• Finding frequent itemsets plays an important role in analyzing data streams

• Existing algorithms simply assume that the machinery itself is fast enough to handle all incoming transactions without incurring any unwanted latency


(Cont.)

• The arrival rate of data streams usually exceeds the system capacity

• Algorithms mining from data streams must cope with system overload situations


Objective

• Given a processing capacity $C$ of a mining system and a data stream $DS$ with high arrival rates

• $Load(DS)$: the workload of the system

• If $Load(DS) > C$, load shedding is invoked

• Guarantee $Load(DS) \le C$

• Discover a set of patterns that closely approximates the set of actual frequent itemsets


(Cont.)

• How to determine overload situations?

• How much load to shed?

• How to approximate frequent patterns under the introduction of load shedding?


Definition

• $I = \{a_1, a_2, \ldots, a_m\}$

• $DS = \{t_1, t_2, \ldots, t_N, \ldots\}$

• $freq(X)$: the occurrence count of $X$ in $DS$ up to the $N$-th transaction

• $\sup(X) = \frac{freq(X)}{N}$

• MFIs: maximal frequent itemsets
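As a small illustration of these definitions, the sketch below counts freq(X) over the first N transactions and derives sup(X) = freq(X)/N. The function names and the toy stream are hypothetical and serve only this example.

```python
from typing import FrozenSet, Iterable, List


def freq(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for t in transactions if target <= t)


def sup(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Support = freq(X) / N, where N is the number of transactions seen so far."""
    n = len(transactions)
    return freq(itemset, transactions) / n if n else 0.0


# Toy stream prefix (hypothetical data): N = 4 transactions over items a, b, c.
stream = [frozenset("ab"), frozenset("abc"), frozenset("bc"), frozenset("a")]
```

Here the itemset {a, b} occurs in two of the four transactions, so its support is 0.5.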


Adaptive Load Shedding in Data Streams

• Overload Detection

• Load Shedding by Sampling Transactions


Overload Detection

• To quickly estimate the system workload, we propose an approximate method based on MFIs

– The MFIs also cover all frequent itemsets (each frequent itemset is a subset of some MFI)

– The # of MFIs is smaller than the # of frequent itemsets

– The support of MFIs is always closest to the minimum support threshold $\sigma$


(Cont.)

• Load coefficient: $L = \sum_{i=1}^{k} 2^{|X_i|} - \sum_{i,j=1}^{k} 2^{|X_i \cap X_j|}$

– $k$ be the # of MFIs in a transaction

– $X_i$ be an MFI, where $1 \le i \le k$

• Suppose we measure the above statistics for $n$ transactions over one time unit

– $r$ be the current rate of the data stream

– The system keeps up as long as $r \times \frac{\sum_{i=1}^{n} L_i}{n} \le C$
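A minimal sketch of the load coefficient and the overload check follows. It assumes that the overlap term is summed once per unordered pair of MFIs (i < j), and that overload means the rate r times the average load coefficient exceeds the capacity C. The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def load_coefficient(mfis_in_txn: Sequence[FrozenSet[str]]) -> int:
    """Estimate how many itemsets a transaction generates from its MFIs.

    Each MFI X_i contributes 2**|X_i| subsets; subsets shared by two MFIs
    are subtracted once per unordered pair (assumption: i < j).
    """
    total = sum(2 ** len(x) for x in mfis_in_txn)
    shared = sum(2 ** len(a & b) for a, b in combinations(mfis_in_txn, 2))
    return total - shared


def is_overloaded(coeffs: Sequence[int], rate: float, capacity: float) -> bool:
    """Overload test: r * (sum of L_i) / n > C over n measured transactions."""
    if not coeffs:
        return False  # nothing measured yet, so no evidence of overload
    return rate * sum(coeffs) / len(coeffs) > capacity
```

For two MFIs {a, b, c} and {b, c, d}, the estimate is 8 + 8 - 4 = 12, which matches the number of distinct subsets the two sets generate together.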


Load Shedding by Sampling Transactions

• In order to estimate how much load to shed

– $P$ be a parameter expressing the fraction of transactions that should be retained (the rest are discarded), chosen so that $P \times r \times \frac{\sum_{i=1}^{n} L_i}{n} \le C$

– Suppose $P < 1$; then we use the Hoeffding bound to discard transactions and to approximate frequent patterns
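The choice of P can be sketched as below, treating P as the probability with which an incoming transaction is sampled and picking the largest value that keeps P * r * mean(L_i) within C. The function name and the assumption of a positive capacity are illustrative, not taken from the slides.

```python
from typing import Sequence


def sampling_fraction(coeffs: Sequence[int], rate: float, capacity: float) -> float:
    """Largest P in [0, 1] with P * r * mean(L_i) <= C (assumes capacity > 0)."""
    if not coeffs:
        return 1.0  # no measurements yet: keep everything
    load = rate * sum(coeffs) / len(coeffs)
    if load <= capacity:
        return 1.0  # the system keeps up, so no shedding is needed
    return capacity / load
```

With load coefficients 12 and 4, a rate of 100, and a capacity of 400, the measured load is 800, so half of the transactions would be kept.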


(Cont.)

• Hoeffding bound: $\Pr[\,|r - n_0 p| \ge n_0 \epsilon\,] \le 2e^{-2 n_0 \epsilon^2}$

– $0 < \epsilon < 1$

– $r$ be the number of times that $X_i = 1$ occurs in these $n_0$ transactions, where $X_i = 1$ when $X$ appears in the $i$-th transaction

– $\sup(X) = p$: the true support of $X$

– $\sup_E(X) = r / n_0$: the estimated support of $X$

– We want the right-hand side to be at most $\delta$, so the required number of sampled transactions is at least $n_0 \ge \frac{1}{2\epsilon^2} \ln\frac{2}{\delta}$
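The step from the bound to the sample size can be written out as follows, assuming $\delta$ denotes the acceptable probability that the estimated support misses the true support by more than $\epsilon$:

```latex
2e^{-2 n_0 \epsilon^2} \le \delta
\;\Longleftrightarrow\; -2 n_0 \epsilon^2 \le \ln\frac{\delta}{2}
\;\Longleftrightarrow\; n_0 \ge \frac{1}{2\epsilon^2}\ln\frac{2}{\delta}
```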


(Cont.)

• Sample batch: each incoming transaction is chosen with probability $P$ until we have sampled enough ($n_0$) transactions

• Local patterns: all freq. itemsets in this sample batch, found only within part of the stream

• Global patterns: freq. itemsets in the entire stream
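Forming sample batches can be sketched as a Bernoulli filter over the stream: each arrival is kept with probability P, and a batch is handed on once n_0 transactions have been collected. The function name, the generator interface, and the choice to drop a trailing incomplete batch are assumptions made for this example.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sample_batches(stream: Iterable[T], p: float, n0: int,
                   rng: random.Random) -> Iterator[List[T]]:
    """Yield batches of n0 transactions, keeping each arrival with probability p."""
    batch: List[T] = []
    for txn in stream:
        if rng.random() < p:      # Bernoulli trial: keep or shed this transaction
            batch.append(txn)
            if len(batch) == n0:  # enough samples: hand the batch to the miner
                yield batch
                batch = []
    # A trailing batch with fewer than n0 transactions is not yielded.
```

Local patterns would then be mined from each yielded batch separately, for example by iterating `for batch in sample_batches(stream, 0.5, 10000, random.Random(0))`.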


(Cont.)

• Due to the non-uniform distribution of the stream

– False global patterns

– Significant support $\sigma_0$ ($\sigma_0 < \sigma$): the max. support error of each pattern

• $\sup(X) \ge \sigma$: frequent

• $\sigma_0 \le \sup(X) < \sigma$: sub-frequent

• $\sup(X) < \sigma_0$: infrequent

• Significant patterns: the frequent and sub-frequent ones
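The three categories can be sketched as a simple threshold test. The sketch assumes two thresholds, a minimum support `sigma` and a lower significant support `sigma0`, with inclusive lower boundaries; the parameter and function names are hypothetical.

```python
def classify(support: float, sigma: float, sigma0: float) -> str:
    """Label a pattern by its support relative to sigma0 < sigma."""
    if not 0 <= sigma0 < sigma:
        raise ValueError("expected 0 <= sigma0 < sigma")
    if support >= sigma:
        return "frequent"
    if support >= sigma0:
        return "sub-frequent"
    return "infrequent"


def is_significant(support: float, sigma0: float) -> bool:
    """Frequent and sub-frequent patterns are the ones worth tracking."""
    return support >= sigma0
```

Keeping sub-frequent patterns around is what would let a pattern that is only locally weak still be counted toward a global result.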


(Cont.)

• The required number of sampling transactions is at least $n_0 \ge \frac{1}{2\epsilon^2} \ln\frac{2}{\delta}$

• If $\epsilon = 0.001$ and $\delta = 0.01$, then $n_0 \approx 2600000$ is too huge

• We assume that each itemset appears in more than 0.01% of the transactions; then if $n_0 \ge 10000$, every itemset will be chosen

• $n_0 = \max\left\{\frac{1}{\sigma_0};\ \frac{1}{2\epsilon^2} \ln\frac{2}{\delta}\right\}$
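The arithmetic on this slide can be checked with a short sketch. It assumes the sample size is the larger of 1/sigma0 and the Hoeffding term ln(2/delta) / (2 * epsilon^2), with `epsilon` the support error, `delta` the failure probability, and `sigma0` the significant support; these names are assumptions for illustration.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest integer n0 with 2 * exp(-2 * n0 * epsilon**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def batch_size(sigma0: float, epsilon: float, delta: float) -> int:
    """Sample size: at least 1/sigma0 and at least the Hoeffding requirement."""
    return max(math.ceil(1.0 / sigma0), hoeffding_sample_size(epsilon, delta))
```

With epsilon = 0.001 and delta = 0.01 the Hoeffding term is roughly 2.6 million transactions, while a support floor of 0.01% alone asks for 10000.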


Performance Results

• Accuracy Measurements

• Adaptability

• Recall: # of true freq. patterns found / # of actual true freq. patterns

• Precision: # of true freq. patterns found / # of total freq. patterns found

• Synthetic: T5I3D1000K, T8I4D1000K with 10000 unique items

• Real-life: “BMS-POS” T6.5 D515597 with 1657 distinct items

• Fix , select


00.01, 0.01 25n K 04, 0.1

0 ;250.1

n K
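The two accuracy measures can be sketched over sets of itemsets as below. The function name and the convention of returning 1.0 for an empty denominator are assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple

Itemset = FrozenSet[str]


def recall_precision(found: Set[Itemset], actual: Set[Itemset]) -> Tuple[float, float]:
    """Recall = hits / |actual|, precision = hits / |found|."""
    hits = len(found & actual)  # true frequent patterns that were reported
    recall = hits / len(actual) if actual else 1.0
    precision = hits / len(found) if found else 1.0
    return recall, precision


# Hypothetical result sets for illustration only.
actual = {frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("bc")}
found = {frozenset("a"), frozenset("ab"), frozenset("c")}
```

In this toy case two of the four true patterns are found (recall 0.5), and two of the three reported patterns are correct (precision 2/3).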


Conclusion

• This work addresses the problem of finding frequent patterns from data streams where the mining system may not keep up with the arrival rate of the stream
