Efficient Data Stream Classification via Probabilistic Adaptive Windows


Albert Bifet¹, Jesse Read², Bernhard Pfahringer³, Geoff Holmes³

¹ Yahoo! Research Barcelona
² Universidad Carlos III, Madrid, Spain
³ University of Waikato, Hamilton, New Zealand

SAC 2013, 19 March 2013

Data Streams

Big Data & Real Time

Data Streams

- Sequence is potentially infinite
- High amount of data: sublinear space
- High speed of arrival: sublinear time per example
- Once an element from a data stream has been processed, it is discarded or archived


Data Streams

Approximation algorithms

- Small error rate with high probability
- An algorithm (ε, δ)-approximates F if it outputs F̃ for which Pr[|F̃ − F| > εF] < δ.
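As a rough illustration of the definition above, the sketch below measures how often a set of estimates falls outside the relative-error band εF. The function name `empirical_failure_rate` and the sample numbers are assumptions made for this example; they are not from the talk.

```python
def empirical_failure_rate(estimates, true_value, eps):
    """Fraction of estimates whose relative error exceeds eps.

    An (eps, delta)-approximation requires Pr[|F~ - F| > eps * F] < delta;
    this function measures that probability empirically over repeated runs.
    """
    misses = sum(1 for est in estimates if abs(est - true_value) > eps * true_value)
    return misses / len(estimates)


# Hypothetical outputs of five independent runs of some estimator of F = 100.
runs = [98.0, 103.0, 111.0, 100.5, 96.0]
rate = empirical_failure_rate(runs, true_value=100.0, eps=0.05)
print(rate)  # only the 111.0 run is off by more than 5%
```

If the measured rate stays below δ over many runs, the estimator behaves like an (ε, δ)-approximation on that input; this is an empirical check, not a proof.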


Data Stream Sliding Window

Sampling algorithms

- Giving equal weight to old and new examples: RESERVOIR SAMPLING
- Giving more weight to recent examples: PROBABILISTIC APPROXIMATE WINDOW
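For reference, reservoir sampling can be sketched as below. This is the standard single-pass formulation (often called Algorithm R), not code from the talk; the function name and the `rng` parameter are choices made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item n replaces a random slot with probability k / n,
            # so every item seen so far is equally likely to be kept.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(10_000), k=10, rng=random.Random(42))
print(sample)
```

Old and new items end up in the sample with the same probability, which is why this scheme gives them equal weight.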


8 Bits Counter

1 0 1 0 1 0 1 0

What is the largest number we can store in 8 bits?



8 Bits Counter

[Plot: f(x) = log(1 + x) / log(2) for x from 0 to 100, with f(0) = 0 and f(1) = 1]

8 Bits Counter

[Plot: f(x) = log(1 + x) / log(2) for x from 0 to 10, with f(0) = 0 and f(1) = 1]

8 Bits Counter

[Plot: f(x) = log(1 + x/30) / log(1 + 1/30) for x from 0 to 10, with f(0) = 0 and f(1) = 1]

8 Bits Counter

[Plot: f(x) = log(1 + x/30) / log(1 + 1/30) for x from 0 to 100, with f(0) = 0 and f(1) = 1]
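One reading of these plots is that a counter can store the compressed value f(n) instead of n itself. The sketch below evaluates both plotted functions and their inverse; the names `compress` and `expand` and the parameter `a` (1 for the first pair of plots, 30 for the second) are labels introduced for this example.

```python
import math


def compress(x, a):
    """f(x) = log(1 + x/a) / log(1 + 1/a); a = 1 gives log(1 + x) / log(2)."""
    return math.log(1 + x / a) / math.log(1 + 1 / a)


def expand(c, a):
    """Inverse of compress: the count x whose compressed value is c."""
    return a * ((1 + 1 / a) ** c - 1)


for a in (1, 30):
    # f(0) = 0 and f(1) = 1 for both curves, as on the slides.
    print(a, compress(0, a), compress(1, a), expand(255, a))
```

With a = 1 the largest 8-bit value, 255, stands for 2^255 − 1, while a = 30 trades range for resolution and reaches a count on the order of 10^5.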

8 bits Counter

MORRIS APPROXIMATE COUNTING ALGORITHM

Init counter c ← 0
for every event in the stream
    do rand = random number between 0 and 1
       if rand < p
           then c ← c + 1


8 bits Counter

With $p = 1/2$ we can store $2 \times 256$, with standard deviation $\sigma = \sqrt{n}/2$

With $p = 2^{-c}$, $E[2^c] = n + 2$ with variance $\sigma^2 = n(n + 1)/2$

If $p = b^{-c}$, then $E[b^c] = n(b - 1) + b$ and $\sigma^2 = (b - 1)\,n(n + 1)/2$
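A minimal sketch of the counter with p = b^(−c) follows. The class and function names are assumptions made for this example. The sketch starts the counter at c = 0, as in the pseudocode, in which case E[b^c] = n(b − 1) + 1 and the estimate below is unbiased; the slide's expectation n(b − 1) + b corresponds to starting at c = 1.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only the small exponent c."""

    def __init__(self, base=2.0, rng=None):
        self.base = base
        self.c = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Increase c with probability p = base ** (-c).
        if self.rng.random() < self.base ** (-self.c):
            self.c += 1

    def estimate(self):
        # With c starting at 0, E[base ** c] = n * (base - 1) + 1,
        # so this expression is an unbiased estimate of n.
        return (self.base ** self.c - 1) / (self.base - 1)


def average_estimate(n, base, runs, seed=0):
    """Average the estimate over several independent counters."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        counter = MorrisCounter(base, rng)
        for _ in range(n):
            counter.increment()
        total += counter.estimate()
    return total / runs


print(average_estimate(n=1000, base=1.1, runs=200))
```

A base closer to 1 lowers the variance of the estimate but makes c grow faster, so fewer events fit in 8 bits.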

PROBABILISTIC APPROXIMATE WINDOW

Init window w ← ∅
for every instance i in the stream
    do store the new instance i in window w
       for every instance j in the window
           do rand = random number between 0 and 1
              if rand > b^{-1}
                  then remove instance j from window w

PAW maintains a sample of instances in logarithmic memory, giving greater weight to newer instances
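The pseudocode above can be sketched literally as follows. The function name `paw_window` and the value b = 1.05 are assumptions made for this example; the slides do not say how b is chosen.

```python
import random


def paw_window(stream, b, rng=None):
    """Literal transcription of the PAW pseudocode for a finite stream."""
    if rng is None:
        rng = random.Random()
    keep_threshold = 1.0 / b  # b ** -1
    window = []
    for instance in stream:
        window.append(instance)
        # Every stored instance (including the one just added, as in the
        # pseudocode) is removed when rand > b ** -1.
        window = [j for j in window if not rng.random() > keep_threshold]
    return window


window = paw_window(range(10_000), b=1.05, rng=random.Random(7))
print(len(window), window)
```

Because an instance has to survive one random draw per arrival, older instances are progressively less likely to remain, so the surviving sample is skewed toward recent data.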

Experiments: Methods

Abbr.    Classifier                     Parameters
NB       Naive Bayes
HT       Hoeffding Tree
HTLB     Leveraging Bagging with HT     n = 10
kNN      k Nearest Neighbour            w = 1000, k = 10
kNNW     kNN with PAW                   w = 1000, k = 10
kNNWA    kNN with PAW+ADWIN             w = 1000, k = 10
kNNLBW   Leveraging Bagging with kNNW   n = 10

The methods we consider. Leveraging Bagging methods use n models. kNNWA empties its window (of max w) when drift is detected (using the ADWIN drift detector).
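To make the roles of w and k concrete, here is a minimal sketch of the plain kNN baseline over a fixed-size sliding window. It covers neither PAW nor ADWIN; the class name `WindowKNN`, the Euclidean distance, and the majority vote are assumptions made for this example.

```python
import math
from collections import Counter, deque


class WindowKNN:
    """k nearest neighbours over the w most recent labelled instances."""

    def __init__(self, w=1000, k=10):
        self.k = k
        self.window = deque(maxlen=w)  # the oldest instance drops out when full

    def learn(self, x, y):
        self.window.append((tuple(x), y))

    def predict(self, x):
        if not self.window:
            return None
        # Sort the stored instances by Euclidean distance to x, keep the k closest.
        nearest = sorted(self.window, key=lambda item: math.dist(item[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


clf = WindowKNN(w=4, k=3)
data = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((1.0, 1.0), "b"),
        ((0.9, 1.0), "b"), ((1.0, 0.9), "b")]
for x, y in data:
    clf.learn(x, y)
print(clf.predict((0.95, 0.95)))  # the three nearest stored points are all "b"
```

Each prediction scans the whole window, so time and memory grow with w.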

Experimental Evaluation

Table: The window size for kNN and corresponding performance.

Accuracy (%)    -w 100   -w 500   -w 1000   -w 5000
Real Avg.        77.88    77.78     79.59     78.23
Synth. Avg.      57.99    81.93     84.74     86.03
Overall Avg.     62.53    80.28     82.59     83.11

Results



Time (seconds)   -w 100   -w 500   -w 1000   -w 5000
Real Tot.           297      998      1754      7900
Synth. Tot.         371     1297      2313     10671
Overall Tot.        668     2295      4067     18570




RAM Hours      -w 100   -w 500   -w 1000   -w 5000
Real Tot.       0.007    0.082     0.269     5.884
Synth. Tot.     0.002    0.026     0.088     1.988
Overall Tot.    0.009    0.108     0.357     7.872



Table: Summary of Efficiency: Accuracy and RAM-Hours.

            NB      HT      HTLB     kNN     kNNW    kNNWA   kNNLBW
Accuracy    56.19   73.95   83.75    82.59   82.92   83.19   84.67
RAM-Hrs     0.02    1.57    300.02   0.36    8.08    8.80    250.98


Conclusions

Sampling algorithms for kNN

- Giving equal weight to old and new examples: RESERVOIR SAMPLING
- Giving more weight to recent examples: PROBABILISTIC APPROXIMATE WINDOW


Thanks!
