Efficient Data Stream Classification via Probabilistic Adaptive Windows
Albert Bifet (1), Jesse Read (2), Bernhard Pfahringer (3), Geoff Holmes (3)
(1) Yahoo! Research Barcelona
(2) Universidad Carlos III, Madrid, Spain
(3) University of Waikato, Hamilton, New Zealand
SAC 2013, 19 March 2013
Data Streams
Big Data & Real Time
Data Streams
- Sequence is potentially infinite
- High amount of data: sublinear space
- High speed of arrival: sublinear time per example
- Once an element from a data stream has been processed it is discarded or archived
Data Streams
Approximation algorithms
- Small error rate with high probability
- An algorithm (ε, δ)-approximates F if it outputs F̃ for which Pr[|F̃ − F| > εF] < δ.
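The (ε, δ) condition can be checked empirically on repeated runs of a randomized estimator. The sketch below is only an illustration: the function name `empirical_failure_rate` and its arguments are hypothetical and do not come from the slides.

```python
def empirical_failure_rate(estimates, true_value, epsilon):
    """Fraction of estimates whose error exceeds epsilon * true_value.

    `estimates` is a non-empty list of outputs from independent runs
    of some approximation algorithm for the quantity `true_value`.
    """
    failures = sum(
        1 for est in estimates
        if abs(est - true_value) > epsilon * true_value
    )
    return failures / len(estimates)
```

If the returned fraction stays below δ over many runs, the observations are consistent with an (ε, δ)-approximation, although such a check is evidence and not a proof.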
Data Stream Sliding Window
Sampling algorithms
- Giving equal weight to old and new examples: RESERVOIR SAMPLING
- Giving more weight to recent examples: PROBABILISTIC APPROXIMATE WINDOW
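For reference, reservoir sampling can be sketched as follows. This is the textbook variant that keeps a uniform sample of fixed size k; the slides do not say which variant is meant, and the function name and the `rng` parameter are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items drawn uniformly at random from an iterable.

    The length of `stream` does not need to be known in advance.
    If the stream has fewer than k items, all of them are returned.
    """
    rng = rng or random.Random()
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number t+1 replaces a random slot with probability k/(t+1).
            j = rng.randint(0, t)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Every example seen so far has the same probability of being in the reservoir, which is why this scheme gives equal weight to old and new examples.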
8 Bits Counter
1 0 1 0 1 0 1 0
What is the largest number we can store in 8 bits?
8 Bits Counter
[Plot: f(x) = log(1 + x) / log(2), shown for x from 0 to 100 and, zoomed in, for x from 0 to 10]
f(0) = 0, f(1) = 1
[Plot: f(x) = log(1 + x/30) / log(1 + 1/30), shown for x from 0 to 10 and for x from 0 to 100]
f(0) = 0, f(1) = 1
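Both plotted functions belong to one family, log(1 + x/a) / log(1 + 1/a), with a = 1 and a = 30. The sketch below evaluates the family and its inverse; the parameter name `a` and the function names are hypothetical. Assuming the 8-bit register stores the value f(x), the inverse at 255 gives the largest count that such a register could represent.

```python
import math


def f(x, a=1.0):
    """Logarithmic compression of a count x; f(0) = 0 and f(1) = 1 for any a > 0."""
    return math.log(1.0 + x / a) / math.log(1.0 + 1.0 / a)


def f_inverse(y, a=1.0):
    """Count x whose compressed value is y, i.e. f(f_inverse(y, a), a) == y."""
    return a * ((1.0 + 1.0 / a) ** y - 1.0)


# Largest count representable if an 8-bit register holds f(x) (assumption).
largest_a1 = f_inverse(255, a=1.0)
largest_a30 = f_inverse(255, a=30.0)
```

A larger a flattens the curve less near zero, so small counts are stored more precisely at the price of a smaller maximum.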
8 bits Counter
MORRIS APPROXIMATE COUNTING ALGORITHM
Init counter c ← 0
for every event in the stream
    do rand = random number between 0 and 1
       if rand < p
          then c ← c + 1
What is the largest number we can store in 8 bits?
With p = 1/2 we can store 2 × 256 with standard deviation σ = √n/2
With p = 2^(−c) then E[2^c] = n + 2 with variance σ² = n(n + 1)/2
If p = b^(−c) then E[b^c] = n(b − 1) + b, σ² = (b − 1) n(n + 1)/2
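A minimal sketch of the counter with p = b^(−c) follows. The class and method names are hypothetical. The sketch starts from c = 0, as in the pseudocode, in which case E[b^c] = n(b − 1) + 1, so the estimator used here is (b^c − 1)/(b − 1). The expectation n(b − 1) + b quoted on the slide corresponds to a counter that starts at c = 1.

```python
import random


class MorrisCounter:
    """Approximate event counter that stores only a small exponent c."""

    def __init__(self, base=2.0, rng=None):
        if base <= 1.0:
            raise ValueError("base must be greater than 1")
        self.base = base
        self.c = 0  # stored exponent, starts at 0 as in the pseudocode
        self.rng = rng or random.Random()

    def add(self):
        """Register one event: increment c with probability base**(-c)."""
        if self.rng.random() < self.base ** (-self.c):
            self.c += 1

    def estimate(self):
        """Unbiased estimate of the number of events when c started at 0."""
        return (self.base ** self.c - 1.0) / (self.base - 1.0)
```

A base closer to 1 lowers the variance of the estimate but makes c grow faster, so fewer events fit in 8 bits.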
PROBABILISTIC APPROXIMATE WINDOW
Init window w ← ∅
for every instance i in the stream
    do store the new instance i in window w
       for every instance j in the window
           do rand = random number between 0 and 1
              if rand > b^(−1)
                 then remove instance j from window w
PAW maintains a sample of instances in logarithmic memory, giving greater weight to newer instances
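The pseudocode above can be sketched in Python as follows. This is a literal reading, in which the newly stored instance also takes part in the removal pass. The class name, the `rng` parameter and the value of b in the usage example are assumptions, since the slides give no value for b.

```python
import random


class ProbabilisticApproximateWindow:
    """Sample of a stream in which older instances are less likely to remain."""

    def __init__(self, b, rng=None):
        if b <= 1.0:
            raise ValueError("b must be greater than 1")
        self.keep_prob = 1.0 / b  # an instance survives one pass with probability b^(-1)
        self.window = []
        self.rng = rng or random.Random()

    def add(self, instance):
        """Store a new instance, then remove each stored instance if rand > b^(-1)."""
        self.window.append(instance)
        self.window = [
            j for j in self.window
            if self.rng.random() <= self.keep_prob
        ]


# Usage with an assumed b = 1.01: feed 5000 integers through the window.
paw = ProbabilisticApproximateWindow(b=1.01, rng=random.Random(0))
for i in range(5000):
    paw.add(i)
```

Under this reading an instance survives k passes with probability b^(−k), so the survivors are concentrated among the most recent arrivals.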
Experiments: Methods
Abbr.    Classifier                      Parameters
NB       Naive Bayes
HT       Hoeffding Tree
HTLB     Leveraging Bagging with HT      n = 10
kNN      k Nearest Neighbour             w = 1000, k = 10
kNNW     kNN with PAW                    w = 1000, k = 10
kNNWA    kNN with PAW+ADWIN              w = 1000, k = 10
kNNLBW   Leveraging Bagging with kNNW    n = 10

The methods we consider. Leveraging Bagging methods use n models. kNNWA empties its window (of max w) when drift is detected (using the ADWIN drift detector).
Experimental Evaluation
Table: The window size for kNN and corresponding performance.