Improving Adaptive Bagging Methods for Evolving Data Streams
A. Bifet, G. Holmes, B. Pfahringer, and R. Gavaldà
University of Waikato, Hamilton, New Zealand
Laboratory for Relational Algorithmics, Complexity and Learning (LARCA), UPC-Barcelona Tech, Catalonia
Nanjing, 4 November 2009
1st Asian Conference on Machine Learning (ACML’09)
Motivation
MOA: software for mining data streams
Build useful mining software for massive data sets

Bagging for data streams
Improve the accuracy of classification methods for data streams
2 / 26
Data stream classification cycle
1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
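The four requirements above can be sketched as a prequential (test-then-train) loop. `MajorityClassClassifier` is a hypothetical stand-in for a real stream learner such as a Hoeffding Tree; none of these names are MOA API.

```python
class MajorityClassClassifier:
    """Minimal stream learner: predicts the majority class seen so far.
    A toy stand-in for a real learner such as a Hoeffding Tree."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        # Requirement 4: ready to predict at any point.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def train(self, x, y):
        # Requirements 1-3: each example is seen once, in O(1) time and memory.
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential(stream, learner):
    """Test-then-train: each example is first used to test, then to train."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        total += 1
        learner.train(x, y)
    return correct / total


# Toy stream of (features, label) pairs.
acc = prequential([((0,), 'a'), ((1,), 'a'), ((2,), 'b'), ((3,), 'a')],
                  MajorityClassClassifier())
# acc == 0.5: the first and third predictions are wrong.
```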
3 / 26
Realtime analytics: from Databases to Dataflows
Data streams
Data streams are ordered datasets
Not all datasets are data streams
Any dataset may be processed incrementally as a data stream
MOA: Massive Online Analysis
Faster mining software using fewer resources
Instant mining: more for less
4 / 26
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.
It is closely related to WEKA.
It includes a collection of offline and online learning methods as well as tools for evaluation:
boosting and bagging
Hoeffding Trees
with and without Naïve Bayes classifiers at the leaves.
5 / 26
WEKA
Waikato Environment for Knowledge Analysis
Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java
Released under the GPL
Support for the whole process of experimental data mining:
Preparation of input data
Statistical evaluation of learning schemes
Visualization of input data and the result of learning
Used for education, research and applications
Complements “Data Mining” by Witten & Frank
6 / 26
WEKA: the bird
7 / 26
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
8 / 26
Concept Drift Framework
[Figure: sigmoid function f(t) rising from 0 to 1, with value 0.5 at time t0, slope angle α, and transition window of width W]

Definition
Given two data streams a, b, we define c = a ⊕^W_t0 b as the data stream built joining the two data streams a and b, where b gradually replaces a over a transition window of width W centred at time t0.
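A minimal sketch of the ⊕ operator. The sigmoid form of f(t), the probability of drawing the next element from b, is taken from the KDD'09 framework these slides summarize; `join_streams` and its parameters are illustrative names, not the paper's code.

```python
import math
import random


def join_streams(a, b, t0, W, seed=0):
    """Yield c = a (+)^W_t0 b: at time t, emit the next element of b with
    probability f(t) = 1 / (1 + e^(-4(t - t0)/W)), a sigmoid centred at t0
    with transition width W; otherwise emit the next element of a.
    Stops when either source stream is exhausted."""
    rng = random.Random(seed)
    a, b = iter(a), iter(b)
    t = 0
    while True:
        t += 1
        p = 1.0 / (1.0 + math.exp(-4.0 * (t - t0) / W))
        src = b if rng.random() < p else a
        try:
            yield next(src)
        except StopIteration:
            return


# Early elements come almost entirely from a, late elements from b.
c = list(join_streams(['a'] * 1000, ['b'] * 1000, t0=500, W=100))
```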
New Ensemble Methods For Evolving Streams (KDD’09)
a new experimental data stream framework for studying concept drift
two new variants of Bagging:
ADWIN Bagging
Adaptive-Size Hoeffding Tree (ASHT) Bagging
an evaluation study on synthetic and real-world datasets
10 / 26
Outline
1 Adaptive-Size Hoeffding Tree bagging
2 ADWIN Bagging
3 Empirical evaluation
11 / 26
Adaptive-Size Hoeffding Tree
T1 T2 T3 T4
Ensemble of trees of different sizes
each tree has a maximum size
after a node splits, if the size of the tree is higher than the maximum value, it deletes some nodes to reduce its size
12 / 26
Adaptive-Size Hoeffding Tree
T1 T2 T3 T4
Ensemble of trees of different sizes
smaller trees adapt more quickly to changes
larger trees do better during periods with little change
→ diversity
Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging (right) on dataset RandomRBF with drift, plotting 90 pairs of classifiers.
13 / 26
Improvement for ASHT Bagging Method
Improvement for the ASHT Bagging ensemble method
Bagging using trees of different sizes

add a change detector for each tree in the ensemble:
DDM (Gama et al.)
EDDM (Baena, del Campo, Fidalgo et al.)
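A minimal sketch of the DDM idea (Gama et al.): monitor the classifier's online error rate p and its standard deviation s, and compare p + s against the best (lowest) value seen so far. The 2σ warning / 3σ drift thresholds and the 30-example warm-up follow the usual formulation, simplified here; this is not MOA's implementation.

```python
import math


class DDM:
    """Drift Detection Method (sketch): monitors a classifier's error
    rate; warns at the 2-sigma level, signals drift at 3 sigma."""
    def __init__(self):
        self.n = 0
        self.p = 0.0                 # running error rate
        self.p_min = float('inf')    # error rate at the historical minimum
        self.s_min = float('inf')    # its standard deviation

    def update(self, error):
        """error is 1 if the classifier misclassified the example, else 0.
        Returns 'warning', 'drift', or None."""
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < 30:
            return None                              # warm-up period
        if self.p + s < self.p_min + self.s_min:     # new minimum
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + 3 * self.s_min:
            return 'drift'
        if self.p + s >= self.p_min + 2 * self.s_min:
            return 'warning'
        return None


# Stable 10% error rate, then an abrupt change to 100% errors.
ddm = DDM()
signals = []
for i in range(200):
    signals.append(ddm.update(1 if i % 10 == 0 else 0))
for _ in range(50):
    signals.append(ddm.update(1))
```

After the change, the rising error rate crosses the warning threshold first and the drift threshold a few examples later.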
14 / 26
Outline
1 Adaptive-Size Hoeffding Tree bagging
2 ADWIN Bagging
3 Empirical evaluation
15 / 26
ADWIN Bagging
ADWIN
An adaptive sliding window whose size is recomputed online according to the rate of change observed.
ADWIN has rigorous guarantees (theorems)
on the ratio of false positives and negatives
on the relation between the size of the current window and change rates

ADWIN Bagging
When a change is detected, the worst classifier is removed and a new classifier is added.
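The replacement policy can be sketched independently of the detector. In the paper each ensemble member is monitored by an ADWIN instance; here `errors` stands in for those per-member error estimates, and all names are illustrative.

```python
def on_change(ensemble, errors, make_classifier):
    """ADWIN Bagging replacement policy (sketch): when a change is
    detected, drop the member with the highest error estimate and add a
    fresh, untrained one.  errors[i] is the current error estimate of
    ensemble[i]; make_classifier builds a new member."""
    worst = max(range(len(ensemble)), key=lambda i: errors[i])
    ensemble[worst] = make_classifier()
    errors[worst] = 0.0              # the new member starts with no errors
    return worst


# Usage: member 2 has the worst error estimate and is replaced.
ensemble = ['c0', 'c1', 'c2', 'c3']
errors = [0.10, 0.15, 0.40, 0.20]
replaced = on_change(ensemble, errors, lambda: 'fresh')
# replaced == 2; ensemble == ['c0', 'c1', 'fresh', 'c3']
```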
16 / 26
Optimal Change Detector and PredictorADWIN
High accuracy
Fast detection of change
Low ratios of false positives and false negatives
Low computational cost: minimum space and time needed
Theoretical guarantees
No parameters needed
Estimator with memory and change detector
17 / 26
Algorithm ADaptive Sliding WINdow (ADWIN)

ADWIN: Adaptive Windowing Algorithm
1 Initialize window W
2 for each t > 0
3   do W ← W ∪ {xt} (i.e., add xt to the head of W)
4     repeat drop elements from the tail of W
5     until |µ̂W0 − µ̂W1| < εc holds
6       for every split of W into W = W0 · W1
7   Output µ̂W

Example (the slides animate the algorithm trying every split of W):
W = 101010110111111
W0 = 1          W1 = 01010110111111
W0 = 10         W1 = 1010110111111
W0 = 101        W1 = 010110111111
…
W0 = 101010110  W1 = 111111
When some split satisfies |µ̂W0 − µ̂W1| ≥ εc, elements are dropped from the tail of W:
W = 01010110111111

18 / 26
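A direct, naive O(|W|²)-per-step rendering of the pseudocode above, with index 0 as the tail (oldest element). The εc expression below is a Hoeffding-style bound assumed for illustration; the paper's version uses its exact cut threshold and an exponential-histogram window to reach O(log W) time and memory.

```python
import math


def adwin_step(window, x, delta=0.01):
    """One step of the ADWIN pseudocode (naive version): add x to the
    head of the window, then drop elements from the tail while some
    split W = W0 . W1 has |mean(W0) - mean(W1)| >= eps_cut."""
    window.append(x)

    def eps_cut(n0, n1):
        # Hoeffding-style bound with confidence delta; an assumption
        # standing in for the paper's exact epsilon_cut expression.
        m = 1.0 / (1.0 / n0 + 1.0 / n1)   # harmonic mean of the sizes
        return math.sqrt(math.log(4.0 * len(window) / delta) / (2.0 * m))

    while True:
        n = len(window)
        total = sum(window)
        head = 0.0
        violated = False
        for i in range(1, n):             # W0 = window[:i], W1 = window[i:]
            head += window[i - 1]
            mu0 = head / i
            mu1 = (total - head) / (n - i)
            if abs(mu0 - mu1) >= eps_cut(i, n - i):
                violated = True
                break
        if not violated:
            return window
        del window[0]                     # drop one element from the tail


# A stream of 100 zeros followed by 100 ones: after the change the old
# zeros are dropped and the window tracks the new mean.
w = []
for _ in range(100):
    adwin_step(w, 0)
for _ in range(100):
    adwin_step(w, 1)
```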
Algorithm ADaptive Sliding WINdow (ADWIN)
Theorem
At every time step we have:

1 (False positive rate bound). If µt remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.
2 (False negative rate bound). Suppose that for some partition of W into two parts W0 · W1 (where W1 contains the most recent items) we have |µW0 − µW1| > 2εc. Then with probability 1 − δ, ADWIN shrinks W to W1, or shorter.

ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.
19 / 26
Algorithm ADaptive Sliding WINdow (ADWIN)

ADWIN, using a data stream sliding window model:
can provide the exact counts of 1’s in O(1) time per point
tries O(log W) cutpoints
uses O((1/ε) log W) memory words
the processing time per example is O(log W) (amortized and worst-case)