New ensemble methods for evolving data streams A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà Laboratory for Relational Algorithmics, Complexity and Learning LARCA UPC-Barcelona Tech, Catalonia University of Waikato Hamilton, New Zealand Paris, 29 June 2009 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2009
Talk about MOA, its extension to evolving data streams, bagging using ADWIN, and bagging using trees of different sizes.
Transcript
New ensemble methods for evolving data streams
A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà
Laboratory for Relational Algorithmics, Complexity and Learning LARCA
UPC-Barcelona Tech, Catalonia
University of Waikato
Hamilton, New Zealand
Paris, 29 June 2009
15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2009
New Ensemble Methods For Evolving Data Streams
Outline
a new experimental data stream framework for studying concept drift
two new variants of Bagging:
  ADWIN Bagging
  Adaptive-Size Hoeffding Tree (ASHT) Bagging
an evaluation study on synthetic and real-world datasets
Outline
1 MOA: Massive Online Analysis
2 Concept Drift Framework
3 New Ensemble Methods
4 Empirical evaluation
What is MOA?
Massive Online Analysis (MOA) is a framework for online learning from data streams.
It is closely related to WEKA
It includes a collection of offline and online methods as well as tools for evaluation:
  boosting and bagging
  Hoeffding Trees with and without Naïve Bayes classifiers at the leaves
WEKA
Waikato Environment for Knowledge Analysis
Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java
  Released under the GPL
Support for the whole process of experimental data mining
  Preparation of input data
  Statistical evaluation of learning schemes
  Visualization of input data and the result of learning
Used for education, research and applications
Complements “Data Mining” by Witten & Frank
WEKA: the bird
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
Data stream classification cycle
1 Process an example at a time, and inspect it only once (at most)