
Abstract

In this paper, we present the problem of online structured output prediction. We describe the online data stream mining approach, how it differs from the batch mining approach, and outline the difficulties of online mining. An orthogonal aspect of data mining is whether the target variable we are trying to predict is simple or structured. In the case of structured target variables we talk about the structured output prediction problem, which is commonly solved using predictive clustering trees. We describe several issues that arise in online structured output prediction, i.e., evaluation, change detection and resource complexity. In this paper, we present and focus on the structured output prediction tasks of multi-label classification, multi-target regression and hierarchical multi-label classification. We provide an overview of the current research in the areas of batch and online methods for multi-label classification, multi-target regression and hierarchical multi-label classification, as well as some of the evaluation metrics used in these cases. We conclude the paper with a discussion of directions for further work in the area of improving existing multi-target regression methods and how those can be applied to the tasks of multi-target classification and multi-label classification, as well as adapting current batch hierarchical multi-label classification methods for the online setting. We plan to implement these algorithms in several data stream mining frameworks, and apply them to the area of discrete time modeling of dynamical systems.

Keywords: data stream mining, multi-label classification, multi-target regression, hierarchical multi-label classification


Contents

1 Introduction
  1.1 Learning from Data Streams
  1.2 Structured Output Prediction with Predictive Clustering Trees
  1.3 Issues with Structured Output Prediction on Data Streams

2 Problem Statement

3 Related Work
  3.1 Multi-label Classification
  3.2 Multi-target Regression
  3.3 Hierarchical Multi-label Classification

4 Directions for Further Work

5 Conclusion


1 Introduction

Nowadays, data is generated at ever increasing rates from sensor networks, measurements in network monitoring, manufacturing processes, call-detail records, email, social networks, and other sources. More and more of this data is also becoming structured, e.g., into vectors, hierarchies, networks, etc. Naturally, real-time analysis of such data is becoming a key area of data mining research, as applications demanding such processing grow in number.

1.1 Learning from Data Streams

One of the aspects of data mining is how the data arrives and is processed. The main distinction here is between batch mining, where the entire dataset for the construction of the predictive model is available from the beginning, and online (or stream) mining, where the data, consisting of instances, arrives sequentially.

In the batch approach, we are given training data in the form of instances with the corresponding values of the target variable(s) – discrete in the case of classification and continuous in the case of regression. The goal is to use these instances to infer a mapping from instances to the values of the target(s), i.e., to produce a predictive model. The algorithm tasked with building the model is not (overly) constrained in time and memory, since the model need only be built once. Once the model is constructed, it can be used for predicting the target values of new, potentially unlabeled, instances. These types of models are typically not adaptive, i.e., after they are constructed from the training data, they do not change when processing additional instances.

However, due to the ever-increasing amount of data which arrives fast, can potentially change and needs to be processed quickly, the online approach is becoming more and more important. In this case, instances arrive sequentially, which indicates an inherent time component, and generally the value of the target variable(s) becomes available soon after. Because of this fast pace of arrival, the algorithm must be able to process instances quickly.

Since there is no specific training data, every instance that arrives can be used to update the model once the value of the target variable(s) becomes known, with the goal of improving its accuracy (according to some quality measure). Such sequences of instances, called data streams, are often found in, e.g., social networks, sensor networks or dynamical systems. It is clear that the dynamics in a social network can change over time, which highlights another important aspect of data streams, i.e., that they are not necessarily stationary. Their distribution can change in terms of either the distribution in the descriptive space or the functional dependencies that determine the values of the target variables. This is called concept drift.

Additionally, a data stream is, in principle, unbounded. While a truly infinite stream is impossible in practical applications, a stream can still be arbitrarily large. This forces learning algorithms to use the available memory prudently when updating the model. Instances cannot be kept in memory indefinitely, since the amount of memory is limited. This means that instances are generally processed once, or stored only for a short time, and are then discarded.

The evaluation of online algorithms is also complicated. There are two main evaluation procedures used for evaluating the predictive performance of an online method: holdout evaluation and prequential evaluation. To perform holdout evaluation, we periodically evaluate the method using a holdout set of the most recent instances. After the performance has been evaluated on this set, the model is updated using the held-back instances. This way we get a performance evaluation for each holdout set considered. This is a natural extension of the batch training set and testing set methodology.

In the case of prequential evaluation, each arriving instance is first used to evaluate the performance of the method and is immediately thereafter used to update the model. This allows us to evaluate the performance of a method on a per-instance basis. Since the model is expected to become more accurate as it processes more and more instances, we use incremental fading performance measures, which emphasize the performance of the method on the most recent instances.
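To make the prequential procedure concrete, here is a minimal sketch of an evaluate-then-train loop with an exponentially fading loss estimate. The model interface (predict/update), the loss function and the fading-factor value are illustrative assumptions, not part of any specific framework.

```python
def prequential_evaluation(stream, model, loss, alpha=0.99):
    """Evaluate-then-train loop with an exponentially fading loss estimate.

    stream: iterable of (x, y) pairs arriving in order
    model:  object with predict(x) and update(x, y) methods (assumed interface)
    loss:   function loss(y_true, y_pred) -> float
    alpha:  fading factor close to 1; smaller values forget old errors faster
    """
    faded_loss, faded_count = 0.0, 0.0
    estimates = []
    for x, y in stream:
        y_hat = model.predict(x)                    # 1) first evaluate on the new instance
        faded_loss = alpha * faded_loss + loss(y, y_hat)
        faded_count = alpha * faded_count + 1.0
        estimates.append(faded_loss / faded_count)  # current fading loss estimate
        model.update(x, y)                          # 2) then use the instance for training
    return estimates
```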

1.2 Structured Output Prediction with Predictive Clustering Trees

An orthogonal aspect of data mining is how the target variable is composed. If the target variable is a single discrete or continuous value, we talk about the single-target prediction problem. However, in many cases we need to predict a more complex type of target variable. Some examples of structured classification problems are listed below.

• Multi-target classification refers to the problem of predicting single-label values of several, possibly multi-class, discrete variables.

• In the case of multi-label classification, we are tasked with predicting the possibly multiple labels of a discrete target variable from the input attributes of a data instance. We can see this problem as predicting a yes-or-no value for each of the possible labels of the target variable.

• In the related hierarchical multi-label classification problem, we are tasked with classifying the instance into one or more hierarchically arranged labels, where a label lower in the hierarchy automatically implies all of its predecessors. This is known as the hierarchy constraint (a small sketch of enforcing it follows this list).
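As a concrete illustration, the sketch below enforces the hierarchy constraint on a predicted label set by adding all ancestors of every predicted label. The encoding of the hierarchy as a parent map and the example labels are assumptions made for this sketch.

```python
def enforce_hierarchy_constraint(predicted_labels, parent):
    """Close a predicted label set under the hierarchy constraint.

    predicted_labels: set of labels predicted for an instance
    parent: dict mapping each label to its parent label (None for a root)
    """
    closed = set()
    for label in predicted_labels:
        node = label
        while node is not None and node not in closed:
            closed.add(node)          # a label implies all of its predecessors
            node = parent.get(node)
    return closed

# Hypothetical three-level hierarchy: music -> rock -> indie.
parent = {"music": None, "rock": "music", "indie": "rock"}
print(enforce_hierarchy_constraint({"indie"}, parent))  # {'music', 'rock', 'indie'}
```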

There are also structured regression problems. For example, similar to multi-target classification is the multi-target regression problem, where the task is to predict the values of multiple continuous target variables.

Additionally, we can construct a structured regression problem when considering the task of discrete time modeling of dynamical systems. Given a representation of the dynamical system at a given time, composed of input variables u(t−1) and state variables y(t−1), we are looking for a function f such that y(t) = f(y(t−1), u(t−1)). In essence, we are trying to predict the state of the dynamical system at the next point in time. Multiple horizons regression for modeling dynamical systems is a task where we wish to predict the value of the target variable(s) at different time points in the future, e.g., for points t, t + ∆1 and t + ∆2, we look for a function f such that [y(t), y(t + ∆1), y(t + ∆2)] = f(y(t−1), u(t−1)). This can be seen as a subproblem of multi-target regression.
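As an illustration, the following sketch shows how multiple-horizons training examples could be assembled from a recorded trajectory so that a multi-target regression method can learn the mapping above. The array layout, function name and horizon values are assumptions of the sketch, not a prescribed procedure.

```python
def build_multi_horizon_examples(y, u, horizons=(0, 1, 2)):
    """Turn a recorded trajectory into (input, multi-target) training pairs.

    y: list of state values y(0), y(1), ..., y(T)
    u: list of input values u(0), u(1), ..., u(T)
    horizons: offsets d so that the targets are y(t + d), e.g. (0, 1, 2)
              corresponds to t, t + Delta1, t + Delta2 with Delta1 = 1, Delta2 = 2
    """
    examples = []
    max_h = max(horizons)
    for t in range(1, len(y) - max_h):
        features = (y[t - 1], u[t - 1])              # external-dynamics inputs
        targets = tuple(y[t + d] for d in horizons)  # one target per horizon
        examples.append((features, targets))
    return examples

# Example with a short synthetic trajectory.
y = [0.0, 0.5, 0.9, 1.2, 1.4, 1.5]
u = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
for features, targets in build_multi_horizon_examples(y, u):
    print(features, "->", targets)
```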

These examples are only a small fraction of possible structured outputs and hint at the complexity of the area of structured output prediction problems, as we can easily construct more complex structured outputs by combining simpler structured outputs in, e.g., tuples or time series. One of the approaches to solving such problems is to construct unique methods for specific structured outputs, which are able to utilize the problems' specific properties.

A more common paradigm used in structured output prediction is that of predictive clustering trees. A predictive clustering tree can be seen as a hierarchy of nested clusters, each of which is represented by a node. Predictive clustering trees are able to unify different types of structured outputs by using a distance measure on the structured output space and using it to evaluate possible tests to be induced in the tree. This provides a unified approach to structured output prediction, as long as we are able to define an appropriate distance measure on the output space. Kocev et al. [11] provide a study of predictive clustering trees used for different structured output prediction problems (in the batch case).
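To make the idea more concrete, the sketch below shows the kind of distance-based heuristic a predictive clustering tree can use to score a candidate split: the reduction of intra-cluster variance of the structured outputs, measured with a user-supplied distance and prototype function on the output space. The function names and the squared-distance-to-prototype formulation are assumptions of this sketch rather than the exact formulation of any particular system.

```python
def cluster_variance(outputs, distance, prototype):
    """Average squared distance of the structured outputs to their prototype."""
    if not outputs:
        return 0.0
    p = prototype(outputs)
    return sum(distance(y, p) ** 2 for y in outputs) / len(outputs)

def variance_reduction(outputs, left, right, distance, prototype):
    """Split score: variance before the split minus weighted variance after it."""
    n = len(outputs)
    before = cluster_variance(outputs, distance, prototype)
    after = (len(left) / n) * cluster_variance(left, distance, prototype) \
          + (len(right) / n) * cluster_variance(right, distance, prototype)
    return before - after

# Example for multi-target regression: outputs are tuples of numbers,
# the prototype is the component-wise mean, the distance is Euclidean.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean_prototype(outputs):
    k = len(outputs[0])
    return tuple(sum(y[i] for y in outputs) / len(outputs) for i in range(k))

outputs = [(1.0, 2.0), (1.1, 2.1), (5.0, 6.0), (5.2, 5.9)]
left, right = outputs[:2], outputs[2:]
print(variance_reduction(outputs, left, right, euclidean, mean_prototype))
```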

1.3 Issues with Structured Output Prediction on Data Streams

The structured output prediction problem coupled with the online mining approach naturally presents many difficulties.

Evaluation. Both the structured output and the online aspect share the lack of straightforward evaluation. In the case of structured output prediction, evaluation has to be specifically tailored to the problem at hand, and even then the evaluation is not simple. For example, in the case of multi-target regression, we can define standard evaluation measures such as the mean squared error for each of the target variables; however, this only allows us to compare the models on a per-target basis and overall. The different evaluation methodologies in online learning further complicate this matter.

Change detection. Another related area of difficulty is change detection in online structured output prediction. Change detection and adaptation mechanisms that are used in the online mining approach rely heavily on well-conceived and consistent performance measures. In the case of structured output prediction, we must additionally consider the specific properties of the structured output problem. This indicates that the change detection and adaptation mechanisms should also be tailored to the specific structured output prediction problem.

Complexity. Structured output problems are more complex than their single-target counterparts, which means that the construction of predictive models requires more resources, i.e., time and memory, which are constrained in the online mining approach. Special care must be taken to ensure that the models do not waste memory on improvements which provide minimal impact, e.g., in the case of a predictive tree, we do not grow the tree in the leaves where the recorded predictive performance is high.


2 Problem Statement

We define the task of online structured output prediction as follows. We are given:

• An input (or description) space X that consists of tuples of discrete or continuous values, i.e., ∀Xi ∈ X, Xi = (xi1, xi2, . . . , xin), where n is the number of descriptive attributes.

• An output (or target) space S, which is defined specifically for each type of structured output, e.g., multi-label classification, hierarchical multi-label classification, multi-target regression, multiple horizons regression, etc. In the case of multi-target regression, for example, the space S contains tuples of k continuous variables (similarly to X above).

Additionally, we must be aware of several constraints of the online data mining aspect, outlined below:

• the examples arrive sequentially,

• the algorithm has no control over the order of arrival of the examples,

• there can potentially be infinitely many examples,

• the distribution of examples need not be stationary, and

• after an example is processed, it is discarded or archived – we cannot access it directly, unless we explicitly store it in memory (which is small compared to the number of examples).

Given the spaces X and S, and respecting the constraints listed above, we must find a function f : X → S that makes accurate predictions and can adapt to changes in the distribution of examples.
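Purely as an illustration of this setting, the constraints above translate into a single-pass loop of roughly the following shape. The model, loss and change-detector interfaces are assumptions of the sketch, not a specific algorithm.

```python
def online_structured_learning(stream, model, detector, loss):
    """Single-pass loop over a (possibly unbounded) stream of (x, y) pairs.

    model:    assumed to offer predict(x) -> structured output, update(x, y), reset()
    detector: assumed to offer add(error) and change_detected() -> bool
    loss:     per-instance loss tailored to the structured output type
    """
    for x, y in stream:              # examples arrive sequentially; order is not ours to choose
        y_hat = model.predict(x)     # predict before the true output is revealed
        detector.add(loss(y, y_hat)) # monitor the error stream for concept drift
        if detector.change_detected():
            model.reset()            # adapt to the detected change (here: simply restart)
        model.update(x, y)           # learn from the example, then discard it
```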

3 Related Work

In this section, we give an overview of different methods available for different types of structured output problems. We focus on multi-label classification, multi-target regression and hierarchical multi-label classification. We present online methods where they are available, as well as general approaches that can be used with each specific structured output problem, and cover batch methods that can either inspire online methods or be adapted to the online approach. Additionally, some evaluation metrics will be discussed for each type of structured output.

3.1 Multi-label Classification

Problem transformation methods. A common approach to structured output prediction is to transform the prediction problem into several (simple) subproblems. A baseline method of this sort, used both for batch and online mining, is the binary relevance method (BR). Binary relevance transforms the multi-label classification problem into several binary classification problems, one for each of the possible labels. The binary models are then employed to learn and predict the relevance of each label. This method, however, overlooks potential correlations between labels.
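A minimal sketch of the binary relevance transformation is given below; the base-classifier interface (fit/predict) is an illustrative assumption.

```python
class BinaryRelevance:
    """Train one binary classifier per label; predict the set of relevant labels."""

    def __init__(self, labels, make_base_classifier):
        # make_base_classifier() is assumed to return an object with fit(X, y) and predict(x)
        self.models = {label: make_base_classifier() for label in labels}

    def fit(self, X, Y):
        # X: list of instances; Y: list of label sets, one per instance
        for label, model in self.models.items():
            model.fit(X, [1 if label in y else 0 for y in Y])

    def predict(self, x):
        return {label for label, model in self.models.items() if model.predict(x) == 1}
```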

An alternative approach is the label combination or label powerset method, where the multi-label problem is transformed into a single-label multi-class problem by treating all label combinations as single labels, i.e., each labelset becomes a single class-label within a single-label problem. This leads to a worst-case computational complexity that is exponential in the number of labels and to a tendency to over-fit the (training) data. However, over-fitting has mostly been rectified by the methods proposed by Tsoumakas et al. [21] and Read et al. [19]. There are also approaches that lie between the binary relevance and the label powerset methods: instead of only single labels or all possible combinations of labels, we construct predictive models for smaller combinations of labels, e.g., pairs of labels (pairwise classification).
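For comparison, here is a minimal sketch of the label powerset transformation, again with an assumed base-classifier interface; every distinct labelset seen in training becomes one class of a single-label, multi-class problem.

```python
class LabelPowerset:
    """Treat every distinct labelset as a single class of a multi-class problem."""

    def __init__(self, make_base_classifier):
        # the base classifier is assumed to be multi-class, with fit(X, y) and predict(x)
        self.model = make_base_classifier()

    def fit(self, X, Y):
        # Encode each label set as a canonical, hashable class value.
        self.model.fit(X, [frozenset(y) for y in Y])

    def predict(self, x):
        # Decode the predicted class back into a labelset.
        return set(self.model.predict(x))
```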

These problem transformation methods are interesting mainly in terms of their flexibility and general applicability. As such, they are the default baseline methods used in the literature. An extensive experimental study of these and other batch multi-label learners was done by Madjarov et al. [12].

Online methods. There are several methods that address the task of multi-label classification in the online setting. A simple approach to the task of mining data streams is the batch-incremental approach, i.e., we train batch classifiers on batches of new examples, which replace classifiers built from earlier batches. The first method for online multi-label classification was introduced in [23]. This is a batch-incremental method that uses batches to train meta-BR (MBR), a well-known multi-label method where the outputs of an initial BR are applied to a second BR [5]. It was, however, shown that instance-incremental methods, which update the predictive model on a per-instance basis, are advantageous for online mining.

A Hoeffding tree [4] is an incremental top-down tree induction algorithm that uses the Hoeffding bound, which allows the selection of the best splitting attribute from a small sample. With the use of the Hoeffding bound, it was also shown that, given enough examples, Hoeffding trees are asymptotically nearly identical to trees produced by non-incremental (batch) methods in terms of predictive performance. Kirkby [10] introduced several improvements to Hoeffding trees and also introduced the MOA (Massive Online Analysis) framework for online mining. Bifet et al. [2] deal with the problem of concept drift in the case of Hoeffding trees through the use of the ADWIN change detection mechanism, and apply the online versions of bagging and boosting introduced by Oza and Russell [15] to produce ensembles of Hoeffding trees. Read et al. [18] compare Hoeffding trees with several baseline methods, namely BR, label combination and pairwise problem transformation.
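The Hoeffding bound itself has a simple closed form: for a real-valued random variable with range R observed n times, the true mean differs from the observed mean by at most ε = sqrt(R² ln(1/δ) / (2n)) with probability at least 1 − δ. A small worked illustration (the parameter values are arbitrary):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Example: information gain has range log2(c) for c classes; with c = 2, R = 1.
# After 1000 examples and delta = 1e-7, the bound is already fairly tight:
print(hoeffding_bound(value_range=1.0, delta=1e-7, n=1000))  # ~0.0898
```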

Evaluation. Several metrics for predictive evaluation are used in online multi-label classification. Some of these are the Hamming loss, the F-measure, the 0/1 loss, the accuracy and the log-loss metrics. A review of these metrics is provided by Read [17].


3.2 Multi-target Regression

Online multi-target regression is significantly less popular than online multi-label classification. While there is an abundance of batch (multi-target) regression methods, there are very few methods for online regression.

FIMT-DD [8] is an incremental top-down single-target tree induction algorithm similar to Hoeffding trees for classification. It also relies on the Hoeffding bound to select the best splits from small samples of the data. It uses the variance reduction heuristic to estimate the feasibility of splits and is capable of detecting concept drift with the Page-Hinkley test, as well as adapting to it. An extension of FIMT-DD, ORTO [9] introduces option nodes into the tree structure. This increases predictive performance, as it reduces the impact of the myopic tree induction methodology. Additionally, ensembles of FIMT-DD trees have also been proposed by Ikonomovska [6]. Out of these methods, ORTO is the best in terms of predictive performance.
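As an illustration of this kind of change detection, below is a minimal sketch of a Page-Hinkley test over a stream of per-instance errors. The tolerance and threshold values, and the exact bookkeeping, are assumptions of the sketch and may differ from the implementation used in FIMT-DD.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream of errors."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta            # tolerance for small fluctuations
        self.threshold = threshold    # alarm threshold (lambda)
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0         # m_t: accumulated deviations from the running mean
        self.minimum = float("inf")   # M_t: minimum of m_t seen so far

    def add(self, error):
        self.n += 1
        self.mean += (error - self.mean) / self.n          # running mean of the errors
        self.cumulative += error - self.mean - self.delta  # accumulate the deviation
        self.minimum = min(self.minimum, self.cumulative)

    def change_detected(self):
        return self.cumulative - self.minimum > self.threshold
```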

FIMT-DD was extended to the task of multi-target regression by Ikonomovska [7], producing the FIMT-MT algorithm. If we look through the lens of predictive clustering, trees can be seen as hierarchies of clusters, each of which is represented by a node in the tree. The heuristic used in FIMT-MT, intra-cluster variance reduction, is common in the predictive clustering framework. This heuristic can also be seen as a natural extension of the variance-reduction heuristic used for single-target prediction in FIMT-DD. However, the task of detecting (and adapting to) concept drift cannot be extended to the multi-target case so easily and awaits further work.
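To hint at how the intra-cluster variance can be maintained incrementally, as an online tree must, the sketch below keeps only per-target counts, sums and sums of squares and derives the summed per-target variance from them. The exact statistics kept by FIMT-MT may differ; this is an assumption of the sketch.

```python
class MultiTargetVarianceStats:
    """Incremental sufficient statistics for the intra-cluster variance of k targets."""

    def __init__(self, k):
        self.n = 0
        self.sums = [0.0] * k
        self.sq_sums = [0.0] * k

    def update(self, targets):
        self.n += 1
        for i, y in enumerate(targets):
            self.sums[i] += y
            self.sq_sums[i] += y * y

    def variance(self):
        # Sum over targets of E[y^2] - E[y]^2, i.e., the (biased) per-target variances.
        if self.n == 0:
            return 0.0
        return sum(sq / self.n - (s / self.n) ** 2
                   for s, sq in zip(self.sums, self.sq_sums))
```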

Another single-target method for online regression is AMRules [1]. It is similar to FIMT-DD, as it also uses the Hoeffding bound, the variance reduction heuristic and the same change detection mechanism.

IBL-DS [20] is an instance-based learner similar to the k-nearest neighbors (kNN) method. It does not produce a model per se, but stores a portion of the data, known as a case base. It incrementally updates the case base, adding new examples and removing outliers and noisy examples. It is also able to detect changes in the distribution of examples by using a statistical z-test.

3.3 Hierarchical Multi-label Classification

While we are currently unaware of any online methods for hierarchical multi-label classification, there are a few batch methods available. Vens et al. [22] introduce predictive clustering decision trees for hierarchical multi-label classification (HMC), as well as several baseline methods related to binary relevance, described in the multi-label classification section. Namely, these baselines are single class (SC) and hierarchical single class (HSC), where single-class models are hierarchically arranged. Both of these baseline methods have the problem that they may not accurately capture the attributes that are important for the hierarchy as a whole, but only those important for the selected class. Additionally, following the hierarchy constraint is not always possible in SC. These methods are then also extended to the case when the tree-shaped hierarchy is replaced by a directed acyclic graph (DAG). It is shown that HMC performs better than the baseline methods in both the tree and the DAG case.

Several metrics for evaluation of predictive performance are used in batch hierarchical multi-label classification. The concepts of precision, recall and precision-recall curves can all be adapted to the hierarchical case. The main metrics used by Vens et al. are the area under the average PR curve and the average area under the PR curves.
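As a rough illustration of how such precision-recall points can be obtained, the sketch below thresholds per-label scores and micro-averages precision and recall over all labels and instances; the exact definitions used by Vens et al. [22] are more involved, so this is only an approximation of the idea.

```python
def micro_precision_recall(true_label_sets, predicted_scores, threshold):
    """One precision-recall point at a given threshold, micro-averaged over labels.

    true_label_sets:  list of sets of (hierarchy-closed) true labels, one per instance
    predicted_scores: list of dicts mapping labels to predicted scores, one per instance
    """
    tp = fp = fn = 0
    for truth, scores in zip(true_label_sets, predicted_scores):
        predicted = {label for label, s in scores.items() if s >= threshold}
        tp += len(predicted & truth)
        fp += len(predicted - truth)
        fn += len(truth - predicted)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Sweeping the threshold over [0, 1] yields points on a precision-recall curve,
# whose area can then be computed.
```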

4 Directions for Further Work

Initial directions for further work focus on the development of novel methods for the described structured output types, as well as corresponding change detection mechanisms.

Change detection for structured outputs. We predict that the most challenging aspect will be the development of change detection mechanisms for structured outputs, as well as methods for adaptation to detected change. While the structured outputs are different, they contain similar elements, which we would like to exploit to potentially provide a general framework for change detection in the structured output case.

Multi-target regression. For online multi-target regression we wish to introduce options into the online multi-target method FIMT-MT. We plan to investigate whether the improvements from the single-target case can be achieved in the multi-target case as well. Additionally, we plan to use ensembles of multi-target trees, constructed with the online bagging and online random forest methodology.

Multi-label classification and multi-target classification. In the context of multi-label classification, we wish to employ the online multi-target methods and adapt them into multi-label classifiers. This can be done, for example, with the regression for classification methodology [16], where we create a continuous target variable for each of the possible labels of the classification problem. Instances which have a given class label have a value of 1 for the corresponding continuous variable, while those which do not have a value of 0. Instances for which the predicted value of a given continuous variable exceeds a certain threshold, e.g., 0.5, are then classified as having the corresponding label.

The problem of multi-target classification can be approached similarly. For each discrete target variable, we create a regression variable for each of its possible labels. The values of these regression variables are determined as before: 1 if an instance is of the corresponding class and 0 otherwise. Since the task is to always select one of the classes, we cannot use thresholding. Instead, we classify the instance with the label whose corresponding continuous variable has the highest value among all of the continuous variables associated with the given classification target variable.
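A small sketch of this encoding and decoding is given below; the function names are illustrative, and the 0.5 threshold is the example value mentioned above.

```python
def encode_labels(label_set, all_labels):
    """Multi-label case: one 0/1 regression target per possible label."""
    return {label: 1.0 if label in label_set else 0.0 for label in all_labels}

def decode_multilabel(predicted_values, threshold=0.5):
    """Multi-label case: keep every label whose predicted value exceeds the threshold."""
    return {label for label, v in predicted_values.items() if v > threshold}

def decode_multiclass(predicted_values):
    """Multi-target classification case: always pick the label with the highest value."""
    return max(predicted_values, key=predicted_values.get)

# Example with labels {"a", "b", "c"}.
print(encode_labels({"a", "c"}, ["a", "b", "c"]))         # {'a': 1.0, 'b': 0.0, 'c': 1.0}
print(decode_multilabel({"a": 0.8, "b": 0.3, "c": 0.6}))  # {'a', 'c'}
print(decode_multiclass({"a": 0.8, "b": 0.3, "c": 0.6}))  # 'a'
```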

Hierarchical multi-label classification. In the area of online hierarchical multi-label classification, we wish to adapt the currently available batch algorithms to the online setting by using online predictive clustering trees. An additional aspect to be considered in hierarchical multi-label classification is the possibility of changes in the hierarchy over time.

Hierarchical multi-target regression. A novel structured output problem we will consider is hierarchical multi-target regression. The task is to predict the values of several continuous variables which are hierarchically arranged. The values of variables higher in the hierarchy impact the values of their descendants and are weighted more heavily in the evaluation.

Applications. We plan to apply these methods to the task of discrete time modeling of dynamical systems, through the use of the external dynamics approach [14]. We will consider both single-output and multi-output dynamical systems, where we will use single-target and multi-target regression methods, respectively. Additionally, we will explore the multiple horizons regression task, which is naturally connected with the task of modeling dynamical systems. Significant attention will also be devoted to the task of change detection in modeling dynamical systems.

Implementations. We wish to implement these methods in data stream mining platforms such as MOA (Massive Online Analysis) [3] or SAMOA (Scalable Advanced Massive Online Analysis) [13]. MOA is a framework for data stream mining, similar in design to WEKA, implemented in Java. SAMOA is a framework similar to MOA, also implemented in Java, which is specifically designed for distributed mining of data streams and supports multi-target classification and regression.

The starting point is the implementation of algorithms for online single-target and multi-target regression by Ikonomovska in the VFML (Very Fast Machine Learning) framework, written in C. Currently, some of the single-target algorithms have already been ported to MOA, namely the (single-target) algorithms FIMT-DD and ORTO. This will be upgraded to support multi-target regression in the SAMOA framework.

5 Conclusion

Structured output prediction from data streams is a very important and challenging research area, which is, with the exception of online multi-label classification, largely unexplored. While there are some online multi-target regression methods, several innovations are possible, e.g., introducing options and ensembles into online trees for multi-target regression. In the area of hierarchical multi-label classification, specifically, there are currently no online methods available to our knowledge. While there exist batch methods for other types of structured outputs, such as time series prediction or even more complex structures (such as combinations of time series, hierarchies, continuous variables, etc.), these have currently not been explored in the online setting.

There are several unique challenges in this area, which are largely dependent on the type of structured output we are considering. The most notable of these are the definition of consistent and informative evaluation measures for different types of structured outputs, as well as the closely connected aspect of detecting change within a stream of structured outputs.

References

[1] Ezilda Almeida, Carlos Ferreira, and Joao Gama. Adaptive model rules from data streams. In Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Zelezny, editors, Machine Learning and Knowledge Discovery in Databases, volume 8188 of Lecture Notes in Computer Science, pages 480–492. Springer Berlin Heidelberg, 2013.

[2] Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Richard Kirkby, and Ricard Gavalda. New ensemble methods for evolving data streams. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pages 139–148, New York, NY, USA, 2009. ACM.

[3] Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen, and Thomas Seidl. MOA: Massive online analysis, a framework for stream classification and clustering. In Journal of Machine Learning Research (JMLR) Workshop and Conference Proceedings, Volume 11: Workshop on Applications of Pattern Analysis, pages 44–50. Journal of Machine Learning Research, 2010.

[4] Pedro Domingos and Geoff Hulten. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '00, pages 71–80, New York, NY, USA, 2000. ACM.

[5] Shantanu Godbole and Sunita Sarawagi. Discriminative methods for multi-labeled classification. In Honghua Dai, Ramakrishnan Srikant, and Chengqi Zhang, editors, PAKDD, volume 3056 of Lecture Notes in Computer Science, pages 22–30. Springer, 2004.

[6] Elena Ikonomovska. Algorithms for Learning Regression Trees and Ensembles on Evolving Data Streams. PhD thesis, Jozef Stefan International Postgraduate School, Ljubljana, 2012.

[7] Elena Ikonomovska, J. Gama, and S. Dzeroski. Incremental multi-target model trees for data streams. In Proceedings of the 2011 ACM Symposium on Applied Computing, pages 988–993. ACM, New York, 2011.

[8] Elena Ikonomovska, J. Gama, and S. Dzeroski. Learning model trees from evolving data streams. Data Mining and Knowledge Discovery, 23:128–168, 2011.

[9] Elena Ikonomovska, J. Gama, B. Zenko, and S. Dzeroski. Speeding-up Hoeffding-based regression trees with options. In Proceedings of the 28th International Conference on Machine Learning, pages 537–544. ACM, New York, 2011.

[10] Richard Kirkby. Improving Hoeffding Trees. PhD thesis, University of Waikato, 2007.

[11] Dragi Kocev, Celine Vens, Jan Struyf, and Saso Dzeroski. Tree ensembles for predicting structured outputs. Pattern Recognition, 46(3):817–833, 2013.

[12] Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, and Saso Dzeroski. An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9):3084–3104, 2012. Best Papers of Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA'2011).

[13] Gianmarco De Francisci Morales. SAMOA: a platform for mining big data streams. In WWW (Companion Volume), pages 777–778. International World Wide Web Conferences Steering Committee / ACM, 2013.

[14] O. Nelles. Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models. Springer, Berlin, 2001.

[15] Nikunj C. Oza and Stuart Russell. Online bagging and boosting. In Artificial Intelligence and Statistics 2001, pages 105–112. Morgan Kaufmann, 2001.

[16] Aleksandar Peckov. A Machine Learning Approach to Polynomial Regression. PhD thesis, Jozef Stefan International Postgraduate School, Ljubljana, 2012.

[17] Jesse Read. Scalable multi-label classification. PhD thesis, University of Waikato, 2010.

[18] Jesse Read, Albert Bifet, Geoff Holmes, and Bernhard Pfahringer. Scalable and efficient multi-label classification for evolving data streams. Machine Learning, 88(1-2):243–272, July 2012.

[19] Jesse Read, Bernhard Pfahringer, and Geoff Holmes. Multi-label classification using ensembles of pruned sets. In Eighth IEEE International Conference on Data Mining (ICDM '08), pages 995–1000, December 2008.

[20] Ammar Shaker and Eyke Hullermeier. Instance-based classification and regression on data streams. In Moamar Sayed-Mouchaweh and Edwin Lughofer, editors, Learning in Non-Stationary Environments, pages 185–201. Springer New York, 2012.

[21] Grigorios Tsoumakas and Ioannis Vlahavas. Random k-labelsets: An ensemble method for multilabel classification. In Proceedings of the 18th European Conference on Machine Learning, ECML '07, pages 406–417, Berlin, Heidelberg, 2007. Springer-Verlag.

[22] Celine Vens, Jan Struyf, Leander Schietgat, Saso Dzeroski, and Hendrik Blockeel. Decision trees for hierarchical multi-label classification. Machine Learning, 73(2):185–214, 2008.

[23] Qu Wei, Zhang Yang, Zhu Junping, and Wang Yong. Mining multi-label concept-drifting data streams using ensemble classifiers. In Proceedings of the 6th International Conference on Fuzzy Systems and Knowledge Discovery - Volume 5, FSKD '09, pages 275–279, Piscataway, NJ, USA, 2009. IEEE Press.
