Adaptive XML Tree Mining on Evolving Data Streams Albert Bifet Laboratory for Relational Algorithmics, Complexity and Learning LARCA Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Porto, 21 May 2009

Adaptive XML Tree Mining on Evolving Data Streams

May 08, 2015


Albert Bifet

Talk about mining XML trees on evolving data streams.
Transcript
Page 1: Adaptive XML Tree Mining on Evolving Data Streams

Adaptive XML Tree Mining on Evolving Data Streams

Albert Bifet

Laboratory for Relational Algorithmics, Complexity and Learning (LARCA)

Departament de Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Porto, 21 May 2009

Page 2: Adaptive XML Tree Mining on Evolving Data Streams

Mining Evolving Massive Structured Data

The Disintegration of the Persistence of Memory, 1952-54

Salvador Dalí

The basic problem: finding interesting structure in data

Mining massive data

Mining time varying data

Mining in real time

Mining XML data

2 / 30

Page 3: Adaptive XML Tree Mining on Evolving Data Streams

XML Tree Classification on evolving data streams

Figure: A dataset example (four trees with node labels A, B, C and D, labeled CLASS1, CLASS2, CLASS1, CLASS2)

3 / 30

Page 4: Adaptive XML Tree Mining on Evolving Data Streams

Tree Pattern Mining

Trees are sanctuaries. Whoever knows how to listen to them, can learn the truth.

Hermann Hesse

Given a dataset of trees, find the complete set of frequent subtrees

Frequent Tree Pattern (FT):

Include all the trees whose support is no less than min_sup

Closed Frequent Tree Pattern (CT):

Include no tree which has a super-tree with the same support

CT ⊆ FT
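The two definitions can be checked on a toy dataset. The following is a minimal sketch, not the author's implementation: it assumes unlabeled, unordered trees written as nested tuples of children (a leaf is `()`), top-down subtree inclusion, and a hand-picked candidate list that is assumed to contain every frequent pattern of this tiny dataset. The names `embeds`, `D` and `candidates` are illustrative.

```python
def embeds(p, t):
    """True if unordered tree p is a top-down subtree of t (roots aligned)."""
    def match(ps, ts):
        # Assign each child of p to a distinct child of t, with backtracking.
        if not ps:
            return True
        for i, c in enumerate(ts):
            if embeds(ps[0], c) and match(ps[1:], ts[:i] + ts[i + 1:]):
                return True
        return False
    return match(p, t)

# Toy dataset: a root with two leaves, and a root whose first child has a child.
D = [((), ()), (((),), ())]
candidates = [(), ((),), ((), ()), (((),),)]  # hand-picked for this example
min_sup = 2

support = {p: sum(embeds(p, t) for t in D) for p in candidates}

# FT: every pattern whose support is no less than min_sup.
frequent = {p for p, s in support.items() if s >= min_sup}

# CT: frequent patterns with no proper supertree of the same support.
closed = {
    p for p in frequent
    if not any(q != p and embeds(p, q) and support[q] == support[p]
               for q in frequent)
}
```

Here three patterns are frequent but only the root with two leaf children is closed, which illustrates CT ⊆ FT.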

4 / 30

Page 5: Adaptive XML Tree Mining on Evolving Data Streams

Mining Closed Frequent Trees

Our trees are:

Labeled and Unlabeled

Ordered and Unordered

Our subtrees are:

Induced

Top-down

Two different ordered trees but the same unordered tree
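One common way to decide whether two ordered trees are the same unordered tree is to compare canonical forms. The sketch below assumes unlabeled trees written as nested tuples of children and picks, as an illustrative convention, the ordering in which sibling subtrees are sorted in descending order; the function name `canon` is hypothetical.

```python
def canon(t):
    """Canonical form of an unordered tree: sort sibling subtrees recursively."""
    return tuple(sorted((canon(c) for c in t), reverse=True))

# Two different ordered trees: the child with a descendant comes last or first.
left = ((), ((),))
right = (((),), ())

same_unordered_tree = canon(left) == canon(right)
```

The two tuples differ as ordered trees, yet both reduce to the same canonical form, so they represent one unordered tree.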

5 / 30

Page 6: Adaptive XML Tree Mining on Evolving Data Streams

A tale of two trees

Consider D = {A, B} (tree figures A and B) and let min_sup = 2.

Figure: Frequent subtrees of A and B

6 / 30

Page 7: Adaptive XML Tree Mining on Evolving Data Streams

A tale of two trees

Consider D = {A, B} (tree figures A and B) and let min_sup = 2.

Figure: Closed subtrees of A and B

6 / 30


Page 9: Adaptive XML Tree Mining on Evolving Data Streams

XML Tree Classification on evolving data streams

                                      Tree transactions
Closed   Freq. not closed trees       1    2    3    4
c1       (tree figures)               1    0    1    0
c2       (tree figures)               1    0    0    1

8 / 30

Page 10: Adaptive XML Tree Mining on Evolving Data Streams

XML Tree Classification on evolving data streams

Frequent Trees

    c1            c2                          c3            c4
Id  c1     f^1_1  c2     f^1_2  f^2_2  f^3_2  c3     f^1_3  c4     f^1_4  f^2_4  f^3_4  f^4_4  f^5_4
1   1      1      1      1      1      1      0      0      1      1      1      1      1      1
2   0      0      0      0      0      0      1      1      1      1      1      1      1      1
3   1      1      0      0      0      0      1      1      1      1      1      1      1      1
4   0      0      1      1      1      1      1      1      1      1      1      1      1      1

Closed and Maximal Trees

          Closed trees       Maximal trees
Tree Id   c1  c2  c3  c4     c1  c2  c3      Class
1         1   1   0   1      1   1   0       CLASS1
2         0   0   1   1      0   0   1       CLASS2
3         1   0   1   1      1   0   1       CLASS1
4         0   1   1   1      0   1   1       CLASS2
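The frequent-tree table can be checked mechanically: every column of a frequent, non-closed tree repeats the column of a closed tree, so keeping only the closed trees loses no information. The sketch below copies the 0/1 values from the table; the column names and the helper `distinct_columns` are illustrative, and grouping columns by their value pattern is only a way to verify the redundancy, not the mining procedure itself.

```python
columns = ["c1", "f^1_1", "c2", "f^1_2", "f^2_2", "f^3_2", "c3", "f^1_3",
           "c4", "f^1_4", "f^2_4", "f^3_4", "f^4_4", "f^5_4"]

# One row per tree transaction, values copied from the frequent-tree table.
rows = {
    1: [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1],
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    3: [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    4: [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}

def distinct_columns(columns, rows):
    """Keep the first column for each distinct pattern of 0/1 values."""
    first_with_signature = {}
    for j, name in enumerate(columns):
        signature = tuple(rows[i][j] for i in sorted(rows))
        first_with_signature.setdefault(signature, name)
    return list(first_with_signature.values())

kept = distinct_columns(columns, rows)

# Attribute tuples restricted to the kept columns, one per tree.
tuples = {i: [v for name, v in zip(columns, rows[i]) if name in kept]
          for i in rows}
```

Only the four closed trees survive, and the resulting tuples match the closed-tree columns of the second table.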

9 / 30

Page 11: Adaptive XML Tree Mining on Evolving Data Streams

XML Tree Framework on evolving data streams

XML Tree Classification Framework Components

An XML closed frequent tree miner

A data stream classifier algorithm, which we will feed with tuples to be classified online.

10 / 30

Page 12: Adaptive XML Tree Mining on Evolving Data Streams

Mining Evolving Tree Data Streams

Problem

Given a data stream D of rooted and unordered trees, find frequent closed trees.

We provide three algorithms, of increasing power

Incremental

Sliding Window

Adaptive

11 / 30


Page 15: Adaptive XML Tree Mining on Evolving Data Streams

Mining Closed Unordered Subtrees

CLOSED_SUBTREES(t, D, min_sup, T)

  if not CANONICAL_REPRESENTATIVE(t)
    then return T
  for every t′ that can be extended from t in one step
    do if Support(t′) ≥ min_sup
         then T ← CLOSED_SUBTREES(t′, D, min_sup, T)
       if Support(t′) = Support(t)
         then t is not closed
  if t is closed
    then insert t into T
  return T
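The recursion can be sketched in Python under some stated assumptions: unlabeled, unordered trees as nested tuples of children, top-down subtree inclusion, and a canonical form that sorts sibling subtrees in descending order. The sketch also separates two readings of "extended in one step": candidates for recursion add a leaf on the rightmost path, while the closure check tries attaching a leaf to any node. That split is an assumption made here for correctness of the sketch, and all function names are illustrative.

```python
def embeds(p, t):
    """True if unordered tree p is a top-down subtree of t (roots aligned)."""
    def match(ps, ts):
        if not ps:
            return True
        for i, c in enumerate(ts):
            if embeds(ps[0], c) and match(ps[1:], ts[:i] + ts[i + 1:]):
                return True
        return False
    return match(p, t)

def support(p, data):
    return sum(embeds(p, t) for t in data)

def canon(t):
    """Canonical representative: sibling subtrees sorted in descending order."""
    return tuple(sorted((canon(c) for c in t), reverse=True))

def rightmost_extensions(t):
    """Trees obtained by adding one leaf along the rightmost path of t."""
    yield t + ((),)
    if t:
        for ext in rightmost_extensions(t[-1]):
            yield t[:-1] + (ext,)

def leaf_additions(t):
    """Trees obtained by attaching one new leaf to any node of t."""
    yield t + ((),)
    for i, c in enumerate(t):
        for ext in leaf_additions(c):
            yield t[:i] + (ext,) + t[i + 1:]

def closed_subtrees(t, data, min_sup, found):
    """Collect closed frequent subtrees reachable from the frequent tree t."""
    if t != canon(t):
        return found                      # not the canonical representative
    sup = support(t, data)
    for t2 in rightmost_extensions(t):
        if support(t2, data) >= min_sup:
            closed_subtrees(t2, data, min_sup, found)
    # t is closed if no one-leaf supertree keeps the same support.
    if all(support(t2, data) < sup for t2 in leaf_additions(t)):
        found.add(t)
    return found

# Two small example trees (any pair of nested-tuple trees would do).
A = ((((),), ()), ())
B = ((((),),), ((), ()))
result = closed_subtrees((), [A, B], 2, set())
```

On this pair the call returns two closed trees: a root with one child carrying two leaves plus a leaf child, and a root with a chain of two nodes below one child plus a leaf child.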

12 / 30

Page 16: Adaptive XML Tree Mining on Evolving Data Streams

Example

D = {A, B}

min_sup = 2.

〈A〉 = (0,1,2,3,2,1)    〈B〉 = (0,1,2,3,1,2,2)

(0) (0,1)

(0,1,1)

(0,1,2)

(0,1,2,1)

(0,1,2,2)

(0,1,2,3)

(0,1,2,2,1)

(0,1,2,3,1)
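The sequences above read naturally as node depths listed in preorder, which is how this sketch interprets them (an assumption, since the slide does not spell it out). It converts a depth sequence into a nested tuple of children and back; `from_depths` and `to_depths` are hypothetical helper names.

```python
def from_depths(depths):
    """Build a nested-tuple tree from a preorder sequence of node depths."""
    if not depths or depths[0] != 0:
        raise ValueError("a depth sequence must start with the root at depth 0")
    root = []
    path = [root]  # path[d] holds the children list of the current node at depth d
    for d in depths[1:]:
        if not 1 <= d <= len(path):
            raise ValueError("depth may grow by at most one per step")
        node = []
        path[d - 1].append(node)   # attach below the current node at depth d-1
        del path[d:]               # forget deeper nodes of the previous branch
        path.append(node)
    def freeze(n):
        return tuple(freeze(c) for c in n)
    return freeze(root)

def to_depths(tree, depth=0):
    """Preorder depth sequence of a nested-tuple tree."""
    seq = (depth,)
    for child in tree:
        seq += to_depths(child, depth + 1)
    return seq

A = from_depths((0, 1, 2, 3, 2, 1))
B = from_depths((0, 1, 2, 3, 1, 2, 2))
```

Under this reading, A is a root whose first child has a two-node chain and a leaf below it, followed by a leaf child; B is a root with a chain below the first child and two leaves below the second.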

13 / 30


Page 18: Adaptive XML Tree Mining on Evolving Data Streams

Experimental results

TreeNat

Unlabeled Trees

Top-Down Subtrees

No Occurrences

CMTreeMiner

Labeled Trees

Induced Subtrees

Occurrences

14 / 30

Page 19: Adaptive XML Tree Mining on Evolving Data Streams

Closure Operator on Trees

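$\mathcal{D}$: the finite input dataset of trees

$\mathcal{T}$: the (infinite) set of all trees

Definition

We define the following Galois connection pair, where $t \preceq t'$ means that $t$ is a subtree of $t'$:

For finite $A \subseteq \mathcal{D}$, $\sigma(A)$ is the set of subtrees of the $A$ trees in $\mathcal{T}$

$$\sigma(A) = \{\, t \in \mathcal{T} \mid \forall t' \in A\ (t \preceq t') \,\}$$

For finite $B \subset \mathcal{T}$, $\tau_{\mathcal{D}}(B)$ is the set of supertrees of the $B$ trees in $\mathcal{D}$

$$\tau_{\mathcal{D}}(B) = \{\, t' \in \mathcal{D} \mid \forall t \in B\ (t \preceq t') \,\}$$

Closure Operator

The composition $\Gamma_{\mathcal{D}} = \sigma \circ \tau_{\mathcal{D}}$ is a closure operator.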
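The two maps and their composition can be tried on a small dataset. This is a sketch under assumptions: unlabeled, unordered trees as nested tuples, top-down subtree inclusion as the subtree relation, and canonical forms (siblings sorted in descending order) standing in for unordered trees. The function names are illustrative, and `sigma` refuses an empty argument because the set of common subtrees of no trees is all of T, which is infinite.

```python
def embeds(p, t):
    """True if unordered tree p is a top-down subtree of t (roots aligned)."""
    def match(ps, ts):
        if not ps:
            return True
        for i, c in enumerate(ts):
            if embeds(ps[0], c) and match(ps[1:], ts[:i] + ts[i + 1:]):
                return True
        return False
    return match(p, t)

def canon(t):
    return tuple(sorted((canon(c) for c in t), reverse=True))

def leaf_additions(t):
    """Trees obtained by attaching one new leaf to any node of t."""
    yield t + ((),)
    for i, c in enumerate(t):
        for ext in leaf_additions(c):
            yield t[:i] + (ext,) + t[i + 1:]

def sigma(trees):
    """Common subtrees of all given trees, as a set of canonical forms."""
    trees = list(trees)
    if not trees:
        raise ValueError("sigma of the empty set is the whole (infinite) set T")
    common, frontier = set(), [()]
    while frontier:
        p = frontier.pop()
        if p in common or not all(embeds(p, t) for t in trees):
            continue
        common.add(p)
        frontier.extend(canon(q) for q in leaf_additions(p))
    return common

def tau(patterns, data):
    """Trees of the dataset that contain every pattern."""
    return [t for t in data if all(embeds(p, t) for p in patterns)]

def closure(patterns, data):
    return sigma(tau(patterns, data))

data = [((((),), ()), ()), ((((),),), ((), ()))]
B0 = [((), ())]                  # a single pattern: a root with two leaves
closed_set = closure(B0, data)
```

The result contains the starting pattern (the operator is extensive) and applying `closure` again leaves it unchanged (idempotent), two of the defining properties of a closure operator.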

15 / 30

Page 20: Adaptive XML Tree Mining on Evolving Data Streams

Galois Lattice of closed set of trees

Figure: lattice diagram with nodes 1, 2, 3, 12, 13, 23 and 123

16 / 30


Page 23: Adaptive XML Tree Mining on Evolving Data Streams

Galois Lattice of closed set of trees

B = { }

τD(B) = { , }

ΓD(B) = σ ◦τD(B) = { and its subtrees }

Figure: lattice diagram with nodes 1, 2, 3, 12, 13, 23 and 123

17 / 30

Page 24: Adaptive XML Tree Mining on Evolving Data Streams

Algorithms

Incremental: INCTREENAT

Sliding Window: WINTREENAT

Adaptive: ADATREENAT, which uses ADWIN to monitor change

ADWIN

An adaptive sliding window whose size is recomputed online according to the rate of change observed.

ADWIN has rigorous guarantees (theorems)

On ratio of false positives and false negatives

On the relation of the size of the current window and change rates
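To make the idea of a window that resizes itself concrete, here is a simplified, unoptimized sketch in the spirit of ADWIN. It is not the ADWIN implementation: it stores every element and tests every split, and the cut threshold follows the Hoeffding-style bound from the ADWIN paper as recalled here, which should be treated as an assumption. Inputs are assumed to lie in [0, 1]; the class name and the `delta` default are illustrative.

```python
import math
from collections import deque

class SimpleAdaptiveWindow:
    """Keeps recent values; drops the oldest part when the mean has changed."""

    def __init__(self, delta=0.01):
        self.delta = delta          # confidence parameter (assumed default)
        self.window = deque()

    def add(self, x):
        """Append x and shrink the window; return True if a change was found."""
        self.window.append(x)
        changed = False
        cut = self._find_cut()
        while cut is not None:
            for _ in range(cut):
                self.window.popleft()
            changed = True
            cut = self._find_cut()
        return changed

    def _find_cut(self):
        """Return the size of an old prefix whose mean differs significantly."""
        n = len(self.window)
        total = sum(self.window)
        log_term = math.log(4.0 * n / self.delta)
        head = 0.0
        for i, x in enumerate(self.window, start=1):
            if i == n:
                break
            head += x
            n0, n1 = i, n - i
            m = 1.0 / (1.0 / n0 + 1.0 / n1)          # harmonic-style mean size
            eps = math.sqrt(log_term / (2.0 * m))    # cut threshold
            if abs(head / n0 - (total - head) / n1) > eps:
                return i
        return None

detector = SimpleAdaptiveWindow(delta=0.01)
flags = [detector.add(x) for x in [0.0] * 200 + [1.0] * 200]
estimate = sum(detector.window) / len(detector.window)
```

On a stream that jumps from all zeros to all ones, the window discards the old segment shortly after the jump, so the running estimate tracks the new mean instead of averaging over both regimes.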

18 / 30

Page 25: Adaptive XML Tree Mining on Evolving Data Streams

Experimental Validation: TN1

Figure: Experiments on ordered trees with TN1 dataset (time in seconds against size in millions, for INCTREENAT and CMTreeMiner)

19 / 30

Page 26: Adaptive XML Tree Mining on Evolving Data Streams

What is MOA?

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

It is closely related to WEKA

It includes a collection of offline and online methods as well as tools for evaluation:

boosting and bagging

Hoeffding Trees

with and without Naïve Bayes classifiers at the leaves.

20 / 30

Page 27: Adaptive XML Tree Mining on Evolving Data Streams

WEKA: the bird

21 / 30

Page 28: Adaptive XML Tree Mining on Evolving Data Streams

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.

22 / 30


Page 31: Adaptive XML Tree Mining on Evolving Data Streams

Data stream classification cycle

1 Process an example at a time, and inspect it only once (at most)

2 Use a limited amount of memory

3 Work in a limited amount of time

4 Be ready to predict at any point
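The four requirements map onto a simple test-then-train loop. The sketch below is illustrative only and does not use MOA's API: `MajorityClass` is a stand-in learner and `prequential_accuracy` is a hypothetical helper that predicts on each example before learning from it.

```python
from collections import Counter

class MajorityClass:
    """Tiny stand-in learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()      # memory grows with labels, not examples

    def predict(self, x):
        if not self.counts:
            return None              # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1          # constant-time update per example

def prequential_accuracy(stream, model):
    """Test on each example first, then train on it; a single pass."""
    correct = seen = 0
    for x, y in stream:              # requirement 1: one example at a time
        if model.predict(x) == y:    # requirement 4: predict at any point
            correct += 1
        model.learn(x, y)            # requirements 2 and 3: bounded update
        seen += 1
    return correct / seen if seen else 0.0

stream = [((), "a"), ((), "a"), ((), "a"), ((), "b")]
accuracy = prequential_accuracy(stream, MajorityClass())
```

Each example is inspected once and then discarded; the model never needs the full stream in memory.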

23 / 30

Page 32: Adaptive XML Tree Mining on Evolving Data Streams

Environments and Data Sources

Environments

Sensor Network: 100Kb

Handheld Computer: 32 Mb

Server: 400 Mb

Data Sources

Random Tree Generator

Random RBF Generator

LED Generator

Waveform Generator

Function Generator

24 / 30

Page 33: Adaptive XML Tree Mining on Evolving Data Streams

Algorithms

Naive Bayes

Decision stumps

Hoeffding Tree

Hoeffding Option Tree

Bagging and Boosting

Prediction strategies

Majority class

Naive Bayes Leaves

Adaptive Hybrid

25 / 30

Page 34: Adaptive XML Tree Mining on Evolving Data Streams

Hoeffding Option Tree

Hoeffding Option Trees

Regular Hoeffding tree containing additional option nodes that allow several tests to be applied, leading to multiple Hoeffding trees as separate paths.

26 / 30

Page 35: Adaptive XML Tree Mining on Evolving Data Streams

GUI

java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.gui.TaskLauncher

27 / 30


Page 37: Adaptive XML Tree Mining on Evolving Data Streams

Ensemble Methods

http://www.cs.waikato.ac.nz/~abifet/MOA/

New ensemble methods:

ADWIN bagging: When a change is detected, the worst classifier is removed and a new classifier is added.
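The replacement step can be sketched as follows. This is not MOA's code: it assumes "worst" means the member with the highest estimated error, that error estimates are kept in a parallel list, and that a change flag comes from some external detector. The names `replace_worst` and `make_classifier` are hypothetical.

```python
def replace_worst(ensemble, error_rates, make_classifier):
    """Swap the member with the highest estimated error for a fresh one."""
    worst = max(range(len(ensemble)), key=lambda i: error_rates[i])
    ensemble[worst] = make_classifier()
    error_rates[worst] = 0.0   # assumption: a fresh member has no recorded error
    return worst

# Placeholder members; in practice these would be classifier objects.
ensemble = ["old-0", "old-1", "old-2"]
error_rates = [0.10, 0.35, 0.20]

change_detected = True         # would come from a change detector such as ADWIN
if change_detected:
    replaced = replace_worst(ensemble, error_rates, lambda: "fresh")
```

Only one member is replaced per detected change, so the ensemble keeps the classifiers that still perform well.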

Adaptive-Size Hoeffding Tree bagging

28 / 30

Page 38: Adaptive XML Tree Mining on Evolving Data Streams

XML Tree Framework on evolving data streams

                      Maximal                  Closed
           # Trees    Att.   Acc.    Mem.      Att.   Acc.    Mem.
CSLOG12    15483      84     79.64   1.2       228    78.12   2.54
CSLOG23    15037      88     79.81   1.21      243    78.77   2.75
CSLOG31    15702      86     79.94   1.25      243    77.60   2.73
CSLOG123   23111      84     80.02   1.7       228    78.91   4.18

Table: BAGGING on unordered trees.

29 / 30

Page 39: Adaptive XML Tree Mining on Evolving Data Streams

Conclusions

XML tree stream classifier system.

Using Galois Lattice Theory, we present methods for mining closed trees

Incremental

Sliding Window

Adaptive: using ADWIN to monitor change

We use MOA data stream classifiers.

30 / 30