Towards Data Mining Without Information on Knowledge Structure

Wednesday, September 19th 2007

Alexandre Vautier, Marie-Odile Cordier and René Quiniou

Université de Rennes 1
INRIA Rennes - Bretagne Atlantique

Usual KD Process

User needs:
• A data mining task
• Domain knowledge

[Figure: KD process pipeline: Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Models → Interpretation/Evaluation → Knowledge]

What can a user extract from data without domain knowledge?

Application context: Network Alarms

• Represent network alarms
• Understand network behavior
• Detect new DDoS attacks

• An alarm is composed of
  – A directed link between two IP addresses
  – A date
  – A severity (low, med, high), related to the link rate
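The alarm structure listed on this slide can be written down as a small record type. The following is a minimal sketch, not the authors' data format: the class and field names (`Alarm`, `source`, `destination`, `day`, `Severity`) are assumptions made for illustration, and the example values are made up.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Severity(Enum):
    """Alarm severity, which the slide relates to the link rate."""
    LOW = "low"
    MED = "med"
    HIGH = "high"


@dataclass(frozen=True)
class Alarm:
    """One network alarm: a directed link, a date and a severity."""
    source: str        # IP address the link starts from
    destination: str   # IP address the link points to
    day: date
    severity: Severity


# Made-up example values, not taken from the authors' dataset.
alarm = Alarm("192.168.2.1", "192.168.2.5", date(2005, 11, 1), Severity.LOW)
print(alarm)
```

Keeping the link as an ordered (source, destination) pair preserves its direction, which the generalized links and sequences shown later rely on.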

Application context: Network Alarms

[Figure: Alarms → Data Mining Algorithms → Models]

• Generalized links: M1 = {192.168.2.1 → *, * → 192.168.2.5, …}
• Sequences: M2 = {1.5.5.* → 2.2.3.* > 2.2.3.* → 1.2.3.4, …}
• Clustering on date and severity: M3 = {{11/01/05…11/03/05, low}, {11/07/05…11/15/05, high}}
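To make the generalized links of M1 concrete, here is a small sketch of how such a link could be tested against a concrete directed link. It assumes that `*` stands either for any address or for any single dotted field, which is one plausible reading of the notation; the helper names `address_matches` and `covers` and the concrete links are hypothetical.

```python
from typing import Tuple

Link = Tuple[str, str]  # (source, destination)


def address_matches(pattern: str, address: str) -> bool:
    """True if `address` fits `pattern`, where '*' stands for any address
    or for any single dotted field (e.g. '1.5.5.*')."""
    if pattern == "*":
        return True
    p_fields, a_fields = pattern.split("."), address.split(".")
    return len(p_fields) == len(a_fields) and all(
        p == "*" or p == a for p, a in zip(p_fields, a_fields)
    )


def covers(generalized: Link, concrete: Link) -> bool:
    """True if a generalized link covers a concrete directed link."""
    return (address_matches(generalized[0], concrete[0])
            and address_matches(generalized[1], concrete[1]))


# The two generalized links shown for M1 on the slide.
m1 = [("192.168.2.1", "*"), ("*", "192.168.2.5")]

# Made-up concrete links, for illustration only.
links = [("192.168.2.1", "10.0.0.7"), ("10.0.0.9", "192.168.2.5"), ("10.0.0.9", "10.0.0.7")]
covered = [link for link in links if any(covers(g, link) for g in m1)]
```

Here the first two links are covered by M1 and the third is not, which is the kind of model-to-data relation the later slides call a covering relation.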

Objectives

• Goal: search for models that fit the given data
  – Current assumption: the user has sufficient knowledge to
    • define the type of model
    • choose the relevant DM algorithm
  – Our proposal: relax this assumption by
    • automatically executing DM algorithms to extract models from the data
    • evaluating the resulting models in a generic manner, so as to propose the “best suited” model(s) to the user

Framework

1. DM algorithm specifications
2. Data specification
3. Unification of specifications
4. Model extraction
5. Generic evaluation
6. Model ranking
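The last three steps of the framework (model extraction, generic evaluation, model ranking) can be pictured as a small loop. This is only a sketch under strong assumptions: specification and unification are taken as already done, every algorithm is modeled as a plain function from data to a model, and the names `extract_and_rank`, `toy_algorithms` and `toy_score` are hypothetical stand-ins rather than anything from the authors' system.

```python
from typing import Any, Callable, Dict, List, Tuple

Data = List[Any]
Model = Any
Algorithm = Callable[[Data], Model]
Score = Callable[[Data, Model], float]


def extract_and_rank(data: Data,
                     algorithms: Dict[str, Algorithm],
                     score: Score) -> List[Tuple[str, float]]:
    """Run every algorithm on the data, score each model, sort by score.

    Lower scores come first, as for a complexity-style measure.
    """
    scored = []
    for name, algorithm in algorithms.items():
        model = algorithm(data)                      # model extraction
        scored.append((name, score(data, model)))    # generic evaluation
    return sorted(scored, key=lambda pair: pair[1])  # model ranking


# Toy stand-ins, purely illustrative: neither is one of the authors' algorithms.
toy_algorithms = {
    "distinct_values": lambda d: set(d),
    "min_max": lambda d: (min(d), max(d)),
}


def toy_score(data: Data, model: Model) -> float:
    """Placeholder score: the size of the model."""
    return float(len(model))


ranking = extract_and_rank([3, 1, 3, 7, 1], toy_algorithms, toy_score)
```

The point of the design is that the score function is the same for every algorithm, so models of different kinds end up on a single ranked list.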

Schemas for specification

• Enhanced algebraic specifications (Types, operations and equations)

• Category theory [Mac Lane 1942]
  – Sketch [Ehresmann 1965]

• Use specification inheritance

Data specification: Network Alarm Schema

• Node: a type
• Edge:
  – A function
  – A relation
• Green dotted edge: projection ⇒ Cartesian product
• Red dashed edge: inclusion ⇒ union
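The legend above describes a schema as a graph whose nodes are types and whose edges are functions, relations, projections or inclusions. A minimal sketch of such a structure follows. The class names and the node names (`Alarm`, `Link`, `IP`, `Date`, `Severity`) are guesses based on the alarm description, since the schema drawn on the slide is not part of this transcript.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class EdgeKind(Enum):
    FUNCTION = "function"
    RELATION = "relation"
    PROJECTION = "projection"   # green dotted edge: source is a Cartesian product
    INCLUSION = "inclusion"     # red dashed edge: target is a union


@dataclass(frozen=True)
class Edge:
    name: str
    source: str
    target: str
    kind: EdgeKind


@dataclass
class Schema:
    nodes: Set[str] = field(default_factory=set)
    edges: List[Edge] = field(default_factory=list)

    def add_edge(self, name: str, source: str, target: str, kind: EdgeKind) -> None:
        """Add an edge and register both of its end types as nodes."""
        self.nodes.update((source, target))
        self.edges.append(Edge(name, source, target, kind))

    def product_factors(self, node: str) -> List[str]:
        """Types whose Cartesian product `node` is, read from its projections."""
        return [e.target for e in self.edges
                if e.source == node and e.kind is EdgeKind.PROJECTION]


# Hypothetical node and edge names, for illustration only.
alarms = Schema()
alarms.add_edge("link", "Alarm", "Link", EdgeKind.PROJECTION)
alarms.add_edge("date", "Alarm", "Date", EdgeKind.PROJECTION)
alarms.add_edge("severity", "Alarm", "Severity", EdgeKind.PROJECTION)
alarms.add_edge("src", "Link", "IP", EdgeKind.PROJECTION)
alarms.add_edge("dst", "Link", "IP", EdgeKind.PROJECTION)
```

Under these assumptions, `Alarm` reads as the product Link × Date × Severity and `Link` as IP × IP, matching the "directed link between two IP addresses" of the application slide.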

DM algorithm specification: Generalized edges

[Figure: schema of the generalized edges algorithm; labeled parts: covering relation, model type, DM algorithm]

Schema unification

[Figure: a DM algorithm schema (abstract data types) to be unified with the data schema (data types)]
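One way to picture unification is as a search for a mapping from the abstract types of an algorithm schema to the concrete types of the data schema that preserves the edges. The sketch below is a brute-force illustration of that idea and is not the authors' procedure: it assumes injective mappings, untyped edges, and hypothetical schemas and names throughout.

```python
from itertools import permutations
from typing import Dict, Iterator, List, Set, Tuple

Edge = Tuple[str, str]  # (source type, target type)


def unifications(abstract_nodes: List[str], abstract_edges: Set[Edge],
                 data_nodes: List[str], data_edges: Set[Edge]
                 ) -> Iterator[Dict[str, str]]:
    """Yield every injective node mapping that sends each abstract edge
    onto an existing data edge (brute force over all candidate mappings)."""
    for image in permutations(data_nodes, len(abstract_nodes)):
        mapping = dict(zip(abstract_nodes, image))
        if all((mapping[s], mapping[t]) in data_edges for s, t in abstract_edges):
            yield mapping


# Hypothetical schemas, for illustration only.
algo_nodes = ["Graph", "Vertex"]
algo_edges = {("Graph", "Vertex")}
data_nodes = ["Alarm", "Link", "IP", "Date"]
data_edges = {("Alarm", "Link"), ("Alarm", "Date"), ("Link", "IP")}

found = list(unifications(algo_nodes, algo_edges, data_nodes, data_edges))
```

Even in this toy form the number of candidate mappings grows very quickly with the number of nodes, which is consistent with the cost of unification noted on the Discussions slide.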

Generic evaluation

• Compare different kinds of models
• Inspired by Kolmogorov complexity

The complexity of an object x is the size s(p) of the shortest program p that outputs x when executed on a universal machine f:

$C_f(x) = \min \{\, s(p) \mid f(p) = x \,\}$
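Kolmogorov complexity itself is not computable, so any practical measure has to approximate it. As a rough intuition only, the sketch below uses the length of a zlib-compressed encoding as a computable stand-in. This proxy is an assumption made here for illustration; the slides do not say that the authors use a compressor.

```python
import random
import zlib


def compressed_size(x: bytes) -> int:
    """Size in bytes of a zlib-compressed encoding of x.

    This is only a computable stand-in: it measures x relative to one
    particular compressor, not its true Kolmogorov complexity.
    """
    return len(zlib.compress(x, 9))


regular = b"ab" * 500                                        # highly regular, 1000 bytes
rng = random.Random(0)                                       # fixed seed for repeatability
irregular = bytes(rng.getrandbits(8) for _ in range(1000))   # pseudo-random, 1000 bytes

print(compressed_size(regular), compressed_size(irregular))
```

The regular string has a much shorter description than the pseudo-random one of the same length, which is the intuition behind scoring models by how compactly they let the data be described.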

Generic evaluation

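• Complexity of data d in a schema S relative to a model m, with covering relation c: M ↔ D:

$K(d, m, S) = k(M) + k(D) + k(c) + k(m \mid M) + k(d \mid m, c, D)$

where
  – k(M): complexity of the model structure
  – k(D): complexity of the data structure
  – k(c): complexity of the covering relation
  – k(m|M): complexity of the model
  – k(d|m,c,D): complexity of the data knowing …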
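The sum above behaves like a two-part description length: a richer model costs more to describe but can leave less of the data to describe. The sketch below simply adds the five terms for two candidate models. The class name `ComplexityTerms` is hypothetical and all numbers are made-up values chosen to show the trade-off; they are not results from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexityTerms:
    """The five terms of K(d, m, S), all in the same unit (e.g. bits)."""
    model_structure: float      # k(M)
    data_structure: float       # k(D)
    covering_relation: float    # k(c)
    model: float                # k(m|M)
    data_given_model: float     # k(d|m,c,D)

    def total(self) -> float:
        """K(d, m, S) as the plain sum of the five terms."""
        return (self.model_structure + self.data_structure
                + self.covering_relation + self.model + self.data_given_model)


# Made-up values: a small model that explains little, a larger one that explains more.
candidates = {
    "small_model": ComplexityTerms(10, 20, 5, 30, 900),
    "larger_model": ComplexityTerms(10, 20, 5, 200, 400),
}
best = min(candidates, key=lambda name: candidates[name].total())
```

With these illustrative numbers the larger model wins (635 against 965), because the extra cost of describing it is more than repaid by the shorter description of the data.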

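Path Indexing Covering Relation Decomposition

Null decomposition

[Figure: model m in M; covering relation c: M ↔ D; covered data c(m) and data d in D]

$k(d \mid m, c, D) = k(d \mid c(m)) + k(d \setminus c(m) \mid D)$

Decomposition relying on relation composition

[Figure: model m in M; t: M ↔ A with t(m) and a in A; s: A ↔ D with s(a) and d in D; c = s ∘ t: M ↔ D and c(m) = s ∘ t(m)]

$k(d \mid m, s \circ t, D) = k(a \mid t(m)) + k(d \mid s(a)) + k(d \setminus s(a) \mid D)$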
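The null decomposition splits the data into the part found in c(m) and the remainder d \ c(m), which has to be described directly in D. The sketch below works through one toy instance. It assumes a uniform index code, so that pointing at one element of a set of size n costs log2(n) bits; that coding choice, the function names and the data are assumptions for illustration, not the authors' definition of k.

```python
from math import log2
from typing import Set


def index_bits(count: int, universe_size: int) -> float:
    """Bits needed to point at `count` items, each by a uniform index
    into a set of `universe_size` elements."""
    return count * log2(universe_size) if count and universe_size > 1 else 0.0


def null_decomposition_cost(d: Set[str], covered: Set[str], domain: Set[str]) -> float:
    """k(d|c(m)) + k(d \\ c(m)|D) under the uniform-index assumption."""
    explained = d & covered           # part of d described through c(m)
    residual = d - covered            # part of d described directly in D
    return index_bits(len(explained), len(covered)) + index_bits(len(residual), len(domain))


domain = {f"x{i}" for i in range(16)}          # D: 16 possible data items
d = {"x0", "x1", "x2", "x3", "x9"}             # observed data
covered = {"x0", "x1", "x2", "x3"}             # c(m): what the model covers

with_model = null_decomposition_cost(d, covered, domain)
without_model = null_decomposition_cost(d, set(), domain)
```

In this toy case four items are indexed in c(m) at 2 bits each and one residual item in D at 4 bits, giving 12 bits, against 20 bits when nothing is covered. A model therefore lowers k(d|m,c,D) when c(m) is small and overlaps d well.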

Experiments

• Extraction of clusters, generalized edges, and sequences
  – Dataset: 10,000 alarms
  – Duration: 400 seconds (excluding the running time of the DM algorithms)
  – 6 operational algorithms
• Experiments on datasets generated by models
• Network alarms from a real network

Discussions

• Unification:
  – Exponential in time with respect to the number of nodes in a schema
• Generic evaluation:
  – Linear in time and space
• Adapt the evaluation method
  – User defined
  – According to a model visualization
  – According to local data instead of global data

What do schemas bring to Data Mining?

• Describe data and DM algorithms in a common language
• Allow the data structure to be unified with the input of DM algorithms
• Provide a way to compute the model complexity relative to a type in a schema
• Provide a way to compute the data complexity relative to
  – A model
  – A covering relation and its decomposition
• Are implementable in an efficient manner

Towards Data Mining Without Information on Knowledge Structure

Thank you!

Alexandre Vautier, Marie-Odile Cordier and René Quiniou

INRIA Rennes - Bretagne Atlantique
Université de Rennes 1