Towards Data Mining Without Information on Knowledge Structure
Wednesday, September 19th 2007
Alexandre Vautier, Marie-Odile Cordier and René Quiniou
Université de Rennes 1
INRIA Rennes - Bretagne Atlantique
Vautier et al . – Towards Data Mining Without Information on Knowledge Structure 2
Usual KD Process
User needs:
• A data mining task
• Domain knowledge
[Figure: KD process. Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Models → Interpretation/Evaluation → Knowledge]
What can a user extract from data without domain knowledge?
Application context: Network Alarms
• Represent network alarms
• Understand network behavior
• Detect new DDoS attacks
• An alarm is composed of
– A directed link between two IP addresses
– A date
– A severity (low, med, high) (related to the link rate)
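The alarm structure described on this slide can be sketched as a small record type. This is only an illustration: the names `Alarm`, `Severity`, `source` and `target`, and the sample values, are assumptions and do not come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Severity(Enum):
    """The three severity levels listed on the slide."""
    LOW = "low"
    MED = "med"
    HIGH = "high"


@dataclass(frozen=True)
class Alarm:
    """One network alarm (field names are hypothetical)."""
    source: str         # IP address the directed link starts from
    target: str         # IP address the directed link points to
    date: datetime      # when the alarm was raised
    severity: Severity  # related to the link rate


# Sample alarm with made-up values.
alarm = Alarm("192.168.2.1", "192.168.2.5", datetime(2005, 11, 1), Severity.LOW)
```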
Application context: Network Alarms
Alarms → Data Mining Algorithms → Models
• Generalized links: M1 = {192.168.2.1 → *, * → 192.168.2.5, …}
• Sequences: M2 = {1.5.5.* → 2.2.3.* > 2.2.3.* → 1.2.3.4, …}
• Clustering on date and severity: M3 = {{11/01/05…11/03/05, low}, {11/07/05…11/15/05, high}}
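To make the generalized-link model M1 concrete, here is a minimal sketch of how a wildcard link could be matched against alarm links. The functions `ip_matches` and `covers` are hypothetical, and the wildcard semantics (`*` stands for any address, or for all remaining octets) is an assumption read off the notation on the slide, not the authors' definition.

```python
from typing import List, Tuple

Link = Tuple[str, str]  # (source pattern or IP, target pattern or IP)


def ip_matches(pattern: str, ip: str) -> bool:
    """True if `ip` matches `pattern`; '*' matches everything from its position on."""
    pattern_parts = pattern.split(".")
    ip_parts = ip.split(".")
    for index, part in enumerate(pattern_parts):
        if part == "*":
            return True  # wildcard covers this octet and all following ones
        if index >= len(ip_parts) or part != ip_parts[index]:
            return False
    return len(pattern_parts) == len(ip_parts)


def covers(generalized: Link, concrete: Link) -> bool:
    """True if both ends of the concrete link match the generalized link."""
    return ip_matches(generalized[0], concrete[0]) and ip_matches(generalized[1], concrete[1])


# M1 from the slide, applied to three made-up alarm links.
m1: List[Link] = [("192.168.2.1", "*"), ("*", "192.168.2.5")]
alarm_links: List[Link] = [
    ("192.168.2.1", "10.0.0.7"),
    ("10.0.0.3", "192.168.2.5"),
    ("10.0.0.3", "10.0.0.7"),
]
covered = [link for link in alarm_links if any(covers(g, link) for g in m1)]
```

Under these assumptions, the first two links are covered by M1 and the third is not.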
Objectives
• Goal: search for models that fit the given data
– Current assumption: the user has sufficient knowledge to
• define the type of model
• choose the relevant DM algorithm
– Our proposition: relax this assumption by
• automatically executing DM algorithms to extract models from data
• evaluating the resulting models in a generic manner, so as to propose the “best suited” model(s) to the user
Framework
① DM algorithm specifications
② Data specification
③ Unification of specifications
④ Model extraction
⑤ Generic evaluation
⑥ Model ranking
Schemas for specification
• Enhanced algebraic specifications (Types, operations and equations)
• Category theory [Mac Lane 1942]
– Sketch [Ehresmann 1965]
• Use specification inheritance
Data specification: Network Alarm Schema
• Node: a type
• Edge:
– A function
– A relation
• Green dotted edge: projection ⇒ Cartesian product
• Red dashed edge: inclusion ⇒ union
DM algorithm specification: Generalized edges
[Figure: schema of the generalized edges algorithm, with labels: covering relation, model type, DM algorithm]
Schema unification
Abstract Data Type
Data Type
Generic evaluation
• Compare different kinds of model
• Inspired by Kolmogorov complexity
The complexity of an object x is the size s(p) of the shortest program p that outputs x when executed on a universal machine f:
C_f(x) = min { s(p) | f(p) = x }
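Kolmogorov complexity itself is not computable, so any implementation has to rely on a computable stand-in. As an illustration only (this is not the evaluation used in the talk), the length of a compressed encoding gives an upper-bound style proxy: regular data compresses well, irregular data does not. The function name `approx_complexity` is hypothetical.

```python
import random
import zlib


def approx_complexity(x: bytes) -> int:
    """Size in bytes of the zlib-compressed form of x (a crude, computable proxy)."""
    return len(zlib.compress(x, 9))


# A highly regular object and a pseudo-random one of the same length.
regular = b"ab" * 500
rng = random.Random(0)
irregular = bytes(rng.getrandbits(8) for _ in range(1000))

regular_size = approx_complexity(regular)
irregular_size = approx_complexity(irregular)
```

The regular string gets a much shorter description than the pseudo-random one, which mirrors the intuition behind the definition above.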
• Complexity of data d in a schema S relative to a model m (c: M ↔ D):
K(d,m,S) = k(M)          complexity of the model structure
         + k(D)          complexity of the data structure
         + k(c)          complexity of the covering relation
         + k(m|M)        complexity of the model
         + k(d|m,c,D)    complexity of the data knowing …
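The five-term sum above lends itself to a simple ranking step. The sketch below assumes, in the spirit of minimum description length, that a lower total means a better-suited model; the slides do not state the ranking rule explicitly. The class name `ComplexityTerms` is hypothetical and the numeric values are arbitrary placeholders, not results from the talk.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ComplexityTerms:
    """The five terms of K(d,m,S), here in arbitrary units."""
    model_structure: float    # k(M)
    data_structure: float     # k(D)
    covering_relation: float  # k(c)
    model: float              # k(m|M)
    data_given_model: float   # k(d|m,c,D)

    def total(self) -> float:
        """K(d,m,S): the sum of the five terms."""
        return sum(astuple(self))


# Placeholder values, chosen only to illustrate the ranking step.
candidates = {
    "generalized links": ComplexityTerms(10, 20, 5, 40, 300),
    "sequences": ComplexityTerms(15, 20, 8, 60, 350),
    "clustering": ComplexityTerms(8, 20, 4, 30, 500),
}

# Assumption: the smallest total complexity ranks first.
ranking = sorted(candidates, key=lambda name: candidates[name].total())
```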
Path Indexing Covering Relation Decomposition
Null decomposition
[Diagram labels: m, M, D, c: M ↔ D, c(m), d]
k(d|m,c,D) = k(d|c(m)) + k(d\c(m)|D)
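The null decomposition splits the data into the part covered by c(m) and the remainder d\c(m). A minimal sketch of that split follows, under a strong assumption that is not taken from the slides: each data element is encoded by a uniform index, costing log2 |c(m)| bits when it lies in c(m) and log2 |D| bits otherwise. The function name and the sample sets are hypothetical.

```python
import math
from typing import Iterable, Set


def null_decomposition_cost(d: Iterable[int], covered: Set[int], domain_size: int) -> float:
    """k(d|c(m)) + k(d minus c(m) | D) under a uniform index code (assumption)."""
    data = list(d)
    inside = [x for x in data if x in covered]       # elements explained by the model
    outside = [x for x in data if x not in covered]  # elements the model misses
    k_inside = len(inside) * math.log2(len(covered)) if inside else 0.0
    k_outside = len(outside) * math.log2(domain_size)
    return k_inside + k_outside


# Made-up example: a domain of 1024 elements, a model covering 8 of them.
covered_by_model = set(range(8))
data = [0, 1, 2, 3, 500]

with_model = null_decomposition_cost(data, covered_by_model, 1024)
without_model = null_decomposition_cost(data, set(), 1024)
```

Here four elements cost 3 bits each and one costs 10 bits, so the data is cheaper to describe with the model than without it.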
Decomposition relying on relation composition
[Diagram labels: m, M, A, D, t: M ↔ A, s: A ↔ D, t(m), d]
c = s ∘ t: M ↔ D
c(m) = s ∘ t(m)
k(d|m, s ∘ t, D) = k(a|t(m)) + k(d|s(a)) + k(d\s(a)|D)
[Diagram labels: a, s(a)]
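The composition of relations used in this decomposition can be illustrated with relations stored as sets of pairs. This is a sketch under that representation only; the helper names `compose` and `image` and the sample elements are hypothetical.

```python
from typing import Hashable, Set, Tuple

Relation = Set[Tuple[Hashable, Hashable]]


def compose(s: Relation, t: Relation) -> Relation:
    """s ∘ t: pairs (m, d) such that some a has (m, a) in t and (a, d) in s."""
    return {(m, d) for (m, a) in t for (a2, d) in s if a == a2}


def image(relation: Relation, elements: Set[Hashable]) -> Set[Hashable]:
    """All right-hand elements related to at least one element of `elements`."""
    return {y for (x, y) in relation if x in elements}


# Made-up relations: t maps a model to intermediate elements of A, s maps A to data in D.
t: Relation = {("m1", "a1"), ("m1", "a2")}
s: Relation = {("a1", "d1"), ("a1", "d2"), ("a2", "d3")}

c = compose(s, t)
covered_directly = image(c, {"m1"})
covered_in_two_steps = image(s, image(t, {"m1"}))
```

Both ways of computing c(m) give the same set, which is what allows the cost of d to be split along the intermediate type A.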
Experiments
• Extraction of clusters, generalized edges, and sequences
– Dataset: 10,000 alarms
– Duration: 400 seconds (excluding the running time of the DM algorithms)
– 6 operational algorithms
• Experiments on datasets generated by models
• Network alarms from a real network
Discussions
• Unification:
– Exponential in time with respect to the number of nodes in a schema
• Generic evaluation
– Linear in time and space
• Adapt the evaluation method
– User defined
– According to a model visualization
– According to local data instead of global data
What do schemas bring to Data Mining?
• Describe data and DM algorithms in a common language
• Allow the data structure to be unified with the input of DM algorithms
• Provide a way to compute the model complexity relative to a type in a schema
• Provide a way to compute the data complexity relative to
– A model
– A covering relation and its decomposition
• Can be implemented efficiently
Thank you!