Similarity Features, and their Role in Concept Alignment Learning
Shenghui Wang¹, Gwenn Englebienne², Christophe Guéret¹, Stefan Schlobach¹, Antoine Isaac¹, Martijn Schut¹
¹ Vrije Universiteit Amsterdam   ² Universiteit van Amsterdam
SEMAPRO 2010, Florence
Outline
1 Introduction: Classification of concept mappings based on instance similarity
2 Three classifiers: Markov Random Field; Multi-objective Evolution Strategy; Support Vector Machine
3 Experiments and results
4 Summary
Thesaurus mapping
SemanTic Interoperability To access Cultural Heritage (STITCH) through mappings between thesauri
Scope of the problem:
Big thesauri with tens of thousands of concepts
Huge collections (e.g., National Library of the Netherlands: 80 km of books in one collection)
Heterogeneous (e.g., books, manuscripts, illustrations, etc.)
Multi-lingual problem
Solving matching problems is one step towards solving the interoperability problem.
e.g., “plankzeilen” vs. “surfsport”
e.g., “archeology” vs. “excavation”
Automatic alignment techniques
Lexical
labels and textual information of entities
Structural
structure of the formal definitions of entities, position in the hierarchy
Extensional
statistical information of instances, i.e., objects indexed with entities
Background knowledge
using a shared conceptual reference to find links indirectly
Instance-based techniques: common instance based
Pros and cons
Advantages
Simple to implement
Interesting results
Disadvantages
Requires sufficient amounts of common instances
Only uses part of the available information
Instance-based techniques: Instance similarity based
Representing concepts and the similarity between them
[Figure: each concept's instances are described per metadata field (Creator, Title, Publisher, ...) as bags of words; these are aggregated into per-field term-count vectors for each concept (instance features to concept features), and each pair feature (f1, f2, f3, ...) is the cosine distance between Concept 1's and Concept 2's vectors for one field.]
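The per-field comparison in the figure can be sketched as follows; the field names, term counts, and the use of cosine similarity over raw counts are illustrative, not the actual STITCH data or pipeline.

```python
# Sketch: one pair feature per metadata field, computed as the cosine
# similarity between the two concepts' aggregated term-count vectors.
import math

def cosine(u, v):
    """Cosine similarity between two term-count dictionaries."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Term counts per field, aggregated over each concept's instances (toy values).
concept1 = {
    "Creator":   {"term1": 4, "term2": 1},
    "Title":     {"term2": 3},
    "Publisher": {"term1": 2, "term2": 1, "term3": 3},
}
concept2 = {
    "Creator":   {"term1": 2},
    "Title":     {"term2": 4, "term3": 1},
    "Publisher": {"term1": 4, "term2": 1, "term3": 1},
}

# One similarity feature (f1, f2, f3, ...) per shared field.
features = [cosine(concept1[f], concept2[f]) for f in ("Creator", "Title", "Publisher")]
print(features)
```

Each concept pair is thus reduced to a fixed-length feature vector for the classifiers to consume.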
Classification of concept mappings based on instance similarity
Classification based on instance similarity
Each pair of concepts is treated as a point in a “similarity space”.
Its position is defined by the features of the pair: the different measures of similarity between the concepts’ instances.
Hypothesis: the label of a point, which represents whether the pair is a positive mapping or a negative one, is correlated with the position of this point in this space.
Given already labelled points and the actual similarity values of the concepts involved, it is possible to classify a new point, i.e., to give it the right label, based on its location as given by its actual similarity values.
Research questions
How do different classifiers perform on this instance-based mapping task?
What are the benefits of using a machine learning algorithm to determine the importance of features?
Are there regularities w.r.t. the relative importance given to specific features for similarity computation? Are these weights related to application data characteristics?
Three classifiers used
Markov Random Field (MRF)
Evolutionary Strategy (ES)
Support Vector Machine (SVM)
Markov Random Field
Let $T = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ be the training set:
$\mathbf{x}^{(i)} \in \mathbb{R}^K$, the features; $y^{(i)} \in Y = \{\text{positive}, \text{negative}\}$, the label.
The conditional probability of a label given the input is modelled as
$$p(y^{(i)} \mid \mathbf{x}^{(i)}, \theta) = \frac{1}{Z(\mathbf{x}^{(i)}, \theta)} \exp\left( \sum_{j=1}^{K} \lambda_j \phi_j(y^{(i)}, \mathbf{x}^{(i)}) \right), \quad (1)$$
where $\theta = \{\lambda_j\}_{j=1}^{K}$ are the weights associated with the feature functions $\phi$ and $Z(\mathbf{x}^{(i)}, \theta)$ is a normalisation constant.
The classifier used: Markov Random Field (cont’)
The likelihood of the data set for given model parameters, $p(T \mid \theta)$, is given by:
$$p(T \mid \theta) = \prod_{i=1}^{N} p(y^{(i)} \mid \mathbf{x}^{(i)}) \quad (2)$$
During learning, our objective is to find the most likely values for $\theta$ for the given training data.
The decision criterion for assigning a label $y^{(i)}$ to a new pair of concepts $i$ is then simply given by:
$$y^{(i)} = \operatorname*{argmax}_{y} \; p(y \mid \mathbf{x}^{(i)}) \quad (3)$$
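With two labels, the log-linear model of Eqs. (1)-(3) can be sketched directly; the feature functions, weights, and feature values below are made up for illustration.

```python
# Minimal sketch of the conditional model p(y | x, theta) and the
# argmax decision rule; weights and features are illustrative only.
import math

LABELS = ("positive", "negative")

def phi(j, y, x):
    """Illustrative feature function: x[j] for 'positive', -x[j] for 'negative'."""
    return x[j] if y == "positive" else -x[j]

def p_label(y, x, lambdas):
    """Eq. (1): exp of the weighted feature sum, normalised by Z over all labels."""
    score = lambda y_: math.exp(sum(l * phi(j, y_, x) for j, l in enumerate(lambdas)))
    return score(y) / sum(score(y_) for y_ in LABELS)

def classify(x, lambdas):
    """Eq. (3): pick the label with the highest conditional probability."""
    return max(LABELS, key=lambda y: p_label(y, x, lambdas))

lambdas = [1.5, 0.5, 2.0]    # weights theta (in practice, learned from T)
x = [0.9, 0.2, 0.8]          # similarity features of one concept pair
print(classify(x, lambdas))  # high similarities push towards "positive"
```

In practice the weights would be fitted by maximising Eq. (2) over the training set, e.g. by gradient ascent.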
Multi-objective Evolution Strategy
Evolutionary strategies (ES) have two characteristic properties: firstly, they are used for continuous value optimisation, and, secondly, they are self-adaptive.
An ES individual is a direct model of the searched solution, defined by $\Lambda$ and some evolution strategy parameters:
$$\langle \Lambda, \Sigma \rangle \leftrightarrow \langle \lambda_1, \ldots, \lambda_K, \sigma_1, \ldots, \sigma_K \rangle \quad (4)$$
The fitness function is related to the decision criterion for the ES, which is sign-based:
$$L^{ES}_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \\ 0 & \text{otherwise} \end{cases} \quad (5)$$
Multi-objective Evolution Strategy (cont’)
Maximising the number of correctly classified positive results and correctly classified negative results are two opposing goals.
$$f_1(\Lambda \mid F, L) = \#\left\{ F_i \;\middle|\; \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \;\land\; L_i = 1 \right\} \quad (6)$$
$$f_2(\Lambda \mid F, L) = \#\left\{ F_i \;\middle|\; \sum_{j=1}^{K} \lambda_j F_{ij} \le 0 \;\land\; L_i = 0 \right\} \quad (7)$$
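The sign-based decision of Eq. (5) and the two objective counts of Eqs. (6) and (7) can be sketched as follows, on toy pair features and labels:

```python
# Sketch of Eqs. (5)-(7): sign-based labelling and the two opposing
# fitness objectives (true positives vs. true negatives). Toy data only.

def decide(lambdas, Fi):
    """Eq. (5): label 1 iff the weighted feature sum is positive."""
    return 1 if sum(l * f for l, f in zip(lambdas, Fi)) > 0 else 0

def f1(lambdas, F, L):
    """Eq. (6): count of correctly recognised positive mappings."""
    return sum(1 for Fi, Li in zip(F, L) if decide(lambdas, Fi) == 1 and Li == 1)

def f2(lambdas, F, L):
    """Eq. (7): count of correctly recognised negative mappings."""
    return sum(1 for Fi, Li in zip(F, L) if decide(lambdas, Fi) == 0 and Li == 0)

F = [[0.9, 0.8], [0.7, 0.6], [-0.2, 0.1], [-0.5, -0.3]]  # toy pair features
L = [1, 1, 0, 0]                                          # true labels
lambdas = [1.0, 0.5]                                      # one candidate Lambda
print(f1(lambdas, F, L), f2(lambdas, F, L))               # → 2 2
```

A multi-objective ES searches for the Pareto front over (f1, f2) rather than optimising a single weighted score.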
Multi-objective Evolution Strategy (cont’)
Evolution process
Recombination: two parent individuals are combined using different weightings, producing two new individuals
Mutation: one parent individual changes itself into a new child individual
Survivor selection: NSGA-II
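The variation operators above can be sketched as follows; the specific recombination weighting and the log-normal self-adaptive mutation scheme are standard ES choices assumed here, not taken from the slides.

```python
# Sketch of ES variation: weighted recombination of two parents and
# self-adaptive Gaussian mutation of <Lambda, Sigma> individuals.
# The operator parameters (w, tau) are assumptions for illustration.
import math
import random

def recombine(p1, p2, w=0.5):
    """Combine two parent weight vectors with weights w and (1 - w)."""
    c1 = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
    c2 = [(1 - w) * a + w * b for a, b in zip(p1, p2)]
    return c1, c2

def mutate(lambdas, sigmas, tau=0.1):
    """Self-adapt the step sizes first, then perturb the weights."""
    new_sigmas = [s * math.exp(tau * random.gauss(0, 1)) for s in sigmas]
    new_lambdas = [l + s * random.gauss(0, 1) for l, s in zip(lambdas, new_sigmas)]
    return new_lambdas, new_sigmas

random.seed(0)
parent1, parent2 = [1.0, 0.5, -0.2], [0.4, 0.9, 0.1]
child1, child2 = recombine(parent1, parent2)
print(child1)   # midpoint of the two parents for w = 0.5
```

Survivor selection with NSGA-II then keeps the non-dominated individuals across the two objectives.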
Support Vector Machine
A Support Vector Machine (SVM) is used as a maximum-margin classifier whose task consists in finding a hyperplane separating the two classes.
The objective is to maximise the margin separating the two classes whilst minimising the risk of classification error.
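As a sketch of this separation on pair-similarity features, a linear SVM can be trained on toy data; scikit-learn is an assumption here, as the slides do not name an implementation.

```python
# Sketch: maximum-margin separation of positive and negative concept
# pairs in similarity space, using scikit-learn's SVC (an assumption).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.uniform(0.6, 1.0, size=(20, 3))   # positive mappings: high similarities
X_neg = rng.uniform(0.0, 0.4, size=(20, 3))   # negative mappings: low similarities
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [0] * 20)

clf = SVC(kernel="linear", C=1.0)  # linear maximum-margin hyperplane
clf.fit(X, y)
print(clf.predict([[0.9, 0.8, 0.7], [0.1, 0.2, 0.1]]))  # → [1 0]
```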
Quality of learning
[Figure: grouped bar charts of Precision, Recall and F-measure (scale 0 to 1) for the four classifiers MRF 1-30, MRF 3-30, ES and SVM.]
Figure: Precision, recall and F-measure for mappings with a positive label (top) and a negative label (bottom). Error bars indicate one standard deviation over the 10 folds of cross-validation.
Relative importance of features
Which features of our instances are important for mapping?
Figure: Mutual information between features and labels
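The quantity plotted in this figure can be estimated as below; the toy binary features stand in for discretised similarity values, and the feature data are made up.

```python
# Sketch: mutual information I(X; Y) between a discretised feature and
# the mapping labels, from empirical joint and marginal frequencies.
import math

def mutual_information(xs, ys):
    """I(X;Y) = sum_{x,y} p(x,y) * log(p(x,y) / (p(x) * p(y))), in nats."""
    n = len(xs)
    mi = 0.0
    for xv in set(xs):
        for yv in set(ys):
            pxy = sum(1 for x, y in zip(xs, ys) if x == xv and y == yv) / n
            px = sum(1 for x in xs if x == xv) / n
            py = sum(1 for y in ys if y == yv) / n
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * py))
    return mi

labels = [1, 1, 0, 0]
feat_a = [1, 1, 0, 0]   # perfectly informative about the labels
feat_b = [1, 0, 1, 0]   # statistically independent of the labels
print(mutual_information(feat_a, labels))  # → log(2) ≈ 0.693
print(mutual_information(feat_b, labels))  # → 0.0
```

High mutual information marks a feature as informative for the mapping decision, which is what the comparisons below examine.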
ES lambdas are not really conclusive
ES lambdas that are most inconclusive correspond to the least informative features
Important features in terms of mutual information are associated with large MRF weights
A more detailed analysis
Expected important features:
Label similarity (1), instance overlap (2), subject (28), etc.
Expected unimportant features:
Size of the book (16), format description (17) and language (22), etc.
Surprisingly important features:
Date (14)
Surprisingly unimportant features:
Description (15) and abstract (8)
Summary
We tried three machine learning classifiers on the instance-based mapping task; among these, MRF and ES can automatically identify meaningful features.
The MRF and the ES result in a performance in the neighbourhood of 90%, showing the validity of the approach.
Our analysis suggests that when many different description features interact, there is no systematic correlation between what a learning method could find and what an application expert may anticipate.