Similarity Features, and their Role in Concept Alignment Learning
Shenghui Wang¹, Gwenn Englebienne², Christophe Guéret¹, Stefan Schlobach¹, Antoine Isaac¹, Martijn Schut¹
¹ Vrije Universiteit Amsterdam   ² Universiteit van Amsterdam
SEMAPRO 2010, Florence
Outline
1 Introduction: Classification of concept mappings based on instance similarity
2 Three classifiers: Markov Random Field; Multi-objective Evolution Strategy; Support Vector Machine
3 Experiments and results
4 Summary
Thesaurus mapping
SemanTic Interoperability To access Cultural Heritage (STITCH) through mappings between thesauri
Scope of the problem:
Big thesauri with tens of thousands of concepts
Huge collections (e.g., National Library of the Netherlands: 80 km of books in one collection)
Heterogeneous (e.g., books, manuscripts, illustrations, etc.)
Multi-lingual problem
Solving matching problems is one step towards solving the interoperability problem.
e.g., “plankzeilen” vs. “surfsport”
e.g., “archeology” vs. “excavation”
Automatic alignment techniques
Lexical
labels and textual information of entities
Structural
structure of the formal definitions of entities, position in the hierarchy
Extensional
statistical information of instances, i.e., objects indexed with entities
Background knowledge
using a shared conceptual reference to find links indirectly
Instance-based techniques: common instance based
Pros and cons
Advantages
Simple to implement
Interesting results
Disadvantages
Requires sufficient amounts of common instances
Only uses part of the available information
Instance-based techniques: Instance similarity based
Representing concepts and the similarity between them
[Figure: each concept's instances are described per metadata field (Creator, Title, Publisher, ...) as bags of words; these are aggregated into per-field term-count vectors for each concept (instance features to concept features), and each pair feature (f1, f2, f3, ...) is the cosine distance between Concept 1's and Concept 2's vectors for one field.]
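The per-field comparison in the figure can be sketched as follows; the field names, term counts, and the use of cosine similarity over raw counts are illustrative, not the actual STITCH data or pipeline.

```python
# Sketch: one pair feature per metadata field, computed as the cosine
# similarity between the two concepts' aggregated term-count vectors.
import math

def cosine(u, v):
    """Cosine similarity between two term-count dictionaries."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Term counts per field, aggregated over each concept's instances (toy values).
concept1 = {
    "Creator":   {"term1": 4, "term2": 1},
    "Title":     {"term2": 3},
    "Publisher": {"term1": 2, "term2": 1, "term3": 3},
}
concept2 = {
    "Creator":   {"term1": 2},
    "Title":     {"term2": 4, "term3": 1},
    "Publisher": {"term1": 4, "term2": 1, "term3": 1},
}

# One similarity feature (f1, f2, f3, ...) per shared field.
features = [cosine(concept1[f], concept2[f]) for f in ("Creator", "Title", "Publisher")]
print(features)
```

Each concept pair is thus reduced to a fixed-length feature vector for the classifiers to consume.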
Classification of concept mappings based on instance similarity
Classification based on instance similarity
Each pair of concepts is treated as a point in a “similarity space”.
Its position is defined by the features of the pair: the different measures of similarity between the concepts’ instances.
Hypothesis: the label of a point, which represents whether the pair is a positive mapping or a negative one, is correlated with the position of this point in this space.
Given already labelled points and the actual similarity values of the concepts involved, it is possible to classify a new point, i.e., to give it the right label, based on its location as given by its actual similarity values.
Research questions
How do different classifiers perform on this instance-based mapping task?
What are the benefits of using a machine learning algorithm to determine the importance of features?
Are there regularities w.r.t. the relative importance given to specific features for similarity computation? Are these weights related to application data characteristics?
Three classifiers used
Markov Random Field (MRF)
Evolutionary Strategy (ES)
Support Vector Machine (SVM)
Markov Random Field
Let $T = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ be the training set:
$\mathbf{x}^{(i)} \in \mathbb{R}^K$, the features; $y^{(i)} \in Y = \{\text{positive}, \text{negative}\}$, the label.
The conditional probability of a label given the input is modelled as
$$p(y^{(i)} \mid \mathbf{x}^{(i)}, \theta) = \frac{1}{Z(\mathbf{x}^{(i)}, \theta)} \exp\left( \sum_{j=1}^{K} \lambda_j \phi_j(y^{(i)}, \mathbf{x}^{(i)}) \right), \quad (1)$$
where $\theta = \{\lambda_j\}_{j=1}^{K}$ are the weights associated with the feature functions $\phi$ and $Z(\mathbf{x}^{(i)}, \theta)$ is a normalisation constant.
The classifier used: Markov Random Field (cont’)
The likelihood of the data set for given model parameters, $p(T \mid \theta)$, is given by:
$$p(T \mid \theta) = \prod_{i=1}^{N} p(y^{(i)} \mid \mathbf{x}^{(i)}) \quad (2)$$
During learning, our objective is to find the most likely values for $\theta$ for the given training data.
The decision criterion for assigning a label $y^{(i)}$ to a new pair of concepts $i$ is then simply given by:
$$y^{(i)} = \operatorname*{argmax}_{y} \; p(y \mid \mathbf{x}^{(i)}) \quad (3)$$
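With two labels, the log-linear model of Eqs. (1)-(3) can be sketched directly; the feature functions, weights, and feature values below are made up for illustration.

```python
# Minimal sketch of the conditional model p(y | x, theta) and the
# argmax decision rule; weights and features are illustrative only.
import math

LABELS = ("positive", "negative")

def phi(j, y, x):
    """Illustrative feature function: x[j] for 'positive', -x[j] for 'negative'."""
    return x[j] if y == "positive" else -x[j]

def p_label(y, x, lambdas):
    """Eq. (1): exp of the weighted feature sum, normalised by Z over all labels."""
    score = lambda y_: math.exp(sum(l * phi(j, y_, x) for j, l in enumerate(lambdas)))
    return score(y) / sum(score(y_) for y_ in LABELS)

def classify(x, lambdas):
    """Eq. (3): pick the label with the highest conditional probability."""
    return max(LABELS, key=lambda y: p_label(y, x, lambdas))

lambdas = [1.5, 0.5, 2.0]    # weights theta (in practice, learned from T)
x = [0.9, 0.2, 0.8]          # similarity features of one concept pair
print(classify(x, lambdas))  # high similarities push towards "positive"
```

In practice the weights would be fitted by maximising Eq. (2) over the training set, e.g. by gradient ascent.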
Multi-objective Evolution Strategy
Evolutionary strategies (ES) have two characteristic properties: firstly, they are used for continuous value optimisation, and, secondly, they are self-adaptive.
An ES individual is a direct model of the searched solution, defined by $\Lambda$ and some evolution strategy parameters:
$$\langle \Lambda, \Sigma \rangle \leftrightarrow \langle \lambda_1, \ldots, \lambda_K, \sigma_1, \ldots, \sigma_K \rangle \quad (4)$$
The fitness function is related to the decision criterion for the ES, which is sign-based:
$$L^{ES}_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \\ 0 & \text{otherwise} \end{cases} \quad (5)$$
Multi-objective Evolution Strategy (cont’)
Maximising the number of correctly classified positive results and correctly classified negative results are two opposing goals.
$$f_1(\Lambda \mid F, L) = \#\left\{ F_i \;\middle|\; \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \;\land\; L_i = 1 \right\} \quad (6)$$
$$f_2(\Lambda \mid F, L) = \#\left\{ F_i \;\middle|\; \sum_{j=1}^{K} \lambda_j F_{ij} \le 0 \;\land\; L_i = 0 \right\} \quad (7)$$
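The sign-based decision of Eq. (5) and the two objective counts of Eqs. (6) and (7) can be sketched as follows, on toy pair features and labels:

```python
# Sketch of Eqs. (5)-(7): sign-based labelling and the two opposing
# fitness objectives (true positives vs. true negatives). Toy data only.

def decide(lambdas, Fi):
    """Eq. (5): label 1 iff the weighted feature sum is positive."""
    return 1 if sum(l * f for l, f in zip(lambdas, Fi)) > 0 else 0

def f1(lambdas, F, L):
    """Eq. (6): count of correctly recognised positive mappings."""
    return sum(1 for Fi, Li in zip(F, L) if decide(lambdas, Fi) == 1 and Li == 1)

def f2(lambdas, F, L):
    """Eq. (7): count of correctly recognised negative mappings."""
    return sum(1 for Fi, Li in zip(F, L) if decide(lambdas, Fi) == 0 and Li == 0)

F = [[0.9, 0.8], [0.7, 0.6], [-0.2, 0.1], [-0.5, -0.3]]  # toy pair features
L = [1, 1, 0, 0]                                          # true labels
lambdas = [1.0, 0.5]                                      # one candidate Lambda
print(f1(lambdas, F, L), f2(lambdas, F, L))               # → 2 2
```

A multi-objective ES searches for the Pareto front over (f1, f2) rather than optimising a single weighted score.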
Multi-objective Evolution Strategy (cont’)
Evolution process
Recombination: two parent individuals are combined using different weightings, producing two new individuals
Mutation: one parent individual changes itself into a new child individual
Survivor selection: NSGA-II
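The variation operators above can be sketched as follows; the specific recombination weighting and the log-normal self-adaptive mutation scheme are standard ES choices assumed here, not taken from the slides.

```python
# Sketch of ES variation: weighted recombination of two parents and
# self-adaptive Gaussian mutation of <Lambda, Sigma> individuals.
# The operator parameters (w, tau) are assumptions for illustration.
import math
import random

def recombine(p1, p2, w=0.5):
    """Combine two parent weight vectors with weights w and (1 - w)."""
    c1 = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
    c2 = [(1 - w) * a + w * b for a, b in zip(p1, p2)]
    return c1, c2

def mutate(lambdas, sigmas, tau=0.1):
    """Self-adapt the step sizes first, then perturb the weights."""
    new_sigmas = [s * math.exp(tau * random.gauss(0, 1)) for s in sigmas]
    new_lambdas = [l + s * random.gauss(0, 1) for l, s in zip(lambdas, new_sigmas)]
    return new_lambdas, new_sigmas

random.seed(0)
parent1, parent2 = [1.0, 0.5, -0.2], [0.4, 0.9, 0.1]
child1, child2 = recombine(parent1, parent2)
print(child1)   # midpoint of the two parents for w = 0.5
```

Survivor selection with NSGA-II then keeps the non-dominated individuals across the two objectives.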
Support Vector Machine
A Support Vector Machine (SVM) is used as a maximum-margin classifier whose task consists in finding a hyperplane separating the two classes.
The objective is to maximise the margin separating the two classes whilst minimising the risk of classification error.
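As a sketch of this separation on pair-similarity features, a linear SVM can be trained on toy data; scikit-learn is an assumption here, as the slides do not name an implementation.

```python
# Sketch: maximum-margin separation of positive and negative concept
# pairs in similarity space, using scikit-learn's SVC (an assumption).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.uniform(0.6, 1.0, size=(20, 3))   # positive mappings: high similarities
X_neg = rng.uniform(0.0, 0.4, size=(20, 3))   # negative mappings: low similarities
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [0] * 20)

clf = SVC(kernel="linear", C=1.0)  # linear maximum-margin hyperplane
clf.fit(X, y)
print(clf.predict([[0.9, 0.8, 0.7], [0.1, 0.2, 0.1]]))  # → [1 0]
```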
Quality of learning
[Figure: grouped bar charts of Precision, Recall and F-measure (scale 0 to 1) for the four classifiers MRF 1-30, MRF 3-30, ES and SVM.]
Figure: Precision, recall and F-measure for mappings with a positive label (top) and a negative label (bottom). Error bars indicate one standard deviation over the 10 folds of cross-validation.
Relative importance of features
Which features of our instances are important for mapping?
Figure: Mutual information between features and labels
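The quantity plotted in this figure can be estimated as below; the toy binary features stand in for discretised similarity values, and the feature data are made up.

```python
# Sketch: mutual information I(X; Y) between a discretised feature and
# the mapping labels, from empirical joint and marginal frequencies.
import math

def mutual_information(xs, ys):
    """I(X;Y) = sum_{x,y} p(x,y) * log(p(x,y) / (p(x) * p(y))), in nats."""
    n = len(xs)
    mi = 0.0
    for xv in set(xs):
        for yv in set(ys):
            pxy = sum(1 for x, y in zip(xs, ys) if x == xv and y == yv) / n
            px = sum(1 for x in xs if x == xv) / n
            py = sum(1 for y in ys if y == yv) / n
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * py))
    return mi

labels = [1, 1, 0, 0]
feat_a = [1, 1, 0, 0]   # perfectly informative about the labels
feat_b = [1, 0, 1, 0]   # statistically independent of the labels
print(mutual_information(feat_a, labels))  # → log(2) ≈ 0.693
print(mutual_information(feat_b, labels))  # → 0.0
```

High mutual information marks a feature as informative for the mapping decision, which is what the comparisons below examine.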
ES lambdas are not really conclusive
ES lambdas that are most inconclusive correspond to the least informative features
Important features in terms of mutual information are associated with large MRF weights
A more detailed analysis
Expected important features:
Label similarity (1), instance overlap (2), subject (28), etc.
Expected unimportant features:
Size of the book (16), format description (17) and language (22), etc.
Surprisingly important features:
Date (14)
Surprisingly unimportant features:
Description (15) and abstract (8)
Summary
We tried three machine learning classifiers on the instance-based mapping task; among these, MRF and ES can automatically identify meaningful features.
The MRF and the ES result in a performance in the neighbourhood of 90%, showing the validity of the approach.
Our analysis suggests that when many different description features interact, there is no systematic correlation between what a learning method could find and what an application expert may anticipate.