Link Discovery Tutorial Part II: Accuracy Axel-Cyrille Ngonga Ngomo (1) , Irini Fundulaki (2) , Mohamed Ahmed Sherif (1) (1) Institute for Applied Informatics, Germany (2) FORTH, Greece October 18th, 2016 Kobe, Japan Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 1 / 54
Link Discovery Tutorial Part II: Accuracy

Jan 14, 2017

Transcript
Page 1: Link Discovery Tutorial Part II: Accuracy

Link Discovery Tutorial
Part II: Accuracy

Axel-Cyrille Ngonga Ngomo (1), Irini Fundulaki (2), Mohamed Ahmed Sherif (1)

(1) Institute for Applied Informatics, Germany
(2) FORTH, Greece

October 18th, 2016
Kobe, Japan

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 1 / 54

Page 2: Link Discovery Tutorial Part II: Accuracy

Table of Contents

1 Introduction

2 Raven

3 Eagle

4 Coala

5 Summary and Conclusion

Page 4: Link Discovery Tutorial Part II: Accuracy

Introduction
Link Discovery as Classification Task

Definition (Declarative Link Discovery)
Given sets S and T of resources and relation R
Find $M = \{(s, t) \in S \times T : R(s, t)\}$
Here, find $M' = \{(s, t) \in S \times T : \delta(s, t) \geq \tau\}$

Definition (Classification perspective)
Given sets S and T of resources and relation R
Find $M = \{(s, t) \in S \times T : C(s, t) = +1\}$
Here, $C(s, t) = +1 \leftrightarrow \sigma(s, t) \geq \theta$

Classical machine learning problem [Ngo+11; NL12]
Dedicated techniques perform better
Supervised, active and unsupervised techniques possible
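The classification perspective can be illustrated with a minimal Python sketch. The similarity function `jaccard`, the helper `discover_links` and the example labels are assumptions made for illustration only; the tutorial does not prescribe a concrete similarity measure here.

```python
from typing import Callable, Iterable, Set, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two labels (illustrative sigma)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def discover_links(
    S: Iterable[str],
    T: Iterable[str],
    sigma: Callable[[str, str], float],
    theta: float,
) -> Set[Tuple[str, str]]:
    """Return {(s, t) in S x T : sigma(s, t) >= theta}, i.e. all pairs with C(s, t) = +1."""
    targets = list(T)
    return {(s, t) for s in S for t in targets if sigma(s, t) >= theta}


# Hypothetical resource labels, not taken from the tutorial's datasets.
S = ["aspirin tablet", "ibuprofen gel"]
T = ["Aspirin Tablet", "paracetamol syrup"]
links = discover_links(S, T, jaccard, theta=0.8)
```

This brute-force version compares every pair in S × T; it only shows what the classifier decides, not how to execute it efficiently.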

Page 7: Link Discovery Tutorial Part II: Accuracy

Introduction
Challenge

Challenges
1 Creation of labeled training data tedious
2 Need automated means for automatic class and property matching
3 Need for efficient execution of link specifications
4 Dedicated machine learning approaches necessary

Solutions
1 Use active learning approach for link discovery
2 Rely on hospital/resident algorithm
3 See previous section
4 Topic of this section

Page 10: Link Discovery Tutorial Part II: Accuracy

RAVEN
Approach

Definition (Classification perspective)
Given sets S and T of resources and relation R
Find $M = \{(s, t) \in S \times T : C(s, t) = +1\}$
Here, $C(s, t) = +1 \leftrightarrow \sigma(s, t) \geq \theta$

Learning classifier C involves learning
1 two sets of restrictions that specify the sets S resp. T,
2 the components $\sigma_1, \ldots, \sigma_n$ of a complex similarity measure $\sigma$,
3 a set of thresholds $\theta_1, \ldots, \theta_n$ for $\sigma_1, \ldots, \sigma_n$

Assumptions
Restrictions are class restrictions
Classifier shape is given (e.g., linear combination)

Page 13: Link Discovery Tutorial Part II: Accuracy

RAVEN
Approach

Class and Property Restrictions
Define class similarity function
Solve corresponding hospital-resident problem
Based on extension of stable marriage problem
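As a rough illustration of how a hospital-resident problem can be solved over class similarities, the following sketch uses resident-proposing deferred acceptance. It assumes that source classes act as residents, target classes as hospitals with a capacity, and that preferences are simply ranked by similarity; the function name, the class names and the similarity scores are hypothetical and not taken from RAVEN's implementation.

```python
from typing import Dict, List, Tuple


def hospital_resident(
    sim: Dict[Tuple[str, str], float],
    capacity: Dict[str, int],
) -> Dict[str, List[str]]:
    """Match residents (source classes) to hospitals (target classes).

    Both sides rank each other by decreasing similarity; missing pairs count as 0.
    """
    residents = sorted({r for r, _ in sim})
    hospitals = sorted(capacity)
    prefs = {
        r: sorted(hospitals, key=lambda h: -sim.get((r, h), 0.0)) for r in residents
    }
    next_choice = {r: 0 for r in residents}
    assigned: Dict[str, List[str]] = {h: [] for h in hospitals}
    free = list(residents)
    while free:
        r = free.pop()
        if next_choice[r] >= len(prefs[r]):
            continue  # r has been rejected by every hospital and stays unmatched
        h = prefs[r][next_choice[r]]
        next_choice[r] += 1
        assigned[h].append(r)
        if len(assigned[h]) > capacity[h]:
            # Over capacity: the hospital rejects its least similar resident.
            worst = min(assigned[h], key=lambda x: sim.get((x, h), 0.0))
            assigned[h].remove(worst)
            free.append(worst)
    return assigned


# Hypothetical class similarities between two datasets.
class_sim = {
    ("Targets", "Protein"): 0.9,
    ("Targets", "Drug"): 0.2,
    ("Drugs", "Protein"): 0.3,
    ("Drugs", "Drug"): 0.8,
}
matching = hospital_resident(class_sim, {"Protein": 1, "Drug": 1})
```

With a capacity of 1 per hospital this reduces to the stable marriage problem that the slide mentions as the basis.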

Page 18: Link Discovery Tutorial Part II: Accuracy

RAVEN
Approach

Class Restrictions
Similarity function
  String similarity
  Number of shared property values amongst instances
  ...
Solve corresponding hospital-resident problem

Source    Target     S             T
Drugbank  Diseasome  Targets       Genes
Sider     Diseasome  Side-Effect   Diseases
DBpedia   Dailymed   Organization  Organization
Sider     Dailymed   Drugs         Offer
Drugbank  DBpedia    Targets       Protein

Property mapping similar
Leads to $\sigma_1, \ldots, \sigma_n$

Page 20: Link Discovery Tutorial Part II: Accuracy

RAVEN
Approach

Learning Threshold
Active perceptron learning
Begin with educated guess, e.g., $\theta_i = 0.9$
Update thresholds based on most informative examples

1 Guess initial classifier
2 Pick most informative examples, i.e., unclassified and closest to boundary
3 Ask for classification from oracle
4 Update classifier
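The four steps can be sketched in Python as follows. This is a simplified illustration and not RAVEN's actual implementation: it assumes a linear classifier with one weight per atomic similarity and a single threshold `theta`, a standard perceptron update, and an `oracle` callback that returns +1 or -1. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def score(weights: Sequence[float], x: Vector) -> float:
    """Linear combination of the atomic similarity values of a link candidate."""
    return sum(w * v for w, v in zip(weights, x))


def most_informative(
    weights: Sequence[float], theta: float, unlabeled: Sequence[Vector], k: int
) -> List[Vector]:
    """Step 2: the k unlabeled candidates closest to the decision boundary."""
    return sorted(unlabeled, key=lambda x: abs(score(weights, x) - theta))[:k]


def perceptron_update(
    weights: Sequence[float], theta: float, x: Vector, label: int, rate: float
) -> Tuple[List[float], float]:
    """Step 4: adjust weights and threshold only if x was misclassified."""
    predicted = 1 if score(weights, x) >= theta else -1
    if predicted == label:
        return list(weights), theta
    new_weights = [w + rate * label * v for w, v in zip(weights, x)]
    return new_weights, theta - rate * label


def active_learning(
    unlabeled: Sequence[Vector],
    oracle: Callable[[Vector], int],
    weights: Sequence[float],
    theta: float = 0.9,   # Step 1: educated initial guess
    rate: float = 0.02,
    k: int = 10,
    iterations: int = 5,
) -> Tuple[List[float], float]:
    pool = list(unlabeled)
    weights = list(weights)
    for _ in range(iterations):
        if not pool:
            break
        for x in most_informative(weights, theta, pool, k):
            pool.remove(x)
            label = oracle(x)  # Step 3: ask the oracle for the classification
            weights, theta = perceptron_update(weights, theta, x, label, rate)
    return weights, theta
```

Each candidate is represented as the tuple of its atomic similarity values, so a false negative lowers the threshold and raises the weights of the similarities that were high for that candidate.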

Page 25: Link Discovery Tutorial Part II: Accuracy

RAVEN
Evaluation

Evaluation on Diseases (Diseasome to DBpedia)
Learning rate = 0.02
10 questions/iteration
F-measure of up to 92%

[Figure: P (%), R (%) and F (%) on a 0 to 100 axis against the number of iterations (1 to 25)]

Page 26: Link Discovery Tutorial Part II: Accuracy

RAVEN
Evaluation

Learning rate = 0.02
10 questions/iteration

[Figure: runtime (ms, axis ticks at 10, 100 and 1000) against the number of iterations (1 to 25) for Diseases, Drugs and Side Effects]

Page 29: Link Discovery Tutorial Part II: Accuracy

Eagle
Efficient Active Learning of Link Specifications using Genetic Programming

Eagle
Provides means for automatic class and property matching
Minimizes human labeling effort through active learning
Allows learning generic specs (a limitation of RAVEN)
Similar approaches [NIK+12; ISE+12]

Page 30: Link Discovery Tutorial Part II: Accuracy

Eagle
Formal Definition

Same formal setting as RAVEN
  Two sets of restrictions that specify the sets S resp. T,
  a specification of mapping properties $(p_1, q_1), \ldots, (p_n, q_n)$ for the elements of S and T, and
  a specification of a complex similarity measure $\sigma$ as the combination of several atomic similarity measures $\sigma_1, \ldots, \sigma_n$ and of a set of thresholds $\theta_1, \ldots, \theta_n$ such that $\theta_i$ is the threshold for $\sigma_i$.

Page 31: Link Discovery Tutorial Part II: Accuracy

Eagle
LS example

Can learn generic classifier type

[Figure: link specification tree whose leaves are the atomic specifications below, combined by the operators $\sqcap$, $\sqcup$ and $\setminus$]

f(levenshtein(:title, :title), 0.53)
f(cosine(:venue, :year), 1.00)
f(jaccard(:title, :authors), 0.43)
f(trigrams(:title, :year), 1.00)

Page 32: Link Discovery Tutorial Part II: Accuracy

Eagle
Idea & Goal

Idea: Specifications are trees
Goal: Learn elements of trees through genetic operations until best LS is found

[Figure: example specification tree with root operator $\sqcap$ and the children $(m_4, \theta_4)$ and $(m_2, \theta_2)$, each over one property pair $(p_i, q_i)$]

Page 33: Link Discovery Tutorial Part II: Accuracy

Eagle Algorithm
Step 1: Generate initial population

Random process (property pairs, thresholds)
Compute fitness
Fitness = F-measure w.r.t. known data

[Figure: initial population of four specifications: the atomic specifications $(m_1, \theta_1)$, $(m_2, \theta_2)$ and $(m_3, \theta_3)$, each over one property pair $(p_i, q_i)$, and a tree with root operator $\sqcap$ combining $(m_4, \theta_4)$ and $(m_5, \theta_5)$]
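A fitness function of this kind might look as follows. The sketch assumes that "known data" means the link candidates labeled so far as positive or negative and that unlabeled candidates are ignored; the function name and signature are illustrative.

```python
from typing import Hashable, Set


def f_measure_fitness(
    predicted: Set[Hashable],
    positives: Set[Hashable],
    negatives: Set[Hashable],
    beta: float = 1.0,
) -> float:
    """F-measure of a specification's mapping, evaluated on labeled links only."""
    tp = len(predicted & positives)   # known links that the specification returns
    fp = len(predicted & negatives)   # known non-links that it returns
    fn = len(positives - predicted)   # known links that it misses
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labeled data: a, b, d are known links, c is a known non-link.
fitness = f_measure_fitness({"a", "b", "c"}, {"a", "b", "d"}, {"c"})
```

Here precision and recall are both 2/3, so the resulting F1 value is 2/3 as well.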

Page 34: Link Discovery Tutorial Part II: Accuracy

Eagle Algorithm
Step 2: Evolve population

Tournament between two individuals
Two operators: Mutation and crossover

[Figure: the population before and after evolution. Mutation changes $(m_1, \theta_1)$ to $(m_1, \theta_1 + \alpha)$; crossover replaces the subtree $(m_5, \theta_5)$ of the $\sqcap$ tree with $(m_2, \theta_2)$]
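The two operators and the tournament can be sketched on a small tree representation. The classes `Atom` and `Node`, the operator label "AND" and the clipping of thresholds to [0, 1] are assumptions for illustration; they are not taken from EAGLE's code base.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, Sequence, Union


@dataclass(frozen=True)
class Atom:
    """Leaf (m, theta) of a specification tree over the property pair (p, q)."""
    measure: str
    p: str
    q: str
    theta: float


@dataclass(frozen=True)
class Node:
    """Inner node that combines two sub-specifications with an operator."""
    op: str
    left: "Spec"
    right: "Spec"


Spec = Union[Atom, Node]


def mutate(atom: Atom, alpha: float) -> Atom:
    """Mutation: shift the threshold by alpha, clipped to [0, 1]."""
    return replace(atom, theta=min(1.0, max(0.0, atom.theta + alpha)))


def crossover(tree: Node, donor: Spec, side: str) -> Node:
    """Crossover: replace one child of `tree` with a subtree taken from another individual."""
    if side == "left":
        return replace(tree, left=donor)
    return replace(tree, right=donor)


def tournament(
    population: Sequence[Spec], fitness: Callable[[Spec], float], rng: random.Random
) -> Spec:
    """Draw two individuals at random and keep the fitter one."""
    a, b = rng.sample(list(population), 2)
    return a if fitness(a) >= fitness(b) else b


m1 = Atom("levenshtein", "p1", "q1", 0.5)
m2 = Atom("jaccard", "p2", "q2", 0.4)
m4 = Atom("cosine", "p3", "q3", 0.7)
m5 = Atom("trigrams", "p2", "q2", 0.9)
mutated = mutate(m1, 0.25)                         # (m1, theta1 + alpha)
child = crossover(Node("AND", m4, m5), m2, "right")  # (m5, theta5) replaced by (m2, theta2)
```

Frozen dataclasses keep individuals immutable, so each operator returns a new specification and leaves the parents in the population untouched.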

Page 39: Link Discovery Tutorial Part II: Accuracy

Eagle Algorithm
Step 3: Computation of most informative links

Previous approaches define the amount of information of a link as its closeness to the decision boundary
Here, use disagreement amongst the elements of a population of size n

$$\delta((s, t)) = \left(n - \left|\{M_i^t : (s, t) \in M_i^t\}\right|\right)\left(n - \left|\{M_i^t : (s, t) \notin M_i^t\}\right|\right)$$

Function is maximal when $n/2$ elements count (s, t) as positive and $n/2$ as negative
Can be modeled with other functions such as entropy
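The disagreement measure translates directly into code. The sketch below assumes that each element of the population is represented by the set of links it returns; the function names are illustrative.

```python
from typing import Hashable, List, Sequence, Set


def disagreement(link: Hashable, mappings: Sequence[Set[Hashable]]) -> int:
    """delta(link) = (n - #mappings containing link) * (n - #mappings not containing link)."""
    n = len(mappings)
    positive = sum(1 for m in mappings if link in m)
    negative = n - positive
    return (n - positive) * (n - negative)


def k_most_informative(
    candidates: Sequence[Hashable], mappings: Sequence[Set[Hashable]], k: int
) -> List[Hashable]:
    """The k candidates on which the population disagrees most."""
    return sorted(candidates, key=lambda c: disagreement(c, mappings), reverse=True)[:k]


# Hypothetical population of n = 4 mappings over the links "x", "y" and "z".
population = [{"x", "y"}, {"x", "y"}, {"x", "z"}, {"x"}]
scores = {link: disagreement(link, population) for link in ("x", "y", "z")}
```

In this example every mapping contains "x", so its score is 0, while "y" splits the population evenly and reaches the maximum of (n/2)² = 4.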

Page 40: Link Discovery Tutorial Part II: Accuracy

Eagle Algorithm
Step 4: Active Learning

Compute $\delta((s, t))$ for all (s, t) returned by a LS
Pick k most informative
Require labeling from user
Update list of positive and negative examples

[Figure: the evolved population of specification trees from Step 2]

Page 41: Link Discovery Tutorial Part II: Accuracy

Eagle Algorithm
Step 5: Remove least fit elements

Fitness = F-measure w.r.t. known data

[Figure: the evolved population of specification trees from Step 2]

If termination conditions not met, go to Step 2
Else terminate and pick fittest LS

Page 44: Link Discovery Tutorial Part II: Accuracy

Eagle Algorithm
Unsupervised learning

Measure degree of monogamy of links [NIK+12]
Only works for 1-1 relations, e.g., owl:sameAs

$$P(M) = \frac{|\{s \mid \exists t : (s, t) \in M\}|}{\sum_s |\{t : (s, t) \in M\}|}, \qquad R(M) = \frac{|\{t \mid \exists s : (s, t) \in M\}|}{\sum_t |\{s : (s, t) \in M\}|} \tag{1}$$

$$F_\beta(M) = (1 + \beta^2)\,\frac{P(M)\,R(M)}{\beta^2 P(M) + R(M)} \tag{2}$$

[Figure: example mapping with sources s1, s2, s3, targets t1, t2, t3, t4 and four links]

P = 3/4, R = 2/4, F = 3/5.
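Equations (1) and (2) can be checked with a short sketch that uses exact fractions. The edge set below is hypothetical: the links in the slide's figure cannot be recovered from the transcript, so the mapping was chosen only so that three sources and two targets take part in four links, which reproduces the values P = 3/4, R = 2/4 and F = 3/5.

```python
from fractions import Fraction
from typing import Set, Tuple

Mapping = Set[Tuple[str, str]]


def pseudo_precision(M: Mapping) -> Fraction:
    """Linked sources divided by the number of links (sum over s of |{t : (s, t) in M}| = |M|)."""
    if not M:
        return Fraction(0)
    return Fraction(len({s for s, _ in M}), len(M))


def pseudo_recall(M: Mapping) -> Fraction:
    """Linked targets divided by the number of links (sum over t of |{s : (s, t) in M}| = |M|)."""
    if not M:
        return Fraction(0)
    return Fraction(len({t for _, t in M}), len(M))


def pseudo_f(M: Mapping, beta: Fraction = Fraction(1)) -> Fraction:
    """F_beta computed from the two label-free measures above."""
    p, r = pseudo_precision(M), pseudo_recall(M)
    if p + r == 0:
        return Fraction(0)
    return (1 + beta**2) * p * r / (beta**2 * p + r)


# Hypothetical links; t3 and t4 remain unlinked.
M = {("s1", "t1"), ("s2", "t1"), ("s3", "t1"), ("s3", "t2")}
p, r, f = pseudo_precision(M), pseudo_recall(M), pseudo_f(M)
```

For a perfectly monogamous mapping, where every source and every target appears in exactly one link, all three values equal 1.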

Page 45: Link Discovery Tutorial Part II: Accuracy

Eagle
Experiments and Results

Experimental Setup:
Compared batch learning and genetic programming
Used 3 different data sets
  1 Dailymed-Drugbank (LATC)
  2 DBpedia-LinkedMDB (LATC)
  3 DBLP-ACM
Compared different sizes of population (20, 100)
Compared random annotation with active learning
Mutation and crossover rates = 0.6
Maximal number of iterations = 50

Page 46: Link Discovery Tutorial Part II: Accuracy

Eagle
Experiments and Results

[Figure: results for Dailymed-Drugbank]
[Figure: results for DBpedia-LinkedMDB]
[Figure: results for DBLP-ACM]

Page 49: Link Discovery Tutorial Part II: Accuracy

Eagle
Experiments and Results

Larger population leads to
  Better results, yet
  Longer runtimes
Population size of 100 seems sufficient for most linked data sets
EAGLE is more time-efficient than state of the art
  337s for ACM-DBLP (n=100) vs.
  1553s for Marlin (ADTree)
  2196s for Marlin (SVM)
  4320s for Febrl (SVM)
Active learning clearly outperforms random annotation

Page 51: Link Discovery Tutorial Part II: Accuracy

Coala
Correlation-Aware Active Learning of Link Specifications

Page 52: Link Discovery Tutorial Part II: Accuracy

Coala
Learning Complex Specifications

Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)

Page 58: Link Discovery Tutorial Part II: Accuracy

Coala
Learning Complex Specifications

Insight
Choice of right example is key for learning
So far, only use of informativeness

Question
Can we do better by using more information?
  Higher F-measure
  Often slower

Page 61: Link Discovery Tutorial Part II: Accuracy

Coala Approach
Basic Idea

Use similarity of link candidates when selecting most informative examples

Page 64: Link Discovery Tutorial Part II: Accuracy

Coala
Similarity of Candidates

Link candidate x = (s, t) can be regarded as vector $(\sigma_1(x), \ldots, \sigma_n(x)) \in [0, 1]^n$.
Similarity of link candidates x and y:

$$\mathrm{sim}(x, y) = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} \left(\sigma_i(x) - \sigma_i(y)\right)^2}} \tag{3}$$

Allows exploiting both intra- and inter-class similarity
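Equation (3) amounts to an inverted Euclidean distance between similarity vectors. A minimal sketch, with each link candidate already given as its vector of atomic similarity values (the example vectors are made up):

```python
import math
from typing import Sequence


def sim(x: Sequence[float], y: Sequence[float]) -> float:
    """sim(x, y) = 1 / (1 + Euclidean distance between the similarity vectors of x and y)."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + distance)


identical = sim((0.5, 0.5), (0.5, 0.5))  # distance 0
apart = sim((1.0, 0.0), (0.0, 0.0))      # distance 1
```

Identical vectors yield 1, and since the vectors lie in [0, 1]^n the distance is at most √n, so sim never drops below 1 / (1 + √n).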

Page 65: Link Discovery Tutorial Part II: Accuracy

Coala
Graph Clustering

Rationale: Use intra-class similarity
Approach
  Cluster elements of S+ and S− independently
  Choose one element per cluster as representative
  Present oracle with most informative representatives

[Figure: similarity graph over link candidates labeled a to l, split into S+ and S−, with edge weights 0.25, 0.8 and 0.9]

Page 66: Link Discovery Tutorial Part II: Accuracy

Coala
BorderFlow

$G = (V, E, \omega)$ with $V = S^+$ or $V = S^-$
$\omega(x, y) = \mathrm{sim}(x, y)$
Keep best $e_c$ edges for each $x \in V$

Page 67: Link Discovery Tutorial Part II: Accuracy

Coala
BorderFlow

Seed-based algorithm
Goal: Maximize borderflow ratio

$$bf(X) = \frac{\Omega(b(X), X)}{\Omega(b(X), n(X))}$$

http://sourceforge.net/projects/cugar-framework/

Page 72: Link Discovery Tutorial Part II: Accuracy

Coala
Spreading Activation

Rationale: Use both inter- and intra-class similarity
Approach
  $M_0$: $m_{ij} = \mathrm{sim}(x_i, x_j)$ with $(x_i, x_j) \in (S^+ \cup S^-)^2$
  $A_0$: $a_i = \mathrm{ifm}(x_i)$
  $A_t = A_{t-1} + M_{t-1} A_{t-1}$ (spread activation)
  $A_t = A_t / \max(A_t)$ (normalize)
  $M_t = M_{t-1}^{\circ r}$ (weight decay: every entry is raised to the power r)

[Figure: example graph over S+ and S− with edge weights 0.25, 0.5, 0.8 and 0.9; a second panel shows the state after 3 iterations, with values including 3.9*10^-3 and 1.5*10^-5]
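The update rules can be traced with a small pure-Python sketch. It assumes that the weight decay raises every entry of M to the power r and, for the toy example only, that the diagonal of M is 0; the function name and the two-node example are illustrative and not taken from the COALA implementation.

```python
from typing import List, Sequence, Tuple


def spread_activation(
    M: Sequence[Sequence[float]],
    A: Sequence[float],
    r: int = 2,
    iterations: int = 3,
) -> Tuple[List[float], List[List[float]]]:
    """Iterate A_t = A_{t-1} + M_{t-1} A_{t-1}, normalize A_t, then decay M entry-wise."""
    n = len(A)
    weights = [list(row) for row in M]
    activation = list(A)
    for _ in range(iterations):
        # Spread: every node adds the weighted activation of the other nodes.
        activation = [
            activation[i] + sum(weights[i][j] * activation[j] for j in range(n))
            for i in range(n)
        ]
        # Normalize so that the largest activation equals 1.
        top = max(activation)
        if top > 0:
            activation = [a / top for a in activation]
        # Weight decay: entries below 1 shrink quickly, which limits further spreading.
        weights = [[w**r for w in row] for row in weights]
    return activation, weights


# Two link candidates with similarity 0.5; only the first one starts with activation.
M0 = [[0.0, 0.5], [0.5, 0.0]]
A0 = [1.0, 0.0]
A3, M3 = spread_activation(M0, A0, r=2, iterations=3)
```

With r = 2, a weight of 0.5 becomes 0.5^8 ≈ 3.9·10^-3 after three iterations and a weight of 0.25 becomes 0.25^8 ≈ 1.5·10^-5, so larger values of r make the activation stay closer to its source.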

Page 75: Link Discovery Tutorial Part II: Accuracy

Coala Evaluation
Experimental Setup

Used EAGLE as active learning approach
  Mutation and crossover rate = 0.6
  Selection rate = 0.7
  Not deterministic ⇒ Ran each experiment 5 times
  5 queries to oracle per iteration
  10 iterations overall
  2 population sizes: 20 and 100
  50 generations between iterations
Two real-world and three synthetic datasets
Single thread of a server (JDK1.7, Ubuntu 10.0.4, AMD Opteron 2GHz, 2GB/Experiment)

Page 76: Link Discovery Tutorial Part II: Accuracy

Coala Evaluation
Parameters for WD

Ran experiments on DBLP-ACM
Population = 20
$r \in \{2, 4, 8, 16, 32\}$

[Figure: F-score (axis 0.5 to 1, series f(2) to f(32)) and runtime in seconds (axis 0 to 1,000, series d(2) to d(32)); x-axis from 10 to 100]

Page 77: Link Discovery Tutorial Part II: Accuracy

Coala Evaluation
Parameters for CL

Ran experiments on DBLP-ACM
Population = 20
$e_c \in \{1, 2, 3, 4, 5\}$

[Figure: F-score (axis 0.5 to 1, series f(1) to f(5)) and runtime in seconds (axis 0 to 2,000, series d(1) to d(5)); x-axis from 10 to 100]

Page 78: Link Discovery Tutorial Part II: Accuracy

Coala Evaluation
F-Scores

Population = 100, final values
Better results, yet unclear when to use WD or CL

DataSet     EAGLE      WD         CL
Abt         0.19±0.04  0.25±0.04  0.23±0.04
DBLP        0.91±0.03  0.96±0.01  0.96±0.02
Person1     0.86±0.02  0.89±0.01  0.81±0.18
Person2     0.74±0.03  0.71±0.08  0.77±0.03
Restaurant  0.89±0.0   0.86±0.02  0.89±0.0

Page 80: Link Discovery Tutorial Part II: Accuracy

Summary and Conclusion

Large number of challenges to learning accurate specifications
1 Reduce labeling effort ⇒ Active learning
2 Learn complex specifications ⇒ Genetic programming
3 Learn specifications efficiently ⇒ See previous slides

Challenges include
1 Determinism
2 Deep learning
3 Self-checking
4 ...

Page 81: Link Discovery Tutorial Part II: Accuracy

Acknowledgment

This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).