
Feature-based approaches to semantic similarity assessment of concepts using Wikipedia

Jul 07, 2018


Lakshay Bansal
  • 8/18/2019 Feature-based approaches to semantic similarity assessment of concepts using Wikipedia


    Feature-based approaches to semantic similarity assessment of concepts using Wikipedia

    - Yuncheng Jiang, Xiaopei Zhang, Yong Tang, Ruihua Nie

    Presented By: Kushagra Sharma (286/CO/12), Lakshay Bansal (287/CO/12)


    Abstract

    In the past, several approaches have been proposed that assess similarity by evaluating the knowledge modeled in one or more ontologies.

    However, the existing measures have some limitations, such as relying on predefined ontologies and fitting only non-dynamic domains.

    In this paper, some novel feature based similarity assessment methods are proposed that depend fully on Wikipedia and can avoid most of the limitations and drawbacks of the previous methods.


    Introduction

    Definition: Semantic similarity is understood as the degree of taxonomic proximity between concepts (or terms, words).

    In other words, semantic similarity states how taxonomically near two concepts (or terms, words) are, because they share some aspects of their meaning.

    Technically, similarity measures assess a numerical score that quantifies this proximity as a function of the semantic evidence observed in one or several knowledge sources.


    Ontology based methods to estimate similarity

    Edge Counting Measures

    Information Content Measures

    Feature Based Measures

    Hybrid Measures


    Edge counting measures

    - consists of taking into account the length of the path linking the concepts (or terms) and the position of the concepts (or terms) in a given dictionary (or taxonomy, ontology).

    The main advantage of edge counting measures is their simplicity.

    They rely only on the graph model of an input ontology, whose evaluation requires a low computational cost.

    Because of this simplicity, these approaches offer limited accuracy: ontologies model a large amount of taxonomical knowledge that is not considered during the evaluation of the minimum path.

    From another perspective, the main assumption of edge counting measures is that an edge represents the same semantic distance anywhere in the structure of the graph (or path), which is not true, as some sections of the graph may be finely classified and others only coarsely defined.
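As a rough illustration of edge counting, the sketch below measures the shortest path between two concepts in a tiny taxonomy. The taxonomy, the function names, and the conversion of distance into a score (1 / (1 + path length)) are all assumptions made for this example; they are not taken from the paper.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
IS_A = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def build_graph(is_a):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in is_a.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph, source, target):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def edge_similarity(graph, a, b):
    """Assumed distance-to-similarity conversion: 1 / (1 + path length)."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


GRAPH = build_graph(IS_A)
```

Every edge counts as one unit here, which is exactly the uniform-distance assumption criticised above: the step from "dog" to "canine" weighs the same as the step from "mammal" to "animal".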


    DISADVANTAGES of ONTOLOGY BASED METHODS

    Clearly, the construction process of domain ontologies is time-consuming and error-prone, and maintaining these ontologies also requires a lot of effort from experts. Thus, ontology based similarity measures are limited in scope and scalability.

    With the emergence of social networks and instant messaging systems, a lot of (sets of) concepts or terms (proper nouns, brands, acronyms, new words, conversational words, technical terms and so on) are not included in WordNet and domain ontologies (in fact, Web users can at present publish whatever they want to share with the rest of the world by using Wikis, Blogs and online communities). Therefore, similarity measures that are based on these kinds of knowledge resources cannot be used in these tasks.

    These limitations are the motivation behind the new techniques presented in this paper, which infer semantic similarity from a new kind of information source, i.e., a wide coverage online encyclopedia, namely Wikipedia.


    Feature based similarity

    Feature based approaches to similarity measures assess similarity between concepts as a function of their properties.

    Common features tend to increase similarity and non-common ones tend to diminish it.

    Admitting a function Ψ(c) that yields the set of features relevant to c, Tversky proposed the following similarity function:

    sim(a, b) = α · f(Ψ(a) ∩ Ψ(b)) − β · f(Ψ(a) \ Ψ(b)) − γ · f(Ψ(b) \ Ψ(a))

    where f is some function that reflects the salience of a set of features, Ψ(a) ∩ Ψ(b) is the intersection between those two sets of features, Ψ(a) \ Ψ(b) is the set obtained when eliminating the elements of Ψ(b) from the set of features of concept a, Ψ(a), and α, β and γ are parameters that provide for differences in focus on the different components.
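A minimal sketch of Tversky's contrast model as described above, assuming the salience function is plain set cardinality. The default parameter values and the feature sets in the usage note are illustrative assumptions, not values from the paper.

```python
def tversky_contrast(features_a, features_b,
                     alpha=1.0, beta=0.5, gamma=0.5, salience=len):
    """Contrast model: reward shared features, penalise distinctive ones.

    alpha weighs the common features, beta the features only in a,
    gamma the features only in b. `salience` defaults to set size,
    which is an assumption made for this sketch.
    """
    a, b = set(features_a), set(features_b)
    return (alpha * salience(a & b)
            - beta * salience(a - b)
            - gamma * salience(b - a))
```

For example, with the made-up feature sets {"wings", "beak", "flies"} and {"wings", "beak", "swims"}, two features are shared and one is distinctive on each side, so the default parameters give 1.0 · 2 − 0.5 · 1 − 0.5 · 1 = 1.0.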

    Rodriguez and Egenhofer (R&E) present an approach to computing semantic similarity. The similarity is computed as the weighted sum of similarities between synsets, features (e.g., meronyms, attributes, etc.) and neighbor concepts (those linked via semantic pointers) of evaluated terms:

    Sim_re(a, b) = w · S_synsets(a, b) + u · S_features(a, b) + v · S_neighborhoods(a, b)

    where the functions S_synsets, S_features, and S_neighborhoods are the similarity between synonym sets, features, and semantic neighborhoods of evaluated terms, w, u, and v (w, u, v ≥


    S represents the overlapping between the different features, computed as follows:
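The overlap formula itself did not survive in this transcript. The sketch below uses one commonly cited form of the Rodriguez and Egenhofer overlap, |A ∩ B| / (|A ∩ B| + γ·|A \ B| + (1 − γ)·|B \ A|), and should be read as an assumption rather than the slide's exact formula. The fixed γ = 0.5, the equal weights, and the dictionary layout of a term are likewise hypothetical choices for illustration.

```python
def overlap(set_a, set_b, gamma=0.5):
    """Assumed R&E-style overlap between two feature sets.

    gamma balances the two set differences; a constant 0.5 is an
    assumption made for this sketch.
    """
    a, b = set(set_a), set(set_b)
    common = len(a & b)
    denom = common + gamma * len(a - b) + (1.0 - gamma) * len(b - a)
    return common / denom if denom else 0.0


def sim_re(term_a, term_b, w=1 / 3, u=1 / 3, v=1 / 3, gamma=0.5):
    """Weighted sum of synset, feature and neighborhood overlaps.

    Each term is a dict with the hypothetical keys "synsets",
    "features" and "neighborhoods"; the weights are assumed equal.
    """
    return (w * overlap(term_a["synsets"], term_b["synsets"], gamma)
            + u * overlap(term_a["features"], term_b["features"], gamma)
            + v * overlap(term_a["neighborhoods"], term_b["neighborhoods"], gamma))
```

With γ = 0.5 the overlap is symmetric; other values of γ make it asymmetric, favouring one term's distinctive features over the other's.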

    A feature based function called X-similarity relies on the matching between synsets and term description sets. The term description sets contain words extracted by parsing term definitions. Two terms are similar if their synsets or description sets or the synsets of the terms in their neighborhood (e.g., more specific and more general terms) are lexically similar. The similarity function is expressed as follows:

    The similarity for the semantic neighbors S_neighborhoods is calculated as follows:

    where i denotes relationship type.
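The X-similarity formulas are missing from this transcript. The sketch below follows the form usually given for X-similarity: a score of 1 when the synsets overlap, otherwise the larger of the neighborhood and description similarities, with the neighborhood similarity taken as the maximum overlap over relationship types i. The use of the Jaccard ratio for each overlap and the dictionary layout are assumptions of this example.

```python
def jaccard(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighborhood_similarity(neigh_a, neigh_b):
    """Maximum overlap over the relationship types present in both terms.

    neigh_a and neigh_b map a relationship type (e.g. "is_a") to the
    set of synset words reached through that relationship.
    """
    shared_types = set(neigh_a) & set(neigh_b)
    return max((jaccard(neigh_a[i], neigh_b[i]) for i in shared_types),
               default=0.0)


def x_similarity(term_a, term_b):
    """Assumed X-similarity: 1 if synsets overlap, else the best remaining match."""
    if jaccard(term_a["synsets"], term_b["synsets"]) > 0:
        return 1.0
    return max(
        neighborhood_similarity(term_a["neighborhoods"], term_b["neighborhoods"]),
        jaccard(term_a["descriptions"], term_b["descriptions"]),
    )
```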


    Wikipedia's hyperlinks are also useful as an additional source of synonyms not captured by redirects. Hyperlinks also complement disambiguation pages by encoding polysemy. In particular, articles mentioning other encyclopedic entries point to them through internal hyperlinks. This models article cross-reference.

    Category structure: Since May 2


    Feature-based similarity using Wikipedia

    Formal representation of Wikipedia concepts

    Let A be a Wikipedia article and Con be the title of A. The formal representation of Wikipedia concept Con is defined as follows:

    Con = <Synonyms, Glosses, Anchors, Categories>

    where Synonyms = {Con, Con1, ..., Conm} is the set of synonyms of Con, Glosses is the first paragraph of text of A, Anchors = {Anc1, ..., Ancn} is the set of anchor texts (i.e., labels of internal hyperlinks) in A, and Categories = {Cat1, ..., Catk} is the set of categories of A.
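One way to hold this four-part representation in code is a small immutable record. The class name, field names, and helper below are hypothetical, and the values in the usage note are made up for illustration rather than extracted from Wikipedia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiConcept:
    """Con = <Synonyms, Glosses, Anchors, Categories> for one article."""
    title: str
    synonyms: frozenset    # includes the title itself, as in the definition
    glosses: str           # first paragraph of the article text
    anchors: frozenset     # labels of internal hyperlinks in the article
    categories: frozenset  # categories the article belongs to


def make_concept(title, synonyms, glosses, anchors, categories):
    """Build a WikiConcept, ensuring the title is one of its own synonyms."""
    return WikiConcept(
        title=title,
        synonyms=frozenset(synonyms) | {title},
        glosses=glosses,
        anchors=frozenset(anchors),
        categories=frozenset(categories),
    )
```

For instance, make_concept("Car", ["Automobile"], "A car is a wheeled motor vehicle.", ["Wheel", "Engine"], ["Vehicles"]) yields a concept whose synonym set contains both "Car" and "Automobile".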


    A framework for feature-based similarity

    Let Con1 = <Synonyms1, Glosses1, Anchors1, Categories1> and Con2 = <Synonyms2, Glosses2, Anchors2, Categories2> be two Wikipedia concepts. The similarity of Con1 and Con2, denoted as SimCon(Con1, Con2), is the function

    SimCon: WikiCon × WikiCon →


    MATHEMATICAL MODELLING OF FEATURE BASED ASSESSMENT

    We can obtain different feature based approaches to similarity assessment resulting from instantiations of the framework.

    Without loss of generality, we assume that there are two sets of terms (or words, concepts) Set1 and Set2. Obviously, these two sets may be Synonyms, Setglosses, Anchors, or Categories.

    According to the X-similarity approach or the R&E approach (Rodriguez & Egenhofer), we have the following similarity computation methods for Set1 and Set2:

    where


    Given two Wikipedia concepts

    Con1 = <Synonyms1, Setglosses1, Anchors1, Categories1> and

    Con2 = <Synonyms2, Setglosses2, Anchors2, Categories2>

    According to the notions of S_Xsim and S_R&E, we have the following approaches to similarity measures for Wikipedia concepts (suppose that the function S_concepts is the average or max):
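The concrete measures are not recoverable from this transcript, so the sketch below only illustrates the general shape of the framework: compute a set similarity for each of the four components, then combine the four scores with an average or a max. The Jaccard ratio stands in for S_Xsim or S_R&E, and the function names and dictionary keys are hypothetical.

```python
from statistics import mean

COMPONENTS = ("synonyms", "setglosses", "anchors", "categories")


def jaccard(set_a, set_b):
    """Stand-in set similarity (an assumption, not the paper's measure)."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sim_con(con1, con2, set_sim=jaccard, combine=mean):
    """Combine per-component similarities of two Wikipedia concepts.

    con1 and con2 are dicts holding one set of terms per component.
    `combine` plays the role of S_concepts: pass `mean` for the
    average or `max` for the maximum.
    """
    scores = [set_sim(con1[name], con2[name]) for name in COMPONENTS]
    return combine(scores)
```

Swapping `set_sim` or `combine` gives a different instantiation of the framework without touching the concept representation, which is the point of separating the two.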


    Comparison of various Approaches to Human based Judgements

    Results on correlation with human judgements of similarity measures.

    (From left to right: measure approach, correlation for M&C benchmark, correlation for R&G benchmark, and correlation for 353-TC benchmark)


    Benchmark and Experimental results (based on students' and professors' judgements)

    Results on correlation with our benchmark of similarity measures.


    Analysis of Experimental Results

    The approaches SimFirCon, SimSecCon, SimThiCon, and SimFouCon perform relatively well, with the lowest correlation being


    Conclusion

    The final goal of computerized similarity measures is to accurately mimic human judgements about semantic similarity.

    In this paper, some limitations of the existing feature based measures are identified, such as relying on one or more predefined domain ontologies and fitting only static (i.e., non-dynamic) domains.

    To implement feature based semantic similarity measurement using Wikipedia, a formal representation of Wikipedia concepts is presented. Then, a framework for feature based similarity built on this formal representation is given.

    The evaluation, based on several widely used benchmarks and a benchmark developed in this paper, sustains the intuitions with respect to human judgements.

    Overall, several methods presented here have good human correlation and constitute some effective ways of determining similarity between Wikipedia concepts. In addition, considering the limitations (e.g., small size) of the existing standard benchmarks for concept similarity assessment, we will pursue the design of a new benchmark specially focused on Wikipedia concepts.