Computing text semantic relatedness using the contents and links of a hypertext encyclopedia
Presenter: Bo-Sheng Wang
Authors: Majid Yazdani, Andrei Popescu-Belis
AI, 2013
Feb 23, 2016
1
Outlines
• Motivation
• Objectives
• Methodology
• Empirical analyses
• Experiments
• Conclusions
• Comments
2
Motivation
3
• Existing measures of semantic relatedness based on lexical overlap, though widely used, are of little help when text similarity is not based on identical words.
Objectives
• Therefore, the authors aim to compute text semantic relatedness based on concepts and their relations, which have linguistic as well as extra-linguistic dimensions. This remains a challenge, especially in the general domain and/or over noisy text.
4
Methodology-build concept network
5
• Concept
– They removed all Wikipedia articles belonging to the following namespaces: Talk, File, Image, Template, Category, Portal, and List.
– Disambiguation pages were removed.
– They set a cut-off limit of 100 non-stop words, below which articles were discarded.
– For each hyperlink, they extracted the corresponding anchor text and considered it as another possible secondary title for the linked article.
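The concept-selection rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the title-prefix matching ("Talk:", "List of ", and so on), the tiny stop-word list, and the treatment of the 100-word cut-off as a minimum article length are all hypothetical choices made for the example.

```python
import re
from collections import defaultdict

# Assumption: excluded pages are recognised by title prefix.
EXCLUDED_PREFIXES = tuple(
    f"{ns}:" for ns in ("Talk", "File", "Image", "Template", "Category", "Portal")
) + ("List of ",)

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}

# Cut-off from the slide, assumed here to be a minimum article length.
MIN_NON_STOP_WORDS = 100


def count_non_stop_words(text):
    """Count alphabetic tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token not in STOP_WORDS)


def keep_article(title, text, is_disambiguation=False):
    """Return True if the article should become a concept in the network."""
    if title.startswith(EXCLUDED_PREFIXES):
        return False
    if is_disambiguation:
        return False
    return count_non_stop_words(text) >= MIN_NON_STOP_WORDS


def collect_secondary_titles(links):
    """Map each linked article to the anchor texts used to point at it.

    `links` is an iterable of (anchor_text, target_title) pairs.
    """
    secondary = defaultdict(set)
    for anchor_text, target_title in links:
        if anchor_text != target_title:
            secondary[target_title].add(anchor_text)
    return secondary
```

For example, `collect_secondary_titles([("USA", "United States")])` records "USA" as a secondary title of the "United States" article.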
Methodology
6
Methodology-build concept network
• Relations
– They focus in the present study on hyperlinks and on links computed from similarity of content and of category.
– They computed the lexical similarity between articles as the cosine similarity between the vectors derived from the articles' texts, after stopword removal and stemming using Snowball.
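As a rough illustration of the lexical similarity step, the sketch below builds term-count vectors and compares them with cosine similarity. Several parts are assumptions made for the example: the slide does not say how terms are weighted (raw counts are used here), the stop-word list is a small hypothetical one, and `naive_stem` is a crude stand-in for the Snowball stemmer.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are"}


def naive_stem(token):
    """Crude suffix stripper used as a placeholder; this is NOT Snowball."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def text_to_vector(text, stem=naive_stem):
    """Turn a text into a sparse term-count vector (assumed weighting)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(token) for token in tokens if token not in STOP_WORDS)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A real Snowball stemmer can be passed through the `stem` parameter, for instance the `stem` method of NLTK's `SnowballStemmer("english")`.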
7
Methodology
8
Methodology-Visiting probability (VP)
9
Methodology-VP to weighted sets of concepts and to texts
10
Methodology-Approximation
11
Methodology-Approximation
• T-truncated
• ε-truncated
12
Methodology-Learning embedding
13
Empirical analyses
• Convergence of the T-truncated approximation
14
Empirical analyses
• Convergence of the ε-truncated approximation
15
Empirical analyses
16
Experiments
• Average training error
17
Experiments
• Word Similarity
19
Experiments
21
Experiments
• Document similarity
22
Experiments
• Document clustering
23
Experiments
• Comparison of VP and cosine similarity
24
Experiments
• Text classification
25
Experiments
26
Experiments
27
Experiments
28
Conclusions
29
Comments
• Advantages
• Disadvantage
• Applications
– Text categorization
30