Computing text semantic relatedness using the contents and links of a hypertext encyclopedia


Presenter: Bo-Sheng Wang
Authors: Majid Yazdani, Andrei Popescu-Belis

AI, 2013


Outline

• Motivation
• Objectives
• Methodology
• Empirical analyses
• Experiments
• Conclusions
• Comments

Motivation


• Existing measures of semantic relatedness based on lexical overlap, though widely used, are of little help when text similarity is not based on identical words.

Objectives

• Therefore, the authors aim to compute text semantic relatedness based on concepts and their relations, which have linguistic as well as extra-linguistic dimensions. This remains a challenge, especially in the general domain and/or over noisy text.

Methodology-build concept network


• Concepts
– They removed all Wikipedia articles belonging to the following namespaces: Talk, File, Image, Template, Category, Portal, and List.
– Disambiguation pages were also removed.
– They set a cut-off limit of 100 non-stop words per article.
– For each hyperlink, they extracted the corresponding anchor text and considered it as a possible secondary title for the linked article.

Methodology


Methodology-build concept network

• Relations
– They focus in the present study on hyperlinks and on links computed from similarity of content or of category.
– They computed the lexical similarity between articles as the cosine similarity between the vectors derived from the articles' texts, after stopword removal and stemming using Snowball.
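The lexical similarity step can be illustrated with a minimal sketch. It assumes raw term-frequency vectors (the slides do not state the term weighting), a tiny hypothetical stopword list, and a placeholder suffix stripper `crude_stem` that merely stands in for the Snowball stemmer; none of these names come from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to", "on"}


def crude_stem(token):
    """Placeholder suffix stripper; an assumption standing in for Snowball."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def text_vector(text):
    """Term-frequency vector after lowercasing, stopword removal and stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOPWORDS)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


article_a = "Cats are small domesticated mammals."
article_b = "The cat is a domesticated mammal."
article_c = "Stock markets trade shares of companies."

sim_ab = cosine_similarity(text_vector(article_a), text_vector(article_b))
sim_ac = cosine_similarity(text_vector(article_a), text_vector(article_c))
```

In this toy setting, the two articles about cats share three stemmed terms and receive a high score, while the unrelated article shares none. In practice a real stemmer and a full stopword list would replace the placeholders.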


Methodology


Methodology-VP


Methodology-VP to weighted sets of concepts and to texts


Methodology-Approximation


Methodology-Approximation

• T-truncated

• ε-truncated
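As a rough illustration of the T-truncated idea, the sketch below assumes that VP denotes the visiting probability of a random walk over the concept network, that is, the probability that a walk starting at one node reaches a target node, and that truncation means counting only walks of at most T steps. The function name, the dictionary-based graph format, and the continuation probability `alpha` are all assumptions made for this example, not details taken from the slides.

```python
def truncated_visiting_probability(transitions, source, target, steps, alpha=0.8):
    """Probability that a walk from `source` first reaches `target` within `steps` steps.

    transitions: dict mapping node -> {neighbor: transition probability}.
    alpha: assumed probability of continuing the walk at each step.
    """
    if source == target:
        return 1.0
    mass = {source: 1.0}  # probability mass of walks that have not yet hit the target
    visited = 0.0
    for _ in range(steps):
        next_mass = {}
        for node, p in mass.items():
            for neighbor, w in transitions.get(node, {}).items():
                moved = alpha * p * w
                if neighbor == target:
                    visited += moved  # the walk is absorbed at the target
                else:
                    next_mass[neighbor] = next_mass.get(neighbor, 0.0) + moved
        mass = next_mass
    return visited


# Toy concept network (hypothetical): A links to B and C, B to C, C to A.
graph = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}

vp_t1 = truncated_visiting_probability(graph, "A", "C", steps=1, alpha=1.0)
vp_t2 = truncated_visiting_probability(graph, "A", "C", steps=2, alpha=1.0)
```

Increasing T can only add probability mass, so the truncated value grows monotonically towards the untruncated one. An ε-truncated variant would instead stop expanding contributions whose probability falls below a threshold ε.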


Methodology-Learning embedding


Empirical analyses

• Convergence of the T-truncated approximation

Empirical analyses

• Convergence of the ε-truncated approximation

Empirical analyses


Experiments

• Average training error


Experiments

• Average training error


Experiments

• Word Similarity


Experiments

• Word Similarity


Experiments


Experiments

• Document similarity


Experiments

• Document clustering


Experiments

• Comparison of VP and cosine similarity


Experiments

• Text classification


Experiments


Experiments


Experiments


Conclusions


Comments

• Advantages

• Disadvantages

• Applications
– Text categorization
