CASIA Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base Yi Zeng, Dongsheng Wang, etc. 2013.11.18


Feb 23, 2016

Page 1: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

CASIA

Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Yi Zeng, Dongsheng Wang, etc.

2013.11.18

Page 2: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Content
Introduction
Chinese Semantic Knowledge Base Entity Linking
• Semantic Knowledge Base Construction
• Linking Unambiguous Entities
• Stepwise Entity Disambiguation Based on a Chinese Knowledge Base
Experimental Result and Analysis
Conclusion

Page 3: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Introduction
Motivation and requirement
Update the knowledge base in real time
Extract facts automatically from massive raw texts
• -> Entity identification in advance is essential

Page 4: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Introduction
CASIA_EL (the proposed system)
Knowledge base construction
• Entity linking by retrieving “relations”
Ambiguous entities
• Similarity measurement
• Stepwise disambiguation
Result
Linked 1232 entities from Sina to a Chinese Knowledge Base
An accuracy of 88.5% overall

Page 5: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Chinese Semantic Knowledge Base Entity Linking

Semantic Knowledge Base Construction

Linking Unambiguous Entities

Stepwise Entity Disambiguation Based on a Chinese Knowledge base

Page 6: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Semantic Knowledge Base Construction

Format: XML -> N3
TDB triple store
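The slide names only the pipeline (XML converted to N3, stored in a TDB triple store). The conversion step might look roughly like the sketch below. The XML layout (`<entity id="...">` with name-like children), the `kb:` namespace `http://example.org/kb/`, and the function names are all assumptions made for illustration; the slides do not show the real schema or URIs, and loading into the triple store is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the real XML schema of the knowledge base is not given in the slides.
SAMPLE_XML = """
<entities>
  <entity id="KBBD000092">
    <name>勒布朗·詹姆斯</name>
    <english_name>LeBron James</english_name>
  </entity>
</entities>
"""

PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix kb: <http://example.org/kb/> .\n"  # assumed namespace
)


def escape_literal(text):
    """Escape backslashes and double quotes for an N3 string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def xml_to_n3(xml_text):
    """Emit one rdfs:label triple for every non-empty child of each <entity>."""
    root = ET.fromstring(xml_text)
    lines = [PREFIXES]
    for entity in root.iter("entity"):
        subject = "kb:" + entity.get("id")
        for child in entity:
            if child.text and child.text.strip():
                label = escape_literal(child.text.strip())
                lines.append(f'{subject} rdfs:label "{label}" .')
    return "\n".join(lines)


n3_document = xml_to_n3(SAMPLE_XML)
print(n3_document)
```

Writing every name as a label keeps the later keyword lookup uniform, at the cost of losing which field a label came from.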

Page 7: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Synset construction from multiple sources (1/2)

Provided
• Part of the Baidu Encyclopedia knowledge base
• name, English name, Chinese name from the infobox knowledge
Not provided
• synset
1) nick names and redirect titles from:
• Baidu Encyclopedia
• Hudong Encyclopedia
• Wikipedia Chinese pages
• -> 476,086 pairs of synonyms are added

Page 8: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Synset construction from multiple sources (2/2)

Split Western people’s names by “·” into smaller keywords
• E.g., “Michael·Jordan” is split into “Michael” and “Jordan”, which are added as possible labels for “Michael Jordan”
These synsets are represented through “rdfs:label”
• Enables searching for an entity through a keyword
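A minimal sketch of the name-splitting step, assuming the separator is the middle dot shown in the “Michael·Jordan” example and that labels are kept in a plain in-memory index rather than in the triple store. The function names `expand_labels` and `build_label_index` are illustrative, not taken from the slides.

```python
SEPARATOR = "·"  # middle dot used in transliterated Western names


def expand_labels(full_name):
    """Return the full name plus each part of it as a candidate label."""
    parts = [p.strip() for p in full_name.split(SEPARATOR) if p.strip()]
    labels = [full_name]
    # Only add parts when the name actually splits into several keywords.
    if len(parts) > 1:
        labels.extend(p for p in parts if p not in labels)
    return labels


def build_label_index(entities):
    """Map every label to the set of KB_IDs that carry it."""
    index = {}
    for kb_id, name in entities.items():
        for label in expand_labels(name):
            index.setdefault(label, set()).add(kb_id)
    return index


# Two entries taken from the example slide later in the deck.
entities = {"KBBD000092": "勒布朗·詹姆斯", "KBBD018850": "詹姆斯·瓦特"}
index = build_label_index(entities)
print(expand_labels("Michael·Jordan"))
print(sorted(index["詹姆斯"]))
```

Splitting makes a short mention such as “詹姆斯” reachable by keyword, and it is also why a single mention can return many candidates that must be disambiguated later.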

Page 9: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Linking Unambiguous Entities

Preprocessing
• Punctuation marks: <, >, 《, 》, “, ”, etc.
• For example, “《霸王别姬》”
– Process as it is (syntax)
– Remove and process again
Retrieving “rdfs:label” can result in
1) one candidate
• Link directly and output KB_ID
2) no candidate
• Google’s “did you mean?” function
• Start again or output null
3) several candidates
• We will discuss this in detail in the following
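The three outcomes above can be sketched as a small lookup routine. This is an illustration under assumptions: the label index is a plain dictionary, the “did you mean?” step is a caller-supplied `suggest` function (the slides do not say how Google's function is invoked), `disambiguate` stands in for the stepwise method described later, and `KB_EXAMPLE_1` is a made-up ID.

```python
import re

# Punctuation named on the slide: <, >, 《, 》 and quotation marks.
PUNCTUATION = re.compile(r'[<>《》“”"]')


def lookup(mention, label_index):
    """Try the mention as it is, then again with punctuation removed."""
    candidates = label_index.get(mention)
    if not candidates:
        stripped = PUNCTUATION.sub("", mention).strip()
        candidates = label_index.get(stripped, set())
    return set(candidates)


def link(mention, label_index, suggest=None, disambiguate=None):
    """Return a KB_ID, or None when nothing can be linked."""
    candidates = lookup(mention, label_index)
    if len(candidates) == 1:  # 1) one candidate: link directly
        return next(iter(candidates))
    if not candidates:  # 2) no candidate: try one spelling suggestion
        corrected = suggest(mention) if suggest else None
        if corrected and corrected != mention:
            return link(corrected, label_index, None, disambiguate)
        return None
    # 3) several candidates: hand over to a disambiguation step
    return disambiguate(mention, candidates) if disambiguate else None


label_index = {
    "霸王别姬": {"KB_EXAMPLE_1"},  # hypothetical KB_ID
    "詹姆斯": {"KBBD000092", "KBBD018850"},
}
print(link("《霸王别姬》", label_index))
print(link("詹姆斯", label_index))
```

Passing `None` for `suggest` on the retry limits the correction to a single attempt, so a suggestion that still finds nothing ends in a null output rather than a loop.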

Page 10: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Stepwise Entity Disambiguation Based on a Chinese Knowledge base

Page 11: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Stepwise Entity Disambiguation Based on a Chinese Knowledge base

Stepwise Bag-of-Words (S-BOW)
Add other entities' documents
• bag[dst] – Bag of [document of short text]
• bag[dkb] – Bag of [document of knowledge base]
Algorithms (t1, …, tn: other entities within the same context)
• 1) sim(dst, dkb) = |bag[dst] ∩ bag[dkb]|.
• 2) bag'[dst] = bag[dst] ∪ bag[t1] ∪ bag[t2] ∪ … ∪ bag[tn].
• 3) sim'(dst, dkb) = |bag'[dst] ∩ bag[dkb]| = |(bag[dst] ∪ bag[t1] ∪ bag[t2] ∪ … ∪ bag[tn]) ∩ bag[dkb]|.
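A minimal sketch of the two similarity steps, assuming the bags are already tokenized sets of words and that the extended bag is the union of the short-text bag with the bags of the other entities in the same post. The rule for moving to the second step (no unique best candidate in the first step) is an assumption, since the slides do not state the criterion; the word bags below are toy data.

```python
def sim(bag_a, bag_b):
    """Similarity as the size of the overlap between two bags of words."""
    return len(set(bag_a) & set(bag_b))


def best_candidate(bag_dst, kb_bags):
    """Return the KB_ID with the strictly highest overlap, or None on a tie or zero."""
    scores = {kb_id: sim(bag_dst, bag) for kb_id, bag in kb_bags.items()}
    top = max(scores.values(), default=0)
    winners = [kb_id for kb_id, score in scores.items() if score == top]
    return winners[0] if top > 0 and len(winners) == 1 else None


def stepwise_disambiguate(bag_dst, kb_bags, context_bags):
    """Step 1 uses bag[dst]; step 2 uses bag'[dst], extended with bag[t1..tn]."""
    choice = best_candidate(bag_dst, kb_bags)
    if choice is not None:
        return choice
    extended = set(bag_dst).union(*context_bags)
    return best_candidate(extended, kb_bags)


# Toy bags (illustrative only, not taken from the knowledge base).
bag_dst = {"jersey", "retire"}
kb_bags = {
    "KBBD000092": {"basketball", "Cleveland", "NBA"},
    "KBBD018850": {"steam", "engine", "inventor"},
}
context_bags = [{"basketball", "NBA", "center"}]  # bag[t1] of another entity in the post
print(stepwise_disambiguate(bag_dst, kb_bags, context_bags))
```

In this toy case the short text alone overlaps with neither candidate, and only the words borrowed from the co-occurring entity's document separate the two.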

Page 12: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Experimental Result and Analysis

Accurate Output   Precision   In-KB precision   In-KB recall   In-KB F1   NIL precision   NIL recall   NIL F1
731               0.885       0.8662            0.8456         0.8558     0.9036          0.9260       0.9146
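The slides do not state how F1 is computed. Assuming the usual harmonic mean of precision and recall, the reported In-KB and NIL F1 values can be recomputed from the precision and recall figures above; the results agree with the reported ones to about three decimal places, with the small difference plausibly due to the inputs already being rounded.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide.
in_kb_f1 = f1(0.8662, 0.8456)
nil_f1 = f1(0.9036, 0.9260)
print(f"In-KB F1 = {in_kb_f1:.4f}, NIL F1 = {nil_f1:.4f}")
```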

Page 13: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Experimental Result and Analysis

Reasons why we select nouns and literal strings
• Which words should be considered, and which documents should be added?

Page 14: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Experimental Result and Analysis

The number of candidate entities
• The larger the number, the more correct the disambiguation is

Page 15: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Experimental Result and Analysis

Due to the addition of many synonyms
• Many target entities have more than 10 candidates (56 entities)
• Performs very well
Example

Note: when the number is greater than 9, there are no incorrect disambiguations.

Target Entity and the Microblog Post
weibo id = aonierqiuyituiyi914
name id = 詹姆斯
content = “奥尼尔球衣退役了,突然联想到如果詹姆斯以后退役了,克里夫兰会退役他的球衣吗??????” (English: “O'Neal's jersey has been retired; it suddenly occurred to me: if James retires someday, will Cleveland retire his jersey??????”)

Candidate Entities (KB_ID, Name)
KBBD000035 詹姆斯·普雷斯科特·焦耳
KBBD000092 勒布朗·詹姆斯
KBBD000609 詹姆斯·西蒙斯
KBBD000707 詹姆斯·克拉克·麦克斯韦
KBBD000875 詹姆斯·弗兰克
KBBD000876 詹姆斯·弗兰克
… …
KBBD018850 詹姆斯·瓦特

Produced KB_ID
KBBD000092

Page 16: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Experimental Result and Analysis

Disambiguation
161 ambiguous entities are detected (161/1232)
• 123 were disambiguated -> correctness is 82.1% (101/123 entities were correctly disambiguated)
• Extend the original microblog posts (Stepwise)
– Another 38 entities were disambiguated
– correctness is 63.2%
The overall entity disambiguation correctness based on the proposed algorithm is 77.6% (125/161 entities are correctly disambiguated).
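The percentages on this slide can be cross-checked with a few lines of arithmetic. The number of correct links in the stepwise step is not stated on the slide; the sketch below derives it as 125 - 101 = 24 under the assumption that the overall count is the sum of the two steps, which reproduces the reported 63.2%.

```python
first_step_total, first_step_correct = 123, 101
second_step_total = 38
overall_total, overall_correct = 161, 125

# Derived, not stated on the slide: correct links in the stepwise (second) step.
second_step_correct = overall_correct - first_step_correct


def pct(correct, total):
    """Percentage rounded to one decimal place, as on the slide."""
    return round(100 * correct / total, 1)


print(pct(first_step_correct, first_step_total))    # first step
print(pct(second_step_correct, second_step_total))  # stepwise extension
print(pct(overall_correct, overall_total))          # overall
```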

Page 17: Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Conclusion
As for short texts (based on a Chinese knowledge base)
Stepwise method
• Adding the documents of other entities that are within the same context
Compared to many algorithms
• Solves the problem of insufficient information
Retrieval of various relations
Enriching the synsets from multiple sources
