Dec 28, 2015

Intelligent Database Systems Lab

Presenter : BEI-YI JIANG

Authors : UNIVERSITÉ CATHOLIQUE DE LOUVAIN, BELGIUM

2012. ASSOCIATION FOR COMPUTING MACHINERY

A Study of Hybrid Similarity Measures for Semantic Relation Extraction

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• The quality of the relations provided by existing extractors is still lower than the quality of the manually constructed relations.

• Most studies still do not take the whole range of existing measures into account; they mostly combine a few different methods sporadically.

Objectives

• To develop new relation extraction methods.
• The approach is a systematic analysis of 16 baseline measures and their combinations, using 8 fusion methods and 3 techniques for selecting the combination set.

Methodology

• norm function

• similarity scores

• knn function
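The formulas for these three steps did not survive in the transcript. As a rough sketch only, assuming that `norm` is a min-max rescaling of the similarity scores to [0, 1] and that `knn` keeps the k highest-scoring neighbours of each term (both definitions, and all names below, are assumptions for illustration), the pipeline could look like this:

```python
def norm(S):
    """Min-max rescale a similarity matrix (list of rows) to [0, 1].

    Assumed definition; the slide's own formula is not available.
    """
    flat = [v for row in S for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in S]
    return [[(v - lo) / (hi - lo) for v in row] for row in S]


def knn(S, terms, k):
    """For each term, return its k most similar other terms."""
    relations = {}
    for i, term in enumerate(terms):
        scored = [(S[i][j], terms[j]) for j in range(len(terms)) if j != i]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        relations[term] = [name for _, name in scored[:k]]
    return relations


# Toy similarity scores between three terms (made-up numbers).
terms = ["car", "vehicle", "banana"]
S = [[10.0, 8.0, 1.0],
     [8.0, 10.0, 2.0],
     [1.0, 2.0, 10.0]]
neighbours = knn(norm(S), terms, k=1)
```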

Methodology-Single Similarity Measures

• Measures Based on a Semantic Network (5)
– exploit the lengths of the shortest paths between terms in a network
– probability of terms derived from a corpus
– Wu and Palmer, Leacock and Chodorow, Resnik, Jiang and Conrath, and Lin
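To make the path-based idea concrete, here is a minimal sketch of the Wu and Palmer measure in its common formulation, 2·depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor of the two terms. The tiny taxonomy and the function names are hypothetical; the study itself relies on a real semantic network.

```python
# Hypothetical toy taxonomy: each node maps to its parent (root has None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def path_to_root(node):
    """Return the chain of nodes from `node` up to the root (node first)."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu and Palmer similarity on the toy taxonomy."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node that also subsumes a is the
    # least common subsumer (deepest shared ancestor).
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

With this taxonomy, `wu_palmer("dog", "cat")` is higher than `wu_palmer("dog", "tree")` because the first pair shares the deeper ancestor `animal`.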

Methodology-Single Similarity Measures

• Web-based Measures (3)
– Web search engines
– rely on the number of times the terms co-occur in the documents
– Normalized Google Distance (NGD)
– Measures of Semantic Relatedness (MSR)
– YAHOO!, BING, GOOGLE over the domain wikipedia.org
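As an illustration of how co-occurrence counts turn into a score, the sketch below computes the Normalized Google Distance from hit counts using its standard definition. The function name, argument names, and the example counts are assumptions; how the hit counts are obtained from a search engine is outside this sketch.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from document hit counts.

    fx, fy: number of documents containing each term
    fxy:    number of documents containing both terms
    n:      total number of indexed documents
    """
    if fxy == 0:
        # Terms never co-occur: treat the distance as infinite.
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

A smaller value means the terms are more closely related; terms that always occur together get a distance of 0.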

Methodology-Single Similarity Measures

• Corpus-based Measures (5)
– Distributional Measures
› Bag-of-words Distributional Analysis (BDA)
› Syntactic Distributional Analysis (SDA)
– Pattern-based Measure
› PatternWiki
– Other Corpus-based Measures
› Latent Semantic Analysis (LSA)
› Normalized Google Distance (NGD)

Methodology-Single Similarity Measures

• Definition-based Measures (3)
– WktWiki
– Gloss Vectors
– Extended Lesk

Methodology-Hybrid Similarity Measures

• Combination Methods
– Input: a set of similarity matrices {S1, ..., SK} produced by K single measures
– Output: a combined similarity matrix Scmb
› 1. Mean
› 2. Mean-Nnz
› 3. Mean-Zscore
› 4. Median
› 5. Max
› 6. Rank Fusion
› 7. Relation Fusion
› 8. Logit

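Methodology-Hybrid Similarity Measures

• Combination Methods
– Mean. A mean of K pairwise similarity scores: $s_{ij}^{cmb} = \frac{1}{K} \sum_{k=1}^{K} s_{ij}^{k}$
– Mean-Nnz. A mean of those pairwise similarity scores which have a non-zero value: $s_{ij}^{cmb} = \frac{1}{|\{k : s_{ij}^{k} > 0\}|} \sum_{k=1}^{K} s_{ij}^{k}$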
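The two definitions on this slide can be sketched in a few lines of Python. Matrices are plain lists of rows, and the function names and example scores are illustrative assumptions, not taken from the paper.

```python
def combine_mean(matrices):
    """Element-wise mean over K similarity matrices of equal shape."""
    K = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(S[i][j] for S in matrices) / K for j in range(cols)]
            for i in range(rows)]


def combine_mean_nnz(matrices):
    """Element-wise mean that ignores measures scoring a pair as zero."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    combined = []
    for i in range(rows):
        row = []
        for j in range(cols):
            nonzero = [S[i][j] for S in matrices if S[i][j] != 0]
            # If every measure gives zero, the combined score stays zero.
            row.append(sum(nonzero) / len(nonzero) if nonzero else 0.0)
        combined.append(row)
    return combined


# Three toy measures; the second one has no evidence for the pair (0, 1).
S1 = [[1.0, 0.6], [0.6, 1.0]]
S2 = [[1.0, 0.0], [0.0, 1.0]]
S3 = [[1.0, 0.3], [0.3, 1.0]]
mean_cmb = combine_mean([S1, S2, S3])      # pair (0, 1) is about 0.30
nnz_cmb = combine_mean_nnz([S1, S2, S3])   # pair (0, 1) is about 0.45
```

Mean-Nnz avoids penalising a pair just because one measure has no coverage for it, which is why the second combined score is higher.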

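Methodology-Hybrid Similarity Measures

• Combination Methods
– Mean-Zscore. A mean of K similarity scores transformed into Z-scores: $s_{ij}^{cmb} = \frac{1}{K} \sum_{k=1}^{K} \frac{s_{ij}^{k} - \mu_k}{\sigma_k}$, where $\mu_k$ and $\sigma_k$ are the mean and standard deviation of the scores of the k-th measure
– Median. A median of K pairwise similarities: $s_{ij}^{cmb} = \mathrm{median}(s_{ij}^{1}, \ldots, s_{ij}^{K})$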
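A minimal sketch of these two combinations follows, assuming that each measure's Z-scores are computed from the mean and (population) standard deviation of all scores in that measure's matrix. That choice, like the function names, is an assumption made for illustration.

```python
import statistics


def zscores(S):
    """Standardise all scores of one measure to zero mean, unit variance."""
    flat = [v for row in S for v in row]
    mu = statistics.mean(flat)
    sigma = statistics.pstdev(flat)
    if sigma == 0:
        return [[0.0 for _ in row] for row in S]
    return [[(v - mu) / sigma for v in row] for row in S]


def combine_mean_zscore(matrices):
    """Element-wise mean of the Z-score-transformed matrices."""
    z = [zscores(S) for S in matrices]
    K = len(z)
    rows, cols = len(z[0]), len(z[0][0])
    return [[sum(Z[i][j] for Z in z) / K for j in range(cols)]
            for i in range(rows)]


def combine_median(matrices):
    """Element-wise median over K similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[statistics.median([S[i][j] for S in matrices])
             for j in range(cols)] for i in range(rows)]


# Two toy measures on very different scales but with the same ordering.
S1 = [[0.0, 2.0], [2.0, 4.0]]
S2 = [[0.0, 10.0], [10.0, 20.0]]
z_cmb = combine_mean_zscore([S1, S2])
```

The Z-score transform puts measures with different ranges on a common scale before averaging, so neither toy measure dominates the combined score.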

Methodology-Hybrid Similarity Measures

• Combination Methods
– Max. A maximum of K pairwise similarities: $s_{ij}^{cmb} = \max_{k} s_{ij}^{k}$
– Rank Fusion.
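The slide gives no definition for Rank Fusion, so the sketch below rests on an assumption: each measure's scores are converted to per-row ranks (1 for the nearest neighbour) and the combined value is the mean rank across measures, so lower is better. Max is shown alongside it. All names are illustrative.

```python
def combine_max(matrices):
    """Element-wise maximum over K similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[max(S[i][j] for S in matrices) for j in range(cols)]
            for i in range(rows)]


def row_ranks(row):
    """Rank the entries of a row: 1 for the highest score.

    Ties are broken by column order (an arbitrary choice for this sketch).
    """
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0] * len(row)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def combine_rank_fusion(matrices):
    """Mean of per-measure ranks (assumed reading of Rank Fusion)."""
    ranked = [[row_ranks(row) for row in S] for S in matrices]
    K = len(ranked)
    rows, cols = len(ranked[0]), len(ranked[0][0])
    return [[sum(R[i][j] for R in ranked) / K for j in range(cols)]
            for i in range(rows)]


# One target term scored against three candidates by two toy measures.
S1 = [[0.9, 0.5, 0.1]]
S2 = [[0.2, 0.8, 0.4]]
max_cmb = combine_max([S1, S2])
rank_cmb = combine_rank_fusion([S1, S2])
```

Working with ranks instead of raw scores makes the combination insensitive to the scale of each measure; here the second candidate wins with the lowest mean rank.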

Methodology-Hybrid Similarity Measures

• Combination Methods
– Relation Fusion.
– Logit.
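Neither method is defined in the transcript, so the following is only a sketch under stated assumptions. Relation Fusion is read here as voting over the relation sets extracted by each single measure, and Logit as a logistic function applied to a weighted sum of the K single-measure scores. The weights, the bias, and the `min_votes` threshold are hypothetical values, not learned parameters from the study.

```python
import math
from collections import Counter


def relation_fusion(relation_sets, min_votes):
    """Keep relations proposed by at least `min_votes` single measures."""
    votes = Counter(pair for relations in relation_sets
                    for pair in set(relations))
    return {pair for pair, count in votes.items() if count >= min_votes}


def logit_score(scores, weights, bias):
    """Logistic combination of the K scores for one term pair."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Relations extracted by three toy measures.
m1 = {("car", "vehicle"), ("car", "banana")}
m2 = {("car", "vehicle")}
m3 = {("car", "vehicle"), ("car", "wheel")}
fused = relation_fusion([m1, m2, m3], min_votes=2)

# Hypothetical weights; in practice they would be fitted on labelled pairs.
related = logit_score([0.9, 0.8], weights=[2.0, 1.0], bias=-1.0)
unrelated = logit_score([0.1, 0.2], weights=[2.0, 1.0], bias=-1.0)
```

The logistic output can be read as an estimated probability that the pair is semantically related, which gives the combined score a natural threshold.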

Methodology-Hybrid Similarity Measures

• Combination Sets
– Expert choice of measures
– Forward stepwise procedure
– Logistic regression
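A forward stepwise procedure is usually a greedy search: start from an empty set and repeatedly add the measure that improves an evaluation score the most, stopping when nothing helps. The sketch below shows that generic scheme; the `evaluate` callback and the toy scoring numbers are hypothetical stand-ins for whatever quality criterion the study uses.

```python
def forward_stepwise(candidates, evaluate):
    """Greedily grow a set of measures while the score keeps improving."""
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining measure and keep the best extension.
        score, best = max(((evaluate(selected + [m]), m) for m in remaining),
                          key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the current set
        selected.append(best)
        remaining.remove(best)
        best_score = score
    return selected, best_score


# Toy criterion: each measure contributes a fixed gain, and every added
# measure costs one point. These numbers are made up for illustration.
GAIN = {"BDA": 5, "SDA": 3, "NGD": 0}


def toy_evaluate(measures):
    return sum(GAIN[m] for m in measures) - len(measures)


chosen, score = forward_stepwise(["BDA", "SDA", "NGD"], toy_evaluate)
```

The greedy search evaluates far fewer subsets than an exhaustive search over all combinations of 16 measures, at the cost of possibly missing the best subset.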

Experiments

• Evaluation
– Human Judgements Datasets.
› MC, RG, WordSim353
– Semantic Relations Datasets.
› BLESS, SN
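The transcript does not say which metrics are reported. One common way to score extracted relations against a gold-standard relation set is precision and recall, sketched below purely as an illustration; the function name and the example pairs are assumptions.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted term pairs against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: one of two extracted pairs is in the gold standard.
extracted = {("car", "vehicle"), ("car", "banana")}
gold = {("car", "vehicle"), ("car", "automobile"),
        ("dog", "animal"), ("cat", "animal")}
precision, recall = precision_recall(extracted, gold)
```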

Experiments

Experiments

Conclusions

• The results have shown that the hybrid measures outperform the single measures on all datasets.

• A combination of 15 baseline corpus-, web-, network-, and dictionary-based measures with Logistic Regression provided the best results.

Comments

• Advantages
– higher performance

• Applications