Measuring Semantic Distance using Distributional Profiles of Concepts Saif Mohammad Department of Computer Science University of Toronto Grateful acknowledgments: Graeme Hirst (advisor and co-author); Iryna Gurevych, Torsten Zesch, and Philip Resnik (co-authors); Rada Mihalcea, Renee Miller, Gerald Penn, Suzanne Stevenson, University of Toronto (especially the CL group), and NSERC.
Semantic Distance
SALSA DANCE
CLOWN BRIDGE
A measure of how close or distant two units of language are in terms of their meaning
Measuring Semantic Distance using Distributional Profiles of Concepts. Saif Mohammad. 2
Why measure semantic distance?
• Natural language processing is teeming with semantic-distance problems:
  - Machine translation
You know a person by the company they keep
Das Wesen eines Menschen erkennt man an der Gesellschaft, mit der er sich umgibt
bag of hypotheses
Why measure semantic distance?
• Natural language processing is teeming with semantic-distance problems:
  - Word sense disambiguation
Hermione cast a bewitching spell
CHARM OR INCANTATION
bag of hypotheses
Accomplishments (1)
• Performed a qualitative and quantitative comparison of WordNet-based and distributional measures
• Identified significant limitations of state-of-the-art approaches to measuring semantic distance
  - Word sense ambiguity
    • A hurdle for distributional measures
Accomplishments (2)
• Proposed a new hybrid approach to semantic distance
  - Combines text with a thesaurus
  - Models concepts (rather than words)
  - Uses thesaurus categories as very coarse senses
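As a rough illustration of the hybrid idea in the list above, the sketch below builds a distributional profile for each thesaurus category by counting the words that occur near any word listed under that category, then compares two categories by the cosine of their profiles. The toy thesaurus, the toy corpus, the window size, the raw-count weighting, the cosine measure, and the names `build_profiles` and `cosine_distance` are all assumptions made for illustration; the slides do not specify the actual strength-of-association measure or distance formula.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy thesaurus: category (coarse sense) -> words listed under it.
THESAURUS = {
    "CELESTIAL_BODY": {"star", "planet"},
    "CELEBRITY": {"star", "actor"},
    "MUSIC": {"salsa", "song"},
}

# Hypothetical toy corpus, one sentence per string.
CORPUS = [
    "the star orbits near the planet",
    "the actor became a famous star",
    "she danced to a salsa song",
]


def build_profiles(sentences, thesaurus, window=2):
    """For each category, count words within `window` tokens of any of its words.

    An ambiguous word such as "star" credits every category that lists it;
    this naive sketch makes no attempt to disambiguate.
    """
    word_to_cats = defaultdict(set)
    for cat, words in thesaurus.items():
        for word in words:
            word_to_cats[word].add(cat)

    profiles = {cat: Counter() for cat in thesaurus}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            for cat in word_to_cats.get(token, ()):
                start = max(0, i - window)
                end = min(len(tokens), i + window + 1)
                for j in range(start, end):
                    if j != i:
                        profiles[cat][tokens[j]] += 1
    return profiles


def cosine_distance(p, q):
    """Return 1 - cosine similarity of two count profiles (1.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


profiles = build_profiles(CORPUS, THESAURUS)
near = cosine_distance(profiles["CELESTIAL_BODY"], profiles["CELEBRITY"])
far = cosine_distance(profiles["CELESTIAL_BODY"], profiles["MUSIC"])
print(f"CELESTIAL_BODY vs CELEBRITY: {near:.2f}")
print(f"CELESTIAL_BODY vs MUSIC:     {far:.2f}")
```

Because the profiles are keyed by category rather than by word, the number of profiles is bounded by the size of the thesaurus, not the vocabulary. In this toy run the two categories that share the ambiguous word "star" come out closer than they should, which shows the word sense ambiguity problem a real system would need to handle.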
Accomplishments (3)
• Extensive evaluation
  - Monolingual
    • By combining English text with an English thesaurus
      ◦ Ranked word pairs
      ◦ Corrected real-word spelling errors
      ◦ Determined word sense dominance
      ◦ Did word sense disambiguation
Accomplishments (4)
• Extensive evaluation (continued)
  - Cross-lingual
    • By combining German text with an English thesaurus
      ◦ Ranked word pairs and solved word-choice problems in German
    • By combining Chinese text with an English thesaurus
      ◦ Identified the English translations of Chinese words from their contexts
Future work
• Adding cross-lingual semantic distance as a feature to a state-of-the-art MT system (with Philip Resnik)
• Cross-lingual document clustering
• Cross-lingual information retrieval
• Cross-lingual summarization (with Bonnie Dorr)
• Determining paraphrases, lexical entailment, and contradictions (with Bonnie Dorr)
• Determining cognates using semantic distance between words in different languages (with Greg Kondrak)
• Porting the approach to Wikipedia (with Torsten Zesch and Iryna Gurevych)
Conclusions (1)
• Distributional profiles of concepts can be used to infer their semantic properties, and indeed estimate semantic distance.
• Cross-lingual DPCs allow for a seamless transition from words in one language to concepts in another.
Conclusions (2)
• Distributional measures of concept-distance are markedly superior to previous approaches.
  - Works well for all POS pairs
  - Gives both relatedness and similarity
  - Domain adaptable
  - Can be used in real-time systems
  - Cross-lingual
    • Solve problems in one language using a knowledge source from another
    • Solve problems that involve multiple languages