Noun compound (NC): a sequence of two or more nouns that act as a single noun, e.g., colon cancer, suppressor protein, tumor suppressor protein, colon cancer tumor suppressor protein.
Task: interpret the meaning of two-word English NCs
based on paraphrasing, e.g., olive oil = ‘oil that is extracted from olive(s)’ (Vanderwende 1994; Kim & Baldwin 2006; Butnariu & Veale 2008; Nakov & Hearst 2008)
Hendrickx, Kozareva, Nakov, O Seaghdha, Szpakowicz, Veale
The participating systems’ paraphrases are matched against those in the “gold” standard: at word/stem level (fuzzy matches allowed), then at phrase level (overlapping n-grams, no determiners), then at the paraphrase level (to find the highest-ranking match for each). Scores and ranks for all of these are combined. See the paper for all the gory details.
Isomorphic mode: each system paraphrase is matched with a different gold-standard paraphrase.
Non-isomorphic mode: multiple system paraphrases may match the same gold-standard paraphrase.
Rank multipliers reward system paraphrases which match gold-standard paraphrases highly ranked by humans.
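The difference between the two matching modes can be sketched as follows. This is only an illustration, not the official scorer: it assumes a precomputed similarity matrix `sim[i][j]` between system paraphrase `i` and gold-standard paraphrase `j` (a hypothetical input), uses a simple greedy one-to-one assignment for the isomorphic case, and leaves out the rank multipliers. The function names are made up for this example.

```python
def non_isomorphic_match(sim):
    """Each system paraphrase takes its best gold paraphrase; gold items may be reused."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def isomorphic_match(sim):
    """Greedy one-to-one matching (an assumed simplification, not the official method).

    Repeatedly takes the highest-scoring remaining (system, gold) pair, so no
    gold paraphrase is matched by more than one system paraphrase.
    """
    pairs = sorted(
        ((sim[i][j], i, j) for i in range(len(sim)) for j in range(len(sim[i]))),
        reverse=True,
    )
    used_sys, used_gold, match = set(), set(), {}
    for _score, i, j in pairs:
        if i not in used_sys and j not in used_gold:
            match[i] = j
            used_sys.add(i)
            used_gold.add(j)
    return match


# Two system paraphrases that both resemble gold paraphrase 0 most.
sim = [[0.9, 0.2],
       [0.8, 0.1]]
print(non_isomorphic_match(sim))  # both may point at gold paraphrase 0
print(isomorphic_match(sim))      # each must take a different gold paraphrase
```

In the non-isomorphic case a system can score well with several near-duplicate paraphrases of one popular gold item; the isomorphic case forces the second paraphrase onto a different, here much weaker, match.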
MELODI: semantic vector space model built from the UKWAC corpus; used features on the head noun to train a MaxEnt classifier.
IIITH: probabilities of the preposition co-occurring with a relation to identify the class of the noun compound; uses Google n-grams, BNC and ANC.
SFS: templates and fillers from training data, 4-gram language model, and a MaxEnt reranker. To find similar compounds, used Lin’s WordNet similarity and statistics from the English Gigaword and the Google n-grams.
Baseline
For each test compound M H, generate the following paraphrases, in this precise order:
H of M, H in M, H for M, H with M, H on M, H about M, H has M, H to M, H used for M, H used in M.
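The baseline is simple enough to write out directly. The sketch below assumes the compound arrives as a two-word string "M H" (modifier first, head second); the names `BASELINE_TEMPLATES` and `baseline_paraphrases` are illustrative, not taken from the task's own tooling.

```python
# Templates in the order given for the baseline (H = head noun, M = modifier).
BASELINE_TEMPLATES = [
    "{H} of {M}", "{H} in {M}", "{H} for {M}", "{H} with {M}", "{H} on {M}",
    "{H} about {M}", "{H} has {M}", "{H} to {M}",
    "{H} used for {M}", "{H} used in {M}",
]


def baseline_paraphrases(compound):
    """Return the ten baseline paraphrases for a two-word compound 'M H', in order."""
    modifier, head = compound.split()  # raises ValueError unless exactly two words
    return [t.format(H=head, M=modifier) for t in BASELINE_TEMPLATES]


for paraphrase in baseline_paraphrases("olive oil"):
    print(paraphrase)
```

For "olive oil" the list starts with "oil of olive" and ends with "oil used in olive"; the output order matters because the paraphrases are submitted as a ranked list.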
Created a new dataset of free paraphrases for noun-noun compound interpretation; available for further research.
Proposed two new evaluation metrics.
Offered insights into the current approaches to the task.
This work has been partially supported by a grant from Amazon, which we used on MTurk.
We also thank our annotators: Dave Carter, Chris Fournier and Colette Joubarne.