At Loose Ends: Challenges and Opportunities in Lexical Composition
Vered Shwartz, Natural Language Processing Lab, Bar-Ilan University
Talk @ EPFL, January 30, 2019
Representing Phrases

Word representations are pretty much sorted out: take a sentence containing
some word w1, apply the distributional hypothesis and some neural magic, and
out comes the word vector v_w1 ("best embeddings ever").

How to represent a phrase p = w1...wk? Most straightforward:
f(v_w1, v_w2, ..., v_wk)

"The whole is greater than the sum of its parts":
1. Meaning shift
2. Implicit meaning
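To make the "most straightforward" option concrete, here is a minimal sketch (not from the talk) of two common choices of f over pre-trained word vectors; the toy `vectors` table is a stand-in for any embedding model:

```python
import numpy as np

# Toy stand-in for a pre-trained embedding table (e.g. word2vec, GloVe).
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(50) for w in ["olive", "oil", "apple", "cake"]}

def compose_average(phrase):
    """f(v_w1, ..., v_wk) = element-wise mean of the constituent vectors."""
    return np.mean([vectors[w] for w in phrase.split()], axis=0)

def compose_concat(phrase):
    """f(v_w1, ..., v_wk) = concatenation, preserving constituent order."""
    return np.concatenate([vectors[w] for w in phrase.split()])

v_olive_oil = compose_average("olive oil")   # 50-dimensional
v_apple_cake = compose_concat("apple cake")  # 100-dimensional
```

Both compositions are blind to exactly the two phenomena just listed: a shifted word keeps its literal vector, and nothing in f represents the implicit relation between the constituents.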
Meaning Shift

A constituent word may be used in a non-literal way.
Verb-particle construction (VPC) meanings differ from their verbs' meanings.
Implicit Meaning

In noun compounds.
In adjective-noun compositions.
In this talk

1. Testing Existing Text Representations: Can they handle the complexity of phrases?
2. Paraphrasing Noun-Compounds: A model for explicating noun compounds through paraphrases
3. Future Directions: Thoughts about the future of phrase representations
Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition
Vered Shwartz and Ido Dagan
(in submission)
Can existing representations address these phenomena? Probing Tasks

Simple tasks designed to test a single linguistic property
[Adi et al., 2017, Conneau et al., 2018]:

Representation     Minimal Model Prediction
SkipThoughts(s)    What is s's length?
InferSent(s)       Is w in s?
...                ...

We follow the same approach for phrases, with various representations.
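A sketch of what such a probing task looks like in code, assuming a frozen `encode` function for the representation under study and a minimal linear probe (names and data here are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def encode(phrase_in_context):
    """Placeholder for the frozen representation under study (averaged word
    vectors, a sentence embedding, or a contextualized vector)."""
    rng = np.random.default_rng(abs(hash(phrase_in_context)) % (2**32))
    return rng.standard_normal(300)

train = [("crash course ...", 1), ("access road ...", 0)]  # (context, label)
test = [("silver spoon ...", 1), ("olive oil ...", 0)]

X_train = np.stack([encode(s) for s, _ in train])
y_train = [y for _, y in train]
X_test = np.stack([encode(s) for s, _ in test])
y_test = [y for _, y in test]

# The probe is deliberately minimal: if it succeeds, the property must
# already be (linearly) present in the frozen representation.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```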
Representations

Word Embeddings     Sentence Embeddings     Contextualized Word Embeddings
word2vec            SkipThoughts            ELMo
GloVe               InferSent*              OpenAI GPT
fastText            GenSen*                 BERT

- vector per word   - vector per sentence   - vector per word
- context-agnostic                          - context-sensitive
                                            - named after characters from Sesame Street

* supervised
Tasks and Results

[Overview figure: results on the six probing tasks detailed on the following
slides (Phrase Type, Noun Compound Literality, Noun Compound Relations,
Adjective-Noun Relations, Adjective-Noun Entailment, Verb-Particle
Classification), comparing word embeddings, sentence embeddings, and
contextualized word embeddings against a majority baseline and human
performance]
1. Phrase Type

Authorities/O meted/B-MW_VPC out/I-MW_VPC summary/B-MW_NC justice/I-MW_NC
in/O cases/O as/O this/O

              F1 (MWEs)   F1 (NEs)
Majority      23.8        62.2
word2vec      0.2         20.6
GloVe         0.2         32.6
fastText      0.1         31.4
ELMo          18.8        61.5
OpenAI GPT    2.1         44.9
BERT          18.8        61.6
Human         70.6        95.9

(1) Failure to recognize phrase type; (2) Named entities are easier;
(3) Context helps
2. Noun Compound Literality

The crash course in litigation made me a better lawyer
(each constituent is labeled Non-Literal or Literal; here "crash" is non-literal)

Accuracy:
Majority       20
word2vec       26.5
GloVe          28.8
fastText       30.3
SkipThoughts   34.2
InferSent      24.9
GenSen         35.5
ELMo           41.8
OpenAI GPT     50
BERT           44
Human          87

(1) word embeddings < sentence embeddings < contextualized; (2) Far from humans
2. Noun Compound Literality: Analysis

Top substitutes for the target constituent:

A search team located the [crash]L site and found small amounts of human remains.

ELMo        OpenAI GPT   BERT
landfill    body         archaeological
wreckage    place        burial
Web         man          wreck
crash       missing      excavation
burial      location     grave

After a [crash]N course in tactics and maneuvers, the squadron was off to the war...

ELMo        OpenAI GPT   BERT
crash       few          short
changing    while        successful
collision   moment       rigorous
training    long         brief
reversed    couple       training

(1) Literal: fewer errors
(2) BERT > ELMo, both reasonable
(3) OpenAI GPT errs due to uni-directionality
2. Noun Compound Literality: Analysis

Growing up with a [silver]N spoon in his mouth, he was always cheerful...

ELMo      OpenAI GPT   BERT
silver    mother       wooden
rubber    father       greasy
iron      lot          big
tin       big          silver
wooden    man          little

Things get tougher when both constituent nouns are non-literal!
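This kind of substitute analysis can be approximated by querying a masked language model at the constituent's position. A sketch using the Hugging Face transformers API, as an assumed stand-in for the talk's original probing of ELMo, OpenAI GPT, and BERT:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def top_substitutes(sentence_with_mask, k=5):
    """Return BERT's top-k fillers for the [MASK]ed constituent."""
    inputs = tokenizer(sentence_with_mask, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos.item()]
    return tokenizer.convert_ids_to_tokens(logits.topk(k).indices.tolist())

# Literal vs. non-literal uses of "crash":
print(top_substitutes("A search team located the [MASK] site."))
print(top_substitutes("After a [MASK] course in tactics, the squadron left."))
```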
3. Noun Compound Relations

The township is served by three access roads.
✓ Road that makes access possible
✗ Road forecasted for access season

Accuracy:
Majority       50
word2vec       60.9
GloVe          60.1
fastText       60.7
SkipThoughts   51.3
InferSent      58.5
GenSen         65.6
ELMo           67
OpenAI GPT     50
BERT           74.2
Human          92

(1) word embeddings < sentence embeddings < contextualized; (2) Far from humans;
(3) OpenAI GPT fails
3. Noun Compound Relations: Analysis

[Figure: BERT predictions for the compound "stage area"]

No clear signal from BERT. Capturing implicit information is challenging!
4. Adjective-Noun Relations

... he receives warm support from his students ...
✓ emotionality
✗ temperature

Accuracy:
Majority       46.3
word2vec       41.2
GloVe          36
fastText       45.6
SkipThoughts   47.8
InferSent      51.5
GenSen         49.3
ELMo           43.4
OpenAI GPT     52.9
BERT           50
Human          77

The best model performs only slightly better than the majority baseline.
5. Adjective-Noun Entailment

Most people die in the class to which they were born
→ Most people die in the social class to which they were born

F1:
Majority       0
word2vec       36.6
GloVe          20.6
fastText       40.4
SkipThoughts   23.4
InferSent      48.4
GenSen         55.2
ELMo           45.2
OpenAI GPT     14.7
BERT           37.2
Human          74.4

(1) Bad performance for all models
(2) Best: sentence embeddings trained on RTE
6. Verb-Particle Classification

We did get on together (VPC)       Which response did you get on that? (Non-VPC)

Accuracy:
Majority       72.3
word2vec       68.6
GloVe          67.9
fastText       70
SkipThoughts   68.6
InferSent      67.9
GenSen         65.7
ELMo           76.4
OpenAI GPT     71.4
BERT           75
Human          82

Similar performance for all models.
Is the good performance merely due to label imbalance?
6. Verb-Particle Classification: Analysis

Weak signal from ELMo; it mostly performs well due to label imbalance.
Paraphrase to Explicate: Revealing Implicit Noun-Compound Relations
Vered Shwartz and Ido Dagan
(ACL 2018)
Interpreting Noun-Compounds

Noun compounds are "text compression devices" [Nakov, 2013].
We're pretty good at decompressing them, even when we see them for the first time.

What is a "parsley cake"?
cake eaten on a parsley?
cake with parsley?
cake for parsley?
...

(image from http://www.bazekalim.com)
Generalizing Existing Knowledge

What can cake be made of?
Parsley (sort of) fits into this distribution.
Noun-Compound Paraphrasing

Given a noun-compound w1w2, express the relation between the head w2 and the
modifier w1 with multiple prepositional and verbal paraphrases
[Nakov and Hearst, 2006]:

olive oil, apple cake, ground attack:  [w2] extracted from [w1] / [w2] made of [w1] / [w2] from [w1]
boat whistle, sea bass:                [w2] located in [w1] / [w2] live in [w1]
game room, service door, baby oil:     [w2] used for [w1] / [w2] for [w1]
Prior Methods (1/2)

Based on constituent co-occurrences: "cake made of apple"

Problems:
1. Many unseen compounds have no paraphrases in the corpus,
   whether rare (parsley cake) or highly lexicalized (ice cream).
2. Many compounds occur with just a few paraphrases.
   Can we infer "cake containing apple" given "cake made of apple"?

Prior work provides partial solutions to either (1) or (2).
Prior Methods (2/2)

1. MELODI [Van de Cruys et al., 2013]:
   Represents an NC using compositional distributional representations and
   predicts paraphrase templates given the NC vector.
   Generalizes to similar unseen NCs, e.g. pear tart.

2. IIITH [Surtani et al., 2013]:
   Learns "is-a" relations between paraphrases,
   e.g. "[w2] extracted from [w1]" ⊂ "[w2] made of [w1]".

Our solution: multi-task learning to address both problems.
Multi-task Reformulation

Training example: {w1 = apple, w2 = cake, p = "[w2] made of [w1]"}

1. Predict a paraphrase p for a given NC w1w2:
   What is the relation between apple and cake?
2. Predict w1 given a paraphrase p and w2:
   What can cake be made of?
3. Predict w2 given a paraphrase p and w1:
   What can be made of apple?

(See the sketch below for how one triple expands into the three tasks.)
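Schematically, each training triple expands into one instance per task by masking the element to be predicted; a minimal sketch, with symbols following the slide rather than any released code:

```python
def expand(w1, w2, paraphrase):
    """Expand one (w1, w2, p) triple into the three prediction tasks."""
    surface = paraphrase.replace("[w2]", w2).replace("[w1]", w1)  # "cake made of apple"
    return [
        (f"{w2} [p] {w1}", paraphrase),     # task 1: predict the paraphrase p
        (surface.replace(w1, "[w1]"), w1),  # task 2: predict the modifier w1
        (surface.replace(w2, "[w2]"), w2),  # task 3: predict the head w2
    ]

for inp, target in expand("apple", "cake", "[w2] made of [w1]"):
    print(inp, "->", target)
# cake [p] apple -> [w2] made of [w1]
# cake made of [w1] -> apple
# [w2] made of apple -> cake
```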
Main Task (1): Predicting Paraphrases

What is the relation between apple and cake?

[Architecture figure: the sequence "cake [p] apple" is fed through a biLSTM;
an MLP (MLP_p) over the placeholder's encoding predicts an index in the
paraphrase vocabulary, e.g. p_i = 78 for "[w2] containing [w1]"]

Encode the placeholder [p] in "cake [p] apple" using a biLSTM.
Predict an index in the paraphrase vocabulary.
Fixed word embeddings, learned placeholder embeddings.
(1) Generalizes NCs: pear tart is expected to yield similar results.
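A minimal PyTorch sketch of this architecture (covering the helper tasks on the next slide as well): a shared biLSTM encodes the sequence, and task-specific MLPs map the placeholder's encoding to the paraphrase vocabulary (MLP_p) or the word vocabulary (MLP_w). All dimensions and vocabulary sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class ParaphrasingModel(nn.Module):
    def __init__(self, n_words, n_paraphrases, dim=300, hidden=100):
        super().__init__()
        # In the paper, word embeddings are fixed and placeholder embeddings
        # ([p], [w1], [w2]) are learned; here one table stands in for both.
        self.embed = nn.Embedding(n_words, dim)
        self.encoder = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.mlp_p = nn.Linear(2 * hidden, n_paraphrases)  # task 1 head
        self.mlp_w = nn.Linear(2 * hidden, n_words)        # tasks 2-3 head

    def forward(self, token_ids, placeholder_pos, task):
        states, _ = self.encoder(self.embed(token_ids))
        # Take the biLSTM state at the placeholder position.
        h = states[torch.arange(len(token_ids)), placeholder_pos]
        return self.mlp_p(h) if task == "p" else self.mlp_w(h)

model = ParaphrasingModel(n_words=10000, n_paraphrases=500)
# "cake [p] apple": predict the paraphrase index from [p]'s encoding.
logits = model(torch.tensor([[4145, 3, 28]]), torch.tensor([1]), task="p")
```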
Helper Task (2): Predicting Missing Constituents

What can cake be made of?

[Architecture figure: the sequence "cake made of [w1]" is fed through the
biLSTM; an MLP (MLP_w) over the placeholder's encoding predicts an index in
the word vocabulary, e.g. w1_i = 28 for "apple"]

Encode the placeholder in "cake made of [w1]" using a biLSTM.
Predict an index in the word vocabulary.
(2) Generalizes paraphrases: "[w2] containing [w1]" is expected to yield
similar results.
Evaluation
Evaluation Setting

Available dataset: SemEval 2013 Task 4 [Hendrickx et al., 2013].
Semi-supervised: infer templates of POS tags (e.g. "[w2] verb prep [w1]") from
the training data, then use the Google N-grams corpus to generate training data.

A ranking rather than a retrieval task: systems are expected to return a ranked
list of paraphrases for each noun compound. We implemented a ranking model that
re-ranks the top k paraphrases retrieved by the model.

Evaluation is based on n-gram overlap, using the provided evaluation script.
Gold paraphrase score: how many annotators suggested it?
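For intuition, a toy sketch of n-gram overlap scoring between a predicted and a gold paraphrase; the official SemEval 2013 Task 4 script is more elaborate (isomorphic vs. non-isomorphic matching, annotator-weighted gold paraphrases), so this is only illustrative:

```python
def ngrams(text, n):
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_score(predicted, gold, n=2):
    """Fraction of the gold paraphrase's n-grams recovered by the prediction."""
    g = ngrams(gold, n)
    return len(ngrams(predicted, n) & g) / len(g) if g else 0.0

print(overlap_score("cake made of apple", "cake made from apple"))  # 0.33...
```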
Results

                                                non-isomorphic   isomorphic
MELODI [Van de Cruys et al., 2013]              54.8             13
SemEval 2013 Baseline [Hendrickx et al., 2013]  40.6             13.8
SFS [Versley, 2013]                             17.9             23.1
IIITH [Surtani et al., 2013]                    23.1             25.8
PaNiC [Shwartz and Dagan, 2018]                 28.4             28.2

The isomorphic setting rewards both recall and precision; the non-isomorphic
setting rewards only precision, favoring "conservative" models.
Error Analysis: False Positives

1. Valid, but missing from the gold standard (44%): "discussion by group"
2. Too specific (15%): "life of women in community"
3. Incorrect prepositions (14%), e.g. because n-grams don't respect syntactic
   structure: "rinse away the oil from baby's head" ⇒ "oil from baby"
4. Syntactic errors (8%)
5. Borderline grammatical (5%): "force of coalition forces"
6. Other errors (14%)
Error Analysis: False Negatives

1. Long paraphrases (n > 5) (30%)
2. Determiners (25%): "mutation of a gene"
3. Inflected constituents (10%): "holding of shares"
4. Other errors (35%)
Future Directions

Can we learn phrase meanings like humans do?

[Cooper, 1999]: how do L2 learners process idioms?
Infer from context: 28% (57% success rate)
Rely on literal meaning: 19% (22% success rate)
...
Inferring from context

We need "extended" contexts.
[Asl, 2013]: idiom interpretation is more successful with extended contexts (stories).

We need richer context modeling:
characters in the story, relationships between them, dialogues, ...
Relying on literal meaning

"Robert knew he was robbing the cradle by dating a sixteen-year-old girl"

We need world knowledge:
"A cradle is something you put the baby in."
We need to be able to reason:
"You're stealing a child from a mother."
"So robbing the cradle is like dating a really young person."
[Cooper, 1999]
Recap

1. Testing Existing Pre-trained Representations: Contextualized word embeddings
   provide better phrase representations, but there is still a long way to go.
2. Paraphrasing Noun-Compounds: Representations of compositional phrases can
   rely upon and generalize existing knowledge about similar concepts.
3. Future Directions: To represent phrases like humans do, we need better
   context and world knowledge modeling.

Thank you!
References

[Adi et al., 2017] Adi, Y., Kermany, E., Belinkov, Y., Lavi, O., and Goldberg, Y. (2017). Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In Proceedings of ICLR Conference Track.
[Asl, 2013] Asl, F. M. (2013). The impact of context on learning idioms in EFL classes. TESOL Journal, 37(1):2.
[Conneau et al., 2018] Conneau, A., Kruszewski, G., Lample, G., Barrault, L., and Baroni, M. (2018). What you can cram into a single vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136.
[Cooper, 1999] Cooper, T. C. (1999). Processing of idioms by L2 learners of English. TESOL Quarterly, 33(2):233–262.
[Hendrickx et al., 2013] Hendrickx, I., Kozareva, Z., Nakov, P., Ó Séaghdha, D., Szpakowicz, S., and Veale, T. (2013). SemEval-2013 Task 4: Free paraphrases of noun compounds. In SemEval, pages 138–143.
[Nakov, 2013] Nakov, P. (2013). On the interpretation of noun compounds: Syntax, semantics, and entailment. Natural Language Engineering, 19(03):291–330.
[Nakov and Hearst, 2006] Nakov, P. and Hearst, M. (2006). Using verbs to characterize noun-noun relations. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications, pages 233–244. Springer.
[Shwartz and Dagan, 2018] Shwartz, V. and Dagan, I. (2018). Paraphrase to explicate: Revealing implicit noun-compound relations. In ACL, Melbourne, Australia.
[Surtani et al., 2013] Surtani, N., Batra, A., Ghosh, U., and Paul, S. (2013). IIIT-H: A corpus-driven co-occurrence based probabilistic model for noun compound paraphrasing. In SemEval, pages 153–157.
[Van de Cruys et al., 2013] Van de Cruys, T., Afantenos, S., and Muller, P. (2013). MELODI: A supervised distributional approach for free paraphrasing of noun compounds. In SemEval, pages 144–147.
[Versley, 2013] Versley, Y. (2013). SFS-TUE: Compound paraphrasing with a language model and discriminative reranking. In SemEval, pages 148–152.