At Loose Ends: Challenges and Opportunities in Lexical Composition
Vered Shwartz, Natural Language Processing Lab, Bar-Ilan University
Talk @ EPFL, January 30, 2019
Representing Phrases

Word representations are pretty much sorted out: take a sentence containing
some word w1, apply the distributional hypothesis and some neural magic, and
out comes the word vector v_w1 ("best embeddings ever").

How to represent a phrase p = w1...wk? Most straightforward:
f(v_w1, v_w2, ..., v_wk)

"The whole is greater than the sum of its parts":
1. Meaning shift
2. Implicit meaning
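To make the "most straightforward" option concrete, here is a minimal sketch (not from the talk) of two common choices of f over pre-trained word vectors; the toy `vectors` table is a stand-in for any embedding model:

```python
import numpy as np

# Toy stand-in for a pre-trained embedding table (e.g. word2vec, GloVe).
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(50) for w in ["olive", "oil", "apple", "cake"]}

def compose_average(phrase):
    """f(v_w1, ..., v_wk) = element-wise mean of the constituent vectors."""
    return np.mean([vectors[w] for w in phrase.split()], axis=0)

def compose_concat(phrase):
    """f(v_w1, ..., v_wk) = concatenation, preserving constituent order."""
    return np.concatenate([vectors[w] for w in phrase.split()])

v_olive_oil = compose_average("olive oil")   # 50-dimensional
v_apple_cake = compose_concat("apple cake")  # 100-dimensional
```

Both compositions are blind to exactly the two phenomena just listed: a shifted word keeps its literal vector, and nothing in f represents the implicit relation between the constituents.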
Meaning Shift

A constituent word may be used in a non-literal way.
Verb-particle construction (VPC) meanings differ from their verbs' meanings.
Implicit Meaning

In noun compounds.
In adjective-noun compositions.
In this talk

1. Testing Existing Text Representations: Can they handle the complexity of phrases?
2. Paraphrasing Noun-Compounds: A model for explicating noun compounds through paraphrases
3. Future Directions: Thoughts about the future of phrase representations
Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition
Vered Shwartz and Ido Dagan
(in submission)
Can existing representations address these phenomena? Probing Tasks

Simple tasks designed to test a single linguistic property
[Adi et al., 2017, Conneau et al., 2018]:

Representation     Minimal Model Prediction
SkipThoughts(s)    What is s's length?
InferSent(s)       Is w in s?
...                ...

We follow the same approach for phrases, with various representations.
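A sketch of what such a probing task looks like in code, assuming a frozen `encode` function for the representation under study and a minimal linear probe (names and data here are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def encode(phrase_in_context):
    """Placeholder for the frozen representation under study (averaged word
    vectors, a sentence embedding, or a contextualized vector)."""
    rng = np.random.default_rng(abs(hash(phrase_in_context)) % (2**32))
    return rng.standard_normal(300)

train = [("crash course ...", 1), ("access road ...", 0)]  # (context, label)
test = [("silver spoon ...", 1), ("olive oil ...", 0)]

X_train = np.stack([encode(s) for s, _ in train])
y_train = [y for _, y in train]
X_test = np.stack([encode(s) for s, _ in test])
y_test = [y for _, y in test]

# The probe is deliberately minimal: if it succeeds, the property must
# already be (linearly) present in the frozen representation.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```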
Representations

Word Embeddings     Sentence Embeddings     Contextualized Word Embeddings
word2vec            SkipThoughts            ELMo
GloVe               InferSent*              OpenAI GPT
fastText            GenSen*                 BERT

- vector per word   - vector per sentence   - vector per word
- context-agnostic                          - context-sensitive
                                            - named after characters from Sesame Street

* supervised
Tasks and Results

[Overview figure: results on the six probing tasks detailed on the following
slides (Phrase Type, Noun Compound Literality, Noun Compound Relations,
Adjective-Noun Relations, Adjective-Noun Entailment, Verb-Particle
Classification), comparing word embeddings, sentence embeddings, and
contextualized word embeddings against a majority baseline and human
performance]
1. Phrase Type

Authorities/O meted/B-MW_VPC out/I-MW_VPC summary/B-MW_NC justice/I-MW_NC
in/O cases/O as/O this/O

              F1 (MWEs)   F1 (NEs)
Majority      23.8        62.2
word2vec      0.2         20.6
GloVe         0.2         32.6
fastText      0.1         31.4
ELMo          18.8        61.5
OpenAI GPT    2.1         44.9
BERT          18.8        61.6
Human         70.6        95.9

(1) Failure to recognize phrase type; (2) Named entities are easier;
(3) Context helps
2. Noun Compound Literality

The crash course in litigation made me a better lawyer
(each constituent is labeled Non-Literal or Literal; here "crash" is non-literal)

Accuracy:
Majority       20
word2vec       26.5
GloVe          28.8
fastText       30.3
SkipThoughts   34.2
InferSent      24.9
GenSen         35.5
ELMo           41.8
OpenAI GPT     50
BERT           44
Human          87

(1) word embeddings < sentence embeddings < contextualized; (2) Far from humans
2. Noun Compound Literality: Analysis

Top substitutes for the target constituent:

A search team located the [crash]L site and found small amounts of human remains.

ELMo        OpenAI GPT   BERT
landfill    body         archaeological
wreckage    place        burial
Web         man          wreck
crash       missing      excavation
burial      location     grave

After a [crash]N course in tactics and maneuvers, the squadron was off to the war...

ELMo        OpenAI GPT   BERT
crash       few          short
changing    while        successful
collision   moment       rigorous
training    long         brief
reversed    couple       training

(1) Literal: fewer errors
(2) BERT > ELMo, both reasonable
(3) OpenAI GPT errs due to uni-directionality
2. Noun Compound Literality: Analysis

Growing up with a [silver]N spoon in his mouth, he was always cheerful...

ELMo      OpenAI GPT   BERT
silver    mother       wooden
rubber    father       greasy
iron      lot          big
tin       big          silver
wooden    man          little

Things get tougher when both constituent nouns are non-literal!
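This kind of substitute analysis can be approximated by querying a masked language model at the constituent's position. A sketch using the Hugging Face transformers API, as an assumed stand-in for the talk's original probing of ELMo, OpenAI GPT, and BERT:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def top_substitutes(sentence_with_mask, k=5):
    """Return BERT's top-k fillers for the [MASK]ed constituent."""
    inputs = tokenizer(sentence_with_mask, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos.item()]
    return tokenizer.convert_ids_to_tokens(logits.topk(k).indices.tolist())

# Literal vs. non-literal uses of "crash":
print(top_substitutes("A search team located the [MASK] site."))
print(top_substitutes("After a [MASK] course in tactics, the squadron left."))
```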
3. Noun Compound Relations

The township is served by three access roads.
✓ Road that makes access possible
✗ Road forecasted for access season

Accuracy:
Majority       50
word2vec       60.9
GloVe          60.1
fastText       60.7
SkipThoughts   51.3
InferSent      58.5
GenSen         65.6
ELMo           67
OpenAI GPT     50
BERT           74.2
Human          92

(1) word embeddings < sentence embeddings < contextualized; (2) Far from humans;
(3) OpenAI GPT fails
3. Noun Compound Relations: Analysis

[Figure: BERT predictions for the compound "stage area"]

No clear signal from BERT. Capturing implicit information is challenging!
4. Adjective-Noun Relations

... he receives warm support from his students ...
✓ emotionality
✗ temperature

Accuracy:
Majority       46.3
word2vec       41.2
GloVe          36
fastText       45.6
SkipThoughts   47.8
InferSent      51.5
GenSen         49.3
ELMo           43.4
OpenAI GPT     52.9
BERT           50
Human          77

The best model performs only slightly better than the majority baseline.
5. Adjective-Noun Entailment

Most people die in the class to which they were born
→ Most people die in the social class to which they were born

F1:
Majority       0
word2vec       36.6
GloVe          20.6
fastText       40.4
SkipThoughts   23.4
InferSent      48.4
GenSen         55.2
ELMo           45.2
OpenAI GPT     14.7
BERT           37.2
Human          74.4

(1) Bad performance for all models
(2) Best: sentence embeddings trained on RTE
6. Verb-Particle Classification

We did get on together (VPC)       Which response did you get on that? (Non-VPC)

Accuracy:
Majority       72.3
word2vec       68.6
GloVe          67.9
fastText       70
SkipThoughts   68.6
InferSent      67.9
GenSen         65.7
ELMo           76.4
OpenAI GPT     71.4
BERT           75
Human          82

Similar performance for all models.
Is the good performance merely due to label imbalance?
6. Verb-Particle Classification: Analysis

Weak signal from ELMo; it mostly performs well due to label imbalance.
Paraphrase to Explicate: Revealing Implicit Noun-Compound Relations
Vered Shwartz and Ido Dagan
(ACL 2018)
Interpreting Noun-Compounds

Noun compounds are "text compression devices" [Nakov, 2013].
We're pretty good at decompressing them, even when we see them for the first time.

What is a "parsley cake"?
cake eaten on a parsley?
cake with parsley?
cake for parsley?
...

(image from http://www.bazekalim.com)
Generalizing Existing Knowledge

What can cake be made of?
Parsley (sort of) fits into this distribution.
Noun-Compound Paraphrasing

Given a noun-compound w1w2, express the relation between the head w2 and the
modifier w1 with multiple prepositional and verbal paraphrases
[Nakov and Hearst, 2006]:

olive oil, apple cake, ground attack:  [w2] extracted from [w1] / [w2] made of [w1] / [w2] from [w1]
boat whistle, sea bass:                [w2] located in [w1] / [w2] live in [w1]
game room, service door, baby oil:     [w2] used for [w1] / [w2] for [w1]
Prior Methods (1/2)

Based on constituent co-occurrences: "cake made of apple"

Problems:
1. Many unseen compounds have no paraphrases in the corpus,
   whether rare (parsley cake) or highly lexicalized (ice cream).
2. Many compounds occur with just a few paraphrases.
   Can we infer "cake containing apple" given "cake made of apple"?

Prior work provides partial solutions to either (1) or (2).
Prior Methods (2/2)

1. MELODI [Van de Cruys et al., 2013]:
   Represents an NC using compositional distributional representations and
   predicts paraphrase templates given the NC vector.
   Generalizes to similar unseen NCs, e.g. pear tart.

2. IIITH [Surtani et al., 2013]:
   Learns "is-a" relations between paraphrases,
   e.g. "[w2] extracted from [w1]" ⊂ "[w2] made of [w1]".

Our solution: multi-task learning to address both problems.
Multi-task Reformulation

Training example: {w1 = apple, w2 = cake, p = "[w2] made of [w1]"}

1. Predict a paraphrase p for a given NC w1w2:
   What is the relation between apple and cake?
2. Predict w1 given a paraphrase p and w2:
   What can cake be made of?
3. Predict w2 given a paraphrase p and w1:
   What can be made of apple?

(See the sketch below for how one triple expands into the three tasks.)
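Schematically, each training triple expands into one instance per task by masking the element to be predicted; a minimal sketch, with symbols following the slide rather than any released code:

```python
def expand(w1, w2, paraphrase):
    """Expand one (w1, w2, p) triple into the three prediction tasks."""
    surface = paraphrase.replace("[w2]", w2).replace("[w1]", w1)  # "cake made of apple"
    return [
        (f"{w2} [p] {w1}", paraphrase),     # task 1: predict the paraphrase p
        (surface.replace(w1, "[w1]"), w1),  # task 2: predict the modifier w1
        (surface.replace(w2, "[w2]"), w2),  # task 3: predict the head w2
    ]

for inp, target in expand("apple", "cake", "[w2] made of [w1]"):
    print(inp, "->", target)
# cake [p] apple -> [w2] made of [w1]
# cake made of [w1] -> apple
# [w2] made of apple -> cake
```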
Main Task (1): Predicting Paraphrases

What is the relation between apple and cake?

[Architecture figure: the sequence "cake [p] apple" is fed through a biLSTM;
an MLP (MLP_p) over the placeholder's encoding predicts an index in the
paraphrase vocabulary, e.g. p_i = 78 for "[w2] containing [w1]"]

Encode the placeholder [p] in "cake [p] apple" using a biLSTM.
Predict an index in the paraphrase vocabulary.
Fixed word embeddings, learned placeholder embeddings.
(1) Generalizes NCs: pear tart is expected to yield similar results.
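A minimal PyTorch sketch of this architecture (covering the helper tasks on the next slide as well): a shared biLSTM encodes the sequence, and task-specific MLPs map the placeholder's encoding to the paraphrase vocabulary (MLP_p) or the word vocabulary (MLP_w). All dimensions and vocabulary sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class ParaphrasingModel(nn.Module):
    def __init__(self, n_words, n_paraphrases, dim=300, hidden=100):
        super().__init__()
        # In the paper, word embeddings are fixed and placeholder embeddings
        # ([p], [w1], [w2]) are learned; here one table stands in for both.
        self.embed = nn.Embedding(n_words, dim)
        self.encoder = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.mlp_p = nn.Linear(2 * hidden, n_paraphrases)  # task 1 head
        self.mlp_w = nn.Linear(2 * hidden, n_words)        # tasks 2-3 head

    def forward(self, token_ids, placeholder_pos, task):
        states, _ = self.encoder(self.embed(token_ids))
        # Take the biLSTM state at the placeholder position.
        h = states[torch.arange(len(token_ids)), placeholder_pos]
        return self.mlp_p(h) if task == "p" else self.mlp_w(h)

model = ParaphrasingModel(n_words=10000, n_paraphrases=500)
# "cake [p] apple": predict the paraphrase index from [p]'s encoding.
logits = model(torch.tensor([[4145, 3, 28]]), torch.tensor([1]), task="p")
```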
Helper Task (2): Predicting Missing Constituents

What can cake be made of?

[Architecture figure: the sequence "cake made of [w1]" is fed through the
biLSTM; an MLP (MLP_w) over the placeholder's encoding predicts an index in
the word vocabulary, e.g. w1_i = 28 for "apple"]

Encode the placeholder in "cake made of [w1]" using a biLSTM.
Predict an index in the word vocabulary.
(2) Generalizes paraphrases: "[w2] containing [w1]" is expected to yield
similar results.
Evaluation
Evaluation Setting

Available dataset: SemEval 2013 Task 4 [Hendrickx et al., 2013].
Semi-supervised: infer templates of POS tags (e.g. "[w2] verb prep [w1]") from
the training data, then use the Google N-grams corpus to generate training data.

A ranking rather than a retrieval task: systems are expected to return a ranked
list of paraphrases for each noun compound. We implemented a ranking model that
re-ranks the top k paraphrases retrieved by the model.

Evaluation is based on n-gram overlap, using the provided evaluation script.
Gold paraphrase score: how many annotators suggested it?
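For intuition, a toy sketch of n-gram overlap scoring between a predicted and a gold paraphrase; the official SemEval 2013 Task 4 script is more elaborate (isomorphic vs. non-isomorphic matching, annotator-weighted gold paraphrases), so this is only illustrative:

```python
def ngrams(text, n):
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_score(predicted, gold, n=2):
    """Fraction of the gold paraphrase's n-grams recovered by the prediction."""
    g = ngrams(gold, n)
    return len(ngrams(predicted, n) & g) / len(g) if g else 0.0

print(overlap_score("cake made of apple", "cake made from apple"))  # 0.33...
```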
Results

                                                non-isomorphic   isomorphic
MELODI [Van de Cruys et al., 2013]              54.8             13
SemEval 2013 Baseline [Hendrickx et al., 2013]  40.6             13.8
SFS [Versley, 2013]                             17.9             23.1
IIITH [Surtani et al., 2013]                    23.1             25.8
PaNiC [Shwartz and Dagan, 2018]                 28.4             28.2

The isomorphic setting rewards both recall and precision; the non-isomorphic
setting rewards only precision, favoring "conservative" models.
Error Analysis: False Positives

1. Valid, but missing from the gold standard (44%): "discussion by group"
2. Too specific (15%): "life of women in community"
3. Incorrect prepositions (14%), e.g. because n-grams don't respect syntactic
   structure: "rinse away the oil from baby's head" ⇒ "oil from baby"
4. Syntactic errors (8%)
5. Borderline grammatical (5%): "force of coalition forces"
6. Other errors (14%)
Error Analysis: False Negatives

1. Long paraphrases (n > 5) (30%)
2. Determiners (25%): "mutation of a gene"
3. Inflected constituents (10%): "holding of shares"
4. Other errors (35%)
Future Directions

Can we learn phrase meanings like humans do?

[Cooper, 1999]: how do L2 learners process idioms?
Infer from context: 28% (57% success rate)
Rely on literal meaning: 19% (22% success rate)
...
Inferring from context

We need "extended" contexts.
[Asl, 2013]: idiom interpretation is more successful with extended contexts (stories).

We need richer context modeling:
characters in the story, relationships between them, dialogues, ...
Relying on literal meaning

"Robert knew he was robbing the cradle by dating a sixteen-year-old girl"

We need world knowledge:
"A cradle is something you put the baby in."
We need to be able to reason:
"You're stealing a child from a mother."
"So robbing the cradle is like dating a really young person."
[Cooper, 1999]
Recap

1. Testing Existing Pre-trained Representations: Contextualized word embeddings
   provide better phrase representations, but there is still a long way to go.
2. Paraphrasing Noun-Compounds: Representations of compositional phrases can
   rely upon and generalize existing knowledge about similar concepts.
3. Future Directions: To represent phrases like humans do, we need better
   context and world knowledge modeling.

Thank you!
References

[Adi et al., 2017] Adi, Y., Kermany, E., Belinkov, Y., Lavi, O., and Goldberg, Y. (2017). Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In Proceedings of ICLR Conference Track.
[Asl, 2013] Asl, F. M. (2013). The impact of context on learning idioms in EFL classes. TESOL Journal, 37(1):2.
[Conneau et al., 2018] Conneau, A., Kruszewski, G., Lample, G., Barrault, L., and Baroni, M. (2018). What you can cram into a single vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136.
[Cooper, 1999] Cooper, T. C. (1999). Processing of idioms by L2 learners of English. TESOL Quarterly, 33(2):233–262.
[Hendrickx et al., 2013] Hendrickx, I., Kozareva, Z., Nakov, P., Ó Séaghdha, D., Szpakowicz, S., and Veale, T. (2013). SemEval-2013 Task 4: Free paraphrases of noun compounds. In SemEval, pages 138–143.
[Nakov, 2013] Nakov, P. (2013). On the interpretation of noun compounds: Syntax, semantics, and entailment. Natural Language Engineering, 19(03):291–330.
[Nakov and Hearst, 2006] Nakov, P. and Hearst, M. (2006). Using verbs to characterize noun-noun relations. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications, pages 233–244. Springer.
[Shwartz and Dagan, 2018] Shwartz, V. and Dagan, I. (2018). Paraphrase to explicate: Revealing implicit noun-compound relations. In ACL, Melbourne, Australia.
[Surtani et al., 2013] Surtani, N., Batra, A., Ghosh, U., and Paul, S. (2013). IIIT-H: A corpus-driven co-occurrence based probabilistic model for noun compound paraphrasing. In SemEval, pages 153–157.
[Van de Cruys et al., 2013] Van de Cruys, T., Afantenos, S., and Muller, P. (2013). MELODI: A supervised distributional approach for free paraphrasing of noun compounds. In SemEval, pages 144–147.
[Versley, 2013] Versley, Y. (2013). SFS-TUE: Compound paraphrasing with a language model and discriminative reranking. In SemEval, pages 148–152.