Page 1

Statistical NLP
Winter 2009

Lecture 12: Computational Psycholinguistics

Roger Levy

Page 2

NLP techniques, human parsing

• So far, our “parsing” has been about Treebank parsing

• Now for a bit about human parsing!

• Techniques from NLP are still the foundation

• We’ll focus on rational models of human sentence processing

[rational = using all available information to make inferences]

• incremental inference: understanding of and response to a partial utterance

Page 3

Incrementality and Rationality

• Online sentence comprehension is hard

• But lots of information sources can be usefully brought to bear to help with the task

• Therefore, it would be rational for people to use all the information available, whenever possible

• This is what incrementality is

• We have lots of evidence that people do this often

“Put the apple on the towel in the box.” (Tanenhaus et al., 1995)

Page 4

Anatomy of ye olde garden path sentence

The horse raced past the barn fell.

• It’s weird

• People fail to understand it most of the time

• People are more likely to misunderstand it than to understand it properly

  • “What’s a barn fell?”

  • The horse that raced past the barn fell

  • The horse raced past the barn and fell

• Today I’m going to talk about three outstanding puzzles involving garden-path sentences

Page 5

Garden paths: What we do understand

• We have decent models of how this sentence fails to be understood

  • Incremental probabilistic parsing with beam search (Jurafsky, 1996)

  • Surprisal (Hale, 2001; Levy, 2008): the disambiguating word fell is extremely low probability, an alarm signal that tells the parser “this doesn’t make sense”

• These models are based on rational use of evidential information (data-driven probabilistic inference)

• Also compatible with gradations in garden-path difficulty (Garnsey et al., 1997; McRae et al., 1998)

Page 6

Hale, 2001; Levy, 2008; Smith & Levy, 2008: surprisal

• Let the difficulty of a word be its surprisal given its context (a quick numeric illustration follows below):

difficulty(w_i) = −log₂ P(w_i | w_1 … w_{i−1}, CONTEXT)

• Captures the expectation intuition: the more we expect an event, the easier it is to process

• Many probabilistic formalisms, including probabilistic context-free grammars, can give us word surprisals
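To make the expectation intuition concrete, a small numeric illustration (my own numbers, not the slide’s):

\[
-\log_2 1 = 0 \ \text{bits}, \qquad -\log_2 \tfrac{1}{8} = 3 \ \text{bits}, \qquad -\log_2 \tfrac{1}{1024} = 10 \ \text{bits},
\]

so a perfectly predictable word costs nothing to process, while a one-in-1024 word costs 10 bits.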

Page 7

[Figure: PCFG parse of “a man arrived yesterday”, with rule probabilities annotating the tree. Example rules shown:]

0.3   S → S CC S        0.15  VP → VBD ADVP
0.7   S → NP VP         0.4   ADVP → RB
0.35  NP → DT NN        ...

Total probability: 0.7 × 0.35 × 0.15 × 0.3 × 0.03 × 0.02 × 0.4 × 0.07 = 1.85 × 10⁻⁷

Algorithms by Jelinek and Lafferty (1991) and Stolcke (1995) give us P(w_i | context) from a PCFG (a brute-force illustration is sketched below)
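As a concrete illustration, here is a minimal brute-force sketch of reading word surprisals off a PCFG via prefix probabilities. It is not the Jelinek and Lafferty / Stolcke algorithm cited above: it simply enumerates every complete sentence of a small, made-up, non-recursive grammar (so the language is finite) and sums the probability of the sentences consistent with each prefix. The grammar and its probabilities are illustrative assumptions, not the slide’s.

import math

# Toy, non-recursive PCFG (illustrative probabilities, not the slide's grammar).
# Each left-hand side maps to a list of (right-hand side, probability) pairs.
GRAMMAR = {
    "S":    [(("NP", "VP"), 1.0)],
    "NP":   [(("DT", "NN"), 1.0)],
    "VP":   [(("VBD",), 0.6), (("VBD", "ADVP"), 0.4)],
    "ADVP": [(("RB",), 1.0)],
    "DT":   [(("a",), 0.5), (("the",), 0.5)],
    "NN":   [(("man",), 0.5), (("woman",), 0.5)],
    "VBD":  [(("arrived",), 0.5), (("left",), 0.5)],
    "RB":   [(("yesterday",), 0.5), (("quickly",), 0.5)],
}

def derivations(symbol):
    """Enumerate (yield, probability) for every complete derivation of symbol.
    Terminates only because this toy grammar is non-recursive."""
    if symbol not in GRAMMAR:                        # terminal word
        return [((symbol,), 1.0)]
    results = []
    for rhs, rule_prob in GRAMMAR[symbol]:
        partials = [((), rule_prob)]
        for child in rhs:                            # cross product over children
            partials = [(words + cw, p * cp)
                        for words, p in partials
                        for cw, cp in derivations(child)]
        results.extend(partials)
    return results

SENTENCES = derivations("S")                         # all sentences with probabilities

def prefix_probability(prefix):
    """P(the sentence begins with `prefix`) = sum over consistent sentences."""
    k = len(prefix)
    return sum(p for words, p in SENTENCES if words[:k] == tuple(prefix))

words = "a man arrived yesterday".split()
for i, w in enumerate(words, start=1):
    surprisal = -math.log2(prefix_probability(words[:i]) /
                           prefix_probability(words[:i - 1]))
    print(f"{w:10s} {surprisal:.2f} bits")

For realistic grammars with recursion, the cited dynamic-programming algorithms compute the same prefix probabilities without enumerating sentences.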

Page 8

Surprisal and garden paths: theory

• Revisiting the horse raced past the barn fell

• After the horse raced past the barn, assume 2 parses: the main-clause parse (the horse raced …) and the reduced-relative parse (the horse [that was] raced past the barn …)

• Jurafsky 1996 estimated the probability ratio of these parses as 82:1

• The surprisal differential of fell in the reduced versus unreduced conditions should thus be log₂ 83 ≈ 6.4 bits (spelled out below)

*(assuming independence between RC reduction and main verb)
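Spelling out the arithmetic behind the 6.4-bit figure (a reconstruction of the reasoning, assuming the independence noted in the footnote and that fell is essentially impossible under the main-clause analysis):

\[
\Delta = -\log_2\!\Big(\tfrac{1}{83}\,P(\textit{fell}\mid \mathrm{RR})\Big) \;-\; \Big(-\log_2 P(\textit{fell}\mid \mathrm{RR})\Big) = \log_2 83 \approx 6.4 \ \text{bits}
\]

In the reduced condition only 1/83 of the probability mass is on the reduced-relative (RR) analysis that can continue with fell; in the unreduced condition all of it is.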

Page 9

Surprisal and garden paths: practice

• An unlexicalized PCFG (estimated from the Brown corpus) gets the monotonicity of surprisals right at the disambiguating word “fell”

[Figure: surprisal of “fell” in the reduced vs. unreduced conditions. This is the key comparison; the difference is small, but in the right direction.]

Aside: the absolute surprisal values are way too high, but that’s because the grammar’s crude

Page 10

Garden Paths: What we don’t understand so well

• How do people arrive at the misinterpretations they come up with?

• What factors make them more or less likely to come up with such a misinterpretation?

Page 11

Outstanding puzzle: length effects

• Try to read this:

  • Tom heard the gossip about the neighbors wasn’t true.

• Compare it with this:

  • Tom heard the gossip wasn’t true.

• Likewise:

  • While the man hunted the deer that was brown and graceful ran into the woods.

  • While the man hunted the deer ran into the woods.

• The longer the ambiguous region, the harder it is to recover (Frazier & Rayner, 1987; Tabor & Hutchins, 2004)

• Also problematic for rational models: effects of irrelevant information

Page 12

Memory constraints in human parsing

• Sentence meaning is structured

• The number of logically possible analyses for a sentence is at best exponential in sentence length

• So we must be entertaining some limited subset of analyses at all times*

*“Dynamic programming”, you say? Ask later.

Page 13

Dynamic programming

• Exact probabilistic inference with context-free grammars can be done efficiently in O(n³) (a minimal sketch of the cubic-time inside algorithm is given below)

• But…

  • This inference requires strict probabilistic locality

  • Human parsing is linear time, that is O(n), anyway

• Here, we’ll explore an approach from the machine-learning literature: the particle filter
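For concreteness, here is a minimal sketch of the O(n³) inside (CKY) computation referred to above, for a PCFG in Chomsky normal form. The grammar, its probabilities, and the example sentence are made-up assumptions for illustration; a real implementation would also handle unary rules, track back-pointers, and work in log space.

from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustrative probabilities).
BINARY = {                                   # P(A -> B C)
    ("S",  ("NP", "VP")):    1.0,
    ("NP", ("DT", "NN")):    1.0,
    ("VP", ("VBD", "ADVP")): 0.4,
}
LEXICAL = {                                  # P(A -> word)
    ("DT", "a"): 1.0,
    ("NN", "man"): 1.0,
    ("VBD", "arrived"): 1.0,
    ("VP", "arrived"): 0.6,
    ("ADVP", "yesterday"): 1.0,
}

def inside(words):
    """chart[(i, j)][A] = P(A derives words[i:j]); three nested loops give O(n^3)."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):            # width-1 spans (lexical rules)
        for (A, word), p in LEXICAL.items():
            if word == w:
                chart[(i, i + 1)][A] += p
    for width in range(2, n + 1):            # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):        # split point
                for (A, (B, C)), p in BINARY.items():
                    chart[(i, j)][A] += p * chart[(i, k)][B] * chart[(k, j)][C]
    return chart

words = "a man arrived yesterday".split()
print("P(sentence) =", inside(words)[(0, len(words))]["S"])   # -> 0.4 with this toy grammar

Note how each cell is filled from smaller cells using rule probabilities alone; that independence is the “strict probabilistic locality” mentioned in the bullets above.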

Page 14

The particle filter: general picture

• Sequential Monte Carlo for incremental observations

• Let x_i be the observed data and z_i the unobserved states

• For parsing: the x_i are words, the z_i are structural analyses

• Suppose that after n−1 observations we have the distribution over interpretations P(z_{n−1} | x_{1…n−1})

• After obtaining the next word x_n, represent the next distribution P(z_n | x_{1…n}) inductively (the recursion is written out below)

• Representing P(z_i | x_{1…i}) by samples makes this a Monte Carlo method
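Written out, the inductive step referred to above is the standard sequential Monte Carlo recursion (a reconstruction in generic notation, not necessarily the slide’s exact formula):

\[
P(z_n \mid x_{1\ldots n}) \;\propto\; \sum_{z_{n-1}} P(x_n, z_n \mid z_{n-1}, x_{1\ldots n-1})\, P(z_{n-1} \mid x_{1\ldots n-1})
\]

A particle filter approximates this with samples: each particle’s z_{n−1} is extended to a candidate z_n, weighted by how well it accounts for the observed word x_n, and the particle set is then resampled in proportion to those weights.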

Page 15

Particle filter with probabilistic grammars

S → NP VP        1.0      V → broke      0.3
NP → N           0.8      V → tired      0.3
NP → N RRC       0.2      Part → raced   0.1
RRC → Part Adv   1.0      Part → broken  0.5
VP → V Adv       1.0      Part → tired   0.4
N → horses       1.0      Adv → quickly  1.0
V → raced        0.4

[Figure: particles for the prefix “horses raced quickly”. Some particles analyze raced as a main verb (S → NP VP, with VP → V Adv), others as a reduced relative (NP → N RRC, with RRC → Part Adv); only the latter can be extended when the continuation tired arrives.]
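Below is a minimal runnable sketch of the idea using (roughly) the grammar above. It is an illustration, not Levy’s actual model: particles propose structure from the grammar prior, a particle’s weight is simply 1 when its predicted next word matches the input and 0 otherwise, and the rule VP → V (splitting the VP probability in half) is my own addition so that the continuation “horses raced quickly tired” has a parse at all. The particle count and random seed are likewise arbitrary.

import random

# PCFG from the slide, plus an assumed rule VP -> V so that
# "horses raced quickly tired" has a parse (the slide's VP -> V Adv alone would not).
GRAMMAR = {
    "S":    [(("NP", "VP"), 1.0)],
    "NP":   [(("N",), 0.8), (("N", "RRC"), 0.2)],
    "RRC":  [(("Part", "Adv"), 1.0)],
    "VP":   [(("V",), 0.5), (("V", "Adv"), 0.5)],   # VP -> V is an added assumption
    "N":    [(("horses",), 1.0)],
    "V":    [(("raced",), 0.4), (("broke",), 0.3), (("tired",), 0.3)],
    "Part": [(("raced",), 0.1), (("broken",), 0.5), (("tired",), 0.4)],
    "Adv":  [(("quickly",), 1.0)],
}
TERMINALS = {sym for rules in GRAMMAR.values()
             for rhs, _ in rules for sym in rhs if sym not in GRAMMAR}

def predict_next_word(stack):
    """Top-down, leftmost expansion: sample rules from the prior until the next
    terminal is produced.  Returns (word, remaining stack); word is None if the
    derivation is already complete."""
    stack = list(stack)
    while stack:
        top = stack.pop()
        if top in TERMINALS:
            return top, stack
        rhs_options, probs = zip(*GRAMMAR[top])
        rhs = random.choices(rhs_options, weights=probs)[0]
        stack.extend(reversed(rhs))          # leftmost RHS symbol ends up on top
    return None, stack

def particle_filter(words, n_particles=2000):
    """Return the fraction of particles surviving each successive word."""
    particles = [["S"] for _ in range(n_particles)]   # a particle = a prediction stack
    survival = []
    for w in words:
        survivors = []
        for stack in particles:
            predicted, rest = predict_next_word(stack)
            if predicted == w:               # weight is 1 if the sampled structure
                survivors.append(rest)       # predicts the observed word, else 0
        survival.append(len(survivors) / n_particles)
        if not survivors:                    # every particle died: parse failure
            break
        # resample with replacement back up to n_particles
        particles = [list(random.choice(survivors)) for _ in range(n_particles)]
    return survival

random.seed(1)
for sentence in ["horses raced quickly", "horses raced quickly tired"]:
    print(sentence, "->", particle_filter(sentence.split()))

With a small particle set, analyses treating raced as a main verb tend to crowd out the reduced-relative analysis during the ambiguous region, so runs can end with no surviving particle when tired arrives; the longer the ambiguous region, the more opportunity for that to happen, which is the length effect the model is after.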

Page 16

Returning to the puzzle

A-S  Tom heard the gossip wasn’t true.

A-L  Tom heard the gossip about the neighbors wasn’t true.

U-S  Tom heard that the gossip wasn’t true.

U-L  Tom heard that the gossip about the neighbors wasn’t true.

(A/U = ambiguous vs. unambiguous; S/L = short vs. long ambiguous region)

• Previous empirical finding: ambiguity induces difficulty…

• …but so does the length of the ambiguous region

• Our linking hypothesis:

The proportion of parse failures at the disambiguating region should be monotonically related to the difficulty of the sentence

Frazier & Rayner, 1982; Tabor & Hutchins, 2004

Page 17

Model Results

Ambiguity matters…

But the length of the ambiguous region also matters!

Page 18

Human results (offline rating study)

Page 19

Rational comprehension’s other successes

• Global disambiguation preferences (Jurafsky, 1996)

The women discussed the dogs on the beach

• Basic garden-path sentences (Hale, 2001)

The horse raced past the barn fell

• Garden-path gradience (Narayanan & Jurafsky, 2002)

The crook (that was) arrested by the detective was guilty   (not difficult)

• Predictability in unambiguous contexts (Levy, 2008)

The children went outside to… (play? chat?)

• Grounding in optimality/rational analysis (Norris, 2006; Smith & Levy, 2008)

Page 20

Behavioral correlates (Tabor et al., 2004)

• Also, Konieczny (2006, 2007) found compatible results in stops-making-sense and visual-world paradigms

• These results are problematic for theories requiring global contextual consistency (Frazier, 1987; Gibson, 1991, 1998; Jurafsky, 1996; Hale, 2001, 2006)

[Figure: reading-time results from Tabor et al. (2004): the locally ambiguous tossed condition is harder than the unambiguous thrown condition]