Capturing linguistic interaction in a grammar
A method for empirically evaluating the grammar of a parsed corpus
Sean Wallis
Survey of English Usage, University College London

Dec 17, 2015

Capturing linguistic interaction...

• Parsed corpus linguistics
• Empirical evaluation of grammar
• Experiments
  – Attributive AJPs
  – Preverbal AVPs
  – Embedded postmodifying clauses
• Conclusions
  – Comparing grammars or corpora
  – Potential applications

Parsed corpus linguistics

• Several million-word parsed corpora exist
• Each sentence analysed in the form of a tree
  – different languages have been analysed
  – limited amount of spontaneous speech data
• Commitment to a particular grammar required
  – different schemes have been applied
  – problems: computational completeness + manual consistency
• Tools support linguistic research in corpora

Parsed corpus linguistics

• An example tree from ICE-GB (spoken)

[Figure: parse tree for ICE-GB text unit S1A-006 #23]

Parsed corpus linguistics

• Three kinds of evidence may be obtained from a parsed corpus
  – Frequency evidence of a particular known rule, structure or linguistic event
  – Coverage evidence of new rules, etc.
  – Interaction evidence of the relationship between rules, structures and events
• This evidence is necessarily framed within a particular grammatical scheme
  – So… how might we evaluate this grammar?

Empirical evaluation of grammar

• Many theories, frameworks and grammars
  – no agreed evaluation method exists
  – linguistics is divided into competing camps
  – status of parsed corpora ‘suspect’
• Possible method: retrievability of events
  – circularity: you get out what you put in
  – redundancy: ‘improvement’ by mere addition
  – atomic: based on single events, not pattern
  – specificity: based on particular phenomena
• New method: retrievability of event sequences

Experiment 1: attributive AJPs

• Adjectives before a noun in English
• Simple idea: plot the frequency of NPs with at least n = 0, 1, 2, 3… attributive AJPs

[Figure: two plots against number of attributive AJPs (0 to 6). Left: raw frequency (axis 0 to 200,000). Right: log frequency (axis 0 to 6). NB: not a straight line.]
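The "simple idea" on this slide can be sketched in a few lines of Python. This is only an illustration: the counts below are invented, not the ICE-GB figures, and the helper names `at_least` and `log_steps` are hypothetical. The sketch turns "exactly n" counts into "at least n" counts and then checks whether the log-frequency line is straight by comparing the step between successive log values.

```python
import math

# Hypothetical counts for illustration only (NOT the ICE-GB figures):
# exact[n] = number of NPs with exactly n attributive AJPs.
exact = [150000, 30000, 2500, 150, 8]

def at_least(exact_counts):
    """Convert 'exactly n' counts into 'at least n' counts."""
    totals = []
    running = 0
    for count in reversed(exact_counts):
        running += count
        totals.append(running)
    return list(reversed(totals))

def log_steps(freqs):
    """Differences between successive log10 frequencies.

    Equal steps mean a straight log-frequency line, i.e. an
    exponential fall in frequency with constant probability.
    """
    logs = [math.log10(f) for f in freqs]
    return [b - a for a, b in zip(logs, logs[1:])]

freqs = at_least(exact)
steps = log_steps(freqs)
for n, f in enumerate(freqs):
    print(n, f, round(math.log10(f), 4))
print("steps:", [round(s, 3) for s in steps])
```

If the steps grow more negative as n increases, the line bends downwards, which is the pattern the slide flags with "not a straight line".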

Experiment 1: analysis of results

• If the log-frequency line is straight
  – exponential fall in frequency (constant probability)
  – no interaction between decisions (cf. coin tossing)
• Sequential probability analysis
  – calculate probability of adding each AJP
  – error bars (binomial)
  – probability falls
    • second < first
    • third < second
    • fourth < second
  – decisions interact

[Figure: probability of adding each successive AJP, with error bars (x-axis 0 to 5, y-axis probability 0.00 to 0.25)]
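A minimal sketch of the sequential probability analysis, under stated assumptions: the probability of adding the n-th item is taken as F(n) / F(n-1), where F(n) is the number of cases with at least n items, and the binomial error bars are drawn as Wilson score intervals. The slide says only "binomial", so the choice of interval is an assumption, and the counts are invented for illustration.

```python
import math

def wilson_interval(p, n, z=1.96):
    """Wilson score interval for an observed proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread

def sequential_probabilities(freqs):
    """Return (n, p, low, high) for each step, with p = F(n) / F(n-1)."""
    results = []
    for n in range(1, len(freqs)):
        base = freqs[n - 1]          # cases that could take an n-th item
        p = freqs[n] / base          # proportion that actually did
        low, high = wilson_interval(p, base)
        results.append((n, p, low, high))
    return results

# Hypothetical 'at least n' counts (NOT the ICE-GB figures).
freqs = [182658, 32658, 2658, 158, 8]
for n, p, low, high in sequential_probabilities(freqs):
    print(f"n={n}: p={p:.4f}  [{low:.4f}, {high:.4f}]")
```

Non-overlapping intervals at successive steps suggest a real fall in probability. Overlapping intervals are not conclusive either way and would need a direct test of the difference.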

Experiment 1: analysis of results

• If the log-frequency line is straight
  – exponential fall in frequency (constant probability)
  – no interaction between decisions (cf. coin tossing)
• Sequential probability analysis
  – calculate probability of adding each AJP
  – error bars (binomial)
  – probability falls
  – decisions interact
  – fit to a power law
    • $y = m x^{k}$
    • find m and k

[Figure: probability of adding each successive AJP (x-axis 0 to 5, y-axis probability 0.00 to 0.25), with fitted curve $y = 0.1931x^{-1.2793}$]
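One common way to find m and k is ordinary least squares on log-log axes, since $\log y = \log m + k \log x$ is a straight line. The sketch below assumes that approach; the slides do not say how the fit was obtained. The function name `fit_power_law` is hypothetical, and the input points are generated from the reported curve rather than taken from observed data, so the fit simply recovers the reported parameters.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = m * x**k on log-log axes.

    Returns (m, k, r2), where r2 is measured in log-log space.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    syy = sum((b - mean_y) ** 2 for b in ly)
    k = sxy / sxx                          # slope of the log-log line
    m = math.exp(mean_y - k * mean_x)      # intercept, back-transformed
    r2 = (sxy * sxy) / (sxx * syy)
    return m, k, r2

# Synthetic points lying exactly on the reported curve (not observed data).
xs = [1, 2, 3, 4]
ys = [0.1931 * x ** -1.2793 for x in xs]
m, k, r2 = fit_power_law(xs, ys)
print(round(m, 4), round(k, 4), round(r2, 4))
```

With real observations the points scatter around the line, and r2 then indicates how well a power law describes the fall.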

Experiment 1: explanations?

• Feedback loop: for each successive AJP, it is more difficult to add a further AJP
  – Explanation 1: semantic constraints
    • tend to say tall green ship
    • do not tend to say tall short ship or green tall ship
  – Explanation 2: communicative economy
    • once speaker said tall green ship, tends to only say ship
  – Further investigation required
• General principle:
  – significant change (usually, fall) in probability is evidence of an interaction along grammatical axis

Experiments 2, 3: variations

• Restrict head: common and proper nouns
  – Common nouns: similar results
  – Proper nouns and adjectives are often treated as compounds (Northern England vs. lower Loire)
• Ignore grammar: adjective + noun strings
  – Some misclassifications / miscounting (‘noise’)
    • she was [beautiful, people] said; tall very [green ship]
  – Similar results
    • slightly weaker (third < second not significant at p = 0.01)
  – Insufficient evidence for grammar
    • null hypothesis: simple lexical adjacency

Experiment 4: preverbal AVPs

• Consider adverb phrases before a verb
  – Results very different
    • Probability does not fall significantly between first and second AVP
    • Probability does fall between second and third AVP
  – Possible constraints
    • (weak) communicative
    • not (strong) semantic
  – Further investigation needed
  – Not power law: $R^2 < 0.24$

[Figure: probability of adding each successive preverbal AVP (x-axis 1 to 3, y-axis probability 0.00 to 0.06)]
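Whether a probability "falls significantly" from one step to the next needs a test on the difference between two proportions. The slides do not name the test used, so the sketch below substitutes a standard pooled two-proportion z-test as a stand-in. The function names and all numbers are hypothetical.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z score for the difference p1 - p2 using a pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def falls_significantly(p1, n1, p2, n2, z_crit=1.96):
    """True if p2 is lower than p1 by more than z_crit standard errors.

    1.96 is the critical value for a two-tailed 5% error level.
    """
    return two_proportion_z(p1, n1, p2, n2) > z_crit

# Hypothetical figures: a large fall on big samples, then a tiny
# fall where the second sample is small.
print(falls_significantly(0.20, 1000, 0.10, 1000))
print(falls_significantly(0.050, 1000, 0.048, 50))
```

Because the number of cases shrinks at every step, later comparisons rest on much smaller samples, so a fall of the same size can be significant early in the sequence and inconclusive later.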

Experiment 5: embedded clauses

• Another way to specify nouns in English
  – add clause after noun to explicate it
    • the ship [that was tall and green]
    • the ship [in the port]
  – may be embedded
    • the ship [in the port [with the ancient lighthouse]]
  – or successively postmodified
    • the ship [in the port][with a very old mast]
• Compare successive embedding and sequential postmodifying clauses
  – Axis = embedding depth / sequence length

Experiment 5: method

• Extract examples with FTFs
  – at least n levels of embedded postmodification:

[Figure: FTF patterns for n = 0, 1, 2 (etc.) levels of embedded postmodification]

  – problems:
    • multiple matching cases (use ICECUP IV to classify)
    • overlapping cases (subtract extra case)
    • co-ordination of clauses or NPs (use alternative patterns)
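The two axes being compared, embedding depth and sequence length, can be illustrated on a toy structure. This is not an FTF and not the ICE-GB annotation: the nested dictionaries, the reduction of each postmodifier to the noun phrase it contains, and the function names are all assumptions made for the sketch. The two toy phrases loosely follow the ship examples from the previous slide.

```python
# Hypothetical toy representation (not ICE-GB trees or ICECUP FTFs):
# a noun phrase is a dict with a head and a list of postmodifiers,
# each postmodifier reduced to the noun phrase it contains.
embedded = {"head": "ship", "post": [
    {"head": "port", "post": [{"head": "lighthouse", "post": []}]},
]}
sequential = {"head": "ship", "post": [
    {"head": "port", "post": []},
    {"head": "mast", "post": []},
]}

def embedding_depth(phrase):
    """Levels of embedded postmodification below this noun phrase."""
    posts = phrase.get("post", [])
    if not posts:
        return 0
    return 1 + max(embedding_depth(p) for p in posts)

def sequence_length(phrase):
    """Number of postmodifiers attached directly to this noun phrase."""
    return len(phrase.get("post", []))

def at_least_counts(values):
    """counts[n] = number of cases whose value is at least n."""
    return [sum(1 for v in values if v >= n) for n in range(max(values) + 1)]

phrases = (embedded, sequential)
depths = [embedding_depth(p) for p in phrases]
lengths = [sequence_length(p) for p in phrases]
print("depths:", depths, "at least n:", at_least_counts(depths))
print("lengths:", lengths, "at least n:", at_least_counts(lengths))
```

Counting each case once against its maximum value sidesteps the multiple-matching and overlapping-case problems listed on the slide, which arise when a pattern for "at least n" levels matches the same tree more than once.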

Experiment 5: analysis of results

• Probability of adding a further embedded clause falls with each level
  – second < first
  – sequential < embedding
• Embedding only:
  – third < first
  – insufficient data for third < second
• Conclusion:
  – Interaction along embedding and sequential axes

[Figure: probability of adding a further clause at each level (x-axis 0 to 4, y-axis probability 0.00 to 0.06); series: sequential, embedded]

Experiment 5: analysis of results

• Probability of adding a further embedded clause falls with each level
  – second < first
  – sequential < embedding
• Fitting to $f = m x^{k}$
  – $k < 0$ = fall ($f = m / x^{|k|}$)
  – $|k|$ is high = steep
• Conclusion:
  – Both match power law: $R^2 > 0.99$

[Figure: probability of adding a further clause at each level (x-axis 0 to 4, y-axis probability 0.00 to 0.06); series: sequential, embedded; fitted curves $y = 0.0539x^{-1.2206}$ and $y = 0.0523x^{-1.6516}$]
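To see what "k < 0 = fall" and "|k| is high = steep" mean in practice, the two fitted curves reported on the slide can be evaluated at the first few levels. This is a small sketch using only the reported m and k values; the function name `power_law` is hypothetical, and which curve belongs to which series is left open here because the transcript does not label them clearly.

```python
def power_law(m, k, x):
    """Evaluate f = m * x**k."""
    return m * x ** k

# (m, k) pairs as reported on the slide.
fits = [(0.0539, -1.2206), (0.0523, -1.6516)]

for m, k in fits:
    values = [round(power_law(m, k, x), 4) for x in (1, 2, 3)]
    # With k < 0, each doubling of x multiplies f by 2**k, which is < 1.
    print(f"m={m}, k={k}: f(1..3)={values}, doubling factor={2 ** k:.3f}")
```

Both curves start at almost the same value at x = 1, so the difference between them lies almost entirely in the exponent: the curve with the larger |k| loses a bigger share of its probability at every step.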

Experiment 5: explanations?

• Lexical adjacency?
  – No: 87% of 2-level cases have at least one VP, NP or clause between upper and lower heads
• Misclassified cases of embedding?
  – No: very few (5%) semantically ambiguous cases
• Language production constraints?
  – Possibly, could also be communicative economy
    • contrast spontaneous speech with other modes
• Positive ‘proof’ of recursive tree grammar
  – Established from parsed corpus
  – cf. negative ‘proof’ (NLP parsing problems)

Conclusions

• A new method for evaluating interactions along grammatical axes
  – General purpose, robust, structural
  – More abstract than ‘linguistic choice’ experiments
  – Depends on a concept of grammatical distance along an axis, based on the chosen grammar
• Method has philosophical implications
  – Grammar viewed as structure of linguistic choices
  – Linguistics as an evaluable observational science
    • Signature (trace) of language production decisions
  – A unification of theoretical and corpus linguistics?

Comparing grammars or corpora

• Can we reliably retrieve known interaction patterns with different grammars?
  – Do these patterns differ across corpora?
• Benefits over individual event retrieval
  – non-circular: generalisation across local syntax
  – not subject to redundancy: arbitrary terms make trends more difficult to retrieve
  – not atomic: based on patterns of interaction
  – general: patterns may have multiple explanations
• Supplements retrieval of events

Potential applications

• Corpus linguistics
  – Optimising existing grammar
    • e.g. co-ordination, compound nouns
• Theoretical linguistics
  – Comparing different grammars, same language
  – Comparing different languages or periods
• Psycholinguistics
  – Search for evidence of language production constraints in spontaneous speech corpora
    • speech and language therapy
    • language acquisition and development

Links and further reading

• Survey of English Usage
  – www.ucl.ac.uk/english-usage
• Corpora and grammar
  – .../projects/ice-gb
• Full paper
  – .../staff/sean/resources/analysing-grammatical-interaction.pdf
• Sequential analysis spreadsheet (Excel)
  – .../staff/sean/resources/interaction-trends.xls