That vexed problem of choice: reflections on experimental design and statistics with corpora
ICAME 33, Leuven, 30 May-3 June 2012
Sean Wallis, Jill Bowie and Bas Aarts
Survey of English Usage, University College London
{s.wallis, j.bowie, b.aarts}@ucl.ac.uk
Outline
• Introduction
• Definitions
• Refining baselines and the ratio principle
• Surveying ‘absolute’ and ‘relative’ variation
• Potential sources of interaction
• Employing alternation analysis
• Objections
• Conclusions
Introduction
• Research questions are really about choice
  – If speakers had no choice about the words or constructions they used, language would be invariant!
• Lab experiments
  – Press button A or button B
• Corpus
  – Speakers may choose construction A or B
    • But they can only actually choose one, A, at each point
    • We have to infer the other type, B, counterfactually
• Identifying alternates is often non-trivial
Mutual substitution
• Mutual substitution A ↔ B
  – Given a corpus, identify all events of Type A that alternate with events of Type B, such that A is mutually replaceable by B without altering the meaning of the text.
• Replacement
  – B replaces A if B increases, and vice versa
    • p(A) + p(B) + ... = 1
• Freedom to vary
    • p(X) ∈ [0, 1]
  – Ideal: eliminate invariant Type C terms
Mutual substitution
• Mutual substitution A ↔ B
  – Pronoun who/whom
    • A = whom
    • B = who (objective)
  – But whom is limited to objective case
    • C = who (subjective)
    • We therefore limit alternation to Objects
  – If whom is used ‘incorrectly’ as a Subject, it has an additional constraint (social disfavour)
True rate of alternation
• True rate of alternation
  – If A ↔ B
    • $p(A \mid \{A, B\}) = \frac{F(A)}{F(A) + F(B)}$
    • Proportion (fraction) of all cases that are Type A
  – we use p(A) as a shorthand for p(A | {A, B}) if the baseline {A, B} is stated
• Contingency tables

IV \ DV       A       B       Total          probability
condition 1   f1(A)   f1(B)   f1(A)+f1(B)    p1(A)
condition 2   f2(A)   f2(B)   f2(A)+f2(B)    p2(A)
Total         F(A)    F(B)    F(A)+F(B)      p(A)
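A minimal sketch of the rate calculation for a table of this shape. The counts are made up for illustration, and the names `alternation_rate` and `table` are assumptions, not part of the talk:

```python
def alternation_rate(f_a, f_b):
    """Return p(A | {A, B}) = F(A) / (F(A) + F(B))."""
    total = f_a + f_b
    if total == 0:
        raise ValueError("no alternating cases: the rate is undefined")
    return f_a / total


# Hypothetical frequencies (f(A), f(B)) for two values of the independent variable.
table = {
    "condition 1": (30, 70),
    "condition 2": (50, 50),
}

# Row probabilities p1(A) and p2(A).
rates = {cond: alternation_rate(a, b) for cond, (a, b) in table.items()}

# Column totals F(A) and F(B) give the overall rate p(A).
total_a = sum(a for a, _ in table.values())
total_b = sum(b for _, b in table.values())
overall = alternation_rate(total_a, total_b)

for cond, p in rates.items():
    print(f"{cond}: p(A) = {p:.3f}")
print(f"Total: p(A) = {overall:.3f}")
```

Only cases counted as Type A or Type B enter the denominator, so any Type C cases have to be excluded before the counts are made.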
True rate of alternation
• Shall/will alternation over time in DCPSE
[Figure: p plotted against time, 1955 to 1995 (y-axis 0 to 1); baseline = {shall, will}]
(Aarts et al., forthcoming)
True rate of alternation
• Shall/(will+’ll) alternation over time in DCPSE
[Figure: p plotted against time, 1955 to 1995 (y-axis 0 to 1); baseline = {shall, will, ’ll}]
(Aarts et al., forthcoming)
True rate of alternation
• Logistic ‘S’ curve assumes freedom to vary
  – p(X) ∈ [0, 1]
  – as do Wilson confidence intervals
[Figure: p (0 to 1) plotted against time t; series: shall/(will+’ll), shall/’ll]
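For readers unfamiliar with the Wilson interval mentioned here, the following is a minimal sketch of the standard Wilson score interval for an observed proportion. The function name and the default z value (two-tailed 95%) are choices made for this example rather than anything specified in the talk:

```python
import math


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n cases.

    z is the two-tailed critical value of the Normal distribution
    (1.959964 corresponds to a 95% interval).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Example: an observed rate of 0.5 based on 100 alternating cases.
lower, upper = wilson_interval(0.5, 100)
print(f"p = 0.5, n = 100: ({lower:.4f}, {upper:.4f})")
```

The bounds stay inside [0, 1] even for extreme proportions, which is why the interval fits a quantity that is free to vary between 0 and 1.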
Refining baselines
• Over-general baselines
  – conflate opportunity and use
  – ‘normalisation’ per million words
    • implies that every word other than A is Type B!
    • is this plausible?
• ‘Art’ of experimental design
  – refine baseline by narrowing dataset
    • reduce and eliminate non-alternating Type C cases
    • optionally: subdivide where different constraints apply
  – different baselines test different hypotheses
    • cf. shall / will / ’ll
Refining baselines
• Tensed VPs per million words, DCPSE
[Figure: bar chart, y-axis 0 to 140,000; text categories: formal face-to-face, informal face-to-face, telephone, broadcast discussions, broadcast interviews, commentary, parliament, legal cross-examination, assorted spontaneous, prepared speech, Total; series: LLC, ICE-GB]
  – Total: constant over time
  – Diachronic variation: within text categories
  – Synchronic variation: between text categories
(Bowie et al., forthcoming)
The ratio principle
• Simple algebra
  – any sequence of ratios can be reduced to the ratio of the first and last term:
    $\frac{F(\text{modal})}{F(\text{word})} = \frac{F(\text{modal})}{F(\text{tVP})} \times \frac{F(\text{tVP})}{F(\text{word})}$
  – we saw that the ratio tVP:word varies synchronically and diachronically in DCPSE
    • we can eliminate this variation by simply focusing on modal:tVP
    • use tensed VPs as baseline for modals
  – this baseline is not a strict alternation set
    • we have not eliminated all Type C terms
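The ratio principle can be checked numerically. The sketch below uses invented counts (not DCPSE data) in which the density of tensed VPs falls between two periods while the modal:tVP ratio stays constant, so the per-million-word rate falls for reasons that have nothing to do with the choice of a modal:

```python
import math

# Hypothetical frequencies per subcorpus (illustrative only, not DCPSE data).
counts = {
    "early": {"word": 400_000, "tVP": 50_000, "modal": 5_000},
    "late": {"word": 400_000, "tVP": 40_000, "modal": 4_000},
}


def ratios(c):
    """Return (modal:word, modal:tVP, tVP:word) for one subcorpus."""
    modal_per_word = c["modal"] / c["word"]
    modal_per_tvp = c["modal"] / c["tVP"]
    tvp_per_word = c["tVP"] / c["word"]
    return modal_per_word, modal_per_tvp, tvp_per_word


for period, c in counts.items():
    per_word, per_tvp, tvp_word = ratios(c)
    # The chain of ratios collapses to the ratio of the first and last terms.
    assert math.isclose(per_word, per_tvp * tvp_word)
    print(
        f"{period}: modal/word = {per_word * 1e6:.0f} per million words, "
        f"modal/tVP = {per_tvp:.3f}, tVP/word = {tvp_word:.3f}"
    )
```

With these made-up numbers, the word-based rate drops while modal:tVP is unchanged, which is the situation the tensed VP baseline is meant to expose.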
‘Absolute’ and ‘relative’ variation
• Changes in core modals over time in DCPSE
[Figure: change for can, could, may, might, must, shall, should, will, would, plotted on two axes]
  – Left axis (0.00 to 0.05): p(modal | tVP), absolute change as a proportion of tensed VPs
  – Right axis (0.00 to 0.30): p(modal | modal tVP), relative change as a proportion of the set of modals
(Bowie et al., forthcoming)
• Simple grammatical interaction
  – Independent and dependent variables are grammatical
    • mutual substitution concerns the dependent variable
• Grammatically diverse alternates
  – Biber and Gray (forthcoming) investigate evidence for increasing nominalisation
    • A = nouns that have been derived from verb forms
      – This paper reports an analysis of Tucker’s central prediction system model and an empirical comparison of it with two competing models. [1965, Acad-NS]
    • B = verbs that could be nominalised
  – Could just use clauses as baseline
    • But this is little better than words
  – Better option is to enumerate types
    • analysis / analyse
    • prediction / predict
    • comparison / compare
  – Examine cases: is alternation possible?
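One way to picture the "enumerate types" option is as a lookup from each nominalisation to its source verb, with a rate computed per pair and pooled across pairs. This is only a sketch: the frequencies are invented, the names `PAIRS`, `FREQ`, `nominalisation_rates` and `pooled_rate` are hypothetical, and it assumes every counted case has already been checked by hand to confirm that alternation is possible:

```python
# Type A nominalisation -> Type B source verb (the three pairs on the slide).
PAIRS = {"analysis": "analyse", "prediction": "predict", "comparison": "compare"}

# Hypothetical frequencies of manually checked, genuinely alternating cases.
FREQ = {
    "analysis": 12, "analyse": 8,
    "prediction": 5, "predict": 15,
    "comparison": 9, "compare": 9,
}


def nominalisation_rates(pairs, freq):
    """Return p(noun | {noun, verb}) for each pair that has any cases."""
    rates = {}
    for noun, verb in pairs.items():
        f_a, f_b = freq.get(noun, 0), freq.get(verb, 0)
        if f_a + f_b == 0:
            continue  # no opportunity observed for this pair
        rates[noun] = f_a / (f_a + f_b)
    return rates


def pooled_rate(pairs, freq):
    """Return the overall rate across all enumerated pairs."""
    f_a = sum(freq.get(noun, 0) for noun in pairs)
    f_b = sum(freq.get(verb, 0) for verb in pairs.values())
    if f_a + f_b == 0:
        raise ValueError("no alternating cases found")
    return f_a / (f_a + f_b)


rates = nominalisation_rates(PAIRS, FREQ)
pooled = pooled_rate(PAIRS, FREQ)
for noun, p in rates.items():
    print(f"{noun} vs {PAIRS[noun]}: p = {p:.2f}")
print(f"pooled: p = {pooled:.2f}")
```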
Objections
• If this is such a good idea, why isn’t everybody doing it?
• Three main objections are made:
  – alternates are not reliably identifiable
  – baselines are arbitrarily chosen by the researcher
  – different constraints apply to different terms (no such thing as free variation)
Alternates are not reliably identifiable?
• Identifying alternates can be difficult
  – phrasal vs. Latinate verbs
• Strategies:
  – enumerate cases from bottom, up
    • find Type B cases for each Type A

Type A: put up
Type B alternate    cases   example
tolerate            4       put up with it [S1A-037 #1]
?position           3       put your feet up [S1A-032 #21]
build, make         3       shacks put up without any planning [S2B-022 #118]
display, project    2       put up two… trees [on the screen] [S1B-002 #157]
sell                2       put the plant up for sale [W2C-015 #8]
propose             2       put [a motion] up [S1B-077 #127]
increase            1       put up the poll tax [W2C-009 #3]
accommodate         1       we could put up the children [S1A-073 #197]
finance             1       put up the money [W2F-007 #36]
Alternates are not reliably identifiable?
• Strategies:
  – enumerate cases from bottom, up
    • find Type B cases for each Type A
  – refine baseline from top, down
    • start with verbs, eliminate non-alternating Type Cs
      – Copular verbs
      – Clitics
      – Stative verbs
    • are dynamic verbs the upper bound for alternation with phrasal verbs?
  – combine strategies:
    • identify stative verbs lexically
    • a few verbs are stative and dynamic
      – check in situ
Baselines are arbitrary?
• Is there such a thing as an ‘objective’ baseline?
  – No, but optimum baselines identify where speakers have a real choice: Type A vs. Type B
• Baselines are a control
  – Experimental hypothesis:
    • the ratio of Type A to the baseline is constant over values of the independent variable
  – Baseline cited as part of experimental reporting
• Indeed we can experiment with baselines
  – e.g. does the present perfect correlate more with past-referring or present-referring VPs?
Comparing baselines
• Does the present perfect correlate more with past-referring or present-referring VPs?
  – Present perfect correlates more with present-referring VPs

present   present perf   present non-perf   Total
LLC       2,696          33,131             35,827
ICE-GB    2,488          32,114             34,602
Total     5,184          65,245             70,429
d% = -4.45 ± 5.13%, φ′ = 0.0227, χ² = 2.68 (ns)

past      present perf   other TPM VPs      Total
LLC       2,696          18,201             20,897
ICE-GB    2,488          14,293             16,781
Total     5,184          32,494             37,678
d% = +14.92 ± 5.47%, φ′ = 0.0694, χ² = 25.06 (s)

(Bowie et al., forthcoming)
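The percentage differences and χ² values for the two tables can be recomputed from the counts. The sketch below assumes that d% is the percentage change in the rate from LLC to ICE-GB, and that χ² is a goodness-of-fit test (1 degree of freedom) of the present perfect frequencies against the baseline totals. Both are inferences from the figures rather than something the slides state. The confidence intervals on d% and the φ′ scores are not reproduced.

```python
CRITICAL_CHI2_1DF = 3.841  # chi-square critical value, 1 degree of freedom, p = 0.05


def percent_difference(f1, n1, f2, n2):
    """Percentage change in the rate f/n from sample 1 to sample 2."""
    p1, p2 = f1 / n1, f2 / n2
    return 100 * (p2 - p1) / p1


def goodness_of_fit_chi2(observed, baseline):
    """Chi-square of observed counts against expectations scaled from the baseline."""
    total_obs, total_base = sum(observed), sum(baseline)
    expected = [total_obs * b / total_base for b in baseline]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts from the tables above, ordered (LLC, ICE-GB).
present_perfect = (2696, 2488)
baselines = {
    "present-referring VPs": (35827, 34602),
    "past-referring VPs": (20897, 16781),
}

results = {}
for name, base in baselines.items():
    d = percent_difference(present_perfect[0], base[0], present_perfect[1], base[1])
    chi2 = goodness_of_fit_chi2(present_perfect, base)
    results[name] = (d, chi2)
    verdict = "significant" if chi2 > CRITICAL_CHI2_1DF else "not significant"
    print(f"{name}: d% = {d:+.2f}, chi2 = {chi2:.2f} ({verdict})")
```

A non-significant result means the present perfect keeps a roughly constant ratio to that baseline across the two corpora, which is the sense in which it "correlates more" with present-referring VPs.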
Different constraints apply in each case?
• Speakers’ choices are influenced by multiple pressures
  – to talk about a single ‘choice’ is misleading
  – there is no such thing as free variation
• We are not attempting to infer “the reason” for a particular speaker decision
  – we are attempting to identify statistically sound
    • patterns
    • correlations
    • trends
  – across many speakers
Different constraints apply in each case?
• Does one or more of these multiple constraints represent a systematic bias on the true rate?
  – Yes = try to identify it experimentally
  – No = ‘noise’
• Can focus on subset of cases to restrict different influences
  – e.g. limit shall / will by modal semantics
• This objection is misplaced:
  – freedom to vary
    = grammatical and semantic possibility (potential)
    = not that choices are free from influence
A competitive ecology?
• Not everything is a binary choice
  – but the same principles apply
[Figure, two panels, each plotting p (0% to 100%) for the 1920s, 1960s and 2000s. Panel "Meanings of THINK": ‘cogitate’, ‘intend’, quotative, interpretive. Panel "Complementation patterns of HOPE": hoping to, hoping that / Ø, hoping for.]
(Levin, forthcoming)
Conclusions
• Researchers need to pay attention to questions of choice and baselines
  – This does not mean that an observed change is due to a single source
• Minimum condition: baseline is a control
  – statistics evaluate difference from this control
    • is it a good control?
• Alternation studies: baseline is opportunity for making choice under investigation
• Word-based baselines should only really be used for comparison with other studies
  – we should not make statements about choice unless we investigate that question
Conclusions
• ‘Alternation’ can be interpreted
  – strictly
    • all Type As and Type Bs identified and cases checked
  – generously
    • small number of Type Cs permitted
  – Alternation is semantically bounded but grammatical analysis helps identify cases!
• We may try different experimental designs, modifying baselines and subsets
  – many more novel experiments are possible
    • experimental assumptions should always be clearly reported
References
ACLW: Aarts, B., J. Close, G. Leech and S.A. Wallis (eds.). forthcoming. The Verb Phrase in English: Investigating recent language change with corpora. Cambridge: CUP. Preview at www.ucl.ac.uk/english-usage/projects/verb-phrase/book.
• Aarts, B., J. Close and S.A. Wallis. forthcoming. Choices over time: methodological issues in investigating current change. ACLW Chapter 2.
• Biber, D. and B. Gray. forthcoming. Nominalizing the verb phrase in academic science writing. ACLW Chapter 5.
• Bowie, J., S.A. Wallis and B. Aarts. forthcoming. The perfect in spoken English. ACLW Chapter 13.
• Levin, M. forthcoming. The progressive verb in modern American English. ACLW Chapter 8.
• Nelson, G., S.A. Wallis and B. Aarts. 2002. Exploring Natural Language. Amsterdam: John Benjamins.
• Wallis, S.A. forthcoming. Capturing linguistic interaction in a grammar: a method for empirically evaluating the grammar of a parsed corpus.
Statistical postscript
• Type Cs make statistical tests less sensitive
  – What happens to confidence intervals as we add to F(A)+F(B) = 100 alternating cases?
[Figure: y-axis 0 to 0.25 plotted against N (100, 1,000, 10,000), with lines labelled 5, 20, 40, 60, 80 and 95 for F(A)]
• Tests assume freedom to vary (F(A)+F(B) = N)
• Including Type Cs makes statistical tests conservative
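The conservative effect of Type C cases can be illustrated with a small calculation. The slide's own plot concerns confidence intervals. This sketch swaps in a standard 2 × 2 Pearson χ² test instead, with invented counts, and simply adds non-alternating cases to the non-A cell of each row:

```python
CRITICAL_CHI2_1DF = 3.841  # chi-square critical value, 1 degree of freedom, p = 0.05


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def dilute(a, b, c, d, extra):
    """Add `extra` non-alternating Type C cases to the non-A cell of each row."""
    return a, b + extra, c, d + extra


# Hypothetical strict alternation table: F(A) + F(B) = 100 in each row.
strict = (40, 60, 55, 45)

for extra in (0, 900, 9900):
    table = dilute(*strict, extra)
    chi2 = chi2_2x2(*table)
    verdict = "significant" if chi2 > CRITICAL_CHI2_1DF else "not significant"
    print(f"N per row = {100 + extra}: chi2 = {chi2:.2f} ({verdict})")
```

With these made-up counts, the same difference in F(A) is significant against the strict baseline and no longer significant once the baseline is padded with cases that could never have been Type A.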