That vexed problem of choice: reflections on experimental design and statistics with corpora. ICAME 33, Leuven, 30 May-3 June 2012. Sean Wallis, Jill Bowie.


Sean Wallis, Jill Bowie and Bas Aarts
Survey of English Usage, University College London

{s.wallis, j.bowie, b.aarts}@ucl.ac.uk

Outline

• Introduction

• Definitions

• Refining baselines and the ratio principle

• Surveying ‘absolute’ and ‘relative’ variation

• Potential sources of interaction

• Employing alternation analysis

• Objections

• Conclusions

Introduction

• Research questions are really about choice
  – If speakers had no choice about the words or constructions they used, language would be invariant!
• Lab experiments
  – Press button A or button B
• Corpus
  – Speakers may choose construction A or B
    • But they can only actually choose one, A, at each point
    • We have to infer the other type, B, counterfactually
    • Identifying alternates is often non-trivial

Mutual substitution

• Mutual substitution: A ↔ B
  – Given a corpus, identify all events of Type A that alternate with events of Type B, such that A is mutually replaceable by B without altering the meaning of the text.
• Replacement
  – B replaces A if B increases, and vice versa
    • p(A) + p(B) + ... = 1
• Freedom to vary
  – p(X) ∈ [0, 1]
  – Ideal: eliminate invariant Type C terms


• Mutual substitution: A ↔ B
  – Pronoun who/whom
    • A = whom
    • B = who (objective)
  – But whom is limited to objective case
    • C = who (subjective)
    • We therefore limit alternation to Objects
  – If whom is used ‘incorrectly’ as a Subject, it has an additional constraint (social disfavour)

True rate of alternation

• True rate of alternation
  – If A ↔ B:

      p(A | {A, B}) = F(A) / (F(A) + F(B))

• Proportion (fraction) of all cases that are Type A
  – we use p(A) as a shorthand for p(A | {A, B}) if the baseline {A, B} is stated
• Contingency tables

    IV / DV        A        B        Total            probability
    condition 1    f1(A)    f1(B)    f1(A)+f1(B)      p1(A)
    condition 2    f2(A)    f2(B)    f2(A)+f2(B)      p2(A)
    Total          F(A)     F(B)     F(A)+F(B)        p(A)
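The proportions in the final column follow directly from the cell frequencies. A minimal sketch in Python, assuming invented frequencies (the counts and the names `table` and `proportion` are illustrative, not taken from DCPSE or the talk):

```python
# Hypothetical frequencies, for illustration only (not corpus data).
table = {
    "condition 1": {"A": 30, "B": 70},
    "condition 2": {"A": 45, "B": 55},
}


def proportion(f_a, f_b):
    """Return p(A | {A, B}) = F(A) / (F(A) + F(B))."""
    total = f_a + f_b
    if total == 0:
        raise ValueError("no alternating cases: the baseline is empty")
    return f_a / total


# p1(A), p2(A): one proportion per value of the independent variable.
p_by_condition = {
    name: proportion(row["A"], row["B"]) for name, row in table.items()
}

# p(A): the proportion over the column totals F(A) and F(B).
F_A = sum(row["A"] for row in table.values())
F_B = sum(row["B"] for row in table.values())
p_overall = proportion(F_A, F_B)

for name, p in p_by_condition.items():
    print(f"{name}: p(A) = {p:.3f}")
print(f"Total: p(A) = {p_overall:.3f}")
```

Only Type A and Type B cases enter the denominator, so any Type C cases must be excluded before the table is filled in.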


• Shall/will alternation over time in DCPSE

[Figure: p (0 to 1) against time, 1955 to 1995; baseline = {shall, will}. (Aarts et al., forthcoming)]


• Shall/(will+’ll) alternation over time in DCPSE

[Figure: p (0 to 1) against time, 1955 to 1995; baseline = {shall, will, ’ll}. (Aarts et al., forthcoming)]


• Logistic ‘S’ curve assumes freedom to vary
  – p(X) ∈ [0, 1]
  – as do Wilson confidence intervals

[Figure: logistic curves of p (0 to 1) against time t, for shall/(will+’ll) and shall/’ll]
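The Wilson interval mentioned here can be sketched as follows. This is the standard Wilson score formula without continuity correction; the function name and the default 95% critical value are choices made for this illustration, not taken from the talk.

```python
import math


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n cases.

    z is the two-tailed critical value of the Normal distribution
    (the default corresponds to a 95% interval).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to guard against floating-point rounding at p = 0 or p = 1.
    return max(0.0, centre - half), min(1.0, centre + half)


# Example: 1 case of Type A out of 20 alternating cases.
lower, upper = wilson_interval(1 / 20, 20)
print(f"p = 0.05, interval = ({lower:.4f}, {upper:.4f})")
```

The interval is asymmetric near 0 and 1 and never leaves [0, 1], which is why it presupposes that p is free to vary over that whole range.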

Refining baselines

• Over-general baselines
  – conflate opportunity and use
  – ‘normalisation’ per million words
    • implies that every word other than A is Type B!
    • is this plausible?
• ‘Art’ of experimental design
  – refine baseline by narrowing dataset
    • reduce and eliminate non-alternating Type C cases
    • optionally: subdivide where different constraints apply
  – different baselines test different hypotheses
    • cf. shall / will / ’ll

Refining baselines

• Tensed VPs per million words, DCPSE
  – Total: constant over time
  – Diachronic variation: within text categories
  – Synchronic variation: between text categories

[Figure: bar chart of tensed VPs per million words (0 to 140,000) for LLC and ICE-GB, by text category: formal f-to-f, informal f-to-f, telephone, b discussions, b interviews, commentary, parliament, legal x-exam, assort spont, prepared sp, Total. (Bowie et al., forthcoming)]

The ratio principle

• Simple algebra
  – any sequence of ratios can be reduced to the ratio of the first and last term:

      F(modal) / F(word) = F(modal) / F(tVP) × F(tVP) / F(word)

  – we saw that the ratio tVP:word varies synchronically and diachronically in DCPSE
    • we can eliminate this variation by simply focusing on modal:tVP
    • use tensed VPs as baseline for modals
  – this baseline is not a strict alternation set
    • we have not eliminated all Type C terms
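A small numerical sketch of the ratio principle, assuming two invented text categories (the counts are hypothetical, not DCPSE figures). The per-word rate of modals differs between the categories only because the tVP:word ratio differs; the modal:tVP ratio is the same.

```python
import math

# Hypothetical frequencies for two text categories (illustration only).
subcorpora = {
    "category X": {"word": 100_000, "tVP": 12_000, "modal": 1_800},
    "category Y": {"word": 100_000, "tVP": 8_000, "modal": 1_200},
}


def ratios(counts):
    """Return (modal:word, modal:tVP, tVP:word) for one set of counts."""
    modal_per_word = counts["modal"] / counts["word"]
    modal_per_tvp = counts["modal"] / counts["tVP"]
    tvp_per_word = counts["tVP"] / counts["word"]
    return modal_per_word, modal_per_tvp, tvp_per_word


for name, counts in subcorpora.items():
    per_word, per_tvp, tvp_word = ratios(counts)
    # The ratio principle: modal:word = modal:tVP x tVP:word.
    assert math.isclose(per_word, per_tvp * tvp_word)
    print(f"{name}: modal/word = {per_word:.4f}, "
          f"modal/tVP = {per_tvp:.4f}, tVP/word = {tvp_word:.4f}")
```

With a word baseline the two categories appear to differ in modal use; with a tensed VP baseline they do not.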

‘Absolute’ and ‘relative’ variation

• Changes in core modals over time in DCPSE

[Figure: change for each core modal (can, could, may, might, must, shall, should, will, would). Left axis: absolute change as a proportion of tensed VPs, p(modal | tVP). Right axis: relative change as a proportion of set of modals, p(modal | modal tVP).]


Employing alternation analysis

• Simple grammatical interaction
  – Independent and dependent variables are grammatical
    • mutual substitution concerns the dependent variable
  – Numerous examples in Nelson et al. 2002
    • e.g. clause table: mood × transitivity
    • not alternation, but survey: could be refined

    IV / DV          montr                ditr                …    Total
    exclamative      CL(montr, exclam)    CL(ditr, exclam)    …    CL(exclam)
    interrogative    CL(montr, inter)     CL(ditr, inter)     …    CL(inter)
    …                …                    …                   …    …
    Total            CL(montr)            CL(ditr)            …    CL


• Repeating choices: to add or not to add
  – e.g. repeated decisions to add an attributive AJP to specify a NP head: the tall white ship
    • A = add AJP
    • B = don’t add AJP (and stop)
  – Sequential analysis: examine p(A | {A, B}) at each step

[Figure: p (0.00 to 0.25) at steps 0 to 4. (Wallis, forthcoming)]

  – Conclusion: decision to add an AJP becomes successively more difficult
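The sequential analysis can be sketched as a chain of conditional proportions. The counts below are invented for illustration and are not the figures reported in Wallis (forthcoming); `at_least[k]` is assumed to hold the number of NPs with at least k attributive AJPs, so each step's baseline is the set of NPs that reached that step.

```python
# Hypothetical counts (illustration only): at_least[k] is the number of
# noun phrases containing at least k attributive adjective phrases.
at_least = [10_000, 2_000, 300, 30, 2]


def step_probabilities(counts):
    """Return p(A | {A, B}) at each step: the proportion of NPs with at
    least k AJPs that go on to add one more."""
    probabilities = []
    for current, following in zip(counts, counts[1:]):
        if current == 0:
            break  # nobody reached this step, so there is no choice to measure
        probabilities.append(following / current)
    return probabilities


for step, p in enumerate(step_probabilities(at_least)):
    print(f"step {step}: p(add another AJP) = {p:.4f}")
```

Each probability is computed against the opportunity to add at that step, not against all NPs, which is what makes the steps comparable.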


• Grammatically diverse alternates
  – Biber and Gray (forthcoming) investigate evidence for increasing nominalisation
    • A = nouns that have been derived from verb forms
      – This paper reports an analysis of Tucker’s central prediction system model and an empirical comparison of it with two competing models. [1965, Acad-NS]
    • B = verbs that could be nominalised
      – Could just use clauses as baseline
        • But this is little better than words
      – Better option is to enumerate types
        • analysis ↔ analyse
        • prediction ↔ predict
        • comparison ↔ compare
      – Examine cases: is alternation possible?

Objections

• If this is such a good idea, why isn’t everybody doing it?
• Three main objections are made:
  – alternates are not reliably identifiable
  – baselines are arbitrarily chosen by the researcher
  – different constraints apply to different terms (no such thing as free variation)

Alternates are not reliably identifiable?

• Identifying alternates can be difficult
  – phrasal vs. Latinate verbs
• Strategies:
  – enumerate cases from bottom, up
    • find Type B cases for each Type A

    put up
    tolerate            4   put up with it [S1A-037 #1]
    ?position           3   put your feet up [S1A-032 #21]
    build, make         3   shacks put up without any planning [S2B-022 #118]
    display, project    2   put up two… trees [on the screen] [S1B-002 #157]
    sell                2   put the plant up for sale [W2C-015 #8]
    propose             2   put [a motion] up [S1B-077 #127]
    increase            1   put up the poll tax [W2C-009 #3]
    accommodate         1   we could put up the children [S1A-073 #197]
    finance             1   put up the money [W2F-007 #36]

  – refine baseline from top, down
    • start with verbs, eliminate non-alternating Type Cs
      – Copular verbs
      – Clitics
      – Stative verbs
    • are dynamic verbs the upper bound for alternation with phrasal verbs?
  – combine strategies:
    • identify stative verbs lexically
    • a few verbs are stative and dynamic
      – check in situ

Baselines are arbitrary?

• Is there such a thing as an ‘objective’ baseline?
  – No, but optimum baselines identify where speakers have a real choice: Type A vs. Type B
• Baselines are a control
  – Experimental hypothesis:
    • the ratio of Type A to the baseline is constant over values of the independent variable
  – Baseline cited as part of experimental reporting
• Indeed we can experiment with baselines
  – e.g. does the present perfect correlate more with past-referring or present-referring VPs?

Comparing baselines

• Does the present perfect correlate more with past-referring or present-referring VPs?

    present     present perf    present non-perf    Total
    LLC         2,696           33,131              35,827
    ICE-GB      2,488           32,114              34,602
    Total       5,184           65,245              70,429

    d% = -4.45% ± 5.13%, φ’ = 0.0227, χ² = 2.68 (ns)

    past        present perf    other TPM VPs       Total
    LLC         2,696           18,201              20,897
    ICE-GB      2,488           14,293              16,781
    Total       5,184           32,494              37,678

    d% = +14.92% ± 5.47%, φ’ = 0.0694, χ² = 25.06 (s)

  – Present perfect correlates more with present-referring VPs
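The d% and χ² figures can be recomputed from the table frequencies. The sketch below assumes that d% is the percentage difference in p(present perfect) from LLC to ICE-GB, and that χ² is a goodness-of-fit test of the present perfect counts against the distribution of the baseline totals. These assumptions reproduce the reported values, but the procedure is not stated in the slides themselves. The function names and the 0.05 critical value are choices made for this illustration.

```python
def percent_difference(f1, n1, f2, n2):
    """Percentage change in the proportion f/n from sample 1 to sample 2."""
    p1, p2 = f1 / n1, f2 / n2
    return 100 * (p2 - p1) / p1


def goodness_of_fit_chi2(observed, baseline):
    """Chi-square comparing observed counts with the baseline distribution."""
    total_observed, total_baseline = sum(observed), sum(baseline)
    chi2 = 0.0
    for o, b in zip(observed, baseline):
        expected = total_observed * b / total_baseline
        chi2 += (o - expected) ** 2 / expected
    return chi2


CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, error level 0.05

present_perfect = (2696, 2488)  # LLC, ICE-GB
baselines = {  # row totals for each baseline: LLC, ICE-GB
    "present-referring": (35827, 34602),
    "past-referring": (20897, 16781),
}

results = {}
for name, totals in baselines.items():
    d = percent_difference(present_perfect[0], totals[0],
                           present_perfect[1], totals[1])
    chi2 = goodness_of_fit_chi2(present_perfect, totals)
    results[name] = (d, chi2, chi2 > CRITICAL_VALUE)
    print(f"{name}: d% = {d:+.2f}, chi2 = {chi2:.2f}, "
          f"significant = {chi2 > CRITICAL_VALUE}")
```

Against the present-referring baseline the present perfect shows no significant change between the two corpora; against the past-referring baseline it does. This is the sense in which it correlates more closely with present-referring VPs.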


Different constraints apply in each case?

• Speakers’ choices are influenced by multiple pressures
  – to talk about a single ‘choice’ is misleading
  – there is no such thing as free variation
• We are not attempting to infer “the reason” for a particular speaker decision
  – we are attempting to identify statistically sound
    • patterns
    • correlations
    • trends
  – across many speakers
• Does one or more of these multiple constraints represent a systematic bias on the true rate?
  – Yes = try to identify it experimentally
  – No = ‘noise’
• Can focus on subset of cases to restrict different influences
  – e.g. limit shall / will by modal semantics
• This objection is misplaced:
  – freedom to vary
    • = grammatical and semantic possibility (potential)
    • not that choices are free from influence

A competitive ecology?

• Not everything is a binary choice
  – but the same principles apply

[Figure, two panels, each plotting p (0% to 100%) for the 1920s, 1960s and 2000s. Meanings of THINK: ‘cogitate’, ‘intend’, quotative, interpretive. Complementation patterns of HOPE: hoping to, hoping that / Ø, hoping for. (Levin, forthcoming)]

Conclusions

• Researchers need to pay attention to questions of choice and baselines
  – This does not mean that an observed change is due to a single source
• Minimum condition: baseline is a control
  – statistics evaluate difference from this control
    • is it a good control?
• Alternation studies: baseline is opportunity for making choice under investigation
• Word-based baselines should only really be used for comparison with other studies
  – we should not make statements about choice unless we investigate that question

Conclusions

• ‘Alternation’ can be interpreted
  – strictly
    • all Type As and Type Bs identified and cases checked
  – generously
    • small number of Type Cs permitted
  – Alternation is semantically bounded but grammatical analysis helps identify cases!
• We may try different experimental designs, modifying baselines and subsets
  – many more novel experiments are possible
    • experimental assumptions should always be clearly reported

References

ACLW: Aarts, B., J. Close, G. Leech and S.A. Wallis (eds.) (forthcoming). The Verb Phrase in English: Investigating recent language change with corpora. Cambridge: CUP. Preview at www.ucl.ac.uk/english-usage/projects/verb-phrase/book.

• Aarts, B., J. Close and S.A. Wallis (forthcoming). Choices over time: methodological issues in investigating current change. ACLW Chapter 2.

• Biber, D. and B. Gray (forthcoming). Nominalizing the verb phrase in academic science writing. ACLW Chapter 5.

• Bowie, J., S.A. Wallis and B. Aarts (forthcoming). The perfect in spoken English. ACLW Chapter 13.

• Levin, M. (forthcoming). The progressive verb in modern American English. ACLW Chapter 8.

• Nelson, G., S.A. Wallis and B. Aarts (2002). Exploring Natural Language. Amsterdam: John Benjamins.

• Wallis, S.A. (forthcoming). Capturing linguistic interaction in a grammar: a method for empirically evaluating the grammar of a parsed corpus.

Statistical postscript

• Type Cs make statistical tests less sensitive
  – What happens to confidence intervals as we add Type C cases to F(A)+F(B) = 100 alternating cases?

[Figure: vertical axis 0 to 0.25 (labels: e, N/100) against N (100, 1,000, 10,000), with one curve for each of F(A) = 5, 20, 40, 60, 80, 95]

  – Tests assume freedom to vary (F(A)+F(B) = N)
  – Including Type Cs makes statistical tests conservative
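One way to see the effect numerically is sketched below. It assumes the curves show the Wilson interval width for p = F(A)/N, multiplied by N/100 so that it is expressed on the scale of the original 100 alternating cases. This scaling is an interpretation of the axis labels, not something the slide states, and the function names are invented for this illustration.

```python
import math


def wilson_width(p, n, z=1.959964):
    """Width of the 95% Wilson score interval for proportion p out of n."""
    denom = 1 + z * z / n
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return 2 * half


def scaled_width(f_a, n, alternating=100):
    """Interval width for p = f_a / n, rescaled by n / alternating.

    n - alternating is the number of non-alternating Type C cases that
    have been added to the baseline (assumed interpretation).
    """
    return wilson_width(f_a / n, n) * n / alternating


f_a = 20  # Type A cases among the 100 alternating cases
widths = {n: scaled_width(f_a, n) for n in (100, 1_000, 10_000)}
for n, width in widths.items():
    print(f"N = {n:>6}: scaled interval width = {width:.4f}")
```

Under these assumptions the rescaled interval widens as Type C cases are added, so a real difference in the choice rate is harder to detect against the diluted baseline.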
