Top Banner
English Corpus Linguistics Introducing the Diachronic Corpus of Present-Day Spoken English (DCPSE) Sean Wallis UCL

Jan 18, 2016



Barber (1964): changes in English grammar

a. A tendency to regularize irregular morphology (e.g. dreamt – dreamed);
b. A revival of the “mandative” subjunctive, probably inspired by formal US usage (we demand that she take part in the meeting);
c. Elimination of shall as a future marker in the first person;
d. Development of new, auxiliary-like uses of certain lexical verbs (e.g. get, want – cf., e.g., The way you look, you wanna / want to see a doctor soon);
e. Extension of the progressive to new constructions, e.g. modal, present perfect and past perfect passive progressive (the road would not be being built / has not been being built / had not been being built before the general elections);
f. Increase in the number and types of multi-word verbs (phrasal verbs, have/take/give a ride, etc.);
g. Placement of frequency adverbs before auxiliary verbs (even if no emphasis is intended – I never have said so);
h. Do-support for have (have you any money? and no, I haven’t any money – do you have / have you got any money? and no, I don’t have any money / haven’t got any money)…


The Diachronic Corpus of Present-day Spoken English (DCPSE)

• Orthographically transcribed spoken BrE
• Fully parsed
  – every ‘sentence’ has a tree diagram
  – searchable with ICECUP and Fuzzy Tree Fragments (FTFs)
• 400,000+ words each from
  – London-Lund Corpus (aka the ‘Survey Corpus’)
  – ICE-GB
• Balanced by text category
• Not evenly distributed by year
  – LLC: samples from 1958–1977
  – ICE-GB: 1990–1992


Tree diagrams

A tree diagram for the sentence We’re getting there.


Barber on shall and will

• [T]he distinctions formerly made between shall and will are being lost, and will is coming increasingly to be used instead of shall. One reason for this is that in speech we very often say neither [will] nor [shall], but just [’ll]: I’ll see you to-morrow, we’ll meet you at the station, John’ll get it for you. We cannot use this weak form in all positions (not at the end of a phrase, for example), but we use it very often; and, whatever its historical origin may have been (probably from will), we now use it indiscriminately as a weak form for either shall or will; and very often the speaker could not tell you which he had intended. There is thus often a doubt in a speaker’s mind whether will or shall is the appropriate form; and, in this doubt, it is will that is spreading at the expense of shall, presumably because will is used more frequently than shall anyway, and so is likely to be the winner in a levelling process. So people nowadays commonly say or write I will be there, we will all die one day, and so on, when they intend to express simple futurity and not volition.

(Barber 1964: 134)


Denison on shall and will

• During the latter part of our period [1776-present day] ... in the first person shall has increasingly been replaced by will even where there is no element of volition in the meaning.

(Denison 1998: 167)


The use of shall and will in written British and American English from the 1960s and 1990s

• Figures are normalised frequencies per million words
• Log-likelihood (LL) is computed against the number of words

BrE      LOB     FLOB    LL     diff %
will     2,798   2,723    1.2    -2.7%
shall      355     200   44.3   -43.7%

AmE      Brown   Frown   LL     diff %
will     2,702   2,402   17.3   -11.1%
shall      267     150   33.1   -43.8%

From: Mair and Leech (2006: 327)
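The diff % column is the relative change from the 1960s corpus to the 1990s corpus, and LL is a log-likelihood test comparing the two frequencies. The Python sketch below shows both calculations, using a standard 2x2 log-likelihood (G²) of the word against all other words, which is our reading of "LL is performed against number of words". It assumes, purely for illustration, that each corpus contains exactly one million words, so the per-million figures can be treated as raw counts. The real corpora are only approximately that size, so the sketch gives about 1.0 and 43.9 for BrE will and shall rather than the published 1.2 and 44.3. The function names are our own.

```python
import math

def diff_percent(earlier: float, later: float) -> float:
    """Relative change from the earlier to the later frequency, in %."""
    return 100.0 * (later - earlier) / earlier

def log_likelihood(f1: float, n1: float, f2: float, n2: float) -> float:
    """2x2 log-likelihood (G-squared) comparing frequency f1 in a corpus of n1 words
    with frequency f2 in a corpus of n2 words (the word vs. all other words)."""
    observed = [[f1, n1 - f1], [f2, n2 - f2]]
    row_totals = [n1, n2]
    col_totals = [f1 + f2, (n1 - f1) + (n2 - f2)]
    total = n1 + n2
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if observed[i][j] > 0:  # 0 * log(0) is taken as 0
                g2 += observed[i][j] * math.log(observed[i][j] / expected)
    return 2.0 * g2

# Assumption for illustration only: each corpus has exactly one million words,
# so the per-million frequencies in the BrE table double as raw counts.
N = 1_000_000
for word, lob, flob in [("will", 2798, 2723), ("shall", 355, 200)]:
    print(word, round(diff_percent(lob, flob), 1), round(log_likelihood(lob, N, flob, N), 1))
```

The diff % values match the table exactly, because they do not depend on corpus size; the LL values depend on the true word counts of each corpus.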


Mair and Leech’s data

• Simply counts tagged lexical tokens
  – will = auxiliary verb, includes ’ll
  – shall = auxiliary verb
  – includes negative forms
• Does not distinguish by grammatical position or context
  – does not ask whether the choice is available, e.g. by limiting to first person use
  – does not consider subclasses separately
• Negative cases: will not/won’t vs. shall not/shan’t?
• Do interrogative cases behave differently?
• Uses written data only
• Can we do better than this?


An FTF for first person declarative shall

• This FTF is limited to first person cases
  – The FTF requires that the subject NP is realised by the pronoun I or we.
• Interrogative cases have a different structure
• We can subtract negative (shall not) cases to exclude them.
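An FTF is a graphical query run in ICECUP, so it cannot be reproduced in a few lines of code. As a loose illustration of the selection logic only (first person subject, declarative, then subtract the negative cases), the sketch below filters hypothetical clause records. The `Clause` type, its fields, the helper name and the toy sentences are all invented for this example and are not the ICECUP data model.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    """Hypothetical, much-simplified record of one parsed clause.
    This is NOT the ICECUP/FTF data model; it only illustrates the selection logic."""
    subject: str       # subject NP, e.g. "I", "we", "John"
    modal: str         # auxiliary, e.g. "shall", "will"
    declarative: bool  # interrogatives have a different structure, so are excluded
    negative: bool     # e.g. "shall not", "shan't"

FIRST_PERSON = {"i", "we"}

def count_first_person_positive(clauses, modal):
    """Count first person declarative clauses using `modal`,
    then subtract the negative ones."""
    matches = [c for c in clauses
               if c.subject.lower() in FIRST_PERSON
               and c.modal == modal
               and c.declarative]
    negatives = sum(1 for c in matches if c.negative)
    return len(matches) - negatives

# Invented toy data, for illustration only
sample = [
    Clause("I", "shall", True, False),
    Clause("we", "shall", True, True),     # negative: subtracted
    Clause("John", "shall", True, False),  # not first person: excluded
    Clause("I", "shall", False, False),    # interrogative: excluded
    Clause("We", "will", True, False),
]
print(count_first_person_positive(sample, "shall"), count_first_person_positive(sample, "will"))
```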


Shall vs. will

• Does the proportion of cases of shall out of {shall, will} change over time?

χ² for first person subject; shall vs. will

d% = percentage difference (a 30% fall in shall between LLC and ICE-GB)

φ = an estimate of the size of the overall effect (a bit like d%)

χ² = 2x2 chi-square test: is this change statistically significant?

χ²(shall) = 2x1 goodness of fit test: does shall behave differently to average?

          shall   will   Total   χ²(shall)   χ²(will)
LLC         110     78     188        1.32       1.45
ICE-GB       40     58      98        2.53       2.79
TOTAL       150    136     286        3.85       4.24

Summary: d% = -30.24% ± 20.84%; φ = 0.17; χ² = 8.09 (significant at the 0.05 level, where the critical value is 3.84)
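The statistics in this table can all be derived from the four observed counts. The Python sketch below computes expected counts, per-cell chi-square contributions (summing a column gives the 2x1 goodness-of-fit χ² for that choice), the total 2x2 χ², Cramér's φ and d%. It reproduces the figures in the table (χ² = 8.09, φ = 0.17, d% = -30.24%, and the per-cell values), but it does not attempt the ± 20.84% interval on d%, since the slide does not say how that was computed. The function names are our own; `chi_square_table` works for any r x c table, including the three-way shall/will/’ll table that follows.

```python
import math

def chi_square_table(table):
    """Return per-cell chi-square contributions and the total chi-square
    for an r x c contingency table (rows = subcorpora, columns = choices)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        cells = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count if nothing changed
            cells.append((observed - expected) ** 2 / expected)
        contributions.append(cells)
    total = sum(sum(cells) for cells in contributions)
    return contributions, total

def cramers_phi(chi2, n, rows, cols):
    """Cramer's phi effect size: 0 = no association, 1 = perfect association."""
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))

def d_percent(a_shall, a_will, b_shall, b_will):
    """Percentage difference in p(shall) from the first subcorpus to the second."""
    p1 = a_shall / (a_shall + a_will)
    p2 = b_shall / (b_shall + b_will)
    return 100.0 * (p2 - p1) / p1

table = [[110, 78],   # LLC: shall, will
         [40, 58]]    # ICE-GB: shall, will
contrib, chi2 = chi_square_table(table)
# Summing a column gives the 2x1 goodness-of-fit chi-square for that choice
chi2_shall = sum(row[0] for row in contrib)
print(round(chi2, 2), round(cramers_phi(chi2, 286, 2, 2), 2), round(d_percent(110, 78, 40, 58), 2))
```

This prints 8.09 0.17 -30.24, matching the summary above.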


Shall vs. will/’ll

• Does the proportion of cases of shall out of {shall, will, ’ll} change over time?

χ² for first person subject; shall vs. will vs. ’ll

          shall   will   ’ll   Total   χ²(shall)   χ²(will)   χ²(’ll)
LLC         104     69   371     544        9.98       0.13      2.33
ICE-GB       36     52   365     453       11.98       0.16      2.80
TOTAL       140    121   736     997       21.96       0.30      5.13

2(shall) = 2x1 goodness of fit test: does shall behave differently to average?


Focusing on choice

• We focused on the choice of shall vs. will
  – Mair and Leech simply said that total cases of shall fell
  – But this might have happened for other reasons
• For example, there may have been more opportunities to use shall in the LLC data
• Examining choice is a more precise way of conducting experiments than counting frequencies
  – It allows us to consider what variables (time, genre, other choices) affect the probability of shall being chosen
• Probability is a simple fraction from 0 to 1:
  – p(shall) = F(shall) / (F(shall) + F(will) + …)
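The probability formula can be computed directly from counts. This Python sketch applies it to the first person counts in the shall vs. will and shall vs. will/’ll tables; each baseline uses its own table's counts, since the shall and will totals differ slightly between the two. With the {shall, will} baseline p(shall) falls by about 30%, while with ’ll included it falls by about 58%, which illustrates how the choice of baseline affects the result. The function names are our own.

```python
def p_choice(freqs, target):
    """p(target) = F(target) / (F(alt1) + F(alt2) + ...) over the whole choice set."""
    return freqs[target] / sum(freqs.values())

def relative_change(p_before, p_after):
    """Percentage change in probability from the earlier to the later subcorpus."""
    return 100.0 * (p_after - p_before) / p_before

# First person counts: (LLC, ICE-GB) for each baseline, from the two tables above
baselines = {
    "{shall, will}": ({"shall": 110, "will": 78}, {"shall": 40, "will": 58}),
    "{shall, will, 'll}": ({"shall": 104, "will": 69, "'ll": 371},
                           {"shall": 36, "will": 52, "'ll": 365}),
}

for name, (llc, ice_gb) in baselines.items():
    p_llc, p_ice = p_choice(llc, "shall"), p_choice(ice_gb, "shall")
    print(f"{name}: LLC {p_llc:.3f}, ICE-GB {p_ice:.3f}, "
          f"change {relative_change(p_llc, p_ice):.1f}%")
```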


Probability of shall vs. will over time

[Figure: p (0 to 1) plotted against year, 1955–1995]


Probability of shall vs. will/’ll over time

[Figure: p (0 to 1) plotted against year, 1955–1995]


Confidence intervals

• Probability p(shall):
  – 0 = no cases are of type shall
  – 1 = all cases are of type shall
• Our sample is a tiny subset of possible sentences from the same period
  – So we cannot say that a particular observation is certain
  – Instead we try to estimate our confidence in an observation using error bars or confidence intervals
• The more data we have supporting an observation p, the smaller the confidence interval around it
• We set a confidence level, typically 95%
  – roughly, we are 95% confident that the true value lies within the interval

[Figure: p (0 to 1) plotted against year, 1955–1995]
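The slides do not say how the intervals are calculated. One common choice for a probability like p(shall) is the Wilson score interval, which stays within 0 to 1 and behaves well for small samples. The sketch below assumes it, with z ≈ 1.96 for a 95% confidence level, and applies it to the first person shall vs. will counts for each subcorpus. It also illustrates the point that more data gives a narrower interval: ICE-GB (98 cases) gets a wider interval than LLC (188 cases). The function name is our own.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion p = successes / n.
    z = 1.959964 gives a two-tailed 95% interval."""
    p = successes / n
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return centre - half_width, centre + half_width

# First person shall out of {shall, will}, per subcorpus
for label, shall, total in [("LLC", 110, 188), ("ICE-GB", 40, 98)]:
    low, high = wilson_interval(shall, total)
    print(f"{label}: p(shall) = {shall / total:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

Note that the interval is not symmetric around p; its centre is pulled slightly towards 0.5, more so for smaller samples.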


Modal meaning

• Remember Barber and Denison: not all cases of shall or will mean the same thing
  – Root (futurity):
    • I’ve got some at home so I shall take it home. [DI-A18 #30]
    • I will answer you in a minute. [DI-B30 #293]
  – Epistemic (volition):
    • So I shall have roughly from the twenty-ninth of June to the eighth of July on which I can spend the whole of that time on those two papers. [DL-B01 #62]
    • It’s certainly my long term hope that I will have some kind of companion... [DI-B53 #0257]
• We should examine these choices separately
  – Unfortunately this means classifying cases manually


Modal meaning: statistics

• Root shall / will is stable: results are not significant
• Epistemic shall / will falls (d% = -30% ± 27%)
  – Because we measure the choice of shall out of {shall, will}, the fall in shall is not explained by the sharp fall in Epistemic modals overall, from 100 (72+28) to 28 (14+14)
  – This is evidence that the shift in use in the 20th century is concentrated within Epistemic meanings, from shall to will.
  – Barber and Denison: the earlier shift was in Root (future) meaning.

                Root     %     Epistemic     %       Unclear     %     Total
shall  LLC        33   30.84          72   67.29           2   1.87      107
       ICE-GB     22   59.46          14   37.84 sig       1   2.70       37
will   LLC        44   55.70          28   35.44           7   8.86       79
       ICE-GB     37   66.07          14   25.00           5   8.93       56
Total            136                 128 sig              15             279
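To check each meaning separately, the sketch below takes the shall and will counts for Root and Epistemic cases from the table above (ignoring Unclear cases) and applies the shortcut formula for a 2x2 chi-square without continuity correction. This is our illustration, not necessarily how the figures on the slide were produced; the critical value 3.841 corresponds to a 0.05 significance level with one degree of freedom. It finds Root non-significant (χ² ≈ 0.43) and Epistemic significant (χ² ≈ 4.80), in line with the slide.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2x2 table [[a, b], [c, d]] (no continuity correction):
    N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

# ((LLC shall, will), (ICE-GB shall, will)) for each modal meaning
meanings = {
    "Root": ((33, 44), (22, 37)),
    "Epistemic": ((72, 28), (14, 14)),
}

for meaning, ((a, b), (c, d)) in meanings.items():
    p_llc, p_ice = a / (a + b), c / (c + d)
    chi2 = chi_square_2x2(a, b, c, d)
    d_pct = 100.0 * (p_ice - p_llc) / p_llc
    print(f"{meaning}: p(shall) {p_llc:.2f} -> {p_ice:.2f}, d% = {d_pct:.1f}%, "
          f"chi2 = {chi2:.2f}, significant: {chi2 > CRITICAL_95}")
```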


Modal meaning: statistics

• Shall is losing its particular Epistemic meaning as a result
  – In the LLC data two thirds (67%) of shall uses were Epistemic.
  – This fell to 38% (just over one third) in ICE-GB.



Conclusions

• DCPSE is
  – orthographically transcribed spoken English
    • mostly spontaneous
  – fully parsed and checked by linguists, using a phrase structure grammar based on Quirk et al.
  – searchable with ICECUP and FTFs
• Even lexical studies benefit from parsing
  – it allows us to focus on when a choice occurs
• You can use DCPSE to carry out many different experiments on real English
  – we looked at change over (recent) time
  – we might also look at how decisions interact


Conclusions

• Designing a corpus linguistics experiment means thinking carefully about your hypothesis and then attempting to test it against the corpus
  – We examined the shift from shall to will
  – We limited it to first person, declarative, positive cases
  – Changing baselines (including ’ll) may lead to different conclusions
• Many corpus studies only consider word baselines (frequencies per million words, pmw)
• But it is often better to consider proportions of types of clause or phrase, or list specific alternative choices
  – Alternation (choice) studies aim to hold meaning constant so that the speaker/writer is free to choose between the alternatives
• We focused further by subdividing data by modal meaning


Suggested further reading

• On shall vs. will and the progressive:
  – Aarts, B., Close, J. and Wallis, S.A. (forthcoming) Choices over time: methodological issues in investigating current change. In: B. Aarts et al. (eds.) The Changing Verb Phrase. Cambridge: Cambridge University Press.
    • www.ucl.ac.uk/english-usage/projects/verb-phrase/book/aartsclosewallis.pdf
  – Barber, C. (1964) Linguistic Change in Present-Day English. Edinburgh and London: Oliver and Boyd.
  – Denison, D. (1998) Syntax. In: S. Romaine (ed.) The Cambridge History of the English Language. IV: 1776–1997. Cambridge: Cambridge University Press. 92–329.
  – Mair, C. and Leech, G. (2006) Current changes in English syntax. In: B. Aarts and A. McMahon (eds.) The Handbook of English Linguistics. Malden, MA: Blackwell Publishers. 318–342.
• On statistical tests, confidence intervals and other methods:
  – Wallis, S.A. (2010) z-squared: the origin and use of χ². Survey of English Usage, UCL.
    • www.ucl.ac.uk/english-usage/statspapers/z-squared.pdf