Differential Possessor Expression in English: Re-evaluating Animacy and Topicality Effects. Catherine O'Connor (Boston University), Arto Anttila (NYU), Vivienne Fong (NYU), Joan Maling (Brandeis University). Annual Meeting of the Linguistic Society of America, January 9 - 11, 2004, Boston, Massachusetts.
Differential Possessor Expression in English: Re-evaluating Animacy and Topicality Effects
Catherine O'Connor Boston University Arto Anttila NYU
Vivienne Fong NYU Joan Maling Brandeis University
Annual Meeting of the Linguistic Society of America
January 9 - 11, 2004
Boston, Massachusetts
The question: What are the factors that drive the English alternation between the "Saxon genitive" and the "Of genitive"?
The man's widow (X'S, Spec)
The widow of the man (OF-X, Comp)
Hypothesis 1: ANIMACY
The X'S construction tends to attract animate possessors/modifiers, and the OF-X construction tends to attract inanimate possessors/modifiers. (Jespersen, Rosenbach, Stefanowitsch, Anschutz, R. Hawkins...)
This is a statistical tendency, at best:
Walking's many virtues (X'S)
The many virtues of walking (OF-X)
Hemispheres Magazine, 2001
Hypothesis 2: DISCOURSE STATUS
The X'S construction attracts old, topical, or highly accessible modifiers. The OF-X construction attracts newer or less accessible modifiers (Deane, Anschutz).
This is also a statistical tendency:
its rejection (a bill) (X'S)
...recommend passage of it (OF-X)
A neighbor's car (X'S)
The car of a neighbor (OF-X)
Hypothesis 3: WEIGHT
The X'S prenominal construction attracts lighter modifiers, and the OF-X construction attracts heavier modifiers (Stefanowitsch; cf. also Arnold et al., J. Hawkins, Wasow).
An analytical problem: These three hypotheses are seriously confounded:
Humans are often topical
Topics are often expressed as pronouns
Pronouns are light
His advocacy of betting on the ponies
Another analytical problem: Which examples can legitimately be expected to alternate between Of-X and X's?
The glass of water (Of-X)
Water's glass (X's)
There are many, many such distractors.
Our plan of inquiry:
1. Secure a large number of OF-X and X'S tokens in the Brown Corpus.
2. Exclude tokens of non-reversible types.
3. Code remaining tokens for weight, animacy, and discourse status.
4. Control for confounds where possible, and try to model the statistical findings within an OT grammar, following work by Aissen and others.
1. Cleaning the sample
First: exclude non-nominals.
Using F. Karlsson's part-of-speech tagged version of the Brown corpus (1995), we excluded all irrelevant Of-NP and NP's tokens.
A few examples:
Verbal OF-X: He thought of her.
Adjectival OF-X: bald and afraid of women.
Contraction X'S: Kate's all right.
"All NP" sample, after removal of non-nominal examples
X'S: 4744 (47%)   OF-X: 5263 (53%)   N = 10,006
Second: exclude all tokens of non-reversible constructions
A few examples:
Partitives: half of his stirrup guard
Measure and container phrases: a drop of liquor; two saucers of water
Classifier phrases: a grove of trees; a flight of wooden steps
Configuration and constitutive phrases: strips of skin; a...castle of pine boughs
Second: exclude all tokens of non-reversible constructions
and many more:
'Sort' phrases: the crassest kind of materialism
Headless OF-X: that of a frustrated gnome
a man of brooding suspicions
[the] concept of the white-suited big-daddy colonel
the notion of philosophy as Queen Bee. . .
Nominal compounds: dog-eared men's magazines
Partially clean sample after removal of 'strict' non-reversibles
X'S: 4604 (62%)   OF-X: 2839 (38%)   N = 7,443
Third: exclude tokens where reversal substantially alters meaning: 'soft' non-reversibles
(a) Idioms, fixed phrases, and titles
bachelor of science / *science's bachelor
Satan's L'il Lamb / #the L'il Lamb of Satan
(b) Deverbal nominals with argument constraints
fear of him ≠ his fear
(see handout for more examples)
Cleaner sample after removal of 'soft' non-reversibles
X'S: 4585 (70%)   OF-X: 1985 (30%)   N = 6570
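The shifting balance between the two constructions across the cleaning stages can be recomputed from the counts reported on the slides. The following is a minimal Python sketch, not the authors' procedure: the stage labels and the function name `shares` are illustrative assumptions, and each percentage is taken over the sum of the two reported counts.

```python
# Counts of X'S and OF-X tokens reported on the slides at each cleaning stage.
# The stage labels are informal names chosen for this sketch.
STAGES = {
    "All NP": (4744, 5263),
    "after 'strict' non-reversibles": (4604, 2839),
    "after 'soft' non-reversibles": (4585, 1985),
}


def shares(xs, ofx):
    """Return the rounded percentages of X'S and OF-X tokens in a sample."""
    total = xs + ofx
    return round(100 * xs / total), round(100 * ofx / total)


for label, (xs, ofx) in STAGES.items():
    xs_pct, ofx_pct = shares(xs, ofx)
    print(f"{label}: X'S {xs_pct}%, OF-X {ofx_pct}% (sum of counts = {xs + ofx})")
```

The share of OF-X tokens drops at each stage because the excluded non-reversible types are overwhelmingly OF-X.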
2. Coding the sample
My sister's house (X'S)
The house of my sister (OF-X)
For each token, we identified the head and the modifier.
Each was coded for animacy, definiteness, NP form, and weight.
Arnold et al., Wasow, and J. Hawkins assert that the [orthographic] word is a reasonable measure of weight for most purposes.
It is also easily automated.
Each head and modifier was coded for weight in words, from 1 through >20.
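A word-count measure of this kind might be automated along the following lines. This is only a sketch under stated assumptions: whitespace splitting stands in for "orthographic word", and the function names `weight_in_words` and `weight_label` are hypothetical, not taken from the study's own tooling.

```python
def weight_in_words(phrase, cap=20):
    """Count whitespace-delimited words; anything above `cap` collapses to cap + 1."""
    n = len(phrase.split())
    return n if n <= cap else cap + 1


def weight_label(phrase, cap=20):
    """Return the weight as a coding label: '1', '2', ..., '20', or '>20'."""
    w = weight_in_words(phrase, cap)
    return f">{cap}" if w > cap else str(w)


print(weight_label("my sister's"))  # modifier of "my sister's house"
print(weight_label("house"))        # head of the same token
```

Note that under this assumption a possessive such as "sister's" counts as a single word.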
How to code for Discourse Status?
Even simple codes such as 'New', 'Inferrable', and 'Old' are quite time-consuming, although they are clearly desirable.
With thousands of tokens, we chose instead to exploit certain robust relationships between NP form and discourse status / accessibility.
Relying on previous research of Prince, Gundel et al., Ariel, i.a., we coded modifiers and heads for NP form and for morphosyntactic definiteness.
Coding for NP Form and Definiteness:
Most accessible, most topical, discourse-old...
Pronoun
Proper Noun
Common Noun (definite)
Common Noun (indefinite)
Least accessible, least topical, discourse-new...
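The four-way scale above can be treated as an ordered code, so that NP form serves as a rough proxy for accessibility. The sketch below is one hypothetical encoding (the class name `NPForm` and the numeric values are assumptions made for illustration); the example modifiers are taken from phrases that appear elsewhere in these slides.

```python
from enum import IntEnum


class NPForm(IntEnum):
    """NP form categories, ordered from most to least accessible (illustrative values)."""
    PRONOUN = 1
    PROPER_NOUN = 2
    COMMON_DEFINITE = 3
    COMMON_INDEFINITE = 4


def more_accessible(a, b):
    """True if form `a` sits higher on the accessibility scale than form `b`."""
    return a < b


# Hand-coded modifiers, deliberately listed out of order.
modifiers = {
    "a neighbor's": NPForm.COMMON_INDEFINITE,
    "his": NPForm.PRONOUN,
    "the man's": NPForm.COMMON_DEFINITE,
    "Kim's": NPForm.PROPER_NOUN,
}

ranked = sorted(modifiers, key=modifiers.get)
print(ranked)
```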
3. Controlling weights
After we coded our clean sample for weight, we noticed that 99% of our X'S examples had possessive modifiers that were 1, 2, or 3 words in weight.
We controlled for weight effects by limiting OF-X tokens to those whose modifiers were 1, 2, or 3 words in weight.
his only attack on the Republicans
the taxpayers' pockets
Speaker Sam Rayburn's forces
the invasion of Cuba
the rapid growth of juvenile delinquency
the 9th precinct of the 23rd ward
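The weight control can be pictured as a simple filter over coded tokens. The sketch below is illustrative only: the record layout, the split of each example into a modifier string, and the final four-word token are assumptions added here (the last record is a made-up example, not a corpus token).

```python
MAX_WEIGHT = 3  # keep modifiers of 1, 2, or 3 words

# Hypothetical records: (construction, full phrase, modifier string).
TOKENS = [
    ("X'S", "his only attack on the Republicans", "his"),
    ("X'S", "the taxpayers' pockets", "the taxpayers'"),
    ("X'S", "Speaker Sam Rayburn's forces", "Speaker Sam Rayburn's"),
    ("OF-X", "the invasion of Cuba", "Cuba"),
    ("OF-X", "the rapid growth of juvenile delinquency", "juvenile delinquency"),
    ("OF-X", "the 9th precinct of the 23rd ward", "the 23rd ward"),
    # Invented heavy modifier, included only to show a token being filtered out.
    ("OF-X", "the house of my sister's oldest friend", "my sister's oldest friend"),
]


def modifier_weight(modifier):
    """Weight of a modifier in whitespace-delimited words."""
    return len(modifier.split())


kept = [t for t in TOKENS if modifier_weight(t[2]) <= MAX_WEIGHT]
for construction, phrase, modifier in kept:
    print(construction, modifier_weight(modifier), phrase)
```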
Cleanest sample, after removal of modifiers greater than 3 words in weight
X'S: 4455 (75%)   OF-X: 1485 (25%)   N = 6034
3. Generalizations
We decided to convert the raw numbers of X'S and Of-X tokens into ratios.
For example, of 4177 animate tokens, 3909 are X'S and 268 are Of-X:
X'S / Of-X = 3909 / 268 ≈ 15 / 1
Let's compare the 1359 inanimate tokens, of which 357 are X'S and 1002 are Of-X:
X'S / Of-X = 357 / 1002 ≈ 1 / 3
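The conversion from raw counts to ratios is simple arithmetic and can be reproduced as follows. This is a minimal sketch using the animate and inanimate counts given above; the helper names `xs_to_ofx_ratio` and `as_odds` are illustrative assumptions, and the base-10 logarithm is included only because the charts that follow use a log scale.

```python
import math

# (X'S count, Of-X count) as reported on the slides.
COUNTS = {"Animate": (3909, 268), "Inanimate": (357, 1002)}


def xs_to_ofx_ratio(xs, ofx):
    """Ratio of X'S tokens to Of-X tokens."""
    return xs / ofx


def as_odds(ratio):
    """Format as 'n : 1' when the ratio favors X'S and '1 : n' when it favors Of-X."""
    if ratio >= 1:
        return f"{ratio:.1f} : 1"
    return f"1 : {1 / ratio:.1f}"


for category, (xs, ofx) in COUNTS.items():
    r = xs_to_ofx_ratio(xs, ofx)
    print(f"{category}: {as_odds(r)} (log10 = {math.log10(r):.2f})")
```

On a log scale, a ratio above 1 plots above zero (favoring X'S) and a ratio below 1 plots below zero (favoring Of-X), which makes the two directions of preference visually symmetric.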
Ratio of X'S to OF-X by Animacy category in Cleanest sample (n=6034)
[Chart, log scale from 0.1 to 100; values above 1 favor X'S, values below 1 favor Of-X]
Animate (N=3937): 15 : 1
Org (N=498): 1.3 : 1
Inanimate (N=1359): 1 : 3
Ratio of X'S to OF-X by NP form type in 'Cleanest' sample (n=6034)
[Chart, log scale from 0.1 to 1000; values above 1 favor X'S, values below 1 favor Of-X]
Pronoun (N=3577): 297 : 1
Proper (N=971): 1.85 : 1
Comm.Def (N=947): 1 : 7.7
Comm.Indef. (N=539): 1 : 2.3
Both animacy and discourse status seem to have a large effect. How about weight? Do we find effects of similar magnitude when we examine our possessive modifiers by our three weight values--1, 2 and 3 words?
Yes: here, the results span two orders of magnitude.
Ratio of X'S to OF-X by Weight in Cleanest sample (n=6034)
[Chart, log scale from 0.1 to 100; values above 1 favor X'S, values below 1 favor Of-X]
W=1 (N=4443): 10 : 1
W=2 (N=1174): 1 : 2
W=3 (N=417): 1 : 5
Animacy, discourse status, and weight all show strong effects. If we hold one factor constant, do the other factors disappear?
First we will hold animacy constant and look at the effects of discourse status, through the proxy of NP form.
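Ratio of X's to OF-X by NP form, controlling Animacy (n=6034)
[Chart, log scale from 0.01 to 10000: ratios for Pronoun, Proper, Comm.Def, and Comm.Indef modifiers within each animacy category: Animate (N=4177), Org (N=498), Inanimate (N=1359). Values above 1 favor X'S; values below 1 favor Of-X.]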
If we hold NP form constant, do the Animacy rankings hold up?
Ratio of X's to OF-X by Animacy category, controlling NP form (n=6034)
[Chart, log scale from 0.01 to 10000: ratios for Animate, Org, and Inanimate modifiers within each NP form: Pron. (n=3577), Prop. (n=971), Com.Def (n=947), Com.Indef (n=539).]
Do the animacy and discourse status ratios hold the same relative values when we control for weight?
Yes.
If we repeat the process for all tokens with modifiers of weight 1, 2, and 3, the relative ranking of ratios stays the same, and the magnitude of the differences persists.
Just how robust are these results?
The animacy and discourse status effects remained intact no matter what we controlled for. The relative ranking of the animacy and NP form categories was unchanged, although the ratios themselves differed somewhat in magnitude.
What would happen if we computed the same ratios on our original sample of 'All NPs' (n=10,006)? Did all our laborious extractions and exclusions really make a difference?
Return to initial sample, "All NPs"
X'S: 4744 (47%)   OF-X: 5263 (53%)   N = 10,006
Ratio of X'S to OF-X by NP form type: Comparison of Cleanest (n=6034) vs. All NPs (n=9963)
[Chart, log scale from 0.01 to 1000]
              Cleanest   All NPs
Pronoun       297        28
Proper        1.85       0.96
Comm.Def      0.13       0.04
Comm.Indef    0.44       0.15
Ratio of X'S to OF-X by Animacy: Comparison of Cleanest (n=6140) vs. All NPs (n=9963)
[Chart, log scale from 0.1 to 100]
              Clean      All NPs
Animate       14.59      5.53
Org           1.29       0.59
Inanimate     0.36       0.11
4. Interpreting the results
Recall our goal:
4. Control for confounds where possible, and try to model the statistical findings within an OT grammar, following work by Aissen and others.
This is in progress, with very good results. A set of three binary constraints fits the data from our corpus study, and makes predictions that can be tested cross-linguistically.
OT Analysis
In this preliminary phase, we classify possessors in terms of three binary features: [±pron], [±anim], and [±hum].
OT constraints for the Complement (the X in OF-X)
(a) *P/C ‘No [+pron] in Comp.’    *P-NP/C ‘No [±pron] in Comp.’
(b) *A/C ‘No [+anim] in Comp.’    *A-I/C ‘No [±anim] in Comp.’
(c) *H/C ‘No [+hum] in Comp.’    *H-NH/C ‘No [±hum] in Comp.’
OT constraints for the Specifier (the X in X'S)
(a) *NP/S ‘No [−pron] in Spec.’    *NP-P/S ‘No [±pron] in Spec.’
(b) *I/S ‘No [−anim] in Spec.’    *I-A/S ‘No [±anim] in Spec.’
(c) *NH/S ‘No [−hum] in Spec.’    *NH-H/S ‘No [±hum] in Spec.’
               1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
she            S  S  S  S  S  S  S  S  S  S  S  S  S  S  S  S  S  S  C
butler         S  S  S  S  S  S  S  S  S  S  C  S  S  C  S  C  C  C  C
it (org)       S  S  S  S  S  S  S  S  C  S  S  S  C  S  C  S  C  C  C
it (animal)    S  S  S  S  S  S  S  S  S  C  S  C  S  S  C  C  S  C  C
it (other)     S  S  S  S  C  S  C  C  C  C  S  C  C  C  C  C  C  C  C
dog            S  S  S  C  S  C  S  C  S  C  C  C  C  C  C  C  C  C  C
government     S  S  C  S  S  C  C  S  C  S  C  C  C  C  C  C  C  C  C
table          S  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
S = Specifier (X'S); C = Complement (OF-X)
19 predicted languages (out of 256 logically possible ones)
The OT grammar imposes a partial ordering on possessor types in terms of their
propensity to appear in the Specifier and Complement positions.
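One way to see what this partial ordering means is to check the 19 predicted languages against it. The sketch below is a consistency check only, not the OT grammar itself: it does not derive the languages from constraint rankings. It transcribes the S/C rows of the typology table and the feature triples for the eight possessor types, then verifies that whenever a type appears in Spec, every type with a superset of its [+] features does too. The names `dominates`, `spec_set`, and `respects_order` are illustrative assumptions.

```python
# Feature triples (anim, hum, pron) for the eight possessor types.
FEATURES = {
    "she": (1, 1, 1),
    "butler": (1, 1, 0),
    "it (org)": (0, 1, 1),
    "it (animal)": (1, 0, 1),
    "it (other)": (0, 0, 1),
    "dog": (1, 0, 0),
    "government": (0, 1, 0),
    "table": (0, 0, 0),
}

# Rows of the typology table, one character per language (columns 1-19).
# S = Specifier (X'S), C = Complement (OF-X).
TABLE = {
    "she": "S" * 18 + "C",
    "butler": "S" * 10 + "CSSCSCCCC",
    "it (org)": "S" * 8 + "CSSSCSCSCCC",
    "it (animal)": "S" * 9 + "CSCSSCCSCC",
    "it (other)": "SSSSCSCCCCS" + "C" * 8,
    "dog": "SSSCSCSCS" + "C" * 10,
    "government": "SSCSSCCSCS" + "C" * 9,
    "table": "S" + "C" * 18,
}


def dominates(x, y):
    """True if x has every [+] feature that y has (x is at least as high as y)."""
    return all(a >= b for a, b in zip(FEATURES[x], FEATURES[y]))


def spec_set(column):
    """Set of possessor types placed in Spec by the language in this column (0-based)."""
    return frozenset(t for t in TABLE if TABLE[t][column] == "S")


def respects_order(spec):
    """True if every type in Spec brings all higher-ranked types into Spec with it."""
    return all(hi in spec for lo in spec for hi in FEATURES if dominates(hi, lo))


languages = [spec_set(i) for i in range(19)]
print(len(set(languages)), "distinct languages out of", 2 ** len(FEATURES))
print(all(respects_order(lang) for lang in languages))
```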
Aissen Lattice
she [+anim, +hum, +pron]
butler [+anim, +hum, -pron]   it (animal) [+anim, -hum, +pron]   it (organization) [-anim, +hum, +pron]
dog [+anim, -hum, -pron]   government [-anim, +hum, -pron]   it (other) [-anim, -hum, +pron]
table [-anim, -hum, -pron]
Empirical support?
Do the predicted rankings of affinity for Spec (X'S) hold in our corpus?
We examined the percentages of the relevant NPs in Spec vs. Comp:
Aissen Lattice
she [+anim, +hum, +pron]
butler [+anim, +hum, -pron]   it (animal) [+anim, -hum, +pron]   it (organization) [-anim, +hum, +pron]
dog [+anim, -hum, -pron]   government [-anim, +hum, -pron]   it (other) [-anim, -hum, +pron]
table [-anim, -hum, -pron]
Percentages shown on the slide, by lattice row:
99.9%
100%   100%   75%
37%   97%   50%
9%
Confirmation
The predictions hold nearly perfectly in the 'Cleanest' sample.
Will they hold for the 'All NP' sample?
Aissen Lattice
she [+anim, +hum, +pron]
butler [+anim, +hum, -pron]   it (animal) [+anim, -hum, +pron]   it (organization) [-anim, +hum, +pron]
dog [+anim, -hum, -pron]   government [-anim, +hum, -pron]   it (other) [-anim, -hum, +pron]
table [-anim, -hum, -pron]
Percentages shown on the slide, by lattice row:
98%
99%   100%   31%
18%   76%   35%
3%
Conclusions
• We conducted a large corpus study in which many features were controlled and investigated (relationality);
• We found robust effects for animacy, discourse status and weight; difficult to disaggregate;
• We modelled these using three binary features and six pairs of constraints; we made predictions about crosslinguistic factorial typology.
Conclusions
"Using OT you can probably model a guy frying an egg." (Arto Anttila)
However, the inevitable question: "So what?"
Moreover, our OT model treated discourse status ("±pronoun") and humanness and animacy as independent factors. This is probably wrong, and there is nothing explanatory about it.
Conclusions
The question we would like to answer is a metatheoretical question:
What does it mean for the theory of grammar that these animacy and discourse and weight effects are so robust, and yet so inextricably combined? Is it possible to demonstrate in a principled way whether one derives from another, at least with respect to the alternation between the X'S and OF-X constructions? We haven't answered this question.
Conclusions
(See acknowledgements section of handout for URL and email details)
So now that we have spent several years looking closely at thousands of examples of the two constructions, and now that we have shown that some effects are so robust that you can get them with really unfiltered data, we would like to invite anyone who is interested to try their hand at these questions. We will be putting our coded corpus on the web soon and we invite you to investigate it yourself...
Acknowledgements
This research was supported by NSF grant BCS-008037, "Optimal Typology of Determiner Phrases". The support of the NSF Linguistics Program is gratefully
acknowledged. No endorsement of this research is implied.
Many thanks to our graduate research assistants:
Gregory Garretson
Marj Hogan
Barbora Skarabela
and our undergraduate research assistants:
Amy Rose Deal
John Manna
Many thanks also to Joan Bresnan, Annie Zaenen, and Tom Wasow, for discussions of animacy.
Thanks to Boston University students in LS 751, Spring 2002, for discussions of some of this material.
A conjecture:
Claim: The prenominal position favors accessible, topical referents. (Many clause-level constructions tend towards 'old first, new last,' and this can be observed in the NP as well.)
Observation: Discourse topics tend to be human; 1st and 2nd persons are among the most accessible entities in any speech situation.
As speakers discuss events involving animate actors, inanimate NPs introduce background objects, properties, and arguments of many predicates.
A conjecture:
How does this well-worn observation buy us anything?
If Animates are highly topical, and thus highly accessible entities, thus mentioned frequently, they will be expressed with pronouns; lighter entities are favored in initial position. So are older entities. The preference of Animates (particularly Humans) for the initial position is a by-product of their topicality and concomitant length.
A conjecture:
If inanimates are usually mentioned only once or twice, they will predominantly get expressed as definite and indefinite common nouns that must be fully informative, thus longer.
Inanimates are not usually highly topical, and thus they do not favor the X'S slot. Their appearance in the OF-X construction is a by-product of their non-topical status and length, not a fact about animacy per se.
This may explain the redundancy between inanimates, common nouns, higher weights, and the OF-X position.
Hypothesis 4: LEXICAL SEMANTICS
Some have claimed that features of the head noun (e.g. relationality) or the semantic relation between the head and modifier account for most of the variation (Stefanowitsch, Taylor, Barker, i.a.).
Relational semantics
Recall the lexical semantics hypothesis. We have tried to control for effects of nominal head semantics by excluding as many strict and soft non-reversibles as we can find. However, there is one more issue to deal with.
Barker, Stefanowitsch, and others have claimed that only relational heads are truly reversible. Yet our X'S sample includes many examples of possession of non-relational nouns.
Kim's truck --> ??the truck of Kim
These are said to be irreversible because truck is not relational.
Relational semantics
If we limited our X'S and OF-X sample to only relational heads, would the animacy and discourse status effects disappear or persist?
To test this, we selected a large sample of relational heads from our Clean sample (which excludes all strict non-reversible constructions, e.g. partitives). We separated out all the examples that had kinship or body part heads: his cousin or the feet of Fred Astaire.
This included 934 tokens.
Ratio of X'S to OF-X by NP form type. "Clean" sample: Relational Heads only (n=934)
[Chart, log scale from 0.1 to 1000]
              Rel.Heads
Pronoun       350
Proper        9.0
Comm.Def      .22
Comm.Indef    1.89
Relational semantics
Then we took all the tokens that had Concrete Inanimate heads (excluding body parts, which are relational.)
Not all of these examples are non-relational, but the prototypical examples of non-relational nouns often include concrete objects that may be possessed by humans but do not have any discernible argument structure of their own.
This included 489 tokens.
Ratio of X'S to OF-X by NP form type. Clean sample: Relational Heads (n=934) vs. Concrete Inanimate Heads (n=489)
[Chart, log scale from 0.1 to 1000]
              Rel.Heads   Conc.Heads
Pronoun       350         327
Proper        9.0         1.9
Comm.Def      .22         .52
Comm.Indef    1.89        .72
Text analysis (Brown Corpus): an excerpt from a Western novel.
Some facts:
2000 words
Approximately 650 coded NP 'mentions'
Approximately 150 distinct 'referents'
Dan Morgan; his dry lips; night; his plans and dreams; sleep; a wife ...as fickle as Ann
Core participants: those with the most mentions
Dan Morgan: 165 mentions
The visitors: 63 mentions
Sharon Jones: 87 mentions
Billy Jones: 43 mentions
Peripheral participants: non-core
a trick Al Budd had thought up: 2 mentions
a pathetic, woebegone expression: 1 mention
an idiot: 1 mention
ordinary years: 1 mention
How are core participants realized?
More than half of the NPs in the text refer to one of these four "Core" participants. Of these 358 mentions, 83% are pronouns.
How are peripheral participants realized?
Over 75% of the 308 "Peripheral" participants are mentioned only once or twice. Of these 308 NPs, over 85% are common nouns.
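The core versus peripheral split can be checked against the mention counts listed for the four core participants. This is a small arithmetic sketch; the variable names are illustrative, and the total is taken as core plus peripheral mentions, which is an assumption about how the figures combine.

```python
# Mention counts for the four core participants, as listed on the slides.
CORE_MENTIONS = {
    "Dan Morgan": 165,
    "The visitors": 63,
    "Sharon Jones": 87,
    "Billy Jones": 43,
}
PERIPHERAL_MENTIONS = 308  # NPs referring to non-core participants

core_total = sum(CORE_MENTIONS.values())
all_mentions = core_total + PERIPHERAL_MENTIONS  # assumed to exhaust the coded NPs
core_share = core_total / all_mentions

print(f"core mentions: {core_total}")
print(f"all mentions: {all_mentions}")
print(f"core share: {core_share:.1%}")
```

The four counts sum to the 358 core mentions cited above, and under this assumption the core participants account for a little over half of all coded mentions.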