Differential Case Marking: Objects

Optimality Theory and Pragmatics

Compact seminar at the University of Vienna, summer semester 2005

Manfred Krifka

Stochastic Optimality Theory
Learning Algorithms
Evolutionary Optimality Theory

In many languages, case marking of subject and object depends on a variety of factors.

Hebrew: Only definite object NPs are case marked.

Ha-seret her’a ‘et ha-milxama.
‘the-movie showed ACC the-war’
Ha-seret her’a (*‘et) milxama.
‘the-movie showed (*ACC) war’

Spanish: Only animate object NPs are case marked.

Busco a una señora.
‘I-look-for ACC a woman.’
Busco (*a) una casa.
‘I-look-for (*ACC) a house.’

Bossong (1985): differential object marking, attested in more than 300 languages.

Explanation in Aissen (2002): Two scales determine differential object marking:
• Animacy: Human > Animate > Inanimate
• Definiteness: Pers.Pronoun > Name > Def.NP > Indef.Spec.NP > Nonspec. NP

Generalization: Object marking is more likely at the high end of the scales.

A closer look: DOM in medieval Spanish

From Judith Aissen: Differential Object Marking: Iconicity vs. Economy. Draft, Stanford 2000

Differential Object Marking in German: Nominative / Accusative Syncretism

German, too, shows differential object marking, determined by gender:

Masculine: Der Mann sieht den Hasen. Der Hase sieht den Mann. NOM ≠ ACC
Feminine: Die Frau sieht den Hasen. Der Hase sieht die Frau. NOM = ACC, syncretism
Neuter: Das Kind sieht den Hasen. Der Hase sieht das Kind. NOM = ACC, syncretism

Syncretism in the neuter is inherited (general in Indo-European languages); in the feminine it developed in Middle High German / Early New High German.

Syncretism within one inflectional class of nouns (n-stems) depends on animacy:
der Mensch / den Mensch-en, der Bote / den Bot-en, der Hase / den Has-en ...
der Regen / den Regen, der Kragen / den Kragen, der Besen / den Besen ...

Fish count as inanimate: der Karpfen / den Karpfen, der Rochen / den Rochen
Cases that can be categorized either way: der Same(n) / der Wille(n), der Friede(n), ...
Doublets: der Drache, der Drachen; der Rappe, der Rappen; der Lump, der Lumpen


Animacy as a factor in case syncretism in general: Masculine nouns are more likely to be animate than feminine nouns.

Example: corpus of Ruoff (1981), 500,000 words, spoken everyday narratives from the Swabian region

[Figure: Frequency of Animacy, nouns with > 8 occurrences (> 0.01 %). Bar chart, vertical axis 0 to 300; categories: Masculine, Feminine, Neuter, Pluralia tantum; series: Inanimates, Animates.]

Gender of the 100 most frequent nouns that denote animates:

Masculine          69%
Feminine           16%
Neuter              9%
Pluralia tantum     6%


Animacy as a systematic factor: nominal derivation

• Masculine derivations are often animate: Lehr-er, Lehr-ling, Praktik-ant, Psycho-loge
• Feminine derivations are not animate: Frei-heit, Freund-schaft, Kleid-ung, Diskuss-ion, Sing-erei
  Exception: gender motion (Movierung), Präsident-in.
• Neuter derivations are likewise often not animate.

Hence: Case syncretism in German, too, has an affinity to the general regularities of differential object marking.

Differential Case Marking: Subjects

Differential subject marking (“Split Ergativity”): Example: Dyirbal, Australia.

1st and 2nd person pronouns: No marking of subject NP

ɲana banaga-nyu.          ɲana ɲurra-na bura-n.
we returned.              we you-ACC saw.

ɲurra banaga-nyu.         ɲurra ɲana-na bura-n.
you returned.             you us-ACC saw.

Other pronouns and NPs: Ergative marking of subject of transitive sentence:

ɲuma banaga-nyu.          ɲuma-ɲgu yabu bura-n.
Father returned.          Father-ERG mother saw.

Mixed system:

ɲuma-ɲgu ɲurra-na bura-n.
Father-ERG you-ACC saw.

In hundreds of languages (Basque, Georgian, Hindi, ...) the distribution of subject marking is governed by similar scales (Silverstein 1976):

• Animacy: Human > Animate > Inanimate

• Definiteness: Pers.Pronoun > Name > Def.NP > Indef.Spec.NP > Nonspec. NP

• Generalization: Subject marking more likely at the low end of the scales.

Differential Case Marking: Scale Alignment

Aissen (2002): Case marking patterns as the result of alignment of two scales, here illustrated with the definiteness scale.

[Diagram: Subject and Object aligned with the definiteness scale pronoun > name > definite NP > spec. indef. NP > nonspec. NP. Harmonic alignment: case marking unlikely. Disharmonic alignment: case marking likely.]

Alignment of two scales produces the following markedness scales:
• Subj/pronoun > Subj/name > Subj/def > Subj/spec > Subj/nonspec
• Obj/nonspec > Obj/spec > Obj/def > Obj/name > Obj/pronoun
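The alignment step described above is mechanical enough to sketch in code. The following is a minimal illustration, not Aissen's own formalization: the function name `harmonic_alignment` and the list encoding of scales are assumptions made for this example. The higher member of the binary scale is paired with the second scale top-down, the lower member bottom-up.

```python
def harmonic_alignment(binary_scale, scale):
    """Align a binary scale X > Y with a scale a > b > ... > z.

    Returns two harmony scales, most harmonic combination first:
    X/a > X/b > ... > X/z   and   Y/z > ... > Y/b > Y/a.
    """
    high, low = binary_scale
    # The higher element (Subj) is most harmonic with the top of the scale.
    high_scale = [f"{high}/{value}" for value in scale]
    # The lower element (Obj) is most harmonic with the bottom of the scale.
    low_scale = [f"{low}/{value}" for value in reversed(scale)]
    return high_scale, low_scale


# Definiteness scale as abbreviated on the slide.
definiteness = ["pronoun", "name", "def", "spec", "nonspec"]
subj_scale, obj_scale = harmonic_alignment(("Subj", "Obj"), definiteness)

print(" > ".join(subj_scale))
print(" > ".join(obj_scale))
```

The two printed lines correspond to the two markedness scales listed above; the same function can be applied to the animacy scale.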

Scale Alignment and OT constraints

Expression of marking tendencies, Hebrew:

Relevant parts of basic hierarchies: Subj > Obj, +def > –def
Aligned hierarchies (harmonic > disharmonic):
  Subj/+def > Subj/–def
  Obj/–def > Obj/+def (only this one relevant here)
Corresponding constraint ranking: *Obj/+def >> *Obj/–def
“Not marking definite objects is worse than not marking indefinite objects”
Better interpretation: “Case marking of definite objects is more important than case marking of indefinite objects”

Markedness constraint: *STRUC: Avoid structure (explicit marking): speaker economy (not strictly necessary for Hebrew case, but relevant later)

Constraint ranking: *Obj/+def >> *STRUC >> *Obj/–def

‘the movie showed the war / war’     *Obj/+def   *STRUC   *Obj/–def
Ha-seret her’a ‘et ha-milxama                    *
Ha-seret her’a ha-milxama            *
Ha-seret her’a ‘et milxama                       *
Ha-seret her’a milxama                                    *
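How the ranking picks a winner from the tableau above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `RANKING`, `TABLEAU`, `profile` and `winner` are invented for the example, and the violation marks are entered by hand from the tableau. Strict domination is modelled by comparing violation profiles lexicographically, highest-ranked constraint first.

```python
# Constraint ranking for Hebrew, highest-ranked constraint first.
RANKING = ["*Obj/+def", "*STRUC", "*Obj/-def"]

# Violation marks per candidate, grouped by the definiteness of the object.
TABLEAU = {
    "+def": {
        "Ha-seret her'a 'et ha-milxama": {"*STRUC": 1},
        "Ha-seret her'a ha-milxama": {"*Obj/+def": 1},
    },
    "-def": {
        "Ha-seret her'a 'et milxama": {"*STRUC": 1},
        "Ha-seret her'a milxama": {"*Obj/-def": 1},
    },
}


def profile(violations, ranking):
    """Violation counts ordered by rank; tuples compare lexicographically."""
    return tuple(violations.get(constraint, 0) for constraint in ranking)


def winner(candidates, ranking):
    """Candidate whose violation profile is minimal under strict domination."""
    return min(candidates, key=lambda cand: profile(candidates[cand], ranking))


for definiteness, candidates in TABLEAU.items():
    print(definiteness, "->", winner(candidates, RANKING))
```

With this ranking the case-marked candidate wins for the definite object and the unmarked candidate wins for the indefinite object, which is the Hebrew pattern.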

Derivation of Dyirbal System

The facts, again:

1st and 2nd person pronouns: No marking of subject NP

ɲana banaga-nyu.          ɲana ɲurra-na bura-n.
we returned.              we you-ACC saw.

ɲurra banaga-nyu.         ɲurra ɲana-na bura-n.
you returned.             you us-ACC saw.

Other pronouns and NPs: Ergative marking of subject of transitive sentence:

ɲuma banaga-nyu.          ɲuma-ɲgu yabu bura-n.
Father returned.          Father-ERG mother saw.

Mixed marking:

ɲuma-ɲgu ɲurra-na bura-n.
Father-ERG you-ACC saw.

No marking:

ɲana ɲuma bura-n.
we Father saw.

OT Constraints, Case marking in a Dyirbal-like Language

Basic hierarchies, universal: S(ubj) > O(bj), 1(st) > 3(rd)
Aligned hierarchies: S/1 > S/3, O/3 > O/1
Generated constraint orders: *S/3 >> *S/1, *O/1 >> *O/3
“marking of S/3 is more important than marking of S/1”
Combined constraints: {*S/3, *O/1} >> *STRUC >> {*S/1, *O/3}

Subj      Obj       *S/3   *O/1   *STRUC   *S/1   *O/3
1st-Ø     3rd-Ø                            *      *
1st-ERG   3rd-Ø                   *               *
1st-Ø     3rd-ACC                 *        *
1st-ERG   3rd-ACC                 **
3rd-Ø     3rd-Ø     *                             *
3rd-ERG   3rd-Ø                   *               *
3rd-Ø     3rd-ACC   *             *
3rd-ERG   3rd-ACC                 **
3rd-Ø     1st-Ø     *      *
3rd-ERG   1st-Ø            *      *
3rd-Ø     1st-ACC   *             *
3rd-ERG   1st-ACC                 **
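The violation marks in the tableau above follow from the constraint definitions, so they can be computed rather than listed. The sketch below assumes that *S/n and *O/n are violated by an unmarked subject or object of person n, that *STRUC is violated once per case marker, and that violations of constraints in the same stratum are simply added up; the last point is a simplification chosen for this example, as are all function names.

```python
from itertools import product

# Strata of the combined ranking {*S/3, *O/1} >> *STRUC >> {*S/1, *O/3}.
STRATA = [("*S/3", "*O/1"), ("*STRUC",), ("*S/1", "*O/3")]
ORDINAL = {1: "1st", 3: "3rd"}


def violations(subj_person, subj_marked, obj_person, obj_marked):
    """Violation marks of one candidate marking pattern."""
    marks = {"*STRUC": int(subj_marked) + int(obj_marked)}
    if not subj_marked:
        marks[f"*S/{subj_person}"] = 1  # unmarked subject of this person
    if not obj_marked:
        marks[f"*O/{obj_person}"] = 1  # unmarked object of this person
    return marks


def profile(marks):
    """Violations per stratum, highest stratum first (pooled within a stratum)."""
    return tuple(sum(marks.get(c, 0) for c in stratum) for stratum in STRATA)


def best_marking(subj_person, obj_person):
    """Optimal marking pattern for a given person combination."""
    candidates = product([False, True], repeat=2)  # (subject marked?, object marked?)
    s, o = min(
        candidates,
        key=lambda m: profile(violations(subj_person, m[0], obj_person, m[1])),
    )
    subj = f"{ORDINAL[subj_person]}-{'ERG' if s else 'Ø'}"
    obj = f"{ORDINAL[obj_person]}-{'ACC' if o else 'Ø'}"
    return subj, obj


for subj_person, obj_person in [(1, 3), (3, 3), (3, 1)]:
    print(best_marking(subj_person, obj_person))
```

The three winners correspond to the "no marking", "ergative marking" and "mixed marking" sentences of the Dyirbal data.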

Where do the hierarchies come from?

Aissen simply assumes hierarchies like S > O, 1 > 3, def > indef as given.
Bresnan, Dingare & Manning (2001), Zeevat & Jäger (2002): The hierarchies can be explained by typical patterns of language use.

Example: Subjects and objects in 3151 simple transitive clauses of Swedish everyday conversation (SAMTAL corpus, Ö. Dahl)

        total   +def   –def   +pron   –pron   +anim   –anim
Subj    3151    3098     53    2984     167    2984     203
Obj     3151    1830   1321    1512    1639     317    2834

Biases in the SAMTAL Corpus

Probabilities that subjects and objects have certain properties, SAMTAL corpus of spoken Swedish (collected by Ö. Dahl, analyzed by Zeevat & Jäger)

Resulting statistical biases, expressed as conditional probabilities, e.g., p(Subj | +def): probability that a +def NP is a subject: 63%

p(Subj | +def)  = 63%     p(Subj | –def)  = 4%
p(Obj | +def)   = 37%     p(Obj | –def)   = 96%
p(Subj | +pron) = 66%     p(Subj | –pron) = 9%
p(Obj | +pron)  = 33%     p(Obj | –pron)  = 91%
p(Subj | +anim) = 90%     p(Subj | –anim) = 7%
p(Obj | +anim)  = 10%     p(Obj | –anim)  = 93%

This holds for a fairly large and representative corpus of spoken Swedish. The tendencies can be reproduced for other languages and communities, but collecting further data is absolutely necessary.
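The conditional probabilities above can be recomputed from the corpus counts. The sketch below copies the counts from the SAMTAL table as given on the slide; the dictionary layout and the function name `p_function_given_feature` are assumptions for this example. Each probability is the count for one grammatical function divided by the count for both functions with that feature. Rounding to whole percentages may differ by one point from the figures quoted above.

```python
# Counts from the SAMTAL table (3151 simple transitive clauses).
COUNTS = {
    "Subj": {"+def": 3098, "-def": 53, "+pron": 2984, "-pron": 167,
             "+anim": 2984, "-anim": 203},
    "Obj": {"+def": 1830, "-def": 1321, "+pron": 1512, "-pron": 1639,
            "+anim": 317, "-anim": 2834},
}


def p_function_given_feature(function, feature):
    """p(function | feature): share of NPs with the feature that have the function."""
    total = sum(COUNTS[f][feature] for f in COUNTS)
    return COUNTS[function][feature] / total


for feature in ["+def", "-def", "+pron", "-pron", "+anim", "-anim"]:
    for function in ["Subj", "Obj"]:
        p = p_function_given_feature(function, feature)
        print(f"p({function} | {feature}) = {p:.1%}")
```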

Statistical Bias and Bidirectional OT

Zeevat & Jäger (2002), Jäger (2003)

Economical encoding:
• Case marking is disfavored for frequent combinations, e.g., indefinite objects: p(Obj | –def) = 96%
• but case marking is favored for infrequent combinations, e.g., indefinite subjects: p(Subj | –def) = 4%, definite objects: p(Obj | +def) = 37%

A case for weak bidirectional optimization?
• Preference for simple forms: –case >> +case
• Preference for meanings that correspond to bias: Obj/–def >> Obj/+def

[Diagram: the four form-meaning pairs ⟨–case, Obj/–def⟩, ⟨–case, Obj/+def⟩, ⟨+case, Obj/–def⟩, ⟨+case, Obj/+def⟩, ordered by the two preferences. Optimal pairs: case marking pattern of Hebrew.]

Problem: There is no choice to interpret a given NP as +def or –def; this is explicitly marked!

Statistical Bias and Bidirectional OT

Zeevat & Jäger assume the following constraints:
• *STRUC: Avoid structure, i.e. avoid overt marking
• FAITH: Faithful interpretation of case morphemes, e.g. ACC: Obj, ERG: Subj
• BIAS: An NP of a certain category is interpreted as having the grammatical function that is most probable for this category, e.g. Obj: inanimate

Ranking: FAITH >> BIAS >> *STRUC

Hearer optimality and speaker optimality (Asymmetric Bi-OT):
• Hearer optimality: For a given form, choose the meaning that shows the least severe constraint violation! In the case at hand, interpret an NP according to its case marking pattern; if there is no case marking, follow statistical bias (I-Implicature)
• If two competing forms are both hearer-optimal for a given meaning, the speaker can choose the preferred one (here: the one without case marking)

Hearers have to be served first, as speakers want to be understood.

Definition:
• A pair ⟨F, M⟩ ∈ GEN is hearer-optimal iff there is no alternative ⟨F, M’⟩ ∈ GEN such that ⟨F, M’⟩ > ⟨F, M⟩.
• A pair ⟨F, M⟩ ∈ GEN is optimal iff it is hearer-optimal and there is no alternative ⟨F’, M⟩ ∈ GEN such that ⟨F’, M⟩ is hearer-optimal and ⟨F’, M⟩ > ⟨F, M⟩.

Example: Animacy in a language with ERG and ACC

Form         Meaning   FAITH   BIAS   *STRUC   hearer-opt   opt
anim-ERG     Subj                     *        yes
             Obj       *       *      *
anim-Ø       Subj                              yes          yes
             Obj               *
anim-ACC     Subj      *              *
             Obj               *      *        yes          yes
inanim-ERG   Subj              *      *        yes          yes
             Obj       *              *
inanim-Ø     Subj              *
             Obj                               yes          yes
inanim-ACC   Subj      *       *      *
             Obj                      *        yes
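The definitions of hearer optimality and optimality can be checked against the animacy tableau with a short program. This is a minimal sketch, not the authors' implementation: the encoding of FAITH, BIAS and *STRUC as 0/1 violations, the restriction of competitors to NPs of the same animacy, and all names are assumptions made for the example.

```python
MEANINGS = ["Subj", "Obj"]
# Grammatical function signalled by each case form (None: no case marking).
CASES = {"ERG": "Subj", "ACC": "Obj", "Ø": None}
# Most probable grammatical function per NP type (statistical bias).
BIAS = {"anim": "Subj", "inanim": "Obj"}


def profile(np_type, case, meaning):
    """Violations ordered by the ranking FAITH >> BIAS >> *STRUC."""
    faith = int(CASES[case] is not None and CASES[case] != meaning)
    bias = int(BIAS[np_type] != meaning)
    struc = int(case != "Ø")
    return (faith, bias, struc)


def hearer_optimal(np_type, case, meaning):
    """No alternative meaning for the same form is strictly better."""
    mine = profile(np_type, case, meaning)
    return all(mine <= profile(np_type, case, m) for m in MEANINGS)


def optimal(np_type, case, meaning):
    """Hearer-optimal, and no hearer-optimal alternative form is strictly better."""
    if not hearer_optimal(np_type, case, meaning):
        return False
    mine = profile(np_type, case, meaning)
    rivals = [c for c in CASES if hearer_optimal(np_type, c, meaning)]
    return all(mine <= profile(np_type, c, meaning) for c in rivals)


for np_type in BIAS:
    for case in CASES:
        for meaning in MEANINGS:
            flags = []
            if hearer_optimal(np_type, case, meaning):
                flags.append("hearer-opt")
            if optimal(np_type, case, meaning):
                flags.append("opt")
            print(f"{np_type}-{case:<4}{meaning:<5}{profile(np_type, case, meaning)} {' '.join(flags)}")
```

Under these assumptions the optimal pairs are unmarked animate subjects, accusative-marked animate objects, ergative-marked inanimate subjects and unmarked inanimate objects.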

From Pragmatics to Grammar?

One caveat: The OT tableaus typically abstract away from important factors, e.g. word order, plausibility, selectional restrictions.
Example: The lightning killed the man.
Even though the man is animate and in object position, it wouldn’t need case marking, as only animates can be killed.

A second caveat: Case marking is typically part of the core grammar, and not derived by pragmatic rules.
But: Pragmatic tendencies as one source of core grammar (functionalist view of grammar).

Motivation for Stochastic Optimality Theory

Judith Aissen (2000) and Joan Bresnan (2002): There is not just a universal tendency towards differential case marking in the core grammars of languages; the same tendency can also describe optional case marking within a language.

Example: Case marking by postpositions in colloquial Japanese (data: Fry 2001, Ellipsis and wa-marking in Japanese):
Subj/anim: 60%     Subj/inanim: 70%
Obj/anim: 54%      Obj/inanim: 47%

Obligatory case marking patterns can be seen as extreme cases of statistical marking patterns, e.g. Spanish:
Obj/anim: 100%     Obj/inanim: 0%

Stochastic Optimality Theory (StOT), Boersma (1998), Functional Phonology, developed originally for phonological phenomena, can be used to model this intuition: Core grammar phenomena are not essentially different from statistical tendencies based on usage in phenomena that core grammar leaves, to a certain degree, optional.

Stochastic Optimality Theory (StOT)

Main differences between standard OT and Stochastic OT:

• Constraint ranking on a continuous scale: Every constraint is assigned a real number which determines the ranking of the constraints and is a measure for the distance between them.

• Stochastic evaluation: For each evaluation, the placement of a constraint is modified by adding a noise value with normal distribution. The ordering of the constraints after adding this noise value determines the actual evaluation of the set of candidates.

Constraints C1, C2 overlap: mostly C1 >> C2, sometimes C2 >> C1.

Constraints C1, C2 do not overlap: C1 >> C2 (almost) all the time.

Stochastic OT: Ordering Probabilities

Difference between mean values > 10: C1 dominates C2 categorically, p(C2 > C1) < 10^-10

Difference between mean values ≈ 5: preference for C1 >> C2, but C2 >> C1 also leads to grammatical results, p(C2 > C1) ≈ 10%

Difference between mean values = 0: no ranking preference, p(C2 > C1) = p(C1 > C2) = 50%, random outcomes.
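How the ordering probability depends on the distance between two constraints can be sketched as follows. The example assumes that both constraints receive independent Gaussian noise with the same standard deviation; the function name `p_reversed` and the value `noise_sd = 2.0` are illustrative assumptions, since the slides do not state the noise level. The exact numbers therefore depend on that choice and need not match the rounded figures quoted above.

```python
import math


def p_reversed(mean_difference, noise_sd):
    """Probability that C2 outranks C1 in one evaluation.

    mean_difference is the ranking value of C1 minus that of C2.
    The difference of the two selection points is normally distributed
    with mean mean_difference and standard deviation noise_sd * sqrt(2),
    so the result is the normal CDF at -mean_difference / (noise_sd * sqrt(2)).
    """
    return 0.5 * math.erfc(mean_difference / (2.0 * noise_sd))


noise_sd = 2.0  # assumed for illustration only
for difference in [0.0, 5.0, 10.0]:
    print(f"difference {difference:4.1f}: p(C2 > C1) = {p_reversed(difference, noise_sd):.6f}")
```

Whatever noise level is chosen, a difference of 0 gives exactly 50%, and the probability of the reversed ranking shrinks rapidly as the difference grows.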

Statistical OT and Gradual Learning

Boersma (1998), Boersma & Hayes (2001), in Linguistic Inquiry: Gradual Learning Algorithm (GLA) for learning constraint rankings (not for learning of possible candidates, GEN)

• In phonology: GEN: pairs of phonological forms and phonetic interpretations: ⟨/ /, [ ]⟩

• In semantics/pragmatics: GEN: pairs of syntactic/morphological forms and semantic/pragmatic interpretations: ⟨F, M⟩

Boersma’s Gradual Learning Algorithm (GLA)

0. Initial state: All constraint values are set to 0.
1. Learning datum: input-output pair ⟨i, o⟩.
2. Generation:
   a. For each constraint, a noise value drawn from a normal distribution is added to the current ranking value: this is the selection point of the constraint.
   b. Constraints are ranked by the order of their selection points.
   c. The grammar generates an output o’ for the input i; alternative pair: ⟨i, o’⟩.
3. Comparison: If o’ = o, nothing happens. Otherwise, the algorithm compares the constraint violations of the learning datum ⟨i, o⟩ with those of the generated datum ⟨i, o’⟩.
4. Adjustment:
   a. All constraints that favor the learning datum ⟨i, o⟩ over the self-generated ⟨i, o’⟩ are increased by a small predefined numerical amount (“plasticity”).
   b. All constraints that favor the self-generated ⟨i, o’⟩ over the learning datum ⟨i, o⟩ are decreased by the plasticity value.
5. Final state: Steps 1 – 4 are repeated until the constraint values stabilize. Plasticity may change over the lifetime from high to low.
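The steps above can be sketched as a small simulation. This is an illustration under stated assumptions, not Boersma's implementation: it uses a toy grammar with three constraints for object marking, a fixed plasticity, a fixed number of learning steps instead of a convergence test, and hypothetical names and parameter values (`gla`, `plasticity=0.1`, `noise_sd=2.0`, `steps=5000`).

```python
import random

CONSTRAINTS = ["m:Obj/+def", "*STRUC", "m:Obj/-def"]
OUTPUTS = ["ACC", "Ø"]


def violations(inp, out):
    """inp: '+def' or '-def' object; out: case-marked ('ACC') or unmarked ('Ø')."""
    return {
        "m:Obj/+def": int(inp == "+def" and out == "Ø"),
        "m:Obj/-def": int(inp == "-def" and out == "Ø"),
        "*STRUC": int(out == "ACC"),
    }


def generate(inp, values, noise_sd, rng):
    # Step 2a: selection point = ranking value + normally distributed noise.
    selection = {c: v + rng.gauss(0.0, noise_sd) for c, v in values.items()}
    # Step 2b: rank constraints by their selection points.
    order = sorted(selection, key=selection.get, reverse=True)
    # Step 2c: the best output under this ranking (strict domination).
    return min(OUTPUTS, key=lambda out: tuple(violations(inp, out)[c] for c in order))


def gla(data, steps=5000, plasticity=0.1, noise_sd=2.0, seed=0):
    rng = random.Random(seed)
    values = {c: 0.0 for c in CONSTRAINTS}  # Step 0: all values start at 0
    for _ in range(steps):
        inp, out = rng.choice(data)  # Step 1: learning datum
        guess = generate(inp, values, noise_sd, rng)  # Step 2: generation
        if guess == out:  # Step 3: no mismatch, nothing happens
            continue
        v_datum, v_guess = violations(inp, out), violations(inp, guess)
        for c in CONSTRAINTS:  # Step 4: adjustment
            if v_guess[c] > v_datum[c]:
                values[c] += plasticity  # constraint favors the learning datum
            elif v_datum[c] > v_guess[c]:
                values[c] -= plasticity  # constraint favors the generated datum
    return values


# Hebrew-like data: definite objects are marked, indefinite objects are not.
learned = gla([("+def", "ACC"), ("-def", "Ø")])
for constraint in sorted(learned, key=learned.get, reverse=True):
    print(f"{constraint:<12}{learned[constraint]:6.2f}")
```

With categorical training data of this kind the learner moves towards m:Obj/+def >> *STRUC >> m:Obj/-def; a decreasing plasticity schedule, as mentioned in step 5, is left out of the sketch.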

Bidirectional Gradual Learning Algorithm (BiGLA)

Jäger (2003): ‘The bidirectional gradual learning algorithm’

• Speaker-based learning: Input: Meaning, Output: Form. ⟨i, o⟩ = ⟨M, F⟩. Speaker compares different forms.

• Hearer-based learning: Input: Form, Output: Likely meaning. ⟨i, o⟩ = ⟨F, M⟩. Hearer compares different meanings.

Hearer also uses speaker-based reasoning: On hearing F with likely meaning M, the hearer checks as a speaker: Would I have used a different F’ to express M? If yes: Adjust rankings to increase the likelihood of using F to express M.

[Diagram: observed form, observed likely meaning, hypothesized form, hypothesized meaning.]

Modelling Pragmatics

The Bidirectional Gradual Learning Algorithm (BiGLA) can be tested experimentally.

Implementation: evolOT, downloadable with files at: http://uni-potsdam.de/~jaeger/nasslli03

Example: Differential Object Marking triggered by definiteness (e.g., Hebrew); input: Statistical distributions of SAMTAL corpus.

Development of Differential Object Marking

[Figure: ranking differences between constraints (vertical axis) over generations (horizontal axis); curves for ‘mark definite objects!’, *STRUC and ‘mark indefinite objects!’.]

Starting state: constraints start out equally ranked.

After 1000 generations, the ranking of constraints is firmly established, including the previously observed m:Obj/+def >> *STRUC >> m:Obj/–def.

Development of Split Ergativity (Animacy)

[Figure: development of constraint rankings; curves for ‘mark animate objects!’, ‘mark inanimate subjects!’, ‘don’t mark animate subjects!’, ‘don’t mark inanimate objects!’ and *STRUC.]

Start out with high value of FAITH: Every NP is case marked.

Lower value of FAITH: Fewer NPs are case marked.

Development of Split Ergativity: Initial State doesn’t matter

[Figure panels, each with curves labeled ‘mark animate objects!’ and *STRUC.]

Learning under the Microscope: Speaker Mode

Assume the current constraint ranking includes the following relative ranking, where m:S/+a: ‘mark animate subjects’ and *STRUC: ‘avoid marking’.

[Diagram: relative ranking of m:S/+a and *STRUC.]

Incoming datum: Subj.anim-Ø (non-marked animate subject)

In speaker mode, the algorithm produces one of the forms:
a. Subj.anim-Ø (= learning datum, nothing happens)
b. Subj.anim-ERG (satisfying FAITH)

Comparison with learning datum:
b. *STRUC favors the datum and is promoted, m:S/+a disfavors the datum and is demoted. Ultimately, *STRUC will rank higher than m:S/+a, suppressing marking of animate subjects.

In general: If a form is produced that differs from the datum, and the datum is
– a non-marked NP: promotion of *STRUC and/or demotion of marking constraint (see example)
– a case-marked NP: demotion of *STRUC, promotion of FAITH if case marking is different.
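The adjustment in case b above can be traced numerically. The sketch below is a worked example under assumptions: the starting values 1.0 and 0.0, the plasticity 0.1 and the function name `adjust` are hypothetical, and only the two constraints discussed on the slide are included.

```python
def adjust(values, datum_violations, generated_violations, plasticity=0.1):
    """One adjustment step after a mismatch between datum and generated form."""
    updated = dict(values)
    for constraint in values:
        in_datum = datum_violations.get(constraint, 0)
        in_generated = generated_violations.get(constraint, 0)
        if in_generated > in_datum:
            updated[constraint] += plasticity  # favors the datum: promote
        elif in_datum > in_generated:
            updated[constraint] -= plasticity  # disfavors the datum: demote
    return updated


# Hypothetical starting point: m:S/+a is still ranked above *STRUC.
values = {"m:S/+a": 1.0, "*STRUC": 0.0}

datum = {"m:S/+a": 1}      # Subj.anim-Ø: animate subject left unmarked
generated = {"*STRUC": 1}  # Subj.anim-ERG: overt case marking

after = adjust(values, datum, generated)
print(after)
```

Each such mismatch moves the two constraints towards each other by twice the plasticity, so repeated data of this kind eventually place *STRUC above m:S/+a.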

Learning under the Microscope: Hearer Mode

Assume the current constraint ranking includes the following relative ranking, where m:S/+a: ‘mark animate subjects’ and m:O/+a: ‘mark animate objects’.

[Diagram: relative ranking of m:S/+a and m:O/+a.]

Incoming datum: Subj.anim-Ø (non-marked animate NP interpreted as subject)

In hearer mode, the algorithm produces one of the interpretations (as subject or object):
a. Subj.anim-Ø (= learning datum, nothing happens)
b. Obj.anim-Ø

Comparison with learning datum:
b. m:S/+a favors datum and is promoted, m:O/+a disfavors datum and is demoted.

In general: If a meaning is produced that differs from the datum and the NP is
– a case-marked NP: promotion of FAITH
– a non-marked NP: promotion and/or demotion of marking constraints (see example)
