Empirical Study of the Form and Function of Linguistic Elements July 2006 Jerry T. Ball Senior Research Psychologist Air Force Research Laboratory Mesa,

Empirical Study of the Form and Function of Linguistic Elements

July 2006

Jerry T. Ball

Senior Research Psychologist

Air Force Research Laboratory

Mesa, AZ

2

Basic Assumption

• Humans have explicit knowledge of the linguistic representations they construct during language comprehension

– They know that in “the man bit the dog”

• The expression “the man” refers to a man

• The expression “the dog” refers to a dog

• The sentence describes an action of biting involving a man and a dog, and the man did the biting!

• It’s OK to ask them about their explicit knowledge

3

Why Isn’t This Assumption Universally Shared?

• It may be regarded as conflicting with the assumption that there is an autonomous syntax module where syntactic representations are created that are impervious to higher level cognition and therefore implicit

– If semantic representations are composed of abstract concepts which are non-linguistic and non-perceptual, and syntactic representations are implicit, then there are no explicit linguistic representations to reflect on

• Linguistic Competence and Performance are different things

– Competence is implicit

4

Why Isn’t This Assumption Universally Shared?

• Normal humans are not good at explicitly verbalizing their linguistic knowledge

• Procedural knowledge is implicit

– If knowledge of language is largely procedural knowledge as may be required for parsing, then that knowledge is implicit

• Children learn language implicitly, therefore their knowledge of language must be implicit

• The explicit knowledge of language we learned in grammar school is not the same knowledge we use to understand language

5

Counter Arguments

• There is no autonomous syntax module

– Much recent psycholinguistic evidence argues against the existence of an autonomous syntax module

• Visual World Paradigm

– Tanenhaus et al. (2004); Henderson & Ferreira (2004)

– There are explicit, perceptually-based linguistic representations and there are explicit, perceptually-based non-linguistic representations of the objects and situations the linguistic representations refer to

• Linguistic Competence and Performance are closely related

– We are not syntactic processing automata disguised by performance limitations

• Ferreira (2003) vs. Townsend & Bever (2001)

6

Counter Arguments

• Normal humans are not good at explicitly verbalizing their linguistic knowledge in ways that are consistent with prevailing linguistic theory

– Maybe the prevailing theory is wrong!

– Complicated linguistic representations exceed short-term working memory capacity. To the extent that they exist at all, they cannot be mentally manipulated all at once

• The process of constructing a linguistic representation may be in part implicit, but the resulting representation is declarative and explicit

7

Counter Arguments

• The process of learning a language may be in part implicit, but the words, multiword units, and constructions that are learned are declarative and explicit

• Much of the knowledge of language we learned in school is used in understanding language and is explicitly available

– We are capable of using explicit grammar rules, although it is relatively inefficient to do so

• Second language learners who rely on grammar rules cannot keep up with spoken input

– Declarative lookup is more rapid and efficient

• Language comprehension is largely driven by identification of the largest chunks of meaning—multiword expressions and constructions—not by composing the meanings of individual morphemes or words, which would be too slow and ambiguous

8

Abstract Concepts vs. Perceptually Grounded Language

[Figure: two panels, “The Prevailing View” and “An Emerging View”. Each links the Real World to a Mental Box through perception. Labels shown: the word “pilot”, the concept PILOT, Language of Thought, grounding, Implicit (Abstract), Explicit (Perceptual).]


10

Language is Grounded in a Situation Model

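[Figure: the sentence “The book is on the table” analyzed as a Clause with subj (Nominal), head (Pred) and obj (Nominal). The units of the representation are labeled “Words!”, and four “refers” arrows link them to the situation model.]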

11

Perceptually Grounded Language

• Harnad’s Symbol Grounding Problem

• Barsalou’s Perceptual Symbol Systems

• Lakoff & Johnson – Metaphors We Live By – abstract concepts are understood via metaphorical extension of more concrete concepts

– Good is Up; Bad is Down

– Life is a Journey

• Kintsch and Van Dijk – Situation Model

• Graesser, Zwaan and others: Situation Model has visuo-spatial characteristics (not just abstract propositions)

• There must be a perceptual chain to support the understanding of abstract concepts

12

Linguistic Representations

• Linguistic representations contain words (not abstract concepts) grouped into meaningful units

• These meaningful units are grounded in non-linguistic representations of the objects and situations to which they refer

• Linguistic representations encode multiple dimensions of meaning simultaneously

– Variation in form reflects this trade-off in encoding

• Purely syntactic representations fail to capture important generalizations

– “She is smiling, happy and in a good mood”

• conjunction of clausal heads, not verb, adjective and PP

13

Linguistic Representations

• Assume a strong form of the Grammatical Constraint:

– “One should prefer a semantic theory that explains otherwise arbitrary generalizations about the syntax and the lexicon…a theory’s deviations from efficient encoding must be vigorously justified, for what appears to be an irregular relationship between syntax and semantics may turn out merely to be a bad theory of one or the other” (Jackendoff, 1983)

• Syntactic and Linguistic Semantic Representations are not distinct!

• Meaning is captured in the relationship between linguistic (form and function) and non-linguistic representations (objects and situations)

14

Some Competing Representations

• There are multiple possible representations for even simple sentences

– Subject-Predicate (Traditional Grammar)

– SVO (Functional Grammar)

– Mood-Residue (Halliday’s Functional Grammar)

• Mood = subject + first auxiliary

• Residue = everything else

– S → NP VP (Early Generative Grammar)

– X-Bar Theory: [XP YP [X’ X ZP(s)]]

• YP = specifier

• X = head

• ZP(s) = complement(s)

15

Some Competing Representations

• What is the empirical evidence for these possibilities?

• Is one representation always preferred?

• Is there evidence for multiple representations?

– Across subjects; Within subjects

– Across sentences; Within sentences

• Halliday suggests three different tiers of representation for clauses

– Clause as message: Theme and Rheme

– Clause as exchange: Mood and Residue

– Clause as representation: Subject Predicator Complement

– These layers break the clause apart in different ways that can be empirically tested

• How do these representations interact with discourse contrasts like Topic/Focus and Given/New?

16

Levelt’s Early Study (1970)

• Subjects show a high degree of reliability in making judgments about the cohesiveness of words in sentences

• Subjects were asked to rate the similarity of words in sentences

– Word triads from sentences

– + (plus) for most highly related pair in triad

– - (minus) for least highly related pair in triad

– Seven point rating scale of word pairs

• A relatedness matrix was generated from data and subjected to hierarchical cluster analysis

• Resulting trees reflect hierarchical knowledge of language
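A minimal sketch of the analysis step described above, assuming average linkage and made-up relatedness scores on a 7-point scale. The slides label Levelt's figures “Connectedness method” without defining it, so the linkage rule, the function name `cluster` and every number below are illustrative assumptions, not study data.

```python
from itertools import combinations


def cluster(words, relatedness):
    """Average-linkage agglomerative clustering on pairwise relatedness.

    `relatedness` maps frozenset({w1, w2}) to a score (higher = more related).
    Returns a nested tuple; each merge joins the two most related clusters.
    """
    # Each cluster is a pair: (tree built so far, words it contains).
    clusters = [(w, (w,)) for w in words]

    def score(a, b):
        # Mean relatedness over all cross-cluster word pairs.
        pairs = [relatedness[frozenset((x, y))] for x in a[1] for y in b[1]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: score(*p))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Hypothetical relatedness matrix (upper triangle), NOT Levelt's data.
words = ["the", "boy", "has", "lost", "a", "dollar"]
upper = [
    [6.5, 2.0, 1.5, 1.0, 1.0],  # the vs boy, has, lost, a, dollar
    [4.0, 3.5, 1.0, 2.0],       # boy vs has, lost, a, dollar
    [4.5, 1.5, 2.0],            # has vs lost, a, dollar
    [3.0, 4.0],                 # lost vs a, dollar
    [6.0],                      # a vs dollar
]
relatedness = {
    frozenset((words[i], words[i + 1 + j])): value
    for i, row in enumerate(upper)
    for j, value in enumerate(row)
}

tree = cluster(words, relatedness)
print(tree)
```

Each merge joins exactly two clusters, so the resulting tree is always binary, whatever the subjects' judgments were.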

17

Levelt’s Early Study

[Figure: hierarchical clustering tree (connectedness method) for “the boy has lost a dollar”. Vertical axis: relative frequency of “more related” in triadic comparison (%), with ticks at 60, 80 and 100.]

18


• Results consistent with basic subject-predicate and NP-VP structure of sentence

• Results inconsistent with the prevailing assumption that “lost” combines with “a dollar” forming an untensed VP before “lost a dollar” combines with “has” forming a tensed VP

• Results (at least partially) inconsistent with basic SVO structure of sentence

19


[Figure: hierarchical clustering tree (connectedness method) for “Jan eet appels en Piet eet peren” (John eats apples and Peter eats pears). Vertical axis: mean scale value on the 7-point rating scale, with ticks at 3.0, 5.0 and 7.0.]

20


• In the sentence “Jan eet appels en Piet eet peren” (John eats apples and Peter eats pears), Levelt found that “Jan” and “Piet” combined with the verb “eet” before the objects

– Inconsistent with any prevailing linguistic theory

• Levelt claimed that results reflected deep structure not surface structure

– I interpret this to mean meaningful units, not purely syntactic units

21

Reasons for Reviving and Modifying Levelt’s Early Study

• Despite these interesting results, this line of empirical research was not pursued beyond about 1972

– Results did not match well to prevailing linguistic theory

– Some results were counterintuitive, perhaps reflecting weaknesses in the methods of analysis or interaction with extraneous factors

• Hierarchical clustering tends to impose a binary structure which may not reflect human behavior

– Hierarchical clustering provides an air of objectivity, but at the cost of imposing a fixed structure

– Is it really needed?

– Why not directly ask subjects to create meaningful groups?

• There are lots of sentences and linguistic expressions waiting to be empirically examined!

22

Three Part Empirical Study

• Part 1: Subjects asked to group words into meaningful units within the context of a linguistic expression

– Goal is to identify meaningful units

• Part 2: Subjects asked to identify the word (or words) that contributes most to the meaning of the expression, then the word (or words) that contributes second most, third most, etc.

– Primary goal is to identify the head

• Part 3: Subjects asked to identify the part of speech of a word within the context of a linguistic expression

– Goal is to see if head of NP is necessarily a Noun, head of VP is necessarily a Verb, etc.

23

Part 1: Preliminary Results

• Use sentences from Levelt’s study

• Compare results

• 19 subjects
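One way to picture the Part 1 tally is the sketch below. It assumes each subject's response is recorded as a list of word groups; the function name `tally_groups` and the three responses are hypothetical and are not data from the study.

```python
from collections import Counter


def tally_groups(responses):
    """Count how many subjects selected each multiword group.

    `responses` holds one collection of groups per subject; each group is a
    tuple of words. A group counts once per subject even if listed twice,
    and single-word groups are ignored.
    """
    counts = Counter()
    for groups in responses:
        counts.update({tuple(g) for g in groups if len(g) > 1})
    return counts


# Hypothetical responses from three subjects (illustrative only).
responses = [
    [("the", "boy"), ("lost", "a", "dollar")],
    [("the", "boy"), ("a", "dollar"), ("the", "boy", "has", "lost", "a", "dollar")],
    [("the", "boy"), ("has", "lost"), ("a", "dollar")],
]

counts = tally_groups(responses)
for group, n in counts.most_common():
    print(" ".join(group), n)
```

Storing groups as tuples of words rather than character spans lets discontinuous selections such as “boy dollar” be counted in the same way as contiguous ones.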

24

The boy has lost a dollar (n = 18)

Number of subjects selecting each group:

“the boy has lost a dollar” (12); “the boy” (11); “a dollar” (8); “lost a dollar” (7); “has lost” (4)

“boy has lost” (3); “boy dollar” (3); “lost dollar” (3); “the boy lost a dollar” (2); “boy lost dollar” (2); “the boy has lost” (2)

25


[Figure: eleven further groups, each selected by 1 subject, including “has lost a dollar”, “boy has a dollar”, “boy lost a dollar”, “the dollar”, “has a dollar”, “boy has lost a dollar”, “the boy has a dollar” and “boy a”]

26


• Subjects are quite consistent at identifying the subject (11) and the clause (12) as meaningful groups

• The predication or untensed VP “lost a dollar” (7) may be treated as a multiword unit causing subjects not to see the object NP “a dollar” (8) which is less frequently identified than the subject (11)

• The predicator or verb group “has lost” is not consistently identified (4)

• The predicate or tensed VP “has lost a dollar” is only identified by 1 subject!

27

Ball vs. Levelt Study

[Figure: groupings of “the boy has lost a dollar” in the Levelt study compared with the Ball study]

28

John eats apples and Peter eats pears (n = 19)

Number of subjects selecting each group:

“John eats apples” (16); “Peter eats pears” (17); “John eats apples and Peter eats pears” (16); “John eats” (6); “Peter eats” (6); “eats apples” (5); “eats pears” (5)

“John eats apples Peter eats pears” (4); “apples pears” (2); “John eats pears” (2)

29


Groups selected by 1 subject each: “John and Peter”; “apples and pears”; “apples Peter eats”; “John eats apples and pears”; “John eats apples pears”; “eats apples Peter”; “John Peter”

30


• Subjects are very consistent at identifying clauses (16, 17, 16)

• There is some evidence (6, 6) that subjects group the subject with the verb as in the Levelt study

– “eats” can be used intransitively

– “John eats” is a normal clause

• There is also some evidence (5, 5) that subjects group the verb with the object

31

Ball vs. Levelt Study

[Figure: groupings of “John eats apples and Peter eats pears” in the Ball study compared with “Jan eet appels en Piet eet peren” in the Levelt study]

32

The nurse likes the patient (n = 19)

Number of subjects selecting each group:

“the nurse likes the patient” (13); “the patient” (13); “the nurse” (11); “nurse patient” (7); “nurse likes the patient” (5); “nurse likes” (5)

“likes the patient” (4); “the nurse likes” (4); “likes patient” (2); “nurse likes patient” (2)

33


Groups selected by 1 subject each: “nurse the patient”; “the nurse likes patient”; “the nurse the patient”

34


• Subjects are very consistent at identifying the clause (13) and object (13) and quite consistent at identifying the subject (11)

• The identification of “nurse patient” (7) probably reflects their close semantic association

• The subject + verb group “the nurse likes” and the verb + object group “likes the patient” are equally frequent (4, 4), but neither is consistently identified

• The group “nurse likes the patient” (5) follows the determiner “the” suggesting a pattern like [ the X ] where X can be the rest of the sentence (not just the head of an NP). The somewhat frequent occurrence of the group “nurse likes” (5) is not expected

35

Main Meaningful Groups

the nurse likes the patient

36

The book was on the table (n = 19)

Number of subjects selecting each group:

“the book was on the table” (13); “the table” (10); “the book” (10); “on the table” (5); “book was” (5); “was on the table” (4)

“book table” (3); “was on” (2); “book was on” (2); “book on table” (2); “book was on the table” (2); “on table” (2)

37


Groups selected by 1 subject each: “book was on table”; “the book the table”; “on the”; “book on the table”; “the book was”; “the book was on”

38


• Subjects are very consistent at identifying the clause (13) and quite consistent at identifying the subject (10) and object (10)

• Subjects do not consistently identify any other groups!

– The PP “on the table” was only identified by 5 subjects!

– The group “book was” was identified by 5 subjects

– The predicate “was on the table” was only identified by 4 subjects

– The predicator “was on” was only identified by 2 subjects

39


the book was on the table

40

The dog is not very hungry (n = 19)

Number of subjects selecting each group:

“the dog is not very hungry” (19); “the dog” (9); “very hungry” (8); “not very hungry” (7); “not very” (3); “is not” (3)

“dog not hungry” (3); “dog very hungry” (3); “dog hungry” (2); “is hungry” (2); “not hungry” (2); “dog is not very hungry” (2)

41


Groups selected by 1 subject each: “the dog is hungry”; “dog not”; “dog is not hungry”; “dog is not”; “dog is”; “dog not very hungry”; “the dog is not”

42


• Subjects are extremely consistent at identifying the clause (19) and somewhat consistent at identifying the subject (9), the modified predicate adjective “very hungry” (8), and the negated modified predicate adjective “not very hungry” (7)

• Subjects also occasionally group “not” with “is” (“is not”, 3) or with “very” (“not very”, 3), and form other groups

– This may reflect the status of “not” as a clausal modifier

43


[Figure: “the dog is very hungry”, with “not” set apart from the other words]

44

Group Selection Percentage (more than 1 word)

Subject: 55% (41 of 75)

Object: 55% (31 of 56)

Clause: 80% (106 of 132)

Predication (untensed VP): 34% (19 of 56)

Predicator (verb group): 16% (9 of 56)

Predicate (tensed VP): 17% (19 of 113)
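The percentages above can be checked against the counts quoted on the result slides. The sketch below assumes that each category pools its multiword groups across the sentences that contain one, which is an inferred reading rather than something the slides state; the variable and function names are illustrative.

```python
# Per-sentence counts quoted on the result slides, as (times selected, n).
# Which counts feed each category is an inferred reading, not stated in the slides.
per_sentence = {
    "Subject": [(11, 18), (11, 19), (10, 19), (9, 19)],
    "Object": [(8, 18), (13, 19), (10, 19)],
    "Clause": [(12, 18), (16, 19), (17, 19), (16, 19), (13, 19), (13, 19), (19, 19)],
}


def selection_percentage(pairs):
    """Return (selected, possible, rounded percentage) for one category."""
    selected = sum(k for k, _ in pairs)
    possible = sum(n for _, n in pairs)
    return selected, possible, round(100 * selected / possible)


summary = {name: selection_percentage(pairs) for name, pairs in per_sentence.items()}
for name, (selected, possible, pct) in summary.items():
    print(f"{name}: {pct}% ({selected} of {possible})")
```

Under this reading the single-word subjects “John” and “Peter” are excluded, which is consistent with the slide's restriction to groups of more than 1 word.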

45

Prototypical Meaningful Groups

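[Figure: a transitive clause with the subject and the object marked as groups within the clause]

The predicator (verb group) of a transitive clause does not typically form a meaningful group by itself!

[Figure: a clause with a subject and a predicator, the predicator divided into aux and predication]

The predication (untensed VP) tends to be grouped independently from the auxiliary verb!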

46

Questions?
