Empirical Study of the Form and Function of Linguistic Elements
July 2006
Jerry T. Ball
Senior Research Psychologist
Air Force Research Laboratory
Mesa, AZ
Mar 28, 2015
2
Basic Assumption
• Humans have explicit knowledge of the linguistic representations they construct during language comprehension
– They know that in “the man bit the dog”
• The expression “the man” refers to a man
• The expression “the dog” refers to a dog
• The sentence describes an action of biting involving a man and a dog, and the man did the biting!
• It’s OK to ask them about their explicit knowledge
3
Why Isn’t This Assumption Universally Shared?
• It may be regarded as conflicting with the assumption that there is an autonomous syntax module whose syntactic representations are impervious to higher-level cognition and therefore implicit
– If semantic representations are composed of abstract concepts which are non-linguistic and non-perceptual, and syntactic representations are implicit, then there are no explicit linguistic representations to reflect on
• Linguistic Competence and Performance are different things
– Competence is implicit
4
Why Isn’t This Assumption Universally Shared?
• Normal humans are not good at explicitly verbalizing their linguistic knowledge
• Procedural knowledge is implicit
– If knowledge of language is largely procedural, as parsing may require, then that knowledge is implicit
• Children learn language implicitly; therefore, their knowledge of language must be implicit
• The explicit knowledge of language we learned in grammar school is not the same knowledge we use to understand language
5
Counter Arguments
• There is no autonomous syntax module
– Much recent psycholinguistic evidence argues against the existence of an autonomous syntax module
• Visual World Paradigm
– Tanenhaus et al. (2004); Henderson & Ferreira (2004)
– There are explicit, perceptually-based linguistic representations and there are explicit, perceptually-based non-linguistic representations of the objects and situations the linguistic representations refer to
• Linguistic Competence and Performance are closely related
– We are not syntactic processing automata disguised by performance limitations
• Ferreira (2003) vs. Townsend & Bever (2001)
6
Counter Arguments
• Normal humans are not good at explicitly verbalizing their linguistic knowledge in ways that are consistent with prevailing linguistic theory
– Maybe the prevailing theory is wrong!
– Complicated linguistic representations exceed short-term working memory capacity. To the extent that they exist at all, they cannot be mentally manipulated all at once
• The process of constructing a linguistic representation may be in part implicit, but the resulting representation is declarative and explicit
7
Counter Arguments
• The process of learning a language may be in part implicit, but the words, multiword units, and constructions that are learned are declarative and explicit
• Much of the knowledge of language we learned in school is used in understanding language and is explicitly available
– We are capable of using explicit grammar rules, although it is relatively inefficient to do so
• Second language learners who rely on grammar rules cannot keep up with spoken input
– Declarative lookup is more rapid and efficient
• Language comprehension is largely driven by identification of the largest chunks of meaning—multiword expressions and constructions—not by composing the meanings of individual morphemes or words, which would be too slow and ambiguous
8
Abstract Concepts vs. Perceptually Grounded Language
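[Figure: two panels, “The Prevailing View” and “An Emerging View.” Each panel shows the word “pilot” in the Real World entering a Mental Box via perception. Labels in the prevailing view: PILOT, Language of Thought, Implicit (Abstract). Labels in the emerging view: “pilot”, grounding, Explicit (Perceptual).]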
10
Language is Grounded in a Situation Model
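[Figure: diagram of the sentence “The book is on the table.” Node labels: Clause, Nominal (subj), Pred (head), Nominal (obj). The nodes are built from words (“Words!”), and four “refers” links connect the linguistic representation to the situation it describes.]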
11
Perceptually Grounded Language
• Harnad’s Symbol Grounding Problem
• Barsalou’s Perceptual Symbol Systems
• Lakoff & Johnson – Metaphors We Live By – abstract concepts are understood via metaphorical extension of more concrete concepts
– Good is Up; Bad is Down
– Life is a Journey
• Kintsch and Van Dijk – Situation Model
• Graesser, Zwaan and others: Situation Model has visuo-spatial characteristics (not just abstract propositions)
• There must be a perceptual chain to support the understanding of abstract concepts
12
Linguistic Representations
• Linguistic representations contain words (not abstract concepts) grouped into meaningful units
• These meaningful units are grounded in non-linguistic representations of the objects and situations to which they refer
• Linguistic representations encode multiple dimensions of meaning simultaneously
– Variation in form reflects this trade-off in encoding
• Purely syntactic representations fail to capture important generalizations
– “She is smiling, happy and in a good mood”
• conjunction of clausal heads, not verb, adjective and PP
13
Linguistic Representations
• Assume a strong form of the Grammatical Constraint:
– “One should prefer a semantic theory that explains otherwise arbitrary generalizations about the syntax and the lexicon…a theory’s deviations from efficient encoding must be vigorously justified, for what appears to be an irregular relationship between syntax and semantics may turn out merely to be a bad theory of one or the other” (Jackendoff, 1983)
• Syntactic and Linguistic Semantic Representations are not distinct!
• Meaning is captured in the relationship between linguistic (form and function) and non-linguistic representations (objects and situations)
14
Some Competing Representations
• There are multiple possible representations for even simple sentences
– Subject-Predicate (Traditional Grammar)
– SVO (Functional Grammar)
– Mood-Residue (Halliday’s Functional Grammar)
• Mood = subject + first auxiliary
• Residue = everything else
– S → NP VP (Early Generative Grammar)
– X-Bar Theory
• YP = specifier
• X = head
• ZP(s) = complement(s)
• Tree: XP → YP X’; X’ → X ZP(s)
15
Some Competing Representations
• What is the empirical evidence for these possibilities?
• Is one representation always preferred?
• Is there evidence for multiple representations?
– Across subjects; Within subjects
– Across sentences; Within sentences
• Halliday suggests three different tiers of representation for clauses
– Clause as message: Theme and Rheme
– Clause as exchange: Mood and Residue
– Clause as representation: Subject Predicator Complement
– These layers break the clause apart in different ways that can be empirically tested
• How do these representations interact with discourse contrasts like Topic/Focus and Given/New?
16
Levelt’s Early Study (1970)
• Subjects show a high degree of reliability in making judgments about the cohesiveness of words in sentences
• Subjects were asked to rate the similarity of words in sentences
– Word triads from sentences
– + (plus) for most highly related pair in triad
– - (minus) for least highly related pair in triad
– Seven point rating scale of word pairs
• A relatedness matrix was generated from data and subjected to hierarchical cluster analysis
• Resulting trees reflect hierarchical knowledge of language
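The pipeline described above (a relatedness matrix fed to hierarchical cluster analysis) can be sketched in a few lines. This is only an illustration under stated assumptions: the relatedness scores are invented and are not Levelt's data, and single-linkage merging is assumed here as a stand-in for the “connectedness” method named on the slides. The names `SCORES`, `single_linkage`, `WORDS`, and `MERGES` are hypothetical.

```python
from itertools import combinations

# Hypothetical relatedness scores (0 to 1) for word pairs. Invented for
# illustration; these are NOT Levelt's data. Unlisted pairs default to 0.
SCORES = {
    frozenset(("the", "boy")): 0.95,
    frozenset(("a", "dollar")): 0.90,
    frozenset(("has", "lost")): 0.85,
    frozenset(("lost", "dollar")): 0.70,
    frozenset(("boy", "lost")): 0.60,
    frozenset(("boy", "has")): 0.55,
}

def single_linkage(words, scores):
    """Return the merge history [(cluster, link_strength), ...].

    At each step, merge the two clusters whose most related cross-cluster
    word pair has the highest score (single linkage on similarities).
    Assumes every word in `words` is distinct.
    """
    clusters = [(w,) for w in words]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            link = max(scores.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
            if best is None or link > best[0]:
                best = (link, c1, c2)
        link, c1, c2 = best
        merged = tuple(sorted(c1 + c2, key=words.index))  # keep sentence order
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        merges.append((merged, link))
    return merges

WORDS = ["the", "boy", "has", "lost", "a", "dollar"]
MERGES = single_linkage(WORDS, SCORES)
for cluster, link in MERGES:
    print(f"{link:.2f}  {' '.join(cluster)}")
```

Every merge joins exactly two clusters, so the resulting tree is binary regardless of how subjects actually perceive the sentence. With these invented scores, “has lost” forms a cluster before it joins “a dollar”, but that outcome is a property of the made-up numbers, not a finding.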
17
Levelt’s Early Study
[Figure: hierarchical clustering tree (connectedness method) for “the boy has lost a dollar.” Vertical axis: relative frequency of “more related” in triadic comparison (%), with ticks at 60, 80 and 100.]
18
Levelt’s Early Study
• Results consistent with basic subject-predicate and NP-VP structure of sentence
• Results inconsistent with the prevailing assumption that “lost” combines with “a dollar” forming an untensed VP before “lost a dollar” combines with “has” forming a tensed VP
• Results (at least partially) inconsistent with basic SVO structure of sentence
19
Levelt’s Early Study
[Figure: hierarchical clustering tree (connectedness method) for “Jan eet appels en Piet eet peren” (John eats apples and Peter eats pears). Vertical axis: mean scale value (7-point rating scale), with ticks at 3.0, 5.0 and 7.0.]
20
Levelt’s Early Study
• In the sentence “Jan eet appels en Piet eet peren” (John eats apples and Peter eats pears), Levelt found that “Jan” and “Piet” each grouped with the verb “eet” before the objects were attached
– Inconsistent with any prevailing linguistic theory
• Levelt claimed that results reflected deep structure not surface structure
– I interpret this to mean meaningful units, not purely syntactic units
21
Reasons for Reviving and Modifying Levelt’s Early Study
• Despite these interesting results, this line of empirical research was not pursued beyond about 1972
– Results did not match well to prevailing linguistic theory
– Some results were counterintuitive, perhaps reflecting weaknesses in the methods of analysis or interaction with extraneous factors
• Hierarchical clustering tends to impose a binary structure which may not reflect human behavior
– Hierarchical clustering provides an air of objectivity, but at the cost of imposing a fixed structure
– Is it really needed?
– Why not directly ask subjects to create meaningful groups?
• There are lots of sentences and linguistic expressions waiting to be empirically examined!
22
Three Part Empirical Study
• Part 1: Subjects asked to group words into meaningful units within the context of a linguistic expression
– Goal is to identify meaningful units
• Part 2: Subjects asked to identify the word (or words) which contributes most to the meaning of the expression, word (or words) which contributes second most, third most, etc.
– Primary goal is to identify the head
• Part 3: Subjects asked to identify the part of speech of a word within the context of a linguistic expression
– Goal is to see if head of NP is necessarily a Noun, head of VP is necessarily a Verb, etc.
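For Part 1, the basic analysis is a tally of how many subjects produced each word group. A minimal sketch follows, assuming each subject's response is recorded as a list of word tuples; that response format, the sample responses, and the names `tally_groups`, `responses`, and `counts` are all hypothetical and not taken from the study.

```python
from collections import Counter

def tally_groups(responses):
    """Count how many subjects produced each word group.

    `responses` holds one collection of groups per subject; each group is a
    tuple of words. A group counts once per subject even if repeated.
    """
    counts = Counter()
    for groups in responses:
        counts.update(set(groups))
    return counts

# Hypothetical responses from three subjects (invented for illustration).
responses = [
    [("the", "boy"), ("lost", "a", "dollar")],
    [("the", "boy"), ("a", "dollar"), ("has", "lost")],
    [("the", "boy"), ("a", "dollar")],
]

counts = tally_groups(responses)
for group, n in counts.most_common():
    print(" ".join(group), n)
```

Unlike hierarchical clustering, a direct tally imposes no tree structure: overlapping groups and non-contiguous groups are counted exactly as subjects gave them.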
23
Part 1: Preliminary Results
• Use sentences from Levelt’s study
• Compare results
• 19 subjects
24
The boy has lost a dollar (n = 18)
Groups identified (number of subjects):
• the boy has lost a dollar: 12
• the boy: 11
• a dollar: 8
• lost a dollar: 7
• has lost: 4
• boy has lost: 3
• boy dollar: 3
• lost dollar: 3
• the boy lost a dollar: 2
• boy lost dollar: 2
• the boy has lost: 2
25
The boy has lost a dollar (n = 18)
has lost a dollar
boy has boy has lost
dollar
boy has a dollar
boy lost a dollar
boy has a dollar
the dollar
1 1 1 1 1 1 1
has a dollar
boy has lost a dollar
the boy has a dollar
boy a
1 1 1 1
26
The boy has lost a dollar (n = 18)
• Subjects are quite consistent at identifying the subject (11) and the clause (12) as meaningful groups
• The predication or untensed VP “lost a dollar” (7) may be treated as a multiword unit, causing subjects not to see the object NP “a dollar” (8), which is identified less frequently than the subject (11)
• The predicator or verb group “has lost” is not consistently identified (4)
• The predicate or tensed VP “has lost a dollar” is only identified by 1 subject!
27
Ball vs. Levelt Study
[Figure: groupings of “the boy has lost a dollar” from the Levelt study and from the Ball study, shown together for comparison.]
28
John eats apples and Peter eats pears (n = 19)
Groups identified (number of subjects):
• John eats apples: 16
• Peter eats pears: 17
• John eats apples and Peter eats pears: 16
• John eats: 6
• Peter eats: 6
• eats apples: 5
• eats pears: 5
• John eats apples Peter eats pears: 4
• apples pears: 2
• John eats pears: 2
29
John eats apples and Peter eats pears (n = 19)
Groups identified by one subject each:
• John and Peter
• apples and pears
• apples Peter eats
• John eats apples and pears
• John eats apples pears
• eats apples Peter
• John Peter
30
John eats apples and Peter eats pears (n = 19)
• Subjects are very consistent at identifying clauses (16, 17, 16)
• There is some evidence (6, 6) that subjects group the subject with the verb as in the Levelt study
– “eats” can be used intransitively
– “John eats” is a normal clause
• There is also some evidence (5, 5) that subjects group the verb with the object
31
Ball vs. Levelt Study
[Figure: groupings compared across the two studies. Levelt: “Jan eet appels en Piet eet peren.” Ball: “John eats apples and Peter eats pears.”]
32
The nurse likes the patient (n = 19)
Groups identified (number of subjects):
• the nurse likes the patient: 13
• the patient: 13
• the nurse: 11
• nurse patient: 7
• nurse likes the patient: 5
• nurse likes: 5
• likes the patient: 4
• the nurse likes: 4
• likes patient: 2
• nurse likes patient: 2
33
The nurse likes the patient (n = 19)
Groups identified by one subject each:
• nurse the patient
• the nurse likes patient
• the nurse the patient
34
The nurse likes the patient (n = 19)
• Subjects are very consistent at identifying the clause (13) and object (13) and quite consistent at identifying the subject (11)
• The identification of “nurse patient” (7) probably reflects their close semantic association
• The subject + verb group “the nurse likes” and the verb + object group “likes the patient” occur equally often (4, 4) but not consistently
• The group “nurse likes the patient” (5) follows the determiner “the” suggesting a pattern like [ the X ] where X can be the rest of the sentence (not just the head of an NP). The somewhat frequent occurrence of the group “nurse likes” (5) is not expected
35
Main Meaningful Groups
the nurse likes the patient
36
The book was on the table (n = 19)
Groups identified (number of subjects):
• the book was on the table: 13
• the table: 10
• the book: 10
• on the table: 5
• book was: 5
• was on the table: 4
• book table: 3
• was on: 2
• book was on: 2
• book on table: 2
• book was on the table: 2
• on table: 2
37
The book was on the table (n = 19)
book was on table
the book
the table
on the book on the table
the book was
the book
was on
1 1 1 1 1 1
38
The book was on the table (n = 19)
• Subjects are very consistent at identifying the clause (13) and quite consistent at identifying the subject (10) and object (10)
• Subjects do not consistently identify any other groups!
– The PP “on the table” was only identified by 5 subjects!
– The group “book was” was identified by 5 subjects
– The predicate “was on the table” was only identified by 4 subjects
– The predicator “was on” was only identified by 2 subjects
39
Main Meaningful Groups
the book was on the table
40
The dog is not very hungry (n = 19)
Groups identified (number of subjects):
• the dog is not very hungry: 19
• the dog: 9
• very hungry: 8
• not very hungry: 7
• not very: 3
• is not: 3
• dog not hungry: 3
• dog very hungry: 3
• dog hungry: 2
• is hungry: 2
• not hungry: 2
• dog is not very hungry: 2
41
The dog is not very hungry (n = 19)
Groups identified by one subject each:
• the dog is hungry
• dog not
• dog is not hungry
• dog is not
• dog is
• dog not very hungry
• the dog is not
42
The dog is not very hungry (n = 19)
• Subjects are extremely consistent at identifying the clause (19) and somewhat consistent at identifying the subject (9), the modified predicate adjective “very hungry” (8), and the negated modified predicate adjective “not very hungry” (7)
• Subjects also occasionally group “not” with “is” (“is not”, 3) or with “very” (“not very”, 3), among other groupings
– This may reflect the status of “not” as a clausal modifier
43
Main Meaningful Groups
the dog is not very hungry
44
Group Selection Percentage (more than 1 word)
Subject: 55% (41 of 75)
Object: 55% (31 of 56)
Clause: 80% (106 of 132)
Predication (untensed VP): 34% (19 of 56)
Predicator (verb group): 16% (9 of 56)
Predicate (tensed VP): 17% (19 of 113)
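The percentages above can be checked directly from the counts on this slide. The sketch below assumes each figure is a pooled ratio (times the group type was selected, divided by the number of opportunities) rounded to a whole percent; the names `GROUP_COUNTS`, `selection_percentage`, and `PERCENTAGES` are hypothetical.

```python
# (times selected, opportunities) for each group type, copied from the slide.
GROUP_COUNTS = {
    "Subject": (41, 75),
    "Object": (31, 56),
    "Clause": (106, 132),
    "Predication (untensed VP)": (19, 56),
    "Predicator (verb group)": (9, 56),
    "Predicate (tensed VP)": (19, 113),
}

def selection_percentage(selected, opportunities):
    """Pooled selection rate as a whole-number percentage."""
    return round(100 * selected / opportunities)

PERCENTAGES = {
    name: selection_percentage(*pair) for name, pair in GROUP_COUNTS.items()
}
for name, pct in PERCENTAGES.items():
    print(f"{name}: {pct}%")
```

The denominators appear to pool subjects across the sentences that contain each group type; for example, 75 equals 18 + 19 + 19 + 19, the subject counts of the four single-clause sentences.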
45
Prototypical Meaningful Groups
[Diagram: a clause containing the groups subject and object.]
The predicator (verb group) of a transitive clause does not typically form a meaningful group by itself!
[Diagram: a clause containing subject and predicator, with the predicator divided into aux and predication.]
The predication (untensed VP) tends to be grouped independently from the auxiliary verb!
46
Questions?