Activation of referents in the bilingual mind - pre-publication version

Activation of referents in the bilingual mind

Jacopo Torregrossa1 & Christiane Bongartz2

1 University of Hamburg, Department of Romance languages

2 University of Cologne, English Department

E-mail: [email protected], [email protected]

Abstract. This paper investigates reference production by bilingual and monolingual children. We focus on the degree to which the activation of referents is encoded by different types of referring expressions among bilinguals and monolinguals. The study is based on forty-six story retellings produced in German by twenty-five Greek-German bilingual children and twenty-one monolingual children. The activation of referents is assessed based on a multi-factorial analysis of cognitive and linguistic factors that are involved in the use of referring expressions. The results show that pronouns produced by bilingual children tend to encode a lower degree of activation of referents and to be underspecific. This may be an effect of reduced processing speed experienced by bilinguals in the mapping of the referent’s activation onto the use of a certain referring expression. More generally, we account for the observed differences between bilinguals and monolinguals in terms of cognitive mechanisms underlying bilingual language production.

1. Introduction1

This paper investigates reference production in bilingual children. In order

to refer to an entity, speakers can use different types of referring expressions

(RE henceforth) which vary in terms of explicitness. For instance, if a

professor wants to refer to a student in her class, she could use the RE the

student in the front row which is more explicit than the RE the student,

which is in turn more explicit than the pronoun she/he. According to the

traditional view on reference production, speakers use REs to mark the

activation (alias prominence, accessibility, salience – see Arnold 2010) of a

referent in discourse, and hence, to guide the listeners in the identification of

the referent in their mental model of discourse (Ariel 1990). However,

recent literature has shown that the use of REs does not depend only on the

referent’s discourse status. Under the same discourse conditions, reference

production may vary across different groups of speakers. For instance,

processing constraints may affect the referent’s representation in the

speaker’s mind, and hence, the production of REs (see Arnold 2010 for a

review).

1 The data presented in this paper were collected and analyzed within the CoLiBi project – Cognition, Literacy and Bilingualism in Greek-German speaking children (principal investigators: Christiane Bongartz and Ianthi Tsimpli), jointly funded by IKY (Greek State Scholarship Foundation) and DAAD (German Academic Exchange Service). We thank Maria Andreou, Eva Knopp, Ianthi Tsimpli, the audience of the workshop “Cognitive and linguistic effects in anaphora resolution” (15-16.05.2015 – Thessaloniki) and our anonymous reviewer. All remaining errors “refer back” to us.

Bilingual reference production represents a privileged viewpoint for

studying the interaction between cognitive and linguistic factors that affect

the use of REs. We will investigate to what extent bilingual children diverge

from their monolingual peers in the use of REs and whether possible

differences can be accounted for by cognitive mechanisms underlying

bilingual language production.

In Section 2, we introduce the notion of a referent’s activation, as used in

Kibrik (2011). The author elaborates a model of reference production that

accounts for discourse and cognitive factors involved in the representation

of the referent in the speaker’s mind. In Section 3, we review previous

studies concerning the processing mechanisms underlying reference

production across different groups of speakers, with special focus on

bilingual language production. In Section 4, we present our study which

compares bilingual and monolingual children in the use of REs, based on

the analysis of narratives elicited from both groups. Finally, Section 5

presents a discussion of the results.

2. Activation of referents in narrative discourse

The production of a coherent narrative involves establishing discourse

relations between discourse units. For instance, when telling a story,

speakers have to keep track of different events, deciding when to introduce

them in the plot, which of them to foreground (or background), and how to

connect them by means of rhetorical relations (e.g., elaboration or

explanation). At the sentence level, this is expressed, for example, by the

use of certain tense-aspect markers or the distinction between different

levels of embedding (see Tomasello 2003: 271 for an overview).

The focus of this paper is on discourse relations established by reference

chains tracking different characters in the narrative and on the linguistic

means used to express these relations. After a character is introduced in the

story, reference to it may be maintained across two (or more) adjacent

discourse units or reintroduced after a hiatus. Reference introduction,

maintenance, or reintroduction are signaled by the use of REs, depending

on the inventory of referring forms available in each language. For instance,

in languages having the opposition between indefinites, definites, and

pronouns (like English or German), speakers are likely to use indefinite

nouns for reference introduction, reduced REs (e.g., pronouns) for

maintenance, and more informative REs (i.e., phonologically fully-fledged,

such as definite nouns) for reference reintroduction2.

2 At a more global level, referring functions (i.e., reference introduction, maintenance or reintroduction) may be signaled by a specific informational partitioning of the sentence (e.g., topic-comment, given-new or focus-background). For example, indefinite nouns expressing introduction tend to appear post-verbally, contrary to given referents, which tend to occur pre-verbally.

The analysis presented in this paper goes beyond this static approach to the

mapping between forms (i.e., types of REs) and functions (reference

introduction, reintroduction and maintenance). We follow Kibrik (2011) in

arguing that the production of REs is the result of a dynamic interaction

between cognitive and linguistic factors. In particular, Kibrik (2011:53)

claims that the use of REs is grounded on two distinct cognitive processes,

i.e., attention and activation. First mention of a referent in discourse implies

that the referent is attended to by the speaker. Once attended to, a referent

becomes activated in the speaker’s working memory (WM, henceforth).

The referent’s degree of activation varies throughout the narrative,

depending on several factors. For example, recency of mention (alias

distance) affects activation3. Recently mentioned referents are more

activated than referents that have been mentioned less recently. The

syntactic position and the grammatical role of the referent’s previous

mention (alias antecedent) are relevant factors, too. Distance being equal, a

referent whose antecedent is a subject is usually more activated than a

referent whose antecedent is a direct or an indirect object. Likewise, the

referent is more activated if its antecedent occurs in a main clause rather

than in a subordinate clause4. Moreover, the degree of activation depends on

the number of characters intervening between two mentions of the same

character. Our analysis will take into account all these factors (we refer to

Kibrik 2011 and Arnold 2010 for an overview of additional ones).

3 In this paper we will measure distance in terms of number of clauses (see Section 4.3). Kibrik’s (2011) assessment of distance, by contrast, is based on the representation of the hierarchical structure of discourse (ibid.:403).

4 For example, in (i) the pronoun he, which encodes a high degree of a referent’s activation, tends to refer to the subject of the main clause (John) and not to the subject of the subordinate clause (Luke), even if the latter is mentioned more recently.

(i) Johni said that Lukej felt bad. Hei (…)

Whenever the speaker intends to refer to a character in the unfolding

narrative, she maps the referent’s degree of activation at the given point in

discourse onto the use of an RE. Typically, pronouns (or, more generally,

reduced REs) encode a higher degree of a referent’s activation than more

specific forms, such as definite nouns or proper names (see Ariel 1990 as a

main reference on this issue). However, the threshold of activation that is

relevant for the use of a reduced vs. fully-fledged RE is subject to inter- and

intra-speaker variation (see Section 3).

As an example of this, let us consider (1), which is the English translation of

the first clauses of a narrative told in German by an 8-year-old Greek-

German bilingual child (see Section 4 for a description of the task and the

participants of the study), and focus on the reference chain corresponding to

the referent denoted by the dog.

(1) There was a dog

and he had a cart with a balloon attached to it.

Then a rabbit came.

and he wanted to play with the balloon.

and the dog said…

In the first line of the narrative the character is introduced by means of an

indefinite noun (a dog) and gets activated. In the second line, the referent’s

degree of activation is high, given that the antecedent a dog is in a main

clause, in subject position and only one clause distant. The speaker maps

this high degree of activation into the use of the pronoun he. The next

mention of the dog occurs after three clauses and a new character intervenes

between the new mention and its antecedent. This involves a decay in the

referent’s activation. The resulting degree of activation is mapped into the

use of a definite noun.

Kibrik (2011) develops a quantitative model to assign an activation score to

referents at any given point in discourse. He observes that not all the above-

mentioned factors (distance, argumenthood, etc.) have equal effect in

determining the referent’s degree of activation and, based on a trial-and-

error heuristic procedure, derives the activation score of each factor (we

refer to Kibrik 2011:396-428). The referent’s activation score is derived as

the sum of all these weighted scores. It should be pointed out that the

referent is endowed with an activation score independently of its being

mentioned and of the RE used to refer to it. For example, coming back to

(1), the dog remains activated in the third and fourth line, even if it is not

mentioned. Furthermore, the activation score associated with the dog in the

second clause would have been the same even if the speaker had used a

more specific RE (e.g., the dog, this dog).

The analysis to be presented in Section 4 complies with Kibrik’s multi-

factorial approach in its main lines. We will derive the weight of each

activation factor by means of a learning algorithm (InfoGain function)

which is implemented in a machine learning software (i.e., WEKA, see Hall

et al. 2009). This will allow us to avoid weighting the contribution of each

factor by means of a trial-and-error procedure (as is the case of Kibrik 2011)

and to control for the interaction between the different factors (see Grüning

and Kibrik 2002 for a similar approach).

Before proceeding to the analysis, it is important to state clearly why

Kibrik’s model provides a useful theoretical tool to assess individual

variation in reference production (e.g., between bilingual children and their

monolingual peers). We mentioned that Kibrik defines a referent’s

activation in very specific terms, i.e., as activation in the speaker’s WM.

This entails that individual differences in WM capacity may affect the

referent’s activation and the corresponding use of REs. The picture is

further complicated by the impact of mechanisms of processing efficiency

on WM capacity (see, e.g., Bayliss et al. 2005). The quantitative approach

outlined in this section will allow us to understand whether a type of RE is

associated with the same activation score across different groups of

speakers. Any possible difference is interpreted in the light of differences in

the speakers’ cognitive profiles.

3. Activation in the bilingual mind

Section 2 contains a list of discourse factors that influence the referent’s

degree of activation, and hence, the use of REs. However, recent literature

has shown that cognitive factors play an important role, too. For instance,

Rosa & Arnold (2011) analyze the effect of a speaker’s distraction on the

performance in a narrative task: they asked the participants to do a shape-

sorting task while telling a picture-based narrative. The authors noticed that,

compared to non-distracted controls, distracted speakers tend to use REs

encoding lower degrees of the referent’s activation (i.e., definite nouns vs.

pronouns). Interpreted in the light of the activation model presented in

Section 2, these results show that the cognitive load imposed by the

distraction task renders the referent less activated in WM5. This is reflected

in the use of more specific (or less reduced) REs. Thus, overspecification

seems to be the outcome of WM load. However, WM is only one of the

cognitive factors involved in the use of REs.

5 Rosa and Arnold do not explicitly mention the notion of WM, but their analysis is fully consistent with the theoretical framework reported in Section 2.

Van Rij et al. (2011) elaborate a model in which the role of WM in

reference production is kept distinct from mechanisms of processing

efficacy (i.e., processing speed). WM is involved in computing the factors

that affect a referent’s activation (e.g., establishing the discourse topic based

on grammatical information). Processing speed is involved in perspective

taking, which consists in checking whether the listener is able to recover the

intended referent. By considering both cognitive processes, the authors

account for patterns of variation in reference production across different

groups of speakers. For example, let us consider again Rosa & Arnold’s

observation that under WM load adult speakers tend to overspecify.

According to van Rij et al. (2011), WM load affects the identification of the

antecedent in subject position as the discourse topic, but does not influence

processing speed. This explains why speakers rely on an overly cautious

strategy, using a fully-fledged form (e.g., a definite noun) instead of a

pronoun, to avoid ambiguities for the listener.

Child reference production is explained according to a different pattern.

Children who are younger than 7 years have low WM (as is the case of

adults under a WM load condition) and low processing speed (see van Rij et

al. 2011 and references therein). Problems in the identification of the

discourse topic coupled with deficits in perspective taking lead to the

production of underspecific forms, which are often ambiguous for the

listener.

In this paper, we do not consider the cognitive resources involved in

perspective taking. However, the distinction between WM and processing

speed is consistent with the referent’s activation model sketched in Section

2. On the one hand, the referent’s activation score is represented in WM. On

the other hand, we assume that processing speed is involved in the mapping

of the activation score into a certain RE, i.e., in lexical retrieval of REs

(pronouns or definite descriptions). Under low processing speed, the speaker

should follow an underspecification strategy, given that reduced forms are

less demanding to access and produce (see Almor 1999, Hendriks et al.

2008 and Rosa & Arnold 2011 for a review).

The focus of this paper is on bilingual reference production. Based on the

assumptions that bilingualism affects cognitive and linguistic processes and

that reference production depends on cognitive and linguistic factors, we

expect bilingual speakers to exhibit a different use of REs than their

monolingual peers. We will consider two possible scenarios.

One possibility is that bilinguals use fully-fledged REs in association with a

high degree of the referent’s activation, i.e., they overspecify. Following

Rosa & Arnold (2011), overspecification may be the result of the cognitive

load experienced by bilinguals, due to the inhibition of the language which

is not in current use (see Levy et al. 2007, Philipp & Koch 2009 and Bialystok

et al. 2011 for a review). This cognitive load reduces the referent’s degree of

activation, as explained above.

An alternative possibility is that bilinguals underspecify as an effect of

reduced processing speed in the mapping of the referent’s activation score

into the use of a RE. In particular, slow processing may disfavor the

retrieval of (complex) full nouns and lead to the production of lighter forms,

such as pronouns. Several studies have argued for a bilingual disadvantage

in tasks of lexical competence and access, which may be due either to the

competition between the two language systems (see, a.o., Bialystok et al.

2011) or to a frequency of use effect (see, a.o., Gollan et al. 2008). When

mapping the referent’s activation score into the use of a certain RE,

bilinguals have to deal with alternatives both within their current language

(e.g., use of a pronoun vs. definite noun) and between their two languages.

In particular, Greek-German bilinguals have to cope with two distinct

referential systems. In German (a non-pro-drop language with no clitics), in

a scaled cline of activation, pronouns encode a higher degree of activation

than do definite nouns. In Greek (a pro-drop and clitic language), null

subjects and clitics are in complementary distribution and encode higher

degrees of activation than do both overt pronouns and definite nouns6. Overt

pronouns in German, thus, encode a lower degree of activation than their

null counterparts in Greek.

For both scenarios outlined above (overspecification vs. underspecification),

we predict, in line with Blumenfeld & Marian (2007), that the effects of

parallel language activation are modulated by language proficiency: the

greater the proficiency in the language not currently used, the more active

this language will remain in bilingual language use.

6 In this contribution, we deal only with the German referential system. We will not consider the production of the demonstrative pronouns der, die, das, mainly due to their low frequency in our corpus. We refer to Hinterwimmer (2015) for an analysis of the conditions of use of demonstrative pronouns and to Torregrossa (to appear) for their use by German monolingual children between 8 and 10 years.

Some studies suggest that the first scenario, i.e., overspecification, is more

likely to occur. Andreou et al. (2015) account for reference production by

two groups of Greek-German bilingual children, one living in Germany and

one living in Greece (the present paper focuses on a subclass of the latter

group – see Section 4.1). After labelling each RE based on its linguistic

function (reference introduction, reintroduction and maintenance), the

authors show that in Greek, bilingual children living in Germany tend to use

explicit forms (definite nouns) in contexts in which the use of more reduced

forms (null and clitic pronouns) would have been appropriate as well. In this

respect, bilingual children differ from their monolingual peers. The authors

account for this pattern of production in terms of language dominance,

given that, among the bilinguals in Germany, Greek is the less dominant

language, for vocabulary score, early literacy preparedness and current

language use.

Examples of overspecification in bilingual reference production are also

found in the studies by Serratrice (2007) and Chen & Lei (2012). Serratrice

(2007) analyzes reference production in 8-year-old English-Italian bilingual

children in both their languages, as compared to English and Italian

monolinguals. She found that for reference maintenance in Italian, bilingual

children use more definite nouns (vs. clitics) compared to monolinguals, but

only in object position. Chen & Lei (2012) account for narrative production

by English-Chinese bilingual children, English monolinguals and Chinese

monolinguals, and show that in reference reintroduction in Chinese,

bilingual children produce more definite nouns and fewer null pronouns

than monolinguals. Interestingly, in both studies, overspecification occurred

in just one of the two languages of each bilingual child, which may

ultimately be attributed to dominance of one language over the other and

related cross-linguistic effects7. What becomes apparent, however, is that

overspecification is a specific production strategy in need of further

explication, and not a default compensatory strategy associated with

bilingualism as such. Since a consistent pattern of results has not emerged

yet, the present paper aims to shed some new light on bilingual reference

production and the cognitive factors affecting it. By relying on the

activation model for reference production presented in Section 2, we will

show whether bilinguals overspecify or underspecify and interpret the

results in terms of the abovementioned theories (with overspecification

being the result of WM-load and underspecification of low processing

speed). Finally, in order to understand to what extent language proficiency

affects bilingual reference production, we will perform a correlational

analysis between the activation scores encoded by different types of REs

and vocabulary measures.

7 In both studies, the authors do not profile the participants for language dominance. Serratrice claims that bilinguals prefer definite nouns to clitics, because definite nouns are morpho-syntactically less complex. Chen & Lei (2012) claim that the preference for nouns over pronouns in Chinese is an effect of cross-linguistic influence of English on Chinese.

4. The study

4.1. Participants

This study is based on a selection of 25 bilingual children among the 77

Greek-German bilinguals analyzed by Andreou et al. (2015). In particular,

we consider the ones living in Greece and ranging in age from 8 to 10 years

(mean age: 9 years) at the time of testing. This group of children consists of

simultaneous bilinguals (4/25), early sequential bilinguals (10/25), late

sequential bilinguals that have Greek as their L1 (10/25) and a late

sequential bilingual child with German as his L1. At the time of testing, all

children attended the German school in Thessaloniki, in which German is

the main medium of instruction (between 19 and 25 hours per week). The

hours of instruction in Greek vary between 4 and 7 hours per week. The

opportunity to attend a school in which the dominant language differs from

the dominant language in the society contributes to rendering this group of

children the most balanced among the bilinguals analyzed by Andreou et al.

(2015). This relatively balanced profile emerges also from the analysis of

other ethnographic measures extracted from the questionnaires administered

to the children before the testing (cf. Andreou et al. 2015 and Bongartz

2016), such as language use before schooling, early literacy preparedness

and current language use.

We also considered 21 age-matched German monolinguals (range = 8.1 to

10.6; mean age: 9.4) who attended a German school in Cologne (Germany).

All children (bilinguals and monolinguals) had no history of cognitive or

linguistic impairment or hearing loss (Andreou et al. 2015). To assess their

language proficiency in German (i.e., the language under analysis in this

contribution), we administered the productive vocabulary test normed for

German children (Petermann et al. 2010).

4.2. Materials

The material consists of 46 story retellings produced in German (25 by the

bilingual children and 21 by the monolinguals). The narratives were elicited

using the Edmonton Narrative Norms Instrument (ENNI) designed by

Schneider et al. (2005). ENNI includes six stories, divided into three groups

of increasing complexity. For our task, we used the two most complex ones.

Each of them consists of 13 pictures (with no text) which represent a series

of events involving two major characters (of different gender) and two

minor ones (of different gender, too). Bilinguals were asked to produce two

stories (one for each language). We will consider only the stories told in

German. Monolinguals produced only one story.

The task was administered as a sequence of PowerPoint slides on a

computer screen. First, participants had to choose one of three envelopes,

each containing one of the two stories. They were told that each envelope

contained a different story. Then, they looked at the story pictures two by

two, while listening to the model story on the headphones. Finally, once the

thirteen-picture synopsis had appeared on the screen, they had to retell the

story to the investigator, who feigned ignorance of the plot. The stories were

audio-recorded and then transcribed by German native speakers. We refer to

Andreou et al. (2015) for further details concerning the methodology and

the procedure of the experiment.

4.3. Analysis

The unit of analysis is the clause, defined by the occurrence of a verb. For each

clause, we identified REs denoting animate characters. In line with Kibrik

(2011) – see Section 2 – our analysis focuses on the activation encoded by

REs, distinguishing between pronouns (PRON) and definite noun phrases

(DEFDP, including proper names and possessive noun phrases consisting of

a possessive adjective followed by a noun, e.g., seine Mutter ‘his mother’).

By contrast, we will not take into account REs used to attend to a

certain referent (i.e., indefinite nouns for reference introduction).

In German, PRONs, determiners in DEFDPs and possessive adjectives are

marked for case (nominative, accusative and dative), number (singular and

plural) and gender (masculine, feminine and neuter). This is exemplified in

the narrative extract in (2). In the first unit, two REs occur: der Hase (the

rabbit), which is a DEFDP whose determiner is masculine, singular and

marked for nominative, and seine Mutter (his mother), a DEFDP whose

possessive adjective is singular, feminine and marked for accusative (which

is not different from nominative in this case). The second unit contains two

pronouns, i.e., the nominative case-marked singular, masculine pronoun er

and the accusative case-marked singular, feminine pronoun sie (also in this

case, the form is not distinct from its nominative counterpart).

(2) Da fand der Hase seine Mutter

There found the NOM.SING.MASC. rabbit his ACC.SING.FEM. mother

Und er fragte sie (…)

and he NOM.SING.MASC. asked her ACC.SING.FEM.

(bilingual; 8.1)

Based on the list of activation factors given in Section 2, we coded each RE

for its linguistic features (i.e., type, syntactic position and grammatical role),

for linguistic features of the referent’s previous mention (i.e., grammatical

role and syntactic position of the RE’s antecedent), for distance between the

RE and its antecedent and for number of intervening characters. We

identified three types of REs, i.e., indefinites (INDEF) – which are not

considered in the next steps of the analysis –, pronouns (PRON) and definite

nouns (DEFDP). The factor referring to the syntactic position of the RE and

its antecedent (CLAUSE and A-CLAUSE respectively) can have two

values, i.e., MAIN (i.e., occurrence in a main clause) and SUB (i.e.,

occurrence in a subordinate clause). For grammatical role

(GRAMMATICAL), we introduced three values, i.e., SUBJ (subject), OBJ

(direct object) and OTHER (indirect object or adjunct). The distance

between the RE and its antecedent was measured in units (i.e., clauses). We

refer to Torregrossa et al. (2015) for the first presentation of this model and

Torregrossa and Bongartz (submitted) for a recent elaboration. Table 1

contains an example of the analysis, based on the story told by a 9-year-old

bilingual child.

Table 1: Example of the coding.

Transcription | Translation | CHAIN | TYPE | CLAUSE | GRAMMATICAL | A-CLAUSE | A-GRAMMATICAL | A-TYPE | CHARACTERS | DISTANCE
Es war eine Hündin und ein Hase. | There were a female dog and a rabbit | 1 | INDEF | MAIN | SUBJ | INTRO | INTRO | INTRO | INTRO | INTRO
Es war eine Hündin und ein Hase. | There were a female dog and a rabbit | 2 | INDEF | MAIN | SUBJ | INTRO | INTRO | INTRO | INTRO | INTRO
Der Hase sah die Freundin von ihm | The rabbit saw the female friend of him | 2 | DEFDP | MAIN | SUBJ | MAIN | SUBJ | INDEF | 0 | 1
Der Hase sah die Freundin von ihm | The rabbit saw the female friend of him | 1 | DEFDP | MAIN | OBJ | MAIN | SUBJ | INDEF | 1 | 1
Der Hase sah die Freundin von ihm | The rabbit saw the female friend of him | 2 | PRON | MAIN | OTHER | MAIN | SUBJ | DEFDP | 1 | 0
dass sie einen Handwagen hatte mit einem Luftballon dran. | that she had a cart with a balloon attached. | 1 | PRON | SUB | SUBJ | MAIN | OBJ | DEFDP | 1 | 1
Er wollte mit dem Ballon spielen, | He wanted to play with the balloon | 2 | PRON | MAIN | SUBJ | MAIN | OTHER | PRON | 1 | 2
und die Freundin sagte, | and the female friend said, | 1 | DEFDP | MAIN | SUBJ | SUB | SUBJ | PRON | 1 | 2
er muss ihn zuerst losbinden, den Ballon. | he has to untie it first, the balloon | 2 | PRON | MAIN | SUBJ | MAIN | SUBJ | PRON | 1 | 2
Da wollte er ihn losbinden. | Then he wanted to untie it | 2 | PRON | MAIN | SUBJ | MAIN | SUBJ | PRON | 1 | 3
Und ausversehen rutschte er ihm aus der Hand. | and suddenly it slips out (to him) from the hands. | 2 | PRON | MAIN | OTHER | MAIN | SUBJ | PRON | 0 | 1

The first column contains the transcription of the narrative and the second

the English translation. If a unit contains more than one RE, it is repeated

(as many times as it contains REs) and underlined. Repeated sentences are

considered as a single unit for the measure of distance. The third column

(CHAIN) assigns an index to each character (1 for the female dog and 2 for

the rabbit, in the case at issue). The number of characters intervening

between two mentions of the same referent (CHARACTERS) is counted

based on linear criteria. For example, in (3) reference to the rabbit (von ihm

‘of his’) intervenes between the two mentions of the female dog (die

Freundin ‘the female friend’ and sie ‘she’, respectively). (4) is a

modification of (3), in which the prepositional phrase von ihm ‘of his’ is

substituted by the prenominal possessive adjective seine ‘his’. In this case,

the rabbit character does not intervene between the two mentions of the dog

character.

(3) Der Hase sah die Freundin von ihm,

The rabbit NOM.SING.MASC. saw the friend ACC.SING.FEM. of he DAT.SING.MASC.

dass sie einen Handwagen hatte.

that she NOM.SING.FEM. a ACC.SING.MASC. cart had

(4) Der Hase sah seine Freundin, dass

The rabbit NOM.SING.MASC. saw his ACC.SING.FEM. friend ACC.SING.FEM. that

sie einen Handwagen hatte.

she NOM.SING.FEM. a ACC.SING.MASC. cart had
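The linear counting criterion illustrated by (3) and (4) can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the coding tool used in the study. The function name `intervening_characters` and the encoding of a narrative as a flat list of CHAIN indices are assumptions made here, as is the choice to count distinct characters rather than individual mentions (the two coincide in the examples above).

```python
from typing import List, Optional


def intervening_characters(chains: List[int], position: int) -> Optional[int]:
    """Count the distinct other characters mentioned, in linear order,
    between the RE at `position` and the previous mention of the same
    character. Returns None for a first mention (no antecedent).

    `chains` is a hypothetical encoding: one CHAIN index per RE,
    in the order in which the REs occur in the narrative.
    """
    target = chains[position]
    # Walk backwards to find the nearest previous mention (the antecedent).
    for j in range(position - 1, -1, -1):
        if chains[j] == target:
            # Everything strictly between antecedent and RE belongs to
            # other characters; count each character once.
            return len(set(chains[j + 1:position]))
    return None


# Example (3): Der Hase (2), die Freundin (1), von ihm (2), sie (1)
example_3 = [2, 1, 2, 1]
# Example (4): Der Hase (2), seine (2), Freundin (1), sie (1)
example_4 = [2, 2, 1, 1]

print(intervening_characters(example_3, 3))  # 1: the rabbit intervenes
print(intervening_characters(example_4, 3))  # 0: no character intervenes
```

Under this reading, the postnominal von ihm in (3) counts as an intervening mention of the rabbit, whereas the prenominal seine in (4) precedes the noun Freundin and therefore does not.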

The final dataset was uploaded in WEKA, which is a machine learning

software tool including several algorithms for data mining (Hall et al. 2009).

We used WEKA to determine the extent to which the use of REs is affected

by the factors considered in the analysis, with special attention to those

involved in reference activation, i.e., linguistic features of the antecedent,

distance and number of intervening characters (see Section 2). We applied

the info gain function that derives the weight of each factor in determining a

certain outcome based on a decision tree analysis (use of a pronoun vs.

definite noun in our case – see Torregrossa et al. 2015 for the representation

of the decision tree corresponding to the use of REs in Italian and Greek).

The bilingual and the monolingual data have been analyzed separately, to

understand whether the use of REs was sensitive to different factors in the

two groups of speakers.
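To make the notion of a factor's weight more concrete, the following sketch computes information gain under its standard definition (entropy of the outcome minus the entropy remaining after splitting on a factor). It is not WEKA's implementation, and the toy observations are hypothetical and not taken from our dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(factor_values, labels):
    """Reduction in outcome entropy obtained by splitting on one factor."""
    total = len(labels)
    groups = {}
    for value, label in zip(factor_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy observations: RE type and two factors per mention.
re_type = ["PRON", "PRON", "DEF", "DEF"]
distance = [0, 0, 3, 3]                    # separates the two RE types perfectly
clause = ["MAIN", "SUB", "MAIN", "SUB"]    # carries no information here

print(info_gain(distance, re_type))
print(info_gain(clause, re_type))
```

In this toy case distance fully predicts the RE type and receives the maximal gain, while the clause type receives none; the weights reported for the actual data lie between these extremes.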

As has been mentioned in Section 2, the referent’s activation score was

derived as the weighted sum of the values corresponding to each activation

factor. The weights were provided by the info gain function. For each factor,

we assigned a number to each of its possible values. The greater the number,

the greater its effect in strengthening the referent’s activation. For instance,

a referent whose antecedent is a subject is more activated than a referent

whose antecedent is an object (Section 2). This is reflected in the

assignment of the number 0.6 to the value SUBJ and 0.4 to the value OBJ

(see (b) below).

(a) Clause of the antecedent: MAIN (0.4) > SUBORDINATE (0.2)

(b) Grammatical role of the antecedent: SUBJ (0.6) > OBJ (0.4) > OTHER (0.2)

(c) Distance RE-antecedent (in units): 0 (0.8) > 1 (0.6) > 2 (0.4) > 3 (0.2) > more than 3 (0)

(d) Number of intervening characters: 0 (0.8) > 1 (0.6) > 2 (0.4) > 3 (0.2) > more than 3 (0)

In the next section, we show the results of the analysis. In Section 4.4.1, we

report the weights of each factor in monolingual and bilingual reference

production, and provide some examples of how the referent’s activation

score is derived. Then, we compare the activation scores associated with the

use of pronouns (Section 4.4.2) and definite nouns (Section 4.4.3) in both

groups of speakers.

4.4. Results

4.4.1. Weight of the activation factors and derivation of the activation score

Table 2 reports the weights of each activation factor in determining the use

of REs in monolingual and bilingual reference production. The factors are

ordered from the weakest to the strongest predictor of RE type. It should be

noted that even if the values of the weights are slightly different across the

two groups, bilingual and monolingual reference production is sensitive to

the same hierarchy of factors. Distance and number of intervening

characters are the strongest factors, followed by the antecedent’s

grammatical role and syntactic position, respectively.

FACTOR          MONOLINGUALS   BILINGUALS
a-clause        0.003          0.007
a-grammatical   0.02           0.05
characters      0.17           0.14
distance        0.27           0.21

Table 2: Weight of each activation factor in monolingual and bilingual reference production.

The result concerning the antecedent’s syntactic position is expected, given

that child narratives exhibit a lower level of syntactic complexity than adult

narratives (for which syntactic complexity plays a more important role – see

Bongartz and Torregrossa, to appear, and Andreou et al. 2015). The data

related to the antecedent’s grammatical role require further investigation.

Van Rij et al. (2011) claim that, at the age of 7, children do not have

sufficient WM resources to identify the discourse topic based on

grammatical information (Section 3). These cognitive resources may still be

developing in the age range considered in this paper.

Table 3 shows how the referent’s activation score is calculated at each unit

of the narrative reported in Table 1. We focus on the chain referring to the

female dog character.

LINE | UNIT | TYPE | A-CLAUSE | WEIGHT_A-CLAUSE | GRAMMATICAL | WEIGHT_GRAMMATICAL | CHARACTERS | WEIGHT_CHARACTER | DISTANCE | WEIGHT_DISTANCE | SCORE
1 | Es war eine Hündin und ein Hase. | INDEF | INTRO | INTRO | INTRO | INTRO | INTRO | INTRO | INTRO | INTRO | INTRO
2 | Es war eine Hündin und ein Hase | - | - | - | - | - | - | - | - | - | 0.2832
3 | Der Hase sah die Freundin von ihm, | - | - | - | - | - | - | - | - | - | 0.2412
4 | Der Hase sah die Freundin von ihm, | DEFDP | MAIN (0.4) | 0.003 | SUBJ (0.6) | 0.05 | 1 (0.6) | 0.14 | 1 (0.6) | 0.21 | 0.2412
5 | Der Hase sah die Freundin von ihm, | - | - | - | - | - | - | - | - | - | 0.2732
6 | dass sie einen Handwagen hatte mit einem Luftballon dran. | PRON | MAIN (0.4) | 0.003 | OBJ (0.4) | 0.05 | 1 (0.6) | 0.14 | 1 (0.6) | 0.21 | 0.2312
7 | Er wollte mit dem Ballon spielen, | - | - | - | - | - | - | - | - | - | 0.2406
8 | und die Freundin sagte, | DEFDP | SUB (0.2) | 0.003 | SUBJ (0.6) | 0.05 | 1 (0.6) | 0.14 | 2 (0.4) | 0.21 | 0.1986
9 | er muss ihn zuerst losbinden, den Ballon. | - | - | - | - | - | - | - | - | - | 0.2412
10 | Da wollte er ihn losbinden. | - | - | - | - | - | - | - | - | - | 0.1992
11 | Und ausversehen rutschte er ihm aus der Hand. | - | - | - | - | - | - | - | - | - | 0.1572

Table 3: Activation scores associated with the female dog character in the narrative extract in Table 1.

First, let us consider the referent’s activation score at the point in which the referent is mentioned (see the rows of the table in which the RE type and the factor values are specified). In line 4, the female dog is referred to with a definite noun, i.e., die Freundin (von ihm) ‘the female friend of his’. The antecedent is the indefinite noun eine Hündin ‘a female dog’ in line 1. The referent’s activation score is derived as indicated in (5) (see (a)-(d) in Section 4.3 for the values and Table 2 for the weights).

(5) Activation score of the antecedent’s syntactic position: (0.4*0.003) +
    Activation score of the antecedent’s grammatical role: (0.6*0.05) +
    Activation score of the number of intervening characters: (0.6*0.14) +
    Activation score of the distance RE-antecedent: (0.6*0.21) =
    Referent’s activation score: 0.2412

Therefore, in line 4, the dog’s activation score equals 0.2412. This

activation score is mapped onto the use of a definite noun (die Freundin von

ihm). Employing this general procedure, we can calculate individual values

for each story participant at each mention, and then arrive at the average

activation score associated with the use of a pronoun or a definite noun.

Based on this, we will compare bilingual and monolingual reference

production (Section 4.4.2).

Before proceeding, however, it should be pointed out that once the dog referent is introduced it will stay activated and hence associated with an activation score independently of its being actually mentioned (cf. the rows of Table 3 that report only a score). For instance, in the last line of Table 3 (und ausversehen rutschte er ihm aus der Hand ‘and accidentally it slipped out of his hand’), only the rabbit is mentioned (e.g., ihm ‘to him’). However, the dog is active, too. Its activation depends on the fact that its last mention is in a main clause (0.4*0.003), in subject position (0.6*0.05), three units distant (0.2*0.21), with one intervening character, i.e., the rabbit (0.6*0.14) (if we exclude the balloon, which is inanimate). The resulting score equals 0.1572, as indicated in the table.
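The two derivations above (0.2412 for line 4 and 0.1572 for the last line of Table 3) can be reproduced with a short sketch of the weighted sum. The function and variable names are our own illustrative choices; the weights are those that appear in (5), and the value 0.6 for intervening characters is the one used in the text.

```python
# Values assigned to each factor level, as listed in (a)-(d) of Section 4.3.
CLAUSE = {"MAIN": 0.4, "SUBORDINATE": 0.2}
ROLE = {"SUBJ": 0.6, "OBJ": 0.4, "OTHER": 0.2}
STEPS = [0.8, 0.6, 0.4, 0.2]  # counts 0, 1, 2, 3; more than 3 maps to 0

# Weights exactly as they appear in (5); treat them as illustrative.
WEIGHTS = {"clause": 0.003, "role": 0.05, "characters": 0.14, "distance": 0.21}


def step_value(count):
    """Value for a distance or an intervening-character count."""
    return STEPS[count] if count < len(STEPS) else 0.0


def activation_score(clause, role, characters, distance):
    """Weighted sum of the four activation factors."""
    return (CLAUSE[clause] * WEIGHTS["clause"]
            + ROLE[role] * WEIGHTS["role"]
            + step_value(characters) * WEIGHTS["characters"]
            + step_value(distance) * WEIGHTS["distance"])


# Line 4 of Table 3: antecedent in a main clause, subject, 1 character, distance 1.
print(round(activation_score("MAIN", "SUBJ", 1, 1), 4))  # 0.2412
# Last line of Table 3: same antecedent features, distance 3.
print(round(activation_score("MAIN", "SUBJ", 1, 3), 4))  # 0.1572
```

Only the distance value changes between the two calls, which is why the score decays from 0.2412 to 0.1572 as the narrative proceeds without a new mention of the dog.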

4.4.2. Activation of pronouns in bilinguals and monolinguals

Figure 1 compares the activation scores encoded by bilingual pronouns (on

the left) and monolingual pronouns (on the right). The activation scores

have been z-score normalized, given that the activation factors have

different weights across the two groups of participants (Table 2 in Section

4.4.1). Bilingual pronouns tend to encode a lower degree of activation than

monolingual pronouns (one-way ANOVA: F(1) = 60.56, p < .001).

Figure 1: Box plot of the z-score normalized activation scores associated with the use of pronouns by

bilinguals (on the left) and monolinguals (on the right).
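The normalization and the group comparison can be illustrated with the sketch below. It assumes that z-scores are computed with the population standard deviation and that the one-way ANOVA is the textbook ratio of between-group to within-group mean squares; the numbers are hypothetical and are not data from the study.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    centre, spread = mean(values), pstdev(values)
    return [(v - centre) / spread for v in values]


def one_way_anova_f(*groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical z-normalized pronoun scores, NOT data from the study.
bilingual_pronouns = [-0.9, -0.4, 0.1, -0.6]
monolingual_pronouns = [0.5, 0.9, 0.2, 0.8]

f_value = one_way_anova_f(bilingual_pronouns, monolingual_pronouns)
print(f"F(1, 6) = {f_value:.2f}")
```

With two groups, the between-group degrees of freedom equal 1, which corresponds to the F(1) reported above.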

Figure 2 shows the regression between the average activation score of

pronouns (for each participant) and the score obtained in the vocabulary

test. Vocabulary score is a good predictor of the referent’s activation score

in pronoun production (r² = .24, p < .01): the lower the child’s proficiency,

the lower the activation score.

Figure 2: Scatter plot of the referent’s activation encoded by pronouns (activation) against the vocabulary score in all children (bilinguals and monolinguals).
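For readers less familiar with the r² statistic, the sketch below shows how the proportion of variance explained is obtained in a simple linear regression with one predictor, where it equals the squared Pearson correlation. The function name and the per-child values are hypothetical and serve only as an illustration.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between predictor x and outcome y."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)


# Hypothetical per-child values, NOT data from the study.
vocabulary = [10, 14, 18, 22, 25]
activation = [-0.8, -0.1, -0.3, 0.4, 0.6]

print(round(r_squared(vocabulary, activation), 2))
```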

4.4.3. Activation of definite nouns in bilinguals and monolinguals

Figure 3 compares the (z-score normalized) activation scores encoded by

definite nouns across the two groups (bilinguals and monolinguals). The

result is consistent with the one shown in Figure 1. Bilingual definite nouns

encode lower degrees of activation than monolingual definite nouns (one-

way ANOVA: F(1) = 6.04, p < .02).

A 2x2 mixed design ANOVA with Group (Bilingual and Monolingual) as a between-subject variable and Type of Referring Expression as a within-subject variable reveals a significant interaction between the two factors (F(3) = 132.39, p < .01). This suggests that the difference between monolingual and bilingual definite nouns is not as marked as the difference between bilingual pronouns and monolingual pronouns. Furthermore, the regression between vocabulary scores and average activation scores of definite nouns is not significant (r² = .011, p > .5).

Figure 3: Box plot of the z-score normalized activation scores associated with the use of definite

nouns by bilinguals (on the left) and monolinguals (on the right).

5. Discussion and conclusions

The aim of our study was to understand whether bilingual children differ

from their monolingual peers in the use of REs and whether this difference

is related to processing mechanisms specific to bilingual language

production.


In Section 2, we identified two cognitive processes underlying reference

production. On the one hand, WM is involved in keeping track of the factors

that affect a referent’s activation (e.g., grammatical role of the referent’s

previous mention). On the other hand, processing mechanisms are involved

in the retrieval of REs appropriate to a given degree of activation. Based on

previous studies, we claimed that individual differences in WM capacity or

processing speed might affect reference production. More specifically, low

WM involves a decay in the referent’s activation which is reflected in the

use of REs encoding lower degrees of activation, i.e., definite nouns vs.

pronouns (overspecification). Slow processing speed leads to the production

of reduced referring forms (i.e., pronouns vs. definite nouns), which are

easier to access and process (underspecification).

The results reported in Figure 1 (Section 4.4.2) show that bilinguals tend to

produce pronouns encoding lower levels of activation than monolinguals.

Table 4 shows an example of the production of an underspecific pronoun.

The rabbit (der Hase, in bold) is the subject of the coordinated VPs rennt

(runs) and will (wants) and is picked up by the (underspecific) pronoun er

(he) in the last sentence. Notice that the two mentions of the rabbit are

separated by four clauses, in which two characters occur, i.e., the balloon-

seller and the plural character denoting the rabbit and the dog (sie ‘they’ in

the penultimate line). Thus, in the last sentence, the rabbit has a relatively low activation score, which is nevertheless mapped onto the use of a reduced RE, i.e., a pronoun. Reference to the young rabbit is underspecific and ambiguous between the rabbit himself and the balloon-seller.

(…) Da kommt ein Mann mit Luftballons. (A man with balloons arrives)
Da rennt der Hase hin, (Then the rabbit runs there)
um seine Freundin glücklich zu machen (to make his female friend happy)
und will ihr einen Luftballon kaufen. (and he wants to buy her a balloon)
Aber dann sagt der Luftballonverkäufer, (but then the balloon seller says)
dass er 50 Cent will für einen Luftballon. (that he wants 50 cents for a balloon)
und dann betteln sie ihn an (…) (and then they beg him)
Also dann sucht er in seinen Taschen. (so then he looks in his pockets)

Table 4: Extract from a story told by a bilingual child (8.11)

Table 5 shows that in a very similar context, an age-matched monolingual child does not produce an underspecific reference. In the last line, the rabbit is referred to by means of a definite noun (der Hase ‘the rabbit’), corresponding to the referent’s low degree of activation (distance of five clauses between the first and last mention of the rabbit and occurrence of one intervening character)8.

8 As mentioned in footnote 6, we did not consider the use of the demonstrative pronoun der. In Table 5, the first occurrence of der refers to the young rabbit and the second to the balloon-seller. Neither reference is ambiguous.

Dann war da ein alter Hase mit uhm # mit Ballons (then there was an old rabbit with balloons)
Dann hat der gesagt: (and he – the rabbit – said)
“Können Sie mir den größten und schönsten Ballon geben (Could you please give me the biggest and prettiest balloon
den Sie hatten?” (that you have?)
Dann hat der gesagt. (then he – the balloon seller – said)
er wollte dafür Geld haben. (he wanted to have money for it)
Aber der Hase hatte kein Geld dabei (…) (but the rabbit had no money with him).

Table 5: Extract from a story told by a monolingual child (8.9)

Based on the literature discussed in Section 3, we interpret the data in Figure 1 as showing that bilingual reference production is constrained by processing speed. When mapping the referent’s activation score onto the use of an RE, bilinguals go for the “easiest” option. Furthermore, Figure 2 reveals that this effect is modulated by language proficiency: the lower the proficiency, the greater the processing costs and the more likely the production of underspecific forms. In this paper, we do not report any data revealing a bilingual disadvantage in processing speed, and the correlation between slow processing and underspecification remains partly speculative. We refer to the study by Bongartz & Torregrossa (submitted) which takes into account a group of Greek-German bilingual children and reports a clear correlation between activation scores and the scores obtained in a lexical decision task (as a measure of processing speed).

Finally, it is noteworthy that in this study, overspecification does not emerge as a bilingual referential strategy. Overspecification is expected as an effect of low WM capacity (Section 3). The literature reports no evidence that, in WM tasks, bilinguals perform worse than monolinguals. Rather, the opposite seems to hold true (Bialystok et al. 2011).

Figure 3 (Section 4.4.3) shows that bilingual definite nouns encode lower degrees of activation than monolingual definite nouns, as is the case for pronouns. While the mapping of a (relatively) low activation onto the use of a pronoun indicates underspecification (i.e., ambiguity), definite nouns are supposed to express low activation. Our data suggest that monolinguals tend to use definite nouns in cases in which they could have used pronouns. Table 6, which reports an extract from a story told by a monolingual child, provides empirical evidence in favor of this claim (see, for example, the use of der Jungenhase ‘the young rabbit’ where er ‘he’ would have been possible in the second line).

Table 6: Extract from a story told by a monolingual child (8.7)

(…) und dann war das Hundemädchen sauer auf den Jungenhasen (Then the female dog was angry with the young male rabbit)
Und dann hat der Jungenhase ein älteren Hasen mit ganz vielen Ballons gesehen. (and then the young male rabbit saw an older rabbit with many balloons)
Und dann hat der Jungenhase um einen den schönsten Ballons gebeten. (and then the young male rabbit asked for the most beautiful balloon)
und der ältere Hase wollte für den Ballon Geld haben (and the older rabbit wanted to have money for the balloon).
und der Jungenhase fand kein Geld in seinen Hosentaschen (and the young male rabbit found no money in his pockets).

However, a qualitative analysis of the data reveals that this pattern (i.e., the use of definite nouns vs. pronouns) is not consistent across all monolingual participants, as also shown by the fact that the difference between the two groups in Figure 3 is not as marked as in Figure 1 (Sections 4.4.3 and 4.4.2, respectively). When producing a fully-fledged form in association with a

relatively high degree of a referent’s activation, monolinguals seem to rely

on a cautious strategy for reference production which avoids ambiguities for

the listener. In the context reported in Table 6, this strategy might be

motivated by the occurrence of two referents of the same type and gender

(i.e., two male rabbits). Interestingly, the fact that there is no great

difference between bilinguals and monolinguals in the production of definite

nouns suggests that the two groups rely on a similar discourse-pragmatic

sensitivity. The observed differences might be due to diverging mechanisms

available in the two groups for the integration of syntax and discourse-

pragmatic information.

In this paper, we have provided a processing account for the observed

differences between monolinguals and bilinguals in reference production.

This does not mean, however, that cross-linguistic influences between

Greek and German do not play any relevant role. For example, the

production of overspecific pronouns could be the product of reinterpretation

of Greek null and clitic pronouns as German overt pronouns. Likewise, the

similar use of definite nouns may be motivated by the fact that the morpho-

syntactic structure of definite nouns is very similar across the two languages

(i.e., both Greek and German mark definite nouns for case, gender and

number). Our data do not allow us to disentangle the effects of processing from

cross-linguistic influences of one language on the other. An in-depth

investigation of the interaction between these two factors can only be based

on the comparison between bilinguals speaking languages with similar

referential systems and bilinguals speaking languages with different ones

(cf. Torregrossa et al. submitted for an analysis of reference production in

Albanian-Greek and German-Greek bilinguals).

Before concluding, a last remark is in order. The results reported in this

paper diverge from the ones in Andreou et al. (2015), which show that

bilingual children tend to overspecify. This is surprising given that the

children considered in this paper are a subset of the ones taken into account

by the authors. However, the two studies are not comparable for several

reasons. First, they are based on two different methodologies for data

analysis (Section 3). Moreover, the authors observe overspecification in

association with the production of null subjects and clitics in Greek, which

we do not consider in the present study9. Finally, the analysis by Andreou et

al. (2015) does not discuss whether the bilinguals living in Greece and the

ones living in Germany perform differently. In our future work, we will investigate to what extent these differences in sociological and ethnographic measures (e.g., type of bilingualism, schooling, and language use in different contexts; see Bongartz 2016 on this issue) affect reference production.

9 We refer to Torregrossa & Bongartz (submitted) for the use of the multi-factorial analysis for the analysis of reference production in Greek by the same group of bilinguals presented in this paper.

References

Almor, Amit. 1999. Noun-phrase anaphora and focus: The informational

load hypothesis. Psychological Review 106(4): 748-765.

Andreou, Maria, Eva Knopp, Christiane Bongartz & Ianthi Tsimpli. 2015.

Character reference in Greek-German bilingual children’s narratives. In

EUROSLA Yearbook (16), Leah Roberts (ed.). Amsterdam: Benjamins.

Ariel, Mira. 1990. Accessing noun-phrase antecedents. London: Routledge.

Arnold, Jennifer. 2010. How speakers refer: The role of accessibility.

Language and Linguistics Compass 4(4): 187-203.

Bayliss, Donna M., Christopher Jarrold, Alan D. Baddeley, Deborah M.

Gunn & Eleanor Leigh. 2005. Mapping the developmental constraints on

working memory in span performance. Developmental Psychology 41:

579-597.

Bialystok, Ellen, Fergus I.M. Craik, David W. Green & Tamar H. Gollan.

2011. Bilingual minds. Psychological Science in the Public Interest 10(3): 89-129.

Blumenfeld, Henrike H. & Viorica Marian. 2007. Constraints on parallel

activation in bilingual spoken language processing: Examining

proficiency and lexical status using eye-tracking. Language and cognitive

processes 22(5): 633-660.

Bongartz, Christiane. 2016. Bilingual and second language development and

literacy – emerging perspectives on an intimate relationship. Proceedings

of the 21st International Symposium of Theoretical and Applied

Linguistics, Thessaloniki 2013, April 5-7 2013.

Bongartz, Christiane & Jacopo Torregrossa. to appear. The effects of

balanced biliteracy on Greek-German bilingual children’s secondary

discourse ability. International Journal of Bilingual Education and Bilingualism.

Chen, Liang & Jianghua Lei. 2012. The production of referring expressions

in oral narratives of Chinese-English bilingual speakers and monolingual

peers. Child Language Teaching and Therapy 29(1): 41-55.

Gollan, Tamar H., Rosa I. Montoya, Cynthia Cera & Tiffany C. Sandoval.

2008. More use almost always means a smaller frequency effect: Aging,

bilingualism and the weaker links hypothesis. Journal of Memory and

Language 58: 787-814.

Grüning, André & Andrej Kibrik. 2002. Referential choice and activation

factors: A neural network approach. In Proceedings of the 4th Discourse

Anaphora and Anaphora Resolution Colloquium (DAARC 2002),

Antonio Branco, Tony McEnery & Ruslan Mitkov (eds.). Lisbon:

Edições Colibri, 81-86.

Hall, Mark, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter

Reutemann & Ian H. Witten. 2009. The WEKA data mining software: An

update. SIGKDD Explorations 11(1).

Hendriks, Petra, Christina Englert, Ellis Wubs & John Hoeks. 2008. Age

differences in adults’ use of referring expressions. Journal of Logic,

Language and Information 17(4): 443-466.

Hinterwimmer, Stefan. 2015. A unified account of the properties of German

demonstrative pronouns. In Proceedings of the Workshop on Pronominal

Semantics at NELS 40, Patrick Grosz, Pritty Patel-Grosz and Igor

Yanovich (eds.). Amherst, MA: GLSA Publications.

Kibrik, Andrej. 2011. Reference in Discourse. Oxford: Oxford University

Press.

Levy, Benjamin J., Nathan D. McVeigh, Alejandra Marful & Michael C.

Anderson. 2007. Inhibiting your native language: The role of retrieval-

induced forgetting during second language acquisition. Psychological

Science 18: 29-34.

van Rij, Jacolien, Hedderik van Rijn & Petra Hendriks. 2011. Towards a

cognitively plausible model of reference. In Proceedings of the PRE-

CogSci 2011. Workshop on Production of Referring Expressions:

Bridging the gap between computational, empirical and theoretical

approaches. Kees van Deemter, Albert Gatt, Roger van Gompel & Emiel

Krahmer (eds.).

Philipp, Andrea M. & Iring Koch. 2009. Inhibition in language switching:

What is inhibited when switching between languages in naming tasks?

Journal of Experimental Psychology: Learning, Memory and Cognition

35: 1187-1195.

Rosa, Elise C. & Jennifer Arnold. 2011. The role of attention in choice of

referring expressions. Proceedings of PRE-Cogsci: Bridging the gap

between computational, empirical and theoretical approaches to

reference, Boston.

Schneider, P., Dubé, R. V., and Hayward, D. 2005. The Edmonton Narrative

Norms Instrument. Retrieved [23.08.2013] from University of Alberta

Faculty of Rehabilitation Medicine

website: http://www.rehabresearch.ualberta.ca/enni.

Serratrice, Ludovica. 2007. Referential cohesion in the narratives of

bilingual English-Italian children and monolingual peers. Journal of

Pragmatics 39: 1058-1087.

Tomasello, Michael. 2003. Constructing a language: A usage-based theory

of language acquisition. Cambridge MA: Harvard University Press.

Torregrossa, Jacopo. to appear. The role of executive functions in the

acquisition of reference: The production of demonstrative pronouns by

German monolingual children. In Selected Proceedings of the 12th

Conference “Generative Approaches to Language Acquisition”

(GALA12). Newcastle upon Tyne: Cambridge Scholars Publishing.

Torregrossa, Jacopo & Christiane Bongartz. submitted. Reference

production in Greek-German bilingual children: A cognitive-

computational account.

Torregrossa, Jacopo, Christiane Bongartz & Ianthi Tsimpli. 2015. Testing

accessibility: A cross-linguistic comparison of the syntax of referring

expressions. In Proceedings of the 89th Annual Meeting of the Linguistic

Society of America, Portland.

Torregrossa, Jacopo, Maria Andreou, Christiane Bongartz & Ianthi Tsimpli.

submitted. Pinning down the role of type of bilingualism in the

development of referential strategies, abstract submitted to the Workshop

“Heritage Language Knowledge and Acquisition”, GLOW 40, Leiden.
