The potential contribution of statistical learning to second language acquisition
Luca Onnis
1. Introduction
Many fundamental aspects of human learning can be characterized as problems of induction: finding patterns and generalizations in space and time under conditions of uncertainty and from limited exposure. These problems include deriving abstract categories from experience (e.g., Tenenbaum & Griffiths, 2001); learning word meanings from their co-occurrence with perceived events in the world (e.g., Frank, Goodman, & Tenenbaum, 2009; Yu & Smith, 2007); learning the similarity and difference of meanings from their co-occurrence with other words (Landauer & Dumais, 1997); and acquiring the different levels of linguistic structure (e.g., Bod, 2009; Solan, Horn, Ruppin, & Edelman, 2005). Although the areas of application and specific theoretical claims vary, all these forms of inductive learning can be described under a common framework for problems arising in developmental psychology (Gopnik & Tenenbaum, 2007), inductive reasoning (Chater & Oaksford, 2008), language acquisition (Bannard, Lieven, & Tomasello, 2009; Solan, Horn, Ruppin, & Edelman, 2005), computational linguistics (Bod, 2002; Chater & Manning, 2006; Jurafsky, 2003), and machine learning (MacKay, 2003). This common framework, encompassing experimental and computational approaches, can be termed distributional or statistical learning, because the focus is on how learners discover structure from probabilistic information in the environment.¹
Behavioral studies have indicated that infants, toddlers, and adults can rapidly extract structural properties of stimuli from probabilistic information inherent in the input they are exposed to. For example, before the age of three, children implicitly use frequency distributions to learn which phonetic units distinguish words in their native language (Kuhl, 2002, 2004; Maye, Werker, & Gerken, 2002), use the transitional probabilities between syllables to segment words (Saffran, Aslin, & Newport, 1996), use word distributions to discover syntactic-like relations among adjacent and non-adjacent elements (see also Hay & Lany, this volume), learn to form abstract categories (Gomez & Gerken, 2000), and rapidly establish form-meaning mappings under conditions of uncertainty (Smith & Yu, 2008).
1. Here I mainly use the term statistical learning to comply with the general trend in the literature. Terms like distributional/probabilistic learning/approaches are equally viable and often used interchangeably in the literature.
Provided by | provisional account. Logged in | 212.87.45.97. Downloaded on | 04.03.13 15:54
The behavioral findings on humans’ remarkable statistical learning abilities have been enhanced and complemented by the rapid development of robust and sophisticated computational methods that learn from corpora of natural language (e.g., Bayesian, connectionist, and dynamical systems models; see Chater & Manning, 2006; Griffiths et al., 2010; McClelland et al., 2010). Such methods make it possible to obtain detailed information on the nature of the input to which learners are exposed (e.g., distributional properties of language), as well as the structured environment in which learners [...] 2006; Goldstein, King, & West, 2003; Roy, 2003). Importantly, these methods now allow researchers to simulate the putative mechanisms responsible for the behavioral findings.
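As an illustration of what such a simulation can look like, the Python sketch below implements the transitional-probability cue mentioned above (Saffran, Aslin, & Newport, 1996): it estimates forward transitional probabilities P(next | current) between adjacent syllables and posits a word boundary wherever that probability dips below a threshold. The four-word lexicon, the stream length, and the 0.5 threshold are illustrative assumptions, not parameters of any published study.

```python
import random
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TP: P(next | current) = count(current, next) / count(current)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold=0.5):
    """Posit a word boundary wherever the TP between adjacent syllables is low."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:  # low TP: likely a word boundary
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Illustrative lexicon of trisyllabic nonce words (an assumption for this sketch).
LEXICON = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]

rng = random.Random(1)
stream, previous = [], None
for _ in range(600):
    # Concatenate words with no pauses and no immediate repetitions.
    word = rng.choice([w for w in LEXICON if w != previous])
    stream.extend(word)
    previous = word

words = segment(stream)
print(sorted(set(words)))
```

Within a word every syllable is always followed by the same syllable (TP = 1), whereas across word edges the TP is roughly 1/3, so a single threshold separates the two.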
This chapter asks what specific role statistical learning might play in understanding processes of second language (L2) learning after the acquisition of a first language. My first goal is to propose that L2 learners may become (or be made to become) attuned to useful distributional regularities (discussed below). Such regularities correlate non-randomly with structural properties of language, for instance, phonetic boundaries, word units in connected speech, phrasal constituents, morphemic structure, and lexical semantics, suggesting that at least part of the acquisition of language may involve the acquisition of knowledge of distributional regularities. In particular, I want to propose four general learning principles that can be gleaned from the statistical learning literature and applied to L2 learning scenarios. These principles are: (1) Integrate information sources; (2) Seek invariant structure; (3) Reuse learning mechanisms for different tasks; and (4) Learn to predict. These principles are exemplified in four studies that highlight the benefits of statistical learning at the sublexical, lexical, morpho-syntactic, phrasal, and lexico-semantic levels. They all explore how distributional information can be brought to bear on assisting second language learning.²
2. Coverage here is selective and illustrative rather than comprehensive. Other important related literature can be found in the accompanying chapters of this book (e.g., Ellis & O’Donnell; Johnson; Williams & Rebuschat, this volume).
My second goal is to elaborate on how these principles derived from experimental studies in the laboratory can be put to use for specific problems arising in second language acquisition, and to sketch out some practical suggestions for bridging this laboratory-style research with L2 instructional practices. The upshot is that statistical learning can be used as a diagnostic toolkit for identifying learner needs and pinpointing specific areas for improvement in adult language learners. In addition, statistical learning principles can be effectively implemented as supportive solutions for enhancing instruction and curricula. By considering the implications that statistical learning research may hold for practical aspects of learning, I hope to indicate some tentative directions for the integration of basic and applied research. Lastly, a third goal is to propose that computational analyses of language corpora and behavioral experiments can be jointly used in the service of the two goals above.
2. Origins and development of statistical learning
The origins of probabilistic approaches to language can be traced back to structural linguistics and its focus on finding regularities in languages. In the 1950s, Zelig Harris (1954) proposed a series of heuristics for discovering phonemes and morphemes based on the distributional properties of these units in natural languages. For Harris and other distributional linguists, the process of discovering the structure of an unknown language (e.g., an indigenous language of the Amazon basin) was akin to cracking a code created with a secret language. This intuitive idea was explored mathematically by Claude Shannon (1948), who developed encryption/decryption systems during World War II based on the statistical structure of sequences of letters in an encrypted message. His concept of entropy in information theory describes how much ‘uncertainty’ there is in a signal transmitted over time. This uncertainty can be reduced if, for example, one knows that the character E is much more frequent in English than the character Z. Shannon (1951) contributed the first rigorous statistical approach to language as a sequence of letters.
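Shannon’s entropy for a distribution over letters is H = −Σ p(x) log₂ p(x), measured in bits. The short Python sketch below computes it from the letter counts of a text and compares it with the maximum for a 26-letter alphabet in which every letter is equally likely (log₂ 26 ≈ 4.70 bits); any skew in the distribution, such as E being far more frequent than Z, lowers H. The sample sentence is an arbitrary assumption; a realistic estimate would require a large corpus.

```python
import math
from collections import Counter

def letter_entropy(text):
    """Shannon entropy (bits per letter) of the letter distribution in `text`."""
    letters = [c for c in text.upper() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

MAX_ENTROPY = math.log2(26)  # all 26 letters equiprobable: maximal uncertainty

sample = ("The frequency of the letter E is much higher in English text "
          "than the frequency of the letter Z")
counts = Counter(c for c in sample.upper() if c.isalpha())
print("E:", counts["E"], "Z:", counts["Z"])
print(f"H = {letter_entropy(sample):.2f} bits vs. maximum {MAX_ENTROPY:.2f} bits")
```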
In the 1950s, information-theoretic ideas inspired much work in the nascent field of cognitive psychology, including the first experimental studies on the learnability of formal linguistic systems. In a project named Grammarama, George Miller (1956, 1967) asked adult participants to memorize strings of letters such as XLLVXL that, unbeknownst to them, were either random or followed a set of grammatical rules of sequencing (e.g., L must follow X). The grammatical strings were generated by devices
called finite-state grammars, a class of possible formal grammars, and were
hence called artificial grammars. Miller found that participants memorized
grammatical strings much more quickly than random strings, suggesting
they had become sensitive to some of the rules generating such strings.
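To make the notion of a finite-state grammar concrete, the Python sketch below defines a small grammar as a table of state transitions, uses it to generate strings by a random walk, and checks whether a given string could have been produced by it. The states and transitions are a hypothetical grammar chosen so that L must follow X and so that XLLVXL is grammatical; they are not Miller’s actual Grammarama rules.

```python
import random

# Hypothetical finite-state grammar (not Miller's actual rules).
# Each state maps to the (symbol, next_state) transitions allowed from it.
TRANSITIONS = {
    0: [("X", 1)],
    1: [("L", 2)],              # L must follow X
    2: [("L", 2), ("V", 3)],
    3: [("X", 1)],
}
START, ACCEPTING = 0, {2}

def accepts(string):
    """Return True if the grammar can generate `string`."""
    state = START
    for symbol in string:
        moves = dict(TRANSITIONS[state])
        if symbol not in moves:
            return False
        state = moves[symbol]
    return state in ACCEPTING

def generate(rng, min_len=4, stop_prob=0.3):
    """Random walk through the grammar; it may stop only in an accepting state."""
    state, out = START, []
    while True:
        if state in ACCEPTING and len(out) >= min_len and rng.random() < stop_prob:
            return "".join(out)
        symbol, state = rng.choice(TRANSITIONS[state])
        out.append(symbol)

rng = random.Random(42)
grammatical = [generate(rng) for _ in range(5)]
random_strings = ["".join(rng.choice("XLV") for _ in range(6)) for _ in range(5)]
print(grammatical)
print([(s, accepts(s)) for s in random_strings])
```

Most random strings over the same letters are rejected, which is the contrast Miller’s participants were implicitly exposed to.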
In the 1960s, interest declined in both theoretical and behavioral approaches to distributional learning. The distributional methods of Harris were seen as insufficient to capture the hierarchical linguistic relations postulated by Chomsky (1957). Similarly, Miller’s attempts to explore the learnability of language-like systems were questioned because of a perceived lack of common ground between artificial grammars and natural languages, which made generalizations from one to the other implausible. For a couple of decades, distributional approaches played a minor role in the study of language. Artificial grammars were seen as better suited to the study of general processes of implicit learning, not necessarily related to language acquisition (e.g., Reber, 1967).
It was not until the 1990s that artificial grammars were applied to infants and toddlers (e.g., Mattys, Jusczyk, Luce, & Morgan, 1999; Saffran, Aslin, & Newport, 1996), documenting their remarkable abilities to use a variety of probabilistic regularities in the speech signal. Adult learners were also studied on the assumption that they usefully approximated ‘human simulations’ of infant learning (Gillette, Gleitman, Gleitman, & Lederer, 1999; Redington & Chater, 1996). Moving beyond the implicit learning studies of the earlier decades, researchers began to create miniature languages that more closely mimicked distributional and structural aspects of natural languages (see sections 3 to 5 below for practical examples). The 1990s also saw the development of sophisticated computational analyses of language corpora. For example, Nick Chater and colleagues (Redington, Chater, & Finch, 1998) provided large-scale computational analyses of transcribed child-directed speech showing that distributional information may actually be extremely useful to children in acquiring the abstract syntactic categories of words, such as nouns and verbs (see also Redington & Chater, 1996, 1997).
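The core intuition behind such analyses can be sketched in a few lines of Python: represent each word by counts of its immediate left and right neighbours, then compare words by the cosine similarity of these context vectors. Words of the same category tend to occur in similar contexts and therefore end up close together. The toy corpus is hypothetical, and published analyses such as Redington, Chater, and Finch (1998) use far larger corpora and additional steps (such as clustering) that are not shown here.

```python
import math
from collections import Counter, defaultdict

def context_vectors(tokens):
    """Map each word to counts of its immediate left (L:) and right (R:) neighbours."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i > 0:
            vectors[word]["L:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            vectors[word]["R:" + tokens[i + 1]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def nearest(word, vectors, candidates):
    """Return the candidate whose contexts are most similar to those of `word`."""
    return max((c for c in candidates if c != word),
               key=lambda c: cosine(vectors[word], vectors[c]))

# Hypothetical toy corpus in the style of child-directed speech.
corpus = ("the dog runs . the cat runs . a dog sleeps . a cat sleeps . "
          "the dog eats . the cat eats .").split()
vectors = context_vectors(corpus)
targets = ["dog", "cat", "runs", "sleeps", "eats"]
for word in targets:
    print(word, "->", nearest(word, vectors, targets))
```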
Recently, an even more direct link between statistical learning and natural language has been documented. Studies that directly compare statistical learning and language processing (e.g., in within-subject designs where the same participants are tested on statistical learning as well as natural language tasks) are finding that similar cognitive and neural mechanisms may be recruited for both syntactic processing of linguistic stimuli and statistical learning of structured sequence patterns more generally (Christiansen,
For each frame type found in the ELP corpus, the conditional probability was estimated as above. Figure 1 provides a histogram showing the distribution of phonotactic frames (left panel) and orthotactic frames (right panel) as a function of how likely they are to flank an L, computed over the occurrences in the English corpus in which either an L or an R was found between them. The bar height indicates the number of frame types with a given probability of having an L between them. There are 100 bins in the histogram, so each bin spans a probability range of .01. The figure illustrates that the distribution of frames in English words is strongly bimodal: most frames are associated with either an L or an R segment, but not both. Indeed, the left- and right-most bins account for 60% of the frame types occurring with L and R in speech. The analysis thus shows the distribution to be highly informative for identifying two distinct categories, and the pattern is similar for letters and phonemes. In other words, a typical L1 or L2 speaker exposed to reasonable amounts of natural English input will experience L (but not R) mostly in the frame K_E and R (but not L) mostly in the frame Z_A. Even if this fact is unbeknownst to speakers at the conscious level, the information is important for SLA researchers and language teachers when identifying learning difficulties or when designing materials that may support the learning of the L/R distinction. This is because these consistent frames become predictive of when an L or an R is more likely. Thus, speakers may act upon this information in their regular language use. As a corollary, if learners are helped to become sensitive to this type of cue, their implicit mechanisms may become tuned to use it predictively as well.
Figure 1. Phonemes and letters immediately flanking L and R in English words are highly predictive of either L or R. y axis: the number of frame types that predict an L (as opposed to an R); x axis: the probability of predicting an L versus an R. Of the 589 phonotactic frame types and 589 orthotactic frame types found in English, most predict an L with a high degree of certainty (rightmost column) or predict an R with a high degree of certainty (leftmost column; a probability = 0 for predicting L equals a probability = 1 for predicting R). Analyses with frame tokens exhibit a similar bimodal distribution.
3. This corpus is composed of more than 40,000 English word types accompanied by their log-frequency of use. The corpus data reported here are part of a manuscript in preparation. The experimental data reported in Section 3.2 form part of a thesis for the Advanced Graduate Certificate in Second Language Studies at the University of Hawaii (Uchida, 2010).
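The frame analysis described above can be sketched in Python as follows: for every L or R in a word list, record the pair of letters that flank it (with # marking a word edge), estimate P(L | frame) as the proportion of that frame’s occurrences that contain an L rather than an R, and sort the frame types into 100 bins of width .01. The eight-word list is a hypothetical stand-in for the ELP corpus, and the sketch counts word types only, ignoring the log-frequency information that accompanies the corpus.

```python
from collections import Counter, defaultdict

def lr_frames(word):
    """Yield ((left, right), segment) for every l or r in `word`; '#' marks a word edge."""
    padded = "#" + word.lower() + "#"
    for i in range(1, len(padded) - 1):
        if padded[i] in "lr":
            yield (padded[i - 1], padded[i + 1]), padded[i]

def frame_probabilities(words):
    """Estimate P(L | frame) = n_L / (n_L + n_R) for each frame type."""
    counts = defaultdict(Counter)
    for word in words:
        for frame, segment in lr_frames(word):
            counts[frame][segment] += 1
    return {frame: c["l"] / (c["l"] + c["r"]) for frame, c in counts.items()}

def histogram(probabilities, n_bins=100):
    """Count frame types per probability bin of width 1 / n_bins."""
    bins = [0] * n_bins
    for p in probabilities.values():
        bins[min(int(p * n_bins), n_bins - 1)] += 1  # p = 1.0 goes in the last bin
    return bins

# Hypothetical word list standing in for the ELP corpus.
words = ["play", "pray", "slow", "flow", "grow", "tree", "free", "blue"]
probs = frame_probabilities(words)
bins = histogram(probs)
print(probs)
print("left-most bin:", bins[0], "right-most bin:", bins[-1])
```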
3.2. An experiment with English pseudowords
Computational analyses of natural language such as the one above provide a way to estimate the potential informativeness of a cue inherent in a language, here the phonotactics and orthotactics of English. Do people actually use such cues? This question was tested in a letter-guessing game similar to the classic ‘hangman’, in which native English speakers and Japanese learners of English were presented with a list of orthographic pseudowords lacking one letter (e.g., SA*G). The task was to guess which of two letters was more likely for a given pseudoword. Critical trials contained the R-L pair (“Is L or R the missing letter?”), while filler trials contained other letter pairs (e.g., “Is M or N the missing letter?”). The 90 critical trials were built from frames in the corpus analyses above. One third consisted of the 30 most frequent frames in the right-most bin of Figure 1, that is, frames that are in principle highly informative in predicting an L (L-informative). Another 30 critical trials consisted of the 30 most frequent frames in the left-most bin of Figure 1 (R-informative). As a control, the remaining 30 critical trials used frames with a probability close or equal to 0.5, that is, the least informative frames (LR-ambiguous).
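The selection of critical trials can be expressed as a simple filter over frame statistics. In the Python sketch below, each frame type carries its estimated P(L | frame) and a frequency; the function picks the k most frequent frames from the right-most bin (L-informative), from the left-most bin (R-informative), and from around 0.5 (LR-ambiguous). The ±0.05 tolerance for ‘close to 0.5’ and the example numbers are assumptions, not the values used in the study.

```python
def select_critical_frames(frames, k=30, tolerance=0.05):
    """frames: dict mapping frame -> (p_L, frequency).

    Returns the k most frequent L-informative, R-informative, and
    LR-ambiguous frames, using .01-wide extreme bins as in Figure 1.
    """
    by_frequency = sorted(frames, key=lambda f: frames[f][1], reverse=True)
    l_informative = [f for f in by_frequency if frames[f][0] >= 0.99][:k]
    r_informative = [f for f in by_frequency if frames[f][0] < 0.01][:k]
    ambiguous = [f for f in by_frequency if abs(frames[f][0] - 0.5) <= tolerance][:k]
    return l_informative, r_informative, ambiguous

# Hypothetical frame statistics: frame -> (P(L | frame), frequency).
example = {
    ("p", "a"): (0.50, 10),
    ("s", "o"): (1.00, 5),
    ("b", "u"): (1.00, 8),
    ("g", "o"): (0.00, 7),
    ("t", "e"): (0.00, 3),
}
l_frames, r_frames, lr_frames = select_critical_frames(example, k=1)
print(l_frames, r_frames, lr_frames)
```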
Results indicated that both native English speakers and Japanese learners of English preferred L-responses most for trials containing frames predictive of L and least for trials containing frames predictive of R. For frames that were ambiguous between L and R, the difference in preference for L or R was not significant in either the native or the non-native group. Thus, both groups’ responses reflected the bimodal distribution of orthographic frames in English. Participants did not just make a random guess about a single letter in isolation. Rather, their linguistic choices under uncertainty were guided intuitively by the integration of a larger context of information, the sublexical distribution of letters in English. There were two further interesting results pertaining to the Japanese participants. First, their preference for reading in English (measured on a self-assessment scale) correlated with better predictions for the missing letter, suggesting that experience with reading texts in a second language may naturally induce sensitivity to orthotactics. Second, the Japanese participants performed poorly on the classic perceptual discrimination task with various spoken tokens of /l/ versus /r/, which tests perception of sounds in isolation. The encouraging results on their ability to use orthotactics leave open the possibility that, when tested on a perception task that involves phonotactic frames, these participants may improve their perception judgments and better discriminate /l/ and /r/ sounds. Thus, perceiving novel speech contrasts may be hardest when sounds are tested in isolation, and the difficulty may be attenuated in the presence of other sources of information available in the signal (coarticulation may be another cue, not investigated here).
At present, these results suggest that pronunciation practices that situate learning targets within highly informative phonotactic contexts may also be advantageous in principle. Inviting L2 English learners to listen to statements and then say whether they are ‘obviously true’, ‘strongly implied’, or ‘clearly false’ may help surmount the problems of training with minimal pairs. Because these phrases exhibit a variety of /l/ and /r/ frames, some of them strongly favoring one phoneme or the other (refer to the underlined segments), learners who have already had some exposure to the language may find them easier to produce. Also, the pedagogical emphasis can be on communication and intelligibility, rather than on the far more difficult ability to distinguish between sounds in contexts that disguise such distributional tendencies (e.g., ‘light’ and ‘right’). In the next section, I illustrate a case of useful distributional regularities above the lexical level: the discovery of non-adjacent morphosyntactic relations in language.
4. Learning principle II: Seek invariance
Various aspects of inflectional morphology, such as gender and number agreement on noun phrases and verb phrases, remain particularly difficult to master even for second language learners at advanced levels of proficiency (e.g., Montrul, Foote, & Perpiñán, 2008; Slabakova, 2008). The phenomenon has generated a lively debate on the nature of such insensitivity, with some accounts claiming a lack of access to L1-like linguistic knowledge, and others placing the burden on online processing deficiencies (for a review, see Clahsen & Felser, 2006). While much attention has been devoted to the theoretical underpinnings of such insensitivity, and pedagogical research has addressed improving morphosyntax (see, e.g., Spada & Tomita’s 2010 meta-analysis of the effects of instruction on simple and complex linguistic features), few studies have taken underlying statistical learning ability into account, although studies have looked at the influence of frequency of forms (e.g., Ellis & Schmidt, 1998) and the role of implicit and explicit learning (e.g., Robinson, 2005). Are there ways to improve L2 processing abilities, for instance by making the target structures distributionally salient?
4.1. Artificial languages that mimic natural language
As noted in the introduction, artificial grammar learning tasks have been used extensively to inquire into the nature of human implicit processes (Cleeremans, Destrebecqz, & Boyer, 1998; Shanks, 2005) and their relationship to language knowledge (Kaufman, DeYoung, Gray, Jimenez, Brown, & Mackintosh, 2010; Misyak & Christiansen, 2011). Tasks that tap into language processes typically involve exposing participants to sentence-like sequences of word-like stimuli (presented either visually or auditorily), such as these: pel wadim jic, vot puser tood, dak wadim rud, vot loga tood. While these pseudo-sentences appear random, they respect underlying rules defined a priori by the experimenter, and learners exposed to limited exemplars in relatively brief sessions end up becoming sensitive to such rules, even though they often cannot explicitly verbalize what the hidden rules were. For example, the pseudo-sentences above were constructed by Gomez (2002; Figure 2) to simulate the learning of non-adjacencies similar to morphosyntactic agreement and other non-local structural regularities in natural languages: each specific first word always predicts a specific third word. In the examples above, pel predicts jic, vot predicts tood, and dak predicts rud consistently (e.g., P(jic | pel) = 1), while the middle word has no predictive value; for example, wadim precedes each third word with equal probability (P(jic | wadim) = 0.33). It is possible to test learners’ implicit knowledge after training by presenting grammatical sentences (pel wadim jic) as well as ungrammatical sentences (*pel wadim rud), and even sentences that have zero probability in the training input: for instance, the sentence pel hiftam jic is not encountered during the learning phase, but it crucially maintains the correct structural non-adjacent relation (pel __ jic).
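The dependency structure just described can be checked by counting. The Python sketch below builds a training set in which each first word (pel, vot, dak) is paired with its third word (jic, tood, rud) and crossed with every middle word, then estimates the two conditional probabilities mentioned in the text. It also encodes the test criterion under which a sentence counts as grammatical if its first and last words form a trained dependency, even when the middle word is new (pel hiftam jic). The choice of three middle words and the fully crossed design are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Dependencies from the examples in the text; the middle-word set is an assumption.
FRAMES = [("pel", "jic"), ("vot", "tood"), ("dak", "rud")]
MIDDLES = ["wadim", "puser", "loga"]

def make_corpus(frames, middles):
    """Fully crossed training set: every frame occurs once with every middle word."""
    return [(a, x, c) for (a, c), x in product(frames, middles)]

def conditional(corpus, given_pos, target_pos):
    """Estimate P(word at target_pos | word at given_pos) by counting."""
    joint = defaultdict(Counter)
    for sentence in corpus:
        joint[sentence[given_pos]][sentence[target_pos]] += 1
    return {g: {t: n / sum(c.values()) for t, n in c.items()} for g, c in joint.items()}

def is_grammatical(sentence, frames=FRAMES):
    """Grammatical if first and last words form a trained dependency; the middle may be new."""
    return (sentence[0], sentence[-1]) in set(frames)

corpus = make_corpus(FRAMES, MIDDLES)
p_third_given_first = conditional(corpus, 0, 2)
p_third_given_middle = conditional(corpus, 1, 2)
print(p_third_given_first["pel"])     # jic always completes pel __
print(p_third_given_middle["wadim"])  # wadim precedes each third word equally often
print(is_grammatical(("pel", "hiftam", "jic")), is_grammatical(("pel", "wadim", "rud")))
```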
In my brief review of the origins of statistical learning, I noted that an important development in the use of artificial grammars has been their much closer contact with natural language phenomena. For example, Gomez (2002) noted that sequences in natural languages typically involve some items belonging to a relatively small set (functor words and morphemes like am, the, -ing, -s, are) interspersed with items belonging to a very large set (e.g., nouns, verbs, adjectives). Such asymmetry translates into patterns of highly invariant non-adjacent items separated by highly variable material (am cooking, am working, am going, etc.). How do learners detect non-adjacent invariant structures? Gomez showed that the variability of the material intervening between dependent elements (the first and third word in her study) modulates the ability to detect non-local dependencies in the grammar above. Learning improves consistently as the variability of the elements that occur between two dependent items increases. One explanation for this pattern is that when the set of items that participate in the dependency is small relative to the set of intervening elements, the non-adjacent dependencies stand out as invariant structure against the changing background of more varied material, as in pel wadim jic, pel puser jic, pel coomo jic, pel loga jic, dak coomo rud, dak wadim rud, dak puser rud, etc. (see Figures 2 and 3, columns 2–5; the different intervening words are indicated as indexed Xs).
Figure 2. The underlying structure of the artificial grammars used by Gomez (2002; columns 2–5) and Onnis et al. (2003; 2004; columns 1–5). Sentences with three non-adjacent dependencies are constructed with an increasing number of possible intervening X words. Gomez used 2, 6, 12, and 24 intervening words. Onnis et al. added a new condition in which X = 1.
This effect also holds in the absence of variability, when the intervening word is shared by different non-adjacent items, as in pel wadim jic, vot wadim tood, dak wadim rud (the word wadim is common to all sentences; see Figures 2 and 3, first column), because the intervening material becomes invariant with respect to the variable dependencies (Onnis, Christiansen, Chater, & Gomez, 2003). In natural languages, long-distance relationships such as singular and plural agreement between noun and verb may in fact be separated by the same material, for example the books on the shelf are dusty and the book on the shelf is dusty.
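One way to see why variability matters is to compare adjacent and non-adjacent statistics as the set of X words grows. The Python sketch below builds a fully crossed training set for each set size used by Gomez (2002), plus the single-X condition of Onnis et al. (2003), and reports P(x1 | pel), the probability that one particular middle word follows pel, alongside P(jic | pel __). The X words are hypothetical placeholders. This is only a descriptive statistic of the input, not a model of learning, and it does not by itself reproduce the U-shaped curve.

```python
from itertools import product

FRAMES = [("pel", "jic"), ("vot", "tood"), ("dak", "rud")]

def adjacent_vs_nonadjacent(n_middles):
    """Return (P(x1 | pel), P(jic | pel __)) in a fully crossed corpus with n_middles X words."""
    middles = [f"x{i}" for i in range(1, n_middles + 1)]  # hypothetical X words
    corpus = [(a, x, c) for (a, c), x in product(FRAMES, middles)]
    after_pel = [s for s in corpus if s[0] == "pel"]
    p_adjacent = sum(1 for s in after_pel if s[1] == "x1") / len(after_pel)
    p_nonadjacent = sum(1 for s in after_pel if s[2] == "jic") / len(after_pel)
    return p_adjacent, p_nonadjacent

for n in (1, 2, 6, 12, 24):
    adjacent, nonadjacent = adjacent_vs_nonadjacent(n)
    print(f"|X| = {n:2d}: P(x1 | pel) = {adjacent:.3f}, P(jic | pel __) = {nonadjacent:.3f}")
```

As the set of X words grows, the adjacent statistic shrinks toward zero while the non-adjacent one stays at 1; with a single X, the middle word itself becomes the invariant element.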
Importantly, while artificial grammars at first appear limited in their generativity, they can be used to test learners’ ability to generalize correctly to novel sentences never encountered in training. For example, the ability of adult learners to endorse pel hiftam jic while rejecting *pel hiftam rud (where hiftam is a new word) is modulated by the same conditions of zero or high variability (Onnis, Monaghan, Christiansen, & Chater, 2004). The upshot of these studies on variability is that learners show a striking tendency to detect variant versus invariant structure, a tendency that is in turn adaptive to the informational demands of their input (for putative mechanisms responsible for this effect, see Gomez & Maye, 2005). Moreover, learning a rule such as a non-adjacent dependency is not an all-or-none phenomenon, but is mediated by the distributional properties of the input in which such dependencies happen to be experienced.
Figure 3. Data from Onnis et al. (2003) incorporating the original Gomez experiment. Learning of non-adjacent dependencies results in a U-shaped curve as a function of the variability of intervening X words, in five conditions of increasing variability.
4.2. Invariance is at hand
A complementary line of studies using artificial grammars and corpora of naturalistic child-directed speech has highlighted how invariant linguistic structure may be at hand, namely available to learners within a short window of time. Here I would like to show how invariant features of language can be detected when two sentences are allowed to partially overlap in immediate succession in a text or in the speech stream. In a study by Onnis, Waterfall, and Edelman (2008), adult learners were asked to find the novel words of an alien language in unparsed (unsegmented) whole sentences such as kedmalburafuloropesai. In the absence of acoustic and prosodic cues (the sentences were generated by speech synthesis software), each sentence could in principle be composed of a range of possible words, from a single long word to as many words as there were identifiable syllabic clusters. Again, as the words in the sentences were all novel, the task was difficult, and it simulated some of the features and conditions involved in second language learning. Learners were significantly better at the word segmentation task when some of the sentences in the training set were ordered so as to partially repeat themselves one after the other (e.g., kedmalburafuloropesai, rafuloro), as opposed to a control learning situation in which no sentences overlapped immediately (although the training set was composed of exactly the same sentences in both conditions). Such immediate partial repetition across sentences facilitates comparison. When aligned, the partial overlap of the two sentences suggests three candidate units (kedmalbu, rafuloro, pesai) without the need for learners to entertain all possible unit candidates over several sentences. Importantly, the study also found evidence for a global learning effect. That is, not only did learners more reliably prefer word units heard in partially self-repeated sentences during learning, but they also more accurately segmented units that never occurred in such an order (e.g., gianaber, kiciorudanamjeisulcaz). Similar results were found in a second experiment in which the phrasal structure of sentences was to be discovered, suggesting that the same mechanisms of comparison of invariant structure can signal structure at different levels of linguistic analysis.
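The alignment step can be approximated with Python’s standard difflib module: find the longest substring that an utterance shares with its immediate neighbour and split the utterance around it. This is a minimal sketch that assumes unsegmented strings and a single shared chunk; the minimum overlap of three characters is an arbitrary assumption, and the sketch is not a model of the learners in Onnis, Waterfall, and Edelman (2008).

```python
from difflib import SequenceMatcher

def split_by_alignment(utterance, neighbour, min_overlap=3):
    """Split `utterance` around the longest chunk it shares with `neighbour`."""
    matcher = SequenceMatcher(None, utterance, neighbour, autojunk=False)
    match = matcher.find_longest_match(0, len(utterance), 0, len(neighbour))
    if match.size < min_overlap:
        return [utterance]  # too little overlap: leave the utterance unsegmented
    start, end = match.a, match.a + match.size
    pieces = (utterance[:start], utterance[start:end], utterance[end:])
    return [piece for piece in pieces if piece]

print(split_by_alignment("kedmalburafuloropesai", "rafuloro"))
print(split_by_alignment("gianaber", "xyz"))
```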
How can these laboratory studies inform L2 instruction? First, it is possible to construct L2 teaching materials that reflect the principles of variability over invariant structure described above. Given the instructed nature of L2 learning, the input to an L2 learner can be manipulated to a large extent, and much more flexibly than the input to a child. For instance, applying the concept of large variability to morphosyntactic relations in Spanish might involve a sequence of sentences like:
Tengo las botas para el matrimonio.
Tengo las películas para el fin de semana.
Tengo las pelotas para el niño.
In the examples above, the feminine gender and plural number agreement (las __ -as) is repeated while the intervening lexical items are modified (bot-, películ-, pelot-). The prediction is that a large enough number of intervening words should facilitate the extraction of the invariant non-adjacent relation (las __ -as), either implicitly or by promoting the explicit noticing of the target structure (Schmidt, 2001). It may also be useful to keep the non-target elements of the sentence constant, so that only the target elements vary. For instance, following the zero variability condition described in Onnis et al. (2003; 2004), the following learning situation, in which the same lexical item is shared between two gender-agreement constructions in Spanish, could also have a facilitative effect:
Mabel es la amiga de Carlos.
Carlos es el amigo de Maria.
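The contrast between what varies and what stays constant in such a variation set can be made concrete with a small sketch. The Python below is a hypothetical illustration, not the procedure used by Onnis et al.: it assumes simple whitespace tokenization, uses the standard-library `difflib.SequenceMatcher` to keep only the tokens shared (in order) by every sentence, and then uses `os.path.commonprefix` on reversed strings to find the ending shared by the words filling the slot after las. The helper names are invented for illustration.

```python
from difflib import SequenceMatcher
from os.path import commonprefix


def invariant_tokens(sentences):
    """Tokens shared, in the same order, by every sentence (hypothetical helper)."""
    frame = sentences[0].split()
    for sentence in sentences[1:]:
        other = sentence.split()
        matcher = SequenceMatcher(None, frame, other, autojunk=False)
        # Keep only the aligned stretches that both token lists have in common.
        frame = [frame[block.a + k]
                 for block in matcher.get_matching_blocks()
                 for k in range(block.size)]
    return frame


def slot_fillers(anchor, sentences):
    """The token that immediately follows `anchor` in each sentence."""
    fillers = []
    for sentence in sentences:
        tokens = sentence.split()
        fillers.append(tokens[tokens.index(anchor) + 1])
    return fillers


def common_suffix(words):
    """Longest ending shared by all words (commonprefix compares character by character)."""
    return commonprefix([w[::-1] for w in words])[::-1]


variation_set = [
    "Tengo las botas para el matrimonio",
    "Tengo las películas para el fin de semana",
    "Tengo las pelotas para el niño",
]

print(invariant_tokens(variation_set))               # ['Tengo', 'las', 'para', 'el']
print(slot_fillers("las", variation_set))            # ['botas', 'películas', 'pelotas']
print(common_suffix(slot_fillers("las", variation_set)))  # as
```

The shared frame plus the shared ending recovers the las __-as pattern. A fuller learner model would also have to weigh how often a frame recurs across input; this sketch only shows the alignment-and-comparison step.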
I have described principles of distributional learning that exploit the
contrast between variable and constant materials. As mentioned earlier,
one of the goals of this chapter is to show how such principles may apply
to a range of learning situations, and thus be reusable in different tasks.
Theoretically, this raises the possibility of understanding human learning
in terms of a relatively small set of mechanisms. Practically, second language
learning problems that are treated as different or unrelated may be amena-
to-picture mappings for a given trial. For instance, when four words and
four pictures were presented simultaneously on each trial, there were
4 × 4 = 16 possible word-referent combinations (Figure 4).
The participants’ task was to infer the correct word-picture mappings
across these training trials. At test, they heard a single word and selected
the picture (among four) that they thought mapped onto that word.
Crucially, this task can only be solved if relations between words and
referents are tracked across multiple trials, hence the term cross-situational
learning (e.g., Yu & Smith, 2007). Onnis and colleagues showed that a
learning condition in which a single word-referent pair was repeated across
two successive trials (while all other pairs differed) contributed to the
immediate disambiguation of the word-scene mappings. Importantly, even
pairs that had not appeared in such contiguous conditions were learned,
suggesting that local alignment and comparison benefited not only the
pairs involved locally but also the learning of form-meaning pairs globally.
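A minimal associative sketch shows why tracking relations across trials resolves an ambiguity that no single trial can. It is a simplified counting model under stated assumptions, not the model or design of Onnis, Edelman, & Waterfall (2011): the six pseudo-words and picture labels are hypothetical, every 4-word subset of the lexicon forms one trial, the contiguous-repetition manipulation is not modeled, and the learner simply tallies word-picture co-occurrences.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical lexicon: six novel words and the pictures they refer to.
lexicon = {"bosa": "cup", "gasser": "dog", "manu": "shoe",
           "colat": "tree", "pilk": "car", "tiva": "bird"}
words = list(lexicon)
pictures = list(lexicon.values())

# Each trial shows 4 words and their 4 referents, with no cue to which goes with which
# (the picture order is reversed to stress that position carries no information).
trials = [(list(ws), [lexicon[w] for w in reversed(ws)])
          for ws in combinations(words, 4)]


def count_cooccurrences(trials):
    """Tally every word-picture pairing seen on each trial (4 x 4 = 16 per trial)."""
    counts = Counter()
    for trial_words, trial_pictures in trials:
        for pair in product(trial_words, trial_pictures):
            counts[pair] += 1
    return counts


def guess_referent(word, counts):
    """At test, choose the picture that co-occurred most often with the word."""
    return max(pictures, key=lambda picture: counts[(word, picture)])


counts = count_cooccurrences(trials)
print(len(trials))                                       # 15
print(counts[("bosa", "cup")], counts[("bosa", "dog")])  # 10 6
print({w: guess_referent(w, counts) for w in words})
```

Within any one trial all 16 pairings are equally supported; only across trials does the correct referent of each word (10 co-occurrences) pull ahead of the competitors (6 each).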
Cross-situational learning offers a useful approach to modeling naturalistic
L2 learning, in addition to yielding results that can potentially inform
foreign language instruction. For example, Robinson and Ellis (2008),
following Slobin (1996), have referred to the adjustment required to use
conceptually novel form-meaning correspondences as ‘rethinking for speaking’.
In this respect, word-referent learning experiments may shed light on
the learning of lexical items and the structuring of conceptual domains in a
second language. In addition, L2 vocabulary acquisition takes place in a
rich extra-linguistic context. Quine (1960) illustrated this in his hypothetical
account of a field linguist attempting to discern the meaning of the
word gavagai (see also Ellis, 2005). To approximate the complexity of
cross-situational learning in naturalistic environments, laboratory studies
may need to increase the number of referents in a given trial, since visual
scenes in the real world typically present much richer evidence for learning
(see Cenoz & Gorter, 2008, for a multimodal account of the L2 linguistic
landscape). Because the composition of actual visual scenes may overwhelm
learners’ computational abilities, prior knowledge of conceptual and social
domains, which is readily available to L2 learners, could also be incorporated
in further experiments. Such experiments could inform the design of
computer-based vocabulary tutorials and explore how learners might solve
Quine’s dilemma outside of virtual learning environments.
Figure 4. Two possible trials in the cross-situational learning paradigm used by Onnis, Edelman, & Waterfall (2011). Within any single trial, the simultaneous presentation of 4 novel words and 4 referents makes the form-meaning mapping impossible to determine. However, across trials learners were able to reduce uncertainty by comparing the elements that changed with those that stayed constant. In the example here, one word-referent pair remains constant across the two trials. Which one is it?
The training interventions briefly envisioned here await robust evidence
before they can be adapted to real-world L2 scenarios, but they open up
ways to connect basic research to instructional concerns. In the following
section, I conclude my overview of illustrative examples by looking at how
meaning can be inferred from distributions of words across texts, and how
knowledge of lexical distributions improves reading fluency.
6. Learning principle IV: Learn to predict
Corpus analyses suggest that many words entail probabilistic semantic
consequences that can be expressed as expectations for upcoming words.
For instance, in English, the verb provide typically precedes positive words,
as in to provide assistance/benefits/relief, while the verb cause typically
precedes negative items, as in to cause death/damage/disruptions (Sinclair,
1996). Interestingly, while the denotational meaning of, say, cause involves
an agent and an effect, there is little reason to assume a priori that in
actual use cause should be associated with negative words (Guo et al., in
press). Furthermore, while many speakers are fortunate enough never to
experience directly negative events such as bleeding, war, and death, they
learn, for instance, that wars are caused by famine rather than provided by
famine. Thus, although this is not the only way of discovering meaning, the
connotational meaning of certain words may emerge from how they are
distributed over the co-text and co-speech of their occurrences in natural
language. On these assumptions, connotational meaning naturally lends itself
to being modeled by distributional analyses of corpora.
One class of available computational models of semantic knowledge,
semantic space models, represents each word as a vector in a
high-dimensional state space (Rogers & McClelland, 2004; Vigliocco, Vinson,
Lewis, & Garrett, 2004). The meaning of a target word is derived from the
frequency distribution of the words that occur in its immediate context
across a large corpus. This method captures empirically the intuition that
words that occur in the same sorts of contexts tend to be similar in meaning.
For example, road and street are similar because they occur in similar
co-texts (down the road/street, cross the road/street, the road/street to the
left) and are dissimilar from tea and coffee, which co-occur with other words
(coffee/tea and sugar, pour a cup of coffee/tea).
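A toy version of a semantic space model can be written in a few lines. The Python sketch below is a hedged illustration, not the models cited above: it assumes a tiny hand-made corpus built from the co-texts just mentioned, raw co-occurrence counts within a window of two words on either side, and cosine similarity as the comparison measure (a common choice, although realistic models add weighting schemes and dimensionality reduction and are trained on large corpora).

```python
import math
from collections import Counter, defaultdict

# Toy co-texts echoing the examples above; a real model would use a large corpus.
corpus = [
    "walk down the road", "walk down the street",
    "cross the road", "cross the street",
    "the road to the left", "the street to the left",
    "coffee and sugar", "tea and sugar",
    "pour a cup of coffee", "pour a cup of tea",
]


def context_vectors(sentences, window=2):
    """Count the words within `window` positions of each target word."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = context_vectors(corpus)
print(round(cosine(vectors["road"], vectors["street"]), 2))  # 1.0
print(round(cosine(vectors["road"], vectors["coffee"]), 2))  # 0.0
print(round(cosine(vectors["tea"], vectors["coffee"]), 2))   # 1.0
```

Because road and street share every context in this toy corpus, their vectors point in the same direction, while road and coffee share none.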
Using a vector space model, Onnis, Farmer, Baroni et al. (2009) were
able to derive the semantic orientation (valence tendency) of a number
of words such as cause, provide, encounter, markedly, largely, impressive,
purely on distributional grounds. This orientation was measured as a signed
value for each word. The authors further obtained independent human
values of semantic orientation in a sentence continuation task. Native
speakers of English were asked to provide a free completion for sentences
like “The mayor was surprised when he encountered...”. When the sentence
continuations were rated as positive or negative on a Likert scale by a
different group of participants, these ratings correlated significantly with
the values assigned by the vector space model in a purely distributional
way. This suggests that a) native speakers are sensitive to the general
semantic orientation of a word, and constrain their free productions to
accommodate it; and b) the semantic orientation of a word can be inferred
automatically from simple distributional properties of texts (the computer
model has no built-in notion of semantics). Computer models like this one
might approximate, to a fair degree, the cognitive mechanisms available to
human learners.
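To make the idea of a signed, distributionally derived orientation concrete, here is a deliberately simplified Python sketch. It is not the vector space model of Onnis, Farmer, Baroni et al. (2009): it assumes hand-picked positive and negative seed words and a toy corpus, and scores a word by how often seeds of each polarity occur among the next few words, echoing the idea that valence is expressed as expectations about upcoming words. The seed lists, corpus, and `orientation` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical seed lists and toy corpus; the study's actual model and materials differ.
POSITIVE = {"assistance", "relief", "benefits", "support"}
NEGATIVE = {"damage", "disruption", "death", "disease"}

corpus = [
    "they provide assistance to families",
    "we provide relief and benefits",
    "the agency will provide support",
    "the storm will cause damage",
    "leaks cause disruption and death",
    "smoking can cause disease",
]


def orientation(word, sentences, window=3):
    """Signed score in [-1, 1]: (positive - negative) / (positive + negative),
    counted over the `window` words that follow each occurrence of `word`."""
    tally = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            for following in tokens[i + 1:i + 1 + window]:
                if following in POSITIVE:
                    tally["pos"] += 1
                elif following in NEGATIVE:
                    tally["neg"] += 1
    total = tally["pos"] + tally["neg"]
    return (tally["pos"] - tally["neg"]) / total if total else 0.0


print(orientation("provide", corpus))  # 1.0
print(orientation("cause", corpus))    # -1.0
```

With a realistic corpus the scores would fall between the extremes, and could then be correlated with human ratings of sentence continuations, as in the study described above.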
The presence of valence tendencies may facilitate language comprehension
in real-time situations. If producing a given word in a sentence, say the
verb to encounter, prompted speakers to narrow down the set of possible
sentence continuations, then on the comprehender’s side sensitivity to this
semantic valence tendency may help anticipate the sentence continuation,
resulting in a measurable gain in comprehension fluency. This idea was
tested in a self-paced reading experiment in which the words of each
sentence were presented one at a time, and participants pressed a key to
read the next word. This procedure yields a reading time for each word in a
sentence. Reading was significantly slower in sentences that contained an
incongruent semantic orientation (e.g., the news on television caused
optimism in the audience) than in sentences that contained a congruent
semantic orientation (the news on television caused pessimism in the
audience). There is mounting evidence in the sentence processing literature
that humans use expectations as the sentence unfolds in order to reduce the
set of possible competitors for a word or sentence continuation (e.g.,
Altmann & Kamide, 1999; Tanenhaus et al., 1995). At each time step, the
linguistic processor uses the currently available input, and the lexical
information associated with it, to anticipate possible ways in which the
input might continue.
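One standard way to quantify such expectations, not specific to the studies cited here, is surprisal: the negative log probability of a continuation given the preceding context. The Python sketch below assumes hypothetical counts of positive and negative continuations after each verb, plus add-one smoothing; it only shows that a valence-incongruent continuation receives a higher surprisal, which is one way of formalizing the slower reading times reported above.

```python
import math
from collections import Counter

# Hypothetical valence labels of the words that followed each verb in a toy corpus.
CONTINUATIONS = {
    "caused": ["negative"] * 9 + ["positive"] * 1,
    "provided": ["positive"] * 8 + ["negative"] * 2,
}
CLASSES = ("positive", "negative")


def valence_probability(verb, valence, alpha=1.0):
    """P(valence | verb) with add-alpha smoothing over the two valence classes."""
    counts = Counter(CONTINUATIONS[verb])
    total = sum(counts.values())
    return (counts[valence] + alpha) / (total + alpha * len(CLASSES))


def surprisal(verb, valence):
    """Surprisal in bits: how unexpected a continuation of this valence is after the verb."""
    return -math.log2(valence_probability(verb, valence))


congruent = surprisal("caused", "negative")    # e.g. ... caused pessimism ...
incongruent = surprisal("caused", "positive")  # e.g. ... caused optimism ...
print(f"congruent: {congruent:.2f} bits, incongruent: {incongruent:.2f} bits")
```

Smoothing keeps an unseen valence from receiving infinite surprisal; the specific counts and the choice of alpha are illustrative only.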
An important consideration is that distributional patterns of words
afford speakers the fluent generativity needed to understand and produce
not only crystallized collocations (e.g., to cause damage, which has a high
co-occurrence frequency and is probably learned by rote), but also novel ‘on-the-fly’
combinations of words that are nonetheless congruent with the general
valence tendency of a given word. Thus, learning about distributions of
words in the lexicon may support generative processes and is not limited
to rote memorization processes.
Explaining how learners acquire new vocabulary, as well as how they
become fluent speakers, figures prominently in second language research.
In this section I have offered a glimpse of a distributional account of how
lexical semantics may be acquired and how it improves language fluency.
Researchers have long recognized the role of learning phraseology, for
example collocations and other extended units of meaning, in developing
proficiency (e.g., Boers et al., 2006; Gries, 2008). The study reported here
further shows that knowledge of language-specific selectional restrictions
and probabilistic tendencies is not a mere matter of sounding ‘native-like’
from a stylistic point of view. Rather, knowledge of language-specific
phraseology correlates with language fluency in native speakers (for studies
of L2 see Howarth, 1998; Onnis, 2001; Towell et al., 1996). The study also
offers some methodological advances. Proposals about vocabulary learning
have often been couched in qualitative mentalistic terminology that may not
provide fully causal and mechanistic explanations. Exactly how are the
denotational and connotational meanings of words learned? I have argued
that at least some aspects of lexical semantics can in principle be derived
distributionally from a corpus using simple computational procedures. While
still underdeveloped for instructional purposes, this approach opens up ways
to think about which types of texts, and which word distributions within
texts, can optimize the salience of, for example, semantic orientations.
Thus, one promise of computational modeling for second language learning
is the possibility of making assumptions explicit and testable under specific
conditions, both in computer simulations and with human learners.
7. Discussion: Distributional approaches to SLA
In this chapter I have proposed that by looking at language learning as
induction of patterns and generalizations over patterns, important insights
can be gained, not only for L1 but also for L2 research. I have further
suggested some ways in which L2 instruction inspired by the principles and
methodologies of statistical learning may help adult learners capitalize on
distributional information that correlates with different types of linguistic
structure at different levels of analysis: sublexical, morphosyntactic,
lexical and phrasal, and lexico-semantic. My overarching goal has been to
make the case for a closer integration of the research paradigm and methods
of statistical learning with research on second language acquisition. I also
wanted to stress the role of miniature artificial languages in unveiling
principles of adult human learning. To date, most miniature languages
involving adults have been intended to simulate scenarios of child language
acquisition. Adult learners are thought of as useful ‘human simulations’
(Gillette, Gleitman, Gleitman, & Lederer, 1999) that approximate some
learning behavior in infancy. However, these studies may also be directly
linked to adult second language acquisition, because adults already possess
knowledge of a linguistic system when they engage in learning a novel
miniature language. As such, artificial grammar experiments with adults can
be seen as useful human simulations of second language learning processes.
Sections 3 to 6 reviewed relevant literature and proposed four principles of
learning.
Section 3 contributed the idea that learning difficulties can be overcome
by integrating different probabilistic sources of information relevant to the
task at hand. A traditional view that sees language as separated into modular
representational levels (e.g., phonetic, phonemic, sublexical) may
underestimate the large redundancy of probabilistic information available in
the signal. On such a view, the perception of a foreign sound would be
treated as a purely acoustic problem, and its solution sought at the acoustic
level only. Instead, phonotactic and orthotactic regularities (along with
other information yet to be assessed) may help learners recognize the
difficult sound.
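The integration principle can be illustrated with a standard Bayesian cue-combination sketch. This is a swapped-in textbook technique, not a model from the studies reviewed in Section 3: the Python below assumes that the acoustic and phonotactic cues are conditionally independent, and all probabilities are invented for illustration (an ambiguous /l/-/r/ token heard after /s/, where native English words allow the onset sl- but not sr-). The `combine_cues` helper is hypothetical.

```python
import math


def combine_cues(prior, cues):
    """Posterior over hypotheses from a prior and per-cue likelihoods,
    assuming the cues are conditionally independent (naive Bayes)."""
    log_scores = {
        hypothesis: math.log(p) + sum(math.log(cue[hypothesis]) for cue in cues)
        for hypothesis, p in prior.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {h: math.exp(s - top) for h, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {h: v / z for h, v in unnormalized.items()}


prior = {"l": 0.5, "r": 0.5}
acoustic = {"l": 0.5, "r": 0.5}     # hypothetical: the sound alone is fully ambiguous
phonotactic = {"l": 0.9, "r": 0.1}  # hypothetical: after /s/, English onsets favour /l/

print(combine_cues(prior, [acoustic]))               # acoustic evidence alone
print(combine_cues(prior, [acoustic, phonotactic]))  # acoustic plus phonotactic evidence
```

With the acoustic cue alone the posterior stays at 50/50; adding the phonotactic cue shifts it to roughly 0.9 for /l/, showing how a redundant source outside the acoustic level can resolve an acoustically hard decision.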
Sections 4 and 5 discussed the principle that learners seek invariance in
the signal. Becoming sensitive to what changes versus what stays constant
in the linguistic environment can highlight structural relations in language
such as word boundaries, non-adjacent dependencies, syntactic phrases,
and form-meaning mappings. Importantly, the putative underlying mechanisms
of alignment and comparison of candidate structures are simple, general
learning mechanisms that can be ‘recycled’ at different levels of linguistic
representation, providing a general framework for learning structure (see
further below).
In Section 6 I discussed how probabilistic lexico-semantic constraints
shape the choice of sentence continuations in free productions. In addition,
knowledge of lexical semantics improves fluency in realistic conditions
such as reading text. Finally, Sections 3 and 6 together contributed
the idea of integrating computational analyses of language to make
experimental predictions about which statistical properties are useful for
learning and processing language. Computational analyses of corpora allow one to
assess the a priori usefulness of one or more probabilistic cues, which can
then be evaluated empirically with language learners. In sum, statistical
approaches to language contribute a diagnostic toolkit for testing what is
easy and difficult to learn in experimentally controlled settings, and may
further offer supportive solutions to instructional needs.
7.1. Implications for L2 instruction
While it is too early to sketch a map of how statistical learning will inform
educational practices in meaningful ways, I speculate here on a few
possibilities. For instance, statistical learning can be seen as complementary
to existing techniques of input-based enhancement, which attempt to make
certain features of the language more salient (e.g., Sharwood Smith, 1991).
While textual enhancement can be achieved via manipulation of typographical
cues such as bolding or italics, meta-analytic reviews of this research
domain show that learners exposed to enhanced texts barely outperform those
exposed to unenhanced, flooded texts on targeted grammatical features (Lee &
Huang, 2008). It may be possible, therefore, to structure texts such that
certain distributional properties enhance a particular target structure. In
this respect, presenting a difficult structure in variation sets might
inherently bring it to the attention of the learner, giving rise to the
establishment of form-meaning connections. In addition, attempts to direct
attention to L2 mappings may result in even greater performance gains when
cues are made salient. That is, instructional interventions that orient
learners to multiple distributional cues, in ways that exploit the
contribution of each cue to the real-time comprehension or production of
fully formed sentences or utterances, may further reinforce learning.
Such proposals are consistent with an emerging consensus among researchers
from both generative (Slabakova, 2008) and cognitive-interactionist (Ortega,
2007) traditions, who recommend practicing form and function in meaningful
contexts. Accordingly, one major advantage of applying statistical learning
to second language teaching is its potential applicability to actual learning
scenarios. If certain distributional properties of the input accelerate
learning (as documented in several independent experiments on adult
artificial language learning in this volume), then it is possible in principle
to tailor the learner’s experience to reflect such optimal conditions,
providing ‘statistically structured input’, in line with existing work (e.g.,
Lee & Van Patten, 2003).
Statistical learning research on L2 also has practical advantages that
work in L1 settings does not. The initial stages of language development
in infants and young children are mostly under parental control and
difficult to modify with explicit interventions. Conversely, the input an L2
learner receives can be modulated in various flexible ways: in the
classroom, via educational software, or through the construction of
materials that incorporate statistical learning principles.
7.2. The relationship with implicit learning
While artificial language studies have been used in SLA, most have focused
on the nature of implicit learning (see Dienes, this volume; Shanks, 2005)
and knowledge in L2 (see Hamrick and Rebuschat, this volume; Leung
and Williams, 2006; Schmidt, 1994), rather than on providing mechanisms
of statistical learning. In most cases these studies do not directly include
manipulations of distributional information in their designs, unlike the
studies presented here. In this respect, research on statistical learning
can be seen as complementary and orthogonal to the implicit/explicit
distinction, which remains a useful framework for investigating processes
of human learning. Statistical learning may occur on a cline from completely
implicit to explicit. For example, a textbook or a learning task may present
scenarios and sentences that implicitly form variation sets (see Section 4).
The outcome of learning may then be fully explicit (a sort of aha
experience: “I recognize that what stays constant here may be an L2
construction”), or less so, with the construction standing out without
direct awareness on the part of the learner, who is perhaps engaged in
encoding or decoding the meaning or the pragmatic relevance of the event.
Furthermore, it is possible to direct L2 learners to explicitly find patterns
of invariance in collections of texts, as in pedagogical uses of corpora
(e.g., Aston, Bernardini, & Stewart, 2004). The relation between statistical
regularities and implicit learning can be quite complex in second language
learning. While certain distributional properties of language, especially
low-level ones such as probabilistic phonotactics, are typically learned
implicitly in one’s first language and may appear difficult to teach
explicitly, there is also evidence to the contrary. Al-jasser (2008) reported
a pre-test/post-test intervention study investigating the effect of teaching
English phonotactics to Arabic speakers with the aim of improving their
lexical segmentation abilities. His post-test results showed significant
gains in the lexical segmentation of running speech in English. Therefore,
while it is quite reasonable to assume that statistical learning in infancy
and childhood is implicit, for second language learning this line of research
offers non-intuitive insights beyond the classic implicit/explicit divide.
7.3. Defusing the internalist/externalist debate
Research into statistical learning, in addition to guiding the development
of novel instructional interventions, may also provide theoretical insight
into the mechanisms underlying existing forms of L2 instruction whose
effectiveness has already been demonstrated. The trend in L2 research
toward meta-analytic reviews (Norris & Ortega, 2006, 2011) has generated
robust evidence for, among other areas, the role of interaction in second
language learning (Keck, Iberri-Shea, Tracy-Ventura, & Wa-Mbaleka, 2006;
Mackey & Goo, 2007).
Many researchers now see the divide between social and cognitive
dimensions of learning as detrimental to a better understanding of language
and communication, in both first and second language research. While in
this chapter I have focused on finding language-internal regularities in the
input, such regularities need not operate in isolation; indeed, there is
already evidence that they take effect in social settings. Statistical
sensitivity develops both within the linguistic input learners are exposed
to, and across the linguistic and non-linguistic exchanges with their
interlocutors during social interaction. Thus, distributional information
inherent in the input, along with social interaction, can provide reliable
cues to discovering structural and abstract properties of language (for a
review, see Meltzoff, Kuhl, Movellan, & Sejnowski, 2009).
In this respect, one general framework for statistical learning that
invokes cognitive principles directly relevant to interactionist approaches
has been put forth by Goldstein and colleagues (2010). This framework
uses the acronym ACCESS as a mnemonic for several key principles in
learning from distributional patterns (Align Candidates, Compare, Evaluate
Statistical/Social Significance). Each of these components has a clear
analogue in interactionist SLA research. To begin, L2 interaction is
fundamentally a matter of exposure to input through conversational discourse,
as illustrated by the following example, adapted from a classroom study
of learner interaction in computer-mediated communication. Here, Kin
and Gin are exchanging opinions in a communicative task:
(1) Kin: If you don’t have much money, you can’t go university.
Gin: but why do you go to the university?
Her interlocutor’s response offers Kin an immediate opportunity to
align candidates. For example, she may pay attention (Schmidt, 2001) to
the partial reformulation of the verb phrase ‘go to the university’. Kin’s
ability to restructure her knowledge of the usage required here may rely
on cognitive comparison, during which learners’ output “must be compared
with the relevant data available from the contingent utterances of their
more competent interlocutors” (Doughty, 2001, p. 225). As hypothesized by
Laufer and Hulstijn (2001), task-induced involvement is what drives L2
learning in this case, through need, search, and evaluation. The involvement
load hypothesis acknowledges that motivational as well as cognitive
components are involved in incidental second language learning (see also
Dörnyei, 2009). The statistical significance of the information Kin is
presented with is registered according to mechanisms detailed throughout
this chapter (but see Ellis, 2006, on related factors that impede learning
from input). Finally, SLA theory offers several theoretical perspectives
emphasizing the sociocultural (Lantolf, 2000), sociocognitive (Atkinson,
2011), and socially distributed (Markee & Seo, 2009) aspects of L2
interaction that may help interpret the social significance of the linguistic
choices in the present dyadic exchange. In sum, an interactionist account of
SLA that incorporates principles of statistical learning is not merely
possible; in many respects it already exists. What remains to be done is to
articulate these connections more explicitly in order to strengthen future
empirical work.
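The Align Candidates and Compare steps in Kin and Gin’s exchange can be illustrated by aligning Kin’s verb phrase with Gin’s reformulation. This is a hypothetical Python sketch, not an implementation of the ACCESS framework: it assumes the relevant phrases have already been isolated from the two turns in example (1), and it uses the standard-library `difflib.SequenceMatcher` as a stand-in for the learner’s alignment process. The `reformulation_edits` helper is invented for illustration.

```python
from difflib import SequenceMatcher


def reformulation_edits(learner, interlocutor):
    """Align two token sequences and return the edits that turn the learner's
    version into the interlocutor's (every span that is not an exact match)."""
    a, b = learner.split(), interlocutor.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


# Verb phrases from example (1): Kin's "go university" versus Gin's "go to the university".
print(reformulation_edits("go university", "go to the university"))
# [('insert', [], ['to', 'the'])]
```

The shared material (go ... university) anchors the alignment, so the difference that a learner would need to notice surfaces as a single insertion.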
To conclude, I have argued that there is an important potential role for
statistical learning research in terms of direct links to practical aspects of
second language learning and instruction, namely diagnosing learner needs,
enhancing instruction and curricula, and defining principles to put into
practice in a variety of ways, as called for by the specific details of the learn-
ing context.
Acknowledgements
I would like to thank Shimon Edelman, Kevin Gregg, Daniel Jackson,
Hannah Jones, Elizabeth Kissling, Phillip Hamrick, Julie Lake, William
O’Grady, Lourdes Ortega, Patrick Rebuschat, Dick Schmidt, and two
anonymous reviewers for their comments on earlier versions of this chapter.
The manuscript also benefited from useful discussions with several graduate
students in the SLS program at the University of Hawaii. The author was
partially supported by a Language Learning Research Grant.
References
Akahane-Yamada, R., Kato, H., Adachi, T., Watanabe, H., Komaki, R., Kubo, R., Takada, T., & Ikuma, Y.
2004 ATR CALL: A speech perception/production training system utilizing speech technology. The 18th International Congress on Acoustics, III, 2319–2320.
Al-jasser, F.
2008 The effect of teaching English phonotactics on the lexical segmentation of English as a foreign language. System, 36, 1, 94–106.
Altmann, G.T.M., & Kamide, Y.
1999 Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73, 247–264.
Atkinson, D.
2010 Extended, embodied cognition and second language acquisition. Applied Linguistics, 31, 599–622.
Aston, G., Bernardini, S., & Stewart, D. (Eds.)
2004 Corpora and language learners. Amsterdam: John Benjamins.
2007 The English Lexicon Project. Behavior Research Methods, 39, 445–459.
Bannard, C., Lieven, E., & Tomasello, M.
2009 Modeling children’s early grammatical knowledge. PNAS, 106, 41, 17284–17289.
Bod, R.
2009 From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science, 33, 752–793.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M.
2006 Formulaic sequences and perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research, 10, 245–261.
Cenoz, J., & Gorter, D.
2008 The linguistic landscape as an additional source of input in second language acquisition. IRAL, 46, 267–287.
Chambers, K.E., Onishi, K.H., & Fisher, C.
2003 Infants learn phonotactic regularities from brief auditory experience. Cognition, 87, B69–B77.
Chater, N., & Manning, C.D.
2006 Probabilistic models of language processing and acquisition. Trends in Cognitive Sciences, 10, 335–344.
Chater, N., & Oaksford, M. (Eds.)
2008 The probabilistic mind: Prospects for Bayesian cognitive science. Oxford: Oxford University Press.
116, 382–393.
Christiansen, M., Onnis, L., & Hockema, S.
2009 The secret is in the sound: From unsegmented speech to lexical categories. Developmental Science, 12(3), 388–395.
Christiansen, M.H., Conway, C., & Onnis, L.
2007 Neural responses to structural incongruencies in language and statistical learning point to similar underlying mechanisms. In Proceedings of the 29th Annual Meeting of the Cognitive Science Society.
Clahsen, H., & Felser, C.
2006 How native-like is non-native language processing? Trends in Cognitive Sciences, 10, 564–570.
Cleeremans, A., Destrebecqz, A., & Boyer, M.
1998 Implicit learning: News from the front. Trends in Cognitive Sciences, 2, 406–416.
Dale, R., & Spivey, M.J.
2006 Unraveling the dyad: Using recurrence analysis to explore patterns of syntactic coordination between children and caregivers in conversation. Language Learning, 56, 3, 391–430.
Dell, G.S., Reed, K.D., Adams, D.R., & Meyer, A.S.
2000 Speech errors, phonotactic constraints, and implicit learning: A study of the role of experience in language production. Journal of Experimental Psychology: Learning, Memory, & Cognition, 26, 1355–1367.
Dienes, Z.
this volume Conscious versus unconscious learning of structure.
Dörnyei, Z.
2009 Individual differences: Interplay of learner characteristics and learning environment. Language Learning, 59, 230–248.
Doughty, C.
2001 Cognitive underpinnings of focus on form. In P. Robinson (Ed.), Cognition and second language instruction (pp. 206–257). Cambridge: Cambridge University Press.
Ellis, N.C.
2005 At the interface: Dynamic interactions of explicit and implicit language knowledge. Studies in Second Language Acquisition, 27, 305–352.
Ellis, N.C.
2006 Selective attention and transfer phenomena in L2 acquisition:
2010 Implicit learning as an ability. Cognition, 116, 321–340.
Keck, C.M., Iberri-Shea, G., Tracy-Ventura, N., & Wa-Mbaleka, S.
2006 Investigating the empirical link between task-based interaction and acquisition: A meta-analysis. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 91–131). Amsterdam: John Benjamins.
Kuhl, P.K.
2004 Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5, 831–843.
Kuhl, P.K.
2000 A new view of language acquisition. Proceedings of the National Academy of Sciences, 97, 11850–11857.
Landauer, T.K., & Dumais, S.T.
1997 A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 2, 211–240.
Lantolf, J. (Ed.)
2000 Sociocultural theory and second language learning. Oxford: Oxford University Press.
Laufer, B., & Hulstijn, J.
2001 Incidental vocabulary acquisition in a second language: The construct of task-induced involvement. Applied Linguistics, 22, 1–26.
Lee, J., & Van Patten, B.
2003 Making Communicative Language Happen. New York: McGraw Hill.
Lee, S., & Huang, H.2008 Visual input enhancement and grammar learning: A meta-
analytic review. Studies in Second Language Acquisition, 30, 307–331.
Leung, J. & Williams, J.N.2006 Implicit learning of form-meaning connections. In Sun, R. &
Miyake, N. (Eds) Proceedings of the Annual Meeting of the Cog-nitive Science Society, pp. 465–470. Mahwah, N.J.: LawrenceErlbaum.
MacKay, D.J.C. 2003 Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press.
Mackey, A., & Goo, J. 2007 Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 407–452). Oxford: Oxford University Press.
Seidenberg, M.S., and Smith, L.B. 2010 Letting Structure Emerge: Connectionist and Dynamical Systems Approaches to Understanding Cognition. Trends in Cognitive Sciences, 14, 348–356.
McClelland, J.L. 1998 Connectionist models and Bayesian inference. In M. Oaksford & N. Chater (Eds.), Rational models of cognition (pp. 21–53). Oxford: Oxford University Press.
Meltzoff, A.N., Kuhl, P.K., Movellan, J., & Sejnowski, T.J. 2009 Foundations for a new science of learning. Science, 325, 284–288.
Miller, G.A. 1956 Information and memory. Scientific American, 195(2), 42–47.
Miller, G.A. 1958 Free recall of redundant strings of letters. Journal of Experimental Psychology, 56, 485–491.
Misyak, J.B., Christiansen, M.H., & Tomblin, J.B. 2010 Sequential expectations: The role of prediction-based learning in language. Topics in Cognitive Science, 2, 138–153.
Montrul, S., Foote, R., & Perpinan, S. 2008 Gender agreement in adult second language learners and Spanish heritage speakers: The effects of age and context of acquisition. Language Learning, 58(3), 503–553.
Norris, J., & Ortega, L. 2010 Research synthesis. Language Teaching, 43, 461–479.
Norris, J., & Ortega, L. (Eds.). 2006 Synthesizing research on language learning and teaching. Amsterdam: John Benjamins.
Onnis, L. 2001 Fluency in native and non-native speakers. In A. Carli (Ed.), Aspetti linguistici e interculturali del bilinguismo (pp. 20–139). Milano: Franco Angeli.
Onnis, L., Farmer, T., Baroni, M., Christiansen, M.H., & Spivey, M.J. 2009 Generalizable distributional regularities aid fluent language processing: The case of semantic valence tendencies. Special issue of the Italian Journal of Linguistics, 20(1), 129–156.
Onnis, L., Christiansen, M.H., Chater, N., & Gomez, R. 2003 Reduction of uncertainty in human sequential learning: Evidence from artificial language learning. In Proceedings of the 25th Annual Conference of the Cognitive Science Society (pp. 886–891). Mahwah, NJ: Lawrence Erlbaum.
Onnis, L., Monaghan, P., Christiansen, M.H., & Chater, N. 2004 Variability is the spice of learning, and a crucial ingredient for detecting and generalizing nonadjacent dependencies. In Proceedings of the 26th Annual Conference of the Cognitive Science Society.
Onnis, L., Waterfall, H., & Edelman, S. 2008 Learn locally, act globally: Learning language from variation set cues. Cognition, 109, 423–430.
Onnis, L., Edelman, S., & Waterfall, H. 2011 Local statistical learning under cross-situational uncertainty. In L. Carlson, C. Holscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society.
Onnis, L., Uchida, Y., & Magnuson, J. in preparation Distributional phonotactic cues assist the perception of speech contrasts.
Ortega, L. 2007 Meaningful L2 practice in foreign language classrooms: A cognitive-interactionist SLA perspective. In R.M. DeKeyser (Ed.), Practice in a second language: Perspectives from applied linguistics and cognitive psychology (pp. 180–207). Cambridge: Cambridge University Press.
Pacton, S., Perruchet, P., Fayol, M., & Cleeremans, A. 2001 Implicit learning out of the lab: The case of orthographic regularities. Journal of Experimental Psychology: General, 130, 401–426.
Quine, W.V.O. 1960 Word and object. Cambridge, MA: MIT Press.
Reber, A.S. 1967 Implicit Learning of Artificial Grammars. Journal of Verbal Learning and Verbal Behavior, 6, 855–863.
Redington, M., & Chater, N. 1998 Connectionist and statistical approaches to language acquisition: A distributional perspective. Language and Cognitive Processes, 13, 129–191.
Redington, M., Chater, N., & Finch, S. 1998 Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22, 425–469.
Redington, M., & Chater, N. 1997 Probabilistic and distributional approaches to language acquisition. Trends in Cognitive Sciences, 1, 273–281.
Redington, M., & Chater, N. 1996 Transfer in artificial grammar learning: A reevaluation. Journal of Experimental Psychology: General, 125, 123–138.
Robinson, P. 2005 Cognitive abilities, chunk-strength, and frequency effects in implicit artificial grammar and incidental L2 learning: Replications of Reber, Walkenfeld, and Hernstadt (1991) and Knowlton and Squire (1996) and their relevance for SLA. Studies in Second Language Acquisition, 27(2), 235–268.
Robinson, P., & Ellis, N.C. 2008 Conclusion: Cognitive linguistics, second language acquisition, and L2 instruction – issues for research. In P. Robinson & N.C. Ellis (Eds.), Handbook of cognitive linguistics and second language acquisition (pp. 489–545). New York: Routledge.
Rogers, T.T., & McClelland, J.L. 2004 Semantic Cognition: A Parallel Distributed Processing Approach. Cambridge, MA: MIT Press.
Roy, D. 2009 New Horizons in the Study of Child Language Acquisition. Proceedings of Interspeech 2009. Brighton, England.
Saffran, J.R., Aslin, R.N., & Newport, E.L. 1996 Statistical Learning by 8-Month-Old Infants. Science, 274(5294), 1926–1928.
Shanks, D.R. 2005 Implicit learning. In K. Lamberts & R. Goldstone (Eds.), Handbook of Cognition (pp. 202–220). London: Sage.
Shannon, C. 1951 Prediction and Entropy of Printed English. Bell System Technical Journal, 30, 50–64. Reprinted in D. Slepian (Ed.), 1974, Key Papers in the Development of Information Theory. New York: IEEE Press.
Shannon, C. 1948 A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379–423 and 623–656. Reprinted in D. Slepian (Ed.), 1974, Key Papers in the Development of Information Theory. New York: IEEE Press.
Sharwood Smith, M. 1991 Speaking to many minds: On the relevance of different types of language information for the L2 learner. Second Language Research, 7(2), 118–132.
Schmidt, R. 2001 Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge: Cambridge University Press.
Schmidt, R. 1994 Implicit learning and the cognitive unconscious: Of artificial grammars and SLA. In N.C. Ellis (Ed.), Implicit and Explicit Learning of Languages (pp. 165–209). London: Academic Press.
Sinclair, J. 1996 The search for units of meaning. Textus, IX, 75–106.
Slabakova, R. 2008 Meaning in the second language. Berlin: Mouton de Gruyter.
Slobin, D.I. 1996 From “thought and language” to “thinking for speaking”. In J.J. Gumperz & S.C. Levinson (Eds.), Rethinking linguistic relativity (pp. 70–96). Cambridge: Cambridge University Press.
statistics. Cognition, 106, 1558–1568.
Solan, Z., Horn, D., Ruppin, E., & Edelman, S. 2005 Unsupervised learning of natural languages. Proceedings of the National Academy of Sciences, 102, 11629–11634.
Spada, N., & Tomita, Y. 2010 Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60(2), 263–308.
Tanenhaus, M., Spivey-Knowlton, M., Eberhard, K., & Sedivy, J. 1995 Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632–1634.
Tenenbaum, J.B., & Griffiths, T.L. 2001 Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24, 629–641.
Thiessen, E.D. 2007 The effect of distributional information on children’s use of phonemic contrasts. Journal of Memory and Language, 56, 16–34.
Tokowicz, N., & Warren, T. in press Beginning adult L2 learners’ sensitivity to morphosyntactic violations: A self-paced reading study.
Towell, R., Hawkins, R., & Bazergui, N. 1996 The development of fluency in advanced learners of French. Applied Linguistics, 17, 84–119.
Uchida, Y. 2010 Measuring knowledge of English Orthotactics in Japanese learners of English: Towards the establishment of a training scheme for /l/-/r/ Perception. Unpublished thesis for the Advanced Graduate Certificate, Department of Second Language Studies, University of Hawai‘i at Manoa.
Vigliocco, G., Vinson, D.P., Lewis, W., & Garrett, M.F. 2004 Representing the meanings of object and action words: The featural and unitary semantic space hypothesis. Cognitive Psychology, 48, 422–488.
Williams, J.N. 2004 Implicit learning of form-meaning connections. In J. Williams, B. VanPatten, S. Rott, & M. Overstreet (Eds.), Form-Meaning Connections in Second Language Acquisition (pp. 203–218). Mahwah, NJ: Lawrence Erlbaum Associates.
Yu, C., & Smith, L.B. 2007 Rapid word learning under uncertainty via cross-situational statistics. Psychological Science, 18(5), 414–420.