Statistical learning and L2 learning: Many questions and a few answers
Ram Frost The Hebrew University
Haskins Laboratories & BCBL
and Noam Siegelman, The Hebrew University
L2 literacy acquisition: basic theoretical assumptions
1. The organization of any reading system reflects the overall structure of the language (Frost, BBS, 2012).
2. L2 literacy is shaped by the statistical properties of the language: the correlations among orthographic, phonological, and semantic units, which readers pick up implicitly (or explicitly).
In a nutshell, this theoretical approach assumes that:
There is a general cognitive capacity for statistical learning.
Like any human capacity, it would have a normal distribution of individual differences.
This capacity predicts, at least to some extent, individual differences in the ease or difficulty of acquiring the new set of regularities that determine reading in L2 (Frost et al., in press, Psych. Science; Wu et al., 2012).
Correlation between VSL and morphological priming (Frost et al., 2013, Psych. Science)
Most past and current research on S.L
Studies aim to show how far S.L can go: to explain rule abstraction and complex behaviours such as word segmentation of a continuous speech stream.
S.L or I.L are implicitly treated as a general entity and are monitored in a single task chosen by the experimenter. This is perhaps suitable for the above research, but not in the domain of individual differences!
The capacity for statistical learning- four theoretical questions:
Is S.L a unified capacity or a unified mechanism responsible for all possible detection of correlations, or is it a componential capacity?
If it is not (and probably it is not) a unified capacity, how are the different components of S.L interrelated?
How does S.L relate to other general cognitive capacities such as intelligence, memory, executive functions, etc.?
Is S.L a “stable” capacity of the individual that remains more or less constant across time, such as intelligence, working memory, etc?
Statistical Learning: A preliminary mapping sentence
(Facet theory, Guttman 1959)
Statistical learning is the ability to implicitly pick up regularities of {verbal/non-verbal} information in the {visual/auditory} modality, when contingencies are {adjacent/non-adjacent}, thereby shaping behavior.
Preliminary mapping
Modality: visual / auditory
Type of information: verbal / non-verbal
Type of contingencies: adjacent / non-adjacent
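A minimal sketch of the theoretical space implied by the three facets above. The dictionary keys and the function name `facet_space` are illustrative choices, not part of the original mapping sentence; the point is only that three two-valued facets define eight possible task types.

```python
from itertools import product

# Facets and their elements, as listed in the preliminary mapping.
# The key names are illustrative labels, not the authors' terminology.
FACETS = {
    "modality": ("visual", "auditory"),
    "information": ("verbal", "non-verbal"),
    "contingencies": ("adjacent", "non-adjacent"),
}

def facet_space(facets):
    """Return every combination of facet elements as a list of dicts."""
    names = list(facets)
    return [dict(zip(names, combo)) for combo in product(*facets.values())]

cells = facet_space(FACETS)
for cell in cells:
    print(cell)
```

Each printed cell corresponds to one possible kind of S.L task; a study that samples only some cells covers only part of this space.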
The present research project
Students of the Hebrew University were tested in a series of experimental tasks that monitored general cognitive abilities and verbal abilities. Participants were also tested with various forms of statistical learning tasks that cover some of the S.L theoretical space; these were administered twice, at T1 and T2.
Aims:
1. To examine in a within-subject design how S.L is related to other general cognitive or verbal abilities. This will tell us whether S.L is a subset (nested) of a more general ability such as intelligence or memory.
2. To examine whether performance in a given S.L task predicts performance in another S.L task. This will tell us something about the unity/componentiality of S.L as a cognitive ability.
3. To examine whether S.L is a stable (therefore reliable) capacity of individuals. Reliability is a necessary condition for the validity of predictions!
Investigated tasks
Statistical learning tasks:
VSL (visual modality, non-verbal, adjacent contingencies)
ASL (auditory modality, verbal, adjacent contingencies)
ANA (auditory, non-verbal, adjacent)
AVN (auditory, verbal, non-adjacent)
SRT (visual, adjacent, probabilistic serial RT)
Cognitive abilities tasks:
Raven advanced (IQ)
Digit span, WAIS-R (WM)
Verbal WM (NITE)
Switch task (executive functions)
Syntactic processing
Rapid naming (RAN)
Design and procedure
A within-subject design. 4 separate testing sessions. Each participant is tested once in all tasks of cognitive abilities. Each participant is tested twice in all tasks of statistical learning, test-retest. Testing sessions of S.L are separated by at least three months.
Visual Statistical Learning (VSL) task (adapted from Turk-Browne, Jungé, & Scholl, 2005)
24 shapes, organized into 8 triplets.
Implicit TPs between successive shapes: 1 and 1 within triplets; 0 and 0 within foils.
The 8 triplets are presented in random order to create a 10-minute familiarization stream.
Q: Can subjects pick up the rules regarding the TPs of the visual shapes?
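To make the notion of transitional probability (TP) concrete, here is a small sketch that estimates TPs from a stream built out of triplets. The shape labels 0 to 23, the 200 triplet presentations, and the fact that a triplet may immediately repeat are assumptions made for illustration; the slides do not give these details.

```python
import random
from collections import Counter

def transitional_probabilities(stream):
    """Map each observed pair (a, b) to the estimated P(b follows a)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

# Hypothetical stand-in for the stimuli: shapes 0..23 grouped into 8 triplets.
triplets = [(3 * i, 3 * i + 1, 3 * i + 2) for i in range(8)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# 200 presentations is an arbitrary length chosen for the example.
stream = [shape for _ in range(200) for shape in rng.choice(triplets)]

tps = transitional_probabilities(stream)
# TPs between successive shapes inside a triplet.
within = [tps[(a, b)] for t in triplets for a, b in zip(t, t[1:])]
# TPs from the last shape of a triplet to whatever shape came next.
across = [p for (a, b), p in tps.items() if a % 3 == 2]
print(min(within), max(across))
```

Within-triplet TPs come out at exactly 1 because each shape belongs to one triplet only, while TPs across triplet boundaries are lower; that contrast is the regularity participants could pick up.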
The experimental setting
Familiarization:
Test: Which triplet belongs to the sequence?
The experimental setting
Auditory Statistical Learning – Adjacent (Endress & Mehler, 2009)
The 12 words are presented in random order to create a 10-minute familiarization stream.
12 words: bo de sa, le ka ti, lu ri vo, …
Part-words: de sa le, ka ti lu, sa lu ri, …
TPs marked on the slide: 0.5 0.5 0.5 0.187
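The part-words listed for this task span a word boundary, which can be sketched as follows. The function name `part_words` is hypothetical; the syllables are the three example words shown on the slide.

```python
def part_words(first, second):
    """Return the two 3-syllable part-words that straddle the boundary
    between two words heard one after the other."""
    syllables = first + second
    return [tuple(syllables[1:4]), tuple(syllables[2:5])]

# The three example words shown on the slide.
words = [("bo", "de", "sa"), ("le", "ka", "ti"), ("lu", "ri", "vo")]

print(part_words(words[0], words[1]))
print(part_words(words[1], words[2]))
print(part_words(words[0], words[2]))
```

The slide's part-words "de sa le", "ka ti lu", and "sa lu ri" each appear in one of these outputs.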
Familiarization: (10 minutes long…)
Test: Which word belongs to the language?
The experimental setting
Statistical Learning: Non-Linguistic (Gebhart, Newport & Aslin, 2009)
Exactly the same as the adjacent linguistic statistical learning experiment, the only difference being that the syllables are replaced with 18 non-linguistic noises (A, B, …, R).
12 triplets: A B M, J Q L, G Q I, …
Part-triplets: B M J, I J Q, …
TPs marked on the slide: 0.5 0.5 0.5 0.187; 0.187 0.5
Familiarization: (10 minutes long…)
Test: Which triplet belongs to the sequence?
The experimental setting
Statistical Learning – Auditory, Non-Adjacent
Subjects can extract “words” based on the non-adjacent statistics: words have consonant “roots”.
Adjacent TP between syllables: p=0.5
Non-adjacent TP between consonants: within words, p=1; between words, p=0.5
12 words:
pa ve gu, pa vo ga, pi ve ga, pi vo gu
du ka be, du ki bo, da ka bo, da ki be
me tu sa, me ta si, mo tu si, mo ta sa
Part-words: ve gu da, tu sa du, …
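As a check on the within-word statistics stated for this task, the sketch below computes TPs between successive syllables and between successive consonants for the 12 words listed. It assumes each word is equally frequent and looks only inside words (between-word TPs depend on the stream order, which is not given); the helper name `within_word_tps` is illustrative.

```python
from collections import Counter

# The 12 words listed for the non-adjacent task.
WORDS = [
    "pa ve gu", "pa vo ga", "pi ve ga", "pi vo gu",
    "du ka be", "du ki bo", "da ka bo", "da ki be",
    "me tu sa", "me ta si", "mo tu si", "mo ta sa",
]

def within_word_tps(units_per_word):
    """TPs between successive units inside words (equal word frequency assumed)."""
    pairs, firsts = Counter(), Counter()
    for units in units_per_word:
        for a, b in zip(units, units[1:]):
            pairs[(a, b)] += 1
            firsts[a] += 1
    return {(a, b): c / firsts[a] for (a, b), c in pairs.items()}

syllables = [w.split() for w in WORDS]
# The consonant "root" is the first letter of each syllable.
consonants = [[s[0] for s in word] for word in syllables]

syll_tps = within_word_tps(syllables)
root_tps = within_word_tps(consonants)
print(sorted(set(syll_tps.values())), sorted(set(root_tps.values())))
```

Every syllable-to-syllable TP inside a word is 0.5, while every consonant-to-consonant TP inside a word is 1, so only the non-adjacent (consonant) statistics identify the words.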
Statistical Learning - Non Adjacent: test
During the test, participants hear legal “words” that are composed of a consonant root and a new vowel pattern, and nonwords that are assembled from “part-roots” and the same vowel patterns, and are asked to choose which belongs to the “language”.
Legal words: pu ve gi, po vi ga, di ku bo, du ke ba, me to sa, ma tu se
Foils (part-roots): vu ge di, bo pi va, gi du ko, ku be ma, te so da, ga mu te
Familiarization: (10 minutes long…)
Test: Which word belongs to the language?
The experimental setting
Probabilistic Serial Learning Task (Kaufman et al., 2010)
In each trial, a stimulus appears at one of four locations on the screen, and subjects are asked to press the corresponding key (‘1’, ‘2’, ‘3’, or ‘4’).
Subjects are unaware that the sequence of successive stimuli is determined probabilistically by the last two presented stimuli:
• P = 0.85 for sequence A (1-2-1-4-3-2-4-1-3-4-2-3).
• P = 0.15 for sequence B (3-2-3-4-1-2-4-3-1-4-2-1).
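One way to read the rule above is sketched here: the last two locations select the next one, taken from sequence A with probability 0.85 and from sequence B otherwise. Treating both sequences as cyclic, starting from the first two items of sequence A, and the names `successor_map` and `generate_trials` are assumptions of this sketch, not details confirmed by the slides.

```python
import random

SEQ_A = [1, 2, 1, 4, 3, 2, 4, 1, 3, 4, 2, 3]  # probable sequence (p = 0.85)
SEQ_B = [3, 2, 3, 4, 1, 2, 4, 3, 1, 4, 2, 1]  # improbable sequence (p = 0.15)

def successor_map(seq):
    """Map each pair of consecutive locations to the location that follows it.
    The sequence is treated as cyclic (an assumption of this sketch)."""
    n = len(seq)
    return {(seq[i], seq[(i + 1) % n]): seq[(i + 2) % n] for i in range(n)}

NEXT_A, NEXT_B = successor_map(SEQ_A), successor_map(SEQ_B)

def generate_trials(n_trials, p_probable=0.85, seed=0):
    """Generate stimulus locations; each new one depends on the last two."""
    rng = random.Random(seed)
    trials = SEQ_A[:2]  # arbitrary starting context
    probable = []       # True where the trial followed sequence A
    while len(trials) < n_trials:
        context = (trials[-2], trials[-1])
        is_probable = rng.random() < p_probable
        trials.append((NEXT_A if is_probable else NEXT_B)[context])
        probable.append(is_probable)
    return trials, probable

trials, probable = generate_trials(5000)
print(trials[:12], sum(probable) / len(probable))
```

In both sequences every pair of successive locations has a unique successor, and the two sequences always disagree on it, so each trial can be classified as probable or improbable.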
Distribution of scores in the five S.L tasks
In order for a task to reliably predict cognitive abilities, it has to:
Have a level of difficulty that is not at floor or at ceiling.
Have a variance that is large enough to allow for a wide distribution of individual differences.
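A rough sketch of how such a distribution could be summarized. The scores below are hypothetical and serve only to show the computation; the function name `summarize` and the choice of summary fields are illustrative.

```python
from statistics import mean, stdev

def summarize(scores, max_score):
    """Summarize a score distribution: mean, mean as % of maximum,
    sample SD, and the share of participants at the maximum score."""
    m = mean(scores)
    return {
        "mean": m,
        "percent": 100 * m / max_score,
        "sd": stdev(scores),
        "at_ceiling": sum(s == max_score for s in scores) / len(scores),
    }

# Hypothetical scores out of 32, for illustration only.
scores = [14, 18, 20, 22, 22, 24, 25, 27, 30, 32]
summary = summarize(scores, max_score=32)
print(summary)
```

A large share of participants at the maximum score, or a very small SD, would signal that the task leaves little room for individual differences.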
Visual Statistical Learning Distribution
[Histogram of scores, 1 to 32]
n=179, Mean=22.2 (of 32, 69.4%), SD=5.56
Auditory Statistical Learning (Adjacent) Distribution
[Histogram of scores, 1 to 36]
n=102, Mean=21.6 (of 36, 59.1%), SD=5.64
Auditory Statistical Learning (Non-Linguistic, Adjacent) Distribution
[Histogram of scores, 1 to 36]
n=103, Mean=20.56 (of 36, 57.1%), SD=3.33
Auditory Statistical Learning (Linguistic, Non-Adjacent) Distribution
[Histogram of scores, 1 to 36]
n=102, Mean=20.58 (of 36, 57.1%), SD=3.98*
*but without the two extreme observations: SD = 3.52
Serial Reaction Time (SRT) Distribution
[Histogram of scores, x-axis from -15 to 80]
Mean=17.54 ms, SD = 17.36
Q1:
Is statistical learning a subset (nested) of general cognitive abilities?
What are the inter-correlations of scores in the various S.L tasks and the scores
obtained in the cognitive tasks?
Cognitive Tasks
Raven advanced: IQ
RAN (digits, letters, objects): output speed
Digit span: WM
Verbal working memory
Syntactic processing: verbal abilities
Switch task: executive functions
If S.L. is an independent theoretical construct, we should expect it not to be nested within a given faculty (very small correlations), with some correlation with intelligence.
Correlations of VSL and tests of general cognitive capacities
VWM: 0.194* (n=178, p=0.01)
Digit span: 0.151 (n=103)
Raven’s Advanced Matrices: 0.271* (n=89, p=0.01)
Switch Task: -0.163 (n=44)
Syntactic Processing: 0.029 (n=44)
RAN - objects: 0.147 (n=127)
RAN - letters: 0.023 (n=127)
RAN - digits: 0.103 (n=127)
(all other p’s > 0.05)
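For readers who want to check how a correlation and its sample size translate into significance, here is a rough sketch using Fisher's z transformation with a normal approximation. This is a stand-in technique chosen for the example, not necessarily the test used in the study, and the function name `approx_p_value` is hypothetical.

```python
import math

def approx_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r with
    sample size n, using Fisher's z and the normal distribution."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Values taken from the VSL correlations reported here.
for label, r, n in [("Raven", 0.271, 89), ("VWM", 0.194, 178), ("Digit span", 0.151, 103)]:
    print(label, round(approx_p_value(r, n), 3))
```

Under this approximation the Raven and VWM correlations fall at about p = 0.01, and the digit span correlation stays above 0.05, in line with the reported values.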
Correlations of ASL (adjacent) and tests of general cognitive capacities
VWM: 0.051 (n=102)
Digit span: -0.021 (n=102)
Raven’s Advanced Matrices: 0.039 (n=89)
Switch Task: -0.018 (n=48)
Syntactic Processing: 0.042 (n=44)
RAN - objects: -0.060 (n=48)
RAN - letters: 0.075 (n=48)
RAN - digits: 0.072 (n=48)
all p’s > 0.2
Correlations of AVN (non-adjacent) and tests of general cognitive capacities
VWM: -0.085 (n=102)
Digit span: -0.116 (n=102)
Raven’s Advanced Matrices: -0.046 (n=89)
Switch Task: 0.060 (n=48)
Syntactic Processing: 0.001 (n=44)
RAN - objects: -0.152 (n=48)
RAN - letters: 0.033 (n=48)
RAN - digits: -0.046 (n=48)
all p’s > 0.2
Correlations of ANA (non-linguistic) and tests of general cognitive capacities
VWM: -0.084 (n=102)
Digit span: 0.039 (n=102)
Raven’s Advanced Matrices: 0.129 (n=89)
Switch Task: 0.096 (n=48)
Syntactic Processing: 0.149 (n=44)
RAN - objects: 0.039 (n=48)
RAN - letters: -0.065 (n=48)
RAN - digits: 0.080 (n=48)
all p’s > 0.2
Correlations of SRT and tests of general cognitive capacities
VWM: -0.090
Digit span: 0.081
Raven’s Advanced Matrices: 0.165
Switch Task: 0.150
Syntactic Processing: -0.113 (n=44)
RAN - objects: -0.034
RAN - letters: -0.166
RAN - digits: -0.128
all other n’s = 48, all p’s > 0.2
Q2:
Is statistical learning a unified or componential Capacity?
What are the inter-correlations of scores in the various S.L tasks?
Correlations between the different Statistical Learning Tasks
VSL and ASL-adjacent: 0.084
VSL and ASL-non-adjacent: -0.191
VSL and ASL-non-linguistic: 0.080
ASL-adjacent and ASL-non-adjacent: -0.056
ASL-adjacent and ASL-non-linguistic: 0.009
ASL-non-adjacent and ASL-non-linguistic: -0.132
n=102, all p’s > 0.05
Correlations between SL tasks and SRT
SRT and VSL: 0.112
SRT and ASL-adjacent: -0.189
SRT and ASL-non-adjacent: 0.131
SRT and ASL-non-linguistic: -0.001
n=48, all p’s > 0.2
Q3:
Is statistical learning a reliable and stable capacity?
Is there test-retest reliability?
Test-Retest Reliability - VSL
[Scatter plot, both axes 0 to 32]
n=36, r=0.7
Test-Retest Reliability - ASL
[Scatter plot, both axes 0 to 36]
n=38, r=0.69
Test-Retest Reliability – ASL Non-Adjacent
[Scatter plot, both axes 0 to 36]
n=39, r=0.6
Test-Retest Reliability – ASL Non-linguistic
[Scatter plot, both axes 0 to 36]
n=40, r=0.06
Test-Retest Reliability – SRT
[Scatter plot, x-axis from -20 to 100, y-axis from 0 to 100]
n=44, r=0.28
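The test-retest values reported here are correlations between scores at the two sessions. A minimal sketch of that computation follows, assuming a plain Pearson correlation; the T1 and T2 scores are hypothetical and only illustrate the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of the same eight participants at T1 and T2.
t1 = [18, 22, 25, 20, 28, 30, 16, 24]
t2 = [20, 21, 27, 19, 26, 31, 18, 23]
r = pearson_r(t1, t2)
print(round(r, 2))
```

A value near 1 means participants keep their rank order from T1 to T2; a value near 0, as for the non-linguistic task, means T1 scores say little about T2 scores.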
Is there a learning effect from T1 to T2?
Conclusions: S.L tasks
1. Not all tasks that monitor S.L as an individual ability are equally good in terms of performance distribution (variance) and test-retest reliability.
2. Of all the tasks examined in our study, VSL seems to provide the best fit. It has a normal distribution of performance, it is reliable, it has a small but significant correlation with intelligence, and it does seem to predict success or failure in L2 literacy (e.g., Frost et al., Psych. Science).
3. Tasks that do not show test-retest reliability should be avoided when individual differences are concerned (SRT, non-linguistic auditory sounds).
General conclusions
1. S.L does not seem to be a unified ability. Individuals may be good at detecting correlations in one context and not as good in another.
2. S.L does not seem to be a subset of intelligence or WM, although some tasks correlate with intelligence to some extent.
3. Detection of correlations and regularities in a given context seems to be a stable and reliable individual ability.
4. Our research so far shows that individual differences in detecting transitional probabilities of adjacent shapes in the visual modality are the best predictor of L2 literacy acquisition.
More questions to be answered:
Is there something specific about the correlation of VSL with reading (language in the visual modality), or does VSL correlate with other aspects of L2 learning?
If S.L underlies L2 learning, why are there some statistical properties of the language that are consistently more difficult to learn (e.g., gender agreement, numbers…)?
What characterizes these general difficulties? Do they have something in common?
Will a comprehensive theory of L2 learning be a mix of domain specificity and general ability?
Thanks to all members of the Laboratory for Verbal Information Processing.
But mostly, this work is done by Noam Siegelman!
Funding: NICHD (RO1 HD 067364, PO1 HD 01994), ISF (159/10)