
Statistical learning and L2 learning: Many questions and a few answers

Ram Frost, The Hebrew University, Haskins Laboratories & BCBL

and Noam Siegelman, The Hebrew University

L2 literacy acquisition: basic theoretical assumptions

1. The organization of any reading system reflects the overall structure of the language (Frost, BBS, 2012).

2. L2 literacy is shaped by the statistical properties of the language: the correlations of orthographic, phonological, and semantic units, which readers pick up implicitly (or explicitly).

In a nutshell, this theoretical approach assumes that:

1. There is a general cognitive capacity for statistical learning.
2. Like any human capacity, it would show a normal distribution of individual differences.
3. This capacity predicts, at least to some extent, individual differences in the ease or difficulty of acquiring the new set of regularities that determine reading in L2 (Frost et al., in press, Psych. Science; Wu et al., 2012).

Correlation between VSL and morphological priming (Frost et al., 2013, Psych. Science)

Most past and current research on S.L

Studies aim to show how far S.L can go: whether it can explain rule abstraction and complex behaviours such as word segmentation of a continuous speech stream.

S.L or I.L (implicit learning) are implicitly treated as a single general entity and are monitored with a single task chosen by the experimenter. This may suit the research aims above, but not the study of individual differences!

The capacity for statistical learning- four theoretical questions:

1. Is S.L a unified capacity, that is, a single mechanism responsible for detecting all possible correlations, or is it a componential capacity?

2. If it is not a unified capacity (and probably it is not), how are the different components of S.L interrelated?

3. How does S.L relate to other general cognitive capacities such as intelligence, memory, executive functions, etc.?

4. Is S.L a "stable" capacity of the individual that remains more or less constant over time, like intelligence, working memory, etc.?

Statistical Learning: A preliminary mapping sentence (Facet theory, Guttman, 1959)

Statistical learning is the ability to implicitly pick up regularities of {verbal/non-verbal} information in the {visual/auditory} modality, when contingencies are {adjacent/non-adjacent}, thereby shaping behavior.

Preliminary mapping

Modality: Visual, Auditory
Type of information: Verbal, Non-verbal
Type of contingencies: Adjacent, Non-adjacent

The present research project

Students of the Hebrew University were tested on a series of experimental tasks that monitored general cognitive abilities and verbal abilities. Participants were also tested on various forms of statistical learning tasks covering part of the S.L theoretical space; these were administered twice, at T1 and T2.

Aims:

1. To examine in a within-subject design how S.L is related to other general cognitive or verbal abilities. This will tell us whether S.L is a subset (nested) of a more general ability such as intelligence or memory.

2. To examine whether performance in a given S.L task predicts performance in another S.L task. This will tell us something about the unity/componentiality of S.L as a cognitive ability.

3. To examine whether S.L is a stable (and therefore reliably measurable) capacity of individuals. Reliability is a necessary condition for the validity of predictions!
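Why reliability is a precondition for validity can be made concrete with the classical test theory bound: the observed correlation between two measures cannot exceed the square root of the product of their reliabilities. The sketch below applies that standard formula to two test-retest coefficients reported in this study; treating test-retest r as the reliability estimate, and assuming a perfectly reliable criterion (reliability 1.0), are illustrative assumptions.

```python
import math


def max_observable_r(rel_x: float, rel_y: float = 1.0) -> float:
    """Upper bound on the observed correlation between measures x and y.

    Classical test theory: r_xy <= sqrt(r_xx * r_yy), where r_xx and r_yy
    are the reliabilities of the two measures.
    """
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("reliabilities must lie in [0, 1]")
    return math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Test-retest coefficients reported for two of the S.L tasks.
    # Using them as reliability estimates, with a perfectly reliable
    # criterion (rel_y = 1.0), is an illustrative assumption.
    for task, retest_r in [("VSL", 0.70), ("ASL non-linguistic", 0.06)]:
        bound = max_observable_r(retest_r)
        print(f"{task}: observed r with any criterion cannot exceed {bound:.2f}")
```

With a test-retest r of 0.06, a task could correlate at most about 0.24 with any criterion, however strongly the underlying ability were related to L2 literacy.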

Investigated tasks

Statistical learning tasks:
VSL (visual modality, non-verbal, adjacent contingencies)
ASL (auditory modality, verbal, adjacent contingencies)
ANA (auditory, non-verbal, adjacent)
AVN (auditory, verbal, non-adjacent)
SRT (visual, adjacent, probabilistic serial RT)
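Placing the tasks in the facet space of the mapping sentence shows which cells the project samples. The sketch below enumerates the 2 x 2 x 2 cells and assigns each task its facet values as listed above; SRT is left out because the task list does not give its information type, and the facet labels are shorthand chosen here for illustration.

```python
from itertools import product

# Facets of the preliminary mapping sentence.
MODALITY = ("visual", "auditory")
INFORMATION = ("verbal", "non-verbal")
CONTINGENCY = ("adjacent", "non-adjacent")

# Facet values of the S.L tasks, as listed in the task list.
# SRT is omitted: its information facet is not specified.
TASKS = {
    "VSL": ("visual", "non-verbal", "adjacent"),
    "ASL": ("auditory", "verbal", "adjacent"),
    "ANA": ("auditory", "non-verbal", "adjacent"),
    "AVN": ("auditory", "verbal", "non-adjacent"),
}


def coverage(tasks):
    """Map every cell of the facet space to the tasks that sample it."""
    cells = {cell: [] for cell in product(MODALITY, INFORMATION, CONTINGENCY)}
    for name, cell in tasks.items():
        cells[cell].append(name)
    return cells


if __name__ == "__main__":
    cells = coverage(TASKS)
    for cell, names in cells.items():
        print(" / ".join(cell), "->", ", ".join(names) or "(not sampled)")
    covered = sum(1 for names in cells.values() if names)
    print(f"{covered} of {len(cells)} cells covered")
```

Four of the eight cells are sampled, which matches the description of the battery as covering only "some of the SL theoretical space".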

Cognitive abilities tasks:
Raven's Advanced Matrices (IQ)
Digit span, WAIS-R (WM)
Verbal WM (NITE)
Switch task (executive functions)
Syntactic processing
Rapid naming (RAN)

Design and procedure

A within-subject design with four separate testing sessions. Each participant is tested once on all cognitive-ability tasks and twice on all statistical learning tasks (test-retest). The two S.L testing sessions are separated by at least three months.

Visual Statistical Learning (VSL) task (adapted from Turk-Browne, Jungé, & Scholl, 2005)

24 shapes are grouped into 8 triplets. Foils are triplets of shapes that never appeared in succession.

VSL: implicit TPs. Within triplets, the TPs are 1, 1; within foils, the TPs are 0, 0.

The 8 triplets are presented in random order to create a 10-minute familiarization stream. Q: Can subjects pick up the rules governing the TPs of the visual shapes?
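The TP logic can be sketched as follows: TP(b | a) is the number of times shape a is followed by shape b, divided by the number of times a is followed by anything. The code builds a familiarization stream from 8 triplets of 24 shape IDs and checks that within-triplet TPs are 1 while foil TPs are 0. The shape IDs, the stream length (`n_blocks`), and the constraint that a triplet never repeats immediately are hypothetical choices for illustration, not parameters taken from the study.

```python
import random
from collections import Counter


def make_stream(triplets, n_blocks, rng):
    """Concatenate the triplets in random order, one full permutation per block.

    Assumption (not stated on the slide): the same triplet never appears
    twice in a row, including across block boundaries.
    """
    order = []
    for _ in range(n_blocks):
        block = list(range(len(triplets)))
        rng.shuffle(block)
        while order and block[0] == order[-1]:
            rng.shuffle(block)
        order.extend(block)
    return [shape for i in order for shape in triplets[i]]


def transitional_probabilities(stream):
    """TP(b | a) = count(a followed by b) / count(a followed by anything)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def tp(tps, a, b):
    """Look up TP(b | a); pairs never observed have TP 0."""
    return tps.get((a, b), 0.0)


if __name__ == "__main__":
    # 24 hypothetical shape IDs (0-23) grouped into 8 triplets.
    triplets = [tuple(range(i, i + 3)) for i in range(0, 24, 3)]
    tps = transitional_probabilities(make_stream(triplets, 60, random.Random(1)))
    print("within triplet, TP(1|0):", tp(tps, 0, 1))
    print("foil pair, TP(4|0):", tp(tps, 0, 4))
    print("across triplets, TP(3|2):", round(tp(tps, 2, 3), 3))
```

Across triplet boundaries the TP is low (roughly 1/7 under the no-repeat assumption), so the only reliable cue to triplet membership is the within-triplet TP of 1.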

The experimental setting

Familiarization:

Test: Which triplet belongs to the sequence?


Auditory Statistical Learning – Adjacent (Endress & Mehler, 2009)

12 words: bo de sa, le ka ti, lu ri vo, …
Part-words: de sa le, ka ti lu, sa lu ri, …
TPs: within words 0.5, 0.5; within part-words 0.5 and 0.187 (one transition spans a word boundary).

The 12 words are presented in random order to create a 10-minute familiarization stream.

Familiarization: (10 minutes long…)

Test: Which word belongs to the language?
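The test phase reads as a two-alternative forced choice (target vs. foil), so a participant's score is the number of trials on which the target was chosen, and guessing yields 50% on average. The sketch below scores a participant and computes, from the binomial distribution, how high a score must be before it is unlikely under pure guessing. The trial counts (32 for VSL, 36 for the auditory tasks) are inferred from the score ranges in the distribution figures, and the individual-level binomial criterion is an illustrative choice, not necessarily the authors' analysis.

```python
from math import comb


def score_2afc(choices):
    """Test score: number of trials on which the target (word or triplet)
    was chosen over the foil. `choices` holds True for a target choice."""
    return sum(1 for c in choices if c)


def guessing_probability(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of scoring k or more by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def above_chance_cutoff(n, alpha=0.05):
    """Smallest score whose one-tailed guessing probability is <= alpha."""
    for k in range(n + 1):
        if guessing_probability(k, n) <= alpha:
            return k
    return None


if __name__ == "__main__":
    for n_trials in (32, 36):  # test lengths implied by the reported score ranges
        print(n_trials, "trials: individual above-chance cutoff =",
              above_chance_cutoff(n_trials))
```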


Statistical Learning: Non-Linguistic (Gebhart, Newport & Aslin, 2009)

Exactly the same as the adjacent linguistic statistical learning experiment, except that the syllables are replaced with 18 non-linguistic noises (A, B … R).

12 triplets: A B M, J Q L, G Q I, …
Part-triplets: B M J, I J Q, …
TPs: within triplets 0.5, 0.5; within part-triplets 0.5 and 0.187, or 0.187 and 0.5.

Test: Which triplet belongs to the sequence?


Statistical Learning – Auditory Non-Adjacent

Subjects can extract “words” based on non-adjacent statistics: words have consonant “roots”.
Adjacent TP between syllables: p = 0.5.
Non-adjacent TP between consonants: within words, p = 1; between words, p = 0.5.

12 words: pa ve gu, pa vo ga, pi ve ga, pi vo gu, du ka be, du ki bo, da ka bo, da ki be, me tu sa, me ta si, mo tu si, mo ta sa
Part-words: ve gu da, tu sa du, …
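The contrast between adjacent and non-adjacent statistics can be checked on a simulated stream built from the 12 words above: syllable-to-syllable TPs stay at about 0.5, while on the consonant tier (vowels skipped) the TP is 1 within a root and about 0.5 across words. The stream length, the random seed, and the rule that a root never repeats immediately are assumptions; the no-repeat rule is what makes the stated between-word TP of 0.5 come out with three roots.

```python
import random
from collections import Counter

# The 12 familiarization words: three consonant roots
# (p-v-g, d-k-b, m-t-s), each combined with four vowel patterns.
WORDS = ["pa ve gu", "pa vo ga", "pi ve ga", "pi vo gu",
         "du ka be", "du ki bo", "da ka bo", "da ki be",
         "me tu sa", "me ta si", "mo tu si", "mo ta sa"]


def root(word):
    """Consonant root of a CV-CV-CV word, e.g. 'pa ve gu' -> ('p', 'v', 'g')."""
    return tuple(syllable[0] for syllable in word.split())


def make_syllable_stream(n_words, rng):
    """Random word sequence flattened to syllables.

    Assumption: a root never repeats immediately, so the next root is one of
    the other two, matching the stated between-word consonant TP of 0.5.
    """
    words = []
    for _ in range(n_words):
        options = [w for w in WORDS if not words or root(w) != root(words[-1])]
        words.append(rng.choice(options))
    return [syllable for w in words for syllable in w.split()]


def transitional_probabilities(units):
    """TP(b | a) = count(a followed by b) / count(a followed by anything)."""
    pairs = Counter(zip(units, units[1:]))
    firsts = Counter(units[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}


if __name__ == "__main__":
    syllables = make_syllable_stream(3000, random.Random(7))
    consonants = [s[0] for s in syllables]  # consonant tier: vowels skipped
    syl_tp = transitional_probabilities(syllables)
    con_tp = transitional_probabilities(consonants)
    print("adjacent syllables, TP(ve | pa):", round(syl_tp[("pa", "ve")], 2))
    print("consonants within a root, TP(v | p):", con_tp[("p", "v")])
    print("consonants across words, TP(d | g):", round(con_tp[("g", "d")], 2))
```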

Statistical Learning – Non-Adjacent

During the test, participants hear legal “words”, composed of a consonant root and new vowel patterns, and nonwords assembled from “part-roots” with the same vowel patterns. They are asked to choose which item belongs to the “language”.

Legal words: pu ve gi, po vi ga, di ku bo, du ke ba, me to sa, ma tu se
Foils (part-roots): vu ge di, bo pi va, gi du ko, ku be ma, te so da, ga mu te


Test: Which word belongs to the language?


Probabilistic Serial Learning Task (Kaufman et al., 2010)

On each trial, a stimulus appears at one of four locations on the screen, and subjects are asked to press the corresponding key ('1', '2', '3', or '4').


Subjects are unaware that the location of each successive stimulus is determined probabilistically by the last two presented stimuli:

• P= 0.85 for sequence A (1-2-1-4-3-2-4-1-3-4-2-3).

• P= 0.15 for sequence B (3-2-3-4-1-2-4-3-1-4-2-1).

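The probabilistic sequence can be sketched as two successor tables: read cyclically, each 12-item sequence maps every ordered pair of distinct locations (the last two stimuli) to a unique next location, and the two sequences always disagree. The generator below follows sequence A with p = 0.85 and sequence B otherwise. The starting pair, and scoring learning as mean RT on improbable trials minus mean RT on probable trials (which fits the millisecond scores reported for SRT), are assumptions made for illustration.

```python
import random
from statistics import mean

SEQ_A = [1, 2, 1, 4, 3, 2, 4, 1, 3, 4, 2, 3]  # probable sequence (p = 0.85)
SEQ_B = [3, 2, 3, 4, 1, 2, 4, 3, 1, 4, 2, 1]  # improbable sequence (p = 0.15)


def successor_table(seq):
    """Map each (second-to-last, last) pair to the next location, reading seq cyclically."""
    n = len(seq)
    return {(seq[i], seq[(i + 1) % n]): seq[(i + 2) % n] for i in range(n)}


A_NEXT, B_NEXT = successor_table(SEQ_A), successor_table(SEQ_B)


def generate_trials(n_trials, rng, p_a=0.85):
    """Return (location, is_probable) pairs; each location depends on the previous two."""
    prev = (SEQ_A[0], SEQ_A[1])  # assumption: start from the first pair of sequence A
    trials = []
    for _ in range(n_trials):
        probable = rng.random() < p_a
        nxt = (A_NEXT if probable else B_NEXT)[prev]
        trials.append((nxt, probable))
        prev = (prev[1], nxt)
    return trials


def srt_learning_score(rts, trials):
    """Assumed learning score: mean RT on improbable minus probable trials (ms)."""
    probable = [rt for rt, (_, p) in zip(rts, trials) if p]
    improbable = [rt for rt, (_, p) in zip(rts, trials) if not p]
    return mean(improbable) - mean(probable)


if __name__ == "__main__":
    trials = generate_trials(20, random.Random(0))
    print([loc for loc, _ in trials])
```

A participant who has implicitly learned the sequence responds faster on probable trials, so the score is positive; a score near 0 means no learning.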

Distribution of scores in the five S.L tasks

For the tasks to reliably predict cognitive abilities, each task has to:

1. Have a level of difficulty that is neither at floor nor at ceiling.
2. Have a variance large enough to allow a wide distribution of individual differences.
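One way to read the two criteria against the summary statistics reported in the distribution slides is to express the distance of each task mean from chance and from the maximum score in SD units: little room on either side means scores bunch at floor or ceiling. The sketch uses the reported means, SDs, and maxima; taking chance as half the maximum assumes a two-alternative forced-choice test, and SRT is left out because it is scored in milliseconds.

```python
# Reported summary statistics per task: (mean, SD, maximum score).
# Chance = half the maximum assumes a two-alternative forced-choice test.
REPORTED = {
    "VSL": (22.2, 5.56, 32),
    "ASL (adjacent)": (21.6, 5.64, 36),
    "ANA (non-linguistic)": (20.56, 3.33, 36),
    "AVN (non-adjacent)": (20.58, 3.98, 36),
}


def headroom(mean_score, sd, max_score):
    """Distance of the mean from chance (floor) and from the maximum (ceiling), in SD units."""
    chance = max_score / 2
    return (mean_score - chance) / sd, (max_score - mean_score) / sd


if __name__ == "__main__":
    for task, (m, sd, mx) in REPORTED.items():
        above_chance, below_ceiling = headroom(m, sd, mx)
        print(f"{task}: {above_chance:.2f} SD above chance, "
              f"{below_ceiling:.2f} SD below ceiling")
```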

Visual Statistical Learning Distribution

[Histogram: distribution of individual VSL scores, 1 to 32]

n = 179; Mean = 22.2 (of 32, 69.4%), SD = 5.56

Auditory Statistical Learning (Adjacent) Distribution

[Histogram: distribution of individual ASL (adjacent) scores, 1 to 36]

n = 102; Mean = 21.6 (of 36, 59.1%), SD = 5.64

Auditory Statistical Learning (Non-Linguistic, Adjacent) Distribution

[Histogram: distribution of individual ANA scores, 1 to 36]

n = 103; Mean = 20.56 (of 36, 57.1%), SD = 3.33

Auditory Statistical Learning (Linguistic, Non-Adjacent) Distribution

[Histogram: distribution of individual AVN scores, 1 to 36]

n = 102; Mean = 20.58 (of 36, 57.1%), SD = 3.98*

* Without the two extreme observations: SD = 3.52

Serial Reaction Time (SRT) Distribution

[Histogram: distribution of individual SRT scores, -15 to 80 ms]

Mean = 17.54 ms, SD = 17.36

Q1:

Is statistical learning a subset (nested) of general cognitive abilities?

What are the correlations between scores in the various S.L tasks and scores obtained in the cognitive tasks?

Cognitive tasks:
Raven's Advanced Matrices: IQ
RAN (digits, letters, objects): output speed
Digit span: WM
Verbal working memory
Syntactic processing: verbal abilities
Switch task: executive functions

If S.L is an independent theoretical construct, we should expect it not to be nested within any given faculty (i.e., very small correlations), although it may show some correlation with intelligence.

Correlations of VSL and tests of general cognitive capacities

VWM: 0.194* (n=178, p=0.01)
Digit span: 0.151 (n=103)
Raven’s Advanced Matrices: 0.271* (n=89, p=0.01)
Switch task: -0.163 (n=44)
Syntactic processing: 0.029 (n=44)
RAN (objects): 0.147 (n=127)
RAN (letters): 0.023 (n=127)
RAN (digits): 0.103 (n=127)

(* significant; all other p’s > 0.05)
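The correlations and p-values in these tables can be checked approximately from r and n alone. The sketch computes Pearson's r from raw scores and an approximate two-tailed p using the Fisher z transform. This normal approximation is swapped in here for the exact t-test on r, and it is not necessarily how the reported p-values were obtained.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p(r, n):
    """Approximate two-tailed p for H0: rho = 0, via the Fisher z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def critical_r(n, z_crit=1.96):
    """Smallest |r| reaching two-tailed p < .05 under the same approximation."""
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    for label, r, n in [("VSL x VWM", 0.194, 178), ("VSL x Raven", 0.271, 89),
                        ("VSL x Digit span", 0.151, 103)]:
        print(f"{label}: r={r}, n={n}, approx. p={fisher_z_p(r, n):.3f}")
    print(f"critical |r| for n=102: {critical_r(102):.3f}")
```

Under this approximation, the two starred VSL correlations come out at p of about 0.01, matching the table, and the digit-span correlation comes out well above 0.05. For n = 102, |r| must exceed roughly 0.19 to reach significance, which is worth keeping in mind when reading the inter-task correlations under Q2.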

Correlations of ASL (adjacent) and tests of general cognitive capacities

VWM: 0.051 (n=102)
Digit span: -0.021 (n=102)
Raven’s Advanced Matrices: 0.039 (n=89)
Switch task: -0.018 (n=48)
Syntactic processing: 0.042 (n=44)
RAN (objects): -0.060 (n=48)
RAN (letters): 0.075 (n=48)
RAN (digits): 0.072 (n=48)

(all p’s > 0.2)

Correlations of AVN (non-adjacent) and tests of general cognitive capacities

VWM: -0.085 (n=102)
Digit span: -0.116 (n=102)
Raven’s Advanced Matrices: -0.046 (n=89)
Switch task: 0.060 (n=48)
Syntactic processing: 0.001 (n=44)
RAN (objects): -0.152 (n=48)
RAN (letters): 0.033 (n=48)
RAN (digits): -0.046 (n=48)

(all p’s > 0.2)

Correlations of ANA (non-linguistic) and tests of general cognitive capacities

VWM: -0.084 (n=102)
Digit span: 0.039 (n=102)
Raven’s Advanced Matrices: 0.129 (n=89)
Switch task: 0.096 (n=48)
Syntactic processing: 0.149 (n=44)
RAN (objects): 0.039 (n=48)
RAN (letters): -0.065 (n=48)
RAN (digits): 0.080 (n=48)

(all p’s > 0.2)

Correlations of SRT and tests of general cognitive capacities

VWM: -0.090
Digit span: 0.081
Raven’s Advanced Matrices: 0.165
Switch task: 0.150
Syntactic processing: -0.113 (n=44)
RAN (objects): -0.034
RAN (letters): -0.166
RAN (digits): -0.128

(all other n’s = 48; all p’s > 0.2)

Q2:

Is statistical learning a unified or a componential capacity?

What are the inter-correlations of scores in the various S.L tasks?

Correlations between the different Statistical Learning Tasks

VSL and ASL-adjacent: 0.084
VSL and ASL-non-adjacent: -0.191
VSL and ASL-non-linguistic: 0.080
ASL-adjacent and ASL-non-adjacent: -0.056
ASL-adjacent and ASL-non-linguistic: 0.009
ASL-non-adjacent and ASL-non-linguistic: -0.132

(n=102, all p’s > 0.05)

Correlations between SL tasks and SRT

SRT and VSL: 0.112
SRT and ASL-adjacent: -0.189
SRT and ASL-non-adjacent: 0.131
SRT and ASL-non-linguistic: -0.001

(n=48, all p’s > 0.2)

Q3:

Is statistical learning a reliable and stable capacity?

Is there test-retest reliability?

Test-Retest Reliability - VSL

[Scatterplot of T1 against T2 VSL scores, axes 0 to 32]

n=36, r=0.7

Test-Retest Reliability - ASL

[Scatterplot of T1 against T2 ASL scores, axes 0 to 36]

n=38, r=0.69

Test-Retest Reliability – ASL Non-Adjacent

[Scatterplot of T1 against T2 ASL non-adjacent scores, axes 0 to 36]

n=39, r=0.6

Test-Retest Reliability – ASL Non-linguistic

[Scatterplot of T1 against T2 ASL non-linguistic scores, axes 0 to 36]

n=40, r=0.06

Test-Retest Reliability – SRT

[Scatterplot of T1 against T2 SRT scores, axes -20 to 100]

n=44, r=0.28

Is there a learning effect from T1 to T2?
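Test-retest r and a learning effect answer different questions: scores can correlate highly across sessions and still shift upward overall. A standard way to test for such a shift is a paired-samples t statistic on the T2 minus T1 differences. The sketch below is an illustration of that test, not the analysis reported in the study, and the scores in the usage lines are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(t1, t2):
    """Paired-samples t statistic for the change from T1 to T2 (df = n - 1).

    Positive t means higher scores at T2.
    """
    if len(t1) != len(t2) or len(t1) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - a for a, b in zip(t1, t2)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical VSL scores (of 32) for five participants at T1 and T2.
    t1 = [18, 22, 25, 20, 27]
    t2 = [20, 23, 27, 21, 28]
    t, df = paired_t(t1, t2)
    print(f"t({df}) = {t:.2f}")
```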

Conclusions: S.L tasks

1. Not all tasks that monitor S.L as an individual ability are equally good in terms of performance distribution (variance) and test-retest reliability.

2. Of all the tasks examined in our study, VSL seems to provide the best fit. It has a normal distribution of performance, it is reliable, it has a small but significant correlation with intelligence, and it does seem to predict success or failure in L2 literacy (e.g., Frost et al., Psych. Science).

3. Tasks that do not show adequate test-retest reliability should be avoided when individual differences are of concern (SRT, non-linguistic auditory sounds).

General conclusions.

1. S.L does not seem to be a unified ability. Individuals may be good in detecting correlations in one context and not as good in another.

2. S.L does not seem to be a subset of intelligence or WM, although some tasks correlate with intelligence to some extent.

3. Detection of correlations and regularities in a given context seems to be a stable and reliable individual ability.

4. Our research so far shows that individual differences in detecting transitional probabilities of adjacent shapes in the visual modality are the best predictor of L2 literacy acquisition.

More questions to be answered:

Is there something specific about the correlation of VSL with reading (language in the visual modality), or does VSL correlate with other aspects of L2 learning?

If S.L underlies L2 learning, why are some statistical properties of the language consistently more difficult to learn (e.g., gender agreement, numbers…)?

What characterizes these general difficulties? Do they have something in common?

Will a comprehensive theory of L2 learning be a mix of domain specificity and general ability?

Thanks to all members of the Laboratory for Verbal Information Processing

But mostly, this work was done by Noam Siegelman! Funding: NICHD (R01 HD 067364, P01 HD 01994), ISF (159/10)
