Monica Tamariz Richard Shillcock monica@ling.ed.ac.uk rcs@cogsci.ed.ac.uk

Post on 12-Jan-2016



Real World Constraints on the Mental Lexicon: Assimilation, the Speech Lexicon and the Information Structure of Spanish Words

Monica Tamariz, Richard Shillcock

monica@ling.ed.ac.uk rcs@cogsci.ed.ac.uk

Overview

• Use information profiles of word systems (corpus, lexicons).

• More realistic representations of speech generate flatter profiles.

• Flatter profiles reflect more efficient use of the representational space.

Assumptions

• Phonology plays a part in the organization of the mental lexicon.

• For maximal efficiency, information should be spread as evenly as possible over the representational space.

Distribution of information over the representational space

[Figure: information (entropy), 0 to 1, plotted against representational space segment, 1 to 17.]

Entropy

• Concept of entropy from information theory (Shannon, 1948).

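• Measure of uncertainty or information: $H = -\sum_i p_i \log p_i$, where $p_i$ is the probability of the $i$-th symbol.

• Redundancy: $R = 1 - H$, with $H$ expressed as relative entropy (a value between 0 and 1).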
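
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the slides do not state the logarithm base or how entropy is normalized, so base 2 and division by the maximum entropy, log2 of the alphabet size, are assumptions. The function names are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts.
    Assumption: log base 2 (the slides do not specify the base)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def relative_entropy(counts, alphabet_size):
    """Entropy divided by its maximum, log2(alphabet_size); lies in [0, 1].
    Assumption: this is the normalization behind the 0-to-1 entropy scale."""
    return entropy(counts) / math.log2(alphabet_size)

def redundancy(counts, alphabet_size):
    """R = 1 - H, with H taken as relative entropy."""
    return 1.0 - relative_entropy(counts, alphabet_size)

# Four equally likely symbols: maximal entropy, no redundancy.
uniform = [1, 1, 1, 1]
# One symbol carries all the probability: no uncertainty, full redundancy.
skewed = [4, 0, 0, 0]
```

An evenly spread distribution reaches the maximum relative entropy of 1, which is the sense in which a flat use of the representational space is the most efficient.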

Data sets

• Speech corpus: 707,000 word tokens.

• Speech lexicon: 42,000 word types.

• Dictionary lexicon: 28,000 headwords.
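
The difference between word tokens and word types can be sketched as follows. The token list is a toy example invented for illustration and is not drawn from the study's corpus. In the corpus view every occurrence of a word counts, while in the lexicon view each distinct word counts once.

```python
from collections import Counter

# Toy token stream (illustrative only, not from the study's data).
tokens = ["de", "la", "kasa", "de", "la", "familia", "de", "todos"]

token_counts = Counter(tokens)   # corpus view: every occurrence counts
lexicon = sorted(token_counts)   # lexicon view: each distinct word once

# First-segment counts under each view.
corpus_first = Counter(word[0] for word in tokens)
lexicon_first = Counter(word[0] for word in lexicon)
```

Frequent words weigh heavily on the corpus counts and not at all on the lexicon counts, so the two views can yield different entropy values for the same segment position.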

Transcriptions

• Citation (30 phonemes)

• Fast-speech (50 phonemes and allophones)

Fast-speech transcription

• Glides

• Approximant B D G

• Consonant assimilation, e.g. [], [z], [dental n], [dental l]

ORTHO.     CITATION   FAST-SPEECH
admitir    admitIr    amitIr
carnets    karnEts    kanEs
colgado    kolgAdo    kol Ao
quitarle   kitArle    kitAle
gracias    grAzias    Azia
gaitero    gaitEro    gatEro


The Information Profile

Calculate entropy: $H = -\sum_i p_i \log p_i$

Words, one segment per position (1 to 7):

      1  2  3  4  5  6  7
 1    a  d  m  i  t  I  r
 2    t  e  r  m  I  n  a
 3    k  a  r  n  E  t  s
 4    k  o  l  g  A  d  o
 5    m  o  m  E  n  t  o
 6    f  a  m  I  l  i  a
 7    k  i  t  A  r  l  e
 8    t  e  n  E  m  o  s
 9    g  r  A  z  i  a  s
10    t  o  d  a  b  I  a
etc.

Count of phonemes in each segment position (P1 to P7):

      P1     P2     P3     P4     P5     P6     P7
a     4189   6917   1533   1741   258    3923   9200
b     1789   947    2020   2918   1547   358    5
d     3075   164    1496   980    2066   4350   232
e     4481   8918   1003   1792   284    3427   3866
f     1104   169    766    131    26     12     2
g     1110   169    1739   677    285    654    25
i     1553   4278   1850   2561   4430   2129   106
k     3856   1075   1911   567    745    1364   59
l     567    1450   2207   994    1562   1612   1268
m     3540   972    4688   2668   3294   668    8
etc.
T.    45559  45559  45559  45559  45559  45559  45559
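
The procedure on this slide (count the phonemes at each segment position, then compute the entropy of each position) can be sketched as below. This is a hedged illustration, not the authors' code. It assumes one character per segment, log base 2, and normalization by log2 of the inventory size (30, the citation transcription's phoneme count). The function name `information_profile` is hypothetical. The input is only the ten sample words shown above, so the resulting values are illustrative and are not the study's results.

```python
import math
from collections import Counter

def information_profile(words, alphabet_size):
    """Relative entropy at each segment position for equal-length words.
    Assumptions: one character per segment; base-2 logs; normalization
    by log2(alphabet_size)."""
    length = len(words[0])
    if any(len(word) != length for word in words):
        raise ValueError("all words must have the same number of segments")
    profile = []
    for pos in range(length):
        # Count how often each phoneme occurs at this position.
        counts = Counter(word[pos] for word in words)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        profile.append(h / math.log2(alphabet_size))
    return profile

# The ten 7-segment sample words from the slide (citation transcription).
sample_words = ["admitIr", "termIna", "karnEts", "kolgAdo", "momEnto",
                "famIlia", "kitArle", "tenEmos", "grAzias", "todabIa"]
sample_profile = information_profile(sample_words, alphabet_size=30)
```

Each entry of `sample_profile` corresponds to one segment position, which gives the series of points that make up an information profile.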

The Information Profile: Corpus, Citation

[Figure: entropy (0.5 to 1) by segment position (1 to 7), with fitted line y = -0.0256x + 0.8941.]

Profile measures: slope (-m) and mean level of entropy (Hrel).
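
The two summary measures of a profile can be sketched as follows. The slides do not say how the line was fitted or how the mean level was computed, so an ordinary least-squares fit against 1-based segment position and a plain average of the profile values are both assumptions. The function name is hypothetical, and the example profile is generated from the fitted line quoted on the slide rather than from the original data points, which are not given.

```python
def profile_slope_and_level(profile):
    """Return (m, mean level) for a profile of per-position entropies.
    Assumptions: least-squares slope over positions 1..n; the mean level
    is the average of the profile values. The slides plot -m."""
    n = len(profile)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(profile) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, profile))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx, mean_y

# Synthetic profile lying exactly on the slide's line y = -0.0256x + 0.8941.
line_profile = [-0.0256 * x + 0.8941 for x in range(1, 8)]
m, level = profile_slope_and_level(line_profile)
```

A flatter profile has a value of -m closer to zero, and a higher mean level means less redundancy across the word.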

The LERR principle

(Levelling effect of realistic representations)

“Processes that make the representation of words more accurate will flatten the information profiles”

The effect of the transcription: Information profile slopes

• Fast speech has flatter profiles (as in other languages)

• Longer words have flatter profiles

[Figure: slope (-m), 0 to 0.07, by word length (4 to 7), for the Citation and Fast-speech transcriptions.]

The effect of the transcription: level of entropy.

• More entropy in the citation transcription.

• Fast speech is more redundant and thus more predictable.

[Figure: Hrel, 0.6 to 0.8, by word length (4 to 7), for the Citation and Fast-speech transcriptions.]

The Speech Lexicon: Information profile slopes.

• Speech Lexicon: the active mental lexicon represented in the brain.

• The speech lexicon has flatter profiles.

[Figure: slope (-m), 0 to 0.07, by word length (4 to 7), for the Dictionary lexicon and the Speech lexicon.]

The Speech Lexicon: Level of entropy

• Speech lexicon: low redundancy levels.

• Level varies little across word lengths.

• Support for Butterworth's ‘Full Listing Hypothesis’.

[Figure: Hrel, 0.6 to 0.8, by word length (4 to 7), for the Dictionary lexicon and the Speech lexicon.]

Corpus vs. Lexicon: Information profile slopes

• Corpus: representation over time.

• Lexicon: representation over space.

• The lexicon yields flatter profiles.

[Figure: slope (-m), 0 to 0.07, by word length (4 to 7), for the Corpus and the Lexicon.]

Corpus vs. Lexicon: Level of entropy.

• The lexicon generates higher entropy levels.

[Figure: Hrel, 0.6 to 0.8, by word length (4 to 7), for the Corpus and the Lexicon.]

Discussion

• Fast-speech rules and a ‘Full List’ mental lexicon flatten the information profile.

• In the speech lexicon, the main constraint is efficiency of storage.

• In the corpus, other constraints - such as lexical segmentation - interact with the optimization of communication.

Conclusion

• This simple analysis of the information profile of word systems is a useful tool that can provide insights into the validity of psycholinguistic theories.
