How much information does a language have? Shannon, C. Prediction and Entropy of Printed English, Bell System Technical Journal, 1951

Feb 09, 2016


cybill

Transcript
Page 1: How much information does a language have?

How much information does a language have?

• Shannon, C. Prediction and Entropy of Printed English, Bell System Technical Journal, 1951

Page 2: How much information does a language have?

Motivation/Skills

Page 3: How much information does a language have?

Redundancy

The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%. This means that when we write

En_ _ _sh ha_f o_ w_ _t w_ w_ _te i_ dete_ _ _ _e_ b_ t_e str_ct_r_ _ f _ _ _ lang_ _ _ _ a_d H_ _f i_ c_os_n fre_ _ _

$$\text{Redundancy} = 1 - \frac{H}{H_{\max}}$$
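As a rough illustration of this definition, the sketch below evaluates the redundancy for a 27-symbol alphabet (26 letters plus the space, an assumption consistent with the sums up to 27 used elsewhere in these slides). The function name `redundancy` is hypothetical.

```python
from math import log2

def redundancy(h, h_max):
    """Return 1 - H/H_max for an entropy h and a maximum entropy h_max (same units)."""
    if h_max <= 0:
        raise ValueError("h_max must be positive")
    return 1.0 - h / h_max

# Assumption: 27 equally likely symbols give the maximum entropy H_max = log2(27).
H_MAX = log2(27)

# A source whose entropy is half of H_max is 50% redundant.
half = redundancy(H_MAX / 2, H_MAX)
print(f"redundancy = {half:.0%}")
```

A source that reaches H_max has redundancy 0; a fully predictable source (H = 0) has redundancy 1.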

Page 4: How much information does a language have?
Page 5: How much information does a language have?

Entropy

How much information is produced on average for each letter

$$H = -\sum_i p_i \log_2 p_i$$

$$H = -\sum_{i=1}^{27} p_i \log_2 p_i$$
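As a quick check of the formula, consider the simplest case, in which all 27 symbols are equally likely (an illustrative assumption, not a property of English):

$$p_i = \frac{1}{27} \;\Rightarrow\; H = -\sum_{i=1}^{27} \frac{1}{27}\log_2\frac{1}{27} = \log_2 27 \approx 4.75 \text{ bits per letter}$$

Any non-uniform distribution over the same 27 symbols gives a smaller value.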

Page 6: How much information does a language have?

(Two French passages, left untranslated because the point is their spelling: the first uses e as its only vowel, the second never uses the letter e.)

‘L’Evêqe en effet est très streect: le clergé, de temps en temps, se permet de révéler ses préférences envers des ‘événements’ frenchement débreedés, mets l’évêqe hème qe ses fêtes respectent des règles sévères et les trensgresser, c’est fréqemment reesqer de se fère relegger’.

Saisi par l'inspiration, il composa illico un lai, qui, suivant la tradition du Canticum Canticorum Salomonis, magnifiait l'illuminant corps d'Anastasia : Ton corps, un grand galion où j'irai au long-cours, un sloop, un brigantin tanguant sous mon roulis, Ton front, un fort dont j'irai à l'assaut, un bastion, un glacis qui fondra sous l'aquilon du transport qui m'agit,

Page 7: How much information does a language have?

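Letter  Probability
E       0.131
T       0.105
A       0.082
O       0.08
N       0.071
R       0.068
I       0.063
S       0.061
H       0.053
D       0.038
L       0.034
F       0.029
C       0.028
M       0.025
U       0.025
G       0.02
Y       0.02
P       0.02
W       0.015
B       0.014
V       0.009
K       0.004
X       0.002
J       0.001
Q       0.001
Z       0.0008

$$S_{\text{English}} = 4.13 \qquad S_{\text{Spanish}} = 4.01$$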
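The English value can be checked directly from the letter probabilities listed on this slide. The sketch below is one way to do it; `FREQ` and `entropy_bits` are hypothetical names, and the result should come out close to the 4.13 quoted for English.

```python
from math import log2

# Letter probabilities as listed on the slide (26 letters, no space symbol).
FREQ = {
    "E": 0.131, "T": 0.105, "A": 0.082, "O": 0.08, "N": 0.071, "R": 0.068,
    "I": 0.063, "S": 0.061, "H": 0.053, "D": 0.038, "L": 0.034, "F": 0.029,
    "C": 0.028, "M": 0.025, "U": 0.025, "G": 0.02, "Y": 0.02, "P": 0.02,
    "W": 0.015, "B": 0.014, "V": 0.009, "K": 0.004, "X": 0.002, "J": 0.001,
    "Q": 0.001, "Z": 0.0008,
}

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_english = entropy_bits(FREQ.values())
print(f"{h_english:.2f} bits per letter")
```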

Page 8: How much information does a language have?
Page 9: How much information does a language have?

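How much information is obtained by adding one letter?

$$F_N = -\sum_{i,j} p(b_i, j)\,\log_2 p(b_i, j) + \sum_i p(b_i)\,\log_2 p(b_i)$$

$$H = \lim_{N \to \infty} F_N$$

Here b_i is a block of N-1 letters and j is the letter that follows it.

[Illustration: the letter-probability table (E 0.131, T 0.105, A 0.082, ..., X 0.002, J 0.001, Q 0.001, Z 0.0008), annotated with the labels "S E", "E" and "SE".]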
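The difference of two block entropies can be estimated from n-gram counts of a sample text. The sketch below is a minimal version under stated assumptions: probabilities are plain relative frequencies, the probability of a block of N-1 letters is taken as the marginal of the N-gram counts, and the names `entropy` and `f_n` are hypothetical. Estimates for large N need far more text than for small N.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Entropy in bits of the relative frequencies in a Counter."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def f_n(text, n):
    """Estimate F_N = H(N-grams) - H(their (N-1)-letter prefixes) from text."""
    if n < 1 or len(text) < n:
        raise ValueError("need 1 <= n <= len(text)")
    ngrams = Counter(text[k:k + n] for k in range(len(text) - n + 1))
    prefixes = Counter()
    for gram, count in ngrams.items():
        prefixes[gram[:-1]] += count  # marginalise over the last letter
    return entropy(ngrams) - entropy(prefixes)

# Toy check: in "abab..." each letter carries 1 bit on its own,
# but nothing once the previous letter is known.
sample = "abababab"
print(f_n(sample, 1), f_n(sample, 2))
```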

Page 10: How much information does a language have?

Fn   Bits per letter
F0   4.75
F1   4.03
F2   3.32
F3   3.1

Page 11: How much information does a language have?

Third-order approximation

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
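Text of this kind can be produced by sampling each letter according to how often it follows the two preceding letters in some source text. The sketch below is a minimal version of that idea; the tiny `corpus`, the function names and the fixed seed are all illustrative assumptions, and a realistic run would need a much larger corpus.

```python
import random
from collections import Counter, defaultdict

def build_model(text, order=3):
    """Map each (order-1)-letter context to a Counter of the letters that follow it."""
    if order < 2:
        raise ValueError("order must be at least 2")
    k = order - 1
    model = defaultdict(Counter)
    for i in range(len(text) - k):
        model[text[i:i + k]][text[i + k]] += 1
    return model

def generate(model, seed, length, rng):
    """Extend seed letter by letter, sampling in proportion to the observed counts."""
    k = len(seed)  # must equal order - 1
    out = seed
    while len(out) < length:
        followers = model.get(out[-k:])
        if not followers:  # context never seen with a successor: stop early
            break
        letters, weights = zip(*followers.items())
        out += rng.choices(letters, weights=weights)[0]
    return out

# Hypothetical toy corpus, only to make the sketch runnable.
corpus = "THE REVERSE OF THE OTHER THEORY IS THE THEME OF THE THESIS "
model = build_model(corpus, order=3)
sample = generate(model, corpus[:2], 40, random.Random(0))
print(sample)
```

By construction every three-letter window of the output occurs somewhere in the corpus, which is why the result looks locally like English without being English.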

Page 12: How much information does a language have?

#   Word   Probability
1   The    .071
2   of     .034
3   and    .03

Vocabulary size (no. lemmas)   % of content in OEC   Example lemmas
10                             25%                   the, of, and, to, that, have
100                            50%                   from, because, go, me, our, well, way
1000                           75%                   girl, win, decide, huge, difficult, series
7000                           90%                   tackle, peak, crude, purely, dude, modest
50,000                         95%                   saboteur, autocracy, calyx, conformist
>1,000,000                     99%                   laggardly, endobenthic, pomological

Page 13: How much information does a language have?

#   Word   Probability
1   The    .071
2   of     .034
3   and    .03

Page 14: How much information does a language have?

Zipf’s Law

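$$F_{\text{word}} = 11.82 \text{ bits per word}$$

$$P_n = \frac{0.1}{n}$$

$$\sum_{n=1}^{8727} P_n = 1$$

$$\frac{F_{\text{word}}}{\text{Length}} = 2.62 \text{ bits per letter}$$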
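The normalisation of P_n = 0.1/n can be explored numerically. The sketch below only sums the series; the function names are hypothetical. Running it suggests the sum is about 0.965 at n = 8727 and first reaches 1 somewhat later, so the cutoff quoted on the slide may be approximate.

```python
def zipf_partial_sum(n_max, c=0.1):
    """Sum of P_n = c/n for n = 1..n_max."""
    return sum(c / n for n in range(1, n_max + 1))

def rank_where_sum_reaches_one(c=0.1):
    """Smallest rank at which the running sum of c/n reaches 1."""
    total, n = 0.0, 0
    while total < 1.0:
        n += 1
        total += c / n
    return n

s_8727 = zipf_partial_sum(8727)
n_unity = rank_where_sum_reaches_one()
print(f"sum up to 8727: {s_8727:.3f}; sum reaches 1 at n = {n_unity}")
```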

Page 15: How much information does a language have?

Is English trying to warn us?

992-995 America ensure oil opportunity

2629-2634 bush admit specifically agents smell denied

16047-16048 arafat unhealthy

#   Word   Probability
1   The    .071
2   of     .034
3   and    .03

Page 16: How much information does a language have?

How to continue?

Aoccdrnig to rseearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is that the frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.
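A passage like this one can be produced by shuffling the interior letters of each word while keeping the first and last letters fixed. The sketch below is one simple way to do that; `scramble_word` and `scramble_text` are hypothetical names, and only runs of ASCII letters are treated as words.

```python
import random
import re

def scramble_word(word, rng):
    """Shuffle the interior letters of a word; words of up to 3 letters are unchanged."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, seed=0):
    """Scramble every run of letters in text, leaving spaces and punctuation alone."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)

original = "reading the whole word"
mixed = scramble_text(original, seed=1)
print(mixed)
```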

Page 17: How much information does a language have?

Revealing the statistics of the language

• Q….. 2034 words start with q

• ….q 8 words finish with q

….q …. Ira0q q0.1

Page 18: How much information does a language have?

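Revealing the statistics of the language

Each number is the count of guesses needed for the corresponding character, spaces included.

THERE IS NO REVERSE ON A MOTORCYCLE A
1 1 1 5 1 1 2 1 1 2 1 1 15 1 17 1 1 1 2 1 3 2 1 2 2 7 1 1 1 1 4 1 1 1 1 1 3 1

FRIEND OF MINE FOUND THIS OUT
8 6 1 3 1 1 1 1 1 1 1 1 1 1 1 6 2 1 1 1 1 1 1 2 1 1 1 1 1 1

RATHER DRAMATICALLY THE OTHER DAY
4 1 1 1 1 1 1 11 5 1 1 1 1 1 1 1 1 1 1 1 6 1 1 1 1 1 1 1 1 1 1 1 1 1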
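The guessing experiment behind these numbers can be imitated with a very crude predictor. In the sketch below the human guesser is replaced by a model that knows only how often each letter follows the previous one in a training string; that substitution, the 27-symbol alphabet and all names are assumptions made for illustration. For each character it reports the rank of the true letter in the model's list of guesses.

```python
import string
from collections import Counter, defaultdict

ALPHABET = string.ascii_uppercase + " "  # assumed 27-symbol alphabet

def train(corpus):
    """Count, for each symbol, which symbols follow it in the corpus."""
    model = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        model[prev][nxt] += 1
    return model

def ranking(model, prev):
    """Guess order after prev: most frequent follower first, ties in alphabet order."""
    counts = model.get(prev, Counter())
    return sorted(ALPHABET, key=lambda ch: (-counts[ch], ALPHABET.index(ch)))

def guess_numbers(model, text):
    """Number of guesses the model needs for each character of text."""
    numbers, prev = [], " "  # assume the text starts after a space
    for ch in text:
        numbers.append(ranking(model, prev).index(ch) + 1)
        prev = ch
    return numbers

model = train("THE THE THE THE ")
print(guess_numbers(model, "THE "))
```

A better predictor yields more ones in the output, which is exactly what makes the reduced text easier to compress than the original.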

Page 19: How much information does a language have?

[Figure: number of times guessed versus position of the guessed letter]

Page 20: How much information does a language have?

What is the probability of finding the number 1 in the third position?

Trigram  Guess numbers
THE      1 1 1
REV      1 1 5
ERS      1 1 2
MOT      1 1 2
THA      1 1 2

Page 21: How much information does a language have?

Trigram  Guess numbers
THE      1 1 1
ANT      3 1 3
ERS      1 1 2
MOT      1 1 2
HER      2 2 2
THA      1 1 2
HEN      1 1 3
ERS      1 1 2
TH_      1 1 3
AN_      3 1 2
HE_      2 2 1
REV      1 1 5
ERS      1 1 2
MOT      1 1 2
AND      3 1 1

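Probability of finding the number i in position N:

$$q_i^N = \sum_{i_1, \ldots, i_{N-1}} p(i_1, \ldots, i_{N-1}, j)$$

where, for each block i_1, ..., i_{N-1}, j is the letter that is guessed i-th after that block. For the third position:

$$q_1^3 = \sum_{i_1, i_2} p(i_1, i_2, j)$$

$$q_2^3 = \sum_{i_1, i_2} p(i_1, i_2, j)$$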
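In practice the q_i can be estimated as the fraction of characters that needed exactly i guesses. The sketch below does this for the guess counts of the sample sentence shown earlier in the transcript. How the run-together digits split into numbers is an assumption (one count per character, with 15, 17 and 11 as the only two-digit values), and pooling all positions into a single estimate is a simplification; `estimate_q` is a hypothetical name.

```python
from collections import Counter

# Guess counts, one per character (spaces included), for the three lines of the sample.
# Assumption: this is how the digit strings in the transcript split into numbers.
GUESSES = (
    [1, 1, 1, 5, 1, 1, 2, 1, 1, 2, 1, 1, 15, 1, 17, 1, 1, 1, 2, 1,
     3, 2, 1, 2, 2, 7, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 3, 1]
    + [8, 6, 1, 3] + [1] * 11 + [6, 2] + [1] * 6 + [2] + [1] * 6
    + [4] + [1] * 6 + [11, 5] + [1] * 11 + [6] + [1] * 13
)

def estimate_q(guesses, n_symbols=27):
    """Relative frequency of each guess number 1..n_symbols."""
    counts = Counter(guesses)
    total = len(guesses)
    return [counts[i] / total for i in range(1, n_symbols + 1)]

q = estimate_q(GUESSES)
print(f"{len(GUESSES)} characters, q_1 = {q[0]:.3f}")
```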

Page 22: How much information does a language have?

Bounds

THERE IS NO REVERSE ON A MOTORCYCLE A

F0 (all the letters have the same probability)
F1 (each letter has its own probability)
F2 (correlation of two letters)

1 1 1 5 1 1 2 1 1 2 1 1 15 1 17 1 1 1 2 1 3 2 1 2 2 7 1 1 1 1 4 1 1 1 1 1 3 1

F0 (all the numbers have the same probability)
F1 (each number has its own probability)
F2 (correlation of two numbers)

$$F_N \le -\sum_{i=1}^{27} q_i^N \log_2 q_i^N$$

Page 23: How much information does a language have?

Bounds

$$F_N \le -\sum_{i=1}^{27} q_i^N \log_2 q_i^N$$

Page 24: How much information does a language have?

$$\sum_{i=1}^{27} i\,\left(q_i^N - q_{i+1}^N\right) \log_2 i \le F_N$$

Entropy
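Both bounds on F_N are simple sums over the guess-number probabilities. The sketch below evaluates them for a list q holding q_1, q_2, ... in that order; it assumes q is non-increasing (the first guess is the most likely to be right) and takes the term after the last entry as 0. The function names and the example values are hypothetical.

```python
from math import log2

def upper_bound(q):
    """Upper bound on F_N: the entropy of the guess-number distribution, in bits."""
    return -sum(p * log2(p) for p in q if p > 0)

def lower_bound(q):
    """Lower bound on F_N: sum over i of i * (q_i - q_{i+1}) * log2(i), with q_{n+1} = 0."""
    padded = list(q) + [0.0]
    return sum(i * (padded[i - 1] - padded[i]) * log2(i) for i in range(1, len(q) + 1))

# Illustrative values only, not taken from the slides.
q_example = [0.5, 0.25, 0.25]
print(lower_bound(q_example), upper_bound(q_example))
```

When every guess number is equally likely the two bounds coincide at log2 of the alphabet size, and when the first guess is always right both are 0.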

Page 25: How much information does a language have?
Page 26: How much information does a language have?

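[Worked example for N = 4, recovered from the slide's figure: three columns of values for i = 1 to 7, first as listed, then with the last four entries replaced by 1.3, then divided by 100 to give q_i^4.]

i   N = 4   N = 4 (adjusted)   q_i^4
1   47      47                 0.47
2   17      17                 0.17
3   13      13                 0.13
4   3       1.3                0.013
5   1       1.3                0.013
6   5       1.3                0.013
7   3       1.3                0.013

$$F_4 \le -\sum_i q_i^4 \log_2 q_i^4$$

$$\sum_{i=1}^{27} i\,\left(q_i^4 - q_{i+1}^4\right) \log_2 i \le F_4$$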

Page 27: How much information does a language have?

Bounds

Redundancy ~ 75%

Fn   Bits per letter
F0   4.75
F1   4.03
F2   3.32
F3   3.1
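To see what a redundancy of about 75% means in bits, one can invert the definition Redundancy = 1 - H/H_max. Taking H_max to be F0 = 4.75 bits per letter (an assumption about which maximum the slide uses):

$$H \approx (1 - 0.75) \times 4.75 \approx 1.19 \text{ bits per letter}$$

This is well below F3 = 3.1, which is consistent with long-range structure carrying much of the redundancy.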