Section 2.7: The Friedman and Kasiski Tests Practice HW (not to hand in) From Barr Text p. 1-4, 8.


• Using the probability techniques discussed in the last section, we develop in this section a probability-based test for estimating the length of the keyword used to encipher a message with the Vigenere cipher. We also develop a second test for estimating the keyword length, based on the coincidental alignment of letter groups in the plaintext with the keyword. We first establish some facts about the probability of letters occurring in standard English.

Probability of Selecting Multiple Letters in Standard English

• In the standard English frequency table for letters, the probability of selecting a single letter is its relative frequency converted to a decimal, that is,

$$P(\text{selecting a single letter in standard English}) = \frac{\text{relative frequency of the letter found in the table}}{100}$$

http://www.radford.edu/~npsigmon/courses/cryptography/letterfrequencies.doc

Example 1: Using the standard English frequency table, what is the probability of selecting an E? An X?

Solution:



Example 2: In a large sample of English text, estimate the probability of selecting two E’s. Two A’s.

Solution:
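As a rough numerical check of Examples 1 and 2, the sketch below uses three entries of the standard English frequency table (the values for A, E, and X that also appear in Example 3) and treats the two selections in Example 2 as independent draws from a very large sample. The names `freq` and `prob_two` are illustrative choices, not part of the course materials.

```python
# Relative frequencies (as decimals) for a few letters of standard English.
freq = {"A": 0.08167, "E": 0.12702, "X": 0.00150}


def prob_two(letter):
    """Approximate probability of selecting the same letter twice,
    assuming independent draws from a very large sample of English."""
    return freq[letter] ** 2


# Example 1: probability of a single E and of a single X.
print(f"P(E)     = {freq['E']:.5f}")
print(f"P(X)     = {freq['X']:.5f}")

# Example 2: probability of two E's and of two A's.
print(f"P(2 E's) = {prob_two('E'):.5f}")
print(f"P(2 A's) = {prob_two('A'):.5f}")
```

Squaring the single-letter probability is only an approximation: in a finite text the second draw is made from one fewer letter, which is the effect the index of coincidence formula later accounts for.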

For convenience, we will assign the variables $p_0, p_1, p_2, p_3, p_4, \ldots, p_{24}, p_{25}$ to represent the probabilities of selecting the letters A, B, C, D, E, …, Y, Z from the standard English alphabet. The subscripts of the variables correspond to the MOD 26 alphabet assignment number of the corresponding alphabet letter. We will use this variable assignment in the next example.

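Example 3: What is the probability that two randomly selected English letters are the same?

Solution: Using the standard English frequency table, we see that

$$
\begin{aligned}
P(\text{two letters are the same}) &= P(\text{2 A's}) + P(\text{2 B's}) + P(\text{2 C's}) + \cdots + P(\text{2 Y's}) + P(\text{2 Z's}) \\
&= p_0^2 + p_1^2 + p_2^2 + \cdots + p_{24}^2 + p_{25}^2 \\
&= (0.08167)^2 + (0.01492)^2 + (0.02782)^2 + (0.04253)^2 + (0.12702)^2 \\
&\quad + (0.02228)^2 + (0.02015)^2 + (0.06094)^2 + (0.06966)^2 + (0.00153)^2 \\
&\quad + (0.00772)^2 + (0.04025)^2 + (0.02406)^2 + (0.06749)^2 + (0.07507)^2 \\
&\quad + (0.01929)^2 + (0.00095)^2 + (0.05987)^2 + (0.06327)^2 + (0.09056)^2 \\
&\quad + (0.02758)^2 + (0.00978)^2 + (0.02360)^2 + (0.00150)^2 + (0.01974)^2 \\
&\quad + (0.00074)^2 \\
&\approx 0.065
\end{aligned}
$$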

█
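The sum of squares in Example 3 is tedious by hand, so a short script can serve as a check. This is a minimal sketch using the 26 frequencies listed in the solution; the list name `p` mirrors the variables $p_0, \ldots, p_{25}$, and `same_letter_probability` is an illustrative name.

```python
# Standard English letter frequencies as decimals, in the order A, B, ..., Z
# (the same values used in the solution to Example 3).
p = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702,
     0.02228, 0.02015, 0.06094, 0.06966, 0.00153,
     0.00772, 0.04025, 0.02406, 0.06749, 0.07507,
     0.01929, 0.00095, 0.05987, 0.06327, 0.09056,
     0.02758, 0.00978, 0.02360, 0.00150, 0.01974,
     0.00074]

# P(two letters are the same) = p_0^2 + p_1^2 + ... + p_25^2
same_letter_probability = sum(p_i ** 2 for p_i in p)
print(f"{same_letter_probability:.4f}")
```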

Friedman Test

• The Friedman Test is a probabilistic test that can be used to determine whether a ciphertext message was more likely produced by a monoalphabetic or a polyalphabetic cipher. This technique of cryptanalysis was developed in 1925 by William Friedman.

http://encyclopedia.thefreedictionary.com/William%20Friedman

• If the cipher is a polyalphabetic Vigenere encipherment, Friedman’s test is also useful in approximating the length of the keyword used. To show how this works, we start with the following definition:

Definition: Index of Coincidence. Denoted by I, the index of coincidence represents the probability that two randomly selected letters are identical.

Index of Coincidence for Monoalphabetic Ciphers

In monoalphabetic ciphers, the frequencies of letters in standard English are preserved when converting from plaintext to ciphertext. The following example illustrates why this is true.

Example 4: Illustrate why the Caesar shift cipher preserves frequencies when converting from plaintext to ciphertext.

Solution:
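One way to see the idea behind Example 4 is to shift a short message and compare letter counts before and after. The sketch below assumes upper-case letters only; the sample plaintext, the shift of 3, and the helper name `caesar_shift` are arbitrary illustrative choices.

```python
from collections import Counter


def caesar_shift(text, shift):
    """Shift each letter A-Z forward by `shift` places (MOD 26)."""
    return "".join(
        chr((ord(c) - ord("A") + shift) % 26 + ord("A")) for c in text
    )


plaintext = "THECHILDISFATHEROFTHEMAN"  # letters only, upper case
ciphertext = caesar_shift(plaintext, 3)

plain_counts = Counter(plaintext)
cipher_counts = Counter(ciphertext)

# Every plaintext letter maps to exactly one ciphertext letter, so each
# count simply moves to the shifted letter; the set of counts is unchanged.
for letter, count in plain_counts.items():
    assert cipher_counts[caesar_shift(letter, 3)] == count

print(sorted(plain_counts.values()) == sorted(cipher_counts.values()))
```

Because the counts are only relabeled, any quantity computed from the counts alone, such as the index of coincidence, is the same for the plaintext and the ciphertext.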

Recall the index of coincidence represents the probability that two randomly selected letters are identical. Since monoalphabetic ciphers preserve frequencies, the index of coincidence of the plaintext alphabet of standard English will be exactly the same as the index of coincidence of the ciphertext alphabet for a monoalphabetic cipher. Using the result from Example 3, this fact results in the following statement:

Index of Coincidence for Monoalphabetic Ciphers

$$I = P(\text{two randomly selected letters are the same}) \approx 0.065$$

Index of Coincidence for Polyalphabetic Ciphers

• In a polyalphabetic cipher, the goal is to distribute the letter frequencies so that each letter has the same likelihood of occurring in the ciphertext. The next example determines the index of coincidence of a polyalphabetic cipher for a large collection of letters.

Example 5: Determine the probability that two randomly selected letters are identical in the ciphertext of a message enciphered with a polyalphabetic cipher, assuming there are a very large number of letters in the ciphertext.

Solution:
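One way to carry out this computation, under the idealized assumption that a polyalphabetic cipher makes all 26 ciphertext letters equally likely (so each $p_i = 1/26$), is sketched below. The notation $p_i$ follows the earlier examples.

```latex
\begin{aligned}
P(\text{two letters are the same})
  &= p_0^2 + p_1^2 + \cdots + p_{25}^2 \\
  &= 26 \left( \frac{1}{26} \right)^2 \\
  &= \frac{1}{26} \approx 0.0385
\end{aligned}
```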

Since the index of coincidence represents the probability that two randomly selected letters are identical, Example 5 allows us to make the following statement:

Index of Coincidence for Polyalphabetic Ciphers

$$I = P(\text{two randomly selected letters are the same}) \approx 0.0385$$

The index of coincidence values for monoalphabetic (0.065) and polyalphabetic ciphers (0.0385) were derived assuming that the plaintext message has a very large number of letters. Messages that are actually enciphered and deciphered are normally much shorter. Hence, the index of coincidence for a typical enciphered message will fall somewhere between 0.0385 and 0.065. This leads to the following statement:

Index of Coincidence Bound

For a typical ciphertext message, the index of coincidence I satisfies:

$$0.0385 \le I \le 0.065$$

Fact: If I is close to 0.0385, then the ciphertext is likely to have been obtained from a polyalphabetic cipher. If I is closer to 0.065, the cipher is likely to be monoalphabetic.

Knowing what the value of the index of coincidence tells us, we now need to derive a formula for calculating it. Before doing this, we need to recall the following fact concerning summation notation:

Fact: Summation notation is a shorthand notation in mathematics for indicating the sum of many terms. We say that

$$\sum_{i=1}^{k} a_i = a_1 + a_2 + \cdots + a_k$$

represents the sum of the k terms $a_1, a_2, \ldots, a_k$, where the index i starts at the first term (i = 1) and we sum until we reach the upper index k of the summation symbol.

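Example 6: Compute the sum shown on the slide, in which the index i runs from i = 1 to 5 (the summand, written in terms of 2 and i, is not recoverable here).

Solution: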

Derivation of Formula for the Index of Coincidence for a Given Ciphertext Message

Suppose a ciphertext message is received. Let $n_0, n_1, n_2, \ldots, n_{24}, n_{25}$ be the counts of the number of occurrences of the alphabet letters A, B, C, …, Y, Z that occur in the ciphertext (note that the subscript of each variable corresponds to the MOD 26 alphabet assignment number of the corresponding letter). Suppose

$$n = \text{total number of letters} = n_0 + n_1 + n_2 + \cdots + n_{24} + n_{25}$$

represents the total number of letters in the ciphertext. Recall that the index of coincidence represents the probability that two randomly selected letters are identical. Using the multiplication principle of probability for n total letters, we can compute the following probabilities for each individual letter:

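$$
\begin{aligned}
P(\text{2 A's are randomly selected}) &= \frac{n_0}{n} \cdot \frac{n_0 - 1}{n - 1} = \frac{n_0 (n_0 - 1)}{n (n - 1)} \\
P(\text{2 B's are randomly selected}) &= \frac{n_1}{n} \cdot \frac{n_1 - 1}{n - 1} = \frac{n_1 (n_1 - 1)}{n (n - 1)} \\
P(\text{2 C's are randomly selected}) &= \frac{n_2}{n} \cdot \frac{n_2 - 1}{n - 1} = \frac{n_2 (n_2 - 1)}{n (n - 1)} \\
&\;\;\vdots \\
P(\text{2 Y's are randomly selected}) &= \frac{n_{24}}{n} \cdot \frac{n_{24} - 1}{n - 1} = \frac{n_{24} (n_{24} - 1)}{n (n - 1)} \\
P(\text{2 Z's are randomly selected}) &= \frac{n_{25}}{n} \cdot \frac{n_{25} - 1}{n - 1} = \frac{n_{25} (n_{25} - 1)}{n (n - 1)}
\end{aligned}
$$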

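Since these events are mutually exclusive, we can sum the individual probabilities for each letter to find the index of coincidence:

$$
\begin{aligned}
\text{Index of Coincidence} &= P(\text{2 letters randomly selected are the same}) \\
&= \frac{n_0 (n_0 - 1)}{n (n - 1)} + \frac{n_1 (n_1 - 1)}{n (n - 1)} + \frac{n_2 (n_2 - 1)}{n (n - 1)} + \cdots + \frac{n_{24} (n_{24} - 1)}{n (n - 1)} + \frac{n_{25} (n_{25} - 1)}{n (n - 1)} \\
&= \frac{1}{n (n - 1)} \left[ n_0 (n_0 - 1) + n_1 (n_1 - 1) + n_2 (n_2 - 1) + \cdots + n_{24} (n_{24} - 1) + n_{25} (n_{25} - 1) \right] \\
&= \frac{1}{n (n - 1)} \sum_{i=0}^{25} n_i (n_i - 1)
\end{aligned}
$$

We summarize this result.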

Formula for the Index of Coincidence

The index of coincidence I is given by the formula:

$$I = \frac{1}{n (n - 1)} \sum_{i=0}^{25} n_i (n_i - 1) = \frac{1}{n (n - 1)} \left[ n_0 (n_0 - 1) + n_1 (n_1 - 1) + n_2 (n_2 - 1) + \cdots + n_{24} (n_{24} - 1) + n_{25} (n_{25} - 1) \right]$$

where n represents the total number of letters in the ciphertext and $n_i$, for $i = 0, 1, \ldots, 25$, represents the number of occurrences of each individual letter in the ciphertext, with $n_0$ = the number of A's, $n_1$ = the number of B's, $n_2$ = the number of C's, etc.
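The formula translates directly into a few lines of code. The function below is a minimal sketch, not the Friedman Maplet used in the course; the name `index_of_coincidence` and the decision to ignore spaces and punctuation are assumptions made for illustration.

```python
from collections import Counter


def index_of_coincidence(text):
    """Return I = sum(n_i * (n_i - 1)) / (n * (n - 1)) over the letters A-Z.

    Non-letters (spaces, punctuation) are ignored; case is folded to upper.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)  # n_i for each letter that actually occurs
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Tiny check by hand: in "AABB", n = 4 and n_A = n_B = 2,
# so I = (2*1 + 2*1) / (4*3) = 1/3.
print(index_of_coincidence("AABB"))
```

Letters that never occur contribute $n_i (n_i - 1) = 0$, so summing only over the letters present gives the same value as summing over all 26.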

Example 7: Suppose we receive the message"HLUBN WFSFK IGIHM GBSIM MBSEJ MAFUT QECII LJSUB BAXMA JCWXC MBSGZ GGSMK BHUQB ETVUS MLMER CFDTW UBASW ERFIE LOMVY SIMMY YEDDM MSGZA NCOFY YTIHL JRYOH KLOFH IEFKQ OFAAI ZGIEJ HAKNZ JSQRU QXDKW HSNNF AOUMO ROFAA IZPIQ YHQFY SWEFK ILDPQ GIXUE ADFWN NFVYO TRXRG QKRUS HVYHA GYONT TZISI EPUOF XAZRN ZTSQK BGIIS MIMII SMIMX HAHUF ZNMFG WIMMB QWQLT SMZTU XRBSA EMFGW IHUAM WQFFV CKNSM TIJYY RCOJR ASBOE YHQAI KYPAK YJKUX VUFIG GBCFY HQKIJ QDFVU LBOGZ XTQOI MIMWH QOXUQ EMBIX KYAIB SAEFC UKPYA ILKJL RRIQT URSYD QUOYS OJLXR IQTUB IHC"

Use the index of coincidence to decide if this ciphertext was produced by a monoalphabetic or polyalphabetic cipher.

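Solution: Using the Friedman Maplet, we can generate the following frequency table for the letters in this message:

Letter   A   B   C   D   E   F   G   H   I   J   K   L   M
Count   23  18  10   8  16  25  16  18  37  12  17  12  29

Letter   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
Count   11  18   5  22  16  25  14  22   7  12  13  21  11

That is, $n_0 = 23$, $n_1 = 18$, $n_2 = 10$, …, $n_{24} = 21$, $n_{25} = 11$. This gives $n = n_0 + n_1 + n_2 + \cdots + n_{24} + n_{25} = 438$ total letters. Thus the index of coincidence is:

$$
\begin{aligned}
I &= \frac{1}{n (n - 1)} \sum_{i=0}^{25} n_i (n_i - 1) \\
&= \frac{1}{n (n - 1)} \left[ n_0 (n_0 - 1) + n_1 (n_1 - 1) + n_2 (n_2 - 1) + \cdots + n_{24} (n_{24} - 1) + n_{25} (n_{25} - 1) \right] \\
&= \frac{1}{438 \cdot 437} \big[ 23 \cdot 22 + 18 \cdot 17 + 10 \cdot 9 + 8 \cdot 7 + 16 \cdot 15 + 25 \cdot 24 + 16 \cdot 15 + 18 \cdot 17 + 37 \cdot 36 \\
&\qquad + 12 \cdot 11 + 17 \cdot 16 + 12 \cdot 11 + 29 \cdot 28 + 11 \cdot 10 + 18 \cdot 17 + 5 \cdot 4 + 22 \cdot 21 + 16 \cdot 15 \\
&\qquad + 25 \cdot 24 + 14 \cdot 13 + 22 \cdot 21 + 7 \cdot 6 + 12 \cdot 11 + 13 \cdot 12 + 21 \cdot 20 + 11 \cdot 10 \big] \\
&= \frac{1}{191406} \big[ 506 + 306 + 90 + 56 + 240 + 600 + 240 + 306 + 1332 + 132 + 272 + 132 + 812 \\
&\qquad + 110 + 306 + 20 + 462 + 240 + 600 + 182 + 462 + 42 + 132 + 156 + 420 + 110 \big] \\
&= \frac{1}{191406} (8266) = \frac{8266}{191406} = \frac{4133}{95703} \approx 0.043186
\end{aligned}
$$

Since 0.043186 is much closer to 0.0385 than 0.065, the cipher is likely polyalphabetic.

█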
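The arithmetic in Example 7 can be re-checked from the letter counts $n_0, \ldots, n_{25}$ given in the solution. The sketch below starts from those counts rather than recounting the ciphertext; the variable names and the simple "closer to which reference value" rule are illustrative assumptions.

```python
# Letter counts n_0, ..., n_25 (A through Z) from the solution to Example 7.
counts = [23, 18, 10, 8, 16, 25, 16, 18, 37, 12, 17, 12, 29,
          11, 18, 5, 22, 16, 25, 14, 22, 7, 12, 13, 21, 11]

n = sum(counts)                                 # total number of letters
numerator = sum(c * (c - 1) for c in counts)    # sum of n_i * (n_i - 1)
ic = numerator / (n * (n - 1))                  # index of coincidence

# Compare against the two reference values from the text.
if abs(ic - 0.0385) < abs(ic - 0.065):
    verdict = "polyalphabetic"
else:
    verdict = "monoalphabetic"

print(n, numerator, round(ic, 6), verdict)
```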

Using the Index of Coincidence to Estimate the Keyword Length for the Vigenere Cipher

• So far we have used the index of coincidence to determine whether a cipher is polyalphabetic or monoalphabetic. Once a cipher is judged to be polyalphabetic, the index of coincidence can also be used to estimate the keyword length. Knowing the keyword length is the first essential step when attempting to break the Vigenere cipher. The following formula gives an estimate of the keyword length:

Keyword Length Formula for the Vigenere Cipher

$$k \approx \frac{0.0265\, n}{(0.065 - I) + n\, (I - 0.0385)}$$

where

n = the number of letters in the ciphertext message,

I = index of coincidence,

k = keyword length.

Example 8: For the ciphertext message given in the previous example, use the index of coincidence to estimate the keyword length.

Solution:
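A short sketch of the keyword length formula, k ≈ 0.0265n / ((0.065 − I) + n(I − 0.0385)), applied to the values n = 438 and I ≈ 0.043186 from Example 7. The function name `estimate_keyword_length` is an illustrative choice.

```python
def estimate_keyword_length(n, ic):
    """Friedman estimate k = 0.0265*n / ((0.065 - I) + n*(I - 0.0385)).

    n  -- number of letters in the ciphertext
    ic -- index of coincidence I of the ciphertext
    """
    return 0.0265 * n / ((0.065 - ic) + n * (ic - 0.0385))


# Values taken from Example 7.
k = estimate_keyword_length(438, 0.043186)
print(round(k, 2))
```

With these inputs the estimate comes out near 5.6, which suggests a keyword of about five or six letters. The formula gives only an estimate, so nearby lengths are worth trying as well.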

The Kasiski Test

• The Kasiski test is another method that can be used to approximate the keyword length in the Vigenere cipher. The test was first published by a retired Prussian Army officer named Friedrich Wilhelm Kasiski in 1863. It had been independently discovered almost a decade earlier, in 1854, by the English inventor Charles Babbage.

http://en.wikipedia.org/wiki/Charles_Babbage


• The Kasiski test relies on the occasional coincidental alignment of letter groups in the plaintext with the keyword to give a keyword length estimate. The test says that if a string of characters appears repeatedly in a polyalphabetic ciphertext message, it is possible (though not certain) that the distance between the occurrences is a multiple of the length of the keyword.

• To demonstrate how this works, suppose the Vigenere cipher is used to encipher the message “THE CHILD IS FATHER OF THE MAN” using the keyword “POETRY” to produce the following ciphertext.

Plaintext   T H E C H I L D I S F A T H E R O F T H E M A N

Keyword     P O E T R Y P O E T R Y P O E T R Y P O E T R Y

Ciphertext  I V I V Y G A R M L W Y I V I K F D I V I F R L

The keyword “POETRY” is six letters long. Note that the trigraph “IVI” occurs three times in the ciphertext. The second occurrence of “IVI” begins 12 character positions after the first. The third occurrence of “IVI” begins 6 character positions after the second.

• This leads to the assertion that the separations between repeated letter groups stand a good chance of being multiples of the keyword length. This observation leads to the following fact concerning the Kasiski test.

Fact: For a ciphertext enciphered by the Vigenere cipher, the greatest common divisor of the separations between repeated groups of characters has a good chance of being equal to, or at least a multiple of, the keyword length. Hence the keyword length is likely to be this greatest common divisor or one of its divisors.

• Since the occurrences of “IVI” were separated by 12 characters and then 6 characters, then by observing that gcd(6, 12) = 6, we see that we have hit exactly the number of letters in the keyword. We conclude with one more example illustrating the Kasiski test.
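The steps of the Kasiski test can be sketched in code: encipher the sample message, record where each trigraph starts, and take the greatest common divisor of the gaps between repeats. This is a minimal illustration, not the Kasiski Maplet; the function names and the choice to look only at trigraphs are assumptions.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def vigenere_encipher(plaintext, keyword):
    """Encipher the letters of `plaintext` with the Vigenere cipher (A = 0)."""
    letters = [c for c in plaintext.upper() if "A" <= c <= "Z"]
    out = []
    for i, c in enumerate(letters):
        shift = ord(keyword[i % len(keyword)].upper()) - ord("A")
        out.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)


def repeated_ngrams(ciphertext, size=3):
    """Map each letter group occurring more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - size + 1):
        positions[ciphertext[i:i + size]].append(i)
    return {g: p for g, p in positions.items() if len(p) > 1}


def kasiski_gcd(ciphertext, size=3):
    """gcd of the gaps between consecutive repeats, or None if no repeats."""
    gaps = []
    for starts in repeated_ngrams(ciphertext, size).values():
        gaps.extend(b - a for a, b in zip(starts, starts[1:]))
    return reduce(gcd, gaps) if gaps else None


ciphertext = vigenere_encipher("THE CHILD IS FATHER OF THE MAN", "POETRY")
print(ciphertext)
print(repeated_ngrams(ciphertext))
print(kasiski_gcd(ciphertext))
```

On longer ciphertexts some repeats arise by chance and can pull the gcd down to 1, so in practice it is common to look at which divisors appear most often among the gaps rather than trusting a single gcd.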

Example 9: Using the Kasiski Maplet, estimate the keyword length of the ciphertext given in Example 7 by applying the principles of the Kasiski test.

Solution: Will demonstrate using the Kasiski Maplet in class.

█
