Cryptography, Attacks and Countermeasures Lecture 3 - Stream Ciphers

1


John A Clark and Susan Stepney

Dept. of Computer Science, University of York, UK

{jac,susan}@cs.york.ac.uk

2

Stream Ciphers

Part I: Pseudo-random number generators. Lots of Bad Ways

Part II: Divide and conquer attacks.

3

Stream Ciphers - Vernam

The Vernam cipher works by generating a random bit stream and then XORing that stream, bit by bit, with the plaintext.

[Figure: sender and receiver each use key K to generate the random stream Bi. The sender XORs Pi with Bi to give Ci; the receiver XORs Ci with Bi to give Pi.]

Both sender and receiver can generate key stream Bi. Receiver XORs the ciphertext stream with the key stream to recover the plaintext stream. We will use this cipher to illustrate several concepts.
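The XOR step can be sketched in a few lines of Python. This is a minimal illustration only: the helper name `xor_bytes` is hypothetical, and `secrets.token_bytes` stands in for the shared stream Bi, which a real system would derive from the key K at both ends.

```python
import secrets


def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """XOR each byte of data with the matching byte of keystream."""
    if len(keystream) < len(data):
        raise ValueError("keystream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


plaintext = b"ATTACK AT DAWN"
# Stand-in for the shared random stream Bi (assumption: both ends hold it).
keystream = secrets.token_bytes(len(plaintext))

ciphertext = xor_bytes(plaintext, keystream)   # Ci = Pi XOR Bi
recovered = xor_bytes(ciphertext, keystream)   # Pi = Ci XOR Bi
print(recovered)
```

Because XOR is its own inverse, the same function serves for encryption and decryption.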

4

Linear Feedback Shift Registers

At each iteration there is a right shift, a bit falls off the end, and the leftmost bit is set according to the linear feedback function. Here 0+0+1=1.

[Figure: register contents before and after one shift, with output stream Lij.]
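A single shift can be sketched as follows. The function name `lfsr_step`, the 4-bit register and the tap positions are assumptions made for illustration; they are not taken from the figure.

```python
def lfsr_step(state, taps):
    """Shift right once. state is a list of bits, leftmost first.

    taps holds the indices (0 = leftmost) XORed to form the new leftmost bit.
    Returns (new_state, output_bit), where output_bit fell off the right end.
    """
    feedback = 0
    for i in taps:
        feedback ^= state[i]
    output_bit = state[-1]
    new_state = [feedback] + state[:-1]
    return new_state, output_bit


state = [0, 1, 1, 0]          # hypothetical 4-bit register
taps = [0, 3]                 # hypothetical taps: leftmost and rightmost bits
state, out = lfsr_step(state, taps)
print(state, out)             # [0, 0, 1, 1] 0
```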

5

Periodicity

We would like the stream to be ‘random-looking’. One feature should be that the stream should not repeat itself too quickly.

Note that this is in effect a finite state machine and so must repeat itself eventually.

The maximal period for an n-bit register is 2^n - 1.

Why not 2^n?

6

Maximal Period m-sequences

The tap sequence defines the linear feedback function and is often regarded as a finite field polynomial.

You have to choose the tap sequence very carefully.

Some choices provide a maximal length period. These are primitive polynomials.

7

Primitive Polynomials Give m-sequences

[Figure: 4-bit LFSR.]

 t  D3 D2 D1 D0
 0   0  1  1  0
 1   0  0  1  1
 2   1  0  0  1
 3   0  1  0  0
 4   0  0  1  0
 5   0  0  0  1
 6   1  0  0  0
 7   1  1  0  0
 8   1  1  1  0
 9   1  1  1  1
10   0  1  1  1
11   1  0  1  1
12   0  1  0  1
13   1  0  1  0
14   1  1  0  1
15   0  1  1  0

It is common to denote the above by the polynomial C(D) = 1 + D + D^4. Note that at t = 15 we are back to where we started.
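The table can be reproduced with a short simulation. This is a sketch under one assumption read off from the rows of the table: the new D3 is D3 XOR D0 and the other bits shift right. The function names are illustrative.

```python
def step(state):
    """One right shift of the 4-bit register (D3, D2, D1, D0).

    The new D3 is D3 XOR D0, the rule that reproduces the table above.
    """
    d3, d2, d1, d0 = state
    return (d3 ^ d0, d3, d2, d1)


def period(start):
    """Number of steps until the register first returns to start."""
    state, count = step(start), 1
    while state != start:
        state, count = step(state), count + 1
    return count


start = (0, 1, 1, 0)
state = start
for t in range(4):
    print(t, *state)          # first rows of the table
    state = step(state)

print("period:", period(start))   # period: 15
```

Calling `period((0, 0, 0, 0))` returns 1: the all-zero state maps to itself, which is why a 4-bit register can visit at most 15 states in one cycle.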

8

Some Polynomials Don’t

[Figure: 4-bit LFSR.]

 t  D3 D2 D1 D0
 0   0  1  1  0
 1   1  0  1  1
 2   1  1  0  1
 3   0  1  1  0
 4   1  0  1  1
 5   1  1  0  1
 6   0  1  1  0
 7   1  0  1  1
 8   1  1  0  1
 9   0  1  1  0
10   1  0  1  1
11   1  1  0  1
12   0  1  1  0
13   1  0  1  1
14   1  1  0  1
15   0  1  1  0

The polynomial C(D) = 1 + D + D^3 does not give a maximal period sequence: the register returns to its starting state after only 3 steps.

9

Not good for PRNG

Consider a 64-bit register. Can this be used as a key stream generator?

No. Once you know a very small amount of plaintext (e.g. 32 consecutive bits) then you can calculate the corresponding key stream and so you know the rightmost 32 bits in the register.

You can now try in turn all 2^32 combinations for the rest. When you get the right one, you are able to generate the whole key stream, and so the decrypted plaintext should make sense.

This is just too easy to break. But LFSRs are very easy to implement, and execute quickly.

Can we fix matters? How about a less primitive way of extracting the key stream? How about combining several streams to achieve better security?
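The attack can be sketched at toy scale. Everything specific here is an assumption made for illustration: a 16-bit register instead of 64, 8 known plaintext bits instead of 32, arbitrary tap positions, and a crude "all bytes are printable ASCII" check standing in for "the plaintext makes sense".

```python
N = 16                      # register length (the slide uses 64)
KNOWN = 8                   # known plaintext bits (the slide uses 32)
TAPS = (0, 2, 3, 5)         # hypothetical tap positions, bit 0 = rightmost


def keystream(state, nbits):
    """Yield nbits output bits: output the rightmost bit, then shift right."""
    for _ in range(nbits):
        yield state & 1
        fb = 0
        for t in TAPS:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (N - 1))


def to_bits(data):
    """Bytes to a list of bits, least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def from_bits(bits):
    """Inverse of to_bits."""
    return bytes(sum(b << i for i, b in enumerate(bits[k:k + 8]))
                 for k in range(0, len(bits), 8))


def crypt(data, state):
    """Encrypt or decrypt by XORing with the LFSR keystream."""
    bits = to_bits(data)
    return from_bits([b ^ k for b, k in zip(bits, keystream(state, len(bits)))])


secret_state = 0xBEEF
ciphertext = crypt(b"Meet me at noon", secret_state)

# The attacker knows the first plaintext byte, hence the first 8 keystream
# bits, which are exactly the rightmost 8 bits of the initial register.
known_plain = b"M"
ks = [p ^ c for p, c in zip(to_bits(known_plain), to_bits(ciphertext[:1]))]
low = sum(b << i for i, b in enumerate(ks))

candidates = []
for high in range(2 ** (N - KNOWN)):           # 2^8 guesses, not 2^16
    guess = (high << KNOWN) | low
    text = crypt(ciphertext, guess)
    if all(32 <= ch < 127 for ch in text):     # crude "makes sense" test
        candidates.append(guess)

print([hex(c) for c in candidates])
```

The loop runs over only the unknown half of the register, which is the point of the slide: known plaintext halves the exponent of the search.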

10

Very Simple Model

[Figure: LFSR 1 feeds a function f, which outputs the keystream bit Zj.]

Use some function f to operate on some subset of the LFSR register components.

11

Boolean Functions – Algebraic Normal Form (ANF)

A Boolean function on n inputs can be represented in minimal sum (XOR, +) of products (AND, .) form:

\[
f(x_1,\dots,x_n) = a_0 + a_1 x_1 + \dots + a_n x_n + a_{1,2}\, x_1 x_2 + \dots + a_{n-1,n}\, x_{n-1} x_n + \dots + a_{1,2,\dots,n}\, x_1 x_2 \cdots x_n
\]

This is the algebraic normal form of the function.

The algebraic degree of the function is the size of the largest subset of inputs (i.e. the number of xj in it) associated with a non-zero coefficient.

1 is a constant function (as is 0)
x1+x3+x5 is a linear function
x1.x3+x5 is a quadratic function
x1.x3.x5+x4.x5+x2 is a cubic function
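One way to obtain the ANF coefficients and the degree from a truth table is the binary Moebius transform. The sketch below is illustrative: the function names and the bit-mask encoding of terms are assumptions, and the example function is the cubic listed above.

```python
def anf(truth_table, n):
    """Return ANF coefficients via the binary Moebius transform.

    truth_table[i] is f at the input where bit k of i is the value of x(k+1).
    coeffs[m] is the coefficient of the product of those x(k+1) with bit k set in m.
    """
    coeffs = list(truth_table)
    for k in range(n):
        for i in range(2 ** n):
            if i & (1 << k):
                coeffs[i] ^= coeffs[i ^ (1 << k)]
    return coeffs


def degree(coeffs):
    """Algebraic degree: the most inputs in any term with coefficient 1."""
    return max((bin(m).count("1") for m, a in enumerate(coeffs) if a), default=0)


def f(x1, x2, x3, x4, x5):
    return (x1 & x3 & x5) ^ (x4 & x5) ^ x2


n = 5
table = [f(*[(i >> k) & 1 for k in range(n)]) for i in range(2 ** n)]
coeffs = anf(table, n)
terms = [m for m, a in enumerate(coeffs) if a]
print(terms, degree(coeffs))   # [2, 21, 24] 3
```

The masks decode as 2 = x2, 21 = x1.x3.x5 and 24 = x4.x5, so the transform recovers the three terms of the cubic.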

12

Very Simple Model

[Figure: LFSR 1 feeds a function f, which outputs the keystream bit Zj.]

What about a linear function f?

13

Very Simple Model

This would be pretty awful. Suppose we know a sequence of keystream bits z0, z1, z2, z3, … = 1, 1, 1, 1, …

Essentially every key stream output can be expressed as a linear function of the elements of the initial state. We can derive a number of these equations and then solve them by standard linear algebra techniques.

[Figure: 4-bit LFSR.]

14

Very Simple Model

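[Figure: register contents written as linear functions of the initial state (s3, s2, s1, s0). D3 and D0 are XORed to give the feedback; the keystream bit is D0 + D2.]

t  D3            D2            D1          D0      keystream
0  s3            s2            s1          s0      z0 = s0 + s2
1  s0+s3         s3            s2          s1      z1 = s1 + s3
2  s0+s1+s3      s0+s3         s3          s2      z2 = s2 + s0 + s3
3  s0+s1+s2+s3   s0+s1+s3      s0+s3       s3      z3 = s3 + s0 + s1 + s3 = s0 + s1
4  s0+s1+s2      s0+s1+s2+s3   s0+s1+s3    s0+s3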

15

Very Simple Model

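z0 = s0 + s2

z1 = s1 + s3

z2 = s2 + s0 + s3

z3 = s3 + s0 + s1 + s3 = s0 + s1

We can apply linear algebra equation solving techniques and solve for the si.

\[
\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}
=
\begin{pmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
\]

This has solution (s0, s1, s2, s3) = (0, 1, 1, 0).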
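The four equations can be solved mechanically with Gaussian elimination over GF(2), where addition is XOR. This is a minimal sketch; `solve_gf2` is a hypothetical helper and assumes a square, invertible system like the one above.

```python
def solve_gf2(rows, rhs):
    """Solve A.s = z over GF(2) by Gauss-Jordan elimination.

    Assumes A is square and invertible, as it is for the system above.
    """
    n = len(rows)
    m = [list(r) + [b] for r, b in zip(rows, rhs)]   # augmented matrix [A | z]
    for col in range(n):
        # Find a row at or below the diagonal with a 1 in this column.
        pivot = next(r for r in range(col, n) if m[r][col])
        m[col], m[pivot] = m[pivot], m[col]
        # Clear this column in every other row (row addition is XOR).
        for r in range(n):
            if r != col and m[r][col]:
                m[r] = [x ^ y for x, y in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


A = [
    [1, 0, 1, 0],   # z0 = s0 + s2
    [0, 1, 0, 1],   # z1 = s1 + s3
    [1, 0, 1, 1],   # z2 = s0 + s2 + s3
    [1, 1, 0, 0],   # z3 = s0 + s1
]
z = [1, 1, 1, 1]

s = solve_gf2(A, z)
print(s)   # [0, 1, 1, 0]
```

For an n-bit register this needs only about n keystream bits and polynomial work, compared with 2^n for exhaustive search.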

16

Harder Model

[Figure: LFSR 1 feeds a function f, which outputs the keystream bit Zj.]

What about a non-linear function f?

This is better, but it is still possible to attack such systems if f can be approximated by a linear function. We will talk about approximations later.

17

Classical Stream Cipher Model

[Figure: LFSR 1, LFSR 2, …, LFSR n (N-bit registers) output bits L1j, L2j, …, Lnj. The combining Boolean function f maps these to the keystream bit Zj. The plaintext stream Pj is XORed with the keystream Zj to give the cipherstream Cj.]

Initial register values form the ‘key’.

Choose f very carefully.

The receiver can generate the key stream and recover the plaintext.

18

Periodicity

The LFSRs need not all be the same length.

The LFSRs will give a vector input whose period is the least common multiple of the periods of the individual LFSRs.

E.g. if LFSR 1 has period 3 and LFSR 2 has period 7, then the overall period is 21.
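A quick check of the combined period, as a small sketch. The helper `lcm` is defined locally, and the second and third pairs of periods are extra illustrative values, not from the slide.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


print(lcm(3, 7))     # 21: coprime periods multiply
print(lcm(15, 3))    # 15: a shared factor adds nothing
print(lcm(15, 31))   # 465
```

Register lengths are therefore usually chosen so that the individual periods have no common factor.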

19

Awful Choice for f

[Figure: two 32-bit registers, LFSR 1 and LFSR 2, output x1j and x2j into f, which gives Zj. Pj is XORed with Zj to give Cj.]

Zj = f(x1j, x2j) = x1j

This is a truly awful choice. The key is intended to be 2 x 32 = 64 bits.

You have completely ignored LFSR 2.

Key size = 32 bits only

20

Better but Still Awful Choice for f

[Figure: two 32-bit registers, LFSR 1 and LFSR 2, output x1j and x2j into f, which gives Zj. Pj is XORed with Zj to give Cj.]

Zj = f(x1j, x2j) = x1j + x2j

Congratulations! You have not ignored LFSR 2!

Key size = 64 bits?

21

Better but Still Awful Choice for f

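Well, not quite such a good choice.

Suppose you know 32 consecutive bits of plaintext (or can guess them correctly).

Calculate the 32 bits of key stream.

But if a stream bit is 0 then there are only 2 possible pairs (xk, yk). Similarly if the stream value is 1. So each guess for the 32 bits of one register fixes the 32 bits of the other.

Effective key size = 2^32

[Figure: bits x32 … x1 of one register are XORed with bits y32 … y1 of the other to give the keystream bits.]

xk + yk = 0 means (xk, yk) = (0, 0) or (1, 1)
xk + yk = 1 means (xk, yk) = (0, 1) or (1, 0)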
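The halving of the key size can be seen in a toy version with two 8-bit registers. The register length, tap positions and secret states below are all assumptions chosen for illustration; only the XOR combiner comes from the slide.

```python
N = 8                      # register length (the slide uses 32)
TAPS1 = (0, 2, 3, 4)       # hypothetical taps for LFSR 1, bit 0 = rightmost
TAPS2 = (0, 1, 5, 6)       # hypothetical taps for LFSR 2


def stream(state, taps, nbits):
    """Output nbits from an N-bit right-shifting LFSR, rightmost bit first."""
    out = []
    for _ in range(nbits):
        out.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (N - 1))
    return out


def xor_keystream(s1, s2, nbits):
    """Keystream of the combiner f(x1, x2) = x1 + x2."""
    return [a ^ b for a, b in zip(stream(s1, TAPS1, nbits),
                                  stream(s2, TAPS2, nbits))]


secret1, secret2 = 0xA7, 0x3C
z = xor_keystream(secret1, secret2, 64)    # keystream known to the attacker

matches = []
for guess1 in range(2 ** N):               # 2^8 guesses, not 2^16
    x = stream(guess1, TAPS1, N)
    # y_k = z_k + x_k, and the first N output bits of LFSR 2 are its state.
    guess2 = sum((zk ^ xk) << k for k, (zk, xk) in enumerate(zip(z, x)))
    if xor_keystream(guess1, guess2, len(z)) == z:
        matches.append((guess1, guess2))

print(matches)
```

Each guess for LFSR 1 determines the only possible state of LFSR 2, so the search covers one register's states rather than both.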

22

First Bit of Bad Linearity

The combination function here is a linear function of the inputs:

f(x1,x2)=x1
f(x1,x2)=x2
f(x1,x2)=x1+x2

The following are quadratic functions:

f(x1,x2)=x1.x2 + x1
f(x1,x2)=x1.x2 + x2
f(x1,x2)=x1.x2 + x1+x2

Extreme examples given, but beware linearity: even a hint of it can spell trouble.

Linear functions are so called because they can cause some cryptosystems to be broken straight away??

23

Divide and Conquer Attacks

Exploiting simple correlations in the combining function

24

Geffe Generator

[Figure: LFSR 1, LFSR 2 and LFSR 3 output a, b and c. A 2x1 multiplexor uses a as the select input to choose between b and c, giving Zj.]

Z = (a & b) + (not(a) & c)

25

Geffe Generator – DIVIDE AND CONQUER

Looking at the table it is clear that the output z agrees with b 75% of the time.

Also agrees with c 75% of the time.

a b c z

0 0 0 0

0 0 1 1

0 1 0 0

0 1 1 1

1 0 0 0

1 0 1 0

1 1 0 1

1 1 1 1
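The agreement figures can be checked by enumerating the truth table. A short sketch, with `geffe` as an illustrative name for the multiplexor function:

```python
from itertools import product


def geffe(a, b, c):
    """2x1 multiplexor: a selects b (a = 1) or c (a = 0)."""
    return (a & b) ^ ((1 - a) & c)


agreement = {"a": 0, "b": 0, "c": 0}
for a, b, c in product((0, 1), repeat=3):
    z = geffe(a, b, c)
    agreement["a"] += (z == a)
    agreement["b"] += (z == b)
    agreement["c"] += (z == c)

for name, count in agreement.items():
    print(name, f"{100 * count / 8:.0f}%")   # a 50%, b 75%, c 75%
```

The output is uncorrelated with the selector a, but leaks information about both b and c.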

26

Geffe Generator – DIVIDE AND CONQUER

So consider each possible initial state s of register LFSR2.

Determine the LFSR2 stream that s produces.

Check the degree of agreement of this stream with the actual key stream.

Turns out:

if state s is correct you will get roughly the right amount of agreement (about 75%).

if state s is incorrect you will get roughly random (50%) agreement.

Thus we have targeted LFSR2 and can easily break it.

Now we can target LFSR3 in exactly the same way.

So we can get LFSR2 and LFSR3. Now we can derive the selection LFSR1 state very straightforwardly: try every possible state. The correct one should allow you to simulate the whole sequence. Other ways too.
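The whole attack can be sketched on a toy Geffe generator. The register sizes (3, 4 and 5 bits), tap positions and secret states are assumptions chosen so that the three periods are 7, 15 and 31; the attacker is given one full joint period of keystream.

```python
# Toy register sizes and feedback taps (bit 0 = rightmost); all hypothetical.
SIZES = (3, 4, 5)
TAPS = ((0, 2), (0, 3), (0, 3))


def stream(state, size, taps, nbits):
    """Output nbits from a right-shifting LFSR, rightmost bit first."""
    out = []
    for _ in range(nbits):
        out.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (size - 1))
    return out


def geffe_stream(sa, sb, sc, nbits):
    """Keystream Z = (a & b) + (not(a) & c) for the given initial states."""
    a = stream(sa, SIZES[0], TAPS[0], nbits)
    b = stream(sb, SIZES[1], TAPS[1], nbits)
    c = stream(sc, SIZES[2], TAPS[2], nbits)
    return [(x & y) ^ ((1 - x) & w) for x, y, w in zip(a, b, c)]


def best_match(z, size, taps):
    """Return the initial state whose stream agrees most often with z."""
    def agreement(state):
        s = stream(state, size, taps, len(z))
        return sum(u == v for u, v in zip(s, z))
    return max(range(2 ** size), key=agreement)


secret = (0b101, 0b0110, 0b10011)
NBITS = 7 * 15 * 31                      # one full joint period of keystream
z = geffe_stream(*secret, NBITS)

sb = best_match(z, SIZES[1], TAPS[1])    # 16 trials against LFSR2
sc = best_match(z, SIZES[2], TAPS[2])    # 32 trials against LFSR3
sa = next(s for s in range(2 ** SIZES[0])             # 8 trials for LFSR1
          if geffe_stream(s, sb, sc, NBITS) == z)

print((sa, sb, sc) == secret)            # 56 trials, versus 8 * 16 * 32 = 4096
```

With realistic register lengths the same structure applies, but the amount of keystream needed to separate the correct state from the rest becomes the practical limit.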

27

Divide and Conquer

Divide and conquer attacks were suggested by Siegenthaler as a means of exploiting approximate linear relationships between function inputs and its output.

This led to new criteria being developed as countermeasures to these correlation attacks.

We will consider an extremely simple example.

28

Divide and Conquer

Consider the following combining function:

f(x1,x2)=x1.x2+x1

Clearly not linear. But…

f(x1,x2) agrees with x1 75% of the time here.

Consider each possible initial state of LFSR1 and determine the degree of agreement with the actual key stream.

The correct initial state will give approximately 75% agreement and the rest will give fairly random agreement.

It’s also obvious that if we know f(x1,x2)=1 then we know both x1 and x2: this is simply due to the incredibly small nature of the example.

x1 x2 f(x1,x2)

0 0 0

0 1 0

1 0 1

1 1 0

29

Divide and Conquer

Consider two functions f(x1,x2) and g(x1,x2).

We say that f(x1,x2) is approximated by g(x1,x2) if the percentage of pairs (x1,x2) which give the same values for f and g differs from 50%.

If they agree precisely half the time we say that they are uncorrelated.

Note: if the percentage of agreement is less than 50% we can always find a function whose agreement is above 50%, namely the complement g(x1,x2)+true.
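For two inputs it is easy to list how well every linear function, with or without the added constant, agrees with f(x1,x2)=x1.x2+x1. A small sketch; the dictionary `agreement` and the coefficient names c0, c1, c2 are illustrative.

```python
from itertools import product


def f(x1, x2):
    return (x1 & x2) ^ x1


inputs = list(product((0, 1), repeat=2))
agreement = {}

# Every g(x1, x2) = c0 + c1.x1 + c2.x2, where c0 = 1 is the "+true" complement
for c0, c1, c2 in product((0, 1), repeat=3):
    count = sum(f(x1, x2) == (c0 ^ (c1 & x1) ^ (c2 & x2)) for x1, x2 in inputs)
    agreement[(c0, c1, c2)] = count
    print(f"g = {c0} + {c1}.x1 + {c2}.x2 agrees on {count}/4 inputs")
```

For example g = x2 agrees on only 1 of 4 inputs, so its complement x2 + 1 agrees on 3 of 4, as the note above says.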

30

Ideas Generalise

We can consider similar ideas for n-input functions: f(x1,x2,…,xn) and g(x1,x2,…,xn).

Degree of approximation with linear functions may be slight.

The smaller the degree of approximation the more data you need to have to break the system.

31

And then what?

The idea of multiple LFSRs is that the size of the keyspace should be the product of the keyspace sizes for each register.

Divide and conquer reduces this to a sum of keyspace sizes, because you attack each register in turn.

Note what happens when you crack one LFSR. The complexity of the remaining task is reduced:

f(x1,x2)=x1.x2+x1

Once you know x1 then the task for x2 is simpler: whenever you know x1=1 you know what x2 is.

32

All Fall Down

In a similar vein, suppose:

There is a small exploitable correlation with input x1.

There is a small correlation with x1+x2.

If LFSR 1 can be broken to reveal x1 then we now have a straightforward correlation with x2 to exploit.

33

Don’t tell them

But what if you don’t publicise the tap sequence, and keep the feedback polynomial secret (as part of the key)?

Makes things harder but there are in fact some further attacks here too.

34

Summary

Have presented some very simple stream cipher models.

Divide and conquer attacks.

Dangers of linearity and hints of it.

Next lecture: What do we do about the dangers? Boolean function criteria:

High non-linearity.
High algebraic degree.
Correlation immunity.
Tradeoffs between them.
