
Project report – General Attacks on Elliptic Curve Based Cryptosystems

By Ofer Schwarz, Winter 2012-2013

Project advisor: Barukh Ziv

1 Introduction
2 Background
  2.1 Elliptic curves
  2.2 The discrete logarithm problem
  2.3 Collisions and Pollard's Rho
  2.4 The Pohlig-Hellman reduction
3 Improvements and Optimizations
  3.1 The Montgomery trick for inversion
  3.2 Distinguished points
  3.3 Nivasch method for cycle detection
  3.4 Negation Map
4 Implementation
  4.1 Elliptic curve arithmetic library
  4.2 Collision library
  4.3 Putting it all together
5 Challenges
6 Results
  6.1 The basic challenges
  6.2 Previous results
  6.3 Optimization efficiency tests
7 Schedule
8 Bibliography


1 Introduction

Many modern cryptographic algorithms, specifically for asymmetric encryption and digital

signatures, use the mathematical properties of groups as a basis for their strength. Elliptic

curves are a type of group whose structure is simple enough to allow efficient group

operations, yet complex enough that there are no known attack methods other than generic

ones, which have an exponential time complexity. Thus they are gaining increasingly

widespread popularity.

The project aimed to implement the current best-known general attack on elliptic-curve-based systems, with various improvements and optimizations, with the ultimate goal of attacking a 64-bit curve. (See section 6 for detailed results.)

2 Background

Note: this section is a summary of the relevant material in [1].

2.1 Elliptic curves

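An elliptic curve $E$ over the field $K$ is the set of all solutions $(x, y) \in K \times K$ to the equation

$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$

where $a_1, a_2, a_3, a_4, a_6 \in K$, with an extra symbolic "point at infinity" marked $\infty$. Using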

geometric properties of the "graph" of solutions, one can define an abelian group structure

for the curve, with the point at infinity serving as the unit member of the group.

Cryptographic systems usually use elliptic curves over prime fields ($\mathbb{F}_p$ for some large prime number $p$) or binary fields ($\mathbb{F}_{2^m}$ for some integer $m$), since field arithmetic in these

particular fields can be implemented very efficiently. In this project we have focused on

prime field curves.

In curves over a prime field, the Weierstrass equation above can be expressed, using a variable change, as a much simpler equation of the form $y^2 = x^3 + ax + b$.
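As a small illustration of the simplified equation, the following Python sketch checks whether a point lies on a prime-field curve and enumerates a toy curve by brute force. The curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ and the function names are assumptions chosen for this example; they do not come from the project's C++ library.

```python
def is_on_curve(point, a, b, p):
    """Check y^2 = x^3 + a*x + b (mod p); None stands for the point at infinity."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x * x * x + a * x + b)) % p == 0


def affine_points(a, b, p):
    """Brute-force list of all affine points; only sensible for a tiny p."""
    return [(x, y) for x in range(p) for y in range(p)
            if is_on_curve((x, y), a, b, p)]


# Toy curve (illustrative parameters, not one of the report's challenges)
points = affine_points(2, 2, 17)
print(len(points) + 1)  # group order, counting the point at infinity: 19
```

Because the group order of this toy curve is the prime 19, every point other than the point at infinity generates the whole group, which makes it convenient for small experiments.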

2.2 The discrete logarithm problem

The Diffie-Hellman, El Gamal and DSA cryptographic algorithms are all based on the

computational hardness of a mathematical problem called DLP, or Discrete Logarithm

Problem. In this problem, we are given a group $G$, a member $g$ of the group with a finite order, and a member $h$ of the subgroup $\langle g \rangle$ generated by $g$, i.e., such that $h = g^x$ for some $x$ (or, using the additive syntax for abelian groups, which is commonly used for elliptic curves: $h = xg$). The goal is to find the number $x$. (This problem is called the discrete logarithm since this operation is logically very similar to the real-valued logarithm function.)
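To make the problem statement concrete, here is a minimal Python sketch that solves a DLP instance by exhaustive search in the multiplicative group modulo a small prime. The function name and the numbers are illustrative assumptions; the search takes time linear in the group order, which is exactly what makes it useless for cryptographic sizes.

```python
def discrete_log_bruteforce(g, h, p):
    """Return the smallest x >= 0 with g^x = h (mod p), or None if h is not in <g>."""
    value = 1
    for x in range(p):  # the order of g is at most p - 1
        if value == h % p:
            return x
        value = value * g % p
    return None


print(discrete_log_bruteforce(5, 8, 23))  # 6, since 5^6 = 8 (mod 23)
```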

DLP in the multiplicative group of integers modulo $p$ is closely related to the problem of integer factorization, and indeed, sub-exponential algorithms exist for both. However, the best known algorithms for DLP over a general group are exponential, and as mentioned earlier, these are also the best-known methods for DLP over elliptic curves.



2.3 Collisions and Pollard's Rho

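The current best-known method for solving ECDLP (Elliptic Curve DLP) relies on the following idea: given two points on the curve, $P$ and $Q$, such that $Q = kP$, suppose that we find integers $a_1, b_1, a_2, b_2$ so that $a_1 P + b_1 Q = a_2 P + b_2 Q$ (with $b_1 \not\equiv b_2$ modulo the order of $P$). Given these, we can easily find $k$:

$$(a_1 - a_2)P = (b_2 - b_1)Q = (b_2 - b_1)kP \;\Rightarrow\; k = (a_1 - a_2)(b_2 - b_1)^{-1} \pmod{n}$$

where $n$ is the order of $P$ in the group.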
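The final step of this derivation is a single modular division. The Python sketch below shows it in isolation; the function name is hypothetical, and the sample numbers assume a toy group of prime order 19 with $k = 7$.

```python
from math import gcd


def log_from_collision(a1, b1, a2, b2, n):
    """Given a1*P + b1*Q == a2*P + b2*Q with Q = k*P and ord(P) = n, return k.

    Returns None when b2 - b1 is not invertible mod n (a useless collision).
    """
    d = (b2 - b1) % n
    if gcd(d, n) != 1:
        return None
    return (a1 - a2) * pow(d, -1, n) % n


# 3 + 2*7 = 17 and 10 + 1*7 = 17, so both pairs describe the same point when k = 7
print(log_from_collision(3, 2, 10, 1, 19))  # 7
```

The three-argument `pow` with exponent `-1` computes a modular inverse and needs Python 3.8 or later.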

The naïve method for finding such a collision is using the birthday paradox: if we generate multiple random pairs $(a, b)$ and store the resulting linear combinations $aP + bQ$ in a hash table, we can expect to find a collision after $O(\sqrt{n})$ pairs. However, this method also requires $O(\sqrt{n})$ memory, which is highly impractical for even relatively small values of $n$ (real cryptographic systems typically use elliptic curves of at least 192 bits).

Pollard's Rho method [2] uses an alternative approach, based on the properties of random

functions. Every function over a finite space is composed of finite-length chains:

Figure 1 : The eponymous Rho, with a highlighted collision

Each chain contains a cycle and, more importantly for us, a collision. In a random function, the expected chain length is $\sqrt{\pi n / 2}$, with about half of that belonging to the tail and the

other half to the cycle part of the chain. The collision can then be found using one of several

methods, for example by choosing a random starting point and applying the well-known

Floyd's algorithm for cycle detection [3], using an expected √ applications of the

function and constant memory.

Now all that is left to do is to find a way to create a pseudorandom function $f$ so that given $X = aP + bQ$, it will be easy to find $a', b'$ that satisfy $f(X) = a'P + b'Q$. This is achieved by using an additive walk: first we partition the curve into several disjoint subsets $S_1, \dots, S_r$, typically according to the value of the least significant $\log_2 r$ bits of the $x$ coordinate, then we choose random integers $a_j$ and $b_j$, and define $f(X) = X + a_j P + b_j Q$ for $X \in S_j$. While this process doesn't result in a truly random function, it does yield good enough results, and allows for very easy calculation of the linear combinations as mentioned.
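The additive walk and Floyd's cycle detection can be combined into a compact Python sketch. It is only an illustration under stated assumptions: the toy curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with the point $(5, 1)$ of prime order 19, the partition by `x % r`, and all function and parameter names (`r`, `seed`) are choices made for this example and are not taken from the project's C++ code.

```python
import random


def ec_add(P, Q, a, p):
    """Affine addition on y^2 = x^3 + a*x + b over F_p (None = point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # Q == -P
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def ec_mul(k, P, a, p):
    """Double-and-add scalar multiplication."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R


def pollard_rho(P, Q, n, a, p, r=4, seed=0):
    """Find k with Q = k*P, where n = ord(P) is prime; retries on useless collisions."""
    rng = random.Random(seed)
    while True:
        # Random multipliers (a_j, b_j) and precomputed steps R_j = a_j*P + b_j*Q
        coeffs = [(rng.randrange(n), rng.randrange(n)) for _ in range(r)]
        steps = [ec_add(ec_mul(aj, P, a, p), ec_mul(bj, Q, a, p), a, p)
                 for aj, bj in coeffs]

        def f(state):
            X, u, v = state                    # invariant: X = u*P + v*Q
            j = 0 if X is None else X[0] % r   # partition by the x coordinate
            return (ec_add(X, steps[j], a, p),
                    (u + coeffs[j][0]) % n, (v + coeffs[j][1]) % n)

        u0, v0 = rng.randrange(n), rng.randrange(n)
        start = (ec_add(ec_mul(u0, P, a, p), ec_mul(v0, Q, a, p), a, p), u0, v0)
        tortoise, hare = f(start), f(f(start))
        while tortoise[0] != hare[0]:          # Floyd: one step against two steps
            tortoise, hare = f(tortoise), f(f(hare))
        _, u1, v1 = tortoise
        _, u2, v2 = hare
        if (v2 - v1) % n != 0:
            return (u1 - u2) * pow((v2 - v1) % n, -1, n) % n


P = (5, 1)
Q = ec_mul(7, P, 2, 17)
print(pollard_rho(P, Q, 19, 2, 17))  # recovers k = 7
```

The walk state carries the coefficients along with the point, so the linear combination is known at every step and the collision can be turned into $k$ immediately. A group of order 19 is of course far too small to show the $\sqrt{n}$ behaviour; it only demonstrates the mechanics.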



2.4 The Pohlig-Hellman reduction

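The Pohlig-Hellman method [4] reduces an ECDLP calculation (or really any discrete logarithm) over a group of order $n$ to several instances of ECDLP over subgroups of orders of the prime factors of $n$, meaning that an elliptic curve cryptosystem with order $n$ is only as strong as the highest prime factor of $n$.

The reduction consists of two parts: the first is noting that if $n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, and we have $k_i = k \bmod p_i^{e_i}$ for every $i$, then we can find $k$ using the Chinese Remainder Theorem. Therefore we can assume WLOG that $n = p^e$ for some prime number $p$ and positive integer $e$.

Now, there is a unique way to write $k = z_0 + z_1 p + z_2 p^2 + \dots + z_{e-1} p^{e-1}$ where $0 \le z_i \le p - 1$. The second part of the Pohlig-Hellman reduction finds the values of $z_i$ iteratively, using a single ECDLP instance of order $p$ for each step.

First, define $P_0 = p^{e-1} P$ and $Q_0 = p^{e-1} Q$. Then we have:

$$Q_0 = p^{e-1} Q = p^{e-1} k P = k P_0$$

but since $\mathrm{ord}(P_0) = p$ this means that $Q_0 = z_0 P_0$, and we can find $z_0$ by solving ECDLP for $(P_0, Q_0)$, where the order of $P_0$ is $p$.

Once we have $z_0$, we can calculate $z_1$ in a similar fashion, by defining

$$Q_1 = p^{e-2}(Q - z_0 P) = p^{e-2}(k - z_0)P = (k - z_0)(p^{e-2} P) = (z_1 p + z_2 p^2 + \dots)(p^{e-2} P) = z_1 P_0$$

and solving ECDLP for $(P_0, Q_1)$. We can find all $z_i$ this way: in the $i$'th iteration we'll have

$$Q_i = p^{e-1-i}\left[Q - \left(z_0 + z_1 p + \dots + z_{i-1} p^{i-1}\right)P\right] = z_i P_0$$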
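Since the reduction works for any discrete logarithm, the sketch below demonstrates both parts in the multiplicative group modulo 37 (written with powers instead of point multiples) rather than on a curve. This substitution, the function names, and the use of exhaustive search for the order-$p$ subproblems are assumptions made to keep the example short; in the project the subproblems would be handed to the collision search. The element 2 generates the whole group of order $36 = 2^2 \cdot 3^2$, so both prime-power digits and the Chinese Remainder Theorem step are exercised.

```python
def dlog_bruteforce(g, h, order, p):
    """Solve g^z = h (mod p) for 0 <= z < order by exhaustive search."""
    value = 1
    for z in range(order):
        if value == h:
            return z
        value = value * g % p
    raise ValueError("h is not in the subgroup generated by g")


def dlog_prime_power(g, h, q, e, p):
    """Solve g^x = h (mod p) where g has order q**e, one base-q digit at a time."""
    g0 = pow(g, q ** (e - 1), p)  # element of order q (the analogue of P_0)
    x = 0
    for i in range(e):
        # Strip the digits found so far, then project into the order-q subgroup
        hi = pow(h * pow(g, -x, p) % p, q ** (e - 1 - i), p)
        x += dlog_bruteforce(g0, hi, q, p) * q ** i
    return x


def pohlig_hellman(g, h, factors, p):
    """Solve g^x = h (mod p); `factors` maps each prime q of ord(g) to its exponent."""
    n = 1
    for q, e in factors.items():
        n *= q ** e
    x, modulus = 0, 1
    for q, e in factors.items():
        m = q ** e
        xi = dlog_prime_power(pow(g, n // m, p), pow(h, n // m, p), q, e, p)
        # Chinese Remainder Theorem: merge x = xi (mod m) into the running solution
        t = (xi - x) * pow(modulus, -1, m) % m
        x, modulus = x + modulus * t, modulus * m
    return x


print(pohlig_hellman(2, pow(2, 23, 37), {2: 2, 3: 2}, 37))  # 23
```

Every subproblem here has order 2 or 3, which reflects the statement above that the largest prime factor of the order determines the strength of the system.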



3 Improvements and Optimizations

While the attack described above is enough to achieve an asymptotic complexity of $O(\sqrt{n})$,

when dealing with time frames of hours and even days, it is obvious that even constant

factors can count for a lot. To that end, several improvements and optimizations were used,

that don't give any asymptotic benefit but do result in a significant reduction of calculation

time and/or number of iterations.

3.1 The Montgomery trick for inversion

The underlying observation behind the Montgomery trick is that the most expensive field

operation is inversion (which is required for division). For example, in a prime field elliptic

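curve, given $P = (x_1, y_1)$ and $Q = (x_2, y_2)$, in order to calculate $P + Q = (x_3, y_3)$ we need to make the following calculations:

$$\lambda = \frac{y_2 - y_1}{x_2 - x_1}, \qquad x_3 = \lambda^2 - x_1 - x_2, \qquad y_3 = \lambda(x_1 - x_3) - y_1$$

(A quick aside: the value of $\lambda$ is actually the geometrical "slope" of the line connecting the points $P$ and $Q$.)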

Addition and subtraction in a prime field are very cheap (and even more so in binary fields).

Multiplication and squaring are more expensive, and use more modular reductions, but still

pale in comparison to inversions, which require the use of Euclid's algorithm and are

consequently considerably more expensive for large numbers.
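The addition formulas can be written out directly, which makes the single inversion per addition visible. This Python sketch is an illustration only: the function name and the toy curve are assumptions, and the doubling branch uses the standard tangent slope $(3x_1^2 + a) / (2y_1)$, which the text above does not spell out.

```python
def ec_add(P, Q, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b over F_p (None = infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # Q == -P (this also covers doubling a point with y == 0)
    if P == Q:
        # Tangent slope for doubling (standard formula, not given in the text)
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        # Chord slope; the modular inversion here is the expensive step
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


# Toy curve y^2 = x^3 + 2x + 2 over F_17 (illustrative parameters)
print(ec_add((5, 1), (6, 3), 2, 17))  # (10, 6)
```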

The Montgomery trick performs field inversions in parallel, i.e., given $a_1, \dots, a_k$, it calculates $a_1^{-1}, \dots, a_k^{-1}$. The advantage over doing a simple calculation of every inversion by itself is substituting the $k$ inversion operations with $3(k-1)$ multiplications and a single inversion. The trick works by calculating accumulating products: for each $1 \le i \le k$ we calculate $c_i = \prod_{j=1}^{i} a_j$, then given $c_k^{-1}$ we calculate $a_i^{-1} = c_k^{-1} \cdot c_{i-1} \cdot \prod_{j=i+1}^{k} a_j$.
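The accumulating-products idea can be sketched in a few lines of Python. The function name is hypothetical and the code works on plain integers modulo a prime; it follows the same steps as Algorithm 1 in this report, with zero-based indices.

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo the prime p using one modular inversion."""
    k = len(values)
    if k == 0:
        return []
    prefix = [values[0] % p]                # prefix[i] = a_0 * ... * a_i
    for v in values[1:]:
        prefix.append(prefix[-1] * v % p)
    u = pow(prefix[-1], -1, p)              # the single inversion
    result = [0] * k
    for i in range(k - 1, 0, -1):
        result[i] = u * prefix[i - 1] % p   # (a_0..a_i)^-1 * (a_0..a_{i-1}) = a_i^-1
        u = u * values[i] % p               # now u = (a_0..a_{i-1})^-1
    result[0] = u
    return result


print(batch_inverse([2, 3, 5], 17))  # [9, 6, 7]
```

The loops perform $3(k-1)$ multiplications in total. If any input is zero modulo $p$, the single inversion fails (`pow` raises `ValueError`), so a caller has to exclude that case beforehand.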

3.2 Distinguished points

As is evident from its definition, the Montgomery trick requires running several instances of ECDLP in parallel, so they can all perform their inversions simultaneously. While probabilistic analysis shows that running $m$ parallel instances does result in an expected overall speedup factor of $\sqrt{m}$ (so that we end up slowing the calculation "only" by a factor of $\sqrt{m}$ and not $m$), there is a much better method that gives a speedup factor of $m$.

Algorithm 1: Montgomery trick for inversion

ParallelInversion(a[1..k]):
    c[1] ← a[1]
    for i = 2 to k:
        c[i] ← c[i-1] * a[i]
    u ← inverse(c[k])
    for i = k downto 2:
        result[i] ← u * c[i-1]
        u ← u * a[i]
    result[1] ← u



The distinguished points method [5] relies on the following observation: the previously

described Pollard's Rho "chains" may intersect.

Figure 2: Pollard's Rho with intersecting chains

Each chain still contains a cycle, but if we can keep track of the intersections as well we will

gain a significant speedup. This is achieved by keeping a central hash table with encountered

points. Of course, we can't just put every single point in a hash table, because of memory

considerations. So what we do is find some way to distinguish some fraction of the space (e.g., points whose $x$ coordinate's least significant $d$ bits are all 0), and only insert distinguished points into the hash table. This saves a factor of $2^d$ in memory, but still effectively guarantees that

we'll find the intersection, since if the chain is long enough (if it isn't, we'll find the cycle

quickly) and the function is random enough, we will eventually reach a distinguished point.

The distinguished points method, in conjunction with the Montgomery trick, gives all the

benefits of the reduced number of inversions with none of the disadvantages of running

several unrelated parallel instances.
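The following Python sketch simulates the distinguished points method on a single thread: many short walks share one additive-walk function and one table, and only distinguished points are stored. Everything specific here is an assumption for illustration, including the toy curve over $\mathbb{F}_{17}$, the distinguishing rule (`x & dp_mask == 0`), the step limit `max_steps`, and the function names; the project's C++ collision library is not reproduced.

```python
import random


def ec_add(P, Q, a, p):
    """Affine addition on y^2 = x^3 + a*x + b over F_p (None = point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def ec_mul(k, P, a, p):
    """Double-and-add scalar multiplication."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R


def distinguished_points_dlog(P, Q, n, a, p, r=4, dp_mask=0b11, max_steps=64, seed=0):
    """Find k with Q = k*P (n = ord(P) prime) using walks that stop at distinguished points."""
    rng = random.Random(seed)
    coeffs = [(rng.randrange(n), rng.randrange(n)) for _ in range(r)]
    steps = [ec_add(ec_mul(aj, P, a, p), ec_mul(bj, Q, a, p), a, p)
             for aj, bj in coeffs]
    table = {}  # distinguished point -> (u, v) with point = u*P + v*Q
    while True:
        u, v = rng.randrange(n), rng.randrange(n)   # fresh random starting point
        X = ec_add(ec_mul(u, P, a, p), ec_mul(v, Q, a, p), a, p)
        for _ in range(max_steps):
            x = 0 if X is None else X[0]
            if x & dp_mask == 0:                    # low bits of x are all zero
                break
            j = x % r
            X = ec_add(X, steps[j], a, p)
            u, v = (u + coeffs[j][0]) % n, (v + coeffs[j][1]) % n
        else:
            continue  # no distinguished point reached within the limit: abandon the walk
        if X in table:
            u2, v2 = table[X]
            if (v2 - v) % n != 0:
                return (u - u2) * pow((v2 - v) % n, -1, n) % n
        else:
            table[X] = (u, v)


P = (5, 1)
Q = ec_mul(11, P, 2, 17)
print(distinguished_points_dlog(P, Q, 19, 2, 17))  # recovers k = 11
```

In a real attack the walks would run on separate processors (or be advanced together so that their inversions can be batched), and the table would hold only a small fraction of the visited points.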

3.3 Nivasch method for cycle detection

While Floyd's algorithm is the simplest and most widely known method for cycle detection, it

is by no means the most efficient. An alternative method proposed by Gabriel Nivasch in 2004 [6] uses a stack to find the smallest value on the cycle¹.

The idea is to keep a stack of values encountered so far, and for each new value

encountered, remove from the stack all values that are strictly larger than it. If, at any time,

the value at the top of the stack is the same as the value currently being handled, we have

found our collision.
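The stack rule just described fits in a short Python function. This is a sketch of the single-stack variant only, for an arbitrary function `f` on comparable values; the function name and the return format are assumptions of the example.

```python
def nivasch_cycle(f, x0):
    """Return (value, first_index, second_index) for the minimum value on the cycle."""
    stack = []  # (value, index) pairs; values strictly increase towards the top
    x, i = x0, 0
    while True:
        while stack and stack[-1][0] > x:
            stack.pop()  # drop everything strictly larger than the new value
        if stack and stack[-1][0] == x:
            return x, stack[-1][1], i
        stack.append((x, i))
        x, i = f(x), i + 1


# The sequence 3, 10, 16, 2, 5, 9, 14, 10, ... has a tail of length 1 and a cycle of length 6
print(nivasch_cycle(lambda x: (x * x + 1) % 17, 3))  # (2, 3, 9)
```

The two returned indices differ by the cycle length. For the ECDLP walk, each stack entry would also carry the coefficient pair of the point, so that the repeated value yields two linear combinations of the same point.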

Nivasch also proposes a further improvement: if we partition the space into $k$ disjoint subsets (as illustrated in section 2.3) and use a separate stack for each subset, we only need to go on until we reach a collision in one of the stacks. Probabilistic analysis provided in [6]

¹ For some predefined order, e.g., according to the value of the $x$ coordinate.

shows that this results in an expected running time of (

( ))√ , with a memory

overhead of ( (√ )). For comparison, as mentioned previously, the expected running

time of Floyd's algorithm is about √ .

3.4 Negation Map

While point addition on an elliptic curve is an expensive operation, negating a point is very cheap by comparison: in prime curves, for example, if $P = (x, y)$ then $-P = (x, -y)$. Thus, if we treat every pair of points $(P, -P)$ as a single element $|P|$, for example by using the point of the pair which has an even $y$ coordinate, we can effectively reduce the size of the search space by a factor of 2, resulting in an overall speedup factor of $\sqrt{2}$.
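Choosing the representative $|P|$ is a one-line parity test, but the walk also has to negate the coefficients of the linear combination whenever it negates the point. The Python sketch below shows both; the function name and argument layout are hypothetical, and it assumes an odd prime $p$, so that $y$ and $p - y$ have different parities when $y \ne 0$.

```python
def canonical(P, u, v, n, p):
    """Map P = u*P0 + v*Q0 to the representative of {P, -P} with an even y coordinate.

    The coefficients are negated together with the point, so the returned
    triple (R, u', v') still satisfies R = u'*P0 + v'*Q0.
    """
    if P is None:
        return P, u, v
    x, y = P
    if y % 2 == 0:
        return (x, y), u, v
    return (x, p - y), (-u) % n, (-v) % n


# On the toy curve over F_17 with a base point of order 19 (illustrative values)
print(canonical((5, 1), 1, 0, 19, 17))  # ((5, 16), 18, 0)
```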

3.4.1 Fruitless cycles

The use of the negation map presents a problem when using additive walks (described in section 2.3). Suppose that for a given point $|X|$, where $|X| \in S_j$ for some $j$ (where $S_1, \dots, S_r$ are the partition subsets of the space used to build the additive walk function, and $R_j = a_j P + b_j Q$ is the step added for the subset $S_j$), we have $f(|X|) = |X| + R_j$ which satisfies $|f(|X|)| = -f(|X|)$ and $|f(|X|)| \in S_j$. Then we would have:

$$f(|f(|X|)|) = -(|X| + R_j) + R_j = -|X|$$

so that $|f(|f(|X|)|)| = |X|$.

This is called a fruitless cycle of length 2, and is actually quite common, with a probability of $1/(2r)$ for each step in the walk. The cycle is called fruitless because it does not represent an

actual collision, just a byproduct of our "twisting" of the space, and since the coefficients of

the linear combination are identical, it can't be used to find the discrete logarithm. There can

also be fruitless cycles of larger even lengths, although they will of course be rarer.

3.4.2 Eliminating fruitless cycles

One method for eliminating fruitless cycles has been proposed by Bernstein, Lange and

Schwabe in [7], and relies on periodically checking for such cycles. For example, to eliminate fruitless cycles of length 2, every $w$ iterations, when calculating $X_{i+1} = f(X_i)$, first we check if $X_i = X_{i-2}$. In that case, instead of continuing the walk as planned, we define

$$X_{i+1} = |2 \min\{X_i, X_{i-1}\}|$$

thus "breaking" the cycle. Similarly, for some larger interval we check periodically for cycles

of length 4, and for an even larger interval we check for cycles of length 6, etc. Since the

probability of a fruitless cycle of a given length decreases exponentially as the length increases, it is enough to check for lengths up to the smallest even length that exceeds

.

Analysis provided in [7] suggests that the optimal interval in which to check for cycles of

length is approximately . In this project, the parameters used (for ) were

.



4 Implementation

The project's code was written entirely in C++, and consists of 3 parts: an elliptic curve

arithmetic library, a generic collision-finding library, and a wrapper that connects the two

and supplies the actual definitions of the partitioning of the space, the additive walks, etc.

4.1 Elliptic curve arithmetic library

The EC library is a generic (templated) C++ library for elliptic curve arithmetic, allowing for

different curve types and different methods of point addition and multiplication. It supplies

a main template Point class, which is a convenient interface for all EC operations and

supports validation (i.e., checking whether given coordinates ( ) actually represent a

point on the curve), standard C++ operators, and output. Additional classes (most of which

provide the template arguments for the Point class) decide the type of the curve and the

methods for adding and multiplying points. The current curve being used is kept as a global

context variable.

The only curve type currently available in the library is the prime field curve, with several

addition methods (the regular method, the Montgomery trick detailed earlier, and the

Jacobian projective coordinates described in chapter 3.2.1 of [1]) and some basic

multiplication algorithms. However, the library is designed so that it is very easy to add new

curve types and new addition and multiplication methods, and mix-and-match them as

necessary.

Since curve operation performance depends heavily on efficient field arithmetic, and that is

far beyond the scope of this project, two external open source libraries were used to achieve

this: the basic, brass tacks modular arithmetic is handled by GMP [8], a widely used library

for arbitrary-precision (big-number) arithmetic, while the higher-level number theory

algorithms are done by NTL [9], a Number Theory Library that can handle polynomials,

binary fields and matrices as well as simple prime fields, and supplies an easy-to-use C++

operator-based interface.

4.2 Collision library

Another generic, templated library handles collision-finding algorithms. It supplies an

abstract CollisionFinder interface and several concrete subclasses for various methods

described in this document – Floyd's and Nivasch's algorithms as well as the distinguished

points method. The interface is, as mentioned, completely generic and all that is required,

apart from the actual function to find a collision in, is a few problem-specific definitions (e.g.,

how to generate a random element, partition the space, or a predicate for distinguishing

points).

4.3 Putting it all together

The last part is the ECDLP code itself. It is relatively small, and includes the pseudorandom

additive walk function (as well as the negation map variant), the Pohlig-Hellman reduction,

the required functions for partitioning, distinguishing etc., and of course a main function to

run it all.



5 Challenges

In order to test the performance of the ECDLP code, a few increasingly-difficult challenges

were used:

Challenge #1 – 30 bits²
Field order ($p$): 754526683
Curve parameters ($a, b$): 72537189, 706168557 (Curve equation is $y^2 = x^3 + ax + b$)
$P$ = (391592639, 187105396), $Q$ = (699400157, 82294806)
ECDLP order ($n = \mathrm{ord}(P)$) = 754564781

Challenge #2 – 40 bits
$p$ = 1018754028791
($a, b$) = 871520218049, 1007486607871
$P$ = (215868580937, 253947951840), $Q$ = (283497910998, 322669501418)
$n$ = 1018752667981

Challenge #3 – 50 bits
$p$ = 862310130873029
($a, b$) = 753664604275938, 359867061355737
$P$ = (396159790871760, 299026752948778), $Q$ = (594219074514457, 329535619534883)
$n$ = 862310073862651

Challenge #4 – 64 bits
$p$ = 17778887349362591359
($a, b$) = 9532767884830205561, 17350004542596651438
$P$ = (2072463837547420019, 13457440224317410775)
$Q$ = (14290627919077884680, 8330780035610710830)
$n$ = 17778887341958652103

Challenge #5 – special challenge

$p$ = 414507122857381699247
$P$ = (215672232155085007005, 176420948314972445409)
$Q$ = (49818534942346740253, 67908233076804365605)

This challenge was used to test the Pohlig-Hellman reduction, since it is the only one of the

five in which the order of the point is not prime. It was given exactly in this form.

The curve parameters can be deduced from the values of $P$ and $Q$ according to the curve

equation, and the order can be found using the fact that in some families of curves (one of

which the given curve belongs to), the order of the curve satisfies ( ) for

some very small integer , so that | .

The actual value of $n$ here is about 62 bits, but its largest prime factor is only 25 bits long.
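Deducing the curve parameters from two known points amounts to solving two linear equations modulo $p$: substituting each point into $y^2 = x^3 + ax + b$ gives $a x_i + b = y_i^2 - x_i^3$. A minimal Python sketch follows, with a hypothetical function name and demonstrated on a toy curve rather than on the challenge values; it assumes the two points have different $x$ coordinates.

```python
def curve_from_points(P, Q, p):
    """Recover (a, b) of y^2 = x^3 + a*x + b over F_p from two points with distinct x."""
    (x1, y1), (x2, y2) = P, Q
    c1 = (y1 * y1 - x1 ** 3) % p  # equals a*x1 + b
    c2 = (y2 * y2 - x2 ** 3) % p  # equals a*x2 + b
    a = (c1 - c2) * pow((x1 - x2) % p, -1, p) % p
    b = (c1 - a * x1) % p
    return a, b


# Two points of the toy curve y^2 = x^3 + 2x + 2 over F_17
print(curve_from_points((5, 1), (6, 3), 17))  # (2, 2)
```

Python integers have arbitrary precision, so the same function can be applied unchanged to values of the size used in challenge #5.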

² The hardness of the challenge is measured by the order of the point $P$, so for example "30 bits" means $2^{29} \le \mathrm{ord}(P) < 2^{30}$.


6 Results

[Figure: Average time. Runtime in seconds (logarithmic axis, 1 to 65536) against challenge bits (30, 40, 50, 64).]

The main result of this project is of course the actual attack, specifically for the 64-bit

challenge, but the modular nature of the libraries allowed for some experimenting with the

optimizations and parameters, and the results from these experiments are summarized here

as well.

6.1 The basic challenges

These are the running times for the various challenges, excluding the last special one, for the

standard version of the code, i.e., with all the improvements described in section ‎3. The

statistics are taken from 10 runs (except for the 64-bit challenge statistics, which include

only 1 run for obvious reasons). All runs were made on a single core of an Intel P8700 CPU.

Times are measured both in actual running time (seconds) and in distinct calls to the

function.

30 bits

Running time: min 1 sec, max 2 secs, average 1.5 secs³.

Function calls: min 2944, max 68160, average 38115.

40 bits

Running time: min 10 secs, max 57 secs, average 21.8 secs.

Function calls: min 440256, max 2538464, average 943411.

50 bits

Running time: min 117 secs, max 993 secs (16.5 min.), average 566 secs (9.5 min.)

Function calls: min 5323328, max 46201472, average 23051004

64 bits

Running time: 57540 secs (≈ 16 hours).

Function calls: 1993844576.

³ The measurement used the built-in C time() function, which measures full seconds, so the actual running times are probably considerably shorter.

[Figure: Average function calls. $\log_2$(#calls) (axis 0 to 35) against challenge bits (30, 40, 50, 64).]



6.2 Previous results

The best result achieved by a previous group in this project is solving a 60-bit ECDLP in 5-6

days. Said group used a distributed attack, but without all the optimizations detailed in this

document and without the power of GMP and NTL.

The best result achieved in the entire world to date is solving a 112-bit ECDLP [10]. This

attack used a cluster of 218 PlayStation 3 game consoles with heavy optimizations (including

low-level integer operations using the PS3's Single-Instruction Multiple-Data architecture),

and took about 3.5 months or elliptic curve operations.

6.3 Optimization efficiency tests

There are 3 optimizations described in section ‎3: the Montgomery trick, the negation map,

and the Nivasch cycle-detection algorithm. These are the results of testing each of these

against the corresponding simpler method, using the 40-bit challenge as a benchmark (since

the 30-bit challenge is too quick to measure the difference in the runtime, and the 50-bit

one takes too long to allow multiple test runs).

The Nivasch algorithm for cycle detection is provably better than Floyd's algorithm in the expected number of function calls, and indeed, using Floyd's algorithm instead results in an average number of function calls that is more than twice that of Nivasch's method (the exact factor is 2.16). However, Floyd's time per call is shorter (the algorithm has practically no overhead), so the runtime factor is only 1.4 (still in favor of Nivasch, though).

The negation map is quite controversial, and hasn't been conclusively proved to improve

runtime, since the iterations it saves might be wasted on fruitless cycles and the checking

thereof. The experiment shows that while it does save an average factor of about 1.1 in the

total iteration count, it comes with a more significant added cost per iteration, so that the

total average runtime with the negation map is about 1.07 times longer.

Lastly, the Montgomery trick, as expected, yields no noticeable change in the average number of calls (a factor of 1.008 that may as well be attributed to the random element), but does result in a speedup factor of 1.43. A further test for 30 bits shows a speedup factor of 1.33, which is to be expected, as the efficiency of the Montgomery trick is tied to the difference in computation time between multiplication and inversion.



7 Schedule

The schedule proposed in the project kickoff:

Basic study and reading: in 2 weeks (by 12/11).

Basic implementation of EC library: by mid-project meeting.

Optimizations of EC library: by end of year (1/1).

General attack implementation: by end of semester (1/2).

Further research and optimization: until submission of final report.

Actual retrospective schedule:

Basic study and reading: 2 weeks (12/11).

Familiarization and installation of GMP and NTL: 1 week (19/11).

Basic design of EC library: 2 weeks (2/12).

Basic implementation of EC library: 3 weeks (24/12 – mid-project meeting).

Testing and debugging EC library: 3 weeks (13/1).

First milestone: EC library finished – 13/1.

Design and implementation of collision library (incl. Nivasch): 3 weeks (10/2).

Important milestone: first challenge solved (30-bit) – 10/2.

Montgomery trick and distinguished points method: 2 days (12/2).

Negation map and Pohlig-Hellman reduction: 3 weeks (4/3).

Final milestone: 64-bit challenge solved – 8/3.

The major discrepancy is that the original schedule hadn't taken into consideration the

(unexpectedly heavy) course load of the semester and a quite busy exam period in the

second half of February. Other than that, the proposed schedule was pretty much spot-on.



8 Bibliography

[1] D. Hankerson, A. Menezes and S. Vanstone, Guide to Elliptic Curve Cryptography, New

York: Springer, 2004.

[2] J. M. Pollard, "Monte Carlo Methods for Index Computation (mod p)," Mathematics of

Computation, vol. 32, no. 143, pp. 918-924, Jul. 1978.

[3] R. W. Floyd, "Nondeterministic Algorithms," Journal of the ACM, vol. 14, no. 4, pp. 636-

644, October 1967.

[4] S. Pohlig and M. Hellman, "An Improved Algorithm for Computing Logarithms over

GF(p) and its Cryptographic Significance," IEEE Transactions on Information Theory, vol.

24, no. 1, pp. 106-110, 1978.

[5] P. van Oorschot and M. J. Wiener, "Parallel collision search with cryptanalytic

applications," Journal of Cryptology, vol. 12, pp. 1-28, 1999.

[6] G. Nivasch, "Cycle Detection Using a Stack," Information Processing Letters, vol. 90, pp.

135-140, 2004.

[7] D. J. Bernstein, T. Lange and P. Schwabe, "On the correct use of the negation map in the

Pollard rho method," Public Key Cryptography - PKC 2011, pp. 128-146, 6-9 March 2011.

[8] "The GNU Multiple Precision Arithmetic Library," GNU, [Online]. Available:

http://gmplib.org/.

[9] V. Shoup, "NTL: A Library for doing Number Theory," [Online]. Available:

http://www.shoup.net/ntl/.

[10] J. W. Bos, M. E. Kaihara, T. Kleinjung, A. K. Lenstra and P. L. Montgomery, "PlayStation 3

computing breaks 2^60 barrier; 112-bit prime ECDLP solved," 2009. [Online]. Available:

http://lacal.epfl.ch/112bit_prime.
