Study of Extended Euclidean and Itoh-Tsujii Algorithms in $GF(2^m)$ using polynomial bases

by

Fan Zhou

B.Eng., Zhejiang University, 2013

A Report Submitted in Partial Fulfillment of the

Requirements for the Degree of

MASTER OF ENGINEERING

in the Department of Electrical and Computer Engineering

© Fan Zhou, 2018

University of Victoria

All rights reserved. This report may not be reproduced in whole or in part, by

photocopying or other means, without the permission of the author.

Supervisory Committee

Dr. Fayez Gebali, Supervisor

(Department of Electrical and Computer Engineering)

Dr. Watheq El-Kharashi, Departmental Member

(Department of Electrical and Computer Engineering)

ABSTRACT

Finite field arithmetic is important in information security. Among all finite field arithmetic operations, inversion consumes the most time and resources. In this report, two main classes of inversion algorithms are studied. The first class consists of inverters based on the Extended Euclidean Algorithm, an extension of the Euclidean algorithm for computing the greatest common divisor. The second class is based on Fermat’s little theorem. These inverters are also called multiplicative inverters because they perform inversion by a sequence of multiplications and squarings. This report presents a literature review of inversion algorithms and implements a multiplicative inverter and an Extended Euclidean inverter in MATLAB. The experimental results show that, in the MATLAB implementation, inverters based on the Extended Euclidean Algorithm are more efficient than inverters based on Fermat’s little theorem.

Contents

Supervisory Committee

Abstract

Table of Contents

List of Tables

List of Figures

List of Acronyms

Acknowledgements

Dedication

1 Introduction
    1.1 Background
    1.2 Preliminaries: Binary Finite Field Arithmetic
    1.3 Related Work
    1.4 Project Contributions
    1.5 Report Organization

2 Extended Euclidean algorithm
    2.1 Extended Euclidean algorithm
    2.2 An example of Extended Euclidean algorithm

3 Itoh-Tsujii algorithm
    3.1 Inversion based on Fermat’s little theorem
    3.2 Itoh-Tsujii algorithm
    3.3 An example of Itoh-Tsujii algorithm

4 MATLAB Implementation
    4.1 MATLAB results
    4.2 Analysis and comparison

5 Conclusion

Appendix A

Bibliography

List of Tables

Table 2.1 An example of binary polynomial division

Table 2.2 An example of EEA

Table 3.1 Inverse of $a \in GF(2^{233})$ using an addition chain [1]

Table 4.1 Execution time of EEA and Itoh-Tsujii Algorithms on a quad-core processor

Table 4.2 Execution time of EEA and Itoh-Tsujii Algorithms on a dual-core processor

List of Figures

Figure 1.1 ECC Arithmetic Architecture

Figure 3.1 Flowchart of Itoh-Tsujii Algorithm

List of Acronyms

EEA Extended Euclidean Algorithm

ECC Elliptic Curve Cryptography

FLT Fermat’s Little Theorem

GCD Greatest Common Divisor

SM Scalar Multiplication

VLSI Very Large Scale Integration

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Dr. Gebali, who provided me with valuable guidance and advice throughout my graduate study. Besides my supervisor, I would like to thank Ibrahim Hazmi for helping me improve my project. My gratitude also goes to my parents and my roommate, who constantly support me when I am in need.

DEDICATION

To my parents

Chapter 1

Introduction

1.1 Background

Elliptic Curve Cryptography (ECC) is a public-key cryptosystem based on the algebraic structure of elliptic curves over finite fields, which can be used to create faster and more efficient cryptographic schemes.

The computations involved in implementing ECC cryptosystems form a pyramid of four levels of operations. Finite field (modular) arithmetic is the foundation of the pyramid, as it is the basic building block of elliptic curve point addition and point doubling. Scalar multiplication (SM), in turn, is performed by repeating point addition and point doubling operations and is used by all ECC cryptographic protocols. Figure 1.1 illustrates the arithmetic architecture of SM computational processes.

An elliptic curve $E(K)$ over a field $K$ is defined by an equation [2]:

$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6 \qquad (1.1)$$

where $a_1, a_2, a_3, a_4, a_6 \in K$, and the discriminant of $E$ satisfies $\Delta \neq 0$. In the binary field, $E(K)$ can be simplified to:

$$y^2 + xy = x^3 + a x^2 + b \qquad (1.2)$$

where $a, b \in K$.

Figure 1.1: ECC Arithmetic Architecture [3]

1.2 Preliminaries: Binary Finite Field Arithmetic

The finite field $GF(2^m)$ of order $2^m$ is called a binary finite field. An element $a(x) \in GF(2^m)$ can be expressed as a binary polynomial of degree at most $m-1$ [2]:

$$a(x) = a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \cdots + a_2 x^2 + a_1 x + a_0 \qquad (1.3)$$

where $a_i = 0$ or $1$.

A polynomial $f(x)$ of degree $m$ with coefficients in $GF(2)$ is said to be irreducible if there do not exist two polynomials $g(x)$ and $h(x)$ of lower degree, also with binary coefficients, such that $f(x) = g(x)h(x)$. In polynomial arithmetic, the coefficients $a_i$ are either 0 or 1, and an irreducible polynomial $f(x)$ is used to reduce the result of any operation whose degree is greater than $m-1$. For instance, arithmetic in the field $GF(2^5)$ can be defined with the irreducible polynomial $f(x) = x^5 + x^2 + 1$.

Computing point multiplication requires point doubling and point addition, which

can be implemented using four basic operations, namely, addition, subtraction, mul-

tiplication and division.

Addition and subtraction in binary fields are performed by adding or subtracting the two polynomials coefficient by coefficient and reducing each result modulo 2. For instance, let $a(x) = a_{m-1}x^{m-1} + \cdots + a_1 x + a_0$, $b(x) = b_{m-1}x^{m-1} + \cdots + b_1 x + b_0$ and $c(x) = a(x) + b(x) = c_{m-1}x^{m-1} + \cdots + c_1 x + c_0$. If $a_k$, $b_k$ and $c_k$ are the coefficients of $a(x)$, $b(x)$ and $c(x)$ respectively, then:

$$c_k = (a_k + b_k) \bmod 2 \qquad (1.4)$$

Since $-1 \equiv 1 \pmod 2$, subtraction coincides with addition, and both reduce to a bitwise XOR of the coefficient vectors. The computational cost of addition and subtraction in a binary field is therefore usually neglected.
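As an illustration, the coefficient-wise rule in Equation 1.4 can be sketched in Python (used here for illustration only; it is not the MATLAB code of Appendix A). The sketch assumes that a field element is stored as an integer bit mask in which bit $k$ holds the coefficient of $x^k$; the function name `gf2_add` is hypothetical.

```python
def gf2_add(a: int, b: int) -> int:
    """Add two binary polynomials stored as bit masks (bit k = coefficient of x^k).

    Adding coefficients modulo 2 is exactly a bitwise XOR, so no carries occur.
    """
    return a ^ b


a = 0b10101  # x^4 + x^2 + 1
b = 0b00111  # x^2 + x + 1
c = gf2_add(a, b)
print(bin(c))  # 0b10010, i.e. x^4 + x

# Subtraction is the same operation: adding b again recovers a.
print(gf2_add(c, b) == a)  # True
```

Because the operation is a single XOR, its cost does not depend on the reduction polynomial, which is consistent with addition being neglected in complexity estimates.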

Multiplication in a finite field is multiplication modulo an irreducible polynomial. Let $a(x)$ and $b(x)$ be elements of $GF(2^m)$; their modular product $c(x)$ is also an element of the field. It can be computed in two steps: first a polynomial product of the two operands $a(x)$ and $b(x)$, then a modular reduction using the irreducible polynomial $f(x)$. Then, we have:

$$c(x) = a(x) \cdot b(x) \bmod f(x) \qquad (1.5)$$
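The two-step procedure behind Equation 1.5 can be sketched as follows. This is a minimal Python illustration, not the report's MATLAB implementation; it assumes elements are stored as integer bit masks (bit $k$ is the coefficient of $x^k$), and the name `gf2_mul` is hypothetical.

```python
def gf2_mul(a: int, b: int, f: int) -> int:
    """Multiply a and b in GF(2^m), where f is the irreducible polynomial of degree m."""
    m = f.bit_length() - 1  # degree of f
    product = 0
    # Step 1: carry-less schoolbook product (partial products are XORed, not added).
    for k in range(b.bit_length()):
        if (b >> k) & 1:
            product ^= a << k
    # Step 2: modular reduction, cancelling the leading term until the degree is below m.
    while product.bit_length() > m:
        product ^= f << (product.bit_length() - 1 - m)
    return product


F = 0b100101  # f(x) = x^5 + x^2 + 1, the GF(2^5) example used in this report

# x^4 * x = x^5, which reduces to x^2 + 1 modulo f(x).
print(bin(gf2_mul(0b10000, 0b00010, F)))  # 0b101
```

The reduction loop here is the simplest possible one; hardware designs and optimized software typically exploit the sparse form of $f(x)$ instead.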

A great deal of work has been done on inversion in finite fields, since inversion is the most time-consuming of the four basic operations. The inverse of a nonzero polynomial $a(x)$ in $GF(2^m)$ is the polynomial $a^{-1}(x)$ in $GF(2^m)$ such that:

$$a(x) \cdot a^{-1}(x) \bmod f(x) = 1 \qquad (1.6)$$

Inversion algorithms can be classified into two main categories: those based on the Extended Euclidean Algorithm and those based on Fermat’s Little Theorem. These two classes are discussed in Chapters 2 and 3.

1.3 Related Work

Several Extended Euclidean based inversion algorithms have been proposed in the literature [3-5]. In [4], a class of bit-serial unidirectional systolic architectures for inversion and division in polynomial basis was proposed. The authors also presented a variant of the Extended Euclidean Algorithm (EEA) optimized for unidirectional systolization with no carry propagation structure. In addition, the design introduces a simpler distributed counter structure that is suitable for applications where the field dimension may be large or variable. Yan [5] presents two-dimensional systolic architectures for inversion based on a modified extended Euclidean algorithm. The new architecture uses a distributed control mechanism for a variety of field sizes and is suitable for Very Large Scale Integration (VLSI) implementation. In comparison to similar architectures, these architectures have smaller critical path delays and considerably lower hardware costs. An optimized inversion algorithm that is well suited to hardware was proposed in [6]. A two-dimensional and a one-dimensional multiplication/inversion systolic architecture were implemented, and both are well suited to the Elliptic Curve arithmetic unit required in elliptic curve cryptography.

For the Itoh-Tsujii inversion algorithm in $GF(2^m)$, Rebeiro [7] proposed a modification called the quad-Itoh-Tsujii algorithm, which was implemented on field-programmable gate array (FPGA) platforms. The adapted algorithm requires shorter addition chains and significantly reduces the number of clock cycles by using a parallel architecture. A modified Itoh-Tsujii algorithm for inversion with polynomial basis was proposed in [8]. An optimal addition chain was used to reduce the operation time by computing part of the multiplications and squarings in parallel. The resulting inversion architecture with a digit-serial multiplier experimentally obtained a 61% timing improvement and used 69% fewer resources on average than previous designs with normal basis. Another parallel version of the Itoh-Tsujii algorithm was proposed in [9]. It uses a special class of irreducible trinomials, namely $P(x) = x^m + x^k + 1$, to achieve its best performance. This class of trinomials reduces the computational complexity and yields a 30% timing improvement on average compared to the standard version of the algorithm. In [10], a high-performance, high-speed FPGA implementation of the polynomial basis Itoh-Tsujii algorithm (ITA) over $GF(2^m)$ generated by irreducible trinomials and pentanomials was proposed. The structures are built from efficient digit-serial multiplier and $k$-times squarer blocks, where $k$ is a small positive integer. The design provides an improvement over other implementations of the polynomial basis Itoh-Tsujii inversion algorithm.

1.4 Project Contributions

This project aims to find an effective algorithm for inversion. Its contributions are:

1) A literature review of finite field arithmetic and of related work on inversion algorithms.

2) An introduction to the Extended Euclidean algorithm and the Itoh-Tsujii algorithm.

3) An implementation of the Extended Euclidean algorithm and the Itoh-Tsujii algorithm in MATLAB.

4) A comparison of the Extended Euclidean algorithm and the Itoh-Tsujii algorithm.

1.5 Report Organization

This report is organized as follows. Chapter 2 introduces the Extended Euclidean based algorithm in the polynomial field $GF(2^m)$. Chapter 3 presents inversion based on Fermat’s little theorem. Chapter 4 describes the MATLAB implementation of these two algorithms. Chapter 5 concludes the report.

Chapter 2

Extended Euclidean algorithm

The Euclidean algorithm calculates the greatest common divisor (GCD) of two integers. It makes use of the facts that $GCD(m, n) = GCD(m - n, n)$ and $GCD(m, 0) = m$, and simply repeats the subtraction until $n$ is zero. A more efficient way of doing this is to use

$$GCD(m, n) = GCD(n, m \bmod n), \qquad (2.1)$$

and repeat until $n$ is zero. For example, $GCD(38, 8) = GCD(8, 6) = GCD(6, 2) = GCD(2, 0) = 2$. Since the modulo operation is essentially repeated subtraction, this is very much the same algorithm, but several subtractions are done at once. In the case of prime fields, a great number of variants of the Euclidean algorithm have been developed for use in cryptographic applications, as in [11]. The Extended Euclidean Algorithm may also be used to find the multiplicative inverse of polynomials over $GF(2^m)$.
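The iteration in Equation 2.1 can be written in a few lines. The following Python sketch is for illustration only; the helper name `gcd_steps` is hypothetical, and it records the intermediate pairs so that the worked example above can be reproduced.

```python
def gcd_steps(m: int, n: int) -> list:
    """Return the sequence of (m, n) pairs visited by GCD(m, n) = GCD(n, m mod n)."""
    steps = [(m, n)]
    while n != 0:
        m, n = n, m % n  # one application of Equation 2.1
        steps.append((m, n))
    return steps


steps = gcd_steps(38, 8)
print(steps)         # [(38, 8), (8, 6), (6, 2), (2, 0)]
print(steps[-1][0])  # 2, the greatest common divisor
```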

2.1 Extended Euclidean algorithm

Let $f(x)$ be the irreducible polynomial of degree $m$ that defines $GF(2^m)$, and let $a(x)$ be the polynomial representation of a nonzero field element in this basis. Since $f(x)$ is irreducible and $\deg a(x) < \deg f(x)$, $a(x)$ and $f(x)$ are relatively prime. If we initiate the Euclidean algorithm with $f(x)$ and $a(x)$, then the extended algorithm generates two polynomials, $s(x)$ and $t(x)$, with $\deg s(x) < m$ and $\deg t(x) < m - 1$. These polynomials satisfy [12]:

$$a(x)s(x) + f(x)t(x) = GCD(a(x), f(x)). \qquad (2.2)$$

Since $f(x)$ is irreducible, we have $GCD(a(x), f(x)) = 1$, and hence $a(x)s(x) + f(x)t(x) = 1$. In $GF(2^m)$ all results are reduced modulo $f(x)$, so $f(x)t(x) \equiv 0$ and the identity simplifies to $a(x)s(x) \equiv 1 \pmod{f(x)}$. The inverse element $a^{-1}(x)$ therefore has the polynomial representation $s(x)$, and we can use the EEA for inversion in $GF(2^m)$ using a polynomial basis.

Algorithms 2.1 and 2.2 show the EEA algorithm.

Algorithm 2.1 Binary Polynomial Division (PolyDivide) [13]

Input: Polynomial a(x) of degree m − 1 & f(x) of degree m.
Output: r & q.
 1: a ← a(x), f ← f(x), r ← 1, q ← 1
 2: while (f_deg ≥ a_deg) do
 3:     a ← a << (f_deg − a_deg)
 4:     r ← a ⊕ f
 5:     if (r_deg ≥ a_deg) then
 6:         q ← (q << (f_deg − r_deg)) + 1
 7:     else
 8:         q ← (q << (f_deg − a_deg))
 9:     end if
10:     f ← r
11: end while

Algorithm 2.2 Extended Euclidean Algorithm (EEA) [13]

Input: Polynomial a(x) of degree m − 1 & f(x) of degree m.
Output: a(x)^{-1} mod f(x).
a ← a(x), f ← f(x), t ← 0, s ← 1
while r ≠ 0 (gcd ≠ 1) do
    Perform PolyDivide to find r & q (f = a × q + r)
    f ← a, a ← r
    t ← s, s ← t − q × s
end while

Algorithm 2.1 is the binary division algorithm. The loop from line 2 to line 11 computes the remainder and the quotient when $f(x)$ is divided by $a(x)$. Here $d_a$ denotes the degree of the initial value of $a(x)$. In line 3, we first left shift $a(x)$ by $f_{deg} - a_{deg}$ and use the shifted value to compute $r(x) = a(x) \oplus f(x)$ in line 4. From line 5 to line 9, we compare the degree of $r(x)$ with $d_a$. If the degree of $r(x)$ is greater than or equal to $d_a$, we left shift $q(x)$ by $f_{deg} - r_{deg}$ and set the last bit of $q(x)$ to 1. If the degree of $r(x)$ is smaller than $d_a$, we left shift $q(x)$ by $f_{deg} - a_{deg}$. We then assign the value of $r(x)$ to $f(x)$ in line 10. This process is repeated until the degree of $f(x)$ is smaller than $d_a$.

For instance, let $f(x) = 1111010000$ and $a(x) = 11011$. Let $d_a$ be the degree of the initial value of $a(x)$, which is 4, $d_f$ the degree of $f(x)$, $d_r$ the degree of $r(x)$, $d_{fa} = d_f - d_a$, $d_{fr} = d_f - d_r$, and $q(x)$ the quotient. From Table 2.1, the initial value of $d_{fa}$ is 5, so we left shift $a(x)$ by 5 and the new value of $a(x)$ is 1101100000. The value of $r(x)$ is computed as $a(x) \oplus f(x)$. Since $d_r$ is greater than $d_a$ and $d_{fr}$ is 2, we left shift $q(x)$ by 2 and set the last bit to 1. After the third iteration, $d_r$ is 2, which is smaller than $d_a = 4$, and the quotient is 101100 with remainder 100.

Table 2.1: An example of binary polynomial division

Iteration  a(x)        r(x)      dfa  dfr  q(x)
0          11011       1         5    -    1
1          1101100000  10110000  3    2    101
2          11011000    1101000   2    1    1011
3          1101100     100       -    -    101100
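The division in Table 2.1 can be cross-checked with a short Python sketch. It is not a transcription of Algorithm 2.1: it uses the textbook shift-and-XOR long division (the quotient starts at 0 and one bit is set per step), which is assumed here to yield the same quotient and remainder. Polynomials are stored as integer bit masks, and the name `gf2_divmod` is hypothetical.

```python
def gf2_divmod(f: int, a: int) -> tuple:
    """Divide the binary polynomial f by a; return (quotient, remainder)."""
    if a == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    da = a.bit_length() - 1  # degree of the divisor
    q, r = 0, f
    # While the running remainder has degree >= da, cancel its leading term.
    while r.bit_length() - 1 >= da:
        shift = (r.bit_length() - 1) - da
        q ^= 1 << shift  # record the quotient term x^shift
        r ^= a << shift  # subtract (XOR) the aligned divisor
    return q, r


q, r = gf2_divmod(0b1111010000, 0b11011)
print(bin(q), bin(r))  # 0b101100 0b100, matching the last row of Table 2.1
```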

Algorithm 2.2 uses the remainder and the quotient produced by Algorithm 2.1 to iteratively compute the coefficient $s$, which is the inverse of $a$. The computational complexity of the EEA is $O(m^2)$ [10].

The proof of Algorithm 2.2 relies on the fact that for two nonzero polynomials $a$ and $f$, the EEA produces a pair of polynomials $(s, t)$ such that:

$$ft + as = GCD(f, a). \qquad (2.3)$$

If we replace $f$ and $a$ by $a$ and $(f \bmod a)$, and let $t'$ and $s'$ denote the previous values of $t$ and $s$, we have:

$$at' + (f \bmod a)s' = GCD(a, f \bmod a) \qquad (2.4)$$

From Equation 2.1, it follows that:

$$GCD(f, a) = GCD(a, f \bmod a). \qquad (2.5)$$

By Equation 2.5, the right-hand sides of Equations 2.3 and 2.4 are equal, so we can write:

$$ft + as = at' + (f \bmod a)s' \qquad (2.6)$$

Since the Euclidean division of $f$ by $a$ may be written $f = aq + r$ with $r = f \bmod a = f - aq$, we rearrange Equation 2.6 as:

$$\begin{aligned} ft + as &= at' + (f \bmod a)s' \\ &= at' + (f - aq)s' \\ &= fs' + a(t' - qs') \end{aligned}$$

Hence,

$$t = s', \qquad s = t' - qs' \qquad (2.7)$$

In this recursive relation, the new value of $s$, which is the output of Algorithm 2.2, can be computed directly from the current value $s'$ and the previous value $t'$ using the formula $s = t' - qs'$. Over $GF(2)$, subtraction coincides with addition, so the update is computed as $t' \oplus (q \times s')$. After iteratively computing $s$ using the quotient obtained from Algorithm 2.1, we obtain the value of $s$, which is the inverse of $a$.
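The update rule of Equation 2.7 can be exercised with the following Python sketch. It is an illustration under stated assumptions rather than the report's MATLAB code: polynomials are integer bit masks, the division is the textbook shift-and-XOR routine, the loop stops as soon as the remainder equals 1, and the names `gf2_divmod`, `clmul` and `gf2_inverse_eea` are hypothetical.

```python
def gf2_divmod(f: int, a: int) -> tuple:
    """Shift-and-XOR long division of binary polynomials: returns (quotient, remainder)."""
    da = a.bit_length() - 1
    q, r = 0, f
    while r.bit_length() - 1 >= da:
        shift = (r.bit_length() - 1) - da
        q ^= 1 << shift
        r ^= a << shift
    return q, r


def clmul(a: int, b: int) -> int:
    """Carry-less product of two binary polynomials (no modular reduction)."""
    product = 0
    for k in range(b.bit_length()):
        if (b >> k) & 1:
            product ^= a << k
    return product


def gf2_inverse_eea(a: int, f: int) -> int:
    """Inverse of a modulo f, assuming 0 < deg(a) + 1 <= deg(f) and gcd(a, f) = 1."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    r_prev, r_cur = f, a  # remainder sequence, starts as (f, a)
    t, s = 0, 1           # coefficients, initialised as in Algorithm 2.2
    while r_cur != 1:
        q, r = gf2_divmod(r_prev, r_cur)
        if r == 0:
            raise ValueError("a and f are not relatively prime")
        r_prev, r_cur = r_cur, r
        t, s = s, t ^ clmul(q, s)  # Equation 2.7, with subtraction as XOR
    return s


print(bin(gf2_inverse_eea(0b10101, 0b100101)))     # 0b11010
print(hex(gf2_inverse_eea(0x10000E2, 0x2000009)))  # 0x54ed9e
```

The first call uses $a(x) = x^4 + x^2 + 1$ in $GF(2^5)$ with $f(x) = x^5 + x^2 + 1$; the second uses the operands 0x10000E2 and 0x2000009 from Table 2.2.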

2.2 An example of Extended Euclidean algorithm

Table 2.2 shows an example of how the EEA works, where $m = 25$, $f(x) = x^{25} + x^3 + 1$ = 0x2000009 and $a(x) = x^{24} + x^7 + x^6 + x^5 + x$ = 0x10000E2. The variables in this example are displayed in hexadecimal representation.

All variables are initialized as follows: $r = 1$, $q = 1$, $t = 0$, $s = 1$.

Each variable is computed as follows:

$f(i) = a(i-1)$

$a(i) = r(i-1)$

$r(i)$ and $q(i)$ are calculated by applying Algorithm 2.1 to the values of $f(i)$ and $a(i)$.

$t(i) = s(i-1)$

$s(i) = t(i-1) \oplus (q(i) \times s(i-1))$.

At the fifth iteration, $r$ is 1, and the value of $r$ in the next iteration would be 0. Thus the value of $s$ in this iteration is the inverse of $a(x) \bmod f(x)$.

As shown in Table 2.2, the multiplicative inverse of 0x10000E2 modulo 0x2000009 is 0x054ED9E.

Table 2.2: An example of EEA

It#  a        f        r    q      t      s
0    10000E2  2000009  1    1      0      1
1    1CD      10000E2  1CD  2      1      2
2    D0       1CD      D0   1B8CA  2      37195
3    6D       D0       6D   2      37195  6E328
4    A        6D       A    2      6E328  EB7C5
5    1        A        1    E      EB7C5  54ED9E

Chapter 3

Itoh-Tsujii algorithm

3.1 Inversion based on Fermat’s little theorem

The basic dividers based on Fermat’s little theorem are also known as multiplicative dividers because the division is performed by a sequence of squarings and multiplications.

The Itoh-Tsujii algorithm, which is based on Fermat’s little theorem, was originally proposed in [14] using the normal basis representation. Since its publication, however, several improvements and variations have been reported in [6-8], showing that it can also be used with other field representations such as the polynomial representation.

To compute the inverse using the normal basis representation, a basis conversion between polynomial and normal bases is needed at the beginning and end of the operation. The conversion from a polynomial basis to a normal basis is complicated and computationally expensive, which slows down inversion in normal bases [15].

Binary field multiplication using the normal basis representation is more complicated and more costly in time and implementation area than multiplication using polynomial bases [16]. Therefore, normal bases are competitive only when very few multiplications are required [17]. The Itoh-Tsujii algorithm computes the multiplicative inverse using a series of multiplications and squarings. Although squaring in normal bases is computed by a cyclic shift of the binary representation [18], the higher computational complexity of multiplication reduces the overall efficiency.

Therefore, this project implements the Itoh-Tsujii algorithm using a polynomial basis and compares its performance with that of the EEA in the same basis.

Let $p$ be a prime and let $a$ be an integer satisfying $GCD(a, p) = 1$. Then:

$$a^{p-1} \equiv 1 \pmod p \qquad (3.1)$$

or

$$a \times a^{p-2} \equiv 1 \pmod p \qquad (3.2)$$

Hence the inverse of any nonzero element $a$ of $GF(p)$ is $a^{p-2}$.

For example, in $GF(5)$, let $a = 3$. Then the inverse of 3 over $GF(5)$ is $3^{5-2} = 27 \equiv 2 \pmod 5$, and indeed $2 \times 3 = 6 \equiv 1 \pmod 5$.

Extending this technique to $GF(2^m)$, every nonzero element $a$ satisfies $a^{2^m-1} = a \times a^{2^m-2} = 1$.

Hence:

$$a^{-1} = a^{2^m-2}. \qquad (3.3)$$
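Equations 3.2 and 3.3 can be checked numerically. The Python sketch below is illustrative only: it evaluates $a^{2^m-2}$ with plain square-and-multiply exponentiation (not yet the Itoh-Tsujii addition chain), stores polynomials as integer bit masks, and uses the hypothetical names `gf2_mul` and `gf2_inverse_fermat`.

```python
def gf2_mul(a: int, b: int, f: int) -> int:
    """Multiply a and b in GF(2^m) defined by the irreducible polynomial f."""
    m = f.bit_length() - 1
    product = 0
    for k in range(b.bit_length()):
        if (b >> k) & 1:
            product ^= a << k
    while product.bit_length() > m:
        product ^= f << (product.bit_length() - 1 - m)
    return product


def gf2_inverse_fermat(a: int, f: int) -> int:
    """Compute a^(2^m - 2) in GF(2^m) by binary square-and-multiply (Equation 3.3)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    m = f.bit_length() - 1
    exponent = (1 << m) - 2
    result, base = 1, a
    while exponent:
        if exponent & 1:
            result = gf2_mul(result, base, f)
        base = gf2_mul(base, base, f)  # squaring step
        exponent >>= 1
    return result


# Prime-field example from the text: the inverse of 3 in GF(5) is 3^(5-2) mod 5.
print(pow(3, 5 - 2, 5))  # 2

# Binary-field example: a(x) = x^4 + x^2 + 1 in GF(2^5), f(x) = x^5 + x^2 + 1.
print(bin(gf2_inverse_fermat(0b10101, 0b100101)))  # 0b11010
```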

3.2 Itoh-Tsujii algorithm

The Itoh-Tsujii algorithm is based on Fermat’s little theorem, by which the inverse of an element $a \in GF(2^m)$ is computed as $a^{-1} = a^{2^m-2}$.

A straightforward implementation of Equation 3.3 requires $m-2$ multiplications and $m-1$ squarings. The Itoh-Tsujii algorithm reduces the number of multiplications to $\lfloor \log_2(m-1) \rfloor + HW(m-1) - 1$, where $HW(m-1)$ is the Hamming weight of the binary representation of $m-1$, while the number of required squarings remains $m-1$ [10]. This remarkable saving in the number of multiplications is based on the observation that the inverse can be rewritten, following [19], as:

$$a^{-1} = [S_{m-1}(a)]^2 \qquad (3.4)$$

where $S_k = a^{2^k-1} \in GF(2^m)$ and $k \in \mathbb{N}$. Let $k, j$ be two positive integers. Then the element $S_{k+j} \in GF(2^m)$ can be expressed as:

$$S_{k+j} = (S_k)^{2^j} \cdot S_j = (S_j)^{2^k} \cdot S_k \qquad (3.5)$$

In the Itoh-Tsujii algorithm, an addition chain is used to reduce the number of multiplications required and to perform this field exponentiation more efficiently. An addition chain for an integer such as $m-1$ is a sequence of positive integers with $t$ elements, $C = \{c_1, c_2, \cdots, c_t\}$. Algorithm 3.1 and the flowchart in Figure 3.1 show how to compute the addition chain. Given $f(x)$ of degree $m$, we have $c_1 = 1$ and $c_t = m-1$. If $c_i$ is even, $c_{i-1} = c_i/2$, and if $c_i$ is odd, $c_{i-1} = c_i - 1$.

Hence, to compute $a^{-1}$, we use Equation 3.3 and an addition chain constructed using Algorithm 3.1 to obtain $S_{m-1}(a) = a^{2^{m-1}-1}$.

The Itoh-Tsujii algorithm is given in Algorithm 3.2, and its flowchart is shown in Figure 3.1. By Equation 3.4, we can compute the inverse of $a$ by squaring $S_{m-1}(a)$. Therefore, Algorithm 3.2 iteratively computes the $S_i$ coefficients in the order stipulated by the addition chain. In the final iteration, after having computed the coefficient $S_t = a^{2^{m-1}-1}$, the algorithm returns the required multiplicative inverse by performing a regular field squaring, namely $S_t^2 = a^{2^m-2} = a^{-1}$.

It has been shown that the maximum number of multiplications in this method is $t$ and the required number of squarings is $m-1$, where $t$ is the step length of the addition chain for $m-1$ [9].

The advantage of Fermat’s little theorem based inversion algorithm is that it can be

implemented just by using multiplication and squaring. This eliminates the need to

add any extra components, such as dividers.

Algorithm 3.1 Finding the addition chain

Input: Polynomial a(x) of degree m − 1 & f(x) of degree m.
Output: An addition chain C of length t.
i ← 1, C(i) ← m − 1
while C(i) > 1 do
    if C(i) mod 2 == 0 then
        C(i + 1) ← C(i)/2
    else
        C(i + 1) ← C(i) − 1
    end if
    i ← i + 1
end while

Algorithm 3.2 Itoh-Tsujii Algorithm [19]

Input: Polynomial a(x) of degree m − 1 & f(x) of degree m.
Output: a(x)^{-1} mod f(x).
S_0 ← a(x)
for i from 1 to t do
    S_i = (S_{i_1})^{2^{C_{i_2}}} × S_{i_2}
end for
a^{-1}(x) ← (S_t)^2

Figure 3.1: Flowchart of Itoh-Tsujii Algorithm

3.3 An example of Itoh-Tsujii algorithm

The inversion operation in $GF(2^{233})$ is illustrated with an example.

To calculate $a^{-1}$ in $GF(2^{233})$ with $m = 233$, we use the addition chain

$$C = \{C(t), \cdots, C(2), C(1)\}$$

with $t$ elements.

From Algorithm 3.1, we have $C(1) = m - 1 = 232$. Since $C(1) = 232$ is an even number, $C(2) = C(1)/2 = 116$. If $C(i)$ is odd, $C(i+1)$ follows the rule $C(i+1) = C(i) - 1$.

Therefore, we obtain the addition chain with length $t = 10$:

$$C = \{1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232\}.$$
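The chain construction of Algorithm 3.1 is short enough to sketch directly. The Python function below is illustrative (the name `addition_chain` is hypothetical) and assumes $m \geq 2$; it builds the chain downwards from $m-1$ and returns it in ascending order.

```python
def addition_chain(m: int) -> list:
    """Build the halve-or-decrement addition chain for m - 1, in ascending order."""
    chain = [m - 1]
    while chain[-1] > 1:
        c = chain[-1]
        # Even: halve. Odd: subtract one.
        chain.append(c // 2 if c % 2 == 0 else c - 1)
    return chain[::-1]


chain = addition_chain(233)
print(chain)           # [1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232]
print(len(chain) - 1)  # 10 steps after the initial element 1
```

For $m = 233$ the 10 steps agree with $\lfloor \log_2 232 \rfloor + HW(232) - 1 = 7 + 4 - 1 = 10$.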

Table 3.1: Inverse of $a \in GF(2^{233})$ using an addition chain [1]

Step  S_{V_i}(a)  S_{V_j+U_k}(a)  Exponentiation
1     S_1(a)      -               a
2     S_2(a)      S_{1+1}(a)      (S_1)^{2^1} S_1 = a^{2^2-1}
3     S_3(a)      S_{2+1}(a)      (S_2)^{2^1} S_1 = a^{2^3-1}
4     S_6(a)      S_{3+3}(a)      (S_3)^{2^3} S_3 = a^{2^6-1}
5     S_7(a)      S_{6+1}(a)      (S_6)^{2^1} S_1 = a^{2^7-1}
6     S_14(a)     S_{7+7}(a)      (S_7)^{2^7} S_7 = a^{2^14-1}
7     S_28(a)     S_{14+14}(a)    (S_14)^{2^14} S_14 = a^{2^28-1}
8     S_29(a)     S_{28+1}(a)     (S_28)^{2^1} S_1 = a^{2^29-1}
9     S_58(a)     S_{29+29}(a)    (S_29)^{2^29} S_29 = a^{2^58-1}
10    S_116(a)    S_{58+58}(a)    (S_58)^{2^58} S_58 = a^{2^116-1}
11    S_232(a)    S_{116+116}(a)  (S_116)^{2^116} S_116 = a^{2^232-1}

The computational process is illustrated in Table 3.1, where the $V_i$ are the integers in the addition chain, $V_j = V_{i-1}$, and $V_i = V_j + U_k$.

From Equations 3.4 and 3.5, we have:

$$S_{V_j+U_k} = (S_{V_j})^{2^{U_k}} \cdot S_{U_k}, \quad \text{where } S_{V_i} = a^{2^{V_i}-1}.$$

Thus, we can rewrite $S_{V_i}(a)$ as:

$$S_{V_i}(a) = S_{V_j+U_k} = S_{V_{i-1}+U_k} = (S_{V_j})^{2^{U_k}} S_{U_k} = a^{2^{V_i}-1}.$$

As shown in Table 3.1, we obtain the value of $S_{232}$, and the inverse of $a$ is $(S_{232})^2$.

Note that the Itoh-Tsujii algorithm for the field $GF(2^m)$ requires a large number of squarings ($m-1$ in total), which reduces its efficiency.
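Putting Equations 3.4 and 3.5 together with the halve-or-decrement chain gives the following Python sketch. It is a minimal illustration under stated assumptions, not the MATLAB code of Appendix A: elements are integer bit masks, squaring is done with the general multiplier rather than a dedicated squarer, $m \geq 2$, and the names `gf2_mul`, `gf2_sqr_n` and `itoh_tsujii_inverse` are hypothetical.

```python
def gf2_mul(a: int, b: int, f: int) -> int:
    """Multiply a and b in GF(2^m) defined by the irreducible polynomial f."""
    m = f.bit_length() - 1
    product = 0
    for k in range(b.bit_length()):
        if (b >> k) & 1:
            product ^= a << k
    while product.bit_length() > m:
        product ^= f << (product.bit_length() - 1 - m)
    return product


def gf2_sqr_n(a: int, n: int, f: int) -> int:
    """Square a repeatedly n times, i.e. compute a^(2^n)."""
    for _ in range(n):
        a = gf2_mul(a, a, f)
    return a


def itoh_tsujii_inverse(a: int, f: int) -> int:
    """Inverse of a nonzero element a, computed as [S_{m-1}(a)]^2 (Equation 3.4)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    m = f.bit_length() - 1
    # Addition chain for m - 1 (halve if even, decrement if odd), in ascending order.
    chain = [m - 1]
    while chain[-1] > 1:
        c = chain[-1]
        chain.append(c // 2 if c % 2 == 0 else c - 1)
    chain.reverse()
    s, k = a, 1  # s holds S_k = a^(2^k - 1); S_1 = a
    for c in chain[1:]:
        if c == 2 * k:
            s = gf2_mul(gf2_sqr_n(s, k, f), s, f)  # S_2k = (S_k)^(2^k) * S_k
        else:
            s = gf2_mul(gf2_sqr_n(s, 1, f), a, f)  # S_(k+1) = (S_k)^2 * S_1
        k = c
    return gf2_mul(s, s, f)  # final squaring


print(bin(itoh_tsujii_inverse(0b10101, 0b100101)))  # 0b11010
```

Each chain step costs one multiplication, while the exponents passed to `gf2_sqr_n` add up to $m-2$ squarings, plus one final squaring.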

Chapter 4

MATLAB Implementation

4.1 MATLAB results

Inverters based on the EEA and Itoh-Tsujii algorithms are implemented in MATLAB as shown in Appendix A. In MATLAB, a polynomial is represented as a vector of its coefficients. For instance, to calculate the inverse of $a(x) = x^4 + x^2 + 1$ with the irreducible polynomial $f(x) = x^5 + x^2 + 1$ in $GF(2^5)$, we input a(x)=[(MSB) 1 0 1 0 1 (LSB)] and f(x)=[(MSB) 1 0 0 1 0 1 (LSB)]. After running the MATLAB code, the results of the EEA and Itoh-Tsujii algorithms are both [(MSB) 1 1 0 1 0 (LSB)], which means that the inverse $a^{-1}(x)$ of $a(x)$ is $x^4 + x^3 + x$.

To compare the performance of inverters based on the EEA and Itoh-Tsujii algorithms, the timeit function and the stopwatch timer functions tic and toc are used to measure how long the MATLAB code of each algorithm takes to run.

Tables 4.1 and 4.2 present the MATLAB execution times of the EEA and Itoh-Tsujii algorithms on two different processors. The platform used for Table 4.1 is a Dell Optiplex 9020 computer with a 4th generation Intel Core-i7-4790 3.6 GHz quad-core processor and 16 GB of RAM. The platform used for Table 4.2 is a Sony SVF142171SCW computer with a 3rd generation Intel Core-i5-3337 1.8 GHz dual-core processor and 8 GB of RAM. The results show that both algorithms run faster on the quad-core processor than on the dual-core processor.

As the key length increases, the execution time of the EEA increases only moderately, from 2 ms at $m = 7$ to 18 ms at $m = 31$ on the quad-core processor. The performance of the EEA based inverters implemented in MATLAB is therefore considered promising.

The execution time of the Itoh-Tsujii algorithm also increases as the key size $m$ increases. For $m$ smaller than 23, the Itoh-Tsujii algorithm performs efficiently. However, when the key size becomes greater than 23, its execution time grows rapidly (1007 ms at $m = 31$ on the quad-core processor), and it is much less efficient than the EEA.

Table 4.1: Execution time of EEA and Itoh-Tsujii Algorithms on a quad-core processor

m                      7    20   23   25   27   31
EEA/Time (ms)          2    11   14   15   16   18
Itoh-Tsujii/Time (ms)  3.2  28   70   160  356  1007

Table 4.2: Execution time of EEA and Itoh-Tsujii Algorithms on a dual-core processor

m                      7    20   23   25   27   31
EEA/Time (ms)          3    25   29   33   34   35
Itoh-Tsujii/Time (ms)  12   49   118  375  845  2345

4.2 Analysis and comparison

For an efficient implementation of ECC, it is very important to carry out finite field operations quickly and with few resources. The inversion operation consumes most of the time and resources. Therefore, the speed of inversion has a great impact on the computation time of ECC. Both the EEA and the Itoh-Tsujii algorithm have been effective in achieving fast inversion.

The Extended Euclidean Algorithm finds the inverse in binary fields using repeated binary polynomial division operations. Since performing the division directly is time-consuming, the EEA implements each division with shifts and subtractions, which can be implemented efficiently.

The Itoh-Tsujii algorithm performs inversion by a sequence of multiplications and squarings. To reduce the number of multiplications, an addition chain can be used to carry out the computation of the multiplicative inverse. With the addition chain, the Itoh-Tsujii algorithm computes the inverse in less time using a recursive rearrangement of finite field operations.

In terms of speed, EEA based inverters provide an efficient way to compute the inverse in the binary field, since they mainly use shifts and subtractions. The Itoh-Tsujii algorithm has a higher computational cost than the EEA, since it requires many multiplication and squaring operations to compute the inverse. Therefore, EEA based inverters require less computational work in polynomial bases. The results show that both are efficient for key sizes smaller than 23, but when the key size becomes greater than 23, EEA based inverters are much faster.

Chapter 5

Conclusion

Finite field arithmetic is used in a variety of applications, including coding theory and cryptography. Compared to other arithmetic operations in finite fields, inversion is the most time-consuming operation. Efficient implementation of inversion is therefore a challenging problem. In general, the most common methods to compute the inverse are based on the Itoh-Tsujii algorithm and the EEA.

The Euclidean Algorithm is a procedure for finding the greatest common divisor of any two positive integers. The EEA is an extension of the Euclidean Algorithm that computes the greatest common divisor and finds the multiplicative inverse using repeated division operations.

The Itoh-Tsujii Algorithm is based on Fermat’s little theorem. This algorithm performs the inversion by a series of multiplications and squarings. To reduce the number of multiplications, the Itoh-Tsujii Algorithm uses an addition chain to perform the inversion more efficiently.

Other schemes have also been proposed to perform inversion in finite fields, such as inverters based on the Wiener-Hopf equation. Morii [20] proved that solving the discrete time Wiener-Hopf equation is equivalent to performing division over finite fields. The hardware efficiency of these inverters is not comparable with that of Itoh-Tsujii and Extended Euclidean based inverters.

This report provides a literature review of inverters based on the EEA and the Itoh-Tsujii Algorithm. These two common classes of inverters, which are widely used for cryptographic purposes, are illustrated with examples. The MATLAB implementation of the EEA and the Itoh-Tsujii Algorithm has been presented in this report. The execution times in MATLAB show that the EEA is more efficient than the Itoh-Tsujii algorithm in polynomial bases.

For future work, the optimization of inverters based on the Itoh-Tsujii Algorithm for large key sizes might be a reasonable starting point. Using a parallel version of the Itoh-Tsujii algorithm or an optimal addition chain might be useful to speed up performance.

Appendix A

function [q,r]=func_divide(A,F)
% F is divided by A
% obtain the quotient q and the remainder r
q=1;
r=1;
da=func_poly_degree(A);
df=func_poly_degree(F);
if da>df
    % divisor has the higher degree: quotient is 0, remainder is F
    q=0;
    r=F;
elseif da==0
    % divisor is a constant: quotient is F, remainder is 0
    q=F;
    r=0;
else
    while (df>=da)
        if r==0
            break
        end
        % Calculate the remainder
        dfa=df-da;
        B=[A,zeros(1,dfa)]; % left shift A by amount of dfa
        r=xor(B,F);
        r=deletezeros(r);
        % Calculate the quotient
        dfr=func_poly_degree(F)-func_poly_degree(r);
        if(func_poly_degree(r)>=da)
            q=[q,zeros(1,dfr)];
            q(end)=1;
        else
            q=[q,zeros(1,dfa)];
        end
        F=r;
        df=func_poly_degree(F);
    end
end
end

function inv_a=func_EEA(A,F)
% Extended Euclidean algorithm
% Algorithm finds the inverse of an element A in F_{2^m}.
% F is the irreducible polynomial
FF=F; % keep the original modulus for the final check
AA=A; % keep the original operand for the final check
g1=0;
g2=1;
r=1;
C=F;
while 1
    [q,r]=func_divide(A,F); % F = A*q + r
    if r==0
        break
    end
    % update the coefficient: g2 <- g1 xor (q * g2)
    g3=g1;
    g1=g2;
    B=func_poly_mult(g2,q);
    delta=func_poly_degree(B)-func_poly_degree(g3);
    g3=[zeros(1,delta),g3]; % pad g3 with leading zeros to the length of B
    g2=xor(B,g3);
    F=A;
    A=r;
end
inv_a=g2; % The inverse of A
% test if A multiplied by the inverse of A equals 1
mul=func_poly_mult(inv_a,AA);
[q,r]=func_divide(FF,mul); % mod F
if r==1
    fprintf('\n inv_a*a=1, the answer is correct\n')
end
end

function inv_a=main_inv(A,F)
% Itoh-Tsujii Algorithm
% Finds the inverse of an element A in F_{2^m}; F has degree m.
m=length(F)-1;
i=1;
b=cell(1,m); % b{k} holds A^(2^k-1)
% Generate addition chain c(i), from m-1 down to 1
c(i)=m-1;
while c(i)>1
    if (mod(c(i),2)==0)
        c(i+1)=c(i)/2;
    else
        c(i+1)=c(i)-1;
    end
    i=i+1;
end
b{c(i)}=A; % c(i)=1 here, so b{1}=A
while i>1
    l=c(i-1)-c(i);
    p=func_square(b{c(i)},l,F); % b{c(i)}^(2^l), i.e. l successive squarings
    b{c(i-1)}=func_poly_mult(p,b{l},m,F);
    i=i-1;
end
% inverse of A is b{m-1} squared
inv_a=func_poly_mult(b{c(i)},b{c(i)},m,F);
% test if A multiplied by the inverse of A (inv_a) equals 1
p=func_poly_mult(inv_a,A,m,F); % p=a*a^(-1)
flag=find(p~=0,1); % index of the first nonzero coefficient
p=p(flag:end); % remove leading zeros
if(p==1)
    fprintf('\n inv_a*a=1, the answer is correct\n')
end
end
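Both inverters in this appendix end with a self-test that multiplies the result by A and reduces the product modulo F. The helper routines they call (func_poly_mult, func_poly_degree, deletezeros, func_square) are not reproduced here, so the following is a minimal sketch of the same check using only the built-in conv and deconv. The field GF(2^4) with F(x) = x^4 + x + 1, the element A(x) = x, and the candidate inverse x^3 + 1 are assumed example values chosen for illustration.

```matlab
% Sketch: check a candidate inverse in GF(2^4) with built-in functions only.
% Polynomials are row vectors of bits, highest-degree coefficient first.
% F, A and inv_a are assumed example values, not taken from the report.
F     = [1 0 0 1 1];   % x^4 + x + 1
A     = [1 0];         % x
inv_a = [1 0 0 1];     % x^3 + 1, candidate inverse of A

prod_bits = mod(conv(A, inv_a), 2);   % polynomial product over GF(2)
[~, r]    = deconv(prod_bits, F);     % integer division by the monic F
r         = mod(r, 2);                % reduce remainder coefficients mod 2
r         = r(find(r, 1):end);        % strip leading zeros

if isequal(r, 1)
    fprintf('inv_a*A mod F = 1, the inverse is correct\n');
end
```

This works because F is monic, so the integer quotient and remainder returned by deconv reduce consistently modulo 2. Since deconv uses floating-point arithmetic, the approach is only suitable as a cross-check for small degrees, not as a replacement for the bitwise routines above.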


Bibliography

[1] A. A. Zadeh, “Division and inversion over finite fields,” in Cryptography and Security in Computing. InTech, 2012.

[2] D. Hankerson, A. J. Menezes, and S. Vanstone, Guide to elliptic curve cryptography. Springer Science & Business Media, 2006.

[3] I. H. Hazmi, F. Zhou, F. Gebali, and T. F. Al-Somani, “Review of elliptic curve processor architectures,” in Communications, Computers and Signal Processing (PACRIM), 2015 IEEE Pacific Rim Conference on. IEEE, 2015, pp. 192–200.

[4] A. K. Daneshbeh and M. A. Hasan, “A class of unidirectional bit serial systolic architectures for multiplicative inversion and division over GF(2^m),” IEEE Transactions on Computers, vol. 54, no. 3, pp. 370–380, 2005.

[5] Z. Yan, D. V. Sarwate, and Z. Liu, “High-speed systolic architectures for finite field inversion,” Integration, the VLSI Journal, vol. 38, no. 3, pp. 383–398, 2005.

[6] A. P. Fournaris and O. Koufopavlou, “Applying systolic multiplication–inversion architectures based on modified Extended Euclidean algorithm for GF(2^k) in elliptic curve cryptography,” Computers and Electrical Engineering, vol. 33, no. 5, pp. 333–348, 2007.

[7] C. Rebeiro, S. S. Roy, D. S. Reddy, and D. Mukhopadhyay, “Revisiting the Itoh-Tsujii inversion algorithm for FPGA platforms,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 8, pp. 1508–1512, 2011.

[8] L. Li and S. Li, “Fast inversion in GF(2^m) with polynomial basis using optimal addition chains,” in Circuits and Systems (ISCAS), 2017 IEEE International Symposium on. IEEE, 2017, pp. 1–4.


[9] F. Rodríguez-Henríquez, G. Morales-Luna, N. A. Saqib, and N. Cruz-Cortés, “Parallel Itoh–Tsujii multiplicative inversion algorithm for a special class of trinomials,” Designs, Codes and Cryptography, vol. 45, no. 1, pp. 19–37, 2007.

[10] B. Rashidi, R. R. Farashahi, and S. M. Sayedi, “High-performance and high-speed implementation of polynomial basis Itoh–Tsujii inversion algorithm over GF(2^m),” IET Information Security, vol. 11, no. 2, pp. 66–77, 2017.

[11] J. Vliegen, N. Mentens, J. Genoe, A. Braeken, S. Kubera, A. Touhafi, and I. Verbauwhede, “A compact FPGA-based architecture for elliptic curve cryptography over prime fields,” in Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on. IEEE, 2010, pp. 313–316.

[12] M. Olofsson, VLSI Aspects on Inversion in finite fields. Department of Electrical Engineering, Linköpings universitet, 2002.

[13] I. H. Hazmi, “Project: EEA-based polynomial inversion over GF(2^m): FPGA design and implementation,” ECE, University of Victoria, Tech. Rep., 2015.

[14] T. Itoh and S. Tsujii, “A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases,” Information and Computation, vol. 78, no. 3, pp. 171–177, 1988.

[15] A. Ibrahim, F. Gebali, and T. F. Al-Somani, “Systolic array architectures for Sunar–Koç optimal normal basis type II multiplier,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 10, pp. 2090–2102, 2015.

[16] F. Gebali and T. Al-Somani, “Finite field multiplication using reordered normal basis multiplier,” in Broadband and Wireless Computing, Communication and Applications (BWCCA), 2011 International Conference on. IEEE, 2011, pp. 320–326.

[17] D. J. Bernstein and T. Lange, “Type-II optimal polynomial bases,” in International Workshop on the Arithmetic of Finite Fields. Springer, 2010, pp. 41–61.

[18] B. Sunar and Ç. K. Koç, “An efficient optimal normal basis type II multiplier,” IEEE Transactions on Computers, vol. 50, no. 1, pp. 83–87, 2001.


[19] F. Rodríguez-Henríquez, N. Cruz-Cortés, and N. Saqib, “A fast implementation of multiplicative inversion over GF(2^m),” in Information Technology: Coding and Computing, 2005. ITCC 2005. International Conference on, vol. 1. IEEE, 2005, pp. 574–579.

[20] M. Morii, M. Kasahara, and D. L. Whiting, “Efficient bit-serial multiplication and the discrete-time Wiener-Hopf equation over finite fields,” IEEE Transactions on Information Theory, vol. 35, no. 6, pp. 1177–1183, 1989.
