The Advanced Encryption Standard on a Reconfigurable Computer

By

Pradeep Kancharla

Bachelor of Engineering, Osmania University, 2001

-------------------------------------------------------------------

Submitted in Partial Fulfillment of the

requirements for the Degree of Master of Science

in the Department of Computer Science and Engineering

University of South Carolina

2003

____________________________    ____________________________
Department of Computer Science  Department of Computer Science
and Engineering                 and Engineering
Director of Thesis              First Reader

____________________________    ____________________________
Department of Computer Science  Dean of the Graduate School
and Engineering
Second Reader

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my advisor, Dr. Duncan A. Buell, for his untiring guidance and encouragement, which made this thesis possible. I would like to thank my research group, the Reconfig, for their support during the preparation of the thesis. Last but not least, I wish to express my deepest appreciation and gratitude to my parents and sister in India for all their love and unfailing support throughout these years.

Table of Contents

1. The Advanced Encryption Standard
2. The HC 36m – A Reconfigurable Computer
3. VHDL Implementation
4. Viva Implementation
5. Results and Conclusions
6. References
7. Appendix A: C code used for testing
8. Appendix B: VHDL Implementation

List of Figures

1. Example of 128 bit State
2. Pseudo-C Code for Encryption
3. Pseudo-C Code for Decryption
4. Affine transformation in ByteSub
5. Polynomial multiplication using Matrices
6. Pseudo-C Implementation of Key Schedule
7. Quad Structure
8. Architecture of HC 36m
9. Corelib
10. Snapshot of Viva
11. Input from a file to an input horn
12. Format of a file input
13. Lookup table
14. Mux with pathnames given manually
15. Mux with pathnames pointing to a pointer
16. Setting a file pointer to a specific path
17. Design of a round in Encryption
18. Design of a round in Decryption
19. Design of a round in Encryption
20. Multiplication by x of ‘02’
21. cmmix object
22. Design of a round in Encryption
23. Design of a round in Decryption

List of Tables

1. Number of rounds
2. Offsets of rows
3. Par Report for VHDL implementation
4. Results of rounds and stages in iterative approach
5. Results of rounds and stages in Non-iterative approach
6. Architectures implemented on HC 36m
7. Results of architectures in Viva 2.2 and Viva 2.3
8. Throughput for different architectures
9. Other Implementations on Virtex family chips

Chapter 1: The Advanced Encryption Standard

Introduction:

In a world where many transactions are done over networks, attacks on the security of data in transit have become a major concern. Cryptography is used as a tool to counter these attacks. With ever-expanding technology and the increase in speed of microprocessor chips, DES (the Data Encryption Standard) had, by the late 1990s, become obsolete.

In 1997, the United States National Institute of Standards and Technology (NIST) initiated the search for a new encryption standard, called the Advanced Encryption Standard, which was to replace DES as the Federal Information Processing Standard (FIPS). In October 2000, after an extensive search process, the block cipher algorithm Rijndael was selected as the new Advanced Encryption Standard. The algorithm was designed by Vincent Rijmen and Joan Daemen.

The Rijndael Algorithm:

Rijndael is an iterated block cipher with variable block and key lengths. The block length and the key length can each be any of 128, 192, or 256 bits. The block and each intermediate cipher result can be envisioned as a two-dimensional array of four rows called the State. The number of columns varies depending on the block length.

a1 a5 a9 a13

a2 a6 a10 a14

a3 a7 a11 a15

a4 a8 a12 a16

Fig 1: Example of 128 bit State

All the operations in Rijndael are performed either on bytes or on 4-byte words, where bytes represent elements in the finite field, or Galois field, GF(2^8). The 4-byte words are the columns of the State. The key is also viewed in the format above. The input to the cipher, also known as plaintext, is a one-dimensional array of 16, 24, or 32 bytes depending on the block size. These bytes are mapped into the State in column order.

For example, in the case of a 128-bit block, the bytes of the plaintext are filled into the cells in the order a1, a2, a3, a4, a5 … a16. The key is also filled into a two-dimensional array in the same manner.
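The column-order mapping can be sketched in C as follows. This is only an illustration: the function names load_state and store_state and the fixed maximum of eight columns are assumptions made here, not part of the thesis code.

```c
#include <stdint.h>

#define ROWS 4  /* the State always has four rows */

/* Fill the State column by column: input byte k goes to row k % 4,
   column k / 4. nb is the number of columns (4, 6, or 8). */
void load_state(const uint8_t *in, int nb, uint8_t state[ROWS][8])
{
    for (int k = 0; k < ROWS * nb; k++)
        state[k % ROWS][k / ROWS] = in[k];
}

/* Inverse mapping: read the State back out in the same column order. */
void store_state(uint8_t state[ROWS][8], int nb, uint8_t *out)
{
    for (int k = 0; k < ROWS * nb; k++)
        out[k] = state[k % ROWS][k / ROWS];
}
```

With a 128-bit block holding the bytes 1 to 16, the first column of the State receives 1, 2, 3, 4, which corresponds to a1 through a4 in Fig 1.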

Rijndael is an iterative algorithm. A different key, derived from the initial key, is used in each of the iterations, called rounds. The number of rounds depends on the key and block lengths. The following table gives the number of rounds to be performed based on the block length (BL) and key length (KL) in bits.

Number of rounds    BL = 128    BL = 192    BL = 256
KL = 128            10          12          14
KL = 192            12          12          14
KL = 256            14          14          14

Table 1: Number of Rounds
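Every entry of Table 1 follows one rule: the number of rounds is six more than the larger of the block length and key length, both counted in 32-bit words. A minimal sketch of that rule, with the hypothetical function name rijndael_rounds:

```c
/* Number of rounds for Rijndael, given block and key lengths in bits.
   nb and nk are the lengths in 32-bit words; the round count is
   max(nb, nk) + 6, which reproduces every entry of Table 1. */
int rijndael_rounds(int block_bits, int key_bits)
{
    int nb = block_bits / 32;
    int nk = key_bits / 32;
    return (nb > nk ? nb : nk) + 6;
}
```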

Each round except the final round consists of four different transformations. They are

ByteSub, ShiftRow, MixColumn and the Round Key Addition. The final

round does not contain MixColumn. The Round Key, which is used in the Round

Key Addition, is derived from the cipher key through a process called Key

Schedule. This can be done initially before the rounds or in parallel with the rounds.

The algorithm for Encryption and Decryption is given in pseudo-C code below. The

number of rounds in the code depends on the bit lengths of key and plaintext.

Key Schedule (Initial Cipher Key, Expanded Round Key);
Round Key Addition (State, Round Key);
for (I = 0; I < Number of Rounds; I++)
{
    ByteSub (State);
    ShiftRow (State);
    if (!Final Round) MixColumn (State);
    Round Key Addition (State, Round Key);
}

Fig 2: Pseudo-C code for Encryption
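For the 128-bit key and block case, the round structure of Fig 2 can be written as compilable C. The sketch below is not the thesis code. It assumes the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1, computes the ByteSub table at start-up instead of storing it, and uses hypothetical names such as aes128_encrypt.

```c
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (assumed). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (; b; b >>= 1) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
    }
    return p;
}

static uint8_t sbox[256];

/* ByteSub table: multiplicative inverse, then the affine map. */
static void build_sbox(void)
{
    for (int a = 0; a < 256; a++) {
        uint8_t x = 0; /* '00' maps to itself */
        for (int b = 1; a != 0 && b < 256; b++)
            if (gf_mul((uint8_t)a, (uint8_t)b) == 1) {
                x = (uint8_t)b;
                break;
            }
        uint8_t y = x;
        for (int n = 1; n <= 4; n++)
            y ^= (uint8_t)((x << n) | (x >> (8 - n)));
        sbox[a] = (uint8_t)(y ^ 0x63);
    }
}

/* Key Schedule for a 128-bit key: eleven 16-byte Round Keys. */
static void key_schedule(const uint8_t key[16], uint8_t rk[176])
{
    uint8_t rc = 1;
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4] = { rk[i - 4], rk[i - 3], rk[i - 2], rk[i - 1] };
        if (i % 16 == 0) { /* rotate, substitute, add round constant */
            uint8_t t0 = t[0];
            t[0] = (uint8_t)(sbox[t[1]] ^ rc);
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[t0];
            rc = gf_mul(rc, 2);
        }
        for (int j = 0; j < 4; j++)
            rk[i + j] = (uint8_t)(rk[i - 16 + j] ^ t[j]);
    }
}

static void round_key_addition(uint8_t s[16], const uint8_t *rk)
{
    for (int i = 0; i < 16; i++)
        s[i] ^= rk[i];
}

/* s[4*c + r] is row r, column c; row r moves left by r columns. */
static void shift_row(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[4 * c + r] = s[4 * ((c + r) % 4) + r];
    memcpy(s, t, 16);
}

static void mix_column(uint8_t s[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *a = s + 4 * c;
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        a[0] = (uint8_t)(gf_mul(2, a0) ^ gf_mul(3, a1) ^ a2 ^ a3);
        a[1] = (uint8_t)(a0 ^ gf_mul(2, a1) ^ gf_mul(3, a2) ^ a3);
        a[2] = (uint8_t)(a0 ^ a1 ^ gf_mul(2, a2) ^ gf_mul(3, a3));
        a[3] = (uint8_t)(gf_mul(3, a0) ^ a1 ^ a2 ^ gf_mul(2, a3));
    }
}

void aes128_encrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16])
{
    uint8_t rk[176], s[16];
    build_sbox();
    key_schedule(key, rk);
    memcpy(s, in, 16);
    round_key_addition(s, rk);
    for (int i = 1; i <= 10; i++) {
        for (int k = 0; k < 16; k++)
            s[k] = sbox[s[k]];          /* ByteSub */
        shift_row(s);
        if (i != 10)                    /* final round skips MixColumn */
            mix_column(s);
        round_key_addition(s, rk + 16 * i);
    }
    memcpy(out, s, 16);
}
```

The State is kept as a flat array in column order, so plaintext bytes are copied in directly. Rebuilding the ByteSub table on every call is wasteful and is done here only to keep the sketch self-contained.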

Key Schedule (Initial Cipher Key, Expanded Round Key);
for (I = 0; I < Number of Rounds; I++)
{
    Round Key Addition (State, Round Key);
    if (I != 0) InvMixColumn (State);
    InvByteSub (State);
    InvShiftRow (State);
}
Round Key Addition (State, Round Key);

Fig 3: Pseudo-C code for Decryption

The Key Schedule can be done either before the rounds or in parallel with the rounds. In the Key Schedule the initial key is expanded to a total length equal to the block length multiplied by one more than the number of rounds. This produces a different Round Key for each round, which is used in Round Key Addition. As Decryption is just the inverse of Encryption, our emphasis will be on Encryption, with a further explanation of the differences for Decryption wherever required.

ByteSub Transformation:

This transformation works independently on each of the cells of the State. The

transformation consists of two parts. First, the multiplicative inverse of the byte is

calculated, followed by an affine transformation. The affine transformation to be applied

is given below:

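\[
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
+
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
\]

Here x_0 through x_7 are the bits of the multiplicative inverse, least significant bit first, and y_0 through y_7 are the bits of the result.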

Fig 4: Affine transformation in ByteSub [4]

All the operations are done in GF(2^8). Since ‘00’ has no multiplicative inverse, it is mapped onto itself. In the case of Decryption, called InvByteSub, the inverse of the affine mapping above is applied, followed by taking the multiplicative inverse.

Since the bitwise operations in GF(2^8) are hard to implement in software, a different approach is used in the actual implementation.
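To make the two parts of ByteSub concrete, the sketch below computes the transformation directly instead of using a table. It assumes the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1 and finds the inverse by exhaustive search. The names gf_mul, gf_inv, and byte_sub are illustrative only.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse by exhaustive search; '00' maps to itself. */
uint8_t gf_inv(uint8_t a)
{
    if (a == 0)
        return 0;
    for (int b = 1; b < 256; b++)
        if (gf_mul(a, (uint8_t)b) == 1)
            return (uint8_t)b;
    return 0; /* not reached: every nonzero element has an inverse */
}

/* Rotate an 8-bit value left by n bits (0 < n < 8). */
static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* ByteSub on one byte: inverse in GF(2^8), then the affine map.
   The four rotations reproduce the rows of the affine matrix, and
   0x63 is the constant vector that is added. */
uint8_t byte_sub(uint8_t a)
{
    uint8_t x = gf_inv(a);
    return (uint8_t)(x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3)
                       ^ rotl8(x, 4) ^ 0x63);
}
```

Evaluating byte_sub for all 256 inputs yields the contents of the single lookup table that a table-driven implementation stores.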

ShiftRow Transformation:

This transformation is applied independently to all the four rows. Each row is cyclically

shifted left by a different offset. The first row is not shifted at all. The offsets of each

row are determined by the block length. The following table gives the offsets in terms of

columns to be moved for varying block sizes.

Shift offsets    Row 2    Row 3    Row 4
BL = 128         1        2        3
BL = 192         1        2        3
BL = 256         1        3        4

Table 2: Offsets of rows based on block lengths

In the case of Decryption, called InvShiftRow, the rows are shifted back to nullify the effect. That is, the rows are cyclically shifted left with an offset equal to the number of columns of the State minus the offset used for Encryption.
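Both directions can be sketched with one routine, since the inverse is itself a left shift. The function names and the eight-column maximum are assumptions made for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Cyclically shift row r of the State left by offsets[r] columns.
   offsets[0] is 0; the others come from Table 2. nb is 4, 6, or 8. */
void shift_row(uint8_t state[4][8], int nb, const int offsets[4])
{
    for (int r = 1; r < 4; r++) {
        uint8_t tmp[8];
        for (int c = 0; c < nb; c++)
            tmp[c] = state[r][(c + offsets[r]) % nb];
        memcpy(state[r], tmp, (size_t)nb);
    }
}

/* InvShiftRow: a left shift by nb - offset undoes a left shift by offset. */
void inv_shift_row(uint8_t state[4][8], int nb, const int offsets[4])
{
    int inv[4];
    for (int r = 0; r < 4; r++)
        inv[r] = (nb - offsets[r]) % nb;
    shift_row(state, nb, inv);
}
```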

MixColumn Transformation:

This transformation is applied independently on each column of the State. Each column of the State is treated as a polynomial with coefficients in GF(2^8). For example, the first column in Fig 1 can be treated as a4x^3 + a3x^2 + a2x + a1. This polynomial is multiplied by a fixed polynomial given by e(x) = 03x^3 + 01x^2 + 01x + 02, modulo x^4 + 1, in GF(2^8).

This can be done in matrix multiplication as follows:

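\[
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}
\]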

Fig 5: Polynomial multiplication using matrices [4]

In the case of Decryption, called InvMixColumn, each column is multiplied by the polynomial d(x) = 0Bx^3 + 0Dx^2 + 09x + 0E, so that e(x) d(x) = 1 modulo x^4 + 1.
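A sketch of MixColumn and InvMixColumn on a single column, written as the matrix products they are equivalent to. It assumes the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1 for the byte multiplications. The function names are illustrative.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* MixColumn on one column, top byte first. */
void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(gf_mul(0x02, a0) ^ gf_mul(0x03, a1) ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ gf_mul(0x02, a1) ^ gf_mul(0x03, a2) ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ gf_mul(0x02, a2) ^ gf_mul(0x03, a3));
    col[3] = (uint8_t)(gf_mul(0x03, a0) ^ a1 ^ a2 ^ gf_mul(0x02, a3));
}

/* InvMixColumn: the same structure with the coefficients of d(x). */
void inv_mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(gf_mul(0x0E, a0) ^ gf_mul(0x0B, a1) ^ gf_mul(0x0D, a2) ^ gf_mul(0x09, a3));
    col[1] = (uint8_t)(gf_mul(0x09, a0) ^ gf_mul(0x0E, a1) ^ gf_mul(0x0B, a2) ^ gf_mul(0x0D, a3));
    col[2] = (uint8_t)(gf_mul(0x0D, a0) ^ gf_mul(0x09, a1) ^ gf_mul(0x0E, a2) ^ gf_mul(0x0B, a3));
    col[3] = (uint8_t)(gf_mul(0x0B, a0) ^ gf_mul(0x0D, a1) ^ gf_mul(0x09, a2) ^ gf_mul(0x0E, a3));
}
```

Two of the four terms in each MixColumn output need no multiplication at all, whereas every term of InvMixColumn does, which is why the inverse is the more expensive direction.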

Round Key Addition:

In this transformation, the Round Key is added to the State. Addition in GF(2^8) is a simple bitwise XOR. The Round Key is of the same length as the State. It is derived from the initial cipher key by means of the Key Schedule.

Key Schedule:

The Key Schedule is the process of deriving the Round Key for each round from the

initial cipher key. This involves expansion of the initial key followed by selection of the

key for each round. The Round Key Addition is done once every round and an

additional Round Key Addition is done, before the rounds in the case of

Encryption, and after in Decryption. Since Round Key should be the same length as

Block, the total number of Round Key bits, called the Expanded Round Key, must

be the block length times one greater than the number of rounds. A pseudo-C

implementation of Key Schedule is explained below. The expanded key can be viewed as

an array of 32-bit words represented as W[nb*(nr+1)], where nb is the number of

columns in the State and nr is the number of rounds.

Key expansion is done differently for different key sizes. Let nk be the number of 32-bit words in the key. The function subbyte takes a 32-bit word, does a byte substitution on each of its bytes, and returns a 32-bit word. The function rotbyte performs a left cyclic permutation by one byte on its input. The Col function returns a 32-bit word packed from the four bytes given as input. We can see that the Expanded Key also contains the initial cipher key in its original form.

The function rcon(i) is Col(Rc[i], ‘00’, ‘00’, ‘00’). Rc[i], also called the round constant, is given by the following formula, where the power is taken in GF(2^8):

Rc[1] = ‘01’

Rc[i] = ‘02’^(i-1)

for (i = 0; i < nk; i++)
    W[i] = Col(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3]);

for (i = nk; i < nb * (nr + 1); i++) {
    temp = W[i - 1];
    if (nk <= 6) {
        if (i % nk == 0)
            temp = subbyte(rotbyte(temp)) ^ rcon(i / nk);
    } else {
        if (i % nk == 0)
            temp = subbyte(rotbyte(temp)) ^ rcon(i / nk);
        else if (i % nk == 4)
            temp = subbyte(temp);
    }
    W[i] = W[i - nk] ^ temp;
}

Fig 6: Pseudo-C implementation of Key Schedule [4]
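The pseudo-code of Fig 6 can be turned into compilable C once subbyte, rotbyte, Col, and rcon are given bodies. The sketch below is one way to do so. It assumes the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1 and computes each byte substitution directly rather than from a stored table. The helper names are illustrative.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* ByteSub on one byte: inverse by search, then the affine map. */
static uint8_t sub_one(uint8_t a)
{
    uint8_t x = 0; /* '00' maps to itself */
    for (int b = 1; a != 0 && b < 256; b++)
        if (gf_mul(a, (uint8_t)b) == 1) {
            x = (uint8_t)b;
            break;
        }
    uint8_t y = x;
    for (int n = 1; n <= 4; n++)
        y ^= (uint8_t)((x << n) | (x >> (8 - n)));
    return (uint8_t)(y ^ 0x63);
}

/* Pack four bytes into a 32-bit word, first byte most significant. */
static uint32_t col(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return ((uint32_t)b0 << 24) | ((uint32_t)b1 << 16) | ((uint32_t)b2 << 8) | b3;
}

static uint32_t subbyte(uint32_t w)
{
    return col(sub_one((uint8_t)(w >> 24)), sub_one((uint8_t)(w >> 16)),
               sub_one((uint8_t)(w >> 8)), sub_one((uint8_t)w));
}

/* Left cyclic rotation by one byte. */
static uint32_t rotbyte(uint32_t w)
{
    return (w << 8) | (w >> 24);
}

/* rcon(i) = Col(Rc[i], 00, 00, 00), Rc[1] = 01, Rc[i] = 02 * Rc[i-1]. */
static uint32_t rcon(int i)
{
    uint8_t rc = 1;
    while (--i > 0)
        rc = gf_mul(rc, 2);
    return col(rc, 0, 0, 0);
}

/* Expand key (4*nk bytes) into w, which must hold nb*(nr+1) words. */
void key_expansion(const uint8_t *key, int nk, int nb, int nr, uint32_t *w)
{
    for (int i = 0; i < nk; i++)
        w[i] = col(key[4 * i], key[4 * i + 1], key[4 * i + 2], key[4 * i + 3]);
    for (int i = nk; i < nb * (nr + 1); i++) {
        uint32_t temp = w[i - 1];
        if (i % nk == 0)
            temp = subbyte(rotbyte(temp)) ^ rcon(i / nk);
        else if (nk > 6 && i % nk == 4)
            temp = subbyte(temp);
        w[i] = w[i - nk] ^ temp;
    }
}
```

The two branches of Fig 6 share the test i % nk == 0, so they are merged here. The extra substitution at i % nk == 4 applies only when nk is greater than 6.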

For Encryption the necessary round bits are taken from W starting from index i = 0. For

Decryption it is the reverse. The round key taken for the last round will be used in the

first round in Decryption in the same order of bits.

The Galois Field GF(2^m):

A Galois Field GF(q) is a field with q elements, also called a finite field because it has a finite number (q) of elements. A primitive element of GF(q) is an element a such that every field element except zero can be expressed as a power of a. Each Galois Field has at least one primitive element. If q = 2^m, where m is a positive integer, the elements of the field can be represented by polynomials of degree less than m whose coefficients are elements of the field GF(2), that is, 0 and 1. A primitive element of such a field is itself such a polynomial.

Arithmetic in GF(2^8):

As we saw above, all the arithmetic is done at the byte level. In GF(2^8), the sum of the bits 1 and 1 is 0, so addition is a bitwise XOR. This arithmetic cannot be implemented in software by applying the standard integer multiplication and division operators to find products and other values such as the multiplicative inverse. Manipulation of bits in software is complex and hard to debug. Fortunately, however, since the elements of the field are represented as 8-bit values, the input to any unary operation is one of only 256 values. This can be exploited by dividing a complex operation into a series of unary operations and performing each unary operation with a lookup table instead of performing the actual arithmetic.

For example, we can do multiplication by using the logarithm and antilogarithm functions. Taking the logarithm of the multiplicand and of the multiplier can each be done in one step using a linear array of 256 values. These two logarithms are then added as ordinary integers modulo 255 (not XORed, since they are exponents of the primitive element rather than field elements). The antilog of the sum can then be obtained by using another lookup table. A zero operand has no logarithm and must be handled separately. Arithmetic can thus be done much more easily at the cost of extra memory.
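The table-based multiplication can be sketched as follows. The tables are generated here from the primitive element ‘03’ under the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1. Both choices are assumptions of this sketch, as are the function names.

```c
#include <stdint.h>

static uint8_t log_table[256];   /* log_table[x] = i such that 03^i = x */
static uint8_t alog_table[256];  /* alog_table[i] = 03^i */

/* Shift-and-XOR multiply, used only to build and check the tables. */
static uint8_t gf_mul_slow(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Fill both tables by stepping through the powers of '03'. */
void build_tables(void)
{
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        alog_table[i] = x;
        log_table[x] = (uint8_t)i;
        x = gf_mul_slow(x, 0x03);
    }
}

/* Multiply with two log lookups, one integer addition modulo 255,
   and one antilog lookup. Zero has no logarithm, so it is special-cased. */
uint8_t gf_mul_table(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return alog_table[(log_table[a] + log_table[b]) % 255];
}
```

build_tables must be called once before gf_mul_table is used. When one operand is a fixed constant, its logarithm can be stored directly, which saves one of the three lookups.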

C code:

We use a C implementation of the algorithm to test the results of the VHDL and Viva implementations. The code is taken from Daemen and Rijmen [3]. This implementation is done using lookup tables, which are stored as linear arrays. The lookup tables are used in the ByteSub, Key Schedule and MixColumn transformations. The independent operations on different elements in each stage are done iteratively. The State is stored in a two-dimensional array.

In the case of ByteSub, a for loop is used to iterate over all rows and columns of the State. The transformation is done using a linear array of 256 elements: the input byte is used as an index into the array, which contains the transformed values. Thus the whole transformation, of finding a multiplicative inverse and applying an affine transformation, is done using a single lookup.

The ShiftRow transformation is done based on the bit lengths. The shifts of each row

are stored in an array. Based on whether the algorithm is in an Encryption or Decryption

stage, appropriate shifts are fetched from the array and the rows are shifted accordingly.

The MixColumn transformation is implemented by running a for loop on the number

of columns. The multiplications are done using the log and antilog lookup tables. The log

and antilog values of all of the possible 256 inputs are stored in two linear arrays. The

appropriate value is retrieved using the index. Thus the multiplication can be done by

using three lookups (two log values and one antilog value) and an addition. This will

avoid doing the complex bit wise manipulations involved in actual Galois Field

arithmetic. The polynomial to be multiplied is stored as constants in the program.

The Round Key Addition is done using an XOR. The Key is passed as a two-dimensional array and a for loop is used to iterate on all the cells of the State.

The Key Schedule uses two lookup tables for doing the rotation and substitution. A

three-dimensional array is used to store the Expanded Key. Key selection is done on the

primary index. The Key schedule is done before the rounds, and all the key bits are

stored in an array which is used in Encryption as well as Decryption.

The C code is used only to check the results of our other implementations in VHDL and Viva®. We implemented the algorithm in different ways to evaluate their resource usage and timing. The code is included as Appendix A.

Chapter 2: The HC 36m – A Reconfigurable Computer

The platform we are targeting is an HC 36m Hypercomputer® developed by Star Bridge

Systems. The reconfigurable resources on the Hypercomputer comprise five Xilinx

Virtex-II 6000 and two Virtex-II 4000 FPGA chips organized in a proprietary manner.

The processing capability of this architecture is built upon four Processing Elements

(PEs). Each PE is a Xilinx Virtex-II 6000 FPGA chip connected to four DDR RAM

modules each of 512MB with a 90-bit wide communication link. The four PEs are

arranged in a “Quad Structure” passing through a cross-point, which is another Virtex-II

6000 chip with a 50-bit wide communication link to each PE. The Virtex-II 4000 chips

serve as a bus controller and a router. The 2.4 GHz Xeon processors on the host are connected to the FPGA interface through a 64-bit bidirectional PCI-X bus running at 66 MHz. If the data to be sent is wider than the bus, it is multiplexed over the bus.

Fig 7: Quad Structure [20]

Fig 8: Architecture of HC 36m [20]

The HC 36m comes with a development environment called Viva®. Viva provides a graphical editor for designing applications, which are then synthesized by Viva and mapped onto hardware using Xilinx tools. The design need not be constrained to a single chip, since Viva is capable of mapping designs onto more than one chip. Viva also comes with a rich library of objects which can be used in the design of applications. A snapshot of the library objects is shown in Fig 9. The current version of the library comes in a sheet called corelib.

Fig 9: Corelib

The I2ADL editor provides a graphical interface for creating applications. A design can

be stored as a sheet. The sheets can be made into objects to be reusable in other designs.

Thus a user can create his/her own library of objects and reuse them just by loading the

sheet with its objects and dragging the objects onto the new sheet.

Fig 10: Snapshot of Viva.

There are three more editors in Viva: the Data Set Editor, used to create new data sets;

the Resource Editor used for allocating resources; and the System Editor used to

manipulate constraints such as the EDIF file to be compiled, the system descriptions, the

clock period, and so forth. The object-oriented paradigm allows one to build designs

hierarchically, thus decreasing the complexity.

The most important concept for any programming language, however, is debugging, and

debugging in Viva can be very difficult. The error messages given by Viva, for example,

have not been very useful. This makes programming difficult if something goes wrong.

The “widget interface” is not really sufficient for hardware designs. It would be more

useful if there were a way to see the timing diagram for a design on the hardware.

The current Viva version is Viva 2.3. This version has some enhancements over previous

versions in terms of synthesis time, but many of the existing designs that synthesized and

executed under previous versions are not compatible with this new version. We have had

to make some changes in our designs in order to migrate to the new version.

Chapter 3: VHDL Implementation

A VHDL implementation of the algorithm for a 128-bit key and block size was written so that its silicon usage and delay could be compared with those of the Viva implementation. In this implementation, lookup tables were used rather than performing the actual GF(2^8) arithmetic. The code is included as Appendix B.

The lookup tables are stored as RAMs. There are a total of four lookup tables used in the

algorithm. Those lookup tables are stored in the files sbox_ram.vhd,

alogtable_ram.vhd, logtable_ram.vhd and rbox_ram.vhd. All lookup

tables take an index as input and output the value corresponding to that index. All the

lookup tables mentioned above except that in the rbox_ram.vhd file store 256 values

needed for a unary operation in GF(2^8). The rbox_ram.vhd file contains a lookup

table having 30 values required for the rotation operation in Key Schedule. They are

indexed starting from zero.

Since the algorithm works in GF(2^8), all the variables are defined to be large_int, a

subtype of integer that allows values in the range 0 to 255 only. Other packages and type

definitions are stored in the file packages.vhd.

Key Schedule is done before the rounds start and the Expanded Key is stored in arrays. All the operations in the transformations of the round are implemented in parallel, in contrast to the iterated approach used in the C implementation. This uses a great deal of silicon but minimizes the delay. The code was simulated using ModelSim [15] and synthesized using the Xilinx ISE [26] compiler. The various entities of the algorithm are explained in order of complexity and hierarchy.

shiftrow:

Since we are dealing with only one block size, this transformation can be implemented simply by routing the inputs to the appropriate outputs, and no logic resources are used for this transformation. The code for this is in the shiftrow.vhd file. The inputs to this entity are sixteen values of type large_int, and the outputs are the same values in a shuffled order.

roundkey:

This entity is used to do an 8-bit XOR. Since in VHDL we have only a bit wise XOR

function, the functions conv_std_logic_vector and conv_integer, available

in the ieee.std_logic_arith package, are used for conversion between an

integer and a std_logic_vector. This entity has two inputs of type large_int

and outputs a single value of same type. The implementation is in the file

roundkey.vhd.

round_roundkey:

The round_roundkey entity takes the State and the key in the form of 32 inputs of type large_int and XORs each State byte with the corresponding key byte. For this, sixteen roundkey entities are used. All the XORs are implemented in parallel. The output is sixteen large_int values, which comprise the State. This entity performs the Round Key Addition transformation in the round. The implementation is in the round_roundkey.vhd file.

round_sbox:

This entity does the ByteSub transformation using the lookup tables (RAM_sbox). The

round_sbox entity takes State in the form of sixteen large_int inputs and passes

them through sixteen RAM_sbox entities in parallel. The output is again the State. The

code corresponding to this entity is in round_sbox.vhd file.

addcmp:

This takes two inputs of type large_int and adds them modulo 255. The output is also

a large_int. The implementation is in addcmp.vhd file. This is used primarily in

multiplication, as explained below.

multiply:

This entity takes the two values to be multiplied as inputs and produces their product. All the inputs and outputs are of type large_int. The entities used for this are the two lookup tables RAM_logtable and RAM_alogtable. One of the inputs is passed to the RAM_logtable entity, whose output is the log value of that input. The other input is always a constant, so it is supplied directly as its log value to avoid a second lookup. These two log values are given as inputs to the addcmp entity, and the output of addcmp is passed to RAM_alogtable, which provides the product. Finally, the inputs are checked for zero: if either input is zero, the output is zero; otherwise the product is passed to the output. The code is in the Multiply.vhd file.

mix:

This entity takes a column of the State shifted in different offsets. It multiplies these four

values with the constant polynomial used in the algorithm and adds the results. The

output is one cell of the State after the MixColumn transformation. For this entity the

inputs are the four large_ints and the output is a large_int. The other entities

used here are the multiply and roundkey. Since the polynomial used in the

multiplication has two coefficients of 1, multiplication with them is redundant. Thus,

only two multiplications are used to get the other two products. Later these four values

are added using the roundkey entities and the result is passed out. The code is in the

Mix.vhd file.

mixcolumn:

This entity performs the MixColumn transformation. It takes the State in the form of

sixteen large_ints and outputs the same after the transformation. For this purpose it

uses sixteen mix entities. It takes one column at a time, shifts it appropriately,

and passes it to the mix entities. The outputs of these entities are placed in the

corresponding places of the State. All the operations are done in parallel. The

implementation is in the MixColumn.vhd file.

keyschedule:

This entity takes key values for a round as an input and produces the key values for the

next round. The input is taken in the form of sixteen large_ints and the outputs are

stored in a key array. The different entities used here are roundkey,

RAM_sbox, and RAM_rbox. The inputs are routed through these entities such that they

produce the desired output. The implementation is in the keyschedule.vhd file.

round:

This constitutes a round of the algorithm. The inputs are the State and the key in the form

of 32 inputs, and the output is the State. Both inputs and outputs are of type

large_int. The State inputs are first routed through the round_sbox entity followed

by shiftrow, mixcolumn and roundkey. The key inputs are directly routed to

roundkey. The output would be the transformed State after applying one round

transformation. The implementation is in the round.vhd file.

lround:

This actually implements the last round of the algorithm, which is slightly different from

the remaining rounds. The only difference between round and lround is that the latter

does not have the mixcolumn entity. The output of the shiftrow is directly routed to

roundkey. The implementation is found in the lround.vhd file.

aes:

This entity connects all the pieces to complete the algorithm. The roundkey, round, lround, and keyschedule entities are used here. First, the key is passed as input to keyschedule. There is a chain of ten keyschedule entities, the output of each of which is fed to the next. The initial key is fed as input to the first keyschedule. At the end, the outputs of all the keyschedule entities together hold the Expanded Key for the entire algorithm. The State is first passed through sixteen roundkey entities. This is the initial Round Key Addition transformation performed prior to the rounds. Then come nine round entities and one lround, each taking the output of the previous one as its input. The output of lround is the required encrypted block. The code can be found in the file aes.vhd.

Decryption:

Decryption is similar to Encryption, with minor differences explained below in terms of

entities for each transformation.

The first difference is the InvByteSub transformation. Instead of the RAM_sbox used

in Encryption, we use in Decryption an entity RAM_dsbox that contains the inverse of

the RAM_sbox values. The entity can be found in the dsbox_ram.vhd file.

The InvShiftRow is the transformation that is applied to nullify the ShiftRow

transformation applied in Encryption. For this we use the entity dshiftrow which is in

the file dshiftrow.vhd. This is similar to shiftrow in the sense that it just routes

the inputs to the appropriate outputs to produce the effect of shifting. The shifting is done

in such a way that it nullifies the shifting done in shiftrow.

The InvMixColumn differs from the MixColumn in two ways. First, a different polynomial is multiplied with the State. Although the polynomial differs, it is stored as constants in the same way as in MixColumn. The entity is invmixcolumn and is implemented in the file invmixcolumn.vhd. The second difference concerns the mix entity used in Encryption. In Encryption the polynomial has two coefficients equal to one, but the Decryption polynomial has no coefficient equal to one, so none of the multiply entities can be omitted as they are in Encryption. The variation is shown in the dmix entity in the file dmix.vhd.

The Round Key Addition transformation is the same in Encryption and Decryption. We therefore use the same entities for both.

The order of the transformations in the round also changes in the Decryption. First the

input is routed to round_roundkey entity, which is followed by invmixcolumn,

dshiftrow and then by round_dsbox. The entity used is dround, and the

implementation can be seen in the file dround.vhd.

In Decryption, it is the first round, and not the last round, that differs from the other

rounds. The first round does not have invmixcolumn. The input is passed through

round_roundkey and then through dshiftrow and round_dsbox. The entity

representing this is the fround and the implementation can be seen in fround.vhd.

The key generation is similar to that of Encryption, but the keys are used in reverse order. The keys that are used for the first round in Encryption are routed to the last round in Decryption. Similarly, the key used in the second round is used in the ninth round in Decryption, and the key used in lround in Encryption goes to fround in Decryption. The entity used for this is the daes entity, in the file daes.vhd.

All the designs are simulated using ModelSim and synthesized using Xilinx ISE.

The results of the implementation are given and analyzed in Chapter 5.

Chapter 4: Viva Implementation

We began the implementation by using lookup tables for multiplication. Since the on-board memory has not been supported up to this point, we have used on-chip memory to store the lookup tables. All the values of the lookup tables are read from files stored on the host. These constants can be read to an input horn from files by adding the following attributes.

Fig 11: Input from a file to an input horn

A file should exist at the location given beside the attribute Constant in the following

format.

Fig 12: Format of file input

The value of the index attribute is used as an index to fetch the required value from the
file. The values in the files are synthesized as CONSTANTS into the executable. In the
example above, the value 99 is stored as a constant at the input horn. To implement a
lookup table, we can read all the values into input horns and use a multiplexer to select
the required value. This approach has several problems, however. There is no
parameterized generate function to create all the horns in one step, and opening the
attribute list of each horn to enter a different index and a hard-coded path name is
tedious.

The index problem can be countered by using sixteen Mux(17,1) objects to store the
values instead of a single Mux(257,1) object. The horns of each Mux object can then
use the indexes 0 to 15, instead of every input horn needing its own index between 0
and 255.

Fig 13: Lookup table

The 8-bit input is exposed and split into its most significant and least significant 4-bit
quantities. The four least significant bits are routed to all sixteen Mux(17,1) objects.
Only one of these Muxes holds the required output; it is selected by a further Mux object
that takes the four most significant bits as its select index.
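The two-level selection described above can be sketched in software. The following is a minimal Python sketch and not the Viva design itself: the names make_banks and lookup are hypothetical, the assignment of consecutive 16-value groups to banks is an assumption, and the table contents are placeholders rather than the values of any of the actual lookup files.

```python
def make_banks(table):
    """Split a 256-entry table into sixteen 16-entry banks, one per Mux(17,1)."""
    assert len(table) == 256
    return [table[16 * hi:16 * hi + 16] for hi in range(16)]


def lookup(banks, byte):
    """Two-level lookup: low nibble indexes every bank, high nibble picks the bank."""
    hi, lo = byte >> 4, byte & 0x0F
    # All sixteen banks are indexed by the low nibble in parallel.
    candidates = [bank[lo] for bank in banks]
    # A final 16-way selection picks one candidate using the high nibble.
    return candidates[hi]


# Placeholder contents only (hypothetical), not the AES substitution box.
table = [(7 * i + 3) % 256 for i in range(256)]
banks = make_banks(table)
```

With this arrangement no single selector needs more than sixteen data inputs, which mirrors the reason given above for preferring sixteen Mux(17,1) objects over one Mux(257,1) object.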

The other problem faced is providing the path name to all the input horns. Initially we did

it for all the input horns as below.

Fig 14: Mux with pathnames given manually

Later we were made aware of an easier method for providing the file names to the
required input horns in the object. For this we create a Mux whose input horns have
their Constant attributes initialized to *ROM_FILE.DTA. This will look like

Fig 15: Mux with pathnames pointing to a pointer

This Mux is then made into an object. To make this object point to a file, the object is

right-clicked and the attributes are changed as follows:

Fig 16: Setting the file pointer to a specific path

Initially the input values from the files were stored in registers. Although each lookup
table worked individually, there were problems with more than three lookup tables. When
we tried to synthesize more than three lookup tables, we got a C++ Exception error
followed by corruption of the project. The frequency of this error diminished as new
versions of Viva were released, and the fault was later corrected, which reduced both
silicon usage and compilation time. Because of the growing problems with the lookup
tables, we considered importing an EDIF module for some basic operations in the
algorithm. However, the EDIF generated from VHDL was not compatible with Viva. We
were later provided with a PHP script to do the conversion, but this did not seem to be
sufficient for our needs.

Iterative approach:

The initial, iterative implementation of the algorithm was aimed at minimal usage of
silicon on the chip. There are two reasons for this. First, there was no multi-chip
communication available in Viva at that time. Our VHDL implementation showed that a
fully parallel version would take two chips if the Viva synthesis tool was as efficient as
the standard Xilinx tools. Second, there were some problems encountered in using many
lookup tables. Encryption was implemented by doing the Key Schedule on the fly. For
Decryption, the Key Schedule was done first, before the rounds, and the Expanded Key
was stored for later use.

Encryption:

An iterative approach was used both within the ByteSub and MixColumn
transformations inside the round and on the round itself. The lookup tables required for
Encryption are the substitution box, represented by the object sbox; the Logarithm table,
represented as ltable; the Antilogarithm table, represented as atable; and the Rotation
Box, represented as rbox. All these tables except rbox have 255 values and all are
constructed as explained above. The values are read from files in the directories sbox,
ltable and atable, respectively, placed under the directory C:\Pradeep\ on Odo. The
files must be placed at exactly that location, since early versions of Viva require the
path to be hard coded in the design.

Since the implementation is done in an object-oriented paradigm, the explanation below
is given in terms of the objects created. Encryption is a loop on the object round, which
represents a single round of the algorithm. The initial values of the key and block are
passed through the roundkey object. The output of the roundkey object and the
initial key are passed to the round object, which is then looped ten times using the For
object of the Viva library. The feedback is done using reginit objects. The appropriate
input to the round object, for the first round or for a subsequent round, is selected
using the N value of the For object.

round:

In essence, the round routes the data from one stage to the next. The Key Schedule
is done on the fly as part of the round. The inputs to the round are the block and key of
the previous round. The substituted values required for the Key Schedule are
calculated in the round_sbox object. The N value of the outer For loop is used to
bypass the MixColumn stage in the tenth round. It is also incremented and used as a
pointer into the rotation box of the Key Schedule. ShiftRow is implemented by
simply routing the outputs of the round_sbox object to the appropriate inputs of the
round_mixcolumn object.
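Because ShiftRow only reorders bytes, it reduces to a fixed wiring pattern. The following Python sketch illustrates that idea under stated assumptions: the State is held as a flat list of sixteen bytes in column-major order (index = row + 4 * column), and the names SHIFTROW_WIRING and shift_row are hypothetical and do not correspond to any Viva object.

```python
# Output position (row r, column c) is wired to input position (row r, column c + r),
# i.e. row r is rotated left by r positions. Indices are column-major: r + 4 * c.
SHIFTROW_WIRING = [r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)]


def shift_row(state):
    """Apply ShiftRow as a pure permutation; no arithmetic is involved."""
    return [state[src] for src in SHIFTROW_WIRING]


# Feeding in the indices themselves shows where each output byte comes from.
wiring_view = shift_row(list(range(16)))
```

In hardware terms the permutation costs no logic, which is consistent with the shiftrow entity reporting zero slices in the VHDL results.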

Fig 17: Design of a round in Encryption

round_sbox:

The round_sbox is a loop over sbox4 that calculates the substituted values for a
column of the State. The outputs of all iterations are registered. The appropriate set of
registers is selected using the decode object. The N value of the For object is passed
into the decode, which compares it with the values from 0 to 4 and sets the corresponding
output bit high. The 'done' output of the sbox4 object is used to give a pulse to the next
input of the For object. The inputs are muxed and passed into sbox4 based on the N
value of the For object. The four additional values calculated are for the Key Schedule.

sbox4:

This object is a loop around the sbox object and gives the substituted value for its input.
The outputs of all iterations are registered in the same way as explained above.

round_mixcolumn:

This object is a loop around the mixcolumn object; it multiplies the column of the State

with a constant polynomial. The inputs are muxed and passed into the mixcolumn

object and the outputs are registered using the For object.

mixcolumn:

The mixcolumn object shifts the column by one position on every iteration and passes
it into the mix objects, which multiply the given input with a polynomial. The output of
each iteration corresponds to one cell of the output column. The output is registered
using the RegEn object based on the iteration.

mix:

This object is a loop around the multiply object. The polynomial with which the
column is to be multiplied is stored as constants. To eliminate one table lookup, the
logarithms of the coefficients of the polynomial are stored instead of the coefficients
themselves. The outputs are registered and, after all the iterations are completed, XORed
to get the desired value.

multiply:

The multiply object has two inputs: the coefficient of the polynomial and a value of the
State. The State value is passed through the ltable and the output is added to the other
input. The ADC object is used for this purpose. We need addition modulo 255, which
requires that we adjust the ADC output with the overflow bit to obtain the desired result
in all instances. The resulting value is passed through atable to get the product. The
inputs are checked for zero. If any input is zero, the output of the atable is ignored and
zero is passed as the output.
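The log/antilog multiplication can be illustrated in software. The sketch below is in Python and rests on assumptions that the text does not state: the tables are built on the generator ‘03’ (the usual choice for this field), the coefficient arrives already as its logarithm, and only the State value is tested for zero. The names build_tables and multiply are hypothetical.

```python
def build_tables():
    """Build logarithm and antilogarithm tables for GF(2^8) modulo 0x11B.

    Assumption: the generator is 0x03.
    """
    log = [0] * 256
    alog = [0] * 255
    x = 1
    for i in range(255):
        alog[i] = x
        log[x] = i
        doubled = x << 1
        if doubled & 0x100:
            doubled ^= 0x11B          # reduce modulo the field polynomial
        x = doubled ^ x               # x * 0x03 = (x * 0x02) XOR x
    return log, alog


LOG, ALOG = build_tables()


def multiply(log_coeff, value):
    """Multiply a State byte by a coefficient that is supplied as its logarithm."""
    if value == 0:
        return 0                      # the antilog output is ignored for a zero input
    total = log_coeff + LOG[value]    # 8-bit add; bit 8 plays the role of the carry
    total = (total & 0xFF) + (total >> 8)   # fold the carry back in: addition mod 255
    if total == 255:
        total = 0                     # 255 is congruent to 0 modulo 255
    return ALOG[total]
```

The carry fold works because 256 is congruent to 1 modulo 255, so adding the overflow bit to the low eight bits gives the modulo-255 sum with only the value 255 left to map back to 0.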

roundkey:

The roundkey object is a collection of XOR gates that XOR the key for this particular

round with the State. All the XORs are done in parallel.

keyschedule:

Key Schedule is done on the fly in the case of Encryption. The index for the rotation box

is calculated based on the iteration. The substituted values required are calculated in the

round_sbox object itself and the values are passed to the key schedule.

The decode object is used in almost all of the above objects. It functions as a DeMux.
A value is passed through Equal objects from the Viva libraries, which are initialized
to all the possible values of the input. The output corresponding to the input value is
set high.
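The behaviour of decode amounts to a one-hot encoding of its input. A minimal Python sketch follows; the function name and the width parameter are hypothetical and simply model one equality comparison per possible value.

```python
def decode(value, width):
    """Return a one-hot list: output k is 1 only when value equals k."""
    return [1 if value == k else 0 for k in range(width)]


# Example: the loop counter N = 2 of a five-iteration loop enables register set 2.
enables = decode(2, 5)
```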

Decryption:

The basic difference between Encryption and Decryption is the Key Schedule. The
Key Schedule is done before the rounds in this instance. The keys for all the rounds
are stored in a stack-like structure, from which the key for the round is retrieved in every
iteration. All the other stages of Decryption are similar to Encryption and require little
explanation. The round_isbox has only four iterations, since the values required for
the Key Schedule need not be calculated. The imix of the round_imixcolumn
takes a different polynomial from the one used in Encryption.

The keysh object is a loop around the keyschedule object explained above. The
outputs are packed and registered on every iteration. They are then routed in reverse
order (since Decryption requires the keys in reverse order) into a Mux. The select input
of the Mux is driven by the N value of the For loop. The rounds are started after the Key
Schedule is done.
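The idea of computing every round key first and then reading them back in reverse can be sketched in software. The Python sketch below follows the standard AES-128 key expansion rather than the Viva objects: the substitution box and the round constants are computed instead of being read from lookup files, and all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def rotl8(b, k):
    """Rotate an 8-bit value left by k bits."""
    return ((b << k) | (b >> (8 - k))) & 0xFF


def make_sbox():
    """Compute the AES substitution box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x; 0 maps to 0
            inv = gf_mul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = make_sbox()


def expand_key(key):
    """Return the eleven 16-byte round keys of AES-128, in Encryption order."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # rotate the word by one byte
            temp = [SBOX[b] for b in temp]      # substitute each byte
            temp[0] ^= rcon                     # add the round constant
            rcon = gf_mul(rcon, 0x02)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def decryption_keys(key):
    """Round keys in the order Decryption consumes them: last key first."""
    return expand_key(key)[::-1]
```

Indexing the reversed list with the loop counter plays the part of the Mux whose select input is the N value of the For loop.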

The files corresponding to the isbox of the round_isbox object are stored at
C:\Pradeep\isbox on Odo.

Fig 18: Design of round in Decryption

Expanding the loop on the round:

Since our main aim is to use the maximum resources in terms of silicon, we started by
expanding the loop on the round in order to check the efficiency of Viva in synthesizing a
larger design. For this, some changes were made to the round object explained above.
The object described above uses a Mux to bypass the MixColumn transformation in
the final round of Encryption and the InvMixColumn transformation in the first round
of Decryption. Since we are now using a different object for every round, a round object
was created with a mixcolumn object and without any Mux for all the rounds except the
last one in Encryption and the first one in Decryption. Another object, lround, was
created for Encryption; this is a round without a mixcolumn, and a corresponding object
was created for Decryption. The Key Schedule follows the same approach as above: the
key is calculated on the fly for Encryption, and for Decryption we used the keysh object
explained above. Both designs worked, and the results are given in the next chapter.

Non-iterative Approach:

The non-iterative approach began by expanding the loops in the ByteSub stage and also
in the MixColumn stage. Given that a single lookup table took 160 slices, which is
a little less than thrice the number needed in the VHDL implementation, the whole
algorithm using lookup tables cannot be done in four chips if we were to expand ByteSub
and MixColumn completely. We therefore settled for iteration on these stages. The
ShiftRow is done prior to the ByteSub to accommodate this. For the first iteration, the
first two columns are passed to the plsbox8 object, which has eight sbox objects.
The output of plsbox8 is then passed to two plmix4 objects. The plmix4 object
multiplies a column with a polynomial and outputs the transformed column. The
plmix4 object has four plmix objects. The input to plmix4 is routed to each of these
objects, shifted by one position each time. The plmix object has four plmult objects that
multiply the coefficients. The outputs of the plmult objects are XORed to produce the
desired result. The outputs of the two iterations done on these stages are registered using
RegEn objects. The N value of the For loop that controls the iterations is used to
enable the appropriate set of registers. The Key Schedule is done in parallel with this
operation. The corresponding object is plkeyschedule. It uses four sbox
objects to obtain the substituted values. Once the iterations are finished, the
registered values and the output of plkeyschedule are passed to the roundkey
object to complete the round transformation. There are seventy-six lookup tables in
total in this round.

Fig 19: Design of round in Encryption

Implementing the multiplication in arithmetic:

Since Viva was not able to synthesize the design with many lookup tables, the
implementation was changed by replacing the lookup tables with the actual arithmetic.
The algorithm does not in fact require a general multiplication to be implemented in
arithmetic. Since the polynomials used in Encryption and Decryption are constants,
two objects were designed that multiply a column of the State by the polynomial used
in Encryption and in Decryption, respectively.

The multiplication is done in the Galois Field GF(2^8). In polynomial representation, the
multiplication corresponds to a product of polynomials modulo an irreducible binary
polynomial of degree 8. The polynomial used in the algorithm is x^8 + x^4 + x^3 + x + 1,
which can be represented in hexadecimal notation as ‘11B’.

Multiplication by the polynomial x, which can be represented in hexadecimal notation as
‘02’, is a left shift followed by a conditional XOR. If the left shift results in a carry, then
the result of the shift is XORed with ‘1B’. The polynomial used in Encryption has
coefficients ‘03’, ‘01’, ‘01’ and ‘02’. Multiplication by ‘02’ is done as explained above.
Multiplication by ‘01’ gives the number itself. Multiplication by ‘03’ is split into
multiplication by ‘02’ plus multiplication by ‘01’, where the addition is again an XOR. The
polynomial used in Decryption has the coefficients ‘09’, ‘0B’, ‘0D’, and ‘0E’. These
are likewise split into powers of two and XORed at the end. For example,
multiplication by ‘09’ is split into multiplication by ‘08’ XORed with multiplication by
‘01’. Multiplication by ‘08’ is achieved by three successive multiplications by ‘02’. Since
all the coefficients are multiplied in parallel and XORed, the maximum number of shifts
done in succession is three in Decryption and one in Encryption. This eliminates
a number of lookup tables, thus reducing the chip resources used.
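The shift-and-XOR scheme can be checked with a short software model. The following Python sketch is illustrative only: mulbyx is named after the Viva object of the same name but is not derived from it, and mul_const is a hypothetical helper that handles the four-bit coefficients that occur in the two polynomials.

```python
def mulbyx(b):
    """Multiply a byte by x ('02'): left shift, then XOR with '1B' on carry."""
    shifted = b << 1
    carry = shifted & 0x100
    shifted &= 0xFF
    return shifted ^ 0x1B if carry else shifted


def mul_const(coeff, b):
    """Multiply b by a coefficient below 0x10 by splitting it into powers of two."""
    result = 0
    power = b                       # holds b * 2^k for k = 0, 1, 2, 3
    for k in range(4):
        if coeff & (1 << k):
            result ^= power         # addition in GF(2^8) is XOR
        power = mulbyx(power)
    return result


# '09' = '08' XOR '01': three successive doublings XORed with the value itself.
times_09 = mulbyx(mulbyx(mulbyx(0x57))) ^ 0x57
```

For example, multiplying ‘57’ by ‘02’ gives ‘AE’ with no carry, while doubling ‘AE’ overflows and yields ‘47’ after the XOR with ‘1B’.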

The left shift in Viva is implemented using the RCL objects available in the corelib

library. The carryover is fed as an input to the Mux to do the conditional XOR. The

irreducible polynomial with which the result of the shift is XORed is given as a constant.

The object is named mulbyx.

Fig 20: Multiplication by x or ‘02’

The object cmmix is used to multiply a column with the polynomial to produce one

coefficient of the result. The multiplication with the polynomial in Encryption is

implemented as follows.

Fig 21: cmmix object

The complete multiplication of the polynomial with the column of the State is

implemented by shifting the column and passing it as an input to the cmmix object. The

object corresponding to that is the cmmix4 object. The MixColumn transformation is

accomplished by using four cmmix4 objects in parallel.
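The relationship between cmmix and cmmix4 can be illustrated as follows. This Python sketch assumes that cmmix computes ‘02’·a0 XOR ‘03’·a1 XOR a2 XOR a3 for a column (a0, a1, a2, a3) and that cmmix4 feeds it the column rotated by one position each time; the function bodies are a software model with hypothetical signatures, not a transcription of the Viva objects.

```python
def mulbyx(b):
    """Multiply a byte by '02' in GF(2^8): shift left, XOR '1B' on carry."""
    shifted = (b << 1) & 0xFF
    return shifted ^ 0x1B if b & 0x80 else shifted


def cmmix(column):
    """Produce one output byte: 02*a0 XOR 03*a1 XOR a2 XOR a3."""
    a0, a1, a2, a3 = column
    return mulbyx(a0) ^ (mulbyx(a1) ^ a1) ^ a2 ^ a3


def cmmix4(column):
    """Transform a whole column by applying cmmix to each rotation of it."""
    return [cmmix(column[i:] + column[:i]) for i in range(4)]


def mix_column(state_columns):
    """MixColumn on the full State: one cmmix4 per column."""
    return [cmmix4(col) for col in state_columns]


example = cmmix4([0xDB, 0x13, 0x53, 0x45])
```

Each output byte needs only one doubling per term, which is why the longest chain of successive shifts in Encryption is one.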

Since the use of arithmetic to do the MixColumn transformation reduces the silicon

usage, a full-fledged parallel implementation can be done in the round. Previously, in

case of lookup tables, both MixColumn and ByteSub stages were iterated once in

order to make two-and-one-half rounds fit on a single chip. But in that case much of the

chip is used in the MixColumn stage due to the excessive usage of lookup tables. When

these tables are eliminated, a round needs only a little more than a tenth of a chip in case

of Encryption when implemented with no iteration.

Fig 22: Design of round in Encryption

Fig 23: Design of round in Decryption

Due to enormous synthesis times, however, the whole algorithm could not be synthesized

onto one chip. Therefore, we attempted to use two chips by placing five rounds on each

chip. Although the synthesis completed, the design did not produce correct output.

Debugging was difficult as the synthesis time was about two days.

Viva 2.3:

The initial problem with Viva 2.3 was that it did not handle files for constants. The initial

work-around proposed by Star Bridge would have required relabelling all the input horns.

Since there were 256 such horns in our initial design, this was viewed as an unacceptable

“solution.” We therefore decided to import an EDIF file generated by a VHDL

implementation. A single lookup table done in this manner took 72 slices, compared to

the 160 slices taken previously by a Viva object. Given that we had 200 lookup tables in

the entire implementation, the silicon usage was reduced by 17,600 slices, and as a result

the whole algorithm synthesized into less than half of one chip.
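The saving quoted above follows directly from the per-table figures. As a quick check in Python, using only the numbers given in the text (the variable names are hypothetical):

```python
LOOKUP_TABLES = 200        # lookup tables in the entire implementation
VIVA_SLICES = 160          # slices per lookup table built as a Viva object
VHDL_SLICES = 72           # slices per lookup table imported as EDIF from VHDL

slices_saved = LOOKUP_TABLES * (VIVA_SLICES - VHDL_SLICES)
```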

The implementation of the lookup table in VHDL is done using an array. The EDIF file is

generated using the fc2 compiler. This EDIF file is ported into Viva using a script written

by Heather A. Wake [25]. There are some problems with the EDIFs generated using

Synopsys, but these problems did not appear in this particular use of the Synopsys tool.

Chapter 5: Results and Conclusions

Results on VHDL:

We used ModelSim [15] to simulate the algorithm and the Xilinx ISE tools [26] to
synthesize the code. The results for independent blocks are tabulated below. The
synthesis has been done for a Virtex2 device xc2v6000, package ff1152, speed -4.
The par statistics are generated by the Xilinx tools.

Entity          Slices   Percentage   IOB’s   Max Pin Delay
Lookup Table        68            1      16        6.932 ns
round_sbox        1088            3     254       16.324 ns
shiftrow             0            0     256        8.871 ns
mixcolumn         5056           14     272       25.510 ns
roundkey           128            1     384        9.989 ns
keyschedule        356            1     272        9.486 ns
lround            1216            3     384       14.089 ns
round             6150           18     400       22.888 ns

Table 3: Par Report

Based on these results, each round would require 6150 slices. If we did a completely
parallel implementation inside each round and used a different instance for each round so
that we could stream the data and pipeline different blocks of data, the total number of
slices would be 9 * 6150 + 1216 = 56566 slices, plus some overhead for data movement.
Given that one xc2v6000 chip contains 33792 slices, the whole algorithm when done in
parallel using lookup tables should be implementable on two chips, provided that the
synthesis tools in Viva are as efficient as the standard tools. If Viva were only half as
efficient at synthesis as the standard tools, then the two chips with standard synthesis
might expand to four chips using Viva and still be feasible on the four-chip HC 36m. The
results proved otherwise, however, as will be explained later in the chapter.
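The chip estimate above can be reproduced with a few lines of arithmetic. The Python sketch below uses only figures from the par report and the text; the function chips_needed and its efficiency parameter are hypothetical, and the overhead for data movement is ignored.

```python
import math

ROUND_SLICES = 6150        # one full round (par report)
LROUND_SLICES = 1216       # last round, without mixcolumn (par report)
CHIP_SLICES = 33792        # slices available on one chip

total_slices = 9 * ROUND_SLICES + LROUND_SLICES


def chips_needed(slices, efficiency=1.0):
    """Chips required when the synthesis tool is `efficiency` times as good as ISE."""
    return math.ceil(slices / efficiency / CHIP_SLICES)


chips_standard = chips_needed(total_slices)           # Viva as efficient as ISE
chips_half = chips_needed(total_slices, 0.5)          # Viva half as efficient
```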

Results on Viva:

Viva uses muxing to transfer data from the host to the chip and from the FPGA chip back

to the host. Since the input and output are large (around 256 bits for many of the designs),

there will be some overhead in terms of slices for moving the data from the host to the

chip. All the slice numbers listed below include overhead for input and output data

transfer.

Results for Iterative Architectures:

The following are the results for the stages and rounds of the iterative versions done
initially. All the results were obtained using a 25 ns clock.

Block                          Slices   Clock Cycles
ByteSub                           794             50
MixColumn                         983            392
Round Key Addition                329              1
Key Schedule                     2895            130
Encryption round (iterative)     2082            444
Decryption round (iterative)     1715            433
Encryption round (expanded)      2050            444
Decryption round (expanded)      1592            433

Table 4: Results of round and stages in Iterative approach

In the table, “iterative” and “expanded” refer to the loop on the round: “iterative”
means the round object is iterated in a loop, and “expanded” means that loop is
unrolled.

Results of Expanded Architectures:

The following are the results for the stages and rounds used in the expanded
architectures. Results for stages that use lookup tables are given for both Viva lookup
tables and VHDL imports. The reason for doing VHDL imports is that Viva 2.3 does not
yet support File constants as previous versions did. However, importing a VHDL
module for lookup tables proved to be very advantageous in terms of silicon as well as in
making designs work. All the results use a 15 ns clock. Since no design has any iteration,
they can all be completed in one clock cycle.

Block                  Comments                                                            Slices
ByteSub                Viva lookup tables                                                    2559
ByteSub                VHDL lookup tables                                                    1174
MixColumn              Using arithmetic                                                       342
InvMixColumn           Using arithmetic                                                       651
Round Key Addition     Parallel XOR gates                                                     329
Key Schedule           For one set of keys                                                    481
Encryption round *     1 iteration, multiplication using arithmetic, Viva lookup tables     12564
Encryption round       Multiplication using arithmetic, Viva lookup tables                   3592
Decryption round 433   Multiplication using arithmetic, Viva lookup tables                   3497
Encryption round       Multiplication using arithmetic, VHDL lookup tables                   1883
Decryption round 433   Multiplication using arithmetic, VHDL lookup tables                   1866

Table 5: Results of round and stages in Non-Iterative approach

The Encryption round done with iteration (denoted with an asterisk in the table) takes
four clock cycles. A single object of this round synthesized with no problem in Viva,
requiring 37% of the chip. Since we cannot fit three complete rounds on a single chip, we
planned to split two rounds into halves to accommodate all ten rounds on the available
four chips. Considering that each round was taking 37%, which also includes the slices
for input and output, the whole 2-1/2 rounds should have taken about 90% of the chip.

However, when we tried to synthesize two rounds on a single chip, Viva was unable to do

the synthesis, responding with an out of memory error. The diagnosis from Star

Bridge Systems was that Viva was running out of memory in search of XOR gates, which

were used extensively in the design. For this reason, this architecture cannot be expanded

for the complete algorithm.

Architectures:

Architecture   Module        Inside Stages   On the Round    Multiplication
A1             Encryption    Iterative       Iterative       Lookup
A2             Decryption    Iterative       Iterative       Lookup
A3             Encryption    Iterative       Non Iterative   Lookup
A4             Decryption    Iterative       Non Iterative   Lookup
A5             Encryption    One Iteration   Non Iterative   Lookup
A6             Enc 1 chip    Non Iterative   Non Iterative   Arithmetic
A7             Dec 1 chip    Non Iterative   Non Iterative   Arithmetic
A8             Enc 2 chips   Non Iterative   Non Iterative   Arithmetic

Table 6: Architectures implemented on HC 36m

The architectures A1 to A4 were done with iterations inside the stages. The A5
architecture was actually aimed at implementing the architecture used in VHDL, to
compare the resource usage and timing. But since the synthesis tool in Viva is not as
efficient as standard synthesis tools, the algorithm cannot be implemented without
iterations. Worse yet, we could not complete the full algorithm in this architecture, since
Viva failed to synthesize more than one round on a single chip (even though one round
takes much less than half a chip).

Architecture   Slices   Clock cycles   Comments
A1               2285           4069   Works on Viva 2.2 but not on Viva 2.3
A2               4656           4480   Works both on Viva 2.2 and 2.3
A3              16393           4056   Works on Viva 2.2 but not on Viva 2.3
A4              14395           4077   Works both on Viva 2.2 and 2.3
A5                ---            ---   Only one iteration works on Viva 2.2.
A6              15470              1   Does not synthesize in Viva 2.2 but by replacing the Viva lookup tables with VHDL lookup tables synthesized in Viva 2.3
A7              18653              1   Does not synthesize in Viva 2.2 but by replacing the Viva lookup tables with VHDL lookup tables synthesized in Viva 2.3
A8                ---            ---   Synthesizes on Viva 2.2 but does not give correct results. Not required on Viva 2.3

Table 7: Results of various architectures in Viva2.2 and Viva2.3

The architecture A8 was implemented when A6 and A7 failed to synthesize in Viva 2.2.
Considering that a single round of this architecture took around 10% of a chip,
the whole algorithm might be synthesizable on a single chip if we consider the overhead
for input and output. For the Viva 2.3 implementation, the lookup tables were replaced by
VHDL modules. Since the architectures A6 and A7 synthesized on Viva 2.3, the
architecture A8 was not tested on Viva 2.3.

It has been a source of great frustration that we have not been able to test Viva on a

reasonable full AES design. Based on the synthesis of parts of AES using Viva and on

the synthesis of part and all of AES using standard synthesis tools, there should be no

fundamental obstacle to a complete AES implementation on the HC 36m. However, the

use of Viva to implement AES in its entirety will have to wait for a later and corrected

version of the software.

Throughput:

The problem with calculating the throughput of all the architectures on the HC 36m is the

inability of Viva to support what Star Bridge Systems refers to as FILE I/O, the transfer

of data from and to files on the host through the HC 36m hardware. Also, the hardware is

presently limited to a very slow speed due to the use of a rather primitive core doing the

communication on the PCIX bus.

But if we consider the core itself as we have implemented it, rather than the
limitations of the machine on which it is implemented, we would achieve a significant
increase in throughput in the non-iterative architectures over the basic iterative
architectures.

Architecture   Throughput (Gbps)   Clock frequency (MHz)
A1                        0.0012                      40
A2                        0.0011                      40
A6                        8.5334                      66
A7                        8.5334                      66

Table 8: Throughput for different architectures

The throughputs listed in the table for architectures A6 and A7 do not reflect their actual
speeds, since the HC 36m cannot be run faster than 66 MHz. In order to get an estimate of
the actual throughput, we decided to run both A6 and A7 on a single chip, routing the
output of A6 to A7 without any intermediate registers. The design took 33,790 slices, two
fewer than the total slices available on a single chip, and the design ran at a 15 ns clock.
Theoretically, then, A6 and A7 should each have no more than an 8 ns delay. Based on
this, the throughput of A6 and A7 can be estimated at 16 Gbps at a 125 MHz clock
frequency.
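The throughput figures follow from the block size, the clock and the number of cycles per block. The Python sketch below shows the calculation under the assumption that each block is 128 bits and that the cycle counts are those reported for the architectures; the function name is hypothetical.

```python
def throughput_gbps(clock_mhz, cycles_per_block, block_bits=128):
    """Throughput in Gbps for one block of block_bits bits every cycles_per_block cycles."""
    blocks_per_second = clock_mhz * 1e6 / cycles_per_block
    return block_bits * blocks_per_second / 1e9


a1 = throughput_gbps(40, 4069)            # iterative Encryption, 25 ns clock
a2 = throughput_gbps(40, 4480)            # iterative Decryption, 25 ns clock
a6 = throughput_gbps(1000 / 15, 1)        # one block per 15 ns clock cycle
estimate = throughput_gbps(125, 1)        # one block per 8 ns clock cycle
```

A 15 ns clock corresponds to about 66.7 MHz, which gives roughly 8.53 Gbps for one block per cycle, while the 8 ns estimate gives 16 Gbps.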

Comparisons:

In any demonstration of technology, it is necessary to compare new results against those

already achieved by others. Listed below are some of the other commercial and

academic implementations of AES done on Virtex chips in a non-iterative approach.

Design Device Throughput Slices BRAMs Frequency

P. Chodowiec et al [2] Virtex XCV1000 -6 12.16 12600 80 95

SIG-AES-E [13] Virtex-E XCV1000E -8 16.54 11719 0 129.2

SIG-AES-E [13] Virtex-II XC2V2000 -5 17.80 10750 0 139.1

Helion, Pipelined [12] unknown >16 unknown

Kris Gaj [11] Virtex-E XCV1000E -8 16.00 9199 80 134.5

North Pole Engg. [18] unknown 12.8 5840 160 100

M. McLoone et al [14] Virtex-E XCV812 -8 6.95 2222 100 54.35

Table 9: Other Implementations on the Virtex family chips

The throughput above is listed in Gbps and the clock frequency in MHz. The SIG
implementations above do the GF(2^8) arithmetic using a quadratic extension of the field
GF(2^4); the authors claim a significant improvement in the ability of the synthesis tools to
extract efficient logic. Although all the implementations listed above exploit the
parallelism and pipelining inherent in the algorithm, they differ in many aspects, making
it difficult to obtain a straightforward comparison with our implementation.

One difficulty is the amount of pipelining, which directly affects the throughput;
our implementation has no pipelining. Second, the chips on which they are
implemented differ in the number of slices, maximum frequency, Block RAMs, and so
forth. Another aspect to be considered is the synthesis tool, since Viva’s synthesis
cannot be expected to compete with more established commercial synthesis tools in
terms of efficiency.

Finally, the throughput figures for our implementation are estimates that have not been
tested in practice, which makes comparisons more difficult and highlights testing as an
important part of future work.

Future Work:

The architectures above can be further enhanced for varying bit lengths of Key. All the

architectures above can be directly used for an ECB (Electronic Code Book) mode of

encryption. Since the ECB mode is not regarded as secure compared to the other modes

such as CBC (Cipher Block Chaining), OFB (Output Feedback) and Counter mode, it

would be good if we could embed some of these modes into the architecture and give the

user flexibility in terms of security required as well as data rate. One more enhancement

would be to combine the encryption and decryption cores into a single core, thus making

it possible to shift between encryption and decryption by means of a select bit. Though

much of the silicon cannot be reused in the round, the Key Schedule is the same for

Encryption and Decryption.

If we consider a strictly FPGA implementation instead of an HC 36m implementation,

the first order of business would be to ascertain the actual throughputs of the existing

cores. It may be useful to try to produce an efficient AES core that took less silicon and

yet had a considerable throughput. Since we have already looked into different

approaches for the implementations of different transformations in Rijndael, it would be

easy to try different architectures for cheaper implementation in terms of silicon.

References

[1] Kazumaro Aoki and Helger Lipmaa, Fast Implementations of AES Candidates,
The Third Advanced Encryption Standard Candidate Conference, New York,
NY, April 13-14, 2000.

[2] P. Chodowiec, P. Khuon, and K. Gaj, Fast Implementations of Secret-Key

Block Ciphers Using Mixed Inner- and Outer-Round Pipelining, ACM/SIGDA

Ninth International Symposium on Field Programmable Gate Arrays,

Monterey, California, February 11-13, 2001.

[3] J. Daemen and V. Rijmen. The Design of Rijndael: AES- The Advanced

Encryption Standard (Information Security and Cryptography). Springer

Verlag, Berlin, 2001.

[4] Joan Daemen and Vincent Rijmen. AES Proposal: Rijndael, Mar 09, 1999,
<http://csrc.nist.gov/CryptoToolkit/aes/rijndael/Rijndael.pdf>, as referenced on
Nov 15th, 2003.

[5] Joan Daemen and Vincent Rijmen, Rijndael for AES, AES Candidate
Conference, NY, April 13-14, 2000.

[6] J. Daemen and V. Rijmen, AES Public Comment from the Rijndael Team,
1999, <http://csrc.nist.gov/CryptoToolkit/aes/round1/comments/990414-jdaemen.pdf>,
as referenced on Nov 15th, 2003.

[7] A.J. Elbirt, W. Yip, B. Chetwynd and C. Paar, An FPGA Implementation and
Performance Evaluation of the AES Block Cipher Candidate Algorithm
Finalists, The Third Advanced Encryption Standard (AES3) Candidate
Conference, New York, NY, April 13-14, 2000.

[8] Brian Gladman’s Homepage, Implementations of AES, as referenced on Nov
15th 2003, <http://fp.gladman.plus.com/cryptography_technology/rijndael/>

[9] K. Gaj and P. Chodowiec. Fast implementation and fair comparison of the final

candidates for Advanced Encryption Standard using Field Programmable Gate

Arrays. Proc. RSA Security Conference, Cryptographer's Track, San

Francisco, April 9, 2001.

[10] K.Gaj and P.Chodowiec, Comparison of the hardware performance of the

AES candidates using reconfigurable hardware, Third Advanced Encryption

Standard (AES) Candidate Conference, New York, NY, April 13-14, 2000.


[11] Kris Gaj’s Website, Implementation of AES cores, <http://ece.gmu.edu/crypto/rijndael.htm>, as referenced on Nov 15th, 2003.

[12] Helion Technologies Inc., Website, AES (Rijndael) Cores, <http://www.heliontech.com/core2.htm>, as referenced on Nov 15th, 2003.

[13] Kimmo U. Jarvinen, Matti T. Tommiska and Jorma O. Skytta, A Fully Pipelined Memoryless 17.8 Gbps AES-128 Encryptor, International Symposium on Field Programmable Gate Arrays, Monterey, California, February 23-25, 2003.

[14] Maire McLoone and J.V. McCanny, High Performance Single-Chip FPGA Rijndael Algorithm Implementations, Workshop on Cryptographic Hardware and Embedded Systems, May 14-16, 2001.

[15] ModelSim, Inc., website, <http://www.model.com/>, as referenced on Nov 15th, 2003.

[16] S. Murphy and M.J.B. Robshaw, Essential Algebraic Structure within the AES, Information Security Group, University of London, Surrey, U.K., 2002.

[17] National Institute for Standards and Technology, AES Home Page, <http://csrc.nist.gov/CryptoToolkit/aes>, as referenced on Nov 15th, 2003.


[18] North Pole Engineering, Inc., Website, AES Core User’s Manual, <http://www.northpoleengineering.com/Documents/AES%20Manual.pdf>, as referenced on Nov 15th, 2003.

[19] Rijmen’s personal page, <http://www.esat.kuleuven.ac.be/~rijmen/rijndael>, as referenced on Nov 15th, 2003.

[20] Star Bridge Systems, Inc., web site, <http://www.starbridgesystems.com>, as referenced on Nov 15th, 2003.

[21] Star Bridge Systems, Viva Tutorials, <http://www.starbridgesystems.com/support/tutorials.html>, as referenced on Nov 15th, 2003.

[22] Star Bridge Systems, HC36m, <http://www.starbridgesystems.com/products/hc36.html>, as referenced on Nov 15th, 2003.

[23] The Rijndael official page, <http://www.rijndael.com/>, as referenced on Nov 15th, 2003.

[24] Bryan Weeks, Mark Bean, Tom Rozylowicz and Chris Ficke, Hardware Performance Simulations of Round 2 Advanced Encryption Standard Algorithms, AES Candidate Conference, New York, NY, April 13-14, 2000.


[25] Heather A. Wake, Translating EDIF using Perl, CSE Department Technical Report, University of South Carolina, 2003.

[26] Xilinx Inc., Xilinx ISE, <http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=ISE+Foundation>, as referenced on Nov 15th, 2003.


Appendix A: C code used for testing [3]

#include <stdio.h>

typedef unsigned char word8;
typedef unsigned int word32;

word8 Logtable[256] ={0, 0, 25, 1, 50, 2, 26, 198, 75, 199, 27, 104, 51, 238, 223, 3, 100, 4, 224, 14, 52, 141, 129, 239, 76, 113, 8, 200, 248, 105, 28, 193, 125, 194, 29, 181, 249, 185, 39, 106, 77, 228, 166, 114, 154, 201, 9, 120, 101, 47, 138, 5, 33, 15, 225, 36, 18, 240, 130, 69, 53, 147, 218, 142, 150, 143, 219, 189, 54, 208, 206, 148, 19, 92, 210, 241, 64, 70, 131, 56,102, 221, 253, 48, 191, 6, 139, 98, 179, 37, 226, 152, 34, 136, 145, 16,126, 110, 72, 195, 163, 182, 30, 66, 58, 107, 40, 84, 250, 133, 61, 186, 43, 121, 10, 21, 155, 159, 94, 202, 78, 212, 172, 229, 243, 115, 167, 87,175, 88, 168, 80, 244, 234, 214, 116, 79, 174, 233, 213, 231, 230, 173, 232,44, 215, 117, 122, 235, 22, 11, 245, 89, 203, 95, 176, 156, 169, 81, 160,127, 12, 246, 111, 23, 196, 73, 236, 216, 67, 31, 45, 164, 118, 123,183,204, 187, 62, 90, 251, 96, 177, 134, 59, 82, 161, 108, 170, 85, 41, 157,151, 178, 135, 144, 97, 190, 220, 252, 188, 149, 207, 205, 55, 63, 91, 209, 83, 57, 132, 60, 65, 162, 109, 71, 20, 42, 158, 93, 86, 242, 211, 171, 68, 17, 146, 217, 35, 32, 46, 137, 180, 124, 184, 38, 119, 153, 227, 165, 103, 74, 237, 222, 197, 49, 254, 24, 13, 99, 140, 128, 192, 247, 112, 7};

word8 Alogtable[256] = { 1, 3, 5, 15, 17, 51, 85, 255, 26, 46, 114, 150, 161, 248, 19, 53, 95, 225, 56, 72, 216, 115, 149, 164, 247, 2, 6, 10, 30, 34, 102, 170, 229, 52, 92, 228, 55, 89, 235, 38, 106, 190, 217, 112, 144, 171, 230, 49, 83, 245, 4, 12, 20, 60, 68, 204, 79, 209, 104, 184, 211, 110, 178, 205, 76, 212, 103, 169, 224, 59, 77, 215, 98, 166, 241, 8, 24, 40, 120, 136, 131, 158, 185, 208, 107, 189, 220, 127, 129, 152, 179, 206, 73, 219, 118, 154, 181, 196, 87, 249, 16, 48, 80, 240, 11, 29, 39, 105, 187, 214, 97, 163, 254, 25, 43, 125, 135, 146, 173, 236, 47, 113, 147, 174, 233, 32, 96, 160, 251, 22, 58, 78, 210, 109, 183, 194, 93, 231, 50, 86, 250, 21, 63, 65, 195, 94, 226, 61, 71, 201, 64, 192, 91, 237, 44, 116, 156, 191, 218, 117, 159, 186, 213, 100, 172, 239, 42, 126, 130, 157, 188, 223, 122, 142, 137, 128, 155, 182, 193, 88, 232, 35, 101, 175, 234, 37, 111, 177, 200, 67, 197, 84, 252, 31, 33, 99, 165, 244, 7, 9, 27, 45, 119, 153, 176, 203, 70, 202, 69, 207, 74, 222, 121, 139, 134, 145, 168, 227, 62, 66, 198, 81, 243, 14, 18, 54, 90, 238, 41, 123, 141, 140, 143, 138, 133, 148, 167, 242, 13, 23, 57, 75, 221, 124, 132, 151, 162, 253, 28, 36, 108, 180, 199, 82, 246, 1, };


word8 S[256] = { 99, 124, 119, 123, 242, 107, 111, 197, 48, 1, 103, 43, 254, 215, 171, 118, 202, 130, 201, 125, 250, 89, 71, 240, 173, 212, 162, 175, 156, 164, 114, 192, 183, 253, 147, 38, 54, 63, 247, 204, 52, 165, 229, 241, 113, 216, 49, 21, 4, 199, 35, 195, 24, 150, 5, 154, 7, 18, 128, 226, 235, 39, 178, 117, 9, 131, 44, 26, 27, 110, 90, 160, 82, 59, 214, 179, 41, 227, 47, 132, 83, 209, 0, 237, 32, 252, 177, 91, 106, 203, 190, 57, 74, 76, 88, 207, 208, 239, 170, 251, 67, 77, 51, 133, 69, 249, 2, 127, 80, 60, 159, 168, 81, 163, 64, 143, 146, 157, 56, 245, 188, 182, 218, 33, 16, 255, 243, 210, 205, 12, 19, 236, 95, 151, 68, 23, 196, 167, 126, 61, 100, 93, 25, 115, 96, 129, 79, 220, 34, 42, 144, 136, 70, 238, 184, 20, 222, 94, 11, 219, 224, 50, 58, 10, 73, 6, 36, 92, 194, 211, 172, 98, 145, 149, 228, 121, 231, 200, 55, 109, 141, 213, 78, 169, 108, 86, 244, 234, 101, 122, 174, 8, 186, 120, 37, 46, 28, 166, 180, 198, 232, 221, 116, 31, 75, 189, 139, 138, 112, 62, 181, 102, 72, 3, 246, 14, 97, 53, 87, 185, 134, 193, 29, 158, 225, 248, 152, 17, 105, 217, 142, 148, 155, 30, 135, 233, 206, 85, 40, 223, 140, 161, 137, 13, 191, 230, 66, 104, 65, 153, 45, 15, 176, 84, 187, 22, };

word8 Si[256] = { 82, 9, 106, 213, 48, 54, 165, 56, 191, 64, 163, 158, 129, 243, 215, 251, 124, 227, 57, 130, 155, 47, 255, 135, 52, 142, 67, 68, 196, 222, 233, 203, 84, 123, 148, 50, 166, 194, 35, 61, 238, 76, 149, 11, 66, 250, 195, 78, 8, 46, 161, 102, 40, 217, 36, 178, 118, 91, 162, 73, 109, 139, 209, 37, 114, 248, 246, 100, 134, 104, 152, 22, 212, 164, 92, 204, 93, 101, 182, 146, 108, 112, 72, 80, 253, 237, 185, 218, 94, 21, 70, 87, 167, 141, 157, 132, 144, 216, 171, 0, 140, 188, 211, 10, 247, 228, 88, 5, 184, 179, 69, 6, 208, 44, 30, 143, 202, 63, 15, 2, 193, 175, 189, 3, 1, 19, 138, 107, 58, 145, 17, 65, 79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115, 150, 172, 116, 34, 231, 173, 53, 133, 226, 249, 55, 232, 28, 117, 223, 110, 71, 241, 26, 113, 29, 41, 197, 137, 111, 183, 98, 14, 170, 24, 190, 27, 252, 86, 62, 75, 198, 210, 121, 32, 154, 219, 192, 254, 120, 205, 90, 244, 31, 221, 168, 51, 136, 7, 199, 49, 177, 18, 16, 89, 39, 128, 236, 95, 96, 81, 127, 169, 25, 181, 74, 13, 45, 229, 122, 159, 147, 201, 156, 239, 160, 224, 59, 77, 174, 42, 245, 176, 200, 235, 187, 60, 131, 83, 153, 97, 23, 43, 4, 126, 186, 119, 214, 38, 225, 105, 20, 99, 85, 33, 12, 125, };

word32 RC[30] = {
    0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36, 0x6C, 0xD8, 0xAB,
    0x4D, 0x9A, 0x2F, 0x5E, 0xBC, 0x63, 0xC6, 0x97, 0x35, 0x6A, 0xD4, 0xB3, 0x7D, 0xFA, 0xEF, 0xC5
};

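#define MAXBC 8
#define MAXKC 8
#define MAXROUNDS 14

/* Row shift offsets, indexed by BC-4 and by row number. */
static word8 shifts[5][4] = {
    {0, 1, 2, 3},
    {0, 1, 2, 3},
    {0, 1, 2, 3},
    {0, 1, 2, 4},
    {0, 1, 3, 4}
};

/* Number of rounds, indexed by KC-4 and BC-4. */
static int numrounds[5][5] = {
    {10, 11, 12, 13, 14},
    {11, 11, 12, 13, 14},
    {12, 12, 12, 13, 14},
    {13, 13, 13, 13, 14},
    {14, 14, 14, 14, 14}
};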


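/* Block length, key length (both in 32-bit columns), and round count. */
int BC, KC, ROUNDS;

/* Multiply two elements of GF(2^8) using the log and antilog tables. */
word8 mul(word8 a, word8 b) {
    if (a && b) return Alogtable[(Logtable[a] + Logtable[b]) % 255];
    else return 0;
}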

/* XOR the round key into the state. */
void AddRoundKey(word8 a[4][MAXBC], word8 rk[4][MAXBC]) {
    int i, j;
    for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++) a[i][j] ^= rk[i][j];
}

/* Replace every state byte by its image under the given substitution box. */
void SubBytes(word8 a[4][MAXBC], word8 box[256]) {
    int i, j;
    for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++) a[i][j] = box[a[i][j]];
}

/* Cyclically shift rows 1 to 3 of the state; d = 0 encrypts, otherwise decrypts. */
void ShiftRows(word8 a[4][MAXBC], word8 d) {
    word8 tmp[MAXBC];
    int i, j;
    if (d == 0) {
        for (i = 1; i < 4; i++) {
            for (j = 0; j < BC; j++)
                tmp[j] = a[i][(j + shifts[BC-4][i]) % BC];
            for (j = 0; j < BC; j++) a[i][j] = tmp[j];
        }
    } else {
        for (i = 1; i < 4; i++) {
            for (j = 0; j < BC; j++)
                tmp[j] = a[i][(BC + j - shifts[BC-4][i]) % BC];
            for (j = 0; j < BC; j++) a[i][j] = tmp[j];
        }
    }
}

/* Multiply every state column by the fixed polynomial with coefficients 2, 3, 1, 1. */
void Mixcolumns(word8 a[4][MAXBC]) {
    word8 b[4][MAXBC];
    int i, j;
    for (j = 0; j < BC; j++)
        for (i = 0; i < 4; i++)
            b[i][j] = mul(2, a[i][j])
                    ^ mul(3, a[(i + 1) % 4][j])
                    ^ a[(i + 2) % 4][j]
                    ^ a[(i + 3) % 4][j];
    for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++) a[i][j] = b[i][j];
}

/* Inverse of Mixcolumns, with coefficients 0xe, 0xb, 0xd, 0x9. */
void InvMixColumn(word8 a[4][MAXBC]) {
    word8 b[4][MAXBC];
    int i, j;
    for (j = 0; j < BC; j++)
        for (i = 0; i < 4; i++)
            b[i][j] = mul(0xe, a[i][j])
                    ^ mul(0xb, a[(i + 1) % 4][j])
                    ^ mul(0xd, a[(i + 2) % 4][j])
                    ^ mul(0x9, a[(i + 3) % 4][j]);
    for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++) a[i][j] = b[i][j];
}

/* Expand the cipher key k into the round keys W. */
int KeyExpansion(word8 k[4][MAXKC], word8 W[MAXROUNDS+1][4][MAXBC]) {
    int i, j, t, Rcpointer = 1;
    word8 tk[4][MAXKC];

    for (j = 0; j < KC; j++)
        for (i = 0; i < 4; i++)
            tk[i][j] = k[i][j];
    t = 0;
    /* Copy the cipher key into the round key array. */
    for (j = 0; (j < KC) && (t < (ROUNDS+1)*BC); j++, t++)
        for (i = 0; i < 4; i++) W[t / BC][i][t % BC] = tk[i][j];
    /* Generate new key material until all round keys are filled. */
    while (t < (ROUNDS+1)*BC) {
        for (i = 0; i < 4; i++)
            tk[i][0] ^= S[tk[(i+1)%4][KC-1]];
        tk[0][0] ^= RC[Rcpointer++];
        if (KC <= 6)
            for (j = 1; j < KC; j++)
                for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
        else {
            for (j = 1; j < 4; j++)
                for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
            for (i = 0; i < 4; i++) tk[i][KC/2] ^= S[tk[i][KC/2 - 1]];
            for (j = 5; j < KC; j++)
                for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
        }
        for (j = 0; (j < KC) && (t < (ROUNDS+1)*BC); j++, t++)
            for (i = 0; i < 4; i++) W[t / BC][i][t % BC] = tk[i][j];
    }
    return 0;
}

/* Encrypt the state a in place with the expanded key rk. */
int Encrypt(word8 a[4][MAXBC], word8 rk[MAXROUNDS+1][4][MAXBC]) {
    int r;
    AddRoundKey(a, rk[0]);
    for (r = 1; r < ROUNDS; r++) {
        SubBytes(a, S);
        ShiftRows(a, 0);
        Mixcolumns(a);
        AddRoundKey(a, rk[r]);
    }
    /* The final round has no Mixcolumns step. */
    SubBytes(a, S);
    ShiftRows(a, 0);
    AddRoundKey(a, rk[ROUNDS]);
    return 0;
}

/* Decrypt the state a in place with the expanded key rk. */
int Decrypt(word8 a[4][MAXBC], word8 rk[MAXROUNDS+1][4][MAXBC]) {
    int r;
    AddRoundKey(a, rk[ROUNDS]);
    SubBytes(a, Si);
    ShiftRows(a, 1);
    for (r = ROUNDS-1; r > 0; r--) {
        AddRoundKey(a, rk[r]);
        InvMixColumn(a);
        SubBytes(a, Si);
        ShiftRows(a, 1);
    }
    AddRoundKey(a, rk[0]);
    return 0;
}

int main() {
    int i, j;
    word8 a[4][MAXBC], rk[MAXROUNDS+1][4][MAXBC], sk[4][MAXKC];

    for (KC = 4; KC <= 8; KC++)
        for (BC = 4; BC <= 8; BC++) {
            ROUNDS = numrounds[KC-4][BC-4];
            /* Encrypt an all-zero block under an all-zero key. */
            for (j = 0; j < BC; j++)
                for (i = 0; i < 4; i++) a[i][j] = 0;
            for (j = 0; j < KC; j++)
                for (i = 0; i < 4; i++) sk[i][j] = 0;
            KeyExpansion(sk, rk);
            Encrypt(a, rk);
            printf("block length %d key length %d\n", 32*BC, 32*KC);
            for (j = 0; j < BC; j++)
                for (i = 0; i < 4; i++) printf("%02d ", a[i][j]);
            printf("\n");
            /* Decrypt and print the first four columns of the recovered block. */
            Decrypt(a, rk);
            for (j = 0; j < 4; j++)
                for (i = 0; i < 4; i++) printf("%02d", a[i][j]);
            printf("\n\n");
        }
    return 0;
}
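The Logtable and Alogtable arrays used by mul() can be cross-checked without retyping them. The following is a minimal sketch, not part of the code taken from [3]. It assumes the tables hold logarithms and antilogarithms to the base 3 in GF(2^8) with the AES reduction polynomial 0x11B, an assumption consistent with the first entries 1, 3, 5, 15, 17 of Alogtable. The names gf_mul_slow, build_tables, gf_mul_table and tables_agree are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t alog_tab[256];  /* alog_tab[i] = 3^i in GF(2^8) */
static uint8_t log_tab[256];   /* log_tab[x] = i with 3^i = x; entry 0 is unused */

/* Shift-and-XOR multiplication in GF(2^8), reducing by 0x11B. */
uint8_t gf_mul_slow(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Regenerate both tables from successive powers of 3. */
void build_tables(void) {
    uint8_t x = 1;
    int i;
    for (i = 0; i < 255; i++) {
        alog_tab[i] = x;
        log_tab[x] = (uint8_t)i;
        x = gf_mul_slow(x, 3);
    }
    assert(x == 1);              /* 3 has multiplicative order 255 */
    alog_tab[255] = 1;           /* same wrap-around entry as the listing */
    log_tab[0] = 0;
}

/* Table-based multiply, written like mul() in the listing. */
uint8_t gf_mul_table(uint8_t a, uint8_t b) {
    if (a && b) return alog_tab[(log_tab[a] + log_tab[b]) % 255];
    return 0;
}

/* Returns 1 when both multipliers agree on all 65536 operand pairs. */
int tables_agree(void) {
    int a, b;
    build_tables();
    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            if (gf_mul_table((uint8_t)a, (uint8_t)b) != gf_mul_slow((uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}
```

Calling tables_agree() from a small driver regenerates the tables and compares the two multipliers; the regenerated arrays can then be compared entry by entry with the listing.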


Appendix B: VHDL Implementation

packages.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

package int_types is
    subtype large_int is integer range 0 to 255;
end package;

library ieee;
use ieee.std_logic_arith.all;
use ieee.std_logic_1164.all;
use work.int_types.all;

package key_types is
    type keyarray is array (0 to 15) of large_int;
end package;

sbox_ram.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_sbox is
    port( ma : in large_int; mb : out large_int);
end entity;

architecture behav of RAM_sbox is
begin
    process (ma)
        type box is array (0 to 255) of large_int;
        constant s : box := (99, 124, 119, 123, 242, 107, 111, 197, 48, 1, 103, 43, 254, 215, 171, 118, 202, 130, 201, 125, 250, 89, 71, 240, 173, 212, 162, 175, 156, 164, 114, 192, 183, 253, 147, 38, 54, 63, 247, 204, 52, 165, 229, 241, 113, 216, 49, 21, 4, 199, 35, 195, 24, 150, 5, 154, 7, 18, 128, 226, 235, 39, 178, 117, 9, 131, 44, 26, 27, 110, 90, 160, 82, 59, 214, 179, 41, 227, 47, 132, 83, 209, 0, 237, 32, 252, 177, 91, 106, 203, 190, 57, 74, 76, 88, 207, 208, 239, 170, 251, 67, 77, 51, 133, 69, 249, 2, 127, 80, 60, 159, 168, 81, 163, 64, 143, 146, 157, 56, 245, 188, 182, 218, 33, 16, 255, 243, 210, 205, 12, 19, 236, 95, 151, 68, 23, 196, 167, 126, 61, 100, 93, 25, 115, 96, 129, 79, 220, 34, 42, 144, 136, 70, 238, 184, 20, 222, 94, 11, 219, 224, 50, 58, 10, 73, 6, 36, 92, 194, 211, 172, 98, 145, 149, 228, 121, 231, 200, 55, 109, 141, 213, 78, 169, 108, 86, 244, 234, 101, 122, 174, 8, 186, 120, 37, 46, 28, 166, 180, 198, 232, 221, 116, 31, 75, 189, 139, 138, 112, 62, 181, 102, 72, 3, 246, 14, 97, 53, 87, 185, 134, 193, 29, 158, 225, 248, 152, 17, 105, 217, 142, 148, 155, 30, 135, 233, 206, 85, 40, 223, 140, 161, 137, 13, 191, 230, 66, 104, 65, 153, 45, 15, 176, 84, 187, 22);
    begin
        mb <= s(ma);
    end process;
end architecture;

logtable_ram


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_logtable is
    port( ma : in large_int; mb : out large_int);
end entity;

architecture behav of RAM_logtable is
begin
    process (ma)
        type box is array (0 to 255) of large_int;
        constant logtable : box := (0, 0, 25, 1, 50, 2, 26, 198, 75, 199, 27, 104, 51, 238, 223, 3, 100, 4, 224, 14, 52, 141, 129, 239, 76, 113, 8, 200, 248, 105, 28, 193, 125, 194, 29, 181, 249, 185, 39, 106, 77, 228, 166, 114, 154, 201, 9, 120, 101, 47, 138, 5, 33, 15, 225, 36, 18, 240, 130, 69, 53, 147, 218, 142, 150, 143, 219, 189, 54, 208, 206, 148, 19, 92, 210, 241, 64, 70, 131, 56, 102, 221, 253, 48, 191, 6, 139, 98, 179, 37, 226, 152, 34, 136, 145, 16, 126, 110, 72, 195, 163, 182, 30, 66, 58, 107, 40, 84, 250, 133, 61, 186, 43, 121, 10, 21, 155, 159, 94, 202, 78, 212, 172, 229, 243, 115, 167, 87, 175, 88, 168, 80, 244, 234, 214, 116, 79, 174, 233, 213, 231, 230, 173, 232, 44, 215, 117, 122, 235, 22, 11, 245, 89, 203, 95, 176, 156, 169, 81, 160, 127, 12, 246, 111, 23, 196, 73, 236, 216, 67, 31, 45, 164, 118, 123, 183, 204, 187, 62, 90, 251, 96, 177, 134, 59, 82, 161, 108, 170, 85, 41, 157, 151, 178, 135, 144, 97, 190, 220, 252, 188, 149, 207, 205, 55, 63, 91, 209, 83, 57, 132, 60, 65, 162, 109, 71, 20, 42, 158, 93, 86, 242, 211, 171, 68, 17, 146, 217, 35, 32, 46, 137, 180, 124, 184, 38, 119, 153, 227, 165, 103, 74, 237, 222, 197, 49, 254, 24, 13, 99, 140, 128, 192, 247, 112, 7);
    begin
        mb <= logtable(ma);
    end process;
end architecture;

alogtable.vhd


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_alogtable is
    port( ma : in large_int; mb : out large_int);
end entity;

architecture behav of RAM_alogtable is
begin
    process (ma)
        type box is array (0 to 255) of large_int;
        constant alogtable : box := (1, 3, 5, 15, 17, 51, 85, 255, 26, 46, 114, 150, 161, 248, 19, 53, 95, 225, 56, 72, 216, 115, 149, 164, 247, 2, 6, 10, 30, 34, 102, 170, 229, 52, 92, 228, 55, 89, 235, 38, 106, 190, 217, 112, 144, 171, 230, 49, 83, 245, 4, 12, 20, 60, 68, 204, 79, 209, 104, 184, 211, 110, 178, 205, 76, 212, 103, 169, 224, 59, 77, 215, 98, 166, 241, 8, 24, 40, 120, 136, 131, 158, 185, 208, 107, 189, 220, 127, 129, 152, 179, 206, 73, 219, 118, 154, 181, 196, 87, 249, 16, 48, 80, 240, 11, 29, 39, 105, 187, 214, 97, 163, 254, 25, 43, 125, 135, 146, 173, 236, 47, 113, 147, 174, 233, 32, 96, 160, 251, 22, 58, 78, 210, 109, 183, 194, 93, 231, 50, 86, 250, 21, 63, 65, 195, 94, 226, 61, 71, 201, 64, 192, 91, 237, 44, 116, 156, 191, 218, 117, 159, 186, 213, 100, 172, 239, 42, 126, 130, 157, 188, 223, 122, 142, 137, 128, 155, 182, 193, 88, 232, 35, 101, 175, 234, 37, 111, 177, 200, 67, 197, 84, 252, 31, 33, 99, 165, 244, 7, 9, 27, 45, 119, 153, 176, 203, 70, 202, 69, 207, 74, 222, 121, 139, 134, 145, 168, 227, 62, 66, 198, 81, 243, 14, 18, 54, 90, 238, 41, 123, 141, 140, 143, 138, 133, 148, 167, 242, 13, 23, 57, 75, 221, 124, 132, 151, 162, 253, 28, 36, 108, 180, 199, 82, 246, 1);
    begin
        mb <= alogtable(ma);
    end process;
end architecture;

rbox_ram.vhd


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_rbox is
    port( ma : in large_int; mb : out large_int);
end entity;

architecture behav of RAM_rbox is
begin
    process (ma)
        type TRC is array (0 to 29) of large_int;
        constant RC : TRC := (16#00#, 16#01#, 16#02#, 16#04#, 16#08#, 16#10#, 16#20#, 16#40#, 16#80#, 16#1B#, 16#36#, 16#6C#, 16#D8#, 16#AB#, 16#4D#, 16#9A#, 16#2F#, 16#5E#, 16#BC#, 16#63#, 16#C6#, 16#97#, 16#35#, 16#6A#, 16#D4#, 16#B3#, 16#7D#, 16#FA#, 16#EF#, 16#C5#);
    begin
        mb <= RC(ma);
    end process;
end architecture;

addcmp.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;
use work.int_types.all;

entity addcmp is
    port( ma1, ma2 : in large_int; mb1 : out large_int);
end entity;

architecture behav of addcmp is
begin
    process(ma1, ma2)
    begin
        if ma1 + ma2 > 255 then
            mb1 <= ma1 + ma2 - 255;
        else
            mb1 <= ma1 + ma2;
        end if;
    end process;
end architecture;
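The addcmp entity plays the role of the "% 255" in mul() of Appendix A, with one difference: a sum of exactly 255 is passed through unchanged instead of being reduced to 0. Since the antilog table holds 1 at both index 0 and index 255, the lookup gives the same value either way. The C model below is a minimal sketch of that behaviour, assuming both operands are table logarithms in the range 0 to 254; the function names are hypothetical.

```c
#include <assert.h>

/* Model of addcmp: add two logarithms and fold sums above 255 back into range. */
int addcmp_model(int a, int b) {
    int s = a + b;
    return (s > 255) ? s - 255 : s;
}

/* Count operand pairs whose result is not congruent to (a + b) modulo 255. */
int addcmp_mismatches(void) {
    int a, b, bad = 0;
    for (a = 0; a < 255; a++)
        for (b = 0; b < 255; b++) {
            int r = addcmp_model(a, b);
            assert(r >= 0 && r <= 255);   /* result stays inside large_int */
            if (r % 255 != (a + b) % 255) bad++;
        }
    return bad;
}
```

A return value of 0 from addcmp_mismatches() means the model and the "% 255" form select equivalent antilog entries for every pair of logarithms.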

shiftrow.vhd


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity shiftrow is
    port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
          sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
end entity;

architecture behav of shiftrow is
begin
    storage: process(sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16)
    begin
        sb1 <= sa1;   sb2 <= sa6;   sb3 <= sa11;  sb4 <= sa16;
        sb5 <= sa5;   sb6 <= sa10;  sb7 <= sa15;  sb8 <= sa4;
        sb9 <= sa9;   sb10 <= sa14; sb11 <= sa3;  sb12 <= sa8;
        sb13 <= sa13; sb14 <= sa2;  sb15 <= sa7;  sb16 <= sa12;
    end process;
end architecture;
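The fixed wiring in shiftrow can be checked against the ShiftRows definition. The sketch below assumes the 16 ports carry the 4x4 state column by column (sa1 to sa4 being the first column) and that row r is rotated left by r columns, as in the 128-bit case of Appendix A. The array wiring is copied from the signal assignments above; the function names are hypothetical.

```c
#include <assert.h>

/* wiring[n] is the 1-based input port sa<k> that drives output port sb<n+1>. */
static const int wiring[16] = {1, 6, 11, 16, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12};

/* 0-based source index for ShiftRows on a column-major 4x4 state:
   byte n sits in row n % 4 and column n / 4. */
int shiftrows_source(int n) {
    int row, col;
    assert(n >= 0 && n < 16);
    row = n % 4;
    col = n / 4;
    return 4 * ((col + row) % 4) + row;
}

/* Returns 1 when every wire in the listing matches the formula. */
int wiring_matches_shiftrows(void) {
    int n;
    for (n = 0; n < 16; n++)
        if (wiring[n] != shiftrows_source(n) + 1) return 0;
    return 1;
}
```

Under these assumptions wiring_matches_shiftrows() returns 1, so the entity implements ShiftRows purely as a permutation of wires with no logic.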


roundkey.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use work.int_types.all;

entity roundkey is
    port( a, b : in large_int; c : out large_int);
end entity;

architecture behav of roundkey is
begin
    storage: process(a, b)
    begin
        c <= conv_integer(conv_std_logic_vector(a,8) xor conv_std_logic_vector(b,8));
    end process;
end architecture;

round_roundkey.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity round_roundkey is
    port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of round_roundkey is
    component roundkey
        port( a, b : in large_int; c : out large_int);
    end component;
begin
    block18: roundkey port map(aa1, k1, b1);
    block19: roundkey port map(aa2, k2, b2);
    block20: roundkey port map(aa3, k3, b3);
    block21: roundkey port map(aa4, k4, b4);
    block22: roundkey port map(aa5, k5, b5);
    block23: roundkey port map(aa6, k6, b6);
    block24: roundkey port map(aa7, k7, b7);
    block25: roundkey port map(aa8, k8, b8);
    block26: roundkey port map(aa9, k9, b9);
    block27: roundkey port map(aa10, k10, b10);
    block28: roundkey port map(aa11, k11, b11);
    block29: roundkey port map(aa12, k12, b12);
    block30: roundkey port map(aa13, k13, b13);
    block31: roundkey port map(aa14, k14, b14);
    block32: roundkey port map(aa15, k15, b15);
    block33: roundkey port map(aa16, k16, b16);
end architecture;

round_sbox.vhd

use work.int_types.all;

entity round_sbox is
    port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of round_sbox is
    component RAM_sbox
        port( ma : in large_int; mb : out large_int);
    end component;
begin
    block18: RAM_sbox port map(aa1, b1);
    block19: RAM_sbox port map(aa2, b2);
    block20: RAM_sbox port map(aa3, b3);
    block21: RAM_sbox port map(aa4, b4);
    block22: RAM_sbox port map(aa5, b5);
    block23: RAM_sbox port map(aa6, b6);
    block24: RAM_sbox port map(aa7, b7);
    block25: RAM_sbox port map(aa8, b8);
    block26: RAM_sbox port map(aa9, b9);
    block27: RAM_sbox port map(aa10, b10);
    block28: RAM_sbox port map(aa11, b11);
    block29: RAM_sbox port map(aa12, b12);
    block30: RAM_sbox port map(aa13, b13);
    block31: RAM_sbox port map(aa14, b14);
    block32: RAM_sbox port map(aa15, b15);
    block33: RAM_sbox port map(aa16, b16);
end architecture;

multiply.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;
use work.int_types.all;

entity multiply is
    port( ma1, ma2 : in large_int; mb1 : out large_int);
end entity;

architecture behav of multiply is
    component RAM_logtable is
        port( ma : in large_int; mb : out large_int);
    end component;
    component RAM_alogtable is
        port( ma : in large_int; mb : out large_int);
    end component;
    component addcmp is
        port( ma1, ma2 : in large_int; mb1 : out large_int);
    end component;
    signal ap,bp,dp,ep,hp,ip : large_int;
begin
    dut1: RAM_logtable port map(ma1, bp);
    dut3: addcmp port map(bp, ma2, dp);
    dut4: RAM_alogtable port map(dp, ip);

    process(ma1, ma2, ip)
    begin
        if ma1 = 0 or ma2 = 0 then
            mb1 <= 0;
        else
            mb1 <= ip;
        end if;
    end process;
end architecture;

mix.vhd



library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity mix is
    port( ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4 : in large_int; mb1 : out large_int);
end entity;

architecture behav of mix is
    component multiply
        port( ma1, ma2 : in large_int; mb1 : out large_int);
    end component;
    component roundkey
        port( a, b : in large_int; c : out large_int);
    end component;
    signal aa1,aa2,aa3,aa6 : large_int;
begin
    block1: multiply port map(ma1, pc1, aa1);
    block2: multiply port map(ma2, pc2, aa2);
    block3: roundkey port map(aa1, aa2, aa3);
    block6: roundkey port map(ma4, ma3, aa6);
    block7: roundkey port map(aa6, aa3, mb1);
end architecture;

mixcolumn.vhd


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity mixcolumn is
    port( ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,pc1,pc2,pc3,pc4 : in large_int;
          mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
end entity;

architecture behav of mixcolumn is
    component mix
        port( ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4 : in large_int; mb1 : out large_int);
    end component;
begin
    block1: mix port map(ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4,mb1);
    block2: mix port map(ma2,ma3,ma4,ma1,pc1,pc2,pc3,pc4,mb2);
    block3: mix port map(ma3,ma4,ma1,ma2,pc1,pc2,pc3,pc4,mb3);
    block4: mix port map(ma4,ma1,ma2,ma3,pc1,pc2,pc3,pc4,mb4);
    block5: mix port map(ma5,ma6,ma7,ma8,pc1,pc2,pc3,pc4,mb5);
    block6: mix port map(ma6,ma7,ma8,ma5,pc1,pc2,pc3,pc4,mb6);
    block7: mix port map(ma7,ma8,ma5,ma6,pc1,pc2,pc3,pc4,mb7);
    block8: mix port map(ma8,ma5,ma6,ma7,pc1,pc2,pc3,pc4,mb8);
    block9: mix port map(ma9,ma10,ma11,ma12,pc1,pc2,pc3,pc4,mb9);
    block10: mix port map(ma10,ma11,ma12,ma9,pc1,pc2,pc3,pc4,mb10);
    block11: mix port map(ma11,ma12,ma9,ma10,pc1,pc2,pc3,pc4,mb11);
    block12: mix port map(ma12,ma9,ma10,ma11,pc1,pc2,pc3,pc4,mb12);
    block13: mix port map(ma13,ma14,ma15,ma16,pc1,pc2,pc3,pc4,mb13);
    block14: mix port map(ma14,ma15,ma16,ma13,pc1,pc2,pc3,pc4,mb14);
    block15: mix port map(ma15,ma16,ma13,ma14,pc1,pc2,pc3,pc4,mb15);
    block16: mix port map(ma16,ma13,ma14,ma15,pc1,pc2,pc3,pc4,mb16);
end architecture;
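Each mix instance above combines one rotation of a column as (first input times pc1) XOR (second input times pc2) XOR third XOR fourth, which is the MixColumns row 2, 3, 1, 1 if pc1 and pc2 stand for the multipliers 2 and 3 (possibly supplied as their table logarithms, given how multiply is wired; the listing does not say). The C sketch below is an illustration under that assumption and uses a table-free multiplier instead of the log/antilog RAMs; all names are hypothetical. It checks the widely published test column DB 13 53 45, which maps to 8E 4D A1 BC.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-XOR multiplication in GF(2^8), reducing by the AES polynomial 0x11B. */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* out[i] = 2*in[i] ^ 3*in[i+1] ^ in[i+2] ^ in[i+3], indices taken modulo 4,
   the same rotation pattern as the four mix instances per column. */
void mix_column(const uint8_t in[4], uint8_t out[4]) {
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (uint8_t)(gf_mul(2, in[i]) ^ gf_mul(3, in[(i + 1) % 4])
                           ^ in[(i + 2) % 4] ^ in[(i + 3) % 4]);
}

/* Returns 1 if the test column DB 13 53 45 maps to 8E 4D A1 BC. */
int mix_column_selftest(void) {
    static const uint8_t in[4] = {0xDB, 0x13, 0x53, 0x45};
    static const uint8_t want[4] = {0x8E, 0x4D, 0xA1, 0xBC};
    uint8_t out[4];
    int i;
    assert(gf_mul(2, 0x80) == 0x1B);   /* doubling 0x80 triggers the reduction */
    mix_column(in, out);
    for (i = 0; i < 4; i++)
        if (out[i] != want[i]) return 0;
    return 1;
}
```

The same vector can be applied to ma1 through ma4 in a VHDL testbench to compare the simulated mb1 through mb4 with the values above.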

lround.vhd


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity lround is
    port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of lround is
    signal ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16 : large_int;
    signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
    signal ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16 : large_int;

    component round_sbox
        port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
              b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
    end component;
    component round_roundkey
        port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
              b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
    end component;
    component shiftrow
        port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
              sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
    end component;
begin
    block1: round_sbox port map(aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);
    block34: shiftrow port map(tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16);
    block4: round_roundkey port map(ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);
end architecture;

round.vhd


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity round is
    port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,pc1,pc2,pc3,pc4 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of round is
    signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
    signal ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16 : large_int;
    signal tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16 : large_int;

    component round_sbox
        port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
              b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
    end component;
    component round_roundkey
        port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
              b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
    end component;
    component shiftrow
        port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
              sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
    end component;
    component mixcolumn
        port( ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,pc1,pc2,pc3,pc4 : in large_int;
              mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
    end component;
begin
    block1: round_sbox port map(aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);
    block34: shiftrow port map(tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16);
    block3: mixcolumn port map(ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16,pc1,pc2,pc3,pc4,tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16);
    block4: round_roundkey port map(tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);
end architecture;

keyschedule.vhd


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity keyschedule is
    port( a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,RCpointer : in large_int;
          owk : out keyarray;
          RCp : out large_int);
end entity;

architecture behav of keyschedule is
    component RAM_rbox is
        port( ma : in large_int; mb : out large_int);
    end component;
    component RAM_sbox is
        port( ma : in large_int; mb : out large_int);
    end component;
    component roundkey
        port( a, b : in large_int; c : out large_int);
    end component;
    signal aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,out1,out2,out3,out4,out5,out6,out7,out8,out9,out10,out11,out12,out13,out14,out15,out0 : large_int;
begin
    block1: RAM_sbox port map(a14, aa1);
    block2: roundkey port map(a1, aa1, aa2);
    block3: RAM_sbox port map(a15, aa3);
    block4: roundkey port map(a2, aa3, out1);
    block5: RAM_sbox port map(a16, aa4);
    block6: roundkey port map(aa4, a3, out2);
    block7: RAM_sbox port map(a13, aa5);
    block8: roundkey port map(aa5, a4, out3);
    block9: RAM_rbox port map(RCpointer, aa6);
    block10: roundkey port map(aa6, aa2, out0);
    block11: roundkey port map(a5, out0, out4);
    block12: roundkey port map(a6, out1, out5);
    block13: roundkey port map(a7, out2, out6);
    block14: roundkey port map(a8, out3, out7);
    block15: roundkey port map(a9, out4, out8);
    block16: roundkey port map(a10, out5, out9);
    block17: roundkey port map(a11, out6, out10);
    block18: roundkey port map(a12, out7, out11);
    block19: roundkey port map(a13, out8, out12);
    block20: roundkey port map(a14, out9, out13);
    block21: roundkey port map(a15, out10, out14);
    block22: roundkey port map(a16, out11, out15);

    process(RCpointer,out1,out2,out3,out4,out5,out6,out7,out8,out9,out10,out11,out12,out13,out14,out15,out0)
    begin
        RCp <= RCpointer + 1;
        owk(0) <= out0;   owk(1) <= out1;   owk(2) <= out2;   owk(3) <= out3;
        owk(4) <= out4;   owk(5) <= out5;   owk(6) <= out6;   owk(7) <= out7;
        owk(8) <= out8;   owk(9) <= out9;   owk(10) <= out10; owk(11) <= out11;
        owk(12) <= out12; owk(13) <= out13; owk(14) <= out14; owk(15) <= out15;
    end process;
end architecture;
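The keyschedule wiring can be read as one step of the AES-128 key expansion on a key stored column by column: the last column (a13 to a16) is rotated, passed through the S-box, combined with the round constant, and XORed into the first column, and each later column is XORed with the new column before it. The C model below is a minimal sketch of that reading, not a translation produced by any tool. It computes the S-box from its definition instead of using the 256-entry table, and its function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-XOR multiplication in GF(2^8), reducing by the AES polynomial 0x11B. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* AES S-box from its definition: inverse in GF(2^8), then the affine map. */
uint8_t sbox(uint8_t x) {
    uint8_t inv = 0, y;
    int c, i;
    for (c = 1; c < 256 && x != 0; c++)
        if (gf_mul(x, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
    y = inv;
    for (i = 1; i <= 4; i++)
        y ^= (uint8_t)((inv << i) | (inv >> (8 - i)));  /* inv rotated left by i bits */
    return (uint8_t)(y ^ 0x63);
}

/* One key-schedule step; key[0] corresponds to port a1, out[0] to owk(0). */
void next_round_key(const uint8_t key[16], uint8_t rcon, uint8_t out[16]) {
    int i;
    out[0] = (uint8_t)(key[0] ^ sbox(key[13]) ^ rcon);
    out[1] = (uint8_t)(key[1] ^ sbox(key[14]));
    out[2] = (uint8_t)(key[2] ^ sbox(key[15]));
    out[3] = (uint8_t)(key[3] ^ sbox(key[12]));
    for (i = 4; i < 16; i++)
        out[i] = (uint8_t)(key[i] ^ out[i - 4]);
}

/* Returns 1 if the all-zero key expands to 62 63 63 63 repeated four times. */
int zero_key_selftest(void) {
    uint8_t k0[16] = {0}, k1[16];
    int i;
    assert(sbox(0x00) == 0x63);
    next_round_key(k0, 0x01, k1);
    for (i = 0; i < 16; i++)
        if (k1[i] != ((i % 4 == 0) ? 0x62 : 0x63)) return 0;
    return 1;
}
```

Chaining next_round_key ten times with the round constants 0x01, 0x02, 0x04 and so on mirrors the ten keyschedule instances of aes.vhd, where RCp feeds the next instance's RCpointer.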

aes.vhd


entity aes is port(aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,RCpointer,pc1,pc2,pc3,pc4 : in large_int ; b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16:out large_int);end entity;

architecture struct of aes issignalap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16:large_int;signaltap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16:large_int;

signal sap1, sap2, sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16:large_int;signalaap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16:large_int;signal bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16:large_int;signal cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16:large_int;signal dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16:large_int;signal fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16:large_int;signal gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16:large_int;signal hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16:large_int;signal wpk1,wpk2,wpk3,wpk4,wpk5,wpk6,wpk7,wpk8,wpk9,wpk10: keyarray;signal RC2,RC3,RC4,RC5,RC6,RC7,RC8,RC9,RC10,RC11:large_int;

lxxx

component keyscheduleport( a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,RCpointer : in large_int ;owk : out keyarray ; RCp: out large_int);end component;component round port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,pc1,pc2,pc3,pc4 : in large_int ; b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16:out large_int);end component;component roundkey port( a,b :in large_int;c: out large_int);end component;component lround port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int ; b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16:out large_int);end component;

begin

block01: keyschedule port map(k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,RCpointer,wpk1,RC2);
block02: keyschedule port map(wpk1(0),wpk1(1),wpk1(2),wpk1(3),wpk1(4),wpk1(5),wpk1(6),wpk1(7),wpk1(8),wpk1(9),wpk1(10),wpk1(11),wpk1(12),wpk1(13),wpk1(14),wpk1(15),RC2,wpk2,RC3);
block03: keyschedule port map(wpk2(0),wpk2(1),wpk2(2),wpk2(3),wpk2(4),wpk2(5),wpk2(6),wpk2(7),wpk2(8),wpk2(9),wpk2(10),wpk2(11),wpk2(12),wpk2(13),wpk2(14),wpk2(15),RC3,wpk3,RC4);
block04: keyschedule port map(wpk3(0),wpk3(1),wpk3(2),wpk3(3),wpk3(4),wpk3(5),wpk3(6),wpk3(7),wpk3(8),wpk3(9),wpk3(10),wpk3(11),wpk3(12),wpk3(13),wpk3(14),wpk3(15),RC4,wpk4,RC5);
block05: keyschedule port map(wpk4(0),wpk4(1),wpk4(2),wpk4(3),wpk4(4),wpk4(5),wpk4(6),wpk4(7),wpk4(8),wpk4(9),wpk4(10),wpk4(11),wpk4(12),wpk4(13),wpk4(14),wpk4(15),RC5,wpk5,RC6);
block06: keyschedule port map(wpk5(0),wpk5(1),wpk5(2),wpk5(3),wpk5(4),wpk5(5),wpk5(6),wpk5(7),wpk5(8),wpk5(9),wpk5(10),wpk5(11),wpk5(12),wpk5(13),wpk5(14),wpk5(15),RC6,wpk6,RC7);
block07: keyschedule port map(wpk6(0),wpk6(1),wpk6(2),wpk6(3),wpk6(4),wpk6(5),wpk6(6),wpk6(7),wpk6(8),wpk6(9),wpk6(10),wpk6(11),wpk6(12),wpk6(13),wpk6(14),wpk6(15),RC7,wpk7,RC8);
block08: keyschedule port map(wpk7(0),wpk7(1),wpk7(2),wpk7(3),wpk7(4),wpk7(5),wpk7(6),wpk7(7),wpk7(8),wpk7(9),wpk7(10),wpk7(11),wpk7(12),wpk7(13),wpk7(14),wpk7(15),RC8,wpk8,RC9);
block09: keyschedule port map(wpk8(0),wpk8(1),wpk8(2),wpk8(3),wpk8(4),wpk8(5),wpk8(6),wpk8(7),wpk8(8),wpk8(9),wpk8(10),wpk8(11),wpk8(12),wpk8(13),wpk8(14),wpk8(15),RC9,wpk9,RC10);
block010: keyschedule port map(wpk9(0),wpk9(1),wpk9(2),wpk9(3),wpk9(4),wpk9(5),wpk9(6),wpk9(7),wpk9(8),wpk9(9),wpk9(10),wpk9(11),wpk9(12),wpk9(13),wpk9(14),wpk9(15),RC10,wpk10,RC11);

block2: roundkey port map(aa1,k1,ap1);
block3: roundkey port map(aa2,k2,ap2);
block4: roundkey port map(aa3,k3,ap3);
block5: roundkey port map(aa4,k4,ap4);
block6: roundkey port map(aa5,k5,ap5);
block7: roundkey port map(aa6,k6,ap6);
block8: roundkey port map(aa7,k7,ap7);
block9: roundkey port map(aa8,k8,ap8);
block10: roundkey port map(aa9,k9,ap9);
block11: roundkey port map(aa10,k10,ap10);
block12: roundkey port map(aa11,k11,ap11);
block13: roundkey port map(aa12,k12,ap12);
block14: roundkey port map(aa13,k13,ap13);
block15: roundkey port map(aa14,k14,ap14);
block16: roundkey port map(aa15,k15,ap15);
block17: roundkey port map(aa16,k16,ap16);

block18: round port map(
  ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16,
  wpk1(0),wpk1(1),wpk1(2),wpk1(3),wpk1(4),wpk1(5),wpk1(6),wpk1(7),wpk1(8),wpk1(9),wpk1(10),wpk1(11),wpk1(12),wpk1(13),wpk1(14),wpk1(15),
  pc1,pc2,pc3,pc4,
  tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);

block19: round port map(
  tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
  wpk2(0),wpk2(1),wpk2(2),wpk2(3),wpk2(4),wpk2(5),wpk2(6),wpk2(7),wpk2(8),wpk2(9),wpk2(10),wpk2(11),wpk2(12),wpk2(13),wpk2(14),wpk2(15),
  pc1,pc2,pc3,pc4,
  sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16);

block20: round port map(
  sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16,
  wpk3(0),wpk3(1),wpk3(2),wpk3(3),wpk3(4),wpk3(5),wpk3(6),wpk3(7),wpk3(8),wpk3(9),wpk3(10),wpk3(11),wpk3(12),wpk3(13),wpk3(14),wpk3(15),
  pc1,pc2,pc3,pc4,
  aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16);

block21: round port map(
  aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16,
  wpk4(0),wpk4(1),wpk4(2),wpk4(3),wpk4(4),wpk4(5),wpk4(6),wpk4(7),wpk4(8),wpk4(9),wpk4(10),wpk4(11),wpk4(12),wpk4(13),wpk4(14),wpk4(15),
  pc1,pc2,pc3,pc4,
  bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16);

block22: round port map(
  bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16,
  wpk5(0),wpk5(1),wpk5(2),wpk5(3),wpk5(4),wpk5(5),wpk5(6),wpk5(7),wpk5(8),wpk5(9),wpk5(10),wpk5(11),wpk5(12),wpk5(13),wpk5(14),wpk5(15),
  pc1,pc2,pc3,pc4,
  cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16);

block23: round port map(
  cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16,
  wpk6(0),wpk6(1),wpk6(2),wpk6(3),wpk6(4),wpk6(5),wpk6(6),wpk6(7),wpk6(8),wpk6(9),wpk6(10),wpk6(11),wpk6(12),wpk6(13),wpk6(14),wpk6(15),
  pc1,pc2,pc3,pc4,
  dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16);

block24: round port map(
  dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16,
  wpk7(0),wpk7(1),wpk7(2),wpk7(3),wpk7(4),wpk7(5),wpk7(6),wpk7(7),wpk7(8),wpk7(9),wpk7(10),wpk7(11),wpk7(12),wpk7(13),wpk7(14),wpk7(15),
  pc1,pc2,pc3,pc4,
  fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16);

block25: round port map(
  fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16,
  wpk8(0),wpk8(1),wpk8(2),wpk8(3),wpk8(4),wpk8(5),wpk8(6),wpk8(7),wpk8(8),wpk8(9),wpk8(10),wpk8(11),wpk8(12),wpk8(13),wpk8(14),wpk8(15),
  pc1,pc2,pc3,pc4,
  gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16);

block26: round port map(
  gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16,
  wpk9(0),wpk9(1),wpk9(2),wpk9(3),wpk9(4),wpk9(5),wpk9(6),wpk9(7),wpk9(8),wpk9(9),wpk9(10),wpk9(11),wpk9(12),wpk9(13),wpk9(14),wpk9(15),
  pc1,pc2,pc3,pc4,
  hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16);

block27: lround port map(
  hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16,
  wpk10(0),wpk10(1),wpk10(2),wpk10(3),wpk10(4),wpk10(5),wpk10(6),wpk10(7),wpk10(8),wpk10(9),wpk10(10),wpk10(11),wpk10(12),wpk10(13),wpk10(14),wpk10(15),
  b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);

end architecture;

Dsbox_ram.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_dsbox is
  port( ma : in large_int;
        mb : out large_int);
end entity;

architecture behav of RAM_dsbox is
begin
  process (ma)
    type box is array (0 to 255) of large_int;
    -- Inverse S-box lookup table, indexed by the input byte.
    constant s : box := (
       82,   9, 106, 213,  48,  54, 165,  56, 191,  64, 163, 158, 129, 243, 215, 251,
      124, 227,  57, 130, 155,  47, 255, 135,  52, 142,  67,  68, 196, 222, 233, 203,
       84, 123, 148,  50, 166, 194,  35,  61, 238,  76, 149,  11,  66, 250, 195,  78,
        8,  46, 161, 102,  40, 217,  36, 178, 118,  91, 162,  73, 109, 139, 209,  37,
      114, 248, 246, 100, 134, 104, 152,  22, 212, 164,  92, 204,  93, 101, 182, 146,
      108, 112,  72,  80, 253, 237, 185, 218,  94,  21,  70,  87, 167, 141, 157, 132,
      144, 216, 171,   0, 140, 188, 211,  10, 247, 228,  88,   5, 184, 179,  69,   6,
      208,  44,  30, 143, 202,  63,  15,   2, 193, 175, 189,   3,   1,  19, 138, 107,
       58, 145,  17,  65,  79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115,
      150, 172, 116,  34, 231, 173,  53, 133, 226, 249,  55, 232,  28, 117, 223, 110,
       71, 241,  26, 113,  29,  41, 197, 137, 111, 183,  98,  14, 170,  24, 190,  27,
      252,  86,  62,  75, 198, 210, 121,  32, 154, 219, 192, 254, 120, 205,  90, 244,
       31, 221, 168,  51, 136,   7, 199,  49, 177,  18,  16,  89,  39, 128, 236,  95,
       96,  81, 127, 169,  25, 181,  74,  13,  45, 229, 122, 159, 147, 201, 156, 239,
      160, 224,  59,  77, 174,  42, 245, 176, 200, 235, 187,  60, 131,  83, 153,  97,
       23,  43,   4, 126, 186, 119, 214,  38, 225, 105,  20,  99,  85,  33,  12, 125);
  begin
    mb <= s(ma);
  end process;
end architecture;
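
The lookup table in RAM_dsbox can be cross-checked in software. The Python sketch below is not part of the thesis code; it assumes the table is the standard AES inverse S-box, and the helper names (gf_mul, gf_inv, sbox_entry) are hypothetical. It builds the forward S-box from its definition (multiplicative inverse in GF(2^8) followed by the affine transform) and then inverts that mapping, so that the result can be compared entry by entry with the constant s above.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
        b >>= 1
    return product


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    # Brute-force search is adequate for a 256-element field.
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)


def sbox_entry(a):
    """Forward S-box: field inverse followed by the AES affine transform."""
    x = gf_inv(a)
    y = x
    for shift in range(1, 5):
        # XOR in the byte rotated left by 1, 2, 3 and 4 bits.
        y ^= ((x << shift) | (x >> (8 - shift))) & 0xFF
    return y ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]

# Invert the mapping: INV_SBOX[SBOX[a]] == a for every byte a.
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value
```

Comparing INV_SBOX with the 256 integers in the listing is one way to confirm that the table was transcribed without error.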

dshiftrow.vhd


use work.int_types.all;

entity dshiftrow is
  port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
        sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
end entity;

architecture behav of dshiftrow is
begin
  storage: process(sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16)
  begin
    sb1 <= sa1;
    sb2 <= sa14;
    sb3 <= sa11;
    sb4 <= sa8;
    sb5 <= sa5;
    sb6 <= sa2;
    sb7 <= sa15;
    sb8 <= sa12;
    sb9 <= sa9;
    sb10 <= sa6;
    sb11 <= sa3;
    sb12 <= sa16;
    sb13 <= sa13;
    sb14 <= sa10;
    sb15 <= sa7;
    sb16 <= sa4;
  end process;
end architecture;
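
The wiring in dshiftrow can be read as the AES inverse ShiftRows step. The Python sketch below is a software model only, not part of the thesis code. It assumes the sixteen ports hold the state in column-major order (sa1 to sa4 form the first column); the names DSHIFTROW_SOURCES, dshiftrow, and inv_shift_rows_reference are hypothetical. It compares the permutation taken from the listing with the textbook rule that row r is rotated right by r positions.

```python
# Source index (1-based, as in the port names) for each output sb1..sb16,
# copied from the signal assignments in dshiftrow.vhd.
DSHIFTROW_SOURCES = [1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3, 16, 13, 10, 7, 4]


def dshiftrow(state):
    """Apply the permutation wired in dshiftrow.vhd to a 16-byte state."""
    return [state[src - 1] for src in DSHIFTROW_SOURCES]


def inv_shift_rows_reference(state):
    """Inverse ShiftRows on a column-major state: rotate row r right by r."""
    out = [0] * 16
    for col in range(4):
        for row in range(4):
            out[4 * col + row] = state[4 * ((col - row) % 4) + row]
    return out
```

Under the column-major assumption the two functions agree on every input, which suggests that the hard-wired assignments implement inverse ShiftRows.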

round_dsbox.vhd


use work.int_types.all;

entity round_dsbox is
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of round_dsbox is

component RAM_dsbox
  port( ma : in large_int;
        mb : out large_int);
end component;

begin

block18: RAM_dsbox port map(aa1,b1);
block19: RAM_dsbox port map(aa2,b2);
block20: RAM_dsbox port map(aa3,b3);
block21: RAM_dsbox port map(aa4,b4);
block22: RAM_dsbox port map(aa5,b5);
block23: RAM_dsbox port map(aa6,b6);
block24: RAM_dsbox port map(aa7,b7);
block25: RAM_dsbox port map(aa8,b8);
block26: RAM_dsbox port map(aa9,b9);
block27: RAM_dsbox port map(aa10,b10);
block28: RAM_dsbox port map(aa11,b11);
block29: RAM_dsbox port map(aa12,b12);
block30: RAM_dsbox port map(aa13,b13);
block31: RAM_dsbox port map(aa14,b14);
block32: RAM_dsbox port map(aa15,b15);
block33: RAM_dsbox port map(aa16,b16);

end architecture;

dmix.vhd


use work.int_types.all;

entity dmix is
  port( ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4 : in large_int;
        mb1 : out large_int);
end entity;

architecture behav of dmix is

component multiply
  port( ma1,ma2 : in large_int;
        mb1 : out large_int);
end component;

component roundkey
  port( a,b : in large_int;
        c : out large_int);
end component;

signal aa1,aa2,aa3,aa4,aa5,aa6 : large_int;

begin

block1: multiply port map(ma1,pc1,aa1);
block2: multiply port map(ma2,pc2,aa2);
block3: roundkey port map(aa1,aa2,aa3);
block4: multiply port map(ma3,pc3,aa4);
block5: multiply port map(ma4,pc4,aa5);
block6: roundkey port map(aa4,aa5,aa6);
block7: roundkey port map(aa6,aa3,mb1);

end architecture;

invmixcolumn.vhd


use work.int_types.all;

entity invmixcolumn is
  port( ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,
        pc1,pc2,pc3,pc4 : in large_int;
        mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
end entity;

architecture behav of invmixcolumn is

component dmix
  port( ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4 : in large_int;
        mb1 : out large_int);
end component;

begin

block1: dmix port map(ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4,mb1);
block2: dmix port map(ma2,ma3,ma4,ma1,pc1,pc2,pc3,pc4,mb2);
block3: dmix port map(ma3,ma4,ma1,ma2,pc1,pc2,pc3,pc4,mb3);
block4: dmix port map(ma4,ma1,ma2,ma3,pc1,pc2,pc3,pc4,mb4);

block5: dmix port map(ma5,ma6,ma7,ma8,pc1,pc2,pc3,pc4,mb5);
block6: dmix port map(ma6,ma7,ma8,ma5,pc1,pc2,pc3,pc4,mb6);
block7: dmix port map(ma7,ma8,ma5,ma6,pc1,pc2,pc3,pc4,mb7);
block8: dmix port map(ma8,ma5,ma6,ma7,pc1,pc2,pc3,pc4,mb8);

block9: dmix port map(ma9,ma10,ma11,ma12,pc1,pc2,pc3,pc4,mb9);
block10: dmix port map(ma10,ma11,ma12,ma9,pc1,pc2,pc3,pc4,mb10);
block11: dmix port map(ma11,ma12,ma9,ma10,pc1,pc2,pc3,pc4,mb11);
block12: dmix port map(ma12,ma9,ma10,ma11,pc1,pc2,pc3,pc4,mb12);

block13: dmix port map(ma13,ma14,ma15,ma16,pc1,pc2,pc3,pc4,mb13);
block14: dmix port map(ma14,ma15,ma16,ma13,pc1,pc2,pc3,pc4,mb14);
block15: dmix port map(ma15,ma16,ma13,ma14,pc1,pc2,pc3,pc4,mb15);
block16: dmix port map(ma16,ma13,ma14,ma15,pc1,pc2,pc3,pc4,mb16);

end architecture;
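
The structure of dmix and invmixcolumn can be modelled in a few lines of software. The Python sketch below is not part of the thesis code and rests on several assumptions: that the multiply component performs multiplication in GF(2^8) modulo the AES polynomial, that roundkey is a bytewise XOR, and that pc1 to pc4 are driven with the standard inverse MixColumns coefficients 0x0E, 0x0B, 0x0D, 0x09. The names gf_mul, dmix, mix_column, FORWARD_COEFFS, and INVERSE_COEFFS are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
        b >>= 1
    return product


def dmix(m, pc):
    """One output byte: XOR of the four products m[i] * pc[i], as in dmix.vhd."""
    return (gf_mul(m[0], pc[0]) ^ gf_mul(m[1], pc[1])
            ^ gf_mul(m[2], pc[2]) ^ gf_mul(m[3], pc[3]))


def mix_column(column, pc):
    """Four dmix instances fed with rotated copies of one state column,
    mirroring the port maps of block1..block4 in invmixcolumn.vhd."""
    column = list(column)
    return [dmix(column[i:] + column[:i], pc) for i in range(4)]


# Assumed coefficient sets (standard AES values, not taken from the listing).
FORWARD_COEFFS = (0x02, 0x03, 0x01, 0x01)
INVERSE_COEFFS = (0x0E, 0x0B, 0x0D, 0x09)

# Mix a sample column with the forward coefficients, then undo it.
sample = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(sample, FORWARD_COEFFS)
restored = mix_column(mixed, INVERSE_COEFFS)
```

Because the coefficients enter as ports, the same wiring serves both directions: only the values on pc1 to pc4 change between the forward and inverse transforms.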

fround.vhd


use work.int_types.all;

entity fround is
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of fround is

signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
signal ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16 : large_int;
signal tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16 : large_int;

component round_dsbox
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

component dshiftrow
  port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
        sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
end component;

component invmixcolumn
  port( ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,
        pc1,pc2,pc3,pc4 : in large_int;
        mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
end component;

begin

block4: round_dsbox port map(
  tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16,
  tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);

block3: dshiftrow port map(
  tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
  b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);

block1: round_roundkey port map(
  aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
  k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
  tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16);

end architecture;

dround.vhd


use work.int_types.all;

entity dround is
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
        pc1,pc2,pc3,pc4 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of dround is

signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
signal ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16 : large_int;
signal tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16 : large_int;

component round_dsbox
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

component dshiftrow
  port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
        sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
end component;

component invmixcolumn
  port( ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,
        pc1,pc2,pc3,pc4 : in large_int;
        mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
end component;

begin

block4: round_dsbox port map(
  ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16,
  tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);

block3: dshiftrow port map(
  tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
  b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);

block2: invmixcolumn port map(
  tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16,
  pc1,pc2,pc3,pc4,
  ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16);

block1: round_roundkey port map(
  aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
  k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
  tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16);

end architecture;

daes.vhd


use work.int_types.all;

entity daes is
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
        RCpointer,pc1,pc2,pc3,pc4 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of daes is

signal ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16 : large_int;
signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
signal sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16 : large_int;
signal aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16 : large_int;
signal bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16 : large_int;
signal cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16 : large_int;
signal dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16 : large_int;
signal fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16 : large_int;
signal gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16 : large_int;
signal hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16 : large_int;
signal wpk1,wpk2,wpk3,wpk4,wpk5,wpk6,wpk7,wpk8,wpk9,wpk10 : keyarray;
signal RC2,RC3,RC4,RC5,RC6,RC7,RC8,RC9,RC10,RC11 : large_int;

component keyschedule
  port( a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,RCpointer : in large_int;
        owk : out keyarray;
        RCp : out large_int);
end component;

component dround
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
        pc1,pc2,pc3,pc4 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

component fround
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

begin

block01: keyschedule port map(k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,RCpointer,wpk1,RC2);
block02: keyschedule port map(wpk1(0),wpk1(1),wpk1(2),wpk1(3),wpk1(4),wpk1(5),wpk1(6),wpk1(7),wpk1(8),wpk1(9),wpk1(10),wpk1(11),wpk1(12),wpk1(13),wpk1(14),wpk1(15),RC2,wpk2,RC3);
block03: keyschedule port map(wpk2(0),wpk2(1),wpk2(2),wpk2(3),wpk2(4),wpk2(5),wpk2(6),wpk2(7),wpk2(8),wpk2(9),wpk2(10),wpk2(11),wpk2(12),wpk2(13),wpk2(14),wpk2(15),RC3,wpk3,RC4);
block04: keyschedule port map(wpk3(0),wpk3(1),wpk3(2),wpk3(3),wpk3(4),wpk3(5),wpk3(6),wpk3(7),wpk3(8),wpk3(9),wpk3(10),wpk3(11),wpk3(12),wpk3(13),wpk3(14),wpk3(15),RC4,wpk4,RC5);
block05: keyschedule port map(wpk4(0),wpk4(1),wpk4(2),wpk4(3),wpk4(4),wpk4(5),wpk4(6),wpk4(7),wpk4(8),wpk4(9),wpk4(10),wpk4(11),wpk4(12),wpk4(13),wpk4(14),wpk4(15),RC5,wpk5,RC6);
block06: keyschedule port map(wpk5(0),wpk5(1),wpk5(2),wpk5(3),wpk5(4),wpk5(5),wpk5(6),wpk5(7),wpk5(8),wpk5(9),wpk5(10),wpk5(11),wpk5(12),wpk5(13),wpk5(14),wpk5(15),RC6,wpk6,RC7);
block07: keyschedule port map(wpk6(0),wpk6(1),wpk6(2),wpk6(3),wpk6(4),wpk6(5),wpk6(6),wpk6(7),wpk6(8),wpk6(9),wpk6(10),wpk6(11),wpk6(12),wpk6(13),wpk6(14),wpk6(15),RC7,wpk7,RC8);
block08: keyschedule port map(wpk7(0),wpk7(1),wpk7(2),wpk7(3),wpk7(4),wpk7(5),wpk7(6),wpk7(7),wpk7(8),wpk7(9),wpk7(10),wpk7(11),wpk7(12),wpk7(13),wpk7(14),wpk7(15),RC8,wpk8,RC9);
block09: keyschedule port map(wpk8(0),wpk8(1),wpk8(2),wpk8(3),wpk8(4),wpk8(5),wpk8(6),wpk8(7),wpk8(8),wpk8(9),wpk8(10),wpk8(11),wpk8(12),wpk8(13),wpk8(14),wpk8(15),RC9,wpk9,RC10);
block010: keyschedule port map(wpk9(0),wpk9(1),wpk9(2),wpk9(3),wpk9(4),wpk9(5),wpk9(6),wpk9(7),wpk9(8),wpk9(9),wpk9(10),wpk9(11),wpk9(12),wpk9(13),wpk9(14),wpk9(15),RC10,wpk10,RC11);

block27: fround port map(
  aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
  wpk10(0),wpk10(1),wpk10(2),wpk10(3),wpk10(4),wpk10(5),wpk10(6),wpk10(7),wpk10(8),wpk10(9),wpk10(10),wpk10(11),wpk10(12),wpk10(13),wpk10(14),wpk10(15),
  ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16);

block18: dround port map(
  ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16,
  wpk9(0),wpk9(1),wpk9(2),wpk9(3),wpk9(4),wpk9(5),wpk9(6),wpk9(7),wpk9(8),wpk9(9),wpk9(10),wpk9(11),wpk9(12),wpk9(13),wpk9(14),wpk9(15),
  pc1,pc2,pc3,pc4,
  tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);

block19: dround port map(
  tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
  wpk8(0),wpk8(1),wpk8(2),wpk8(3),wpk8(4),wpk8(5),wpk8(6),wpk8(7),wpk8(8),wpk8(9),wpk8(10),wpk8(11),wpk8(12),wpk8(13),wpk8(14),wpk8(15),
  pc1,pc2,pc3,pc4,
  sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16);

block20: dround port map(
  sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16,
  wpk7(0),wpk7(1),wpk7(2),wpk7(3),wpk7(4),wpk7(5),wpk7(6),wpk7(7),wpk7(8),wpk7(9),wpk7(10),wpk7(11),wpk7(12),wpk7(13),wpk7(14),wpk7(15),
  pc1,pc2,pc3,pc4,
  aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16);

block21: dround port map(
  aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16,
  wpk6(0),wpk6(1),wpk6(2),wpk6(3),wpk6(4),wpk6(5),wpk6(6),wpk6(7),wpk6(8),wpk6(9),wpk6(10),wpk6(11),wpk6(12),wpk6(13),wpk6(14),wpk6(15),
  pc1,pc2,pc3,pc4,
  bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16);

block22: dround port map(
  bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16,
  wpk5(0),wpk5(1),wpk5(2),wpk5(3),wpk5(4),wpk5(5),wpk5(6),wpk5(7),wpk5(8),wpk5(9),wpk5(10),wpk5(11),wpk5(12),wpk5(13),wpk5(14),wpk5(15),
  pc1,pc2,pc3,pc4,
  cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16);

block23: dround port map(
  cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16,
  wpk4(0),wpk4(1),wpk4(2),wpk4(3),wpk4(4),wpk4(5),wpk4(6),wpk4(7),wpk4(8),wpk4(9),wpk4(10),wpk4(11),wpk4(12),wpk4(13),wpk4(14),wpk4(15),
  pc1,pc2,pc3,pc4,
  dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16);

block24: dround port map(
  dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16,
  wpk3(0),wpk3(1),wpk3(2),wpk3(3),wpk3(4),wpk3(5),wpk3(6),wpk3(7),wpk3(8),wpk3(9),wpk3(10),wpk3(11),wpk3(12),wpk3(13),wpk3(14),wpk3(15),
  pc1,pc2,pc3,pc4,
  fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16);

block25: dround port map(
  fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16,
  wpk2(0),wpk2(1),wpk2(2),wpk2(3),wpk2(4),wpk2(5),wpk2(6),wpk2(7),wpk2(8),wpk2(9),wpk2(10),wpk2(11),wpk2(12),wpk2(13),wpk2(14),wpk2(15),
  pc1,pc2,pc3,pc4,
  gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16);

block26: dround port map(
  gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16,
  wpk1(0),wpk1(1),wpk1(2),wpk1(3),wpk1(4),wpk1(5),wpk1(6),wpk1(7),wpk1(8),wpk1(9),wpk1(10),wpk1(11),wpk1(12),wpk1(13),wpk1(14),wpk1(15),
  pc1,pc2,pc3,pc4,
  hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16);

block2: roundkey port map(hap1,k1,b1);
block3: roundkey port map(hap2,k2,b2);
block4: roundkey port map(hap3,k3,b3);
block5: roundkey port map(hap4,k4,b4);
block6: roundkey port map(hap5,k5,b5);
block7: roundkey port map(hap6,k6,b6);
block8: roundkey port map(hap7,k7,b7);
block9: roundkey port map(hap8,k8,b8);
block10: roundkey port map(hap9,k9,b9);
block11: roundkey port map(hap10,k10,b10);
block12: roundkey port map(hap11,k11,b11);
block13: roundkey port map(hap12,k12,b12);
block14: roundkey port map(hap13,k13,b13);
block15: roundkey port map(hap14,k14,b14);
block16: roundkey port map(hap15,k15,b15);
block17: roundkey port map(hap16,k16,b16);

end architecture;
