EE800 TERM PROJECT
Study of AES Encryption/Decryption Optimizations
Nathan Windels
Feb 24, 2016
Outline
Introduction
AES Algorithm
Areas of Optimization
Progress/Conclusion
Introduction
Three major implementation methods:
Software
- Typically, this method is much slower than hardware implementations.
FPGA
- Implemented as a hardware module connected directly to pins.
- Peripheral to a soft-core processor (communicates via on-chip bus).
- Tightly coupled hardware implemented as an extended instruction set.
Custom Hardware (ASIC)
Introduction (2)
High-throughput implementations are mainly used in high-end devices such as accelerator cards for e-commerce services and secure trunk communications.
These implementations typically unroll the round loop of the AES algorithm and pipeline the 128-bit datapath.
Although they typically achieve very high throughput, their area is very large.
Introduction (3)
The 32-bit AES implementations mainly multiplex the 128-bit datapath down to 32 bits.
This reduces circuit area at the expense of speed.
This type of implementation is well suited to embedded applications.
My goal is to provide synthesis results for the different implementations, as well as simulation/implementation results if time permits.
The AES Algorithm
AES Algorithm: Top Level
[Figure: Encryptor block. Inputs: Data and Encryption Key. Output: Cipher Data.]
AES Algorithm: Input
State (to Encryption Process):
32 88 31 E0
43 5A 31 37
F6 30 98 07
A8 8D A2 34

Cipher Key (to Key Schedule):
2B 28 AB 09
7E AE F7 CF
15 D2 15 4F
16 A6 88 3C
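As a rough illustration of how the 128-bit input becomes the 4x4 state, the sketch below fills the matrix one column at a time, which is the standard AES convention. The function names `bytes_to_state` and `format_state` are hypothetical, and the two byte strings are simply the matrices on this slide read column by column.

```python
def bytes_to_state(block):
    """Arrange 16 bytes into a 4x4 matrix, filling one column at a time."""
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte (128-bit) blocks")
    # Byte i lands in row i % 4, column i // 4.
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def format_state(state):
    """Render a state as four rows of uppercase hex bytes."""
    return "\n".join(" ".join(f"{b:02X}" for b in row) for row in state)


# Byte strings read column by column from the two matrices on this slide.
data = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
cipher_key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")

state = bytes_to_state(data)
key_state = bytes_to_state(cipher_key)
print(format_state(state))
```

The same layout applies to the key, which is why both inputs can share one 4x4 byte structure in hardware.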
AES Algorithm: Data Path
[Figure: AES data path. The round keys enter from the Key Schedule.]
AES Algorithm: Data Path – SubBytes
AES Algorithm: Data Path – ShiftRows
[Figure: rows 1, 2, and 3 of the state are rotated left by 1, 2, and 3 bytes; row 0 is unchanged.]
AES Algorithm: Data Path – MixColumns
[ 02 03 01 01 ]
[ 01 02 03 01 ]  x  (state column)  =  (new state column)
[ 01 01 02 03 ]
[ 03 01 01 02 ]
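The matrix product above works in GF(2^8), so "multiply by 02" is a shift with a conditional reduction and "add" is XOR. A minimal sketch for one column, assuming the standard AES reduction polynomial 0x11B; the names `xtime` and `mix_column` are illustrative and not taken from the project code.

```python
def xtime(b):
    """Multiply a byte by 02 in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mul3(b):
    """Multiply a byte by 03, i.e. (02 * b) XOR b."""
    return xtime(b) ^ b


def mix_column(col):
    """Apply the MixColumns matrix to one 4-byte state column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,   # row 02 03 01 01
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,   # row 01 02 03 01
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),   # row 01 01 02 03
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),   # row 03 01 01 02
    ]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
print(" ".join(f"{b:02X}" for b in mixed))
```

Only multiplications by 01, 02, and 03 are needed, which is what keeps the combinational version small enough to consider.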
AES Algorithm: Data Path – Add Key
[Figure: Data XOR Round Key.]
AES Algorithm: Key Schedule
Without going into too much detail, the key is generated in a 'similar' way.
In each round, a new round key is generated from the previous round key.
This round key is added (XORed) to the data at the end of the round.
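To make "a new round key from the previous key" concrete, here is a minimal sketch of the AES-128 key schedule. It assumes a 128-bit key and builds the S-box on the fly so the block is self-contained; `gf_mul`, `make_sbox`, `next_round_key`, and `expand_key` are hypothetical helper names.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def make_sbox():
    """Build the S-box: field inverse followed by the affine transform."""
    sbox = []
    for a in range(256):
        inv = next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)
        s = inv
        for shift in (1, 2, 3, 4):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = make_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def next_round_key(prev, round_index):
    """Derive round key `round_index` (1..10) from the previous 16-byte key."""
    words = [list(prev[4 * i:4 * i + 4]) for i in range(4)]
    temp = words[3][1:] + words[3][:1]      # rotate the last word by one byte
    temp = [SBOX[b] for b in temp]          # substitute each byte
    temp[0] ^= RCON[round_index - 1]        # add the round constant
    out = []
    for word in words:                      # chain XORs through the four words
        temp = [x ^ y for x, y in zip(word, temp)]
        out.extend(temp)
    return bytes(out)


def expand_key(key):
    """Return the 11 round keys for AES-128 (round 0 is the cipher key)."""
    keys = [bytes(key)]
    for round_index in range(1, 11):
        keys.append(next_round_key(keys[-1], round_index))
    return keys


round_keys = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(round_keys[1].hex())
```

Calling `next_round_key` once per round corresponds to computing the key alongside the data path, while `expand_key` corresponds to computing every round key ahead of time and storing it.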
Areas of Optimization
Physical Layout - Starting Point
Optimization: Key Expansion
Pre-calculated in software and then stored in hardware (loaded when needed)
- Low area
- Hardware has to wait if a new key is introduced (not good for a continually changing key)
Calculated in parallel with the corresponding iteration
- This allows a changing key to be calculated on the fly
- Extra hardware/area cost (not good for (embedded) fixed-key applications)
Calculated in hardware ahead of time and stored
- High hardware cost; introduces latency when a new key is introduced
- The circuit can be 'turned off' in an ASIC solution
Optimization: Shift Row
16x8 memory with shifting ability
2 shift registers
Rearrangement of wires (requires no extra area, but may cause congestion in the wiring)
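The wire-rearrangement option works because ShiftRows is a fixed permutation of the 16 state bytes, so no logic is involved. A small sketch, assuming the state is stored as 16 bytes in column-major order; `SHIFT_ROWS_PERM` and `shift_rows` are illustrative names.

```python
# Output byte at (row, col) comes from input byte at (row, (col + row) % 4).
# With column-major storage, the byte at (row, col) has index row + 4 * col.
SHIFT_ROWS_PERM = [
    row + 4 * ((col + row) % 4)
    for col in range(4)
    for row in range(4)
]


def shift_rows(block):
    """Apply ShiftRows to a 16-byte column-major state as a pure permutation."""
    return bytes(block[i] for i in SHIFT_ROWS_PERM)


print(SHIFT_ROWS_PERM)
print(shift_rows(bytes(range(16))).hex())
```

In hardware each entry of `SHIFT_ROWS_PERM` would correspond to routing one byte lane to another, which is where the wiring congestion concern comes from.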
Optimization: Substitute Byte
LUT
- Easy to implement and understand.
- It would be a good idea to use the on-chip ROM rather than LEs (depending on application).
- Uses lots of resources
Combinational logic
- No need for memories (XOR circuit could be good in FPGA as we've seen earlier in this class)
- Slow due to complex circuit.
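The two options can be contrasted in software: the S-box value is either computed (a GF(2^8) inverse followed by an XOR-based affine transform, which is what the combinational circuit would implement) or read from a 256-entry table. This is only a sketch; `gf_inverse`, `sbox_compute`, and `sbox_lookup` are hypothetical names, and the exponentiation-based inverse is chosen for brevity rather than as a model of any particular circuit.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):        # a^254 equals a^-1 because a^255 == 1
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_compute(a):
    """'Combinational' option: inverse, then the XOR-only affine transform."""
    inv = gf_inverse(a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


# 'LUT' option: compute once, then every substitution is a single table read.
SBOX_LUT = [sbox_compute(a) for a in range(256)]


def sbox_lookup(a):
    """Substitute one byte with a single table read."""
    return SBOX_LUT[a]


print(f"{sbox_lookup(0x53):02X}")
```

The table costs 256 x 8 bits of ROM per S-box instance, while the computed form trades that memory for a deeper logic path.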
Optimization: Mix Columns
Multiplication and XOR done in combinational logic
- Easy to implement
- Could be slow and cover a large area
Combine the MixColumns multiplication with the S-box and leave XOR in the LEs
- Uses very few LEs. Removes multiplication from the equation.
- Quadruples the size of the necessary ROM (could be a drawback)
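The combined approach can be sketched as a single 256 x 32-bit table: each entry holds the S-box output already multiplied by 02, 01, 01, and 03, so a column needs four lookups, byte rotations, and XORs. This is an assumed formulation (one table plus rotations) that matches the fourfold ROM growth mentioned above; the names `T0`, `sub_mix_column_tables`, and `sub_mix_column_direct` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def make_sbox():
    """Build the S-box: field inverse followed by the affine transform."""
    sbox = []
    for a in range(256):
        inv = next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)
        s = inv
        for shift in (1, 2, 3, 4):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = make_sbox()

# Each 32-bit entry packs [02*S(a), S(a), S(a), 03*S(a)], most significant first.
T0 = [
    (gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3)
    for s in SBOX
]


def rotr32(word, bits):
    """Rotate a 32-bit word right."""
    return ((word >> bits) | (word << (32 - bits))) & 0xFFFFFFFF


def sub_mix_column_tables(col):
    """SubBytes + MixColumns on one column using only lookups and XOR."""
    a0, a1, a2, a3 = col
    word = T0[a0] ^ rotr32(T0[a1], 8) ^ rotr32(T0[a2], 16) ^ rotr32(T0[a3], 24)
    return list(word.to_bytes(4, "big"))


def sub_mix_column_direct(col):
    """Reference: substitute first, then multiply by the MixColumns matrix."""
    s0, s1, s2, s3 = (SBOX[b] for b in col)
    return [
        gf_mul(s0, 2) ^ gf_mul(s1, 3) ^ s2 ^ s3,
        s0 ^ gf_mul(s1, 2) ^ gf_mul(s2, 3) ^ s3,
        s0 ^ s1 ^ gf_mul(s2, 2) ^ gf_mul(s3, 3),
        gf_mul(s0, 3) ^ s1 ^ s2 ^ gf_mul(s3, 2),
    ]


column = [0x32, 0x43, 0xF6, 0xA8]
print(sub_mix_column_tables(column) == sub_mix_column_direct(column))
```

The table is 256 x 32 bits instead of 256 x 8 bits, which is the fourfold ROM cost, and the remaining XORs are the only logic left outside the memory.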
Conclusion: So Far....
Studied papers that address several of the optimizations listed above
Decided on an approach to modify and test existing code
Began modifying the code that I've decided to use as a starting point
...don’t quite have synthesis results yet...
Papers
• “Embedded a Low Area 32-bit AES for Image Encryption/Decryption Application”
• “Exploring HW/SW Co-Design of AES Algorithm Using Custom Instructions”
• “Improved Method to Increase AES System Speed”
• “An AES Tightly Coupled Hardware Accelerator in an FPGA-based Embedded Processor Core”
• “DSP’s, BRAM’s and Pinch of Logic: New Recipes for AES on FPGA’s”