EE800 TERM PROJECT: Study of AES Encryption/Decryption Optimizations
Nathan Windels

Feb 24, 2016

Page 1: EE800 Term Project

EE800 TERM PROJECT: Study of AES Encryption/Decryption Optimizations

Nathan Windels

Page 2: EE800 Term Project

Outline

• Introduction
• AES Algorithm
• Areas of Optimization
• Progress/Conclusion

Page 3: EE800 Term Project

Introduction

Page 4: EE800 Term Project

Introduction

Three major implementation methods:

• Software
  - Typically much slower than hardware implementations.
• FPGA
  - Implemented as a hardware module connected directly to pins.
  - Peripheral to a soft-core processor (communicates via an on-chip bus).
  - Tightly coupled hardware implemented as an extended instruction set.
• Custom hardware (ASIC)

Page 5: EE800 Term Project

Introduction (2)

High-throughput implementations are mainly used in high-end devices such as accelerator cards for e-commerce services and secure trunk communications.

These implementations typically unroll the AES round loop and pipeline the 128-bit datapath.

Although they achieve very high throughput, they occupy a very large area.

Page 6: EE800 Term Project

Introduction (3)

32-bit AES implementations multiplex the 128-bit datapath onto a 32-bit datapath.

This reduces circuit area at the expense of lowering speed.

This type of implementation is well suited to embedded applications.
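A rough, idealized way to compare the three datapath styles is cycles per 128-bit block. Assuming AES-128 (10 rounds), a clock frequency f, one round per cycle for an iterative 128-bit core, four cycles per round for a 32-bit core, and ignoring key loading and I/O, the throughput in bits per second is approximately:

```latex
\begin{aligned}
T_{\text{unrolled, pipelined}} &\approx 128\,f && \text{(one block completes every cycle)}\\
T_{\text{iterative, 128-bit}} &\approx \frac{128\,f}{10} = 12.8\,f && \text{(10 cycles per block)}\\
T_{\text{iterative, 32-bit}} &\approx \frac{128\,f}{10 \times 4} = 3.2\,f && \text{(40 cycles per block)}
\end{aligned}
```

Real designs add cycles for key setup and I/O, and each architecture reaches a different f, so these figures only illustrate the ratios that the area savings are traded against; they are not measured results.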

My goal is to provide synthesis results for the different implementations, as well as simulation and implementation results if time permits.

Page 7: EE800 Term Project

The AES Algorithm

Page 8: EE800 Term Project

AES Algorithm: Top Level

Figure: the Encryptor takes the Data and the Encryption Key as inputs and outputs the Cipher Data.

Page 9: EE800 Term Project

AES Algorithm: Input

State (to Encryption Process):

32 88 31 E0
43 5A 31 37
F6 30 98 07
A8 8D A2 34

Cipher Key (to Key Schedule):

2B 28 AB 09
7E AE F7 CF
15 D2 15 4F
16 A6 88 3C

Page 10: EE800 Term Project

AES Algorithm: Data Path

Figure: round data path, with the round keys arriving from the key schedule.

Page 11: EE800 Term Project

AES Algorithm: Data Path – SubBytes

Page 12: EE800 Term Project

AES Algorithm: Data Path – ShiftRows

Figure: rows 1, 2 and 3 of the state are rotated left by 1, 2 and 3 bytes; row 0 is unchanged.

Page 13: EE800 Term Project

AES Algorithm: Data Path – MixColumns

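$$
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\times
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
=
\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix}
$$

(applied to each column c of the state; products are taken in GF(2^8) and sums are XORs)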

Page 14: EE800 Term Project

AES Algorithm: Data Path – Add Key

Figure: inputs Data and Round Key.

Page 15: EE800 Term Project

AES Algorithm: Key Schedule

Without going into too much detail, the key is generated in a ‘similar’ way to the data, using the same S-box substitution together with byte rotation and XOR.

In each round, a new round key is generated from the previous round key.

This round key is added (XORed) to the state at the end of the round.

Page 16: EE800 Term Project

Areas of Optimization

Page 17: EE800 Term Project

Physical Layout - Starting Point

Page 18: EE800 Term Project

Optimization: Key Expansion

• Pre-calculated in software and then stored in hardware (loaded when needed)
  - Low area
  - Hardware has to wait if a new key is introduced (not good for a continually changing key)
• Calculated in parallel with the corresponding iteration
  - Allows a changing key to be calculated on the fly
  - Extra hardware/area cost (not good for fixed-key (embedded) applications)
• Calculated in hardware ahead of time and stored
  - High hardware cost; introduces latency when a new key is introduced
  - The circuit can be ‘turned off’ in an ASIC solution

Page 19: EE800 Term Project

Optimization: Shift Row

• 16x8 memory with shifting ability
• 2 shift registers
• Rearrangement of wires (requires no extra area, but may cause congestion in the wiring)

Page 20: EE800 Term Project

Optimization: Substitute Byte

• LUT
  - Easy to implement and understand. It would be a good idea to use the on-chip ROM rather than LEs (depending on the application).
  - Uses lots of resources
• Combinational logic
  - No need for memories (XOR circuits could be good in an FPGA, as we’ve seen earlier in this class)
  - Slow due to the complex circuit

Page 21: EE800 Term Project

Optimization: Mix Columns

• Multiplication and XOR done in combinational logic
  - Easy to implement
  - Could be slow and cover a large area
• Combine the MixColumns multiplication with the S-box and leave the XORs in the LEs
  - Uses very few LEs and removes multiplication from the equation
  - Quadruples the size of the necessary ROM, which could be a drawback

Page 22: EE800 Term Project

Conclusion: So Far...

Studied papers that address several of the optimizations listed above

Decided on an approach: modify and test existing code

Begun modifying the code that I’ve decided to use as a starting point

...don’t quite have synthesis results yet...

Page 23: EE800 Term Project

Papers

• “Embedded a Low Area 32-bit AES for Image Encryption/Decryption Application”
• “Exploring HW/SW Co-Design of AES Algorithm Using Custom Instructions”
• “Improved Method to Increase AES System Speed”
• “An AES Tightly Coupled Hardware Accelerator in an FPGA-based Embedded Processor Core”
• “DSP’s, BRAM’s and Pinch of Logic: New Recipes for AES on FPGA’s”