Final Presentation Encryption on Embedded System Supervisor : Ina Rivkin students : Chen Ponchek Liel Shoshan Spring 2014 Part B

Jan 19, 2016


Franklin Quinn
Page 1:

Final Presentation
Encryption on Embedded System

Supervisor: Ina Rivkin
Students: Chen Ponchek, Liel Shoshan

Spring 2014, Part B

Page 2:

Motivation

• Nowadays, many portable storage devices with large memories (such as Disk-on-Key drives, tablets, etc.) contain valuable data.
• There is therefore a concrete need for portable cryptography systems suited to such devices.
• Our project aims to provide a system that answers this need.

Page 3:

Project Goal

Main goal: implementing an efficient data-cryptography embedded system using the AES algorithm, and finding a suitable architecture for a portable system.

Page 4:

Project Specifications

• Implementation on a Xilinx Zynq SoC.
• Suitable for portable systems (Disk-on-Key, tablets, etc.): a low-power system.
• Transparent system (while storing/loading files): the cryptography system won’t create traffic bottlenecks.
• Finding the best architecture according to the requirements above:
  • Profiling the AES algorithm.
  • Finding the balance between using the ARM processor and using the FPGA (the hardware accelerator needs more power).

Page 5:

AES Algorithm

• The Advanced Encryption Standard (AES), also known as “Rijndael”, is a block cipher.
• The cipher is iterative, fast, and convenient to implement in both software and hardware, and it does not have high memory requirements.
• Most of the AES calculations are made through 10 rounds (for a 128-bit key).
• The Key Expansion Schedule derives 10 round keys from the initial cipher key, which itself serves as the first round key (11 round keys, or 44 32-bit words, in total).
• In each round the state block is described as a 2D, 4×4 array of bytes.
• Each round consists of 4 steps (the final round omits MixColumns):

1. SubBytes
2. ShiftRows
3. MixColumns
4. AddRoundKey

[Diagram labels: Key, KeyExpansion]
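As a minimal sketch of two of the round steps listed above, the following Python models ShiftRows and MixColumns on a state stored as four 4-byte columns. The function names (`xtime`, `mix_column`, `shift_rows`) and the column-major layout are illustrative assumptions, not the project's actual code.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mix_column(col):
    """MixColumns for one column: multiply by the fixed AES matrix (2, 3, 1, 1)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def shift_rows(state):
    """ShiftRows: state[c][r] is column c, row r; row r rotates left by r columns."""
    return [[state[(c + r) % 4][r] for r in range(4)] for c in range(4)]


# Example: a state whose byte at column c, row r is 4*c + r.
state = [[4 * c + r for r in range(4)] for c in range(4)]
shifted = shift_rows(state)
mixed = [mix_column(col) for col in shifted]
```

SubBytes (a table lookup) and AddRoundKey (a byte-wise XOR with the round key) are left out to keep the sketch short.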

Page 6:

System Top View

[Block diagram of the ZedBoard: DDR, ARM PS (software), Programmable Logic (hardware), UART, AXI4 bus, BRAM]

Page 7:

Software Implementation

• Each step is implemented as a separate function.
• Each function is independent of the other functions.
• Code optimizations improved performance significantly.
• The encryption rate we achieved was 323 KB/s.
  • About 1.5 times slower than the typical maximum data rate of USB (typical rates are around 0.5 MB/s).
• Conclusion: a hardware accelerator is needed.
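The comparison above can be checked with a short calculation. This is only a sketch: it assumes decimal units (1 KB = 1000 bytes) and the standard 16-byte AES block, neither of which is stated on the slide.

```python
KB = 1000            # assumption: decimal kilobytes
MB = 1000 * KB

achieved_rate = 323 * KB     # bytes per second, software-only encryption
usb_rate = 0.5 * MB          # bytes per second, typical USB rate quoted above

slowdown = usb_rate / achieved_rate          # about 1.55
block_time_us = 16 / achieved_rate * 1e6     # about 49.5 microseconds per block

print(f"{slowdown:.2f}x slower than USB, {block_time_us:.1f} us per 16-byte block")
```

Under these assumptions the software path spends roughly 50 µs on each 16-byte block, which is the budget a hardware accelerator would need to beat.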

Page 8:

Software Profiling

[Chart: distribution of the software’s running time by function]

Page 9:

Software Implementation Profiling

[Chart: encryption time split; diagram labels: KeyExpansion, Key]

Page 10:

Hardware/Software Balancing

• The most time-consuming function is Mix Columns.
• Concurrency can be achieved by running Key Expansion and the encryption process simultaneously.
• To minimize data traffic between the PS and the PL, Add Round Key should be implemented in hardware.

Page 11:

Integrated System Block Diagram

[Block diagram of the ZedBoard: DDR, ARM PS (software), Programmable Logic (hardware), AXI4 bus, UART, and the AES functions Add Round Key, Shift Rows, Key Expansion, Mix Columns, Sub Bytes]

Page 12:

Integrated System Flow Diagram

[Flow diagram: Key and State inputs; KeyExpansion; an initial AddRoundKey; SubBytes, ShiftRows, MixColumns, AddRoundKey repeated ×9; a final SubBytes, ShiftRows, AddRoundKey; steps divided between the ARM PS (software) and the Programmable Logic (hardware)]

Page 13:

Integrated System Block Diagram

[Block diagram of the ZedBoard: DDR, UART, ARM Processing System, and Programmable Logic with the Key Expansion module, the Mixor module (Mix Column, Add Round Key), and BRAMs on the AXI4 bus]

Page 14:

Handshake

• Synchronizes the ARM processor and the hardware modules.
• Communication protocol via BRAM.
• Processor side:
  • The processor writes data to the BRAM.
  • The processor raises the flag at a designated address in the BRAM.
• PL side:
  • Waits for the flag by continuously reading from the designated address.
  • Executes.
  • Resets the flag.
• No synchronization is needed in the opposite direction: the hardware always completes its run before the processor needs the data.
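The handshake above can be modeled in a few lines of Python. This is a software simulation under stated assumptions, not the project's code: the `Bram` class, the word indices `FLAG_ADDR` and `DATA_ADDR`, and the `operation` callback standing in for a hardware module are all hypothetical.

```python
FLAG_ADDR = 0   # assumption: the flag lives in the first BRAM word
DATA_ADDR = 1   # assumption: a single data word follows it


class Bram:
    """A tiny word-addressed memory standing in for the shared BRAM."""

    def __init__(self, words=16):
        self.mem = [0] * words

    def read(self, addr):
        return self.mem[addr]

    def write(self, addr, value):
        self.mem[addr] = value & 0xFFFFFFFF


def processor_send(bram, value):
    """Processor side: write the data first, then raise the flag."""
    bram.write(DATA_ADDR, value)
    bram.write(FLAG_ADDR, 1)


def pl_poll(bram, operation):
    """PL side: one polling step. Returns True if the module ran."""
    if bram.read(FLAG_ADDR) != 1:
        return False                      # flag not raised, keep waiting
    result = operation(bram.read(DATA_ADDR))
    bram.write(DATA_ADDR, result)         # execute and store the result
    bram.write(FLAG_ADDR, 0)              # reset the flag
    return True


bram = Bram()
ran_before = pl_poll(bram, lambda x: x ^ 0xFFFF)   # nothing to do yet
processor_send(bram, 0x1234)
ran_after = pl_poll(bram, lambda x: x ^ 0xFFFF)    # module executes once
```

Writing the data before the flag matters: the PL only acts once it sees the flag, so the data is guaranteed to be in place by then.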

[Block diagram: ARM and PL; Key Expansion module with its BRAM; Mixor module (Mix Column, Add Round Key); BRAMs on the AXI4 bus]

Page 15:

Hardware Implementation: Key Expansion

• The key expansion schedule takes the initial cipher key as its only argument and outputs the expanded key.
• It reads the cipher key from the BRAM, where the PS wrote it.
• The output is written to a different BRAM.
• The procedure is independent of the other functions, so it can run as a background task, concurrently with the rest of the code.
• Implementing it in hardware achieved concurrency between the ARM and the FPGA.
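For reference, the computation that the Key Expansion module performs can be sketched in software as follows. This is the standard AES-128 key schedule written as a hedged Python model, not the project's VHDL; the helper names (`gf_mul`, `make_sbox`, `expand_key`) are illustrative, and the S-box is computed rather than tabulated only to keep the block short.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with the AES reduction polynomial 0x11B."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def make_sbox():
    """Build the AES S-box: multiplicative inverse, then the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):          # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):         # XOR with four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = make_sbox()


def expand_key(key):
    """Expand a 16-byte cipher key into 44 32-bit words (11 round keys)."""
    assert len(key) == 16
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            t = ((t << 8) | (t >> 24)) & 0xFFFFFFFF                  # RotWord
            t = int.from_bytes(bytes(SBOX[b] for b in t.to_bytes(4, "big")), "big")  # SubWord
            t ^= rcon << 24                                          # round constant
            rcon = gf_mul(rcon, 2)
        w.append(w[i - 4] ^ t)
    return w


words = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
```

The key used here is the example key from the AES standard's key expansion appendix. Because the schedule depends only on the cipher key, it can run in the background while the processor prepares the first state block.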

[Block diagram: ARM and PL; Key Expansion module with its BRAMs; Mixor module (Mix Column, Add Round Key); BRAMs on the AXI4 bus]

Page 16:

Key Expansion state machine flow

[State diagram. States and the signal values shown for each:]
• idle: address_sig 0x0
• RdCol1: address_sig 0x10
• RdCol2: address_sig 0x14
• RdCol3: address_sig 0x18
• RdCol4: address_sig 0x1C
• SaveCol4: address_sig 0x1C
• Expand: ena_key 1
• Write2BRAM: address_sig 0x20 + 4i; data_out_sig key_out[1407-32i downto 1407-32(i+1)+1]; BRAM_WE_B 1111; i := i + 1
• InitFlag: address_sig 0x0; BRAM_WE_B 1111; data_out_sig 0x0
• FINISH: address_sig 0x0; BRAM_WE_B 1111; data_out_sig 0x0

Transition conditions shown: flag = 0, flag = 1; valid = 0, valid = 1; i < 43, i = 43.
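The Write2BRAM addressing can be made concrete with a small sketch. It assumes `key_out` is a 1408-bit vector (44 words of 32 bits, matching the 1407 upper index) and models it as a Python integer; the function names are hypothetical.

```python
def write2bram(i):
    """Return (byte address, high bit, low bit) for expanded-key word i."""
    address = 0x20 + 4 * i
    high = 1407 - 32 * i
    low = 1407 - 32 * (i + 1) + 1
    return address, high, low


def word_of(key_out, i):
    """Extract word i from key_out, given as a 1408-bit integer."""
    _, _, low = write2bram(i)
    return (key_out >> low) & 0xFFFFFFFF


first = write2bram(0)    # word 0 goes to address 0x20, bits 1407..1376
last = write2bram(43)    # word 43 goes to address 0xCC, bits 31..0
```

So the most significant word of `key_out` is written first, and the expanded key occupies byte addresses 0x20 through 0xCF of the key BRAM.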

Page 17:

Key Expansion ChipScope waveform

• Reading the cipher key from BRAM
• Expanding the key and writing to BRAM

[Waveform captures; signals: DATA_IN, ADDRESS, DATA_OUT]

Page 18:

Hardware Implementation: Mix Columns and Add Round Key

• Mixor is a combined module that implements both Mix Columns and Add Round Key.
• Its inputs are the round key and the state block.
• It reads the state block from a BRAM shared with the PS.
• It reads the round key from a BRAM, where the Key Expansion module wrote it.
• The output is written to the shared BRAM, from which the PS reads the current state block.
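What Mixor computes for a single column can be sketched as below: the column is mixed, then XORed with the matching round-key column. This is a software model under assumptions (the name `mixor_column` and the list-of-bytes representation are illustrative), using a common compact form of MixColumns.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def mixor_column(col_in, col_in_key):
    """MixColumns on one 4-byte column, then XOR with the round-key column."""
    t = col_in[0] ^ col_in[1] ^ col_in[2] ^ col_in[3]
    col_mixed = [
        col_in[i] ^ t ^ xtime(col_in[i] ^ col_in[(i + 1) % 4])
        for i in range(4)
    ]
    return [m ^ k for m, k in zip(col_mixed, col_in_key)]


# With an all-zero key column the result is plain MixColumns.
plain = mixor_column([0xDB, 0x13, 0x53, 0x45], [0x00] * 4)
keyed = mixor_column([0xDB, 0x13, 0x53, 0x45], [0xFF] * 4)
```

Fusing the two steps means the mixed column never has to travel back to the PS before the round key is applied, which is the traffic saving the slide describes.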

[Block diagram: ARM and PL; Key Expansion module with its BRAM; Mixor module (Mix Column, Add Round Key) with its BRAMs; BRAMs on the AXI4 bus]

Page 19:

Mixor state machine flow

[State diagram. States and the signal values shown for each:]
• idle: ADDRESS_DATA 0x0
• RdCol: ADDRESS_DATA 0x4; ADDRESS_KEY 0x20 + 4×[num_col + 4×(round + 1)]
• SaveCol
• Mix: ADDRESS_DATA 0x8; DATA_OUT_DATA (col_mixed) xor (col_in_key); BRAM_WE_B_{num_col} 1111
• InitFlag: ADDRESS_DATA 0x0; DATA_OUT_DATA 0x0; BRAM_WE_B_{num_col} 1111

Transition conditions shown: flag = 0, flag = 1.

Page 20:

Mixor ChipScope waveform

• Execution of the Mixor module over the 1st column

[Waveform capture; signals: data_in_data1, bram_we_1, data_out_data, data_in_key, address_key, col_mixed, address_data]

Page 21:

Hardware Blocks Implementation: Performance

• Mixor
  • HW implementation: 24 cycles = 0.24 µsec
  • SW implementation: 2.545 µsec
  • ~10 times faster
• Key Expansion
  • HW implementation: 93 cycles = 0.93 µsec
  • SW implementation: 15 µsec
  • ~15 times faster
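The ratios above follow directly from the quoted timings. The short calculation below only restates the slide's numbers; the variable names are illustrative.

```python
mixor_hw_us, mixor_sw_us = 0.24, 2.545
keyexp_hw_us, keyexp_sw_us = 0.93, 15.0

mixor_speedup = mixor_sw_us / mixor_hw_us       # about 10.6
keyexp_speedup = keyexp_sw_us / keyexp_hw_us    # about 16.1

# Cycles per microsecond implied by "24 cycles = 0.24 usec".
clock_mhz = 24 / mixor_hw_us                    # about 100 MHz

print(f"Mixor: {mixor_speedup:.1f}x, Key Expansion: {keyexp_speedup:.1f}x, "
      f"clock: {clock_mhz:.0f} MHz")
```

Both hardware timings are consistent with a programmable-logic clock of about 100 MHz (93 cycles in 0.93 µs gives the same figure).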

Page 22:
Page 23:
Page 24:

Conclusions

• The hardware modules are much faster than the software functions.
• The data-transmission overhead between the PS and the PL significantly decreases the system’s speed and causes a severe slowdown in performance: 68% of the running time.
• Main conclusion
  • The integrated system is best suited to algorithms with intensive calculations and low data traffic.
  • The AES algorithm has high data traffic, and therefore the hardware accelerator did not yield a significant performance improvement.
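One way to see why the transfer overhead dominates is an Amdahl's-law style bound. The sketch below assumes the 68% figure is the share of the integrated system's running time spent on PS-PL data transfer; that reading, and the function name, are assumptions.

```python
def max_speedup(accelerated_fraction):
    """Upper bound on overall speedup if the accelerated part took zero time."""
    return 1.0 / (1.0 - accelerated_fraction)


transfer_fraction = 0.68                     # from the conclusion above
compute_fraction = 1.0 - transfer_fraction   # everything that is not transfer

# Even infinitely fast computation leaves the transfer time untouched.
bound = max_speedup(compute_fraction)        # 1 / 0.68, about 1.47
```

Under that assumption, no further speedup of the hardware modules could make the integrated system more than about 1.5 times faster; only reducing the data traffic would.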

Page 25:

Demonstration