A High Throughput/Gate AES Hardware Architecture by Compressing Encryption and Decryption Datapaths
Toward Efficient CBC-Mode Implementation

Rei Ueno (1), Sumio Morioka (2), Naofumi Homma (1), and Takafumi Aoki (1)
(1) Tohoku University and (2) NEC Central Laboratories

Cryptographic Hardware and Embedded Systems
Santa Barbara, CA, USA, 19th August 2016

Sep 06, 2018



Outline

Introduction

Related works

Proposed architecture

Performance evaluation

Concluding remarks


AES hardware architectures

[Figure: area versus time for one block encryption. Labels: round-based, byte-serial, unrolled, resource sharing, datapath replication]


AES hardware architectures

[Figure: area versus time for one block encryption, annotated with prior work: [Satoh+, AC2001], [Lutz+, CHES2002], [Liu+, ESSCIRC2009], [Mathew+, JSSC2011], [Moradi+, EC2010], [Mathew+, JSSC2015], [Hodjat+, TC2006]]


AES hardware architectures

[Figure: area versus time for one block encryption. Labels: round-based, byte-serial, unrolled, resource sharing, datapath replication. Added labels: efficient hardware, pipelining, datapath optimization]


Practical applications

Block-chaining modes

CBC, CMAC, and CCM…

Both encryption and decryption operations

Issue on block-wise pipelining

State-of-the-art AES hardware achieves 53 Gbps, but only in ECB or CTR mode [Mathew+, JSSC2011]

Higher throughput ≠ lower latency

Application examples: SSL/TLS, 802.11 WLAN
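
The pipelining issue comes from the data dependency in chaining modes. The following is a minimal Python sketch, assuming a toy stand-in block function (`toy_block_encrypt` is hypothetical, is not AES, and is not secure). It shows that CBC encryption cannot start block i+1 before block i is finished, whereas CTR blocks are independent of each other.

```python
BLOCK = 16  # block size in bytes, as in AES

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    # Hypothetical stand-in for one block encryption: NOT AES, NOT secure.
    # XOR with the key, then rotate the block left by one byte.
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]

def toy_block_decrypt(block: bytes, key: bytes) -> bytes:
    # Inverse of toy_block_encrypt: rotate right by one byte, then XOR with the key.
    unrotated = block[-1:] + block[:-1]
    return xor_bytes(unrotated, key)

def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for p in blocks:
        # Serial dependency: this block's input needs the previous ciphertext block.
        prev = toy_block_encrypt(xor_bytes(p, prev), key)
        out.append(prev)
    return out

def cbc_decrypt(blocks, key, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(xor_bytes(toy_block_decrypt(c, key), prev))
        prev = c
    return out

def ctr_encrypt(blocks, key, nonce):
    # Each keystream block depends only on its counter, so blocks can be pipelined.
    out = []
    for i, p in enumerate(blocks):
        counter = (nonce + i).to_bytes(BLOCK, "big")
        out.append(xor_bytes(p, toy_block_encrypt(counter, key)))
    return out

key = bytes(range(16))
iv = bytes([0xA5] * 16)
plaintext = [bytes([i] * 16) for i in range(4)]
ciphertext = cbc_encrypt(plaintext, key, iv)
recovered = cbc_decrypt(ciphertext, key, iv)
```

In this sketch a block-wise pipeline would sit idle during CBC encryption, so the time for one block (latency), not the pipelined throughput, bounds the data rate.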


This work

Most area-time efficient AES HW architecture

Achieve lowest-latency with tower-field inversion

• Can perform CBC mode most efficiently

Support both encryption and decryption

Unified on-the-fly key scheduling datapath

Results

Logic synthesis with three standard CMOS technologies

• 44-72% higher throughput/gate than conventional ones

Power estimation using gate-level dynamic simulation

• Lowest energy to date


Conventional architecture 1/2 [Lutz+, CHES 2002]

Enc and Dec datapaths with additional selectors

Overhead of selectors for unification is nontrivial

False paths appear

www.chesworkshop.org/ches2002/presentations/Lutz.pdf


Conventional architecture 2/2 [Satoh+, AC 2001]

Unify each pair of operation and its inverse

RoundKey requires InvMixColumns

Some MUXs in unified operations

Long critical path
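
Why the round key needs InvMixColumns when operations are reordered can be checked numerically: InvMixColumns is linear over XOR, so moving it ahead of AddRoundKey is only valid if the round key is transformed as well. The sketch below works on a single 4-byte column in Python. The helper names and the sample round-key bytes are illustrative assumptions; the matrices are the standard AES ones.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result

def mul_column(first_row, column):
    """Multiply a 4-byte column by the circulant matrix with the given first row."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= gf_mul(first_row[(c - r) % 4], column[c])
        out.append(acc)
    return out

MIX = (0x02, 0x03, 0x01, 0x01)      # MixColumns matrix, first row
INV_MIX = (0x0E, 0x0B, 0x0D, 0x09)  # InvMixColumns matrix, first row

def mix_column(col):
    return mul_column(MIX, col)

def inv_mix_column(col):
    return mul_column(INV_MIX, col)

def xor_column(a, b):
    return [x ^ y for x, y in zip(a, b)]

state = [0xDB, 0x13, 0x53, 0x45]
round_key = [0x2B, 0x7E, 0x15, 0x16]  # arbitrary sample bytes

# Order A: AddRoundKey, then InvMixColumns.
order_a = inv_mix_column(xor_column(state, round_key))
# Order B: InvMixColumns first, then add a round key passed through InvMixColumns.
order_b = xor_column(inv_mix_column(state), inv_mix_column(round_key))
```

Both orders give the same column, which is why a unified datapath that swaps the two operations has to apply InvMixColumns to the round key too.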


Tower-field implementation

Inversion should be performed over tower-field

Tower-field inversion is more efficient than direct mapping (e.g., table-lookup)

Two types of tower-field implementation

Type-I: only inversion is performed over tower-field

Type-II: all operations are performed over tower-field

          Inversion (S-box)   MixColumns / InvMixColumns
Type-I    Good                Good
Type-II   Better              Bad
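
To illustrate why inversion is cheaper over a tower field, the sketch below inverts a byte viewed as an element a*Y + b of GF((2^4)^2), using only GF(2^4) arithmetic. The field polynomials (x^4 + x + 1 and Y^2 + Y + NU with NU = 0x8) are assumptions chosen for illustration, and the isomorphic mapping to the AES polynomial basis is omitted. This is not the inversion circuit of [CHES 2015].

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) modulo x^4 + x + 1 (an assumed choice of polynomial)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result

def gf16_inv(a: int) -> int:
    """Invert in GF(2^4) as a^14; maps 0 to 0."""
    result = 1
    for _ in range(14):
        result = gf16_mul(result, a)
    return result

NU = 0x8  # assumed constant: Y^2 + Y + NU has no root in GF(2^4)

def tower_mul(x: int, y: int) -> int:
    """Multiply bytes viewed as a*Y + b over GF(2^4), with Y^2 = Y + NU."""
    a, b = x >> 4, x & 0xF
    c, d = y >> 4, y & 0xF
    ac = gf16_mul(a, c)
    hi = ac ^ gf16_mul(a, d) ^ gf16_mul(b, c)
    lo = gf16_mul(ac, NU) ^ gf16_mul(b, d)
    return (hi << 4) | lo

def tower_inv(x: int) -> int:
    """Invert a*Y + b with one GF(2^4) inversion and a few GF(2^4) multiplications."""
    a, b = x >> 4, x & 0xF
    # Norm of the element: a^2*NU + a*b + b^2, an element of GF(2^4).
    delta = gf16_mul(gf16_mul(a, a), NU) ^ gf16_mul(a, b) ^ gf16_mul(b, b)
    d_inv = gf16_inv(delta)
    return (gf16_mul(a, d_inv) << 4) | gf16_mul(a ^ b, d_inv)
```

The only inversion left is in the 16-element subfield, which is the source of the area and delay savings compared with a 256-entry table.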


Overall architecture

Round-based architecture

On-the-fly key scheduler

[Figure: block diagram of the round function part and the key scheduling part. Port labels: Plaintext/Ciphertext, Initial key, Ciphertext/Plaintext]


Round function part

Compress encryption and decryption datapaths by register-retiming and operation-reordering

Unify inversion circuits in encryption and decryption

• Without any additional selectors (i.e., overheads)

Merge linear operations to reduce gates and critical delay

• Affine/InvAffine and MixColumns/InvMixColumns

• At most one linear operation for a round

Type-II tower-field implementation

Isomorphic mappings are performed at data I/O

Lower-area tower-field (Inv)Affine and (Inv)MixColumns
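
Merging linear operations amounts to composing GF(2)-linear maps into one bit matrix, so a single XOR network replaces two in series. The sketch below merges the linear part of the AES affine transformation with multiplication by 0x02 (one MixColumns coefficient). It uses the standard polynomial basis rather than the tower-field representation of the proposed design, and the function names are illustrative assumptions.

```python
def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def affine_linear(x: int) -> int:
    """Linear part of the AES S-box affine map (the constant 0x63 is added separately)."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4)

def xtime(x: int) -> int:
    """Multiply by 0x02 in GF(2^8), the basic step of MixColumns."""
    x <<= 1
    return (x ^ 0x11B) & 0xFF if x & 0x100 else x

def to_matrix(f):
    """Represent a GF(2)-linear byte map by its images of the 8 unit vectors."""
    return [f(1 << j) for j in range(8)]

def apply_matrix(columns, x: int) -> int:
    """Apply the map: XOR the columns selected by the set bits of x."""
    out = 0
    for j in range(8):
        if (x >> j) & 1:
            out ^= columns[j]
    return out

def compose(outer, inner):
    """Matrix of x -> outer(inner(x)), precomputed once."""
    return [apply_matrix(outer, col) for col in inner]

A = to_matrix(affine_linear)
X2 = to_matrix(xtime)
MERGED = compose(X2, A)  # one matrix for "affine (linear part), then multiply by 0x02"
```

The affine constant is carried through the second map once (`xtime(0x63)`) and XORed in afterwards, so it adds no depth to the merged network.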


Register-retiming and operation-reordering

[Figure: original and proposed datapaths, for encryption and for decryption]


Key tricks (of decryption)

[Figure: original decryption flow from Ciphertext to Plaintext, with data registers between stages. Pre-round op.: AddRoundKey. Round op.: InvSubBytes, InvShiftRows, AddRoundKey, InvMixColumns. Final op.: InvSubBytes, InvShiftRows, AddRoundKey]


Key tricks (of decryption)

Decompose InvSubBytes into InvAffine and Inversion

Retime the registers so that each round operation begins with inversion

[Figure: decryption flow after decomposition and retiming, from Ciphertext to Plaintext. Stages: Pre-round op., Round op., Final op. Blocks: AddRoundKey, InvAffine, Inversion, InvShiftRows, InvMixColumns, data registers]
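
The decomposition can be checked in software: SubBytes is inversion followed by the affine map, and InvSubBytes is the inverse affine map followed by the same inversion. The Python sketch below uses the standard AES definitions; the function names are illustrative assumptions, and the inversion is a plain exponentiation rather than a tower-field circuit.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result

def gf_inv(a: int) -> int:
    """Inversion as a^254; maps 0 to 0. This block is shared by both directions."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def affine(x: int) -> int:
    """AES affine transformation used in SubBytes."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63

def inv_affine(y: int) -> int:
    """Inverse affine transformation used in InvSubBytes."""
    return rotl8(y, 1) ^ rotl8(y, 3) ^ rotl8(y, 6) ^ 0x05

def sub_byte(x: int) -> int:
    return affine(gf_inv(x))      # encryption: Inversion, then Affine

def inv_sub_byte(y: int) -> int:
    return gf_inv(inv_affine(y))  # decryption: InvAffine, then Inversion
```

Because `gf_inv` is the same in both directions, only the cheap linear parts differ between encryption and decryption, which is what makes a unified inversion circuit possible.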


Key tricks (of decryption)

Merge linear operations into Unified affine^-1

InvAffine and InvMixColumns

Distinct AddRoundKey to avoid additional selectors or InvMixColumns

[Figure: decryption flow after merging, from Ciphertext to Plaintext. Stages: Pre-round op., Round op., Final op. Blocks: AddRoundKey, InvAffine, Inversion, InvShiftRows, Unified affine^-1, data registers]


Resulting datapath

[Figure: resulting round-function datapath, with callouts:]

Unified inversion without selector

Disable inactive path

At most one linear operation per round

Only one 4:1 selector


Inversion circuits

Most area-time efficient inversion circuit [CHES 2015]

Design                 Area [GE]  Timing [ns]  Power [uW]  AT product  PT product
Table look-up           1,209.50         0.66        86.9      798.27       57.35
Satoh+, AC 2001           212.25         2.53        35.0      536.99       88.55
Canright, CHES 2005       175.97         2.49        35.6      438.17       88.64
Nekado+, IWSEC 2012       205.81         1.62        33.1      333.41       53.62
Ueno+, CHES 2015          170.00         1.42        19.3      243.10       27.60

Technology: TSMC 65-nm standard CMOS

Power estimation by gate-level timing simulation at 10MHz


Key scheduling part

Round key generator is dominant

Unify encryption and decryption datapaths

Make the critical delay shorter than that of the round function part by NOT unifying some XOR gates

[Figure: key scheduling datapath, distinguishing unified components from the XOR gates that are not unified]
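
On-the-fly key scheduling means each round key is derived from its neighbour instead of being stored. The sketch below steps the standard AES-128 key expansion forwards (encryption) and backwards (decryption) in Python. The function names are illustrative assumptions, and this is a software model of the standard schedule, not the unified circuit of this work.

```python
def gf_mul(a: int, b: int) -> int:
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result

def sbox(x: int) -> int:
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)  # x^254 is the inverse; 0 maps to 0
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63

SBOX = [sbox(x) for x in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def g(word, rcon):
    """RotWord, SubWord, then XOR the round constant into the first byte."""
    subbed = [SBOX[b] for b in word[1:] + word[:1]]
    subbed[0] ^= rcon
    return subbed

def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]

def next_round_key(rk, rcon):
    """rk is a list of four 4-byte words; derive the following round key."""
    w0 = xor_words(rk[0], g(rk[3], rcon))
    w1 = xor_words(rk[1], w0)
    w2 = xor_words(rk[2], w1)
    w3 = xor_words(rk[3], w2)
    return [w0, w1, w2, w3]

def prev_round_key(rk, rcon):
    """Derive the preceding round key; the S-box step g is shared with the forward case."""
    w3 = xor_words(rk[3], rk[2])
    w2 = xor_words(rk[2], rk[1])
    w1 = xor_words(rk[1], rk[0])
    w0 = xor_words(rk[0], g(w3, rcon))
    return [w0, w1, w2, w3]

def words(hex_key: str):
    raw = bytes.fromhex(hex_key)
    return [list(raw[i:i + 4]) for i in range(0, 16, 4)]

key0 = words("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
forward = [key0]
for rc in RCON:
    forward.append(next_round_key(forward[-1], rc))

backward = forward[-1]
for rc in reversed(RCON):
    backward = prev_round_key(backward, rc)
```

The S-box step is identical in both directions while the word-level XOR chains differ, so in this model the XORs are the part that a designer could either share through selectors or duplicate.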


Performance evaluation

Design          Area (GE)   Latency (ns)  Max. freq. (MHz)  Throughput (Gbps)  Efficiency (Kbps/GE)
Satoh et al.    13,671.75          78.10            140.85               1.64                119.88
Lutz et al.     20,380.50          68.50            145.99               1.87                 91.69
Liu et al.      12,538.75          85.25            129.03               1.50                119.75
Mathew et al.   20,639.50          97.68            112.61               1.31                 63.49
This work       15,242.75          46.97            234.19               2.73                178.78

All architectures were implemented in a round-based manner

Logic synthesis with area optimizations

Logic synthesis: Design Compiler

Includes on-the-fly key scheduler
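
The throughput and efficiency columns can be reproduced from area and latency alone. The Python sketch below assumes throughput is one 128-bit block per latency period and efficiency is throughput divided by area; these relations are inferred from the numbers and agree with every row to within rounding. The variable and function names are illustrative.

```python
BLOCK_BITS = 128  # AES block size

# (area in GE, latency in ns) from the performance table
designs = {
    "Satoh et al.": (13671.75, 78.10),
    "Lutz et al.": (20380.50, 68.50),
    "Liu et al.": (12538.75, 85.25),
    "Mathew et al.": (20639.50, 97.68),
    "This work": (15242.75, 46.97),
}

def throughput_gbps(latency_ns: float) -> float:
    # bits per nanosecond is numerically equal to Gbps
    return BLOCK_BITS / latency_ns

def efficiency_kbps_per_ge(area_ge: float, latency_ns: float) -> float:
    # Gbps -> Kbps is a factor of 1e6
    return throughput_gbps(latency_ns) * 1e6 / area_ge

results = {
    name: (throughput_gbps(lat), efficiency_kbps_per_ge(area, lat))
    for name, (area, lat) in designs.items()
}

for name, (tput, eff) in results.items():
    print(f"{name:14s} {tput:5.2f} Gbps  {eff:7.2f} Kbps/GE")
```

Since only one block is in flight in a round-based design, throughput here is the reciprocal of latency, which is the relevant figure for CBC-style modes.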

Our architecture achieved the highest efficiency (+53%)


Power consumption estimation

Power estimation by Power Compiler

Gate-level dynamic simulation calculating switching activities with glitch effects

Our architecture achieved lowest power and power-time (PT) product

Design          Power [mW] @ 10 MHz   PT product
Satoh et al.    4.05                  316.31
Lutz et al.     3.43                  234.96
Liu et al.      4.51                  384.48
Mathew et al.   5.49                  536.26
This work       2.76 (-20%)           129.63 (-45%)
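
The PT column is consistent with power multiplied by the latency from the performance table; this relation is inferred from the numbers rather than stated on the slide. The Python sketch below recomputes it and the reductions relative to the best conventional design (Lutz et al.). Names are illustrative, and small differences in the last digit come from rounding in the source values.

```python
# (power in mW at 10 MHz, latency in ns); latencies from the performance table
designs = {
    "Satoh et al.": (4.05, 78.10),
    "Lutz et al.": (3.43, 68.50),
    "Liu et al.": (4.51, 85.25),
    "Mathew et al.": (5.49, 97.68),
    "This work": (2.76, 46.97),
}

def pt_product(power_mw: float, latency_ns: float) -> float:
    # mW * ns = pJ: an energy-like figure for one block operation
    return power_mw * latency_ns

pt = {name: pt_product(p, t) for name, (p, t) in designs.items()}

conventional = [n for n in designs if n != "This work"]
best_power = min(designs[n][0] for n in conventional)
best_pt = min(pt[n] for n in conventional)

power_reduction = 1 - designs["This work"][0] / best_power
pt_reduction = 1 - pt["This work"] / best_pt

print(f"power reduction: {power_reduction:.1%}, PT reduction: {pt_reduction:.1%}")
```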


Concluding remarks

Most area-time efficient AES HW architecture

44-72% higher throughput/gate efficiency compared to conventional ones

Lowest energy, estimated by Power Compiler with gate-level timing simulation

Future work

Post-synthesis evaluation

Efficient side-channel-resistant architecture