A High Throughput/Gate AES Hardware Architecture by Compressing Encryption and Decryption Datapaths
Toward Efficient CBC-Mode Implementation
Rei Ueno (1), Sumio Morioka (2), Naofumi Homma (1), and Takafumi Aoki (1)
(1) Tohoku University, (2) NEC Central Laboratories
Cryptographic Hardware and Embedded Systems, Santa Barbara, CA, USA, 19th August 2016
Outline
Introduction
Related works
Proposed architecture
Performance evaluation
Concluding remarks
AES hardware architectures
[Figure: area versus time for one block encryption. Labels: byte-serial, round-based, unrolled, resource sharing, datapath replication.]
AES hardware architectures
[Figure: area versus time for one block encryption, with prior designs placed on the plot: [Satoh+, AC2001], [Lutz+, CHES2002], [Liu+, ESSCIRC2009], [Mathew+, JSSC2011], [Moradi+, EC2010], [Mathew+, JSSC2015], [Hodjat+, TC2006].]
AES hardware architectures
[Figure: area versus time for one block encryption. Labels: byte-serial, round-based, unrolled, resource sharing, datapath replication, efficient hardware, pipelining, datapath optimization.]
Practical applications
Block-chaining modes
CBC, CMAC, CCM, …
Both encryption and decryption operations
Issue with block-wise pipelining
State-of-the-art AES hardware achieves 53 Gbps, but only in ECB or CTR mode [Mathew+, JSSC2011]
Higher throughput ≠ lower latency
Application examples: SSL/TLS, 802.11 WLAN
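The pipelining issue above can be illustrated with a small sketch. It is not taken from the slides: `toy_block_encrypt` is a hypothetical stand-in for AES built from SHA-256 (not a real cipher), and the cycle-count functions assume an idealized pipeline that accepts one block per cycle. The point is only that each CBC ciphertext block feeds the next block's input, whereas CTR blocks are independent of one another.

```python
import hashlib

BLOCK = 16  # bytes per block, matching the AES block size

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for one AES block encryption (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    """CBC: block i cannot start until ciphertext block i-1 is finished."""
    out, prev = [], iv
    for p in blocks:
        prev = toy_block_encrypt(key, xor(p, prev))
        out.append(prev)
    return out

def ctr_encrypt(key: bytes, nonce: bytes, blocks: list) -> list:
    """CTR: every keystream block depends only on the counter value."""
    return [xor(p, toy_block_encrypt(key, nonce + i.to_bytes(8, "big")))
            for i, p in enumerate(blocks)]

def cycles_chained(n_blocks: int, latency: int) -> int:
    """Chained mode: blocks are processed strictly one after another."""
    return n_blocks * latency

def cycles_pipelined(n_blocks: int, latency: int) -> int:
    """Independent blocks, assuming one block enters the pipeline per cycle."""
    return latency + n_blocks - 1

key, iv, nonce = bytes(16), bytes(16), bytes(8)
msg_a = [bytes([i]) * BLOCK for i in range(4)]
msg_b = [b"\xff" * BLOCK] + msg_a[1:]  # differs from msg_a in block 0 only

cbc_a, cbc_b = cbc_encrypt(key, iv, msg_a), cbc_encrypt(key, iv, msg_b)
ctr_a, ctr_b = ctr_encrypt(key, nonce, msg_a), ctr_encrypt(key, nonce, msg_b)

# In CBC the change in block 0 propagates to every later block;
# in CTR only block 0 changes.
cbc_changed = [x != y for x, y in zip(cbc_a, cbc_b)]
ctr_changed = [x != y for x, y in zip(ctr_a, ctr_b)]
```

Under these assumptions, a pipelined datapath speeds up CTR but not CBC: for a chained mode the time per block is the full block latency, so only a lower-latency datapath helps.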
This work
Most area-time efficient AES HW architecture
Achieves the lowest latency using tower-field inversion
• Performs CBC mode most efficiently
Supports both encryption and decryption
Unified on-the-fly key scheduling datapath
Results
Logic synthesis with three standard CMOS technologies
• 44-72% higher throughput/gate than conventional ones
Power estimation using gate-level dynamic simulation
• Lower energy than any previous design
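To make "on-the-fly key scheduling" concrete, here is a minimal software sketch, assuming AES-128 and the standard key expansion from FIPS-197. It is not the authors' hardware datapath, and the function names (`next_round_key`, `prev_round_key`) are hypothetical. It shows that each round key can be derived from its neighbor in either direction, so encryption can step forward and decryption can step backward without storing all eleven round keys.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def sbox_entry(x: int) -> int:
    """AES S-box: inversion in GF(2^8) (as x^254) followed by the affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)  # yields 0 for x == 0, as the S-box requires
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63

SBOX = [sbox_entry(x) for x in range(256)]

# Round constants 0x01, 0x02, ..., 0x1B, 0x36 (repeated doubling in GF(2^8)).
RCON = [0x01]
for _ in range(9):
    RCON.append(gf_mul(RCON[-1], 2))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def g(word: bytes, rcon: int) -> bytes:
    """RotWord, then SubWord, then add the round constant to the first byte."""
    return bytes([SBOX[word[1]] ^ rcon, SBOX[word[2]], SBOX[word[3]], SBOX[word[0]]])

def next_round_key(rk: bytes, rcon: int) -> bytes:
    """Forward step (encryption direction): round key i -> round key i+1."""
    w = [rk[i:i + 4] for i in range(0, 16, 4)]
    n0 = xor(w[0], g(w[3], rcon))
    n1 = xor(w[1], n0)
    n2 = xor(w[2], n1)
    n3 = xor(w[3], n2)
    return n0 + n1 + n2 + n3

def prev_round_key(rk: bytes, rcon: int) -> bytes:
    """Backward step (decryption direction): round key i+1 -> round key i."""
    n = [rk[i:i + 4] for i in range(0, 16, 4)]
    w3 = xor(n[3], n[2])
    w2 = xor(n[2], n[1])
    w1 = xor(n[1], n[0])
    w0 = xor(n[0], g(w3, rcon))
    return w0 + w1 + w2 + w3

key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
first = next_round_key(key, RCON[0])

rk = key
for rc in RCON:               # walk forward to the final round key
    rk = next_round_key(rk, rc)
last = rk
for rc in reversed(RCON):     # walk back to the cipher key
    rk = prev_round_key(rk, rc)
recovered = rk
```

Both directions use the same `g` function and the same XOR chain, only in a different order, which is why a single shared key-scheduling datapath is plausible in hardware.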
Conventional architecture 1/2 [Lutz+, CHES 2002]
Enc and Dec datapaths with additional selectors
Overhead of selectors for unification is nontrivial