A 1.5 GHz AWP Elliptic Curve Crypto Chip
O. Hauck, S. A. Huss (ICSLAB, TU Darmstadt); A. Katoch (Philips Research)

Feb 02, 2016


Kalli

Transcript
Page 1: A 1.5 GHz AWP Elliptic Curve Crypto Chip

O. Hauck, S. A. Huss
ICSLAB, TU Darmstadt

A. Katoch
Philips Research

Page 2: Outline

Current AWP projects

GATS-Chip

Elliptic Curve Chip

AWPs compared to sync wave pipes

SRCMOS circuits

Crypto background

Architecture and Implementation

Conclusion

Page 3: Status of AWP Projects

2D-DCT:

0.6µm, being re-designed with self-resetting logic

SRT:

currently on schematics only

64b Giga-Hertz Adder Test Site:

0.6µm, almost complete, tape out in May

Crypto chip:

0.35µm, tape out in July targeted

Page 4: Giga-Hertz Adder Test Site

AMS 0.6µm 3M CMOS

64b Brent-Kung adder

~10k devices, ~1.3 mm²

latency ~2.5ns

cycle 1.0ns

on-chip test circuitry

Page 5: General Framework for Pipelines

[Figure: pipeline block diagram. Data enters the input Latch/Reg (i), passes through the Logic block, and is captured by the output Latch/Reg (o); both registers are driven by Clk.]

Page 6: Some Notations...

$t_{hold}$: hold time of register
$t_{setup}$: set-up time of register
$t_d$: propagation delay of register
$t_{skew}$: uncontrolled clock skew at register
$\Delta = \Delta_o - \Delta_i$: delay between input and output clock
$\Delta_i, \Delta_o$: intentional skew at input and output registers
$T_{clk}$: clock period or cycle time
$t_{stable}(i),\ i \in G$: minimum time internal node $i$ has to be stable
$t_{min}(i),\ t_{max}(i),\ i \in G$: minimum and maximum logic delay from input to internal node $i$
$t_{min},\ t_{max}$: minimum and maximum logic delay
$G$: set of all gate output nodes in logic

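Page 7: General Relations

Data is latched by output clock at time
$$t = k\,T_{clk} + \Delta_o \qquad (1)$$
$k$ is called the "global clock latency"; it equals the number of clocks at the output before data.

Lower bound:
$$t > \Delta_i + t_d + t_{max} + t_{setup} + t_{skew} \qquad (2)$$

Upper bound:
$$t < T_{clk} + \Delta_i + t_d + t_{min} - t_{hold} - t_{skew} \qquad (3)$$

Combining (2) and (3), with $\Delta = \Delta_o - \Delta_i$:
$$t_{max} + t_{setup} + t_{skew} < k\,T_{clk} + \Delta - t_d < T_{clk} + t_{min} - t_{hold} - t_{skew} \qquad (4)$$

By transitivity, (4) implies:
$$T_{clk} > (t_{max} - t_{min}) + t_{setup} + t_{hold} + 2\,t_{skew} \qquad (5)$$
I.e., cycle time is bounded by delay variation, register overhead and clock skew.

Similarly, the minimum pulse width has to be respected for all $i \in G$:
$$T_{clk} > (t_{max}(i) - t_{min}(i)) + t_{stable}(i) + t_{skew} \qquad (6)$$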
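The relations on this page can be checked numerically. The sketch below assumes relation (4) reads t_max + t_setup + t_skew < k*T_clk + Delta - t_d < T_clk + t_min - t_hold - t_skew, and relation (5) reads T_clk > (t_max - t_min) + t_setup + t_hold + 2*t_skew. The function names and all picosecond values are illustrative assumptions, not figures from the slides.

```python
def min_cycle_time(t_max, t_min, t_setup, t_hold, t_skew):
    """Lower bound on T_clk implied by relation (5)."""
    return (t_max - t_min) + t_setup + t_hold + 2 * t_skew


def satisfies_relation_4(k, T_clk, delta, t_d, t_max, t_min,
                         t_setup, t_hold, t_skew):
    """True if the latching instant lies inside the window of relation (4)."""
    latch = k * T_clk + delta - t_d
    lower = t_max + t_setup + t_skew          # latest data must have settled
    upper = T_clk + t_min - t_hold - t_skew   # next wave must not yet arrive
    return lower < latch < upper


# Hypothetical timing numbers in picoseconds.
timing = dict(t_max=2000, t_min=1700, t_setup=50, t_hold=50, t_skew=50)

bound = min_cycle_time(**timing)
ok = satisfies_relation_4(k=3, T_clk=600, delta=450, t_d=100, **timing)
too_fast = satisfies_relation_4(k=3, T_clk=400, delta=450, t_d=100, **timing)
print(bound, ok, too_fast)
```

With these numbers the window in (4) is non-empty only when T_clk exceeds the bound from (5), which is the transitivity argument made on the slide.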

Page 8: Synchronous Wave Pipeline

[Figure: block diagram. Data passes from the input Latch/Reg through Wave Logic to the output Latch/Reg, both driven by Clk (diagram labels 1, 2).]

Promise: higher throughput at reduced latency, clock load, area and power

Drawback: difficult tuning of logic and delay elements

$k > 1,\ \Delta = 0$:
$$\frac{t_d + t_{max} + t_{setup} + t_{skew}}{k} < T_{clk} < \frac{t_d + t_{min} - t_{hold} - t_{skew}}{k - 1}$$

Discrete, distinct valid frequency ranges

Low high narrow frequency range

not suitable for system design
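The "discrete, distinct valid frequency ranges" can be illustrated with a short sketch. It assumes the two-sided constraint (t_d + t_max + t_setup + t_skew)/k < T_clk < (t_d + t_min - t_hold - t_skew)/(k - 1) for k > 1 and no intentional skew. The function name and the picosecond values are hypothetical.

```python
def valid_clock_range(k, t_d, t_max, t_min, t_setup, t_hold, t_skew):
    """Return the (low, high) bounds on T_clk for k > 1, or None if empty."""
    if k <= 1:
        raise ValueError("wave pipelining needs k > 1")
    low = (t_d + t_max + t_setup + t_skew) / k
    high = (t_d + t_min - t_hold - t_skew) / (k - 1)
    return (low, high) if low < high else None


# Hypothetical timing numbers in picoseconds.
timing = dict(t_d=100, t_max=2000, t_min=1700,
              t_setup=50, t_hold=50, t_skew=50)

ranges = {k: valid_clock_range(k, **timing) for k in range(2, 6)}
for k, r in ranges.items():
    print(k, r)
```

For these values the permitted interval shrinks as k grows and disappears at k = 5, which is one way to see why the synchronous wave pipeline only runs in narrow, separated frequency bands.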

Page 9: Synchronous Pipeline

[Figure: block diagram. Data passes from the input Latch/Reg through Logic to the output Latch/Reg, both driven by Clk.]

Throughput determined by longest logic path + clock/register overhead

Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead

$k = 1,\ \Delta = 0$:
$$T_{clk} > t_d + t_{max} + t_{setup} + t_{skew}$$

Page 10: Asynchronous Wave Pipeline (AWP)

[Figure: block diagram. Data passes from the input Wave Latch through Wave Logic to the output Wave Latch; req_in travels to req_out through a matched delay.]

More than one data and request propagating coherently

One-sided cycle time constraint

Delay must track logic over PTV corners

$k = 0$:
$$T_{clk} > \Delta - t_d - t_{min} + t_{hold} + t_{skew}$$
$$\Delta > t_d + t_{max} + t_{setup} + t_{skew}$$
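A small sketch of the one-sided AWP constraint follows. It assumes the matched delay must satisfy Delta > t_d + t_max + t_setup + t_skew and that the cycle time then only has the lower bound T_clk > Delta - t_d - t_min + t_hold + t_skew. Function names and picosecond values are hypothetical.

```python
def awp_min_matched_delay(t_d, t_max, t_setup, t_skew):
    """Smallest request delay that still lets the slowest data settle."""
    return t_d + t_max + t_setup + t_skew


def awp_min_cycle_time(delta, t_d, t_min, t_hold, t_skew):
    """Lower bound on the cycle time for a given matched delay."""
    return delta - t_d - t_min + t_hold + t_skew


# Hypothetical timing numbers in picoseconds.
delta_min = awp_min_matched_delay(t_d=100, t_max=2000, t_setup=50, t_skew=50)
cycle_at_limit = awp_min_cycle_time(delta_min, t_d=100, t_min=1700,
                                    t_hold=50, t_skew=50)
cycle_with_margin = awp_min_cycle_time(delta_min + 50, t_d=100, t_min=1700,
                                       t_hold=50, t_skew=50)
print(delta_min, cycle_at_limit, cycle_with_margin)
```

Every picosecond of margin added to the matched delay costs a picosecond of cycle time, so the delay element has to track the logic closely across process, temperature and voltage corners.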

Page 11: Example: 64-b Brent-Kung Parallel Adder

[Figure: adder structure with column labels pg, PG, PG, G, xor and stage indices 0 to 4.]

Buffers provide for same depth on every logic path

All gates in the same column must have the same delay
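For readers unfamiliar with the adder, here is a bit-level software model of a Brent-Kung parallel-prefix adder. It is a functional sketch only: it models the generate/propagate prefix tree and the final XOR, not the buffers, gate delays or circuit style of the chip. The function name and the test operands are assumptions.

```python
def brent_kung_add(a, b, width=64):
    """Add two unsigned integers with a Brent-Kung prefix tree.

    Returns (sum modulo 2**width, carry_out).
    """
    assert width > 0 and width & (width - 1) == 0, "width must be a power of 2"
    # Per-bit generate (g) and propagate (p) signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]

    # Up-sweep: build group signals over spans of 2, 4, 8, ... bits.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            G[i] |= P[i] & G[i - d]   # uses P[i] before it is updated
            P[i] &= P[i - d]
        d *= 2

    # Down-sweep: fill in the remaining prefixes.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d //= 2

    # Sum bit i is p[i] XOR carry into bit i; that carry is G[i - 1].
    total, carry = 0, 0
    for i in range(width):
        total |= (p[i] ^ carry) << i
        carry = G[i]
    return total, carry


print(brent_kung_add(123456789, 987654321))
```

In this model the up-sweep has log2(width) levels and the down-sweep one fewer; in hardware, paths that skip a level are the ones that need the buffers mentioned above so that every path has the same depth.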

Page 12: Circuits

Logic style used has to minimize delay variation

Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream

Static CMOS is not well suited for wave pipelining; fixing the problem results in more power and slower speed

Pass transistor logic gives sloppy edges, thereby introducing delay variation

Dynamic logic is attractive as only the output high transition is data-dependent; output pulldown is done by precharge

What is needed is a dynamic logic family without precharge overhead: SRCMOS

Page 13: SRCMOS

Distinguishing property of our SRCMOS circuits: precharge feedback is fully local, and NMOS trees are delay balanced

[Figure: SRCMOS gate schematic. Inputs drive an NMOS tree (N) that produces the output.]

Page 14: Operation of a 2-AND

Page 15: CISCO Data Encryption Service Adapter

[Cisco Systems]

Page 16: DES Key Exchange using Public-Key Cryptosystem based on Elliptic Curves

Public: curve $E(a, b)$ and point $P_0 \in E$

Alice | Bob
choose random $k_A$ (private key) | choose random $k_B$ (private key)
compute $k_A P_0$ (public key) | compute $k_B P_0$ (public key)
compute $P = k_A (k_B P_0)$ | compute $P = k_B (k_A P_0)$
compute session key via hash function: $D = h(P)$ | compute session key via hash function: $D = h(P)$

Alice and Bob now have the same secret DES key $D$
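The exchange can be tried out on a toy curve. Note the substitutions: this sketch uses the small textbook curve y^2 = x^3 + 2x + 2 over the prime field GF(17) with base point (5, 1), not the binary-field curve with optimal normal basis used on the chip, and it uses SHA-256 truncated to 8 bytes as a stand-in for the unspecified hash function h. The private keys 3 and 7 are arbitrary, and the parameters are far too small to be secure.

```python
import hashlib

P_MOD, A = 17, 2          # toy curve y^2 = x^3 + 2x + 2 over GF(17)
P0 = (5, 1)               # public base point


def ec_add(p, q):
    """Add two points; None stands for the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # p + (-p)
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ec_mul(k, point):
    """Scalar multiplication by double-and-add, most significant bit first."""
    result = None
    for bit in bin(k)[2:]:
        result = ec_add(result, result)
        if bit == "1":
            result = ec_add(result, point)
    return result


k_a, k_b = 3, 7                                       # private keys
pub_a, pub_b = ec_mul(k_a, P0), ec_mul(k_b, P0)       # exchanged in public
shared_a = ec_mul(k_a, pub_b)                         # Alice's view of P
shared_b = ec_mul(k_b, pub_a)                         # Bob's view of P

# Hypothetical key derivation: hash the shared point down to 8 bytes.
session_key = hashlib.sha256(repr(shared_a).encode()).digest()[:8]
print(shared_a == shared_b, session_key.hex())
```

Both sides arrive at the same point because k_A (k_B P_0) = k_B (k_A P_0), while an eavesdropper only sees P_0, k_A P_0 and k_B P_0.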

Page 17: Why is this secure?

Security based upon DLP: in a finite Abelian group $G$ we can easily compute $p = k\,p_0$ given $k \in \mathbb{N}$, $p_0 \in G$

However, $k$ is hard to compute out of $p$ and $p_0$

DLP extraordinarily hard for point group of elliptic curve: $y^2 + xy = x^3 + ax^2 + b$

Set of solutions of cubic equation over any field is an abelian group

Page 18: Elliptic Curve Mathematics and Algorithm

Two types: supersingular and non-supersingular

Non-supersingular have the highest security

EC equation: $y^2 + xy = x^3 + ax^2 + b$

Page 19: Adding Two Points Over Elliptic Curves

Page 20: Optimal Normal Basis

Page 21: Multiplication over ONBs

Page 22: The Final Formula

Page 23: Architecture of Multiplier

[Figure: multiplier architecture. A column of "abx" cells numbered 1, 2, 3, ..., 259, 260, 261 (further index labels 781, 782, 783) feeds a tree of 3-input XOR gates (3_Xor) with wave latches between stages; stage widths are labelled 1 to 87, 1 to 29, 1 to 9. The request signal passes through delay elements. Circuit styles named: Pseudo NMOS, SRCMOS.]

Page 24: Dual-rail Circuits

Dual-rail cross-coupled SRCMOS circuit

NMOS trees are designed such that there is only one conducting path to ground

[Figure: circuit schematic with two NMOS trees (N, N) and two outputs (Out, Out).]

Page 25: Delay Variations at Various Stages

[Figure: waveform panels titled "outputs after first stage", "inputs to final stage" and "final output".]

Cycle time = 666.7 ps

Signals after first stage (data path width = 87)

Page 26: Hierarchy of Control

Double-and-Add: key generation rate R
k scanned from bit 260 down to bit 0 (left shift), Hamming weight = 40
EC double: always; EC add: if x = 1

EC arithmetic: R * (261*7 + 40*13) = R * 2347 MUL/s
EC double: 7 MUL; EC add: 13 MUL

Finite field arithmetic: R * 2347 * 261 = R * 612567 bit/s
ADD, MUL, LOAD/STORE (figure labels: 1, 261, 1)
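The operation counts can be reproduced with a generic double-and-add loop. The sketch assumes 7 field multiplications per EC double, 13 per EC add and 261 bit steps per multiplication, as the figures on this page suggest. To stay self-contained it runs the loop over plain integers (doubling is 2*x, adding is +), which stands in for the EC point arithmetic; the scalar k is an arbitrary example with Hamming weight 40.

```python
def double_and_add(k, point, bits, double, add, identity):
    """Scan k from the most significant bit; count doubles and adds."""
    acc, doubles, adds = identity, 0, 0
    for i in range(bits - 1, -1, -1):
        acc = double(acc)             # EC double: always
        doubles += 1
        if (k >> i) & 1:              # EC add: only if the current bit is 1
            acc = add(acc, point)
            adds += 1
    return acc, doubles, adds


BITS = 261
MUL_PER_DOUBLE, MUL_PER_ADD = 7, 13   # assumed cost per EC operation
BIT_STEPS_PER_MUL = 261               # assumed bit-serial multiplier length

# Example scalar: 261 bits wide, Hamming weight 1 + 39 = 40.
k = (1 << 260) | ((1 << 39) - 1)

result, doubles, adds = double_and_add(
    k, 1, BITS,
    double=lambda x: 2 * x,
    add=lambda x, y: x + y,
    identity=0,
)
field_muls = doubles * MUL_PER_DOUBLE + adds * MUL_PER_ADD
bit_steps = field_muls * BIT_STEPS_PER_MUL
print(doubles, adds, field_muls, bit_steps)
```

The loop performs 261 doubles and 40 adds, giving 2347 field multiplications and 612567 bit steps per key, which matches the factors multiplying R above.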

Page 27: Control Unit Architecture

Request signals trigger the state transitions. Autonomous state transitions are triggered by signal X.

[Figure: control unit block diagram. AWP logic with inputs IN1 and IN2 and output OUT, two REG blocks, request inputs req1 ... reqn, Req_out, reset, and signal X; annotation "For static operation".]

Page 28: High Level Control: Double-and-Add

Level-based control

[Figure: state diagram with states 1 to 8, each annotated X=0 or X=1. Transition labels: Start/LoadX, ResetZ; LoadY; If K=0; Shift K; If K=1; ShiftK, Double; K=0, DoubleDone; K=1, DoubleDone/Add; AddDone; If Stop=1/KP_Done.]

Page 29: Middle Level Control: EC Point Doubling

Pulse-based control

[Figure: state diagram with states 0 to 5 and 58 to 63, each annotated X=0 or X=1. Transition labels: Start; OPAX OPBZ MULT MD; OPAA Shift; OPBA MULT; MD.]

Page 30: Various States in a Pulsed Control

Page 31: Conclusion

[Figure: chip block diagram. Inputs k, a, b, X0, Y0; registers A, B, D, X, Y, Z; operands opA and opB; AWP MUL unit; ADD; OUT; Oscillator, Counter and Controller blocks; req signal; 1-bit serial in, delay line, serial out.]