Using Matlab to Aid the Implementation of a Fast RSA Cryptocore

Carsten Siggaard, Senior Consultant, Danish Technological Institute (DTI)

© 2008 The MathWorks, Inc.


Danish Technological Institute (DTI)

• Knowledge development
• Knowledge transfer
• Knowledge application


Why Implement RSA on a Field Programmable Gate Array?

• FPGAs are inherently parallel. This means they can be faster than general-purpose processors, even though they run at a much lower clock speed.
• Consider a system that uses RSA encryption: if the encryption is placed on a separate FPGA, the CPU on that platform is free to perform other tasks.
• RSA is a difficult algorithm to implement on FPGAs, much more difficult than the Advanced Encryption Standard (Rijndael, AES) or Blowfish. Therefore, if you can implement RSA, virtually any encryption standard can be implemented.
• The core calculations in RSA are the same as those performed in other cryptographic schemes such as Diffie-Hellman key exchange and ElGamal.


Major Results

• Theoretical maximum: 3,150,000 ops/s (Altera Stratix IV E with 1360 16-bit multipliers).
• 50% usage (on Xilinx XC4SX35):
  - 1024-bit message
  - 1024-bit modulus, 5-bit public exponent
• Compare with AMD Opteron 2.8 GHz: 26,000 ops/s.
• At 200 MHz, 50,000 operations can be performed.
• Power consumption: 1 W (Xilinx power estimator using simulated data).
• The core can perform 35,000 cryptographic operations per second.

Max. 90 W


Used Toolboxes and Blocksets

• Matlab
  - Fixed Point Toolbox: modelling large integers
• Simulink
  - Fixed Point Blockset: modelling (large) integers
• Stateflow was used to implement the controller.
• hdlCoder: generating generic VHDL code
• Xilinx SysGen (System Generator) for hardware in the loop (HIL)


Development Issues

• In cryptography, all numbers are usually either bit fields or integers modulo n. Therefore, use a toolbox such as the Fixed Point Toolbox to model these numbers.
• Model the algorithm in Matlab.
• Model the algorithm in Simulink/Stateflow, and compare its results with the results from the Matlab model.
• Generate the code and run it.


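RSA Key Exchange (RFC 4432)

• Bob holds a, b and p, and sends his public key (b, p).
• The other party generates random bytes K, puts K into message m, and sends $c = m^b \bmod p$.
• Bob recovers the message as $m = c^a \bmod p$.
• The exchange hash is signed.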
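The message flow above can be sketched in Python with a toy key. Several things here are assumptions. This is textbook RSA without the OAEP padding that RFC 4432 actually prescribes, and the 12-bit modulus is far too small for real use. The function names and the name "Alice" for the other party are hypothetical, and the signed exchange hash is omitted. The names a and b follow the slide. The modulus is called `modulus` rather than p because it is a product of two primes.

```python
import secrets

# Toy key material (hypothetical values, far too small for real use).
PRIME_1, PRIME_2 = 61, 53


def bob_keypair():
    """Return Bob's key as (a, b, modulus): private a, public (b, modulus)."""
    modulus = PRIME_1 * PRIME_2                     # 3233; the slide calls this p
    b = 17                                          # public exponent
    a = pow(b, -1, (PRIME_1 - 1) * (PRIME_2 - 1))   # private exponent (Python 3.8+)
    return a, b, modulus


def alice_send_secret(b, modulus):
    """Pick random K, use it directly as message m (no OAEP), return (K, c)."""
    k = secrets.randbelow(modulus)  # "random bytes K"
    m = k                           # "put K into message m" (textbook RSA)
    c = pow(m, b, modulus)          # c = m^b mod p
    return k, c


def bob_recover_secret(c, a, modulus):
    """Bob decrypts with his private exponent: m = c^a mod p."""
    return pow(c, a, modulus)


a, b, modulus = bob_keypair()
k, c = alice_send_secret(b, modulus)
assert bob_recover_secret(c, a, modulus) == k
```

Both directions are a single modular exponentiation, which is why the rest of the talk concentrates on computing $x^n \bmod m$ quickly.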


What Is the Engine in RSA, Diffie-Hellman and ElGamal?

• $X^n \bmod m$
• The discrete logarithm modulo m is DIFFICULT.
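The asymmetry can be made concrete with a small Python sketch. It assumes a toy 10-bit prime modulus, and the helper name is hypothetical. Diffie-Hellman needs only modular exponentiation, which is Python's built-in three-argument `pow`. Recovering an exponent, by contrast, is done here by trying every candidate, so the cost grows with the size of m. For a 1024-bit modulus, this brute-force search is hopeless.

```python
import secrets

M = 1019   # toy prime modulus (hypothetical; real systems use far larger moduli)
G = 2      # base


def discrete_log_brute_force(x, y, m):
    """Smallest n >= 0 with x**n % m == y, found by trying every exponent."""
    value = 1
    for n in range(m):
        if value == y:
            return n
        value = value * x % m
    return None   # y is not a power of x modulo m


# Diffie-Hellman: each side keeps a secret exponent and publishes G^secret mod M.
alice_secret = secrets.randbelow(M - 2) + 1
bob_secret = secrets.randbelow(M - 2) + 1
alice_public = pow(G, alice_secret, M)
bob_public = pow(G, bob_secret, M)

# Each side reaches the same shared value with one modular exponentiation.
shared_alice = pow(bob_public, alice_secret, M)
shared_bob = pow(alice_public, bob_secret, M)
assert shared_alice == shared_bob

# An eavesdropper who can solve the discrete logarithm also gets the shared value.
recovered = discrete_log_brute_force(G, alice_public, M)
assert pow(G, recovered, M) == alice_public
assert pow(bob_public, recovered, M) == shared_alice
```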


The Usual Approach

• To calculate exponentiation modulo m, repeatedly do:
  1. X*X (square and multiply)
  2. Reduce modulo m by trial division or Barrett's algorithm
• For small numbers this can be done efficiently.
• For large numbers this can become difficult.
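The two steps above can be sketched in Python. This is a minimal model with several assumptions. It uses left-to-right binary exponentiation and the textbook base-2 form of Barrett reduction, with a precomputed μ = ⌊4^k / m⌋ for a k-bit modulus m. The function names are hypothetical, and Python's unbounded integers stand in for the fixed-width words of real hardware.

```python
def barrett_setup(m):
    """Precompute k (bit length of m) and mu = floor(4**k / m)."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def barrett_reduce(x, m, k, mu):
    """Return x mod m for 0 <= x < 4**k, using shifts and multiplies, no division."""
    q = ((x >> (k - 1)) * mu) >> (k + 1)  # estimate of floor(x / m), never too large
    r = x - q * m
    while r >= m:                          # the estimate is short by at most 2
        r -= m
    return r


def modexp_barrett(x, e, m):
    """Compute x**e mod m (e >= 0) by square-and-multiply with Barrett reduction."""
    if m == 1:
        return 0
    k, mu = barrett_setup(m)
    x %= m                                 # one-time reduction of the base
    result = 1
    for bit in bin(e)[2:]:                 # most significant bit first
        result = barrett_reduce(result * result, m, k, mu)   # square
        if bit == "1":
            result = barrett_reduce(result * x, m, k, mu)    # multiply
    return result
```

The only expensive division is in `barrett_setup`, which runs once per modulus. Every reduction inside the loop uses multiplications, shifts and at most two subtractions.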


The Montgomery Algorithm

[Figure: Montgomery multiplication dataflow with inputs x, y, n, n' and r]

Calculates $(a \cdot r)(b \cdot r)\, r^{-1} \bmod n$

Result is $(a \cdot b \cdot r) \bmod n$

Be aware of timing/power attacks!
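A minimal Python sketch of Montgomery multiplication (the REDC reduction) follows. It assumes an odd n, r = 2^k with k = n.bit_length(), and n' = -n^(-1) mod r. The function names are hypothetical. Unlike a hardened core, this version has a data-dependent final subtraction, which is exactly the kind of timing leak the slide warns about.

```python
def montgomery_setup(n):
    """Return (k, r, n_prime) for odd n, with r = 2**k > n and n*n_prime == -1 mod r."""
    if n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r      # modular inverse via pow (Python 3.8+)
    return k, r, n_prime


def redc(t, n, k, n_prime):
    """Return t * r**-1 mod n for 0 <= t < n*r, using only masks, shifts and adds."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*n is divisible by r
    u = (t + m * n) >> k                # exact division by r; u < 2n
    return u - n if u >= n else u       # data-dependent step: a timing side channel


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Montgomery product: (a*r)(b*r) r**-1 mod n = (a*b*r) mod n."""
    return redc(a_bar * b_bar, n, k, n_prime)


def to_mont(a, n, k):
    """a*r mod n; a one-off conversion (hardware would multiply by R^2 mod n instead)."""
    return (a << k) % n


def from_mont(a_bar, n, k, n_prime):
    """Leave the Montgomery domain: a_bar * r**-1 mod n."""
    return redc(a_bar, n, k, n_prime)


# Usage: multiply 1234 by 2345 modulo 3233 through the Montgomery domain.
n = 3233
k, r, n_prime = montgomery_setup(n)
a_bar, b_bar = to_mont(1234, n, k), to_mont(2345, n, k)
assert from_mont(mont_mul(a_bar, b_bar, n, k, n_prime), n, k, n_prime) == 1234 * 2345 % n
```

The point of the construction is that the reduction divides by r = 2^k, which in hardware is just dropping the low k bits, instead of dividing by n.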


Matlab Development

• Matlab's built-in gcd is based on floating-point numbers (double), so a GCD function that uses the fi type must be created.
• $R^2 \bmod n$ must be calculated, so a function that uses the fi type is created for it.
• A helper function generates stimulus structures for Simulink.
• The Montgomery algorithm was also developed in Matlab so that its results could be compared with the results from Simulink.
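The two helper computations can be sketched in Python, where integers are arbitrary precision and play the role the fi type plays in Matlab. This is a hedged sketch with hypothetical names, assuming R = 2^k where k is the bit length of the odd modulus n. The first function is an iterative extended Euclid that never touches floating point; it is used here to derive the Montgomery constant n'. The second computes R² mod n by repeated doubling with a conditional subtraction.

```python
def ext_gcd(a, b):
    """Iterative extended Euclid on integers: return (g, x, y) with a*x + b*y == g."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def montgomery_constant(n, k):
    """n' = -n**-1 mod 2**k, derived from the extended GCD of n and 2**k."""
    r = 1 << k
    g, x, _ = ext_gcd(n, r)
    if g != 1:
        raise ValueError("n must be odd (coprime to 2**k)")
    return (-x) % r


def r_squared_mod_n(n, k):
    """R**2 mod n with R = 2**k, by 2k doublings and conditional subtractions."""
    value = 1 % n
    for _ in range(2 * k):
        value <<= 1              # double: value < 2n afterwards
        if value >= n:
            value -= n           # one subtraction restores value < n
    return value
```

R² mod n is what lets an operand enter the Montgomery domain through a single Montgomery multiplication, because a · R² · R⁻¹ ≡ a · R (mod n).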


Important Topics for the NumericType and fimath Objects!

• Be aware of the rounding and overflow modes; they are intended for signal processing.
• Be aware of how the word length of the numbers grows during the calculation, because:
  - the precision has an impact on correctness
  - the precision has an impact on performance.
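The pitfall can be illustrated with plain Python integers. This is an analogy, not the Fixed-Point Toolbox API, and `saturate` and `wrap` are hypothetical helpers. Saturation, a common overflow choice in signal processing, clamps an out-of-range value, which silently breaks modular arithmetic. Wrapping keeps the low bits and therefore computes the result modulo 2^w. The last lines show word growth: the product of two w-bit operands needs 2w bits.

```python
def saturate(value, word_length):
    """Clamp to the unsigned range [0, 2**word_length - 1], signal-processing style."""
    return max(0, min(value, (1 << word_length) - 1))


def wrap(value, word_length):
    """Keep only the low word_length bits, i.e. reduce modulo 2**word_length."""
    return value & ((1 << word_length) - 1)


W = 8
a, b = 200, 100
exact = a + b                            # 300 does not fit in 8 bits

print(wrap(exact, W))                    # 44, i.e. 300 mod 256: the modular result
print(saturate(exact, W))                # 255: wrong for modular arithmetic

# Word growth: the largest W-bit operands give a product that needs 2W bits.
largest = (1 << W) - 1
print((largest * largest).bit_length())  # 16
```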


The Engine – Schoolbook multiplication
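Only the slide title survives here, so the following is a hedged Python sketch of schoolbook multiplication on 16-bit limbs. It assumes a digit size matching the 16-bit hardware multipliers mentioned in the results; the names and the little-endian limb order are also assumptions for illustration, not the core's actual datapath. Each inner step needs one 16x16-bit product plus an accumulator of at most 32 bits.

```python
LIMB_BITS = 16
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n_limbs):
    """Split x into n_limbs 16-bit limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n_limbs)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian 16-bit limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def schoolbook_mul(a, b):
    """Multiply little-endian limb lists a and b; returns len(a) + len(b) limbs."""
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # At most (2**16 - 1)**2 + 2*(2**16 - 1) = 2**32 - 1: fits in 32 bits.
            t = result[i + j] + ai * bj + carry
            result[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        result[i + len(b)] = carry  # this position is still untouched by earlier rows
    return result


# Usage: a 1024-bit by 1024-bit product needs 64 x 64 = 4096 limb multiplications.
a = (1 << 1023) + 12345
b = (1 << 1024) - 1
assert from_limbs(schoolbook_mul(to_limbs(a, 64), to_limbs(b, 64))) == a * b
```

The n² independent limb products are what make the operation a good fit for an FPGA: they can be spread across many hardware multipliers working in parallel.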


HW in the Loop

[Figure: hardware-in-the-loop setup (JTAG connection)]


Perspectives

• The title is "Using Matlab to aid the implementation of a fast RSA Cryptocore".
• The title should have been "Using Matlab to do the implementation of a fast RSA Cryptocore".
• An advanced encryption algorithm can be implemented using Matlab/Simulink.
• For commercial SSL offload engines, certification is a must.
• The core can be implemented as an off-the-shelf service.


Conclusion

• Correct use of Simulink with the hdlCoder results in a FAST and efficient core.
• Simulink runs faster than a comparable VHDL simulation.
  - More tests can be performed in the same time.
• A faster, model-based approach makes programming more efficient.
• You must know how Simulink blocks map to HDL blocks, and the result will also depend on your synthesis tool!
• You do not need to spend time digging into subtle VHDL constructs.
• The result is virtually generic.


Questions?

• Teknologisk Institut Denmark: Taastrup, Aarhus, Kolding, Herning, Odense, Hirtshals
• Teknologisk Institut AB, formerly SIFU
• Swedcert AB
• FIRMA 2000 Poland

http://www.teknologiskinstitut.se

http://www.teknologisk.dk

[email protected]