An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields
Marcin Rogawski, Ekawat Homsirikamol, Kris Gaj
Cryptographic Engineering Research Group (CERG), Department of ECE, Volgenau School of Engineering
George Mason University, Fairfax, VA, USA
11th CryptArchi Workshop, Fréjus, June 23-26, 2013
Outline: Motivation and introduction; Efficient arithmetic in FPGAs; Pairing on Edwards curves; Tate pairing coprocessor; Results and conclusions
Pairing Based Cryptography
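[Figure: traditional public-key (PK) encryption from Alice to Bob, which relies on Cert(BOB, P_TA), compared with pairing-based encryption, which uses ID_BOB and P_TA; a second panel shows one-round key agreement among parties A, B and C.]
Note: Cert = Certificate, TA = Trust Authority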
Pairing Based Cryptography
[Figure: FPGA arithmetic resources. Adder: LUT-based full adders (FA) linked through the fast carry chain (Cin to Cout). Multiplier: Xilinx DSP multiplier with 25(24) x 18(17)-bit inputs and a 43(41)-bit product; Altera 36 x 36-bit multiplier with a 72-bit product, built from four 18 x 18-bit multipliers.]
Note: signed size (unsigned size)
Note: $M = (A_{hi} B_{hi}) \cdot 2^{36} + (A_{hi} \cdot 2^{18} \cdot B_{lo}) + (A_{lo} B_{lo}) + (A_{lo} \cdot 2^{18} \cdot B_{hi})$
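The decomposition in the note above can be checked with a few lines of Python. This is only a sketch of the arithmetic identity, not of the DSP block itself; the function name `mul36_from_18` and the constant `W` are illustrative and do not come from the slides.

```python
W = 18  # width of one half-word, as in the 18 x 18 partial multipliers


def mul36_from_18(a, b):
    """Multiply two unsigned 36-bit integers using four 18 x 18 partial products."""
    mask = (1 << W) - 1
    a_hi, a_lo = a >> W, a & mask
    b_hi, b_lo = b >> W, b & mask
    # M = A_hi*B_hi*2^36 + A_hi*B_lo*2^18 + A_lo*B_lo + A_lo*B_hi*2^18
    return (
        ((a_hi * b_hi) << (2 * W))
        + ((a_hi * b_lo) << W)
        + (a_lo * b_lo)
        + ((a_lo * b_hi) << W)
    )
```

The two middle terms share the same weight, which is why they can be summed before the shift in hardware.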
Prime Fields
Pairing transformations can be defined over multiple fields: binary fields $GF(2^n)$, ternary fields $GF(3^m)$, and prime fields $GF(p)$
Binary and ternary fields are generally hardware-friendly
Prime fields are generally better for software implementations and for cross-platform solutions
The National Security Agency (NSA Suite B Cryptography) and ECRYPT II recommend prime fields
Scope of this work
Efficient implementation of Pairing Based Cryptosystems over prime fields using internal resources of modern FPGAs, such as fast carry chains (carry logic) and DSP units.
Hierarchy of Operations in Pairing Based Cryptography (PBC)
[Figure: layered hierarchy of operations, listed here from top to bottom]
Cryptographic protocols and schemes
Bilinear Operations: Pairings
Group Operations (Curve Operations): Point Addition, Point Doubling, Scalar Multiplication
Extension Field Operations: Multiplication, Squaring
Field Operations: Multiplication, Squaring, Addition, Subtraction
Hardware architecture recipe:
Optimizations on every level CANNOT be conducted totally independently!
Novel Hybrid high-radix carry save adder with parallel prefix Kogge-Stone network
[Figure: hybrid adder made of N blocks (block 0 to block N−1). Each block adds two w-bit words of A and B into a w-bit sum word of S and a carry bit of C (the High-Radix Carry Save Form), and tests the sum word for all ones ("=1..1?"). The resulting g(i) and p(i) signals feed a Carry Projection Unit, built as a Kogge-Stone adder's parallel prefix network, whose outputs pc(1), ..., pc(N) are added to the sum words to produce the words of R.]
Functionality: A + B = S + C = (cout, R)
A = {a(N*w−1), ..., a(0)}
B = {b(N*w−1), ..., b(0)}
S = {s(N*w−1), ..., s(0)}
R = {r(N*w−1), ..., r(0)}
C = {c(N*w), 0_{w−1}, c((N−1)*w), 0_{w−1}, ..., c(w), 0_w}
0_i denotes i consecutive zeros
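A software model can make the data flow of such a hybrid adder concrete. The sketch below is an interpretation of the block diagram, not the authors' design: it assumes g(i) is the carry out of block i and p(i) flags an all-ones sum word, and it uses a Kogge-Stone style prefix scan over the blocks. The names `split_words` and `hybrid_add` and the defaults `w=17`, `n=31` are illustrative choices.

```python
def split_words(x, w, n):
    """Split a non-negative integer into n words of w bits, least significant first."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def hybrid_add(a, b, w=17, n=31):
    """Add two (n*w)-bit integers block-wise; returns (cout, R)."""
    mask = (1 << w) - 1
    t = [x + y for x, y in zip(split_words(a, w, n), split_words(b, w, n))]
    s = [v & mask for v in t]           # sum words (carry-save form, S)
    g = [v >> w for v in t]             # generate: block produced a carry (C)
    p = [int(v == mask) for v in s]     # propagate: sum word is all ones

    # Kogge-Stone prefix scan: after it, g[i] is the carry out of blocks 0..i.
    d = 1
    while d < n:
        g_new, p_new = g[:], p[:]
        for i in range(d, n):
            g_new[i] = g[i] | (p[i] & g[i - d])
            p_new[i] = p[i] & p[i - d]
        g, p = g_new, p_new
        d *= 2

    carry_in = [0] + g[:-1]             # projected carry into each block
    r = [(si + ci) & mask for si, ci in zip(s, carry_in)]
    result = sum(ri << (w * i) for i, ri in enumerate(r))
    return g[-1], result
```

Adding the projected carry never overflows a block unless its sum word is all ones, and that case is exactly what the propagate signal accounts for.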
Generic modular adder
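[Figure: generic modular adder and subtractor datapaths. Two chained adders; the second stage adds $2^n - P$ (addition) or P (subtraction, SUB), and multiplexers driven by the carry-outs cout#1 and cout#2 select the output R.]
n = number of bits of P
$$R = (A + B) \bmod P = \begin{cases} A + B - P, & \text{if } A + B \ge 2^n \lor A + B - P \ge 0 \\ A + B, & \text{otherwise} \end{cases}$$
$$R = (A - B) \bmod P = \begin{cases} A - B + P, & \text{if } A - B < 0 \\ A - B, & \text{otherwise} \end{cases}$$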
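The two case distinctions above translate directly into code. The following is a minimal sketch assuming 0 <= A, B < P < 2^n; the function names and the use of the two carry-outs as Python variables are illustrative, and the modulus in the usage note is just a convenient 521-bit stand-in, not a parameter from the talk.

```python
def mod_add(a, b, p, n):
    """(a + b) mod p using one addition and one addition of 2^n - p."""
    mask = (1 << n) - 1
    s = a + b
    cout1 = s >> n                       # set when a + b >= 2^n
    t = (s & mask) + ((1 << n) - p)      # add the n-bit two's complement of p
    cout2 = t >> n                       # set when a + b - p >= 0 (and cout1 == 0)
    return (t & mask) if (cout1 or cout2) else s


def mod_sub(a, b, p):
    """(a - b) mod p: add p back when the difference is negative."""
    d = a - b
    return d + p if d < 0 else d
```

For example, with `p = 2**521 - 1` and `n = 521`, `mod_add(p - 1, p - 1, p, 521)` agrees with `(2 * (p - 1)) % p`.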
Novel modular adder/subtractor
[Figure: modular adder/subtractor made of N processing elements PE#0, ..., PE#(N−1). Each PE operates on the w-bit words a(i), b(i), p(i) and ip(i), includes all-ones detectors ("1...1"?), and outputs the word r(i). Carry Projection Unit #1 takes the signals fg(i), fp(i) and returns fpc(i); Carry Projection Unit #2 takes sg(i), sp(i) and returns spc(i). The sel signal chooses between the two candidate results, and the SUB input switches to subtraction.]
Functionality: (A + B) mod P = R
A = {a(N−1), ..., a(0)}, B = {b(N−1), ..., b(0)}, P = {p(N−1), ..., p(0)}, IP = two's complement of P
w = 17 bits
80-bit: n = 521 bits, N = 31 words
120-bit: n = 1264 bits, N = 75 words
128-bit: n = 1493 bits, N = 88 words
Novel Multiply-and-Add DSP-based multiplier 1/3
[Figure: multiplier built from a row of DSP slices. Each slice multiplies a 17-bit word a(i) by the current 24-bit digit b(j) and accumulates the partial results s(i), c(i). "Switch Radix 2^17 to Radix 2^24" blocks (one for the sum, one for the carry) convert them into 24-bit words ss(i) and cc(i), from which the result words rr(i) and rc(i) are formed.]
Functionality: A * B = (RR, RC), where
A = {a(N−1), ..., a(0)}, B = {b(N−1), ..., b(0)}, RR = {rr(2M−1), ..., rr(0)}, RC = {rc(2M−1), ..., rc(0)}
zc(−1) = 0_17, zc(0) = 0_17, zc(i) = (0_15, c(i)) for i = 1, ..., N
80-bit: N = 31, M = 22
120-bit: N = 75, M = 53
128-bit: N = 88, M = 62
Novel Multiply-and-Add DSP-based multiplier 2/3
Three operational phases of the selected processing element:
[Figure: three panels, one per phase. In each, the processing element combines the 24-bit words ss(i) and cc(i) into rr(i+M−1) and rc(i+M). The only phase label that survives is "Phase III: clock cycle M−1".]

Parameter                               Xilinx Virtex-6    Altera Stratix IV & V
#bits of A processed per clock cycle    n                  n
#bits of B processed per clock cycle    24                 36
#clock cycles per multiplication        ⌈n/24⌉             ⌈n/36⌉
#DSP units                              ⌈n/17⌉             ⌈n/36⌉
Meaning of DSP unit                     DSP48E1 slice      Half-DSP block
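The ceiling expressions in the table are easy to evaluate for a given operand size. The helper below is a small sketch; the function name `multiplier_cost` and the family keys `"virtex6"` and `"stratix"` are labels invented for this example.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def multiplier_cost(n, family):
    """Clock cycles and DSP units for one n-bit multiplication, per the table."""
    if family == "virtex6":
        bits_b_per_cycle, bits_per_dsp = 24, 17
    elif family == "stratix":
        bits_b_per_cycle, bits_per_dsp = 36, 36
    else:
        raise ValueError("unknown family: " + family)
    return {
        "cycles": ceil_div(n, bits_b_per_cycle),
        "dsp_units": ceil_div(n, bits_per_dsp),
    }
```

Calling `multiplier_cost(521, "virtex6")` evaluates the Virtex-6 column for the 521-bit operand size quoted for the 80-bit security level.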
Novel Multiply-and-Add DSP-based multiplier 3/3
Novel double speed mode: One multiplication using two multipliers
Let A, B be M*w-bit numbers, and $B = B_H \cdot 2^{wM/2} + B_L$. Then the multiplication of A and B can be computed as follows: $A \cdot B = A \cdot (B_H \cdot 2^{wM/2} + B_L) = A \cdot B_H \cdot 2^{wM/2} + A \cdot B_L$.
Very similar idea to Bipartite multiplication, Kaihara et al. [CHES'05], and Tripartite multiplication, Sakiyama et al. [Integration'11]
When to use it? When a single multiplication has to be computed and two multipliers are available, or when two multiplications have to be computed and four multipliers are available!
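The split of B into two halves can be illustrated as follows. This sketch only mirrors the identity above; in hardware the two partial products would run concurrently on separate multipliers, whereas here they are simply computed one after another. The name `double_speed_mul` and the defaults `w=24`, `m=22` are assumptions made for the example.

```python
def double_speed_mul(a, b, w=24, m=22):
    """Compute a * b as A*B_H*2^(w*m/2) + A*B_L, with B split into two halves."""
    half = (w * m) // 2                   # number of bits in the low half of B
    b_hi = b >> half
    b_lo = b & ((1 << half) - 1)
    upper = a * b_hi                      # would run on multiplier #1
    lower = a * b_lo                      # would run on multiplier #2
    return (upper << half) + lower
```

Each partial product processes only half of the digits of B, which is where the halved cycle count comes from.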
Arithmetic for Special Primes
Reductions modulo $2^n + 1$ and modulo $2^n - 1$ are very efficient. (Problem: Not every number of this form is prime!)
Primes of the form $2^a \pm 2^b \pm 1$ and $2^a \pm 2^b \pm 2^c \pm 1$ were introduced by Solinas [NSA'99]. Solinas prime arithmetic is recommended by NIST for digital signature schemes [FIPS-186]!
Comment:
But this arithmetic is not applicable to all primes in Solinas form! (e.g.: $2^{520} + 2^{363} - 2^{360} - 1$)
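To see why reduction modulo $2^n - 1$ is cheap, note that $2^n \equiv 1$, so the high part of a number can simply be folded onto the low part with an addition. The sketch below shows this for non-negative inputs; the function name `reduce_mod_2n_minus_1` is illustrative.

```python
def reduce_mod_2n_minus_1(x, n):
    """Reduce a non-negative integer modulo 2^n - 1 using only shifts and additions."""
    m = (1 << n) - 1
    while x >> n:                 # while x has bits above position n-1
        x = (x & m) + (x >> n)    # fold the high part onto the low part
    return 0 if x == m else x     # 2^n - 1 itself is congruent to 0
```

No multiplication or division is involved, which is the property that Solinas-form moduli try to retain with a few extra shifted terms.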
Novel solution: Barrett-based reductor for Solinas primes
The multiplication by p and µ can be replaced by multi-operand addition!
The prime divisor r ($2^a + 2^b - 1$) always has the binary form 10..01..1 (computationally cheaper - check next slide!).
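A textbook Barrett reduction shows where the multiplication by p disappears. The sketch below is a generic Barrett scheme, not the authors' reductor: it uses the Solinas-form modulus quoted earlier in the talk, replaces the product q*p by shifted additions and subtractions, and leaves the product by µ as an ordinary multiplication (whether µ is equally sparse is not checked here). The names `times_p` and `barrett_reduce` are illustrative.

```python
# Modulus in Solinas form, as quoted in the slides: 2^520 + 2^363 - 2^360 - 1
P = (1 << 520) + (1 << 363) - (1 << 360) - 1
K = P.bit_length()                 # number of bits of P
MU = (1 << (2 * K)) // P           # Barrett constant, precomputed once


def times_p(q):
    """q * P computed as a multi-operand addition of shifted copies of q."""
    return (q << 520) + (q << 363) - (q << 360) - q


def barrett_reduce(x):
    """Return x mod P for 0 <= x < 2^(2K) using Barrett's quotient estimate."""
    if not 0 <= x < (1 << (2 * K)):
        raise ValueError("input out of range")
    q = ((x >> (K - 1)) * MU) >> (K + 1)   # estimate of floor(x / P), never too large
    r = x - times_p(q)
    while r >= P:                          # at most a couple of corrections
        r -= P
    return r
```

The quotient estimate can undershoot slightly, so the final loop performs the small number of correcting subtractions that Barrett's method requires.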
General algorithm for the modified Tate pairing
Algorithm 2: Miller's algorithm for computing the modified Tate pairing
Require: Points P and φ(Q), prime divisor r = (r_{l−1} ... r_0), field order p, embedding degree k, and a rational function h_{P,Q}
Ensure: F = e(P, φ(Q))
 1: F = 1, R = P
 2: for i = l − 2 downto 0 do
 3:     G ← h_{R,R}(φ(Q)) and R = 2R        /* Algorithm 3: 14 multiplications */
 4:     F = F^2 ∗ G                          /* Algorithm 5 and 6: 2+4 multiplications */
 5:     if r_i = 1 then
 6:         G ← h_{R,P}(φ(Q)) and R = R + P  /* Algorithm 4: 24 multiplications */
 7:         F = F ∗ G                        /* Algorithm 5: 2 multiplications */
 8:     end if
 9: end for
10: return F ← F^((p^k − 1)/r)               /* Algorithm 7: next slide */

What if, for a substantial number of P and Q, the result is e(P, Q) = 1?
Distortion maps! $\phi(Q) = (x_Q i, 1/y_Q)$, where $i^2 = -1$. Consequently, F and G are complex numbers.
Other names: twists, or operations in the extension field defined by $x^2 + 1$ in this case.
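Since F and G are "complex numbers" over the prime field, their arithmetic can be sketched with pairs (a, b) standing for a + b*i with i^2 = −1. The example below assumes p ≡ 3 (mod 4), so that x^2 + 1 has no root modulo p; the function names are illustrative. The squaring uses the standard two-multiplication identity, while the general product is plain schoolbook; neither is claimed to be the slides' Algorithm 5 or 6.

```python
def fp2_mul(x, y, p):
    """(a + b*i) * (c + d*i) with i^2 = -1, schoolbook form (4 multiplications)."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % p, (a * d + b * c) % p)


def fp2_sqr(x, p):
    """(a + b*i)^2 = (a + b)(a - b) + 2ab*i, using 2 multiplications."""
    a, b = x
    real = (a + b) * (a - b) % p
    imag = 2 * (a * b) % p      # the doubling is an addition in hardware
    return (real, imag)
```

With a toy prime such as `p = 103`, `fp2_sqr((5, 7), 103)` and `fp2_mul((5, 7), (5, 7), 103)` give the same pair.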
Final exponentiation (Alg. 7) - novel approach
The traditional optimization method, the Frobenius mapping, is not applicable to supersingular curves. $F^{(p-1)}$ and $F^{-1}$ are equally difficult!
The fixed exponent $e = \frac{p^2 - 1}{r}$, after Booth recoding, can be represented in a five-term Solinas form (e.g., 128-bit: $2^{2716} + 2^{2461} - 2^{2449} + 2^{2448} - 2^{1225}$)
The computation of the final exponentiation, $F^e = F^{2^{a_1} + 2^{a_2} - 2^{a_3} + 2^{a_4} - 2^{a_5}} = \frac{F^{2^{a_1}} F^{2^{a_2}} F^{2^{a_4}}}{F^{2^{a_3}} F^{2^{a_5}}}$, takes $a_1$ complex squarings and three complex multiplications of five intermediate values
Two modular multiplications (complex squaring) can be computed using 4 multipliers working in the double speed mode
Final result reconstruction:
$F \leftarrow \frac{R_u}{R_d} \leftarrow \frac{xi + y}{vi + z}$, with $(vi + z)^{-1} = (v'i + z')$, where $v' = \frac{-v}{z^2 + v^2}$ and $z' = \frac{z}{z^2 + v^2}$
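The structure of this exponentiation (one chain of squarings, a few saved intermediate values, a numerator and a denominator) can be tried out in a simpler group. The sketch below uses integers modulo a prime as a stand-in for the complex field elements, and Python's built-in modular inverse `pow(x, -1, m)` (Python 3.8+) in place of the explicit inversion formula; the function name and argument layout are illustrative.

```python
def signed_power(f, plus, minus, m):
    """Compute f^e mod m for e = sum(2^k for k in plus) - sum(2^k for k in minus).

    f must be invertible modulo m. Only squarings, a few multiplications
    and a single inversion are used.
    """
    top = max(plus + minus)
    wanted = set(plus) | set(minus)
    saved = {}
    x = f % m
    for k in range(top + 1):
        if k in wanted:
            saved[k] = x          # here x equals f^(2^k)
        if k < top:
            x = x * x % m         # one squaring per exponent bit
    numerator = 1
    for k in plus:
        numerator = numerator * saved[k] % m
    denominator = 1
    for k in minus:
        denominator = denominator * saved[k] % m
    return numerator * pow(denominator, -1, m) % m
```

With three positive and two negative terms, the numerator needs two multiplications and the denominator one, matching the count of three multiplications of five intermediate values; the division is then handled once at the end.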
The overview of novel coprocessor block diagram
[Figure: coprocessor block diagram. Components shown: 17-bit input (in) and output (out) ports, DPRAM blocks #0 to #N−1 holding the words a(i), b(i) and r(i), a PISO register, a System Multiplexer, four multipliers (Multiplier #1 to #4) with operands A and B, a Modular Adder, a Modular Reductor producing R, and an additional circuit for the double speed mode. Bus widths shown: 17, 17N and 34N bits.]
A = {a(N−1), ..., a(0)}
B = {b(N−1), ..., b(0)}
R = {r(N−1), ..., r(0)}
Number of 17-bit words: N = 31 (80-bit), 75 (120-bit), 88 (128-bit)
FPGA-based hardware architectures - preliminary results Stratix V
Speed records for the range of 120-128-bits security for the pairing transformations
over prime fields
Publication                    Curve Type                       Security   Pairing Type   Platform           Latency
This work                      twisted supersingular Edwards    120-bit    Tate           Stratix V          0.54 ms
Cheung et al. [CHES'11]        Barreto-Naehrig                  126-bit    Opt.-Ate       Virtex-6           0.57 ms
This work                      twisted supersingular Edwards    128-bit    Tate           Stratix V          0.70 ms
This work                      twisted supersingular Edwards    120-bit    Tate           Stratix IV         0.70 ms
Beuchat et al. [Pairing'10]    Barreto-Naehrig                  126-bit    Opt.-Ate       Core i7 2.8 GHz    0.83 ms
This work                      twisted supersingular Edwards    128-bit    Tate           Stratix IV         0.88 ms
This work                      twisted supersingular Edwards    120-bit    Tate           Virtex-6           1.05 ms
Cheung et al. [CHES'11]        Barreto-Naehrig                  126-bit    Opt.-Ate       Stratix III        1.07 ms
Fan et al. [Computers'11]      Barreto-Naehrig                  128-bit    Opt.-Ate       Virtex-6           1.36 ms
This work                      twisted supersingular Edwards    128-bit    Tate           Virtex-6           1.05 ms
Fan et al. [Computers'11]      Barreto-Naehrig                  128-bit    Ate            Virtex-6           1.60 ms
Cheung et al. [CHES'11]        Barreto-Naehrig                  126-bit    Opt.-Ate       Cyclone II         1.93 ms

Comment:
The fastest reported pairing coprocessor over prime fields for security levels above 120 bits!
Major Contributions
Novel, low-latency, generic hybrid adder for big numbers (thousands of bits and more), optimized for fast carry chains (FPGA)
Solinas primes-based, DSP-oriented modular arithmetic architectures for addition, subtraction and multiplication
First hardware architectures for 80, 120 and 128-bit pairing on Edwards curves
Our coprocessor (on Stratix V) computes 120 and 128-bit secure pairings over a prime field in less than 0.54 and 0.70 ms, respectively. It is the fastest pairing implementation over prime fields in this security range