Developing an Architecture for a Single-Flux Quantum Based Reconfigurable Accelerator
F. Mehdipour, Hiroaki Honda*, H. Kataoka, K. Inoue and K. Murakami
Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan
*Institute of Systems, Information Technologies and Nanotechnologies (ISIT), Fukuoka, Japan
E-mail: farhad@c.csce.kyushu-u.ac.jp
• Introduction
• SFQ-LSRDP General Architecture
• The Design Procedure and Tool Chain
• Input/Output Nodes Placement
• Area Minimization
• Experimental Results
• Conclusions
2009/01/29
CREST-JST SFQ-RDP Project (since 2006): a low-power, high-performance reconfigurable processor based on single-flux quantum circuits
• Kyushu Univ. (Prof. K. Murakami et al.): SFQ-LSRDP architecture, compiler and applications
• Superconducting Research Lab. (SRL) (Dr. S. Nagasawa et al.): SFQ process
• Yokohama National Univ. (Prof. N. Yoshikawa et al.): SFQ-FPU chip, cell library
• Nagoya Univ. (Prof. A. Fujimaki et al.): SFQ-RDP chip, cell library, and wiring
• Nagoya Univ. (Prof. N. Takagi (Leader) et al.): CAD for logic design and arithmetic circuits
Goals
Discovering appropriate scientific applications
Developing compiler tools
Developing performance analyzing tools
Designing and implementing the SFQ-LSRDP architecture, considering the features and limitations of SFQ circuits
How a reconfigurable processor works
[Figure: the application code is split into non-critical code, executed on the GPP, and computation-intensive (critical) code, offloaded to the LSRDP, an array of PE rows connected by ORNs; the GPP and LSRDP share the main memory]
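The split shown in the figure can be illustrated with a small profile-driven sketch. It assumes a hypothetical profile mapping code regions to measured run times and a hypothetical `threshold` on each region's share of the total; the slides do not state how critical code is identified.

```python
from typing import Dict, List, Tuple


def partition_code(profile: Dict[str, float], threshold: float = 0.1) -> Tuple[List[str], List[str]]:
    """Split code regions into (LSRDP, GPP) lists by their share of total run time.

    `profile` maps a (hypothetical) region name to its execution time. Regions whose
    share is at least `threshold` are treated as computation-intensive (critical)
    and offloaded to the LSRDP; the rest stay on the GPP.
    """
    total = sum(profile.values())
    if total <= 0:
        return [], sorted(profile)
    critical = sorted(name for name, t in profile.items() if t / total >= threshold)
    non_critical = sorted(name for name in profile if name not in critical)
    return critical, non_critical


if __name__ == "__main__":
    profile = {"init": 1.0, "stencil_loop": 80.0, "io": 4.0, "reduction": 15.0}
    lsrdp, gpp = partition_code(profile, threshold=0.1)
    print("LSRDP:", lsrdp)  # ['reduction', 'stencil_loop']
    print("GPP:", gpp)      # ['init', 'io']
```

A real flow would also check that an offloaded region can be expressed as a DFG the data path supports; this sketch looks only at run time.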
Single-flux quantum (SFQ) versus CMOS
Main issues of CMOS in implementing a large accelerator:
• High electric power consumption
• High heat dissipation
• Difficulties in high-density packing
SFQ features:
• High-speed switching and signal transmission
• Low power consumption
• Compact implementation (smaller area)
• Suitable for pipelined processing of data streams
[Figure: a single flux quantum (magnetic flux quantum) stored in a superconducting loop containing a Josephson junction]
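For reference (standard physics, not stated on the slide), the flux held in the superconducting loop and carried by each SFQ pulse is the magnetic flux quantum, so the voltage pulse across a switching Josephson junction has a fixed time integral:

```latex
\Phi_0 = \frac{h}{2e} \approx 2.07 \times 10^{-15}\,\mathrm{Wb} = 2.07\,\mathrm{mV\cdot ps},
\qquad
\int V(t)\,dt = \Phi_0
```

so a pulse about 2 ps wide has an amplitude of roughly 1 mV, which is one reason SFQ logic combines very fast switching with low power.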
SFQ-LSRDP General Architecture
Outline of the large-scale reconfigurable data-path (LSRDP) processor
Features:
• Handling data flow graphs (DFGs) extracted from scientific applications
• Pipelined execution
• Burst transfer of rearranged input/output data from/to memory
• Reduced number of memory accesses (alleviating the memory wall problem)
[Figure: a GPP and main memory connected to the LSRDP, whose rows of PEs are linked by ORNs (ORN: Operand Routing Network), with streaming buffers (SB), an SMAC and a scratchpad memory]
Reconfigurable data-path components:
• A matrix of a large number of floating-point functional units (FUs)
• A reconfigurable operand routing network (ORN)
• Dynamic reconfiguration facilities
• Streaming buffers (SB) for the I/O ports
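Because the LSRDP executes DFGs in a pipelined, row-by-row fashion, one simple way to picture the mapping is to give each DFG node the earliest row it can occupy (ASAP levelling). This is a sketch under that assumption, not the authors' mapping tool; the node names and example graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def asap_levels(nodes: List[str], edges: List[Tuple[str, str]]) -> Dict[str, int]:
    """Assign each DFG node the earliest row (level) it can occupy.

    Source nodes get level 0; every other node sits one row below its deepest
    predecessor, so all operands come from earlier rows (a feed-forward,
    pipelined data path). Raises ValueError if the graph has a cycle.
    """
    preds: Dict[str, List[str]] = defaultdict(list)
    succs: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)
    indegree = {n: len(preds[n]) for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    level: Dict[str, int] = {}
    while ready:  # Kahn-style topological traversal
        n = ready.pop()
        level[n] = max((level[p] + 1 for p in preds[n]), default=0)
        for s in succs[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(level) != len(nodes):
        raise ValueError("DFG contains a cycle")
    return level


if __name__ == "__main__":
    # m1 = a*b, m2 = c*d, s1 = m1 + m2, m3 = s1 * m1
    nodes = ["m1", "m2", "s1", "m3"]
    edges = [("m1", "s1"), ("m2", "s1"), ("s1", "m3"), ("m1", "m3")]
    print(asap_levels(nodes, edges))
```

Note that the edge m1 → m3 skips a row; such edges are what the transfer units described next are for.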
LSRDP architecture
Processing Elements
FU (functional unit): implements basic 64-bit double-precision floating-point operations, including ADD/SUB and MUL
TU (transfer unit): a routing resource for transferring data between non-consecutive rows
[Figure: each PE includes two components, an FU and a TU, giving four functionalities; an example mapping shows input ports, output ports, a MUL node (node 15) and TUs]
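When a value has to travel from row r to a row below r + 1, TUs in the intermediate rows can relay it. Assuming one TU per skipped row, the TUs each DFG edge needs follow directly from the row levels; the level values here are a hypothetical example, not taken from the slides.

```python
from typing import Dict, List, Tuple


def transfer_units_needed(level: Dict[str, int],
                          edges: List[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Number of TUs each DFG edge needs to reach its consumer.

    An edge between consecutive rows (distance 1) goes straight through the ORN;
    an edge spanning d > 1 rows is assumed to pass through one TU in each of the
    d - 1 intermediate rows.
    """
    needed: Dict[Tuple[str, str], int] = {}
    for src, dst in edges:
        distance = level[dst] - level[src]
        if distance < 1:
            raise ValueError(f"edge {src}->{dst} does not go to a later row")
        needed[(src, dst)] = distance - 1
    return needed


if __name__ == "__main__":
    level = {"m1": 0, "m2": 0, "s1": 1, "m3": 2}
    edges = [("m1", "s1"), ("m2", "s1"), ("s1", "m3"), ("m1", "m3")]
    tus = transfer_units_needed(level, edges)
    print(tus, "total:", sum(tus.values()))
```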
PE structures
[Figure: three PE structures, each shown with its supported FU/TU combinations (e.g., FU only, TU only, FU + TU, TU + TU)]
• PE basic arch.: 3 inputs / 2 outputs
• PE arch. I: 4 inputs / 3 outputs
• PE arch. II: 3 inputs / 3 outputs
Layout types: Type I
[Figure: a W × H array of PE rows separated by ORNs; every PE contains an ADD/SUB unit (A), a MUL unit (M) and a transfer unit (T)]
Each PE implements ADD/SUB and MUL.
Flexible, but consumes a lot of resources.
Layout types: Type II
[Figure: a W × H array of PE rows separated by ORNs; each row follows a pattern such as M T A T A T A T M T, i.e., each PE pairs either an ADD/SUB unit (A) or a MUL unit (M) with a transfer unit (T)]
Each PE implements ADD/SUB or MUL.
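The resource gap between the two layouts can be made concrete by counting units. This is a sketch assuming a W × H array in which (for Type II) every row repeats the same pattern, written as a string of 'A' (ADD/SUB) and 'M' (MUL) PEs; "MAAAM" corresponds to the row drawn in the figure. Unit counts are only a proxy for area, since the relative sizes of the units are not given here.

```python
from dataclasses import dataclass


@dataclass
class UnitCount:
    add_sub: int
    mul: int
    tu: int


def type1_units(width: int, height: int) -> UnitCount:
    """Type I: every PE holds an ADD/SUB unit, a MUL unit and a TU."""
    pes = width * height
    return UnitCount(add_sub=pes, mul=pes, tu=pes)


def type2_units(row_pattern: str, height: int) -> UnitCount:
    """Type II: each PE holds either an ADD/SUB ('A') or a MUL ('M') plus one TU.

    `row_pattern` gives the FU type of each PE in a row (hypothetical input format);
    all rows are assumed to share the same pattern.
    """
    if set(row_pattern) - {"A", "M"}:
        raise ValueError("row_pattern may only contain 'A' and 'M'")
    return UnitCount(add_sub=row_pattern.count("A") * height,
                     mul=row_pattern.count("M") * height,
                     tu=len(row_pattern) * height)


if __name__ == "__main__":
    print(type1_units(5, 4))        # 20 ADD/SUB, 20 MUL, 20 TU
    print(type2_units("MAAAM", 4))  # 12 ADD/SUB, 8 MUL, 20 TU
```

The trade-off is flexibility: in Type II a DFG node can only be placed on a PE whose FU matches its operation.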
Maximum connection length (MCL): definition
[Figure: PE (i, j) in row i connects through the ORN to PEs in row i+1; the connection to (i+1, j) has length 0, the connection to (i+1, j+2) has length 2, and the longest connection, to (i+1, j+L), has length L]
MCL: the maximum horizontal distance between two PEs located in two consecutive rows.
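Following this definition, the connection length of an edge is the absolute column difference between its endpoints in consecutive rows, and the MCL a mapping requires is the largest such length. A sketch, assuming edges that span more than one row have already been split through TUs; the PE positions are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row i, column j) of a PE


def connection_lengths(placement: Dict[str, Position],
                       edges: List[Tuple[str, str]]) -> List[int]:
    """Horizontal distance |j' - j| of every edge between consecutive rows."""
    lengths: List[int] = []
    for src, dst in edges:
        (i, j), (i2, j2) = placement[src], placement[dst]
        if i2 != i + 1:
            raise ValueError(f"{src}->{dst} is not between consecutive rows")
        lengths.append(abs(j2 - j))
    return lengths


def max_connection_length(placement: Dict[str, Position],
                          edges: List[Tuple[str, str]]) -> int:
    """MCL required by a placement (0 if there are no edges)."""
    return max(connection_lengths(placement, edges), default=0)


if __name__ == "__main__":
    placement = {"p": (0, 3), "q": (1, 3), "r": (1, 5), "s": (1, 7)}
    edges = [("p", "q"), ("p", "r"), ("p", "s")]
    print(connection_lengths(placement, edges))     # [0, 2, 4]
    print(max_connection_length(placement, edges))  # 4
```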
An ORN structure
A. Fujimaki et al., "Demonstration of an SFQ-Based Accelerator Prototype for a High-Performance Computer," ASC08, 2008.
[Figure: two rows of FPUs with elements labeled T, connected by an ORN built from crossbar switches labeled CB and ½CB and elements labeled T2; legend: FPU, 2-bit shift register, ORN]
The ORN consists of 2-bit shift registers and 1-by-2 and 2-by-2 crossbar switches.
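The switching elements named above can be modelled behaviorally: a 2-by-2 crossbar either passes its two inputs straight through (bar state) or swaps them (cross state), and a 1-by-2 switch steers one input to one of two outputs. This sketch ignores SFQ timing and the 2-bit shift registers and does not reproduce the actual ORN topology of the prototype; all function names are hypothetical.

```python
from typing import Any, List, Optional, Tuple


def crossbar_2x2(in0: Any, in1: Any, cross: bool) -> Tuple[Any, Any]:
    """2-by-2 crossbar: the bar state passes inputs straight, the cross state swaps them."""
    return (in1, in0) if cross else (in0, in1)


def switch_1x2(data: Any, select: int) -> Tuple[Optional[Any], Optional[Any]]:
    """1-by-2 switch: steer one input to output 0 or 1; the unused output carries None."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return (data, None) if select == 0 else (None, data)


def crossbar_stage(values: List[Any], controls: List[bool]) -> List[Any]:
    """One column of 2-by-2 crossbars, each acting on an adjacent pair of values."""
    if len(values) != 2 * len(controls):
        raise ValueError("need exactly two values per crossbar")
    out: List[Any] = []
    for k, cross in enumerate(controls):
        out.extend(crossbar_2x2(values[2 * k], values[2 * k + 1], cross))
    return out


if __name__ == "__main__":
    print(crossbar_stage(["a", "b", "c", "d"], [True, False]))  # ['b', 'a', 'c', 'd']
```

The control bits passed in here play the role of the crossbar configuration that is loaded during dynamic reconfiguration.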
Dynamic reconfiguration architecture
[Figure: a PE with an FU computing (A op B), a transfer unit and a 64-bit immediate register; MUXes fed by the ORN select Input-A, Input-B and Input-C]
Configuration register size per PE: $\log_2(2 \times (2\,\mathrm{MCL}+1)) \times 3$ bits
Three bit-stream lines for dynamic reconfiguration of:
• Immediate registers (64 bit) in each PE
• Selector bits for the muxes selecting the input data of the FUs
• Crossbar switches in the ORNs
[Figure: timeline of the ORN, immediate registers and PEs, going from the initial state through reconfiguration (Starting of Reconfiguration to End of Reconfiguration), wait/idle periods, and execution (Starting of Execution to End of Execution)]
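The configuration-register formula can be evaluated for a few MCL values. This sketch reads it as follows, which is an assumption: each of the three inputs (A, B, C) selects among 2 × (2·MCL + 1) sources, namely the 2·MCL + 1 PEs within reach in the previous row, each with two outputs as in the basic 3-input/2-output PE, and the logarithm is rounded up to whole bits.

```python
def conf_reg_bits(mcl: int, inputs_per_pe: int = 3, outputs_per_pe: int = 2) -> int:
    """Per-PE MUX selector bits: ceil(log2(outputs_per_pe * (2*MCL + 1))) * inputs_per_pe."""
    if mcl < 0:
        raise ValueError("MCL must be non-negative")
    candidates = outputs_per_pe * (2 * mcl + 1)  # sources each input MUX can choose from
    select_bits = (candidates - 1).bit_length()  # exact integer ceil(log2(candidates))
    return select_bits * inputs_per_pe


if __name__ == "__main__":
    for mcl in range(5):
        print(mcl, conf_reg_bits(mcl))  # 3, 9, 12, 12, 15
```

Because the bit count grows only logarithmically with MCL, a larger MCL costs relatively few configuration bits; its larger cost is in the size of the ORN itself.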
What should be decided during the design procedure
[Figure: an LSRDP whose width spans PE1, PE2, PE3, ..., PEm in each row and whose height is the number of rows, with an Operand Routing Network (ORN) between consecutive rows and two Streaming Buffers (SB)]
• Maximum Connection Length (MCL)?
• ORN size and structure?
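As a hypothetical illustration of how these open parameters relate to the target applications: once a set of DFGs has been placed on a grid, lower bounds on the array height and width and on the MCL can be read off the placements. The placement step itself is not shown, edges are assumed to connect consecutive rows only (longer edges already routed through TUs), and the data are illustrative.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a mapped DFG node
Mapping = Tuple[Dict[str, Position], List[Tuple[str, str]]]  # (placement, edges)


def required_dimensions(mappings: List[Mapping]) -> Dict[str, int]:
    """Smallest height, width and MCL that accommodate every mapping in `mappings`."""
    height = width = mcl = 0
    for placement, edges in mappings:
        for row, col in placement.values():
            height = max(height, row + 1)
            width = max(width, col + 1)
        for src, dst in edges:
            mcl = max(mcl, abs(placement[dst][1] - placement[src][1]))
    return {"height": height, "width": width, "mcl": mcl}


if __name__ == "__main__":
    dfg1 = ({"p": (0, 3), "q": (1, 3), "r": (1, 5)}, [("p", "q"), ("p", "r")])
    dfg2 = ({"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (2, 1)},
            [("a", "c"), ("b", "c"), ("c", "d")])
    print(required_dimensions([dfg1, dfg2]))  # {'height': 3, 'width': 6, 'mcl': 2}
```

A design flow would then trade these bounds against area, for example by re-placing nodes to shrink the width or the MCL.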