CASTNESS’07 workshop – Rome – 2007-01-15
Application design flow for the MORPHEUS heterogeneous dynamically reconfigurable platform
Philippe BONNOT - THALES
Feb 02, 2016
This document and any data included are the property of Thales. They cannot be reproduced, disclosed or used without Thales' prior written approval. © THALES 2005.
MORPHEUS project
EU FP6 IST project 027342
Started 1st January 2006
Duration: 3 years
Goals: a reconfigurable architecture chip and associated toolset improving computing density, flexibility (reconfiguration time) and time-to-market
Partners: THALES, THOMSON, ALCATEL-LUCENT, THALES Optronics, INTRACOM, ST, PACT, M2000, ACE, CRITICALBLUE, CEA, Universities of KARLSRUHE, DELFT, Bretagne Occidentale, BOLOGNA, BRAUNSCHWEIG, CHEMNITZ, ARTTIC
[Figure: application code containing a function f(.) goes through the associated toolset (programming, communication), which produces SW code for the CPU and configuration bitstreams for the reconfigurable units RU 1, RU 2 and RU 3]
Contents
Introduction
MORPHEUS execution model
MORPHEUS programming model
MORPHEUS toolset
Conclusion and perspectives
MORPHEUS architecture
[Figure: MORPHEUS architecture block diagram. Blocks shown: ARM9 core(s); DMA control; on-chip SRAM(s); SRAM/DRAM controller with external memory devices (SRAM/DRAM/FLASH); PACT XPP; PiCoGA; M2000; data exchange buffer; reconfiguration control; IO interfaces / peripherals; AMBA AHB bus (I/O); AMBA AHB bus (config); network on chip (circuit switched, pipelined). Legend: reconfigurable units, memories, interconnections, general-purpose processor, config. manager]
Fine-grain: eFPGA
• Arbitrary logic
Medium-grain: PiCoGA
• Reconfigurable array of 4-bit oriented ALUs
• Targets instruction-level parallelism
Coarse-grain: PACT XPP
• Data-flow algorithms
• Huge computational demand
Data Flow view
[Figure: platform block diagram with the data paths highlighted. Blocks: Ext. DDR, master AMBA AHB/APB + DMA, on-chip SRAM, XPP HRE, PicoGA HRE, M2000 HREs, ARM + OS, IO periph., NOC + DNA, reconfiguration AMBA AHB + DMA, IO pads, CM, on-chip reconfiguration RAM]
• A data stream lives only during the life of a configuration
• Streams are under the control of the HREs
• HREs are under the control of the ARM (see control flow) for execution and configuration (and of the CM for configuration)
Execution Control Flow view
[Figure: platform block diagram with the execution control paths highlighted. Blocks: master AMBA AHB/APB + DMA, on-chip SRAM, XPP HRE, PicoGA HRE, M2000 HREs, ARM + OS, IO periph., NOC + DNA, reconfiguration AMBA AHB + DMA, IO pads, CM, on-chip reconfiguration RAM, Ext. DDR]
Configuration Flow view
[Figure: platform block diagram with the configuration paths highlighted. Blocks: master AMBA AHB/APB + DMA, on-chip SRAM, XPP HRE, PicoGA HRE, M2000 HREs, ARM + OS, NOC + DNA, reconfiguration AMBA AHB + DMA, CM, IO pads, on-chip reconfiguration RAM, Ext. DDR, IO periph.]
Reconfiguration Control Flow view
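[Figure: platform block diagram with the reconfiguration control paths highlighted. Blocks: master AMBA AHB/APB + DMA, on-chip SRAM, XPP HRE, PicoGA HRE, M2000 HREs, ARM + OS, IO periph., NOC + DNA, reconfiguration AMBA AHB + DMA, CM, IO pads, on-chip reconfiguration RAM, Ext. DDR]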
Contents
Introduction
MORPHEUS execution model
MORPHEUS programming model
MORPHEUS toolset
Conclusion and perspectives
The application description: what the programmer must do
• Program the application in C at the global level (sequences of tasks, etc.), with manual annotations that identify the « HW » accelerated tasks and their synchronisation (parallel execution)
• Detail the accelerated tasks. These are generally complex data-streaming processing functions that require data-parallelism techniques to be described and mapped. A graphical tool is proposed for that.
  - Engineers who usually design such systems should handle it easily. However, the task must be split into sub-tasks that are easily interconnected with the proposed tool. Sub-tasks have to be described in C.
• A direct path is available when:
  - the accelerated task is not complex
  - optimisation is not expected
Design Flow view
[Figure: platform block diagram (DDR controller, reconfiguration AMBA AHB + DMA, on-chip SRAM, XPP HRE, PicoGA HRE, M2000 HREs, ARM + OS, IO periph., NOC + DNA, CM, master AMBA AHB/APB + DMA) annotated with the design-flow elements listed below]
Sequential C-based description of the application
Graphical parallel + C kernels description of accelerated function
Compilation-time scheduling of accelerated functions setting and execution
Run-time scheduling of the application
Run-time scheduling of the configuration
Configuration (bitstream, …)
DMA/DNA parameters
WP2 toolset
[Figure: WP2 toolset diagram. Labels:]
Sequential C-based description of the application
Graphical parallel + C kernels description of accelerated function
MOLEN paradigm and compiler
Sequential C-based description of the application (compilation-time scheduling of accelerated functions setting and execution)
eCos-based dynamic reconfiguration control
Run-time scheduling of the application
Configuration manager
Accelerated function synthesis (including memory-to-memory communication aspects)
- Information on accelerated function implementations
- DMA/DNA parameters
Configuration (bitstream, …)
Communication mechanisms (DNA, DMA, DDR controller)
Reconfigurable Units (M2000 blocks, XPP, PicoGA)
Formal verification
Contents
Introduction
MORPHEUS execution model
MORPHEUS programming model
MORPHEUS toolset
Conclusion and perspectives
MOLEN paradigm and compiler
Made by the University of Delft and the company ACE
An extension of the instruction set for the reconfigurable processing elements, with:
• configuration instructions
• parameter passing instructions
• execution instructions
Compilation flow: C with MOLEN annotations → expansion to MOLEN instructions → optimized placement of configuration instructions → ARM assembly with MOLEN abstraction library
MOLEN
Example
C code:
    res = alpha(param1, param2);
MOLEN instructions:
    movtx XR1 ← param1            (send parameters)
    movtx XR2 ← param2
    set <address_alpha_set>       (HW reconfiguration)
    exec <address_alpha_exec>     (HW execution)
    movfx res ← XR3               (return result)
[Figure: the retargeted compiler turns the call to f(.) into binary code for the architecture; f(.) itself is described in HDL and runs on the reconfigurable array]
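The five-instruction sequence above can be mimicked in plain C to make the calling convention concrete. The following is only a software sketch under stated assumptions: the exchange registers are modelled as a small array, `set_task` merely records which task is configured instead of loading a bitstream, and the behaviour of `alpha` (an addition) is a placeholder. None of the function names below come from the MOLEN tools.

```c
#include <stdint.h>

/* Hypothetical software model of the exchange registers (XR1..XR3 on the slide). */
#define XR_COUNT 4
static int32_t xr[XR_COUNT];

/* Hypothetical task identifiers; the real flow uses configuration addresses. */
enum { TASK_NONE = 0, TASK_ALPHA = 1 };
static int configured_task = TASK_NONE;

/* movtx: move a value from the processor to an exchange register. */
static void movtx(int reg, int32_t value) { xr[reg] = value; }

/* movfx: move a value from an exchange register back to the processor. */
static int32_t movfx(int reg) { return xr[reg]; }

/* set: configure the reconfigurable unit for a task (only recorded here). */
static void set_task(int task) { configured_task = task; }

/* exec: run the configured task; returns 0 on success, -1 if not configured. */
static int exec_task(int task) {
    if (configured_task != task) return -1;
    if (task == TASK_ALPHA) {
        xr[3] = xr[1] + xr[2];   /* placeholder behaviour for alpha */
        return 0;
    }
    return -1;
}

/* Sequence a compiler could emit for: res = alpha(param1, param2); */
int32_t call_alpha(int32_t param1, int32_t param2) {
    movtx(1, param1);                 /* send parameters   */
    movtx(2, param2);
    set_task(TASK_ALPHA);             /* HW reconfiguration */
    if (exec_task(TASK_ALPHA) != 0)   /* HW execution       */
        return 0;                     /* not reached in this model */
    return movfx(3);                  /* return result      */
}
```

Calling `call_alpha(2, 3)` walks through the same steps as the slide: two parameter moves, one configuration, one execution and one result move.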
MOLEN
C source program annotations
MOLEN_FUNCTION id         Declares the next function to correspond with PE task id
MOLEN_PARALLEL on/off     Starts/ends the scope for parallel task execution
MOLEN_CONFLICT id1 id2    Declares a configuration conflict between the tasks with id1 and id2
MOLEN instructions
SET (id)                  Configure PE for task id
MOVTX (id, val)           Move value to task id
EXEC (id)                 Execute task id
BREAK                     Wait for all executing tasks
MOVFX (id, reg)           Move data from task to reg
RELEASE (id)              Release configuration id
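To illustrate how a MOLEN_PARALLEL scope might map onto the instructions listed above, the sketch below builds a textual instruction trace for a set of task ids. Everything beyond the instruction names is an assumption made for illustration: the pragma spelling in the comment, the task names `fir` and `fft`, the choice to issue every SET before any EXEC, the automatic RELEASE at the end, and the omission of the value and register operands of MOVTX and MOVFX. It is not the output of the actual compiler.

```c
#include <stdio.h>
#include <string.h>

/* Annotated source as a programmer might write it. The pragma spelling
 * and the task names are assumptions:
 *
 *   #pragma MOLEN_FUNCTION 1
 *   int fir(int x);
 *   #pragma MOLEN_FUNCTION 2
 *   int fft(int x);
 *
 *   #pragma MOLEN_PARALLEL on
 *   a = fir(x);
 *   b = fft(y);
 *   #pragma MOLEN_PARALLEL off
 */

#define TRACE_MAX 256
static char trace[TRACE_MAX];

/* Append one instruction to the trace; id < 0 means "no operand". */
static void emit(const char *op, int id) {
    char buf[32];
    if (id >= 0)
        snprintf(buf, sizeof buf, "%s(%d) ", op, id);
    else
        snprintf(buf, sizeof buf, "%s ", op);
    strncat(trace, buf, TRACE_MAX - strlen(trace) - 1);
}

/* Expand a parallel scope holding n task ids into one plausible
 * instruction order: configure all, pass parameters, start all,
 * wait once, then collect results and release. */
const char *expand_parallel(const int *ids, int n) {
    trace[0] = '\0';
    for (int i = 0; i < n; i++) emit("SET", ids[i]);
    for (int i = 0; i < n; i++) emit("MOVTX", ids[i]);
    for (int i = 0; i < n; i++) emit("EXEC", ids[i]);
    emit("BREAK", -1);            /* wait for all executing tasks */
    for (int i = 0; i < n; i++) emit("MOVFX", ids[i]);
    for (int i = 0; i < n; i++) emit("RELEASE", ids[i]);
    return trace;
}
```

The single BREAK after both EXEC instructions is what lets the two tasks overlap in this model; placing a BREAK after each EXEC would serialise them.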
Dynamic reconfiguration RTOS structure
eCos extension made by the University of Karlsruhe
Dynamic reconfiguration RTOS relationships
[Figure: relationships between the compiled application binary code, the RTOS, the Configuration Manager and the reconfigurable units (reconfiguration and execution system). Arrow labels: call; reconfiguration directives; reconfiguration control; reconfiguration and execution control; HW status. Toolset parts shown: dynamic reconfiguration, retargetable compilation, spatial design]
The Configuration Manager performs:
• Configuration priority management
• Configuration cache management
• Prefetch prediction
The RTOS performs:
• Priority calculation
• Task execution status management
• Resource requests to the Configuration Manager for fine dynamic scheduling
• Allocation decisions (on the various reconfigurable units) (only in the second phase of the project)
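As an illustration of what configuration cache management with priorities could look like, here is a minimal sketch. It assumes a fixed number of on-chip configuration slots and a lowest-priority-first eviction rule; the slot count, the structure names and the policy are all hypothetical and are not taken from the MORPHEUS Configuration Manager.

```c
#include <stddef.h>

#define CACHE_SLOTS 2   /* hypothetical number of configurations held on chip */

typedef struct {
    int task_id;    /* -1 when the slot is empty */
    int priority;   /* higher value = more important to keep */
} ConfigSlot;

typedef struct {
    ConfigSlot slots[CACHE_SLOTS];
    int loads;      /* number of loads from external memory so far */
} ConfigCache;

void cache_init(ConfigCache *c) {
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        c->slots[i].task_id = -1;
        c->slots[i].priority = 0;
    }
    c->loads = 0;
}

/* Request a configuration. Returns 1 on a cache hit, 0 if it had to be loaded. */
int cache_request(ConfigCache *c, int task_id, int priority) {
    size_t victim = 0;

    /* Hit: the configuration is already on chip. */
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].task_id == task_id) {
            c->slots[i].priority = priority;   /* refresh priority on reuse */
            return 1;
        }
    }

    /* Miss: take the first empty slot, else evict the lowest priority. */
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].task_id == -1) { victim = i; break; }
        if (c->slots[i].priority < c->slots[victim].priority) victim = i;
    }
    c->slots[victim].task_id = task_id;
    c->slots[victim].priority = priority;
    c->loads++;
    return 0;
}
```

In this model a prefetch is simply a `cache_request` issued before the task is actually needed, so that the later request becomes a hit.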
Applications in SPEAR DE
• Mostly regular data-streaming applications
• Captured as acyclic graphs of tasks
• Each task is represented by the way it (linearly) accesses data from/to its input/output arrays, and as a nest of loops
• SPEAR DE does not take part in creating the code within a task
• SPEAR DE helps the user to select and implement a mapping of the application onto the computing architecture
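A task described "as a nest of loops with linear array accesses" can be pictured with a small data structure. The sketch below is an assumption-laden illustration, not SPEAR DE's internal representation: `TaskDesc`, `LinearAccess` and `run_task` are hypothetical names, and the kernel `twice` stands in for the user-written C code inside a task.

```c
#include <stddef.h>

#define MAX_DEPTH 3

/* Linear access: index = base + sum over k of stride[k] * i[k]. */
typedef struct {
    size_t base;
    size_t stride[MAX_DEPTH];
} LinearAccess;

/* A task: a loop nest plus the way it reads its input and writes its output. */
typedef struct {
    size_t depth;               /* number of nested loops */
    size_t bound[MAX_DEPTH];    /* iteration count of each loop */
    LinearAccess in, out;
} TaskDesc;

static size_t index_of(const LinearAccess *a, const size_t *i, size_t depth) {
    size_t idx = a->base;
    for (size_t k = 0; k < depth; k++) idx += a->stride[k] * i[k];
    return idx;
}

/* Example kernel standing in for the code inside a task. */
int twice(int x) { return 2 * x; }

/* Walk the loop nest and apply the kernel to one element per iteration. */
void run_task(const TaskDesc *t, const int *in, int *out, int (*kernel)(int)) {
    size_t i[MAX_DEPTH] = {0};
    size_t total = 1;
    for (size_t k = 0; k < t->depth; k++) total *= t->bound[k];

    for (size_t n = 0; n < total; n++) {
        out[index_of(&t->out, i, t->depth)] =
            kernel(in[index_of(&t->in, i, t->depth)]);
        /* Advance the loop counters like an odometer, innermost loop first. */
        for (size_t k = t->depth; k-- > 0; ) {
            if (++i[k] < t->bound[k]) break;
            i[k] = 0;
        }
    }
}
```

With bounds {2, 3}, input strides {3, 1} and output strides {1, 2}, the same kernel reads a 2x3 array row by row and writes it transposed, which shows that the access description, not the kernel code, decides the data layout.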
SPATIAL DESIGN: framework architecture
Joint work of the University of Bretagne Occidentale, CriticalBlue and THALES
[Figure: tool chain. SPEAR DE: application capture and system optimizations. It provides the data flow of the process (global CDFG) and the SPEAR sub-functions (ANSI C subset, C files). Cascade (CriticalBlue): CDFG generation from the C files. MADEO (UBO): technology mapping from the CDFGs, producing the bitstream]
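Mapping: fusion of tasks
[Figure: two mappings on a memory hierarchy made of a memory M0, memories M and local memories m attached to CPUs. Before fusion, task F (steps MOtoM, MtoOtherSeg) and task G (steps Mtom, mtoM) are mapped separately. After fusion, the steps of F and G are grouped and repeated ("Do 4 times")]
Fusion reduces memory needs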
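The memory saving can be seen on a toy example. The sketch below is a generic blocked task fusion written for illustration, not SPEAR DE output: `f` and `g` are hypothetical kernels standing in for tasks F and G, and `BLOCK` is an assumed local-memory capacity. With 16 elements and a block of 4, the fused body runs 4 times.

```c
#include <stddef.h>

/* Hypothetical kernels standing in for tasks F and G. */
static int f(int x) { return x + 1; }
static int g(int x) { return x * 2; }

/* Unfused: F writes a full intermediate array, then G reads it.
 * The caller must provide n elements of scratch memory in tmp. */
void run_unfused(const int *in, int *out, int *tmp, size_t n) {
    for (size_t i = 0; i < n; i++) tmp[i] = f(in[i]);
    for (size_t i = 0; i < n; i++) out[i] = g(tmp[i]);
}

#define BLOCK 4   /* assumed local-memory capacity, in elements */

/* Fused: F and G run back to back on one block at a time, so the
 * intermediate data never needs more than BLOCK elements. */
void run_fused(const int *in, int *out, size_t n) {
    int local[BLOCK];   /* small buffer playing the role of a local memory */
    for (size_t start = 0; start < n; start += BLOCK) {
        size_t len = (n - start < BLOCK) ? n - start : BLOCK;
        for (size_t i = 0; i < len; i++) local[i] = f(in[start + i]);
        for (size_t i = 0; i < len; i++) out[start + i] = g(local[i]);
    }
}
```

Both versions compute the same result; the fused one trades a single long pass per task for several short passes, and replaces the n-element intermediate array with a BLOCK-element buffer.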
Design on reconfigurable units
[Figure: two HREs, each holding blocks labelled F and C together with buffers, connected through the NOC and AMBA]
• Synthesis of subtasks from C code
• Automatic generation of interconnections and control logic
MADEO: framework architecture
[Figure: MADEO (behavioral & physical synthesis) takes the global CDFG (from SPEAR) and the subtasks CDFG (from Cascade) as a high-level CDFG (CDFG HLL), which is rewritten into a low-level CDFG (CDFG LL). Compilation/synthesis then targets three architectures: Archi. 1: M2000 (EDIF); Archi. 2: XPP (e.g. NML); Archi. 3: PicoGA (e.g. Griffy-C)]
• Global and subtasks CDFG generation
• Behavioral & physical synthesis
• Open framework
Contents
Introduction
MORPHEUS execution model
MORPHEUS programming model
MORPHEUS toolset
Conclusion and perspectives
Conclusion and perspectives
• A reconfigurable heterogeneous architecture is in development
• Associated toolset based on the C language, composed of 3 modules:
  - Retargetable compiler based on the MOLEN paradigm
  - Reconfiguration control added to the eCos OS
  - Accelerated function synthesis that abstracts the architecture heterogeneity
• Development of application test cases is in progress (Work Package 5)
• A comprehensive toolset: it allows application developers to fully exploit the MORPHEUS architecture, reducing time-to-market and improving flexibility
• Second phase of the project:
  - Parallel extensions to the MOLEN instruction set
  - Dynamic reconfiguration control
  - Function synthesis optimizations
CASTNESS’07 workshop – Rome – 2007-01-15
Application programming design flow for the MORPHEUS heterogeneous dynamically reconfigurable platform
Philippe BONNOT - THALES