POLITECNICO DI MILANO
High Performance Processors and Systems
PdM – UIC joint master 2007
Instructor: Prof. Donatella Sciuto
HPPS @ PdM – March 2007
Outline
  DReAMS: Alessandro Panella, Matteo Murgida
  CITiES: Alessio Montone, Alessandro Meroni, Simone Corbetta
  Operating System: Ivan Beretta
  Design Flow: Antonio Piazzi
  Polaris: Massimo Morandi, Marco Novati
  HLR: Marco Maggioni
DReAMS
Dynamic Reconfigurability Applied to Multi-FPGA Systems

DReAMS
Dynamic Reconfigurability Applied to Multi-FPGA Systems
  Branch of the DRESD project
  Inherits architectures and tools
Automatic workflow from VHDL system description to FPGA implementation
  VHDL parsing and system simulation
  System creation over a specific architecture
  Bitstream creation and download onto FPGAs
Project Organization
First Phase (15 Mar – 15 Apr) [DONE]
  Goals:
    State of the art analysis
    Proposed approach: basic idea
Second Phase (15 Apr – 15 May) [PARTIALLY DONE]
  Goal:
    Partitioning algorithm: development and implementation
Third Phase (15 May – 15 June) [TODO]
  Goal:
    Algorithm experimental evaluation
    Physical evaluation using the DReAMS architecture
Partitioning
Two kinds of multi-FPGA partitioning:
Topology-aware
  Architecture topology is an input
  No optimization of the number of FPGAs
  Association between the (larger) system graph and the (smaller) architecture graph => PARTITIONING
Topology-free
  Architecture topology is not provided
  Input: dimension and communication features of the FPGAs
  Minimization of the number of FPGAs
  Place and route after partitioning
The algorithm (1)
Copes with the topology-free problem
Structural approach
  Exploits the design hierarchy
  Tries to keep modules' integrity
    Several advantages, less work to be done
Objectives
  Minimize the number of FPGAs
  Minimize inter-FPGA communication
Greedy set-covering algorithm
The algorithm (2)
Nodes can be: COVERED, UNCOVERED, PARTIALLY COVERED
Stop condition: TOP = COVERED
In the exploration of the tree, siblings take precedence over children => keep module integrity
Procedure cover(set of nodes)
  Called recursively, starting from TOP
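The recursive covering idea can be sketched as follows. This is a minimal illustration and not the project's algorithm: the `Node` class, the single `capacity` figure per FPGA, the first-fit choice and the rule that only leaves carry area are all assumptions, and inter-FPGA communication is not modelled. A node is covered once its whole subtree sits on one FPGA; a node that is too large is only partially covered until all of its children are placed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A module in the design hierarchy; only leaves carry area (assumption)."""
    name: str
    area: int = 0
    children: list = field(default_factory=list)

    def total_area(self):
        return self.area + sum(c.total_area() for c in self.children)


def cover(nodes, capacity, fpgas):
    """Place each node whole when it fits on one FPGA; descend into the
    children of a node only after all its siblings have been tried."""
    too_big = []
    for node in sorted(nodes, key=Node.total_area, reverse=True):
        need = node.total_area()
        if need > capacity:
            too_big.append(node)  # cannot stay whole, split it afterwards
            continue
        # First-fit over the FPGAs opened so far; open a new one if none fits.
        target = next((f for f in fpgas if f["free"] >= need), None)
        if target is None:
            target = {"free": capacity, "modules": []}
            fpgas.append(target)
        target["free"] -= need
        target["modules"].append(node.name)
    for node in too_big:
        if not node.children:
            raise ValueError(f"leaf {node.name} exceeds the FPGA capacity")
        cover(node.children, capacity, fpgas)
    return fpgas


# Hypothetical hierarchy: TOP contains A (two leaves), B and C.
top = Node("TOP", children=[
    Node("A", children=[Node("A1", 6), Node("A2", 6)]),
    Node("B", 4),
    Node("C", 3),
])
partition = cover([top], capacity=10, fpgas=[])
print([f["modules"] for f in partition])
```

With a capacity of 10, modules B and C stay whole on the first FPGA, while A is too large and is split into A1 and A2 only after its siblings have been placed.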
What's next?
Data structure development
Algorithm C++ implementation
First verification and "tuning"
Obtain hierarchical trees from synthesis tool (Synplify)
Verification
Physical evaluation
  Bound with the other branch of DReAMS
Chimera
Multi-FPGAs Architecture Definition
Matteo Murgida
Project Organization
1st Phase
  Goals:
    Digilent Spartan-3 Starter Board study
    Boards connection
2nd Phase
  Goals:
    Communication between two Microblaze soft-processors
    GPIO integration in the architecture
3rd Phase
  Goals:
    Interrupt handling
    Design a simple distributed application to verify the correctness of the proposed approach
Second Phase: results (1/2)
Communication between two Microblaze soft-processors
Development of a display controller to visualize the data flow

Second Phase: results (2/2)
Higher architecture portability through the use of the GPIO IP-Core.
What's next ...
Interrupt handling, also through the use of the Interrupt Controller
Development of a simple application to verify the correctness of the proposed approach
CITiES

Processing Elements REconfiguration In Reconfigurable Architectures
Alessio Montone
Second Phase Goals
Create software that
  takes as input a .bmm file (BRAM used) and an .elf file (code)
  outputs a memory configuration bitstream
  is device parametric
  is tailored for Xilinx Virtex II Pro family FPGAs
Second Phase: results - I
Second Phase: results - II
Second Phase: results - III
Output binary file is a downloadable bitstream
Execution times measured on a Core 2 Duo @ 2.33 GHz
Target FPGA | Processor | #BRAM blocks | #BRAM columns involved | marBram execution time (ms) | Commands overhead (approx. %) | Bitstream size (Kbytes)
VP7 | Microblaze | 4 | 2 | 179 | 1.5 | 56
VP7 | PPC-405 | 8 | 3 | 203 | 1.5 | 84
VP7 | Microblaze | 8 | 5 | 263 | 1.5 | 136
VP20 | PPC-405 | 8 | 3 | 248 | 1.5 | 112
VP20 | Microblaze | 8 | 5 | 326 | 1.5 | 160
VP20 | Microblaze | 16 | 5 | 326 | 1.5 | 160
What's next…
Third phase in detail
  Perform functional tests on a single output bitstream
    Debug both bitstream structure and software structure
  Test a complete processing element
    Configuring it independently from the rest of the architecture
    Swapping its memory content
Reconfiguration Oriented Metrics
Alessandro Meroni
Second Phase Objectives
Real World Applications Analysis
  Applications Analysis
  Common Scenarios Identification
  Characteristics Evaluation
Metrics Evaluation Through Graphics supported by a Prototype Analyzer (C/C++)
  Performance/Area
  Master/Slave
Different Network Simulators Analysis
  NS2
  OMNeT++
  SSFNet
  OPnet
Application Analysis
It's possible to make a classification that binds together the majority of these applications:
Metrics Evaluation
We need to consider different metrics w.r.t. different scenarios
  which FPGAs?
  how many elements?
  which configuration?
For now, there is a qualitative estimation of some metrics' trends supported by a Prototype Analyzer
  Throughput and Area w.r.t. the number of elements of the system (Master/Slave)
    no configuration information
    no FPGA information
    ...
Simulators Analysis
NS-2
  good hardcoded modules
  bad flexibility
    models are "flat", cannot create subnetworks
    difficult separation of concepts: different parameters in the same TCL script
OMNeT++
  good not only for networks (MP systems and hw architectures)
  very flexible
  support for hierarchical module structure
  enforces the separation between model and experiments
    all parameters in the omnetpp.ini file
SSFNet
  no longer supported: last release on January 15, 2004
OPnet
  not free
Next Phase...
Simulator Exploitation
  Use of OMNeT++ to gain information w.r.t. the Throughput and other useful metrics
Graphics Redefinition and Expansion
Analyzer Improvement
REconfigurable Communication Infrastructure For Embedded-systems
Simone Corbetta
April 2007/May 2007: objectives
Extend survey
Reconfigurable communication infrastructure exploration
  De Micheli Verilog description analysis
  XPIPES architecture analysis
  XPIPES synthesis on Xilinx FPGAs
    Area requirements
Applications and scenarios of dynamic reconfigurability
Communication infrastructure model
  First ideas
  Basis for next-step implementation
April 2007/May 2007: work (1/3)
XPIPES Architecture
  Layered approach to decouple communication from computation
  Network switches and network interfaces
XPIPES Methodology
  XpipesCompiler used to automatically generate a synthesizable Verilog-based architecture
Table 1: Area requirements of a single-master/single-slave Network-on-Chip
April 2007/May 2007: work (2/3)
Scenarios and applications
  RATIONALE: need for a concrete term of comparison between the performance of our solution and third-party ones
    NO existing standard benchmark!
  Different applications and market segments
    Automotive
    Aerospace & defense
    Industrial
    Scientific & medical
April 2007/May 2007: work (3/3)
Communication infrastructure model (first ideas)
  Layered approach
    Flexibility and independent optimization
  Decoupling communication from computation
    Switching and interfacing elements are crucial
  Physical and logical addressing methods
    Useful for task relocation
  Adaptive architecture
    Achieving fault-tolerance
  Integrable with legacy systems
    Bridge is required
  Plugging-in and -off of IP-Cores
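To illustrate why separating logical from physical addresses helps task relocation, here is a minimal sketch. The `AddressTable` class, the core name and the (switch, port) form of a physical address are hypothetical and not taken from the project; the point is only that senders keep using the logical name while the table entry changes.

```python
class AddressTable:
    """Maps logical core names to physical (switch, port) locations."""

    def __init__(self):
        self._where = {}

    def place(self, logical_id, switch, port):
        # Register a core when it is first plugged into the infrastructure.
        self._where[logical_id] = (switch, port)

    def relocate(self, logical_id, switch, port):
        # Only the table changes; senders keep addressing the logical name.
        if logical_id not in self._where:
            raise KeyError(logical_id)
        self._where[logical_id] = (switch, port)

    def resolve(self, logical_id):
        return self._where[logical_id]


table = AddressTable()
table.place("fir_filter", switch=0, port=2)
before = table.resolve("fir_filter")
table.relocate("fir_filter", switch=3, port=1)
after = table.resolve("fir_filter")
print(before, after)
```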
May 2007/June 2007: objectives
XPIPES
  Possible improvements in the context of dynamic reconfiguration
Implementation (Verilog)
  Basic essential elements for the communication infrastructure (reconfigurable switch)
Testing
Operating System support for Reconfigurable SoC

Development of an OS architecture-independent layer for dynamic reconfiguration
Ivan Beretta
Project Overview
Study of current operating system support for dynamically reconfigurable architectures
  Two solutions inside the DRESD group
Definition of an intermediate layer for dynamic reconfiguration support
  Architecture independent
  Distribution independent
Second Phase: Goals
Implementation of the DRESD operating system solution
  Old kernel recovery
  Hardware architecture replication using ISE and EDK version 9.1, on Xilinx Virtex II Pro VP7
Layer definition
  Comparison between existing solutions
  Basic definition of the boundaries of the new intermediate layer
Second Phase: Results (1 of 2)
Recovery of the DRESD solution for Caronte
  Static hardware architecture
  Bootmanager recovery
  Bootstrap from flash memory
  Base kernel
Hardware architectures upgrade
  New synthesis tools (Xilinx ISE and EDK 9) and new cores
Kernel compilation
  Recovery of dynamic-reconfiguration support
Second Phase: Results (2 of 2)
Basic definition of the architecture-independent layer
  Factorization of existing solutions
    Interface to the reconfiguration controller driver
    Address space manager module
    Driver loader module
    Core caching and placement module
  Introduction of new elements
    Reconfiguration scheduler
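As an illustration of what a core caching module could do, the sketch below keeps track of which cores are already configured in a fixed number of reconfigurable slots and evicts the least recently used one when a new core is requested. The `CoreCache` class, the slot model and the LRU policy are assumptions made for the example; the slides do not state the policy of the DRESD solutions.

```python
from collections import OrderedDict


class CoreCache:
    """Tracks which cores are configured in a fixed number of slots."""

    def __init__(self, slots):
        self.slots = slots
        self._loaded = OrderedDict()  # core name -> slot, least recent first

    def request(self, core):
        """Return (slot, reconfigured) for the requested core."""
        if core in self._loaded:
            self._loaded.move_to_end(core)  # hit: no reconfiguration needed
            return self._loaded[core], False
        if len(self._loaded) < self.slots:
            slot = len(self._loaded)  # a slot is still unused
        else:
            _, slot = self._loaded.popitem(last=False)  # evict the LRU core
        self._loaded[core] = slot
        return slot, True


cache = CoreCache(slots=2)
trace = [cache.request(c) for c in ["aes", "fft", "aes", "crc", "fft"]]
print(trace)
```

In this trace the third request hits the cache, so only four of the five requests would trigger a reconfiguration.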
What's next…
Third phase:
  Complete definition of the boundaries of the new intermediate layer
  Full implementation of DRESD existing solutions
    Module-based reconfigurable architecture on Virtex II Pro VP7
    Synthesis flow based on Xilinx ISE and EDK 8.2 and 9.1
    Porting of the YaRA solution on Virtex II Pro VP7
Design Flow
Antonio Piazzi
Project Organization
1st phase (15 March – 15 April): Budgeting
  Study of the state of the art
2nd phase (15 April – 15 May): Realization phase
  Construction of the entire tool based on previously separate tools
  Implementation of an innovative workflow
3rd phase (15 May – 15 June): Project validation
  Validation on a real architecture and performance evaluation
Second Phase: results
Output files: system.vhd, inserted device wrappers, ngc project files
system.vhd decomposition (ArchGen based)
  Output files: fix.vhd and top.vhd
Communication infrastructure generation (ComIC based)
  Output files: <file name>.nmc, <file name>.xdl
Collect information about the communication infrastructure from the xdl file
  Output file: port.cfg
Adding information to top.vhd
Start related flow tool
Generation of the UCF file
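The last step of the flow, UCF generation, can be illustrated with a small generator. This is a sketch under assumptions: the instance names, area group name and slice coordinates are hypothetical, the inputs are plain dictionaries rather than the project's port.cfg format (which is not described here), and only two common UCF constraint forms are emitted (`AREA_GROUP` with a `RANGE`, and `LOC`).

```python
def make_ucf(area_groups, bus_macros):
    """Build UCF text from two hypothetical inputs.

    area_groups: instance name -> (area group name, slice range)
    bus_macros:  bus macro instance name -> slice site
    """
    lines = []
    for inst, (group, slice_range) in area_groups.items():
        # Bind the module to an area group and constrain the group's region.
        lines.append(f'INST "{inst}" AREA_GROUP = "{group}";')
        lines.append(f'AREA_GROUP "{group}" RANGE = {slice_range};')
    for inst, site in bus_macros.items():
        # Pin each bus macro to a fixed site on the region boundary.
        lines.append(f'INST "{inst}" LOC = "{site}";')
    return "\n".join(lines) + "\n"


ucf = make_ucf(
    {"recon_module_0": ("AG_recon_0", "SLICE_X0Y0:SLICE_X15Y63")},
    {"busmacro_l2r_0": "SLICE_X14Y10"},
)
print(ucf)
```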
Second Phase: results
Basic pre-existing tools:
  ArchGen
  ComIC
  YaRA script
  InCA script
Generated tool
  Editing the ArchGen output file (top.vhd)
  Parsing xdl to collect information on the bus macros
  Translation of the YaRA script into a sequence of C++ instructions to be included in the Earendil tool chain
State of progress
Tool phases: Planning; VHDL generation; UCF generation & Com. Inf. Gen.; Bitstream gen.; Merging phase

Second Phase: results
Tool phases: Planning; VHDL gen.; UCF and Com. Inf. Gen.; Bitstream gen.; Merging phase
[Figure: each tool phase is marked as either a manual process or an automated process]
What's next…
Automated switching
  The tool must be able to recognize from the device type the kind of communication infrastructure to create and the appropriate design flow
Upgrade of the communication infrastructure with a deep integration of the ComIC tool in the project
  ComIC may be considered an extension of ArchGen; this leads to a different approach that frees us from parsing the top file and the xdl file, which define the bus
Patch for ComIC to create a Wishbone-compatible bus
  The idea is to create a complete bus which presents all the signals defined by the Wishbone protocol
Polaris

Polaris
Create an integrated HW/SW system to manage 2D reconfiguration
SW side:
  Maintain information on FPGA status
  Decide how to efficiently allocate tasks
HW side:
  Provide support for effective task allocation
  Perform 2D bitstream relocation
Effects of 2D Reconfiguration in a Reconfigurable System
Massimo Morandi
2nd Phase Goals
Definition of a 2D reconfiguration allocation manager:
  Evaluation of the desired features
  Definition of its structure
State of the art analysis:
  Investigation of literature solutions
  Comparison of their costs, effectiveness, versatility… to propose a novel one representing a good compromise
Allocation manager
Allocation manager desired features:
  Low TRR
  Low management overhead
  High routing efficiency
  Low fragmentation
Allocation manager structure:
  Empty space manager
    Complete space
    Heuristic selection
  Fitter
    General (FF, BL, BF, WF…)
    Focused (FA, RA…)
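The difference between some of the general fitters can be shown on a list of empty rectangles. This is a minimal sketch, assuming FF, BF and WF stand for first fit, best fit and worst fit, that rectangles are `(x, y, width, height)` tuples, and that "best" and "worst" are judged by rectangle area; none of these details is given on the slide.

```python
def _fits(rect, w, h):
    # A task fits when the empty rectangle is at least as wide and as tall.
    return rect[2] >= w and rect[3] >= h


def first_fit(empty, w, h):
    """First empty rectangle, in list order, that can host the task."""
    return next((r for r in empty if _fits(r, w, h)), None)


def best_fit(empty, w, h):
    """Smallest-area empty rectangle that can host the task."""
    candidates = [r for r in empty if _fits(r, w, h)]
    return min(candidates, key=lambda r: r[2] * r[3], default=None)


def worst_fit(empty, w, h):
    """Largest-area empty rectangle that can host the task."""
    candidates = [r for r in empty if _fits(r, w, h)]
    return max(candidates, key=lambda r: r[2] * r[3], default=None)


# Hypothetical empty space: three rectangles of different sizes.
empty = [(0, 0, 4, 4), (4, 0, 2, 3), (0, 4, 8, 8)]
print(first_fit(empty, 2, 2), best_fit(empty, 2, 2), worst_fit(empty, 2, 2))
```

All three are general strategies in the sense that they look only at rectangle sizes, not at where the task will communicate.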
Most relevant works
Maintain complete information on empty space:
  KAMER:
    Keep All Maximally Empty Rectangles
    Apply a general fitting strategy
  CUR:
    Maintain the Contour of a Union of Rectangles
    Apply a focused fitting strategy
Heuristically prune part of the information:
  KNER:
    Keep Non-overlapping Empty Rectangles
    Apply a general fitting strategy
  2D-HASHING:
    Keep Non-overlapping Empty Rectangles in an optimized data structure
    Apply (exclusively) a general fitting strategy
Evaluation
High placement quality => high complexity
Lowest complexity => no focused fitting (which is bad especially for routing)
Next Phase
Chosen approach is heuristic (KNER-like) but with a fitting strategy focused on minimizing routing costs
To be done:
  Clearly define the interface for the allocation manager
  Design the KNER-like empty space manager
  Integrate a routing-aware fitting strategy (with Manhattan distance metric)
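A minimal sketch of the chosen direction, under stated assumptions: the `EmptySpaceManager` class, the vertical-cut split rule and the use of an `anchor` point (for example the position of the core the task communicates with) are illustrative choices, not the project's design. Empty space is kept as non-overlapping rectangles, and the fitter picks the rectangle whose origin has the smallest Manhattan distance from the anchor.

```python
class EmptySpaceManager:
    """Keeps the free area as non-overlapping (x, y, w, h) rectangles."""

    def __init__(self, width, height):
        self.empty = [(0, 0, width, height)]

    def allocate(self, w, h, anchor=(0, 0)):
        """Place a w x h task; return its origin, or None if it is rejected."""
        fits = [r for r in self.empty if r[2] >= w and r[3] >= h]
        if not fits:
            return None
        # Routing-aware choice: origin closest to the anchor (Manhattan).
        chosen = min(
            fits, key=lambda r: abs(r[0] - anchor[0]) + abs(r[1] - anchor[1])
        )
        x, y, rw, rh = chosen
        self.empty.remove(chosen)
        # Split the leftover into two rectangles that do not overlap.
        if rw > w:
            self.empty.append((x + w, y, rw - w, rh))  # strip to the right
        if rh > h:
            self.empty.append((x, y + h, w, rh - h))   # strip above the task
        return (x, y)


mgr = EmptySpaceManager(10, 10)
p1 = mgr.allocate(4, 4)
p2 = mgr.allocate(3, 3, anchor=(0, 10))
p3 = mgr.allocate(7, 7)
print(p1, p2, p3, mgr.empty)
```

The third request is rejected even though 75 area units are still free, which shows the fragmentation cost of pruning information about the empty space.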
Relocation for 2D Reconfigurable Systems
Marco Novati
Goals of 2nd phase
Implementation of BiRF²:
  Define the functionality:
    Create the new bitstream parser
    Determine formulae for:
      – FAR calculation
      – CRC calculation
  Design the structure of BiRF²
  Hw implementation
New Parser
CRC Calculation
Particular CRC value, used by Xilinx tools
Two versions of BiRF Square:
  By using the "predefined" value
  With actual CRC calculation
    An optimized algorithm has been used
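One common way to optimize a CRC computation is to replace the bit-by-bit loop with a lookup table indexed by byte. The sketch below shows that general technique only: the generator polynomial 0x8005, the MSB-first bit order and the zero initial value are illustrative assumptions, and neither the optimization actually used in BiRF Square nor the exact CRC framing of the Xilinx configuration logic is stated in these slides.

```python
POLY = 0x8005  # x^16 + x^15 + x^2 + 1, an illustrative CRC-16 generator


def crc16_bitwise(data, crc=0):
    """Reference version: one shift and conditional XOR per input bit."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Precompute the effect of each possible top byte once.
TABLE = [crc16_bitwise(bytes([i])) for i in range(256)]


def crc16_table(data, crc=0):
    """Table-driven version: one lookup per input byte."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[(crc >> 8) ^ byte]
    return crc


frame = bytes(range(64))  # stand-in for configuration data
print(hex(crc16_table(frame)), crc16_table(frame) == crc16_bitwise(frame))
```

In hardware the same idea usually becomes a parallel XOR network that consumes a whole word per clock cycle.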
Synthesis results
On a Virtex-4 with speed grade -12
  General purpose version: max frequency of 160 MHz
  Specific version: max frequency of 290 MHz
What's next…
Simulation of BiRF Square
Interfacing with the OPB bus
Creation of a toy architecture for the validation
Actual validation on the new Virtex-4
HLR
Marco Maggioni
Project Organization
First Phase
  Time window: 1st month
  Goal: Clustering
Second Phase
  Time window: 2nd month
  Goal: Coloring
Third Phase
  Time window: 3rd month
  Goal: Scheduling
[Figure labels: Gcc Frontend, PandA, Partitioning Algorithm (Isomorphic), Clustered Graph, Metric (Area, Latency, Rec. Time, Power), Target Architecture Database, Reconfigurable Clustered Graph, Scheduling Algorithm, Circuit Representation]
Second Phase: Coloring
Theoretical Work
  From Clusters to Reconfigurable Graph
    Definition of the interfaces for the Coloring phase
    Study of a metric for cluster execution time
Implementation of the Coloring phase
  Coloring based on the delay of nodes
    Applied to the results of isomorphic clustering
GraphGen on Earendil
  Produces graphs from the specification
  Automatically integrated with PandA
Second Phase: Coloring
Add useful information for the next steps
  Execution time is mandatory for scheduling
  Area/Power/Rec. Time can optimize the final result
Based on a target architecture
  Interchangeable metrics
[Figure: Clustered Graph annotated with Latency (needed) and with Area, Rec. Time, Power (useful)]
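One possible way to color a cluster with an execution time derived from node delays is to take the longest path through the cluster's dataflow graph. This is a sketch under assumptions: the critical-path metric, the function names and the example delays are illustrative and are not the metric defined in the project.

```python
from functools import lru_cache


def cluster_latency(delays, edges):
    """Longest accumulated delay along any path of the cluster's DAG."""
    successors = {node: [] for node in delays}
    for src, dst in edges:
        successors[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        tail = max((longest_from(s) for s in successors[node]), default=0)
        return delays[node] + tail

    return max(longest_from(node) for node in delays)


def color(clusters):
    """Annotate every cluster with the latency needed by the scheduler."""
    return {
        name: {"latency": cluster_latency(c["delays"], c["edges"])}
        for name, c in clusters.items()
    }


# Hypothetical cluster: (mul, sub) -> add -> shift, delays in clock cycles.
clusters = {
    "c0": {
        "delays": {"mul": 3, "sub": 1, "add": 1, "shift": 2},
        "edges": [("mul", "add"), ("sub", "add"), ("add", "shift")],
    }
}
colored = color(clusters)
print(colored)
```

Area, power and reconfiguration time could be added to the same annotation dictionary, which keeps the metrics interchangeable.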
Second Phase: GraphGen
Basically a tool for graph generation (DFG, SDG, CDF, BB)...
Writes .dot files...
Some benchmarks:
  AES
  Whetstone
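For readers unfamiliar with the .dot format, the following sketch shows what writing a small dataflow graph in Graphviz DOT syntax looks like. The `to_dot` function and the node names are hypothetical; this is not GraphGen's code or its output layout.

```python
def to_dot(name, nodes, edges):
    """Serialize a directed graph to Graphviz DOT text.

    nodes: node id -> label, edges: list of (source id, destination id).
    """
    lines = [f"digraph {name} {{"]
    for node_id, label in nodes.items():
        lines.append(f'  {node_id} [label="{label}"];')
    for src, dst in edges:
        lines.append(f"  {src} -> {dst};")
    lines.append("}")
    return "\n".join(lines) + "\n"


dot_text = to_dot(
    "dfg",
    {"n0": "load a", "n1": "load b", "n2": "add"},
    [("n0", "n2"), ("n1", "n2")],
)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` tool.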
What's next…
Third phase in detail
  Apply reconfigurable scheduling
    Adapts the specification to the reconfigurable architecture
    Uses information obtained from coloring
    Possible different algorithms
  Define a schedule result structure
  Implement the Salomone algorithm
  Publish the entire work on Earendil
Questions