A Reconfigurable Processor Architecture and Software Development Environment for Embedded Systems
Andrea Cappelli, F. Campi, R. Guerrieri, A. Lodi, M. Toma, A. La Rosa, L. Lavagno, C. Passerone, R. Canegallo
Nice, France, April 22, 2003
Outline
Motivations
XiRisc: a VLIW Processor
PiCoGA: a Pipelined Configurable Gate Array
Software Development Environment
Results & Measurements
Conclusions
Motivations
Increased on-chip transistor density
Increased integration costs
Strong limitations in power supply
Severe power consumption constraints
[Figure: millions of transistors per chip and technology (nm), 1997 to 2009]
Increased algorithmic complexity
Quest for performance and flexibility
[Figure: algorithm complexity, Moore’s law, and battery capacity, 1997 to 2009]
Embedded systems algorithm analysis
90% of computational complexity is concentrated in small kernels covering small parts of the overall code
Many algorithms show significant instruction-level parallelism
Performance is improved by multiple parallel data paths
Operand granularity is typically different from 32-bit
A traditional ALU is power-inefficient
Significant improvements can be obtained by extending embedded processors with application-specific function units
Reconfigurable computing to achieve maximum flexibility
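As a rough illustration of why accelerating small kernels pays off, the sketch below applies Amdahl's law. It assumes that the 90% share of computational complexity translates directly into 90% of execution time; the kernel speedup factors are arbitrary example values, not measurements from this work.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the kernels get faster."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # The non-kernel part runs at the original speed, the kernel part is scaled.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    # Assumed kernel share of 0.9; speedup factors are illustrative only.
    for s in (2, 5, 10, 100):
        print(f"kernel speedup {s:>3}x -> overall {overall_speedup(0.9, s):.2f}x")
```

Under this assumption the remaining 10% of the code bounds the overall gain below 10x, however fast the kernels become.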
Existing Architectures
Standard processor coupled with embedded programmable logic where application specific functions are dynamically
XiRisc: a VLIW Processor
Duplicated commonly used function units (ALU and shifter)
All other function units are shared (DSP operations, memory handler)
A tightly coupled pipelined configurable gate array
Dynamic Instruction Set Extension
Specific operation to transfer data from a configuration cache to the PiCoGA:
pGA-load fields: region specification, configuration specification
32-bit and 64-bit operations to launch execution inside the PiCoGA (data exchange through the register file):
32-bit pGA-op fields: Source 1, Source 2, Dest 1, Dest 2, operation specification
64-bit pGA-op fields: Source 1, Source 2, operation specification, Dest 1, Dest 2, Source 3, Source 4
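To make the 32-bit pGA-op format more concrete, here is a minimal packing and unpacking sketch. The slides name the fields but give neither their widths nor their bit order, so the layout below (6-bit opcode, four 5-bit register fields, 6-bit operation specification) and the helper names `encode_pga_op` and `decode_pga_op` are assumptions made for illustration only.

```python
# Hypothetical layout of a 32-bit pGA-op word, most significant field first.
# Field names follow the slides; widths and ordering are assumptions.
FIELDS = [
    ("opcode", 6),
    ("src1", 5),
    ("src2", 5),
    ("dst1", 5),
    ("dst2", 5),
    ("op_spec", 6),
]
assert sum(width for _, width in FIELDS) == 32


def encode_pga_op(**values: int) -> int:
    """Pack the named fields into one 32-bit instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_pga_op(word: int) -> dict:
    """Unpack a 32-bit instruction word into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

With two source and two destination registers in one word, a single issued instruction can read and write the register file on behalf of the PiCoGA, which is the data exchange path the slide describes.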
PiCoGA: a Pipelined Configurable Gate Array
Two-dimensional array of LUT-based reconfigurable logic cells
Each row implements a possible stage of a customized pipeline, independent of and concurrent with the processor
Up to 4x32-bit input data and up to 2x32-bit output data from/to the register file
Embedded function unit for dynamic extension of the instruction set
DFG-based elaboration
Row elaboration is activated by an embedded control unit
Execution enable signal for each pipeline stage
PiCoGA operation latency depends on the operation performed
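The dependence of latency on the operation can be sketched by levelling a data-flow graph (DFG): every node is placed in the earliest row whose inputs are already available, and the number of occupied rows gives the pipeline depth. This is a simplified model, assuming one DFG level per row and one cycle per row; the real mapping may differ, and the function names and the example graph are hypothetical.

```python
def assign_rows(dfg: dict) -> dict:
    """Map each operation node to a row index (as-soon-as-possible levelling).

    dfg maps an operation name to the list of its inputs. Inputs that are
    not keys of dfg are treated as operands coming from the register file.
    """
    rows = {}
    visiting = set()

    def row_of(node):
        if node not in dfg:            # register-file operand, ready before row 0
            return -1
        if node in rows:
            return rows[node]
        if node in visiting:
            raise ValueError(f"cycle through {node!r}: not a valid DFG")
        visiting.add(node)
        rows[node] = 1 + max((row_of(p) for p in dfg[node]), default=-1)
        visiting.discard(node)
        return rows[node]

    for node in dfg:
        row_of(node)
    return rows


def latency_in_rows(dfg: dict) -> int:
    """Number of rows (pipeline stages) the operation occupies in this model."""
    return 1 + max(assign_rows(dfg).values(), default=-1)


# Hypothetical example: sum of two absolute differences.
sad2 = {
    "d0": ["a0", "b0"],
    "d1": ["a1", "b1"],
    "abs0": ["d0"],
    "abs1": ["d1"],
    "sum": ["abs0", "abs1"],
}
print(assign_rows(sad2), latency_in_rows(sad2))
```

In this model a single addition occupies one row while the example above occupies three, which is why the control unit has to enable each row individually rather than rely on a fixed latency.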
[Figure labels: Configuration Cache, PiCoGA]
PiCoGA Configuration
Goal: to reduce cache misses due to PiCoGA configuration
Multi-context programming (4 cache layers/planes inside the array)
Dedicated configuration cache with a high-bandwidth bus to the PiCoGA (192 bits)
Partial run-time reconfiguration (a region is configured while another one is active)
Configuration is completely concurrent with processor elaboration
[Figure: PiCoGA mapping across configuration layers 1 to 4]
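The benefit of multi-context programming can be illustrated with a toy model of the four layers. Only the layer count (4) and the bus width (192 bits) come from the slides. The one-transfer-per-cycle timing, the least-recently-used replacement policy, the simplification that each layer holds one whole configuration, and the configuration sizes are all assumptions for the sake of the example.

```python
import math
from collections import OrderedDict

BUS_WIDTH_BITS = 192   # from the slides
NUM_LAYERS = 4         # from the slides


class ContextStore:
    """Toy model of the configurations resident inside the array."""

    def __init__(self):
        self.resident = OrderedDict()   # name -> size in bits, oldest first
        self.load_cycles = 0            # total cycles spent loading

    def activate(self, name: str, size_bits: int) -> int:
        """Make a configuration active; return the cycles spent loading it."""
        if name in self.resident:
            # Already in one of the layers: a context switch, no reload.
            self.resident.move_to_end(name)
            return 0
        if len(self.resident) == NUM_LAYERS:
            # Assumed policy: evict the least recently used configuration.
            self.resident.popitem(last=False)
        self.resident[name] = size_bits
        # Assumed timing: one 192-bit bus transfer per cycle.
        cycles = math.ceil(size_bits / BUS_WIDTH_BITS)
        self.load_cycles += cycles
        return cycles


store = ContextStore()
print(store.activate("fir", 1920))   # first use: loaded over the bus
print(store.activate("fir", 1920))   # second use: already resident
```

Because loading is concurrent with processor execution, the cycles counted here are only exposed when an operation is launched before its configuration has finished loading.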
The Software Development Environment