Embedded Memory Wrapper Generation for Multi-processor SoC Design
Jan 01, 2016
F. Gharsalli, S. Meftali, F. Rousseau, A.A. Jerraya
TIMA laboratory
46 avenue Felix Viallet
38031 Grenoble Cedex - France
Memory for SoC
– SoC: a single chip
    – Heterogeneous components (CPU, IP, …)
    – Application-specific architecture
– Integration of standard memory IP
    – Adaptation of memory protocols to the specific network
[Figure: N processors (CPU, DSP, IP) and SRAM and FLASH memories attached to a communication network; shown first with glue logic, then with a wrapper in front of each memory.]
Outline
– Introduction
    – Memory IP based design
    – Memory integration issues
– Architectural Models and Basic Concepts
– Memory Wrapper
    – Generic architecture
    – Automatic generation
– Experiments
– Conclusion
Memory IP based design
– Steadily increasing capacity
– Memory reuse based design to close the gap between capacity and productivity
– MEMORY INTERFACE DESIGN IS A DOMINANT PROBLEM
[Figure: ITRS 2000 forecast for SoC capacity, 1999 to 2014: share of chip area (0% to 100%) taken by new logic, reused logic and memory.]
Memory integration issues
– Complex system design
    – Heterogeneous components: several logical ports and specific communication protocols
    – Standard memory components: limited physical ports and standard access protocols
– Large memory design space exploration
    – Different memory characteristics (type, size, consumption)
– Multi-master SoC
    – Parallel accesses to the global memory
Memory integration issues
– Complex system design: PORT ADAPTATION is needed
– Large memory design space exploration: WRAPPER FLEXIBILITY is required
– Multi-master SoC: SOPHISTICATED SYNCHRONIZATION MECHANISMS are required
Related Work
– Port adaptation
    – CoWare
    – Polis
    – Cadence (VCC)
– Wrapper flexibility
    – Marie Curie, COSY
– Synchronization mechanisms
    – Fixed priority (PalmChip)
    – TDMA and Round-Robin (Sonics)
– None of the existing strategies has fully addressed the problems of memory IP integration already described
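The three synchronization schemes named on this slide differ only in how a winner is chosen among simultaneous requests. The sketch below is a minimal illustration and is not taken from the PalmChip or Sonics products; the function names, the convention that a lower index means higher priority, and the one-slot-per-master TDMA table are all assumptions.

```python
from typing import Optional, Sequence


def fixed_priority(requests: Sequence[bool]) -> Optional[int]:
    """Grant the requesting master with the lowest index (assumed highest priority)."""
    for master, wants in enumerate(requests):
        if wants:
            return master
    return None


def round_robin(requests: Sequence[bool], last_granted: int) -> Optional[int]:
    """Grant the first requesting master found after the one granted last."""
    n = len(requests)
    for step in range(1, n + 1):
        master = (last_granted + step) % n
        if requests[master]:
            return master
    return None


def tdma(requests: Sequence[bool], cycle: int) -> Optional[int]:
    """Grant the owner of the current time slot, and only if it is requesting.

    Assumption: one slot per master, slots assigned in index order.
    """
    owner = cycle % len(requests)
    return owner if requests[owner] else None
```

With fixed priority a busy low-index master can starve the others, round-robin rotates the grant, and TDMA leaves a slot idle when its owner has nothing to send.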
Our Contributions
– Generic memory wrapper architecture
    – Port adaptation
    – Memory flexibility
    – Arbitration between parallel memory accesses
– Automatic generation of memory wrapper by assembling library components
Architectural models
– Virtual architecture model
    – Abstract modules (virtual modules)
    – Abstract channels
    – Implicit communication procedures
    – Wrapper specification but no implementation
– Micro-architecture model
    – Module implementations
    – Physical communication network
    – Explicit communication procedures
    – HW wrapper implementation and synthesis
[Figure: virtual architecture (modules M1 and M2 and a MEMORY module connected by channels) refined into a micro-architecture (module implementations with OS, wrappers, and the MEMORY on a physical communication network).]
Basic concepts: virtual module
– Separation between behavior and communication interface
    – Memory access must be independent of the memory type
– Hiding the abstraction level of the memory description
    – Memory integration must be independent of these abstraction levels
– Logical and physical accesses
    – To adapt these accesses, we use a wrapper
[Figure: memory IP enclosed in a wrapper. Each virtual port pairs an external port (logical port), connected to Channel 1 or Channel 2, with an internal port (physical memory port).]
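Memory wrapper architecture
– Generic wrapper architecture
    – Memory dependent part: Memory port adapter (MPA)
    – Communication dependent part: Channel adapter (CA)
    – Internal bus (IB): address, data and control
    – Arbiter
[Figure: memory wrapper between the communication network and the memory IP. Channels enter through CA1, CA2 and CA3, which share the IB under control of the arbiter; the MPA connects the IB to the memory bus of the memory IP.]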
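The data path through the generic wrapper can be sketched in a few lines. This is a behavioral illustration only, not the authors' SystemC components: the class names, the request tuple layout, the dictionary standing in for the memory IP, and the simple "first pending CA wins" arbitration are all assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Tuple

# Assumed request layout: (kind, address, data), with kind "R" or "W".
Request = Tuple[str, int, Optional[int]]


@dataclass
class ChannelAdapter:
    """Communication-dependent part: buffers the requests of one channel."""
    fifo: Deque[Request] = field(default_factory=deque)
    replies: List[int] = field(default_factory=list)

    def push(self, request: Request) -> None:
        self.fifo.append(request)


@dataclass
class MemoryPortAdapter:
    """Memory-dependent part: turns internal-bus requests into memory accesses."""
    memory: Dict[int, int] = field(default_factory=dict)  # stand-in for the memory IP

    def access(self, request: Request) -> Optional[int]:
        kind, address, data = request
        if kind == "W":
            self.memory[address] = data
            return None
        return self.memory.get(address, 0)


@dataclass
class MemoryWrapper:
    adapters: List[ChannelAdapter]
    mpa: MemoryPortAdapter

    def step(self) -> Optional[int]:
        """One internal-bus transfer: grant the first CA that has a pending request."""
        for index, ca in enumerate(self.adapters):
            if ca.fifo:
                result = self.mpa.access(ca.fifo.popleft())
                if result is not None:
                    ca.replies.append(result)  # read data goes back to the channel
                return index
        return None
```

Pushing a write on one channel adapter and a read of the same address on another, then calling `step()` twice, serves the two requests one after the other and leaves the read data in the second adapter's `replies` list.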
Flexibility of the memory architecture
– Flexible memory wrapper architecture for a large design space exploration
– Flexibility is ensured by generic and modular models
    – CA: customized with communication network specific parameters
    – MPA: customized with memory specific parameters
– We change only the Memory Port Adapter part
[Figure: the same wrapper (CA1, CA2, CA3, arbiter, IB) shown twice: with one MPA and one memory bus for a single-port memory IP, and with MPA1 and MPA2 and two memory busses for a dual-port memory IP.]
Memory wrapper generation flow
– Wrapper generation
    – Input:
        – Memory IP library
        – Wrapper components library (CA, MPA)
        – Architectural parameters: number of ports, channels, protocols
    – Action:
        – Customizing the generic CA and MPA from the library using the architectural parameters
        – Instantiation of customized CA and MPA
        – Interconnection to the rest of the system
    – Output: micro-architecture
[Figure: the virtual architecture annotated with parameters, the memory IP library and the CA/MPA library feed the wrapper generation step, which produces the micro-architecture.]
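The flow above (parameters in, customized and instantiated components out) can be illustrated with a small sketch. It is not the authors' generator: the `Parameters` field names, the library represented as a dictionary of names, the instance strings, and the rule that an arbiter is added only when channels outnumber memory ports are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Parameters:
    """Architectural parameters annotated on the virtual architecture (assumed names)."""
    memory_type: str      # e.g. "SRAM" or "SDRAM"
    port_number: int      # physical memory ports
    port_width: int       # memory data width in bits
    channel_number: int   # logical channels reaching the memory
    bus_width: int = 32   # internal bus width in bits


def generate_wrapper(params: Parameters, mpa_library: Dict[str, str]) -> List[str]:
    """Return the customized component instances that make up the wrapper."""
    if params.memory_type not in mpa_library:
        raise KeyError(f"no MPA in library for {params.memory_type}")
    mpa_name = mpa_library[params.memory_type]
    # One channel adapter per logical channel.
    instances = [f"CA{i + 1}(width={params.bus_width})" for i in range(params.channel_number)]
    # One memory port adapter per physical memory port.
    instances += [
        f"{mpa_name}{p + 1}(mem_width={params.port_width}, bus_width={params.bus_width})"
        for p in range(params.port_number)
    ]
    # Assumption: an arbiter is needed only when several channels share one memory port.
    if params.channel_number > params.port_number:
        instances.append(f"Arbiter(masters={params.channel_number})")
    return instances
```

Called with one 16-bit SDRAM port and two channels, the sketch returns two channel adapters, one SDRAM MPA and an arbiter; called with two 32-bit SRAM ports and two channels, it returns two channel adapters and two SRAM MPAs.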
Image Filtering Process: Input/Output Image
[Figure: input image and output image.]
Experiments
– Low-level image processing for a digital camera
– The initial specification is
    – Memory rich (2 Mbytes Flash, 2 Mbytes ROM, 256 Kbytes SRAM)
    – Processor poor (only one 8-bit RISC processor)
– Acceleration by adding another processor
    – We use 2 ARM7 processors
    – 1 global memory
    – Point-to-point communication network
– 2 experiments to show the memory flexibility ensured by the wrapper
    – Experiment 1: using a dual-port SRAM
    – Experiment 2: using a single-port SDRAM
Experiment 1: Dual-port memory
[Figure: virtual architecture with modules M1 and M2 (tasks T1 to T4) connected to a dual-port SRAM through logical channels.]
Extracted parameters
    Port number      2
    Port type        sc_lv
    Port width       32
    Access mode      Burst
    Channel number   2
    …                …
[Figure: generated micro-architecture. The Module 1 and Module 2 implementations (each an ARM7 ISS with a CPU wrapper) connect to the memory wrapper. Inside the wrapper, CA1 and CA2 (AFIFO + buffer) drive IB1 (32) and IB2 (32), each feeding an SRAM MPA; the two MPAs reach the dual-port SRAM over the memory busses (32).]
Experiment 1: Dual-port memory
– MPA services
    – Test
    – Address decoding
    – Access mode: burst mode
        – burst seq (4 words)
    – Bank control
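Two of the MPA services listed above, address decoding and the 4-word burst, lend themselves to a short sketch. The burst length and the 32-bit word size come from the slides; everything else is an assumption: "burst seq" is read here as a sequential burst, and the address map (2 banks of 32768 words) is purely illustrative.

```python
from typing import List, Tuple

BURST_LENGTH = 4   # words per burst, from the slide
WORD_BYTES = 4     # 32-bit port width

# Assumed address map, for illustration only: 2 banks of 32768 words each.
BANK_WORDS = 32768
BANK_COUNT = 2


def decode(byte_address: int) -> Tuple[int, int]:
    """Split a byte address into (bank, word offset inside the bank)."""
    word = byte_address // WORD_BYTES
    bank, offset = divmod(word, BANK_WORDS)
    if bank >= BANK_COUNT:
        raise ValueError(f"address {byte_address:#x} is outside the memory")
    return bank, offset


def burst_addresses(start: int) -> List[int]:
    """Byte addresses of a sequential 4-word burst starting at a word-aligned address."""
    if start % WORD_BYTES:
        raise ValueError("burst start must be word aligned")
    return [start + i * WORD_BYTES for i in range(BURST_LENGTH)]
```

In this sketch the bank index would drive the bank control signals, while the offsets produced for a burst are presented to the selected bank one word at a time.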
Experiment 2: Single-port memory
[Figure: virtual architecture with modules M1 and M2 (tasks T1 to T4) connected to a single-port SDRAM through logical channels.]
Extracted parameters
    Port number      1
    Port type        sc_lv
    Port width       16
    Access mode      R/W
    Channel number   2
    …                …
[Figure: generated micro-architecture. The Module 1 and Module 2 implementations (each an ARM7 ISS with a CPU wrapper) connect to the memory wrapper. Inside the wrapper, CA1 and CA2 (AFIFO + buffer) share the IB (32) under an arbiter; a single SDRAM MPA connects the IB to the single-port SDRAM over the memory bus (16).]
Experiment 2: Single-port memory
– MPA services
    – Test
    – Address decoding
    – Access mode: classic R/W mode
    – Bank control
    – Initialization
    – Refresh
    – Conversion 16 <-> 32 bits
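The 16 <-> 32 bit conversion service can be illustrated as follows. This is a sketch under stated assumptions, not the generated SystemC: the class and function names are hypothetical, a dictionary stands in for the SDRAM, and the low half-word is assumed to be stored first.

```python
from typing import Dict, Tuple


def split_word(word32: int) -> Tuple[int, int]:
    """Split a 32-bit word into (low, high) 16-bit halves."""
    if not 0 <= word32 <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return word32 & 0xFFFF, word32 >> 16


def join_halves(low16: int, high16: int) -> int:
    """Rebuild a 32-bit word from its two 16-bit halves."""
    return ((high16 & 0xFFFF) << 16) | (low16 & 0xFFFF)


class Sdram16Mpa:
    """Presents a 32-bit internal-bus port on top of a 16-bit memory."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}  # 16-bit cells indexed by half-word address

    def write32(self, word_index: int, value: int) -> None:
        low, high = split_word(value)
        self.cells[2 * word_index] = low        # first 16-bit access
        self.cells[2 * word_index + 1] = high   # second 16-bit access

    def read32(self, word_index: int) -> int:
        low = self.cells.get(2 * word_index, 0)
        high = self.cells.get(2 * word_index + 1, 0)
        return join_halves(low, high)
```

Each 32-bit transfer on the internal bus costs two accesses on the 16-bit memory bus in this sketch.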
Results
– SystemC code size for the memory wrapper
    – Experiment 1: 1438 lines
    – Experiment 2: 1335 lines
– Latency (without memory latency)
    – Write: 3 CPU cycles
    – Read: 7 CPU cycles (send/receive)
– Simulation results for an image of 387 x 222:
    – Experiment 1: 2.05 million CPU cycles
    – Experiment 2: 2.97 million CPU cycles
– Fast design exploration with different memories thanks to automatic memory wrapper generation
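The simulation figures can be normalized per pixel to make the two experiments easier to compare. The snippet below only does arithmetic on the numbers reported above; the variable names are illustrative, and since the cycle counts are given in rounded millions the derived values are approximate.

```python
WIDTH, HEIGHT = 387, 222            # image size from the slide
CYCLES_DUAL_PORT_SRAM = 2.05e6      # Experiment 1
CYCLES_SINGLE_PORT_SDRAM = 2.97e6   # Experiment 2

pixels = WIDTH * HEIGHT
cycles_per_pixel_exp1 = CYCLES_DUAL_PORT_SRAM / pixels
cycles_per_pixel_exp2 = CYCLES_SINGLE_PORT_SDRAM / pixels
ratio = CYCLES_SINGLE_PORT_SDRAM / CYCLES_DUAL_PORT_SRAM

print(f"{pixels} pixels")
print(f"Experiment 1: {cycles_per_pixel_exp1:.1f} CPU cycles per pixel")
print(f"Experiment 2: {cycles_per_pixel_exp2:.1f} CPU cycles per pixel")
print(f"Experiment 2 / Experiment 1: {ratio:.2f}")
```

The image has 85914 pixels, so the run with the single-port SDRAM takes roughly 1.45 times as many CPU cycles as the run with the dual-port SRAM.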
Conclusion
– Systematic method to integrate memory IP in multi-processor SoC architectures at system level
– Generic memory wrapper architecture
    – Port adaptation
    – Flexibility of the memory architecture
    – Arbitration of parallel accesses
– Automatic memory wrapper generation by assembling library components
– Fast memory design exploration
    – Application to low-level image processing
Perspectives
– Generalization of the IP wrapper architecture based on the generic wrapper model
– Using a more sophisticated communication network, such as an AMBA bus or a packet-switched network
– Configurable memory test bench
THANK YOU