System requirements System design System development Summary
Multicore DSP Architecture and Programming
O. Dahl
Electrical Engineering, Linköping University, Linköping, Sweden
Guest lecture in TDDD56 Multicore and GPU Programming, LiU, December 5, 2011
Personal background
At LiU since 2011-01-01, at ISY (Institutionen för Systemteknik) - Associate Professor in System Integration (a new subject at the department) http://www.da.isy.liu.se/~olad/
Moved from ST-Ericsson
Started at Ericsson November 2006 - worked with applications, software architecture, LTE design, simulation for software development
Before that: engineer, consultant, manager, associate professor in Computer Science and Automatic Control
Experience in software development, system engineering, system development, simulation, real-time systems, control
PhD in Automatic Control, Lund, 1992
Problem to solve
How to make a wireless modem for 3GPP LTE (and older standards as well, e.g. WCDMA, GSM)?
Challenges
Meet requirements in
high-speed wireless mobile communication (> 100 Mb/s)
Low-pass filtering and ωc ≈ ωc gives 2s2(t) ≈ Q(t)
LTE - basic concepts
OFDM
Send data on multiple frequencies
Send during a symbol interval Tu
Use subcarrier spacing $\Delta f = 1/T_u$
In LTE, $\Delta f = 15$ kHz (mostly), i.e. $T_u \approx 66.7\,\mu$s
LTE - basic concepts
OFDM - orthogonality
Fourier transform of a pulse [Wikipedia, 2011b]
LTE - basic concepts
OFDM - orthogonality
Orthogonality, since signals on two subcarriers
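$$x_1(t) = a_1 e^{j 2\pi k_1 \Delta f t}, \qquad x_2(t) = a_2 e^{j 2\pi k_2 \Delta f t}$$
fulfil
$$\int_{mT_u}^{(m+1)T_u} x_1(t)\, x_2^*(t)\, dt = \int_{mT_u}^{(m+1)T_u} a_1 a_2^*\, e^{j 2\pi (k_1 - k_2) \Delta f t}\, dt = 0$$
for $k_1 \neq k_2$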
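The orthogonality condition can be checked numerically. The sketch below is only an illustration: it assumes unit amplitudes (a1 = a2 = 1), uses the LTE subcarrier spacing of 15 kHz, and approximates the integral with a Riemann sum. The function name `subcarrier_inner_product` and the step count are choices made here, not part of the lecture.

```python
import cmath
import math

DELTA_F = 15e3          # subcarrier spacing in Hz (LTE value from the slides)
T_U = 1.0 / DELTA_F     # symbol interval, about 66.7 microseconds


def subcarrier_inner_product(k1, k2, m=0, steps=2000):
    """Riemann-sum approximation of the integral of x1(t) * conj(x2(t))
    over the symbol interval [m*T_U, (m+1)*T_U], with a1 = a2 = 1."""
    dt = T_U / steps
    total = 0j
    for n in range(steps):
        t = m * T_U + n * dt
        x1 = cmath.exp(2j * math.pi * k1 * DELTA_F * t)
        x2 = cmath.exp(2j * math.pi * k2 * DELTA_F * t)
        total += x1 * x2.conjugate() * dt
    return total


same = subcarrier_inner_product(3, 3)
different = subcarrier_inner_product(3, 5)
print(abs(same) / T_U)       # close to 1: a subcarrier correlates with itself
print(abs(different) / T_U)  # close to 0: distinct subcarriers are orthogonal
```

The result is zero only because the integration window is exactly one symbol interval, so the difference frequency (k1 - k2)∆f completes a whole number of periods.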
LTE - basic concepts
OFDM - implementation using FFT
OFDM can be implemented using FFT (Fast Fourier Transform) at receiver side and IFFT (Inverse FFT) at sender side
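A minimal sketch of the sender/receiver pair, assuming an ideal channel with no noise, no cyclic prefix, and only 8 subcarriers. A naive O(N^2) DFT stands in for the FFT so that the example needs no external library. The helper `dft` and the QPSK values are illustrative choices.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; an FFT computes the same
    result faster. The inverse transform includes the 1/N scaling."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for i, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * i / n)
        out.append(acc / n if inverse else acc)
    return out


# One QPSK symbol per subcarrier (8 subcarriers, chosen only for illustration).
tx_symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]

time_samples = dft(tx_symbols, inverse=True)   # sender side: IFFT
rx_symbols = dft(time_samples)                 # receiver side: FFT

for sent, received in zip(tx_symbols, rx_symbols):
    print(sent, complex(round(received.real, 6), round(received.imag, 6)))
```

Each received symbol matches the transmitted one up to rounding error, which is the orthogonality property expressed in discrete time.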
LTE - basic concepts
OFDM and modulation - sender
[Wikipedia, 2011c]
LTE - basic concepts
OFDM and modulation - receiver
[Wikipedia, 2011c]
LTE - basic concepts
Coding and Decoding
Main coding algorithm is Turbo coding with a coding rate R = 1/3
Convolutional coding (for BCH - broadcast channel)
Turbo encoder [Wikipedia, 2011d]
LTE - basic concepts
Parallel signal processing
OFDM symbols received in series from the radio interface, processed in parallel; processing stages include e.g. FFT, demodulation, control decoding, data decoding.
Uplink processing proceeds in parallel
LTE - basic concepts
Channel estimation
Estimate properties of channel
Compensate for channel effects
Communication with base station
Reference signal (pilot symbols)
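A simplified view of how a pilot symbol is used, sketched for a single subcarrier. It assumes a noise-free channel that acts as one complex gain, which is far simpler than a real LTE receiver; all numeric values are made up for illustration.

```python
# Hypothetical single-subcarrier example: all values below are made up.
channel = 0.6 - 0.3j            # complex channel gain, unknown to the receiver
pilot_tx = 1 + 0j               # reference symbol known to both sides
data_tx = -1 + 1j               # payload symbol

pilot_rx = channel * pilot_tx   # what the receiver observes for the pilot
data_rx = channel * data_tx     # what the receiver observes for the data

# Estimate properties of the channel from the known pilot.
channel_estimate = pilot_rx / pilot_tx

# Compensate for channel effects on the data symbol.
data_equalized = data_rx / channel_estimate

print(channel_estimate)
print(data_equalized)
```

With noise present the estimate would be averaged or interpolated over several pilots; that step is left out here.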
LTE - basic concepts
MIMO
Multiple antennas
Diversity techniques
Spatial multiplexing (send more than one data stream)
LTE - basic concepts
And there is more ...
synchronization (time, frequency)
cell search
receive system information
power control
uplink synchronization (timing advance)
FDD and TDD
Random access
Paging
HARQ
and ... this is only L1 ... we have to make a complete protocol stack ... and it has to be mobile (handover etc.)
LTE - basic concepts
What speed do we get?
20 MHz bandwidth, 1200 subcarriers
14 OFDM symbols in one subframe (1 ms)
64QAM - 6 bits per resource element
14*6*1200/1e-3 = 100800000 b/s, i.e. about 100.8 Mb/s (without coding and control information, but also without MIMO)
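The same arithmetic written out with named quantities. The constant names are chosen here for readability; the numbers are those on the slide.

```python
SUBCARRIERS = 1200          # in 20 MHz bandwidth
SYMBOLS_PER_SUBFRAME = 14   # OFDM symbols per subframe
BITS_PER_ELEMENT = 6        # 64QAM carries 6 bits per resource element
SUBFRAME_SECONDS = 1e-3     # one subframe lasts 1 ms

bits_per_subframe = SYMBOLS_PER_SUBFRAME * BITS_PER_ELEMENT * SUBCARRIERS
raw_rate = bits_per_subframe / SUBFRAME_SECONDS

print(bits_per_subframe)    # 100800
print(raw_rate / 1e6)       # about 100.8 (Mb/s)
```

This is a raw upper bound for one spatial stream: coding and control overhead lower it, and MIMO raises it.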
Outline
1 System requirements
  3GPP
  LTE - basic concepts
2 System design
  DSP
  DSP - ePUMA
  ASIC
  Control processors
3 System development
4 Summary
Building blocks
DSP
ASIC
Control processors
And there is more ...
Application processors
Radio, radio interface
Interconnect, buses
Memory, caches
Power management, thermal management
Imaging, video, graphics, display
Storage, e.g. flash, memory card
DSP
Leocore
Information from [Coresonic, 2011, Anjum et al., 2011]
Leocore
ASIP for baseband processing
Identify common operations in baseband processing - domain specific architecture
Coresonic developer studio
SIMT™ - Single Instruction-flow Multiple Tasks
Units for complex calculations, control unit (RISC), accelerators for FEC (Viterbi, Turbo)
DFE interface, MAC interface
DSP
Master thesis proposal - Parallel Simulation of Multicore DSP Systems for Software Defined Radio
Requires competence in concurrent programming and hardware/software interaction. Knowledge of DSP hardware and software is beneficial, but not strictly required
C++, some Python
more info at [Computer Engineering, 2011]
Code generator for scrambling and generating channelization codes
< 0.5 mW/MHz
DSP - ePUMA
ePUMA
Research project at Division of Computer Engineering, cooperation also with Information Coding (ISY) and IDA (parallel programming)
Overview
Master thesis proposals
DSP - ePUMA
ePUMA
Highly parallel processor for predictable DSP tasks
Heterogeneous design:
  1 master control processor
  8 slave processor cores
Exploited parallelism:
  Task-parallelism (several processor cores)
  Data-parallelism (SIMD instructions on slave processors)
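The two kinds of parallelism can be sketched structurally as follows. This is not ePUMA code: Python threads stand in for the slave cores, a list comprehension stands in for a SIMD instruction, and the vector width of 8 lanes is an arbitrary assumption. The names `simd_scale` and `slave_task` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SLAVES = 8      # slave cores, as in the slide
VECTOR_WIDTH = 8    # lanes per vector operation (illustrative assumption)


def simd_scale(vector, gain):
    """Stand-in for one SIMD instruction: the same operation on every lane."""
    return [gain * x for x in vector]


def slave_task(block, gain=2):
    """Work done by one slave core: step through its block vector by vector."""
    out = []
    for i in range(0, len(block), VECTOR_WIDTH):
        out.extend(simd_scale(block[i:i + VECTOR_WIDTH], gain))  # data parallelism
    return out


samples = list(range(NUM_SLAVES * 2 * VECTOR_WIDTH))
block_len = len(samples) // NUM_SLAVES
blocks = [samples[i:i + block_len] for i in range(0, len(samples), block_len)]

# The "master" hands one block to each slave (task parallelism).
with ThreadPoolExecutor(max_workers=NUM_SLAVES) as pool:
    results = list(pool.map(slave_task, blocks))

scaled = [x for block in results for x in block]
print(scaled[:8])   # [0, 2, 4, 6, 8, 10, 12, 14]
```

The sketch only shows how the work is divided; it makes no claim about speed.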
DSP - ePUMA
Applications
Some example applications:
Baseband processing.
Media processing.
Radar.
Often in constrained environments, such as phones. Ordinary processors often fail because of
high power consumption.
high cost.
low performance.
DSP - ePUMA
Properties of DSP algorithms
Most DSP algorithms share some common traits.
Predictable addressing, i.e. the addresses of the accessed values are not data-dependent.
Few branches other than back jumps in loops.
Constant iteration counts.
Application Specific Instruction set Processors (ASIPs) for DSP take advantage of this to solve the previous problems.
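A FIR filter is a typical kernel with all three traits. The sketch below is a plain direct-form implementation, chosen here as an example and not taken from the lecture.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter. Every index below depends only on the loop
    counters, never on sample values, and both loops have trip counts that
    are known before the loop starts."""
    n_taps = len(taps)
    out = []
    for n in range(n_taps - 1, len(samples)):      # constant iteration count
        acc = 0
        for k in range(n_taps):                    # only branch: loop back jump
            acc += taps[k] * samples[n - k]        # address is a function of n, k
        out.append(acc)
    return out


print(fir_filter([1, 2, 3, 4, 5, 6], [1, 1, 1]))   # 3-tap moving sum
```

The call prints `[6, 9, 12, 15]`. Because the access pattern is fixed, address generation and loop control can be handled by dedicated hardware instead of general-purpose instructions.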
DSP - ePUMA
System overview
[Figure: ePUMA system overview. Labels: Master, DMA, Main Memory, eight cores Sleipnir 0 to Sleipnir 7, and nodes N0 to N7.]
DSP - ePUMA
Memory hierarchy
[Figure: ePUMA memory hierarchy, drawn in three levels (Level 1, Level 2, Level 3). Labels: off-chip main memory; on-chip interconnection; Master LS with PM, DM 0 and DM 1; Master Core registers; Sleipnir 0 ... Sleipnir 7 LS, each with PM, CM and LVM 1 to LVM 3; Sleipnir Core registers.]
DSP - ePUMA
Sleipnir features
Scratchpad memory based programming - no data cache
Up to 16-way SIMD datapath (operates on 128-bit data vectors)
Up to 16 real or 4 complex multiplications per cycle (16-bit data)
Supported datatypes:
  Real fixed-point data: 8, 16, 32 bits
  Complex fixed-point data: 16, 32 bit real and imaginary parts
  Single precision floating-point (32 bits)
Special purpose instructions: DCT, butterflies, sort...
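One plausible reading of "16 real or 4 complex multiplications" is that a complex multiplication costs four real multiplications in the schoolbook form. The sketch below shows that form for 16-bit fixed-point data. It assumes Q15 scaling and omits rounding and saturation, so it is not a description of the actual Sleipnir datapath; `complex_mul_q15` is a hypothetical helper.

```python
def complex_mul_q15(a, b):
    """Multiply two complex numbers given as (real, imag) pairs of Q15
    integers. Uses four real multiplications and two additions."""
    ar, ai = a
    br, bi = b
    real = (ar * br - ai * bi) >> 15   # real multiplications 1 and 2
    imag = (ar * bi + ai * br) >> 15   # real multiplications 3 and 4
    return real, imag


HALF = 1 << 14                          # 0.5 in Q15
print(complex_mul_q15((HALF, HALF), (HALF, -HALF)))   # (0.5+0.5j)*(0.5-0.5j)
```

The call prints `(16384, 0)`, i.e. 0.5 + 0j in Q15.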
DSP - ePUMA
Sleipnir customization
Many parameters can be customized:
Instruction set
Local memory and register file sizes
AGU capabilities
Accelerators
Parameter                        Value
Local vector memory (LVM) size   Up to 8k 128-bit vectors (128 kB)
Register file size               8-32 vectors (0.125 - 0.5 kB)
Constant memory size             Up to 256 vectors (4 kB)
Program memory size              Typically 8-16 kB
DSP - ePUMA
Addressing
Normally many cycles are wasted on rearranging data with shuffle-instructions. This is often due to issues with data alignment and bank-conflicts.
DSP - ePUMA
Data access
Consider the following address layout in a single-bank memory. The only vectors of length four that can be accessed in one cycle are the row vectors {0,...,3}, {4,...,7}, {8,...,11} and {12,...,15}. Accessing one of the colored column vectors takes 4 cycles.
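The cycle counts can be reproduced with a very small cost model. It assumes a 4x4 layout in which addresses 0 to 15 are stored row by row and one memory row can be read per cycle; the function name `single_bank_cycles` is made up for this sketch.

```python
WIDTH = 4   # elements per memory row in the 4x4 example


def single_bank_cycles(addresses):
    """One memory row can be read per cycle, so the cost of a vector access
    is the number of distinct rows it touches."""
    return len({addr // WIDTH for addr in addresses})


row_vector = [4, 5, 6, 7]
column_vector = [1, 5, 9, 13]
print(single_bank_cycles(row_vector))      # 1
print(single_bank_cycles(column_vector))   # 4
```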
DSP - ePUMA
Multi-bank
By splitting the memory into different banks (which increases the area cost somewhat), the only constraint for a single-cycle access is that no two accessed elements reside in the same bank. So while we may now access e.g. vectors {x,...,x+3} in one cycle, the columns still take four cycles to access.
[Figure: address layout over four memory banks]
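A sketch of the multi-bank cost model, assuming four banks and the simplest interleaving, bank = address mod 4. That mapping is an assumption made here for illustration; the slides do not spell it out.

```python
from collections import Counter

BANKS = 4


def multi_bank_cycles(addresses):
    """Each bank delivers one element per cycle, so an access takes as many
    cycles as the most heavily used bank has elements to deliver."""
    per_bank = Counter(addr % BANKS for addr in addresses)
    return max(per_bank.values())


print(multi_bank_cycles([5, 6, 7, 8]))     # unaligned vector {x,...,x+3}: 1
print(multi_bank_cycles([1, 5, 9, 13]))    # column vector: 4
```

Unaligned consecutive vectors become single-cycle, but every element of a column still maps to the same bank.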
DSP - ePUMA
Multi-bank and permutation
Given that the access patterns are known in advance, as is common in DSP algorithms, we may permute the mapping from logical addresses to physical addresses.
An example of a permutation that allows single-cycle access for the columns can be seen below. No two elements of any column reside in the same memory bank.
[Figure: permuted address layout over four memory banks]
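One permutation with this property is a skewed layout in which each row is rotated by its row index. It is used here only as an example; the permutation drawn on the slide may differ.

```python
from collections import Counter

BANKS = 4
WIDTH = 4   # logical row length of the 4x4 example


def permuted_bank(addr):
    """Skewed mapping (an assumed example): rotate each row by its row index."""
    return (addr + addr // WIDTH) % BANKS


def cycles(addresses):
    """Cycles needed when each bank delivers one element per cycle."""
    per_bank = Counter(permuted_bank(addr) for addr in addresses)
    return max(per_bank.values())


rows = [[r * WIDTH + c for c in range(WIDTH)] for r in range(4)]
columns = [[r * WIDTH + c for r in range(4)] for c in range(WIDTH)]

print([cycles(v) for v in rows])      # [1, 1, 1, 1]
print([cycles(v) for v in columns])   # [1, 1, 1, 1]
```

With address 4r + c placed in bank (r + c) mod 4, both the elements of a row and the elements of a column fall in four different banks.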
Evaluate different memory configurations for our multicore architecture
  Single-bank
  Multi-bank
  Multi-bank with permutation
  Multi-port
  Cache
  ...
Investigate impact of memory architecture for different applications
Interesting aspects:
  Performance
  Power-consumption
  Chip area
DSP - ePUMA
Master thesis proposal - FPGA Board Demo of ePUMA
Setting up an FPGA board demo of ePUMA to verify the hardware design
Goals:
  Setting up demo environment
  Test some of our existing demo applications (Motion JPEG and MPEG2 decoder) on real hardware
  Possibility to set up your own demo!
more info at [Computer Engineering, 2011]
ASIC
ASIC
Decide which blocks to be implemented in hardware
Decide on programmability
Power consumption
Control processors
Control processor(s)
Modem control
Power control
ARM Cortex-R, Cortex-M (Cortex-A)
RTOS
Parallel development
Concurrent development of hardware and software
Hardware simulation for software development
Virtual platform
SystemC
Event-driven simulation framework
Handles time and parallel activities
Standardized by OSCI, IEEE
C++ class library
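The core idea of event-driven simulation can be sketched in a few lines. This is a concept illustration in Python, not the SystemC API: the `Simulator` class, its methods, and the process names are all invented for this example.

```python
import heapq


class Simulator:
    """Tiny event-driven kernel: events are (time, sequence, callback)."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = 0   # tie-breaker so equal-time events run in schedule order

    def schedule(self, delay, callback):
        """Register a callback to run `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        """Pop events in time order, advancing simulated time to each event."""
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()


sim = Simulator()
log = []


def dsp_process():
    log.append((sim.now, "dsp: FFT done"))
    sim.schedule(5, lambda: log.append((sim.now, "dsp: demodulation done")))


def control_process():
    log.append((sim.now, "control: configure radio"))


sim.schedule(10, dsp_process)
sim.schedule(3, control_process)
sim.run()
print(log)
```

Simulated time jumps from event to event, and activities that are parallel in hardware are interleaved by the kernel according to their timestamps.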
TLM
Transaction level modeling
Function calls vs. pin-level simulation
Bit-accurate interfaces
Varying degrees of timing can be added (loosely timed, approximately timed)
Hardware modeling for software verification
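A sketch of what "function calls vs. pin-level simulation" means in practice. The class and method below are hypothetical and written in Python for brevity; they are not the TLM-2.0 interface, and the 10 ns cost per access is an arbitrary timing annotation.

```python
class RegisterBlockModel:
    """Transaction-level model of a peripheral: one function call per access,
    no clock or pin activity is simulated."""

    def __init__(self):
        self._regs = {}

    def transport(self, command, address, data=None, delay_ns=0):
        """Handle a read or write; return (data, accumulated delay in ns).
        The fixed 10 ns cost is an arbitrary loosely timed annotation."""
        delay_ns += 10
        if command == "write":
            self._regs[address] = data & 0xFFFFFFFF   # bit-accurate 32-bit register
            return None, delay_ns
        return self._regs.get(address, 0), delay_ns


reg = RegisterBlockModel()
_, delay = reg.transport("write", 0x40, 0x1000000FF)        # value wider than 32 bits
value, delay = reg.transport("read", 0x40, delay_ns=delay)
print(hex(value), delay)
```

The call prints `0xff 20`: the register contents are bit-accurate (the write is truncated to 32 bits), while timing is only annotated per transaction. Software running against such a model sees the same register behaviour as on hardware without the cost of simulating every signal.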
Virtual platform
A virtual representation of the system
SystemC and TLM
Processor models
Peripheral models
Commercial tools
Model handling - signal processing models, HW verification models, virtual platform models
Acceptance and usage, finding bugs, early SW development and verification, release of platform, supporting different RATs, software layer dependencies
Summarizing notes
LTE as an example of multicore digital signal processing
Digital signal processors and ASIC blocks
Control processors
3GPP
Time-to-market
Hardware and software as parallel development tracks
Agilent (2009). 3GPP Long Term Evolution: System overview, product development, and test challenges. http://cp.literature.agilent.com/litweb/pdf/5989-
Anjum, O., Ahonen, T., Garzia, F., Nurmi, J., Brunelli, C., and Berg, H. (2011). State of the art baseband DSP platforms for Software Defined Radio: A survey. EURASIP Journal on Wireless Communications and Networking.
Computer Engineering, I. (2011). Master thesis proposals. http://www.da.isy.liu.se/undergrad/exjobb/open/en