Spring 2006 W. Rhett Davis NC State University ECE 406 Slide 1
ECE 747 Digital Signal Processing Architecture
DSP Implementation Architectures
Spring 2006
W. Rhett Davis
NC State University
My Goal
Challenge you to use the techniques you have learned in this class to design the next generation of DSP hardware
When you undertake a new design, the most important question for you to answer is whether or not it will work better than an existing design.
» Faster
» Longer Battery Life
» Cheaper
Today’s Lecture
Types of DSP Implementations
Comparison of Hardware Efficiency
The Promise of Systems-on-Chip
What’s keeping us from getting there?
ECE 747-Style Design
Up to now, you have been designing signal-flow graphs and converting them into hardware, through a design process some call direct mapping of algorithms.
But what are the other choices?
[Figure: direct-mapped datapath with blocks labeled MAC, reg. file, add, shift, reg. file, Σ]
Efficiency-Flexibility Trade-Off
100x-1000x Difference in Power
[Figure: Flexibility versus Efficiency for four implementation styles: Embedded Processor, DSP, Embedded FPGA, Direct Mapped Hardware]
Computational Efficiency Metrics
Definition: MOPS
» Millions of algorithmically defined arithmetic operations (e.g. multiply, add, shift) per second; a general-purpose (GP) processor needs several instructions per “useful” operation
Figures of merit
» MOPS/mW - Energy efficiency (battery life)
» MOPS/mm² - Area efficiency (cost)
Optimization of these “efficiencies” is the basic goal assuming functionality is met
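As a rough illustration of these definitions, the sketch below converts an instruction rate into MOPS and then into the two figures of merit. The function names and the example processor numbers (400 MIPS, 4 instructions per useful operation, 200 mW, 20 mm²) are hypothetical, chosen only to show the arithmetic; they do not describe any real chip.

```python
def useful_mops(mips: float, instructions_per_op: float) -> float:
    """MOPS delivered when each useful operation costs several instructions."""
    return mips / instructions_per_op


def energy_efficiency(mops: float, power_mw: float) -> float:
    """MOPS/mW: higher means longer battery life."""
    return mops / power_mw


def area_efficiency(mops: float, area_mm2: float) -> float:
    """MOPS/mm^2: higher means lower silicon cost."""
    return mops / area_mm2


# Hypothetical general-purpose processor (illustrative numbers only).
gp_mops = useful_mops(mips=400.0, instructions_per_op=4.0)
gp_mops_per_mw = energy_efficiency(gp_mops, power_mw=200.0)
gp_mops_per_mm2 = area_efficiency(gp_mops, area_mm2=20.0)

print(f"{gp_mops:.0f} MOPS, {gp_mops_per_mw:.2f} MOPS/mW, {gp_mops_per_mm2:.1f} MOPS/mm^2")
```

Dividing by the instruction overhead first is the point of the definition: only the algorithmically defined operations count, not every instruction the processor issues.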
Dedicated Designs 10X-100X More Efficient
[Figure: energy efficiency in MOPS/mW on a log scale from 0.01 to 1000. Chips in axis order: PPC-95, PPC1-SOI-00, Sparc-95, Sparc2-97, PPC2-SOI-00, Sparc1-97, X86-97, Alpha-00, Alpha-97, PPC-00, SA-DSP-98, Hit-DSP-98, Fuj-DSP2-98, Fuj-DSP1-00, NEC-DSP-98, MPEG2-99, Encrypt-00, MUD-98, MPEG2-98, 802.11a-01. Group labels: Microprocessors, General Purpose DSP, Dedicated. Annotation: 2 orders of magnitude]
(Brodersen, ISSCC 2002)
Low Energy Efficiency / Battery Life (MOPS/mW) due to overhead, lack of parallelism, high supply voltages
Potential of Direct Mapping
In 0.25 micron technology, a multiplier requires 0.05 mm² and 7 pJ per operation at 1 V. Adders and registers are about 10 times smaller and use about 10 times less energy.
Let's implement a 50 mm², 0.25 micron chip using adders, registers, and multipliers.
We can fit 2000 adders/registers and 200 multipliers in less than 1/2 of the chip. Also assume that 1/3 of the power goes into clocks.
A 25 MHz clock (1 V) gives ~50 Gops at 100 mW.
That is 500 MOPS/mW and 1000 MOPS/mm².
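The estimate above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on this slide, plus one stated assumption: every adder/register and multiplier performs one operation per clock cycle. Variable names are illustrative.

```python
# Figures quoted on the slide (0.25 micron, 1 V).
MULT_AREA_MM2 = 0.05
MULT_ENERGY_PJ = 7.0
ADDER_AREA_MM2 = MULT_AREA_MM2 / 10    # "about 10 times smaller"
ADDER_ENERGY_PJ = MULT_ENERGY_PJ / 10  # "10 times lower energy"
N_ADDERS, N_MULTS = 2000, 200
CHIP_AREA_MM2 = 50.0
CLOCK_MHZ = 25.0
CLOCK_POWER_FRACTION = 1 / 3           # share of total power spent on clocks

# Area taken by the arithmetic units, in mm^2.
datapath_area_mm2 = N_ADDERS * ADDER_AREA_MM2 + N_MULTS * MULT_AREA_MM2

# Assumption: one operation per unit per cycle, so units * MHz = MOPS.
mops = (N_ADDERS + N_MULTS) * CLOCK_MHZ

# pJ per cycle times MHz gives microwatts; divide by 1000 for milliwatts.
datapath_mw = (N_ADDERS * ADDER_ENERGY_PJ + N_MULTS * MULT_ENERGY_PJ) * CLOCK_MHZ / 1000

# Clocks take 1/3 of the total, so the datapath is the remaining 2/3.
total_mw = datapath_mw / (1 - CLOCK_POWER_FRACTION)

print(f"datapath area: {datapath_area_mm2:.0f} mm^2 of {CHIP_AREA_MM2:.0f} mm^2")
print(f"throughput:    {mops / 1000:.0f} Gops")
print(f"total power:   {total_mw:.0f} mW")
print(f"efficiency:    {mops / total_mw:.0f} MOPS/mW, {mops / CHIP_AREA_MM2:.0f} MOPS/mm^2")
```

Under that assumption the unrounded results are about 55 Gops and 105 mW, or roughly 520 MOPS/mW and 1100 MOPS/mm², which is consistent with the slide's rounded figures of ~50 Gops at 100 mW.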
Why is Direct Mapping Better?
[Figure: a DSP processor with 1 multiplier (25 mm²) compared with a 16x16 multiplier (0.05 mm²)]
Low Area Efficiency / High Cost (MOPS/mm²) due to large on-chip memories
Low Energy Efficiency due to long wires and overhead of multiplexing the datapath
Results in Fully Parallel Solutions
                        Energy                                  Area
                        16-State Viterbi    64-point FFT        16-State Viterbi    64-point FFT
                        Decoder: energy     energy per          Decoder: decode     transforms per second
                        per decoded bit     transform (nJ)      rate per unit area  per unit area
                        (nJ)                                    (kb/s/mm²)          (Trans/ms/mm²)
High-Performance DSP    108                 1700                150                 10
Low-Power DSP           19.6                436                 50                  4.3
FPGA                    5.5                 683                 100                 1.8
Direct-Mapped Hardware  0.022               1.78                200,000             2,200
(numbers taken from vendor-published benchmarks)
Orders of magnitude lower efficiency even for an optimized processor architecture
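To make the "orders of magnitude" claim concrete, the sketch below computes how far each platform falls behind direct-mapped hardware in two of the columns on this slide: Viterbi energy per decoded bit and FFT transforms per unit area. It assumes the values are listed in the same row order as the platform names (High-Performance DSP, Low-Power DSP, FPGA, Direct-Mapped Hardware); the helper name `gap_vs_direct` is illustrative.

```python
import math

DIRECT = "Direct-Mapped Hardware"

# Energy per decoded bit (nJ), 16-state Viterbi decoder: lower is better.
viterbi_energy_nj = {
    "High-Performance DSP": 108.0,
    "Low-Power DSP": 19.6,
    "FPGA": 5.5,
    DIRECT: 0.022,
}

# 64-point FFT throughput per unit area (Trans/ms/mm^2): higher is better.
fft_rate_per_area = {
    "High-Performance DSP": 10.0,
    "Low-Power DSP": 4.3,
    "FPGA": 1.8,
    DIRECT: 2200.0,
}


def gap_vs_direct(values, lower_is_better):
    """Factor by which direct-mapped hardware beats each other platform."""
    best = values[DIRECT]
    return {
        name: (v / best if lower_is_better else best / v)
        for name, v in values.items()
        if name != DIRECT
    }


energy_gap = gap_vs_direct(viterbi_energy_nj, lower_is_better=True)
area_gap = gap_vs_direct(fft_rate_per_area, lower_is_better=False)

for name in energy_gap:
    print(
        f"{name}: {energy_gap[name]:.0f}x energy "
        f"({math.log10(energy_gap[name]):.1f} orders), "
        f"{area_gap[name]:.0f}x area "
        f"({math.log10(area_gap[name]):.1f} orders)"
    )
```

Under this reading of the table, every gap exceeds 100x, so direct-mapped hardware leads by more than two orders of magnitude in both columns.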
Reducing supply voltage saves energy: E = CV²
(N. Zhang)
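A minimal sketch of the E = CV² relationship follows. The capacitance and the two supply voltages (2.5 V and 1.0 V) are hypothetical values picked for illustration, not figures from the slide.

```python
def switching_energy(capacitance_f: float, vdd: float) -> float:
    """Energy E = C * V^2 for switched capacitance C (farads) at supply V (volts)."""
    return capacitance_f * vdd ** 2


def energy_saving_factor(v_old: float, v_new: float) -> float:
    """How many times less energy is used at v_new than at v_old, for the same C."""
    return (v_old / v_new) ** 2


# Hypothetical example: 1 pF of switched capacitance, supply lowered from 2.5 V to 1.0 V.
C_EXAMPLE_F = 1e-12
e_high = switching_energy(C_EXAMPLE_F, 2.5)
e_low = switching_energy(C_EXAMPLE_F, 1.0)
saving = energy_saving_factor(2.5, 1.0)

print(f"{e_high * 1e12:.2f} pJ at 2.5 V, {e_low * 1e12:.2f} pJ at 1.0 V, {saving:.2f}x saving")
```

Because the dependence is quadratic, the capacitance cancels out of the ratio: in this example the saving is 6.25x regardless of the value chosen for C.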
Enter the Era of Systems-on-Chip
[Figure: Cellular Phone Baseband SOC, with blocks labeled MCU, Gates (RTL), ROM, DSP, RAM]
(Courtesy Mike McMahon, Texas Instruments)
What is a System-on-Chip?
“a complex IC that integrates the major functional elements of a complete end-product into a single chip... incorporates at least one programmable processor, on-chip memory, and accelerating function-units....”
» Winning the SoC Revolution, Martin & Chang 2003 (paraphrasing from Dataquest)
Problem: Design Productivity Gap
“The main message in 2001 is this: Cost of design is the greatest threat to continuation of the semiconductor roadmap” – ITRS
$20M average cost for an SOC (includes only software licenses & salaries) – ITRS
[Figure: design productivity gap, 1988 to 2006. Log-scale axis from 1 to 1,000,000 in 1000's of transistors, shown per chip and per designer per year, with labels "10 designers" and "100 designers"]
Next Lectures
Survey of System-Level Design Techniques
» What tools can I use to get performance estimates faster, with less work?
Methods of Scaling
» How do I convert my 180 nm performance