Subra Ganesan
Presentation at TACOM
December 6, 2002
Professor, Computer Science and Engineering
Associate Director, Product Development and Manufacturing Center, Oakland University,
Rochester, MI 48309
Email: [email protected]
Topics Covered:
1. Introduction to DSP Processors
2. Fixed Point DSP - C24x
3. Floating Point DSP - C6711
4. Code Composer Studio
5. DSP/BIOS for C6711
6. External Memory Interface for C6711
7. Interrupt – C6711
8. Applications
DSP Microprocessor – Advances and Automotive Applications
• Advances in circuit technology, architecture, algorithms, and VLSI design techniques have contributed to high-performance Digital Signal Processing (DSP) microprocessors and to a multitude of novel applications of DSP chips.
• DSP processors are RISC-based and have fast arithmetic units, on-chip memory, analog interfaces, serial ports, timers, counters, facilities for inter-processor communication, and other special features.
The Microprocessor Overview
1949 Transistors
1958 Integrated Circuits
1961 ICs IN Quality
1964 Small Scale IC (SSI), Gates
1968 Medium Scale IC (MSI), Registers
1971 Large Scale IC (LSI), Memory, CPU
1972 8 BIT MICROPROCESSORS
1973 16 BIT MICROPROCESSORS
1982 32 BIT MICROPROCESSORS
1984 DSP MICROPROCESSORS – I GENERATION
1986 DSP MICROPROCESSORS – II GENERATION
1988 DSP MICROPROCESSORS – III GENERATION
1989 RISC MICROPROCESSORS – II QUALITY
1990 MISC MINIMUM INSTRUCTION SET MICROPROCESSOR
MICROPROCESSOR OVERVIEW

Microprocessor               Number of transistors   Performance                              Number of instructions
4 Bit Intel 4004 (1971)      2300                                                             45
68000                        70000                   0.5 MIPS                                 80 different; 14 address modes; sizes B, W, L
TMS 320C80 (32 bit RISC)                             2 Billion operations per second [BOPs]
INTRODUCTION TO DSP MICROPROCESSORS
DSP micros are reduced-instruction-set computers optimized for the fastest possible execution of the following instructions:
• Addition
• Subtraction
• Multiplication
• Shifting
Multiplication and shifting take a single cycle, using an ARRAY multiplier and a barrel (or combinational) shifter.
In contrast, general-purpose micros effect such operations via multi-cycle, microcoded instructions that make use of the ALU's single-cycle, parallel-add, single-bit-shift capability.
DSP micros do each multiply/accumulate in a single cycle (e.g., 100 ns).
• For the 80386: ADD (16 bit addition) = 125 ns (16 MHz)
  IMUL (16 bit * 16 bit multiplication) = 1250 ns
DSP micros employ
• Pipelining of instructions
• Addressing modes that efficiently access the relevant data structures (e.g., auto-increment and auto-decrement modes for arrays, and indexed addressing modes for FFTs)
Dual-Bus HARVARD ARCHITECTURE, which enables
• Simultaneous fetching of data and instructions
• Special DSP-related addressing modes (e.g., index computation modulo an arbitrary number, automatic circular queue or free data move for FIR filters, bit reversal for FFTs)
• Extra addressing, multiple ALUs
• Special interfaces to serve specific fields of application (e.g., serial interfaces for CODECs in telecommunications)
Progress in new technologies, such as gallium arsenide (GaAs) transistors and high-electron-mobility transistors (HEMT), is expected to increase the performance of future DSP microprocessors.
The 80386 computes a 1024-point FFT only 66% slower than a 20 MHz TMS 32010.
New versions of general-purpose micros with DSP-like dual-bus structures (e.g., the Motorola 68030), array multipliers, barrel shifters, and GaAs/HEMT technology can achieve a performance of 100 MIPS and upwards.
• TMS 32010 = 5 MIPS
• TMS 320C25 = 10 MIPS
• Motorola 56000 = 10.25 MIPS (24 bit data)
• TMS 320C6201 = 1600 MIPS
FLOATING-POINT DIGITAL SIGNAL PROCESSING CHIPS
Floating-point DSP chips can perform floating-point arithmetic, including multiply-accumulate operations, with an increased degree of parallelism.
The design phase is often performed with the aid of a high-level language or a commercial, DSP-oriented "design system" that yields a non-real-time, floating-point simulation on a general-purpose computer.
The new generation of floating-point digital signal processors includes the AT&T DSP32C, Motorola DSP96002, and Texas Instruments TMS320C30.
A typical development system could involve
• An iconic graphical interface (implemented in PC software)
• A computer
• A PC plug-in board containing a floating-point DSP micro chip
• A memory system
The NeXT PC is the first to incorporate a DSP micro. The on-board Motorola fixed-point DSP56001 is complemented by numerous "canned" procedures.
These procedures enable graphics and signal processing tasks to be carried out at rates orders of magnitude faster than is possible with the on-board MC68882 floating-point co-processor.
The cycle of improvement in functionality and performance for both general-purpose and DSP micros continues.
Architectures incorporating such structures as systolic arrays and neural networks, will replace those now considered conventional.
DSP APPLICATIONS CHARACTERISTICS
1. Algorithms are mathematically intensive
e.g., for an FIR filter
$$ y(n) = \sum_{i=0}^{N-1} a(i)\, x(n-i) $$
where y(n) = output samples, a(i) = coefficients, x(n-i) = input samples
2. Real-time performance
e.g., speech recognition, image processing within a frame update period
3. Sampled input signal
The DSP processor must effectively handle sampled data in large quantities.
DSP processors must be flexible to accommodate changing algorithms, new DSP processors, etc.
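The DSP Environment: Definitions
[Figure: Analog signal -> Lowpass filter (LPF1) -> A/D converter -> DSP processor -> D/A converter -> Lowpass filter (LPF2) -> Analog signal]
A simple digital filter system
[Figure: the A/D output, sampled at fs, feeds a chain of sample registers (R) holding X(n), X(n-1), X(n-2), ..., X(n-N+1); each sample is multiplied by its coefficient a(0), a(1), a(2), ... and the products are summed to give Y(n), which the D/A converts to Y(t)]
Where
fs = sampling frequency
a(0), a(i) = coefficients
y(n) = digital output
y(t) = analog output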
As long as the system samples the analog input at a frequency fs that is at least twice the information bandwidth of that input, all information present in the original analog signal is contained in the digital signal.
A/D conversion introduces quantization noise. The signal-to-quantization-noise ratio (SQNR) is a function of the A/D's accuracy.
• The DSP stores the current A/D sample and the N-1 previous samples in a sample shift register, or in RAM, which can simulate the shift-register function by modifying memory address pointers.
• The coefficients a(i) are stored in ROM or RAM; they determine the impulse response and filter characteristics.
• A large N gives a longer impulse response and generally produces filters with sharper roll-off, greater stop-band attenuation, and less frequency ripple.
• This filter is called an Nth-order finite impulse response (FIR) digital filter (no feedback path).
• The FIR filter requires N multiplies and N-1 additions to compute an output y(n) each time the input signal is sampled.
• Some DSP applications involve sampling rates of up to 100 MHz and 100 MIPS.
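The FIR computation described in the bullets above can be sketched in C as follows. This is a minimal illustration, not code from the presentation: the names `fir_filter` and `fir_step` are hypothetical, and the sample shift register is modeled as a plain array that is shifted on every sample.

```c
#include <stddef.h>

/* Direct-form FIR: y(n) = a(0)x(n) + a(1)x(n-1) + ... + a(N-1)x(n-N+1).
 * history[0] holds x(n), history[1] holds x(n-1), and so on.
 * All names here are illustrative assumptions. */
typedef struct {
    const float *coeffs;   /* a(0) .. a(N-1) */
    float *history;        /* the N most recent input samples */
    size_t taps;           /* N */
} fir_filter;

float fir_step(fir_filter *f, float sample)
{
    if (f->taps == 0) {
        return 0.0f;
    }

    /* Sample shift register: move every stored sample one place down. */
    for (size_t i = f->taps - 1; i > 0; i--) {
        f->history[i] = f->history[i - 1];
    }
    f->history[0] = sample;

    /* One multiply per tap, accumulated into a running sum. */
    float acc = 0.0f;
    for (size_t i = 0; i < f->taps; i++) {
        acc += f->coeffs[i] * f->history[i];
    }
    return acc;
}
```

Feeding a unit impulse through `fir_step` returns the coefficients one by one, which is the impulse response the bullets refer to. A DSP would typically avoid the explicit shift by moving an address pointer instead.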
SHANNON'S SAMPLING THEOREM
An analog signal containing maximum frequency f1 Hz may be completely represented by regularly spaced samples, provided the sampling rate is at least 2f1 samples per second.
fs = 2f1 is the Nyquist sampling rate. If the signal is sampled at less than the 2f1 rate, aliasing error occurs. The signal is then represented with distortion, which depends on the degree of aliasing.
• Use an anti-aliasing filter: a low-pass filter with cut-off frequency at f1 (or fs/2).
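To make the aliasing effect concrete, the sketch below computes the frequency at which a pure tone appears after sampling. It is an illustration under stated assumptions: the function name `alias_frequency` is hypothetical, frequencies are whole numbers of Hz, and the input is a single tone.

```c
/* Apparent (aliased) frequency, in Hz, of a tone at f_hz sampled at fs_hz.
 * Assumes fs_hz > 0; returns 0 for fs_hz == 0 to avoid dividing by zero. */
unsigned long alias_frequency(unsigned long f_hz, unsigned long fs_hz)
{
    if (fs_hz == 0) {
        return 0;
    }
    /* The sampled spectrum repeats every fs, so reduce modulo fs first. */
    unsigned long folded = f_hz % fs_hz;
    /* Anything above fs/2 folds back below fs/2. */
    if (folded > fs_hz / 2) {
        folded = fs_hz - folded;
    }
    return folded;
}
```

For example, with fs = 8000 Hz a 3000 Hz tone is preserved, while a 7000 Hz tone shows up at 1000 Hz, which is why the anti-aliasing filter has to remove content above fs/2 before the A/D.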
Quantization Noise (Qe)
[Figure: analog input a(t) -> A/D -> n-bit output]
$$ Q_e = \pm \frac{V_{ref}}{2 \cdot 2^{n}} $$
e.g., V ref = 5 V, n = 8, then Qe = ± 5/512 V (about ± 9.8 mV)
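The quantization-error formula can be evaluated for other converters with a one-line helper. This is a sketch only; the name `quantization_error` is hypothetical, and it returns the magnitude (the error itself is plus or minus this value).

```c
/* Magnitude of the quantization error Qe = Vref / (2 * 2^n) for an n-bit A/D.
 * Assumes bits is smaller than the width of unsigned long. */
double quantization_error(double v_ref, unsigned bits)
{
    double levels = (double)(1UL << bits);   /* 2^n quantization levels */
    return v_ref / (2.0 * levels);           /* half of one step */
}
```

With v_ref = 5.0 and bits = 8 this gives 5/512 V, matching the worked example above.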
[Fig. Aliasing in the frequency domain: (a) input spectrum |G(f)|, (b) sampled spectrum, (c) reconstructed spectrum; frequency axis f marked at fs/2 and fSAMP]
[Fig. Aliasing in the time domain: (a) input continuous-time signal g(t), (b) sampled signal gr(t), (c) reconstructed signal]
LINEAR SYSTEM
A linear system obeys the principle of superposition. If an input consisting of a number of signals is applied to a linear system, then the output is the sum, or superposition, of the system's responses to each signal considered separately.
FREQUENCY PRESERVATION PROPERTY
If we apply a complicated signal containing many frequencies, the output must be the sum of the outputs due to each input frequency, considered separately. The output contains only those frequencies present in the input.
TIME INVARIANT SYSTEM
It is one whose properties do not vary with time.
• LTI: Linear Time Invariant
  The LTI associative property means that we may analyze a complicated LTI system by breaking it down into a number of simpler subsystems.
• Commutative Property
  It means that the subsystems can be arranged in series, or cascaded, in any order without affecting the overall performance.
• Causal System
  In this system the output depends only on the present and/or previous values of the input.
• Stable System
  It is one that produces a finite, or bounded, output in response to a bounded input.
• Invertibility
  If a system with input x[n] gives an output y[n], then its inverse would produce x[n] if fed with y[n].
BIT REVERSED ADDRESSING
It is a special type of indirect addressing, used for implementing the FFT.
        *ARn++(IR0)B
After the operand is fetched, ARn is updated to (ARn + IR0) in a reverse-carry-propagation format.
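On a processor without this addressing mode, the same access order has to be computed in software. The sketch below shows the index permutation that bit-reversed addressing produces; the function name `bit_reverse` is hypothetical, and it models only the resulting order, not the reverse-carry adder itself.

```c
/* Reverse the lowest `bits` bits of `index`.
 * For an 8-point FFT (bits = 3) the indices 0..7 map to 0,4,2,6,1,5,3,7. */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u);  /* append lowest bit */
        index >>= 1;                                /* move to next bit */
    }
    return reversed;
}
```

The hardware mode gives this ordering at no cycle cost, whereas the software loop costs several instructions per element.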
CIRCULAR ADDRESSING
A circular buffer is necessary to implement the delays associated with convolution and correlation equations. The block size is held in register BK.
        *AR1++    ; AR1 is incremented each time until it points to the bottom of the circular buffer; after that it points to the top of the buffer again.
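In C, the wrap-around that the circular addressing hardware performs can be imitated with a modulo operation. This is a minimal sketch with hypothetical names (`circ_buf`, `circ_put`, `circ_delay`); the `size` field plays the role of the block-size register.

```c
#include <stddef.h>

/* Software model of a circular delay line. Names are illustrative. */
typedef struct {
    float *data;   /* storage for `size` samples */
    size_t size;   /* block size (must be > 0) */
    size_t pos;    /* next write position */
} circ_buf;

/* Store a sample and advance the pointer, wrapping at the end of the block. */
void circ_put(circ_buf *b, float sample)
{
    b->data[b->pos] = sample;
    b->pos = (b->pos + 1) % b->size;
}

/* Read the sample written k puts ago (k = 0 is the most recent).
 * Requires k < size. */
float circ_delay(const circ_buf *b, size_t k)
{
    return b->data[(b->pos + b->size - 1 - k) % b->size];
}
```

No samples are ever moved; only the pointer changes, which is the property that makes circular buffers attractive for FIR delay lines.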
REPEAT INSTRUCTION
A block of instructions is repeated 'count' number of times using RPTB. RC contains the count number.
        LDI   8, RC
        RPTB  Label1
        CALL  filter
        FIX   R0
Label1  STI   R0, *AR3
The RPTS instruction repeats the next single instruction 'count' number of times.
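PARALLEL INSTRUCTION
The symbol '||' indicates parallel operation.
        LDF   0, R0
        LDI   29, AR2
        RPTS  AR2
        MPYF  *AR0++, *AR1++, R0
||      ADDF  R0, R2, R2
MPYF ---> multiply floating-point numbers
[Figure annotations: "New value" marks R0 as written by MPYF, "Old value" marks R0 as read by ADDF, "Parallel operation" marks the || pair]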
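The old-value/new-value behavior of the parallel pair can be modeled in C. This is a sketch of the semantics only, under the assumption that the parallel add reads the product from the previous iteration; the function name `dot_pipelined` and the explicit final add are illustrative and not taken from the slide.

```c
/* Model of MPYF || ADDF: in each pass the add consumes the OLD product
 * while the multiply produces the NEW one. */
float dot_pipelined(const float *a, const float *b, int n)
{
    float r0 = 0.0f;   /* product register, cleared like LDF 0, R0 */
    float r2 = 0.0f;   /* accumulator (assumed to start at zero) */

    for (int i = 0; i < n; i++) {
        float old_r0 = r0;    /* value the parallel ADDF sees */
        r0 = a[i] * b[i];     /* MPYF writes the new product */
        r2 += old_r0;         /* ADDF adds the previous product */
    }
    r2 += r0;                 /* one extra add flushes the last product */
    return r2;
}
```

Because the add always lags the multiply by one pass, the multiplier and adder are both busy on every cycle of the repeated instruction.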
DELAYED BRANCH
A conditional or unconditional delayed branch allows the subsequent 3 instructions to be fetched and executed. This gives the effect of a single-cycle branch.
        BD    Loop          ; Delayed Branch
        ADDF  R0, R1        ; }
        FIX   R1            ; } executed whether
        STI   R1, *AR3      ; } branch is taken or not
Loop
Standard branches empty the pipeline before branching, so a standard branch takes 4 cycles to execute.
DSP CHIPS
• Analog Devices ADSP 2100, 21020
• AT&T DSP 16, 32
• DSP semiconductors Pine 16 bit fixed point
• Motorola 56100, 96000
• NEC uPD 77C25 (16 bit fixed pt)
• 77220 (24 bit fixed pt)
• SGS Thomson ST 18 (16 bit fixed point)
• Star Semiconductor SPROC 1000 24 bit fixed point
• Texas Instruments TMS3201x, 2x, 3x, 4x, 80, 6xx
• Zilog Z89 Cxx 16 bit fixed DSP
• Xilinx DSP FPGA
MARKET SHARE
• TI 46.7%
• AT&T 18.7%
• MOTOROLA 15%
• AD 9.3%
• NEC 8.4%
• OTHER 1.9%
DSP Vs Microcontroller

Microcontroller                           Digital Signal Processor
• Multicycle instruction set.             Single cycle inst. set.
• Multicycle multiply.                    Single cycle multiply.
• 8 or 16 bit support.                    16/32 bit fixed or floating.
• Limited on-chip RAM.                    Large on-chip data RAM.
• Limited data pointers.                  Data pointers.
• Limited BW and limited algorithms.      Speed!
Present Day Applications
Mixed-signal processor. DSP: Technology Enabler
• Consumer Audio: Stereo A/D, D/A; PLL; Mixers
• Multimedia: Stereo audio; Imaging; Graphics palette; Voltage regulation
• Wireless / Cellular: Voice-band audio; RF codecs; Voltage regulation
• HDD: PRML read channel; MR pre-amp; Servo control; SCSI transceivers
• Automotive: Digital radio A/D/A; Active suspension; Voltage regulation
• DTAD: Speech synthesizer
System Considerations
Performance
Interfacing
Power
Size
Ease-of-Use
• Programming
• Interfacing
• Debugging
Integration
• Memory
• Peripherals
Cost
• Device cost
• System cost
• Development cost
• Time to market
Different Needs? Multiple Families!
C2000 (C20x/24x/28x); 'C1x 'C2x
Lowest Cost
Control Systems
• Motor Control
• Storage
• Digital Ctrl Systems
C5000 (C54x/55x); 'C5x
Efficiency: Best MIPS per Watt / Dollar / Size
• Wireless phones
• Internet audio players
• Digital still cameras
• Modems
• Telephony
• VoIP
C6000 (C62x/64x/67x); 'C3x 'C4x 'C8x
Max Performance with Best Ease-of-Use
Multi Channel and Multi Function App's
• Comm Infrastructure
• Wireless Base-stations
• DSL
• Imaging
• Multi-media Servers
• Video
'C6000 Block Diagram
[Figure: CPU connected through internal buses to internal memory, external memory, and peripherals]
'C6000 System Block Diagram
[Figure: CPU with functional units .D1, .M1, .L1, .S1 on registers A0 - A15 and .D2, .M2, .L2, .S2 on registers B0 - B15, connected through internal buses to internal memory, external memory, and peripherals]
What Problem Are We Trying To Solve?
Digital sampling of an analog signal:
[Figure: analog waveform, amplitude A versus time t; signal path x -> ADC -> DSP -> DAC -> Y]
Most DSP algorithms can be expressed with MAC:
$$ Y = \sum_{i=1}^{\text{count}} a_i \, x_i $$
for (i = 0; i < count; i++){
    sum += m[i] * n[i]; }
What does it take to do this fast … and easy?
Fastest Execution of MACs
• The 'C6x roadmap ... from 200 to 2400 MMACs
Ease of C Programming
• Even using natural C, the 'C6000 architecture can perform 2 to 4 MACs per cycle
• Compiler generates 80-100% efficient code
Multiply-Accumulate (MAC) in Natural C Code
for (i = 0; i < count; i++){
    sum += m[i] * n[i]; }
Fast MAC using only C
How does the 'C6000 achieve such performance from C?
Sample Compiler Benchmarks
• Great out-of-box experience
• Completely natural C code (non 'C6x specific)
• Code available at: www.ti.com/sc/c6000compiler
• Versus hand-coded assembly based on cycle count
'C6000 Architecture: Built for Speed
[Figure: Memory and Controller/Decoder feeding functional units .M1, .L1, .D1, .S1 (registers A0 .. A15 .. A31) and .M2, .L2, .D2, .S2 (registers B0 .. B15 .. B31)]
• 'C6000 Compiler excels at Natural C
• While dual-MAC speeds math intensive algorithms, flexibility of 8 independent functional units allows the compiler to quickly perform other types of processing
• All 'C6000 instructions are conditional, allowing efficient hardware pipelining
• Instruction set and CPU hardware orthogonality allow the compiler to achieve 80-100% efficiency
Fastest MAC using Natural C
float mac(float *m, float *n, int count)
{ int i; float sum = 0;
  for (i=0; i < count; i++) {
    sum += m[i] * n[i]; } …

;** --------------------------------------------------**
LOOP:                           ; PIPED LOOP KERNEL
           LDDW    .D1     *A4++,A7:A6
||         LDDW    .D2     *B4++,B7:B6
||         MPYSP   .M1X    A6,B6,A5
||         MPYSP   .M2X    A7,B7,B5
||         ADDSP   .L1     A5,A8,A8
||         ADDSP   .L2     B5,B8,B8
|| [A1]    B       .S2     LOOP
|| [A1]    SUB     .S1     A1,1,A1
;** --------------------------------------------------**
[Figure: Memory, Controller/Decoder, and the eight functional units with register files A and B, repeated]
'C6000 System Block Diagram
[Figure: 'C6000 system block diagram, repeated]
Looking at the internal buses ...
'C6000 Internal Buses
[Figure: buses linking the CPU (PC, A regs, B regs) and the DMA to internal memory, external memory, and peripherals]
Bus                        Width
Program Addr (PC)          x32
Program Data               x256
Data Addr - T1             x32
Data Data - T1             x32/64
Data Addr - T2             x32
Data Data - T2             x32/64
DMA Addr - Read
DMA Data - Read
DMA Addr - Write
DMA Data - Write
'C6000 System Block Diagram
[Figure: 'C6000 system block diagram, repeated]
Next, the internal memory ...
'C6711 Memory
Address        Contents
0000_0000      64KB Internal (Prog / Data, Level 2)
0180_0000      On-chip Peripherals
8000_0000      128MB External 0
9000_0000      128MB External 1
A000_0000      128MB External 2
B000_0000      128MB External 3
FFFF_FFFF      (top of address space)
[Figure: CPU with a 4K Program Cache and a 4K Data Cache in front of the 64K Prog / Data (Level 2) memory; callouts: cache logic, cache details]
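As an illustration of how software might reason about this memory map, the sketch below classifies a 32-bit address into the regions listed on the slide. The names are hypothetical, and two points are assumptions rather than facts from the slide: the peripheral region is taken to run from 0180_0000 up to the first external space, and each external space is taken to occupy the first 128 MB of its 256 MB slot.

```c
#include <stdint.h>

/* Regions of the 'C6711 map as listed on the slide. Names are illustrative. */
typedef enum {
    REGION_INTERNAL,            /* 64 KB starting at 0000_0000 */
    REGION_PERIPH_OR_RESERVED,  /* 0180_0000 .. 7FFF_FFFF (extent assumed) */
    REGION_EXTERNAL0,           /* 128 MB at 8000_0000 */
    REGION_EXTERNAL1,           /* 128 MB at 9000_0000 */
    REGION_EXTERNAL2,           /* 128 MB at A000_0000 */
    REGION_EXTERNAL3,           /* 128 MB at B000_0000 */
    REGION_UNMAPPED
} c6711_region;

c6711_region c6711_classify(uint32_t addr)
{
    if (addr < 0x00010000u) return REGION_INTERNAL;
    if (addr < 0x01800000u) return REGION_UNMAPPED;
    if (addr < 0x80000000u) return REGION_PERIPH_OR_RESERVED;
    if (addr < 0xC0000000u) {
        /* Offset inside the 256 MB slot; only the first 128 MB is used. */
        if ((addr & 0x0FFFFFFFu) >= 0x08000000u) return REGION_UNMAPPED;
        /* Top nibble 8, 9, A, B selects external space 0, 1, 2, 3. */
        return (c6711_region)(REGION_EXTERNAL0 + (int)((addr >> 28) - 8u));
    }
    return REGION_UNMAPPED;
}
```

A check like this is mainly useful in host-side tools or simulators; on the device itself the decoding is done in hardware.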
'C6711 Cache Logic
[Flowchart]
1. CPU requests data.
2. Is data in L1? Yes: send data to CPU. No: go to step 3.
3. Is data in L2? Yes: copy data from L2 to L1, then send data to CPU. No: copy data from External Mem to L2, copy data from L2 to L1, then send data to CPU.
'C6711 Cache Details
Level 1 Program (4KB)
• Always cache
• 1 way cache (direct mapped)
• Zero wait-state
• Line size: 512 bits (or 16 instr)
Level 1 Data (4KB)
• Always cache
• 2 way cache
• Zero wait-state
• Line size: 256 bits
Level 2 Unified (64KB)
• Unified (prog or data)
• RAM or cache
• 1-4 way cache
• 32 data bytes in 4 cycles
• 16 instr. in 5 cycles
• Line size: 1024 bits (or 128 bytes)
[Figure: CPU, L1 Prog (4KB), L1 Data (4KB), and L2 Unified (64KB); bus widths shown: 256, 8/16/32/64, 128, 256]
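The sizes, associativities, and line sizes above determine where an address lands in each cache. The sketch below works that out with generic set-associative arithmetic; the type and function names are hypothetical, and the mapping is the textbook one rather than anything confirmed by the slide about the actual hardware.

```c
#include <stdint.h>

/* Geometry of one cache level. Names are illustrative. */
typedef struct {
    uint32_t size_bytes;   /* total capacity */
    uint32_t line_bytes;   /* line size in bytes */
    uint32_t ways;         /* 1 = direct mapped */
} cache_geometry;

/* Number of sets = capacity / (line size * ways). */
uint32_t cache_sets(const cache_geometry *g)
{
    return g->size_bytes / (g->line_bytes * g->ways);
}

/* Set that an address maps to. */
uint32_t cache_set_index(const cache_geometry *g, uint32_t addr)
{
    return (addr / g->line_bytes) % cache_sets(g);
}

/* Tag that distinguishes addresses sharing the same set. */
uint32_t cache_tag(const cache_geometry *g, uint32_t addr)
{
    return addr / (g->line_bytes * cache_sets(g));
}
```

With the slide's figures, L1 Program is `{4096, 64, 1}` (512-bit lines, 64 sets) and L1 Data is `{4096, 32, 2}` (256-bit lines, also 64 sets). In the direct-mapped program cache, two addresses exactly 4 KB apart share a set and evict each other, which is worth keeping in mind when placing hot code.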
'C6000 System Block Diagram
[Figure: 'C6000 system block diagram, repeated]
Looking at each peripheral ...
'C6000 Peripherals
[Figure: system block diagram with the peripherals named: EMIF; XB, PCI, Host Port; GPIO; McBSP's; Utopia; VCP; TCP; DMA, EDMA (Boot); Timers; PLL]
EMIF
[Figure: EMIF connecting the CPU and internal memory to external Async, SBSRAM, and SDRAM memory]
External Memory Interface (EMIF)
• Glueless access to async/sync memory
• Works with PC100 SDRAM (cheap, fast, and easy!)
• Byte-wide data access
• 16, 32, or 64-bit bus widths
Internal and External Memory

Devices                        Internal                                   EMIF (A) size of range       EMIFB size of range
C6201, C6204, C6205, C6701     P = 64 KB, D = 64 KB                       52M Bytes (32-bits wide)     N/A
C6202                          P = 256 KB, D = 128 KB                     52M Bytes (32-bits wide)     N/A
C6203                          P = 384 KB, D = 512 KB                     52M Bytes (32-bits wide)     N/A
C6211, C6711                   L1P = 4 KB, L1D = 4 KB, L2 = 64 KB         128M Bytes (32-bits wide)    N/A
C6712                          L1P = 4 KB, L1D = 4 KB, L2 = 64 KB         64M Bytes (16-bits wide)     N/A
C6414, C6415, C6416            L1P = 16 KB, L1D = 16 KB, L2 = 1 MB        256M Bytes (64-bits wide)    64M Bytes (16-bits wide)
HPI / XBUS / PCI
[Figure: system block diagram, repeated, showing EMIF and the XBUS, PCI, Host Port block]
Parallel Peripheral Interface
• HPI: Dedicated, slave-only, async 16/32-bit bus allows host-µP access to C6000 memory
• XBUS: Similar to HPI but provides …
  - Master/slave and sync modes
  - Glueless i/f to FIFOs (up to single-cycle xfer rate)
• PCI: Standard 32-bit, 33MHz PCI interface
• These interfaces provide means to bootstrap the C6000
GPIO
[Figure: system block diagram, repeated, with the GPIO block added]
General Purpose Input/Output (GPIO)
• 'C64x provides 8 or 16 bits of general purpose bitwise I/O
• Use to observe or control the signal of a single pin
McBSP and Utopia
[Figure: system block diagram, repeated, with the McBSP's and Utopia blocks added]
Multi-Channel Buffered Serial Port (McBSP)
• 2 (or 3) full-duplex, synchronous serial ports
• Up to 100 Mb/sec performance
• Supports multi-channel operation (T1, E1, MVIP, …)
Utopia (C64x)
• ATM connection
• 50 MHz wide area network connectivity
DMA / EDMA
[Figure: system block diagram, repeated, with the DMA, EDMA (Boot) block added]
Direct Memory Access (DMA / EDMA)
• Transfers any set of memory locations to another
• 4 / 16 / 64 channels (transfer parameter sets)
• Transfers can be triggered by any interrupt (sync)
• Operates independent of CPU
• On reset, provides bootstrap from memory
Timer / Counter
[Figure: system block diagram, repeated, with the Timers block added]
Timer / Counter
• Two (or three) 32-bit timer/counters
• Can generate interrupts
• Both input and output pins
VCP / TCP - 3G Wireless
[Figure: system block diagram, repeated, with the VCP and TCP blocks added]
Turbo Coprocessor (TCP)
• Supports 35 data channels at 384 kbps
• 3GPP / IS2000 Turbo coder
• Programmable parameters include mode, rate and frame length
Viterbi Coprocessor (VCP)
• Supports >500 voice channels at 8 kbps
• Programmable decoder parameters include constraint length, code rate, and frame length
PLL
[Figure: system block diagram, repeated, with the PLL block added]
PLL
• External clock multiplier
• Reduces EMI and cost
• Pin selectable
Input
• CLKIN
Output
• CLKOUT1: output rate of PLL; instruction (MIP) rate
• CLKOUT2: 1/2 rate of CLKOUT1
'C6000 Peripherals
[Figure: system block diagram, repeated, with all peripherals shown: EMIF; XB, PCI, Host Port; GPIO; McBSP's; Utopia; VCP; TCP; DMA, EDMA (Boot); Timers; PLL]
C6000 Roadmap
[Figure: performance versus time; the families are software compatible]
• 1st Generation, C62x™: C6201, C6202, C6203, C6204, C6205, C6211
• 1st Generation, C67x™ (floating point): C6701, C6711, C6712
• 2nd Generation, C64x™ DSP, 1.1 GHz: C6414, C6415, C6416 (General Purpose, Media Gateway, 3G Wireless Infrastructure)
• Highest Performance: Multi-core C64x™ DSP
'C6000 Floating-Point
[Figure: performance versus time]
• C30, C31, C32, C33: labeled 150 MFLOPS
• C6701: 1 GFLOPS
• C6711: 900 MFLOPS
• C6712: 600 MFLOPS
• C67x: 3 GFLOPS and beyond
TI Floating-Point Innovation
TI Floating Point - A History of Firsts:
• First commercially-successful floating-point DSP: 'C30 (1987)
• First floating-point DSP with multiprocessing support: 'C40 (1991)
• First $10 floating-point DSP: 'C32 (1995)
• First 1-GFLOPS DSP: 'C6701 (1998)
• First $5 floating-point DSP: 'C33 (1999)
• First 2-level cache floating-point DSP: 'C6711 (1999)
• First to offer 600 MFLOPS for under $10: 'C6712 (2000)
What Problem Are We Trying To Solve?
Digital sampling of an analog signal:
[Figure: analog waveform, amplitude A versus time t; signal path x -> ADC -> DSP -> DAC -> Y]
Most DSP algorithms can be expressed as:
$$ Y = \sum_{i=1}^{40} a_i \, x_i $$
What are the two primary instructions?
The Core of DSP : Sum of Products
$$ y = \sum_{n=1}^{40} a_n \, x_n $$
The 'C6000: designed to handle DSP's math-intensive calculations
[Figure: multiplier (.M unit) and ALU (.L unit)]
        MPY   .M   a, x, prod
        ADD   .L   y, prod, y
Note: You don't have to specify functional units (.M or .L)
Where are the variables?
Working Variables : The Register File
$$ y = \sum_{n=1}^{40} a_n \, x_n $$
[Figure: Register File A, 16 registers of 32 bits each, holding a, x, prod, y, ...; connected to the .M and .L units]
        MPY   .M   a, x, prod
        ADD   .L   y, prod, y
How is the number of iterations specified?
Loops: Coding on a RISC Processor
1. Program flow: the branch instruction
       B    loop
2. Initialization: setting the loop count
       MVK  40, cnt
3. Decrement: subtract 1 from the loop counter
       SUB  cnt, 1, cnt
The "S" Unit: For Standard Operations
$$y = \sum_{n=1}^{40} a_n \, x_n$$
          MVK  .S   40, cnt
    loop:
          MPY  .M   a, x, prod
          ADD  .L   y, prod, y
          SUB  .L   cnt, 1, cnt
          B    .S   loop
[Figure: Register File A (16 registers, 32 bits each) holding a, x, prod, y, ..., cnt, connected to the .M, .L and .S units]
How is the loop terminated?
Conditional Instruction Execution
To minimize branching, all instructions are conditional:
    [condition]  B  loop
Execution is based on the [zero/non-zero] value of the specified variable.
    Code Syntax    Execute if:
    [ cnt ]        cnt ≠ 0
    [ !cnt ]       cnt = 0
Note: if the condition is false, execution is replaced with a nop.
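The table above can be modelled in C as an operation that either takes effect or leaves its destination untouched. This is a minimal sketch of the idea only; the helper names `cond_add` and `cond_not_add` are hypothetical, and an ADD is used simply as an example of a conditional instruction.

```c
#include <stdint.h>

/* Hypothetical model of "[cond] ADD src1, src2, dst".
 * Returns the new value of dst. */
int32_t cond_add(int32_t cond, int32_t src1, int32_t src2, int32_t dst)
{
    if (cond != 0) {
        dst = src1 + src2;   /* [ cond ]: executes when cond is non-zero */
    }
    return dst;              /* otherwise dst is unchanged, like a nop */
}

/* Hypothetical model of "[!cond] ADD src1, src2, dst":
 * executes only when cond is zero. */
int32_t cond_not_add(int32_t cond, int32_t src1, int32_t src2, int32_t dst)
{
    return cond_add(cond == 0, src1, src2, dst);
}
```

The same test, applied to a branch instead of an add, is what ends a counted loop: `[cnt] B loop` branches while `cnt` is non-zero and falls through once it reaches zero.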
Loop Control via Conditional Branch
$$y = \sum_{n=1}^{40} a_n \, x_n$$
          MVK  .S   40, cnt
    loop:
          MPY  .M   a, x, prod
          ADD  .L   y, prod, y
          SUB  .L   cnt, 1, cnt
    [cnt] B    .S   loop
[Figure: Register File A (32 bits wide) holding a, x, prod, y, ..., cnt, connected to the .M, .L and .S units]
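The loop skeleton can be mirrored line for line in C, with a `goto` standing in for the conditional branch. This is a sketch under assumptions: the function name `repeat_mac` is hypothetical, the count is passed as a parameter instead of the constant 40, `y` is explicitly zeroed, and a guard for counts below 1 is added because the assembly form always runs the body at least once.

```c
#include <stdint.h>

/* Hypothetical C rendering of the counted loop; a and x are fixed values
 * here because no memory loads have been introduced yet. */
int32_t repeat_mac(int16_t a, int16_t x, int32_t iterations)
{
    int32_t y = 0;                        /* assumed initial value of y */
    int32_t cnt = iterations;             /* MVK  40, cnt (count passed in) */
    if (cnt < 1) {
        return y;                         /* guard added for this sketch only */
    }
loop:
    {
        int32_t prod = (int32_t)a * x;    /* MPY  a, x, prod */
        y = y + prod;                     /* ADD  y, prod, y */
        cnt = cnt - 1;                    /* SUB  cnt, 1, cnt */
    }
    if (cnt != 0) goto loop;              /* [cnt] B loop */
    return y;
}
```

With `iterations` set to 40 the body runs exactly 40 times: the branch is taken 39 times and falls through when `cnt` reaches zero.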
How are the a and x array values brought in from memory?
Memory Access via ".D" Unit
$$y = \sum_{n=1}^{40} a_n \, x_n$$
          MVK  .S   40, cnt
    loop:
          LDH  .D   *ap, a
          LDH  .D   *xp, x
          MPY  .M   a, x, prod
          ADD  .L   y, prod, y
          SUB  .L   cnt, 1, cnt
    [cnt] B    .S   loop
[Figure: Register File A (16 registers) holding a, x, prod, y, cnt and the pointers *ap, *xp, *yp, connected to the .M, .L, .S and .D units; the .D unit connects to Data Memory: x(40), a(40), y]
How do we increment through the arrays?
Auto-Increment of Pointers
$$y = \sum_{n=1}^{40} a_n \, x_n$$
          MVK  .S   40, cnt
    loop:
          LDH  .D   *ap++, a
          LDH  .D   *xp++, x
          MPY  .M   a, x, prod
          ADD  .L   y, prod, y
          SUB  .L   cnt, 1, cnt
    [cnt] B    .S   loop
[Figure: Register File A (16 registers) holding a, x, prod, y, cnt and the pointers *ap, *xp, *yp, connected to the .M, .L, .S and .D units; the .D unit connects to Data Memory: x(40), a(40), y]
How do we store results back to memory?
Storing Results Back to Memory
$$y = \sum_{n=1}^{40} a_n \, x_n$$
          MVK  .S   40, cnt
    loop:
          LDH  .D   *ap++, a
          LDH  .D   *xp++, x
          MPY  .M   a, x, prod
          ADD  .L   y, prod, y
          SUB  .L   cnt, 1, cnt
    [cnt] B    .S   loop
          STW  .D   y, *yp
[Figure: Register File A holding a, x, prod, y, cnt and the pointers *ap, *xp, *yp, connected to the .M, .L, .S and .D units; the .D unit connects to Data Memory: x(40), a(40), y]
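Put together, the listing corresponds roughly to the C routine below. Treat it as a sketch, not as the deck's own code: the function name `sum_of_products_mem` is hypothetical, the count is a parameter instead of the constant 40 and must be at least 1, and `y` is explicitly zeroed, which the listing does not show. The halfword loads (LDH) are modelled as signed 16-bit reads and the word store (STW) as a 32-bit write.

```c
#include <stdint.h>

/* Hypothetical C counterpart of the final loop; count must be >= 1. */
void sum_of_products_mem(const int16_t *ap, const int16_t *xp,
                         int32_t *yp, int32_t count)
{
    int32_t y = 0;                        /* assumed initial value of y */
    int32_t cnt = count;                  /* MVK  40, cnt (count passed in) */
    do {
        int16_t a = *ap++;                /* LDH  *ap++, a */
        int16_t x = *xp++;                /* LDH  *xp++, x */
        int32_t prod = (int32_t)a * x;    /* MPY  a, x, prod */
        y += prod;                        /* ADD  y, prod, y */
        cnt -= 1;                         /* SUB  cnt, 1, cnt */
    } while (cnt != 0);                   /* [cnt] B loop */
    *yp = y;                              /* STW  y, *yp */
}
```

The post-increment on each pointer does the job of `*ap++` and `*xp++` in the listing: every pass reads the next array element without a separate address calculation.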
But wait, that's only half the story...
Dual Resources: Twice as Nice
[Figure: two data paths. Register File A (A0 to A15, 32 bits each) holds a_n, x_n, prd, sum, cnt, ... and the pointers *a, *x, *y, and is served by units .M1, .L1, .S1, .D1. Register File B (B0 to B15, 32 bits each) is served by units .M2, .L2, .S2, .D2.]
Our final view of the sum of products example...
'C6000 System Block Diagram
[Figure: the CPU contains Register Set A with units .D1, .M1, .L1, .S1 and Register Set B with units .D2, .M2, .L2, .S2. Internal buses connect the CPU to internal memory, external memory and the peripherals.]
To summarize each unit's instructions ...
'C62x RISC-like instruction set
.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
No Unit Used: IDLE, NOP
'C67x: Superset of Fixed-Point
.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
    Floating-point additions: ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP
.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
    Floating-point additions: ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP
.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
    Additions: MPYSP, MPYDP, MPYI, MPYID
.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), LDDW, MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
No Unit Used: IDLE, NOP
'C64x → Superset of 'C62x
Instructions added per unit:
.S Unit: PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4, SADD2, SADDUS2, SADD4, ANDN, SHR2, SHRU2, SHLMB, SHRMB, CMPEQ2, CMPEQ4, CMPGT2, CMPGT4, BDEC, BPOS, BNOP, ADDKPC
.L Unit: SHLMB, SHRMB, MVK (5-bit), ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4, ANDN, PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4
.D Unit: LDDW, LDNW, LDNDW, STDW, STNW, STNDW, MVK (5-bit), ADD2, SUB2, AND, ANDN, OR, XOR, ADDAD
.M Unit: MVD, BITC4, BITR, DEAL, SHFL, MPYHI, MPYLI, MPYHIR, MPYLIR, AVG2, AVG4, ROTL, SSHVL, SSHVR, MPY2/SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4, XPND2/4
Other enhancements:
• Double-size register sets (A16-A31) (B16-B31)
• Advanced instruction packing (minimizes code size)
• Advanced emulation features
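Many of the instructions listed above with a 2 or 4 suffix operate on packed data, treating one 32-bit register as two 16-bit or four 8-bit values. As a hedged illustration of the idea, the sketch below models ADD2 in C: the upper and lower 16-bit halves are added independently, and a carry out of the lower half does not reach the upper half. The function name `add2` is hypothetical and this is a behavioural model only, not TI's intrinsic or implementation.

```c
#include <stdint.h>

/* Hypothetical behavioural model of a packed 16-bit add (ADD2-style). */
uint32_t add2(uint32_t src1, uint32_t src2)
{
    /* Lower halves: keep 16 bits of the sum, discarding any carry out. */
    uint32_t lo = (src1 + src2) & 0xFFFFu;
    /* Upper halves: shifted down, added on their own, then masked. */
    uint32_t hi = ((src1 >> 16) + (src2 >> 16)) & 0xFFFFu;
    return (hi << 16) | lo;
}
```

For example, `add2(0x0001FFFF, 0x00010001)` yields `0x00020000`, whereas an ordinary 32-bit add of the same operands would give `0x00030000` because the carry from the lower half would spill into the upper half.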
Different Needs? Multiple Families!
C2000 (C20x/24x/28x): Lowest Cost
    Earlier generations: 'C1x, 'C2x
    • Control systems
    • Motor control
    • Storage
    • Digital control systems
C5000 (C54x/55x): Efficiency, best MIPS per Watt / Dollar / Size
    Earlier generations: 'C5x
    • Wireless phones
    • Internet audio players
    • Digital still cameras
    • Modems
    • Telephony
    • VoIP
C6000 (C62x/64x/67x): Max Performance with Best Ease-of-Use
    Earlier generations: 'C3x, 'C4x, 'C8x
    • Multi-channel and multi-function applications
    • Communications infrastructure
    • Wireless base stations
    • DSL
    • Imaging
    • Multimedia servers
    • Video
C6000 Roadmap
[Figure: roadmap of performance versus time, labeled "Highest Performance" and "Software Compatible".]
• 1st generation, C62x™: C6201, C6202, C6203, C6204, C6205, C6211
• 1st generation, C67x™ (floating point): C6701, C6711, C6712
• 2nd generation, C64x™ DSP: C6414, C6415, C6416 (labels: General Purpose, Media Gateway, 3G Wireless Infrastructure)
• C64x™ DSP, 1.1 GHz
• Multi-core C64x™ DSP, 1.1 GHz
For More Information . . .
Internet
• Website: www.ti.com, dspvillage.com
• FTP: ftp://ftp.ti.com/pub/tms320bbs
• FAQ: http://www-k.ext.ti.com/sc/technical_support/knowledgebase.htm
• Device information, TI & ME, application notes, news and events, technical documentation, training
USA - Product Information Center (PIC)
• Phone: 972-644-5580
• Email: [email protected]
• Information and support for all TI Semiconductor products/tools
• Submit suggestions and errata for tools, silicon and documents
Other Resources
• Software Registration/Upgrades: 972-293-5050
• Hardware Repair/Upgrades: 281-274-2285
• Enroll in Technical Training: www.ti.com/sc/training (choose Design Workshops)
Key C6000 Manuals
Hardware
• SPRU189 - CPU and Instruction Set Ref. Guide
• SPRU190 - Peripherals Ref. Guide
• SPRU401 - Peripherals Chip Support Lib. Ref.
Software
• SPRU198 - Programmer's Guide
• SPRU303 - C6000 DSP/BIOS User's Guide
Code Generation
• SPRU186 - Assembly Language Tools User's Guide
• SPRU187 - Optimizing C Compiler User's Guide
Refer to the C6000 Family Update handout for the full list.
Looking for Literature on DSP?
• "A Simple Approach to Digital Signal Processing" by Craig Marven and Gillian Ewers; ISBN 0-4711-5243-9
• "DSP Primer (Primer Series)" by C. Britton Rorabaugh; ISBN 0-0705-4004-7
• "A DSP Primer: With Applications to Digital Audio and Computer Music" by Ken Steiglitz; ISBN 0-8053-1684-1
• "DSP First: A Multimedia Approach" by James H. McClellan, Ronald W. Schafer, Mark A. Yoder; ISBN 0-1324-3171-8
Looking for Literature on 'C6000 DSP?
• "Digital Signal Processing Implementation using the TMS320C6000™ DSP Platform" by Naim Dahnoun; ISBN 0201-61916-4
• "C6x-Based Digital Signal Processing" by Nasser Kehtarnavaz and Burc Simsek; ISBN 0-13-088310-7
Embedded System Design
[Figure: diagram with blocks labeled MPU, FPGA, ASIC]
Embedded System Design
[Figure: diagram with blocks labeled MPU, FPGA]
Microprocessor Unit (MPU)
• Simple: not much area of the FPGA
    • Probably not a 32-bit RISC
    • Maybe an 8-bit or 16-bit stack-based design
• Fast
    • Single clock cycle instructions where possible
• Easy to program
    • Probably not C or C++
    • Maybe FORTH or WHYP
Embedded System Design
[Figure: diagram with blocks labeled MPU, FPGA, ASIC]
Goal: Design the entire embedded system as a single FPGA.
• Use VHDL to design all hardware, including the MPU.
• Write the MPU software in WHYP and compile it to VHDL.
Potential Advantages
• Minimize overall system cost
• Minimize development time
• Minimize per-unit cost
• Implement only the hardware and software needed for a particular design
Xilinx Spartan FPGAs
Xilinx XC4000E FPGAs
Xilinx Spartan-II FPGAs