Subra Ganesan Presentation at TACOM December 6 2002 Professor, Computer Science and Engineering Associate Director, Product Development and Manufacturing Center, Oakland University, Rochester, MI 48309 Email: [email protected]
Page 1: Microprocessors

Subra Ganesan

Presentation at TACOM

December 6 2002

Professor, Computer Science and Engineering; Associate Director, Product Development and Manufacturing Center, Oakland University,

Rochester, MI 48309

Email: [email protected]

Page 2: Microprocessors

Topics Covered:

1. Introduction to DSP Processors

2. Fixed Point DSP- C24x

3. Floating Point DSP- C6711

4. Code Composer Studio

5. DSP/BIOS for C6711

6. External Memory Interface for C6711

7. Interrupt – C6711

8. Applications

Page 3: Microprocessors

DSP Microprocessor – Advances and Automotive Applications

• Advances in circuit technology, architecture, algorithms and VLSI design techniques have contributed to high-performance Digital Signal Processing (DSP) microprocessors and to a multitude of novel applications of DSP chips.

• DSP processors are RISC based and have fast arithmetic units, on-chip memory, an analog interface, serial ports, timers, counters, facilities for inter-processor communication, and other special features.

Page 4: Microprocessors

The Microprocessor overview
1949 Transistors
1958 Integrated Circuits
1961 ICs IN Quality
1964 Small Scale IC (SSI) Gates
1968 Medium Scale IC (MSI) Registers
1971 Large Scale IC (LSI), Memory, CPU
1972 8 BIT MICROPROCESSORS
1973 16 BIT MICROPROCESSORS
1982 32 BIT MICROPROCESSORS
1984 DSP MICROPROCESSORS – I GENERATION
1986 DSP MICROPROCESSORS – II GENERATION
1988 DSP MICROPROCESSORS – III GENERATION
1989 RISC MICROPROCESSORS – II QUALITY
1990 MISC MINIMUM INSTRUCTION SET MICROPROCESSOR

Page 5: Microprocessors

MICROPROCESSOR OVERVIEW

Microprocessor | Number of transistors | Performance | Number of Instructions
4 Bit Intel 4004 (1971) | 2300 | (not given) | 45
68000 | 70000 | 0.5 MIPS | 80 different; 14 address modes; size B, W, L
TMS 320C80 (32 bit RISC) | (not given) | 2 Billion operations per second [BOPs] | (not given)

Page 6: Microprocessors

INTRODUCTION TO DSP MICROPROCESSORS

DSP micros are reduced-instruction-set computers optimized for the fastest possible execution of the following instructions:

• Addition
• Subtraction
• Multiplication
• Shifting

Multiplication and shifting are done in a single cycle using an array multiplier and a barrel (or combinational) shifter.

In contrast, general-purpose micros perform such operations via multiple-cycle, microcoded instructions that make use of the ALU's single-cycle, parallel-add, single-bit-shift capability.

Page 7: Microprocessors

DSP micros perform each multiply/accumulate in a single cycle (e.g., 100 ns).

• For the 80386 at 16 MHz:
ADD (16 bit addition) = 125 ns
IMUL (16 bit * 16 bit multiplication) = 1250 ns
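To put these timings in perspective, the sketch below estimates the highest sampling rate at which an N-tap filter could run if every tap costs one multiply/accumulate. The function name and the 64-tap example are illustrative assumptions, and charging the 80386 one IMUL plus one ADD per tap (1250 + 125 = 1375 ns) ignores loads, stores, and loop overhead.

```c
#include <assert.h>

/* Upper bound on the sampling rate (Hz) when each output sample
 * needs `taps` multiply/accumulates of `ns_per_mac` nanoseconds each.
 * Hypothetical helper for illustration; integer division rounds down. */
unsigned long max_sample_rate_hz(unsigned long taps, unsigned long ns_per_mac)
{
    assert(taps > 0 && ns_per_mac > 0);
    return 1000000000UL / (taps * ns_per_mac);
}
```

Under these assumptions, max_sample_rate_hz(64, 100) gives about 156 kHz for the single-cycle DSP, while max_sample_rate_hz(64, 1375) gives about 11 kHz for the 80386.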

Page 8: Microprocessors

DSP micros employ:

• Pipelining of instructions
• Addressing modes that efficiently access the relevant data structures (e.g., auto-increment and auto-decrement modes for arrays, and indexed addressing modes for FFTs)

Page 9: Microprocessors

Dual-bus HARVARD ARCHITECTURE, which enables

• Simultaneous fetching of data and instructions
• Special DSP-related addressing modes (e.g., index computation modulo an arbitrary number, automatic circular queue or free data move for FIR filters, bit reversal for FFTs)
• Extra addressing, multiple ALUs
• Special interfaces to serve specific fields of application (e.g., serial interfaces for CODECs in telecommunications)

Page 10: Microprocessors

Progress in new technologies, such as gallium arsenide (GaAs) transistors and high-electron-mobility transistors (HEMT), is expected to raise the performance of future DSP microprocessors.

The 80386 computes a 1024-point FFT only 66% slower than a 20 MHz TMS 32010.

Newer general-purpose micros with DSP-like dual-bus structures (e.g., the Motorola 68030), an array multiplier, a barrel shifter, and GaAs/HEMT technology can achieve a performance of 100 MIPS and upwards.

• TMS 32010 = 5 MIPS
• TMS 320C25 = 10 MIPS
• Motorola 56000 = 10.25 MIPS (24 bit data)
• TMS 320C6201 = 1600 MIPS

Page 11: Microprocessors

FLOATING-POINT DIGITAL SIGNAL PROCESSING CHIPS

DSP has the capability to perform floating-point arithmetic including multiply-accumulate operations with an increased degree of parallelism.

The design phase is often performed with the aid of high-level language or a commercial, DSP-oriented “design system” that yields a nonreal-time, floating point simulation on a general purpose computer.

The new generation of floating-point digital signal processors includes the AT&T DSP32C, Motorola DSP96002, and Texas Instruments TMS320C30.

Page 12: Microprocessors

A typical development system could involve:

• An iconic graphical interface (implemented in PC software)
• A computer
• A PC plug-in board containing a floating-point DSP micro chip
• A memory system

Page 13: Microprocessors

The NeXT PC is the first to incorporate a DSP micro. The on-board Motorola fixed-point DSP56001 is complemented by numerous “canned” procedures.

These procedures enable graphics and signal processing tasks to be carried out at rates orders of magnitude faster than is possible with the on-board MC68882 floating-point co-processor.

The cycle of improvement in functionality and performance for both general-purpose and DSP micros continues.

Architectures incorporating such structures as systolic arrays and neural networks will replace those now considered conventional.

Page 14: Microprocessors

DSP APPLICATIONS CHARACTERISTICS

1. Algorithms are mathematically intensive

e.g., for an FIR filter:

$$ y(n) = \sum_{i=0}^{N-1} a(i)\, x(n-i) $$

where y(n) = output samples, a(i) = coefficients, x(n-i) = input samples.

2. Real-time performance

e.g., speech recognition; image processing within a frame update period

Page 15: Microprocessors

3. Sampled input signal

DSP processors must effectively handle sampled data in large quantities.

DSP processors must be flexible to accommodate changing algorithms, new DSP processors etc.

Page 16: Microprocessors

The DSP Environment: Definitions

[Figure: Analog signal → Lowpass filter (LPF1) → A/D converter → DSP processor → D/A converter → Lowpass filter (LPF2) → Analog signal]

Page 17: Microprocessors

A simple digital filter system

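[Figure: the analog input is sampled at fs by the A/D converter and passed through sample registers R holding X(n), X(n-1), X(n-2), ..., X(n-N+1); each sample is multiplied by its coefficient a(0), a(1), a(2), ...; the products are summed to give Y(n), which the D/A converter turns into Y(t).]

Where

fs = sampling frequency

a(0), ..., a(i) = coefficients

y(n) = digital output

y(t) = analog output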

Page 18: Microprocessors

As long as the system samples the analog input at a frequency fs that is at least twice the information bandwidth of that input, all information present in the original analog signal is contained in the digital signal.

A/D conversion introduces quantization noise. Signal to quantization noise ratio or SQNR is a function of A/D’s accuracy.

• DSP stores current A/D sample and N-1 previous samples in a sample shift register, or a RAM which can simulate shift register function by modifying memory address pointers.

• The coefficients a(i) are stored in ROM or RAM, and they determine the impulse response and filter characteristics.

• A large N gives a longer impulse response and generally produces filters with sharper roll-off, greater stop band attenuation, and less frequency ripple.

Page 19: Microprocessors

• This filter is called an Nth-order finite impulse response (FIR) digital filter; it has no feedback path.

• The FIR filter requires N multiplies and N-1 additions to compute an output y(n) each time the input signal is sampled.

• Some DSP applications involve sampling rates of up to 100 MHz and processing rates of 100 MIPS.
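The FIR computation described above, y(n) = a(0)x(n) + a(1)x(n-1) + ... + a(N-1)x(n-N+1), can be sketched in C as follows. This is a minimal illustration rather than code from the presentation: the function name, the use of double, and the explicit shifting of the sample history (a software stand-in for the sample shift register) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One output sample of a direct-form FIR filter (hypothetical helper).
 * history[0] holds x(n), history[1] holds x(n-1), and so on,
 * so the array plays the role of the sample shift register. */
double fir_step(const double *a, double *history, size_t n_taps, double x_new)
{
    assert(n_taps > 0);

    /* Shift the stored samples by one position; the oldest falls off. */
    for (size_t i = n_taps - 1; i > 0; i--) {
        history[i] = history[i - 1];
    }
    history[0] = x_new;

    /* N multiplies and N-1 additions per output sample. */
    double y = a[0] * history[0];
    for (size_t i = 1; i < n_taps; i++) {
        y += a[i] * history[i];
    }
    return y;
}
```

Feeding a single nonzero sample followed by zeros returns the coefficients scaled by that sample, which is the impulse response the coefficients are said to determine. A DSP would normally avoid the shifting loop by moving an address pointer instead.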

Page 20: Microprocessors

SHANNON’S SAMPLING THEORY

An analog signal containing a maximum frequency of f1 Hz may be completely represented by regularly spaced samples, provided the sampling rate is at least 2f1 samples per second.

fs = 2f1 is the Nyquist sampling rate. If the signal is sampled at less than 2f1, aliasing error occurs. The signal is then represented with distortion, which depends on the degree of aliasing.

• Use an anti-aliasing filter: a low-pass filter with cut-off frequency at f1 (or fs/2).
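The effect of sampling below the Nyquist rate can be illustrated with a small calculation of the frequency at which a tone appears after sampling. The sketch assumes a single real sinusoid and ideal sampling; the function name and the use of integer hertz values are choices made for this example only.

```c
#include <assert.h>

/* Apparent frequency (Hz) of a real sinusoid at f_in_hz after ideal
 * sampling at fs_hz. Frequencies above fs/2 fold back into the band
 * 0 to fs/2. Hypothetical helper for illustration. */
long alias_frequency_hz(long f_in_hz, long fs_hz)
{
    assert(fs_hz > 0 && f_in_hz >= 0);
    long r = f_in_hz % fs_hz;               /* fold into [0, fs)   */
    return (2 * r > fs_hz) ? fs_hz - r : r; /* reflect above fs/2  */
}
```

For example, with fs = 8000 Hz a 5000 Hz tone is indistinguishable from a 3000 Hz tone after sampling, which is why the anti-aliasing filter must remove it before the A/D converter.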

Page 21: Microprocessors

Quantization Noise (Qe)

[Figure: a(t) → A/D → n bit]

$$ Q_e = \pm \frac{V_{ref}}{2 \cdot 2^{n}} $$

e.g., V_ref = 5 V, n = 8, then Qe = 5 / 512
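The quantization error bound Qe = V_ref / (2 · 2^n) is easy to evaluate for other converter resolutions. The sketch below does that and also includes the common textbook approximation SQNR ≈ 6.02 n + 1.76 dB for an ideal n-bit converter driven by a full-scale sine wave. That approximation is not from the slides, and both function names are hypothetical.

```c
#include <assert.h>

/* Magnitude of the worst-case quantization error, V_ref / (2 * 2^n). */
double quantization_error_bound(double v_ref, unsigned n_bits)
{
    assert(n_bits > 0 && n_bits < 32);
    return v_ref / (2.0 * (double)(1UL << n_bits));
}

/* Textbook approximation of SQNR in dB for an ideal n-bit converter
 * with a full-scale sinusoidal input (an assumption, not slide data). */
double ideal_sqnr_db(unsigned n_bits)
{
    return 6.02 * (double)n_bits + 1.76;
}
```

With V_ref = 5 V and n = 8 the first function reproduces the 5/512 V figure above; each extra bit halves the bound and adds roughly 6 dB to the ideal SQNR.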

Page 22: Microprocessors

[Fig. Aliasing in the frequency domain: (a) input spectrum, (b) sampled spectrum, (c) reconstructed spectrum; axes |G(f)| versus f, with fs/2 and fSAMP marked.]

[Fig. Aliasing in the time domain: (a) input continuous-time signal, (b) sampled signal, (c) reconstructed signal; signals labeled g(t) and gr(t).]

Page 23: Microprocessors

A LINEAR SYSTEM obeys the principle of superposition. If an input consisting of a number of signals is applied to a linear system, then the output is the sum, or superposition, of the system's responses to each signal considered separately.

Page 24: Microprocessors

FREQUENCY PRESERVATION PROPERTY

If we apply a complicated signal containing many frequencies, the output must be the sum of the outputs due to each input frequency, considered separately. The output contains only those frequencies present in the input.

TIME INVARIANT SYSTEM

It is one whose properties do not vary with time.

Page 25: Microprocessors

• LTI: Linear Time Invariant
The associative property of LTI systems means that we may analyze a complicated LTI system by breaking it down into a number of simpler subsystems.

• Commutative Property
It means that the subsystems can be arranged in series, or cascaded, in any order without affecting the overall performance.

Page 26: Microprocessors

• Causal System
In this system the output depends only on the present and/or previous values of the input.

• Stable System
It is one that produces a finite, or bounded, output in response to a bounded input.

Page 27: Microprocessors

• Invertibility
If a system with input x[n] gives an output y[n], then its inverse system produces x[n] when fed with y[n].

Page 28: Microprocessors

BIT REVERSED ADDRESSING

It is a special type of indirect addressing, used for implementing the FFT.

*ARn++(IR0)B

After the operand is fetched, ARn is updated to (ARn + IR0) using reverse carry propagation.
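Reverse-carry addition can be emulated in C to see the address sequence it produces. The sketch below models only the low `bits` bits of the address (the offset into a 2^bits-point buffer), and it reverses the bits, adds normally, and reverses back, which gives the same result as propagating the carry from the most significant bit toward the least significant bit. The function names are hypothetical, and this is an emulation of the idea rather than a description of the hardware.

```c
#include <assert.h>

/* Reverse the lowest `bits` bits of v. */
unsigned reverse_bits(unsigned v, unsigned bits)
{
    unsigned r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Add step to addr with the carry running from MSB to LSB,
 * restricted to a field of `bits` bits. */
unsigned reverse_carry_add(unsigned addr, unsigned step, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    unsigned mask = (1u << bits) - 1u;
    unsigned sum = (reverse_bits(addr & mask, bits) +
                    reverse_bits(step & mask, bits)) & mask;
    return reverse_bits(sum, bits);
}
```

For an 8-point FFT (3 address bits) with the step set to half the FFT size, 4, starting at 0 and adding repeatedly visits the offsets 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed order.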

Page 29: Microprocessors

CIRCULAR ADDRESSING

A circular buffer is necessary to implement the delays associated with convolution and correlation equations. The block size is held in register BK.

*ARI++

ARI is incremented each time until it points to the bottom of the circular buffer. After that it wraps around and points to the top of the buffer.
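In software, the same wraparound behavior can be sketched with an index that resets when it reaches the block size. The structure and function names below are hypothetical; `size` stands in for the block-size register and `index` for the auxiliary register, and a DSP performs this update in hardware at no cost in cycles.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double *data;   /* storage for the delay line                  */
    size_t  size;   /* block size (role of the block-size register) */
    size_t  index;  /* current position (role of the address register) */
} circ_buf;

/* Return the current position, then advance it with wraparound. */
size_t circ_advance(circ_buf *cb)
{
    assert(cb->size > 0);
    size_t current = cb->index;
    cb->index = (cb->index + 1 == cb->size) ? 0 : cb->index + 1;
    return current;
}

/* Store a new sample, overwriting the oldest one once the buffer is full. */
void circ_push(circ_buf *cb, double sample)
{
    cb->data[circ_advance(cb)] = sample;
}
```

After `size` pushes the index is back at the top of the buffer, so the next sample overwrites the oldest one and no data has to be moved.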

Page 30: Microprocessors

REPEAT INSTRUCTION

A block of instructions is repeated ‘count’ number of times using RPTB. RC contains the count number.

        LDI   8, RC
        RPTB  Label1
        CALL  filter
        FIX   R0
Label1  STI   R0, *AR3

The RPTS instruction repeats the next single instruction ‘count’ number of times.

Page 31: Microprocessors

PARALLEL INSTRUCTION

The symbol ‘||’ indicates parallel operation.

        LDF   0, R0
        LDI   29, AR2
        RPTS  AR2
        MPYF  *AR0++, *AR1++, R0
||      ADDF  R0, R2, R2

MPYF ---> multiply floating-point number

In the parallel pair, ADDF reads the old value of R0 while MPYF writes the new value.
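The behavior of a multiply issued in parallel with an add of the previous product can be mimicked in C to show why the result is still a dot product. This is a sketch under stated assumptions: the accumulator is taken to start at zero, one final addition is performed after the loop to fold in the last product (neither detail appears in the listing above), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product written the way a parallel multiply/add pair computes it:
 * each pass adds the previous product while forming the next one. */
float dot_pipelined(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float product = 0.0f;  /* role of the product register     */
    float sum = 0.0f;      /* role of the accumulator register */

    for (size_t i = 0; i < n; i++) {
        float old_product = product; /* the add sees the old value      */
        product = a[i] * b[i];       /* the multiply writes the new one */
        sum += old_product;
    }
    sum += product;                  /* fold in the last product */
    return sum;
}
```

Because the add always lags the multiply by one step, both units stay busy on every pass, at the cost of one extra addition at the end.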

Page 32: Microprocessors

DELAYED BRANCH

A conditional or unconditional delayed branch allows the subsequent 3 instructions to be fetched and executed. This gives the effect of a single-cycle branch.

        BD    Loop        ; delayed branch
        ADDF  R0, R1      ; }
        FIX   R1          ; } executed whether
        STI   R1, *AR3    ; } branch is taken or not
Loop

Standard branches empty the pipeline before branching, so a standard branch takes 4 cycles to execute.

Page 33: Microprocessors

DSP CHIPS

• Analog Devices ADSP 2100, 21020
• AT&T DSP 16, 32
• DSP semiconductors Pine 16 bit fixed point
• Motorola 56100, 96000
• NEC uPD 77C25 (16 bit fixed pt)
• NEC 77220 (24 bit fixed pt)
• SGS Thomson ST 18 (16 bit fixed point)
• Star Semiconductor SPROC 1000 24 bit fixed point
• Texas Instruments TMS3201x, 2x, 3x, 4x, 80, 6xx
• Zilog Z89 Cxx 16 bit fixed DSP
• Xilinx DSP FPGA

Page 34: Microprocessors

MARKET SHARE

• TI 46.7%
• AT&T 18.7%
• MOTOROLA 15%
• AD 9.3%
• NEC 8.4%
• OTHER 1.9%

Page 35: Microprocessors

DSP Vs Microcontroller

Microcontroller | Digital Signal Processor
Multicycle instruction set | Single-cycle instruction set
Multicycle multiply | Single-cycle multiply
8 or 16 bit support | 16/32 bit fixed or floating
Limited on-chip RAM | Large on-chip data RAM
Limited data pointers | Data pointers
Limited BW and limited algorithms | Speed!

Page 36: Microprocessors

Present Day Applications

DSP: Technology Enabler

• Consumer Audio: Stereo A/D, D/A; PLL; Mixers
• Multimedia: Stereo audio; Imaging; Graphics palette; Voltage regulation
• Wireless / Cellular: Voice-band audio; RF codecs; Voltage regulation
• HDD: PRML read channel; MR pre-amp; Servo control; SCSI transceivers
• Automotive: Digital radio A/D/A; Active suspension; Voltage regulation
• DTAD: Speech synthesizer; Mixed-signal processor

Page 37: Microprocessors

System Considerations

Performance

Interfacing

Power

Size

Ease-of-Use
• Programming
• Interfacing
• Debugging

Integration
• Memory
• Peripherals

Cost
• Device cost
• System cost
• Development cost
• Time to market

Page 38: Microprocessors

Different Needs? Multiple Families!

C2000 (C20x/24x/28x), following the ‘C1x and ‘C2x: Lowest Cost
Control Systems
• Motor Control
• Storage
• Digital Ctrl Systems

C5000 (C54x/55x), following the ‘C5x: Efficiency
Best MIPS per Watt / Dollar / Size
• Wireless phones
• Internet audio players
• Digital still cameras
• Modems
• Telephony
• VoIP

C6000 (C62x/64x/67x), following the ‘C3x, ‘C4x and ‘C8x: Max Performance with Best Ease-of-Use
Multi Channel and Multi Function App's
• Comm Infrastructure
• Wireless Base-stations
• DSL
• Imaging
• Multi-media Servers
• Video

Page 39: Microprocessors

'C6000 Block Diagram

[Figure: CPU connected through the Internal Buses to Internal Memory, External Memory and the Peripherals]

Page 40: Microprocessors

'C6000 System Block Diagram

[Figure: CPU containing functional units .D1, .M1, .L1, .S1 with registers A0 - A15 and functional units .D2, .M2, .L2, .S2 with registers B0 - B15, connected through the Internal Buses to Internal Memory, External Memory and the Peripherals]

Page 41: Microprocessors

What Problem Are We Trying To Solve?

Digital sampling of an analog signal:

[Figure: amplitude A versus time t; signal chain x → ADC → DSP → DAC → Y]

Most DSP algorithms can be expressed with MAC:

$$ Y = \sum_{i=1}^{count} a_i \cdot x_i $$

    for (i = 0; i < count; i++) {
        sum += m[i] * n[i]; }

What does it take to do this fast … and easy?

Page 42: Microprocessors

Fastest Execution of MACs
• The ‘C6x roadmap ... from 200 to 2400 MMACs

Ease of C Programming
• Even using natural C, the ‘C6000 Architecture can perform 2 to 4 MACs per cycle
• Compiler generates 80-100% efficient code

Multiply-Accumulate (MAC) in Natural C Code

    for (i = 0; i < count; i++) {
        sum += m[i] * n[i]; }

Fast MAC using only C

How does the ‘C6000 achieve such performance from C?

Page 43: Microprocessors

Sample Compiler Benchmarks
• Great out-of-box experience
• Completely natural C code (non ’C6x specific)
• Code available at: www.ti.com/sc/c6000compiler
• Versus hand-coded assembly based on cycle count

How does the ‘C6000 achieve such performance from C?

Page 44: Microprocessors

'C6000 Architecture: Built for Speed

[Figure: register files A0 .. A15 .. A31 and B0 .. B15 .. B31, functional units .M1, .L1, .D1, .S1 and .M2, .L2, .D2, .S2, Controller/Decoder, Memory]

• ‘C6000 Compiler excels at Natural C
• While dual-MAC speeds math intensive algorithms, flexibility of 8 independent functional units allows the compiler to quickly perform other types of processing
• All ‘C6000 instructions are conditional, allowing efficient hardware pipelining
• Instruction set and CPU hardware orthogonality allow the compiler to achieve 80-100% efficiency

Page 45: Microprocessors

Fastest MAC using Natural C

    float mac(float *m, float *n, int count)
    { int i; float sum = 0;

      for (i=0; i < count; i++) {
        sum += m[i] * n[i]; } …

;** --------------------------------------------------**
LOOP:                       ; PIPED LOOP KERNEL
         LDDW    .D1   A4++,A7:A6
||       LDDW    .D2   B4++,B7:B6
||       MPYSP   .M1X  A6,B6,A5
||       MPYSP   .M2X  A7,B7,B5
||       ADDSP   .L1   A5,A8,A8
||       ADDSP   .L2   B5,B8,B8
|| [A1]  B       .S2   LOOP
|| [A1]  SUB     .S1   A1,1,A1
;** --------------------------------------------------**

[Figure: register files A0 .. A15 .. A31 and B0 .. B15 .. B31, functional units .M1, .L1, .D1, .S1 and .M2, .L2, .D2, .S2, Controller/Decoder, Memory]

Page 46: Microprocessors

'C6000 System Block Diagram

[Figure: CPU with functional units .D1, .M1, .L1, .S1 (Register Set A) and .D2, .M2, .L2, .S2 (Register Set B), Internal Buses, Internal Memory, External Memory, Peripherals]

Looking at the internal buses ...

Page 47: Microprocessors

‘C6000 Internal Buses

• Program Addr (from PC): x32
• Program Data: x256
• Data Addr - T1: x32
• Data Data - T1: x32/64
• Data Addr - T2: x32
• Data Data - T2: x32/64
• DMA Addr - Read, DMA Data - Read
• DMA Addr - Write, DMA Data - Write

[Figure: these buses connect the A regs, B regs and DMA to Internal Memory, External Memory and Peripherals]

Page 48: Microprocessors

'C6000 System Block Diagram

[Figure: CPU with functional units .D1, .M1, .L1, .S1 (Register Set A) and .D2, .M2, .L2, .S2 (Register Set B), Internal Buses, Internal Memory, External Memory]

Next, the internal memory ...

Page 49: Microprocessors

‘C6711 Memory

Memory map:
0000_0000   64KB Internal
0180_0000   On-chip Peripherals
8000_0000   128MB External0
9000_0000   128MB External1
A000_0000   128MB External2
B000_0000   128MB External3
FFFF_FFFF   (top of address space)

[Figure: CPU with 4K Program Cache and 4K Data Cache, backed by 64K Prog / Data (Level 2)]
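The memory map on this slide can be expressed as a small address classifier. This is only a sketch: it assumes External0 to External3 begin at 8000_0000, 9000_0000, A000_0000 and B000_0000 in that order, each 128 MB long, and that the 64 KB of internal memory begins at 0000_0000. The extent of the on-chip peripheral region is not given on the slide, so it is not classified here. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_SPACE_BYTES (128u * 1024u * 1024u)  /* 128 MB per external space */
#define INTERNAL_BYTES  (64u * 1024u)           /* 64 KB internal memory     */

/* Base address of external space 0..3 (assumed ordering). */
uint32_t c6711_external_base(int space)
{
    assert(space >= 0 && space < 4);
    return 0x80000000u + (uint32_t)space * 0x10000000u;
}

/* Return the external space number containing addr, or -1 if none. */
int c6711_external_space(uint32_t addr)
{
    for (int i = 0; i < 4; i++) {
        uint32_t base = c6711_external_base(i);
        if (addr >= base && addr - base < EXT_SPACE_BYTES) {
            return i;
        }
    }
    return -1;
}

/* Nonzero if addr falls in the 64 KB internal memory. */
int c6711_is_internal(uint32_t addr)
{
    return addr < INTERNAL_BYTES;
}
```

Note that the bases are spaced 256 MB apart while each space is listed as 128 MB, so under these assumptions the upper half of each window is unmapped.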

Page 50: Microprocessors

‘C6711 Cache Logic

1. CPU requests data.
2. Is data in L1? Yes: send data to CPU.
3. No: is data in L2? Yes: copy data from L2 to L1, then send data to CPU.
4. No: copy data from External Mem to L2, copy data from L2 to L1, then send data to CPU.

Page 51: Microprocessors

‘C6711 Cache Details

Level 1 Program
• Always cache
• 1 way cache (direct mapped)
• Zero wait-state
• Line size: 512 bits (or 16 instr)

Level 1 Data
• Always cache
• 2 way cache
• Zero wait-state
• Line size: 256 bits

Level 2
• Unified (prog or data)
• RAM or cache
• 1-4 way cache
• 32 data bytes in 4 cycles
• 16 instr. in 5 cycles
• Line Size: 1024 bits (or 128 bytes)

[Figure: CPU connected to L1 Prog (4KB) and L1 Data (4KB), both backed by L2 Unified (64KB); bus widths marked 256, 8/16/32/64, 128 and 256]
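From the sizes, associativities and line sizes listed above, the number of sets in each Level 1 cache follows directly, and a conventional modulo mapping shows how an address would select a set. The sketch below is illustrative only: the slides do not state which address bits the device uses for indexing, so the modulo mapping is an assumption, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t size_bytes;  /* total cache capacity          */
    uint32_t line_bytes;  /* bytes per cache line          */
    uint32_t ways;        /* associativity (1 = direct mapped) */
} cache_geom;

/* Number of sets = capacity / (line size * ways). */
uint32_t cache_num_sets(cache_geom g)
{
    assert(g.line_bytes > 0 && g.ways > 0);
    return g.size_bytes / (g.line_bytes * g.ways);
}

/* Set selected by addr under a conventional modulo mapping (assumed). */
uint32_t cache_set_index(cache_geom g, uint32_t addr)
{
    return (addr / g.line_bytes) % cache_num_sets(g);
}

/* Remaining upper address bits, stored as the tag (assumed). */
uint32_t cache_tag(cache_geom g, uint32_t addr)
{
    return addr / (g.line_bytes * cache_num_sets(g));
}
```

With the figures above, L1 Program (4 KB, direct mapped, 512-bit = 64-byte lines) and L1 Data (4 KB, 2-way, 256-bit = 32-byte lines) both work out to 64 sets.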

Page 52: Microprocessors

'C6000 System Block Diagram

[Figure: CPU with functional units .D1, .M1, .L1, .S1 (Register Set A) and .D2, .M2, .L2, .S2 (Register Set B), Internal Buses, Internal Memory, External Memory, Peripherals]

Looking at each peripheral ...

Page 53: Microprocessors

'C6000 Peripherals

[Figure: 'C6000 system block diagram with the peripherals attached to the Internal Buses]

• EMIF
• XB, PCI, Host Port
• McBSP’s
• Utopia
• GPIO
• VCP
• TCP
• DMA, EDMA (Boot)
• Timers
• PLL

Page 54: Microprocessors

EMIF

[Figure: 'C6000 system block diagram with the EMIF connecting to SDRAM, Async and SBSRAM external memory]

External Memory Interface (EMIF)
• Glueless access to async/sync memory
• Works with PC100 SDRAM (cheap, fast, and easy!)
• Byte-wide data access
• 16, 32, or 64-bit bus widths

Page 55: Microprocessors

Internal and External Memory

Devices | Internal | EMIF (A) size of range | EMIFB size of range
C6201, C6204, C6205, C6701 | P = 64 KB, D = 64 KB | 52M Bytes (32-bits wide) | N/A
C6202 | P = 256 KB, D = 128 KB | 52M Bytes (32-bits wide) | N/A
C6203 | P = 384 KB, D = 512 KB | 52M Bytes (32-bits wide) | N/A
C6211, C6711 | L1P = 4 KB, L1D = 4 KB, L2 = 64 KB | 128M Bytes (32-bits wide) | N/A
C6712 | L1P = 4 KB, L1D = 4 KB, L2 = 64 KB | 64M Bytes (16-bits wide) | N/A
C6414, C6415, C6416 | L1P = 16 KB, L1D = 16 KB, L2 = 1 MB | 256M Bytes (64-bits wide) | 64M Bytes (16-bits wide)

Page 56: Microprocessors

HPI / XBUS / PCI

[Figure: 'C6000 system block diagram showing XBUS, PCI, Host Port and EMIF]

Parallel Peripheral Interface
• HPI: Dedicated, slave-only, async 16/32-bit bus allows host-µP access to C6000 memory
• XBUS: Similar to HPI but provides …
  - Master/slave and sync modes
  - Glueless i/f to FIFOs (up to single-cycle xfer rate)
• PCI: Standard 32-bit, 33MHz PCI interface
• These interfaces provide means to bootstrap the C6000

Page 57: Microprocessors

GPIO

[Figure: 'C6000 system block diagram showing GPIO; XB, PCI, Host Port; and EMIF]

General Purpose Input/Output (GPIO)
• ‘C64x provides 8 or 16 bits of general purpose bitwise I/O
• Use to observe or control the signal of a single pin

Page 58: Microprocessors

McBSP and Utopia

[Figure: 'C6000 system block diagram showing McBSP’s and Utopia; GPIO; XB, PCI, Host Port; and EMIF]

Multi-Channel Buffered Serial Port (McBSP)
• 2 (or 3) full-duplex, synchronous serial ports
• Up to 100 Mb/sec performance
• Supports multi-channel operation (T1, E1, MVIP, …)

Utopia (C64x)
• ATM connection
• 50 MHz wide area network connectivity

Page 59: Microprocessors

DMA / EDMA

[Figure: 'C6000 system block diagram showing DMA, EDMA (Boot) alongside the other peripherals]

Direct Memory Access (DMA / EDMA)
• Transfers any set of memory locations to another
• 4 / 16 / 64 channels (transfer parameter sets)
• Transfers can be triggered by any interrupt (sync)
• Operates independent of CPU
• On reset, provides bootstrap from memory

Page 60: Microprocessors

Timer / Counter

[Figure: 'C6000 system block diagram showing Timers alongside the other peripherals]

Timer / Counter
• Two (or three) 32-bit timer/counters
• Can generate interrupts
• Both input and output pins

Page 61: Microprocessors

VCP / TCP -- 3G Wireless

[Figure: 'C6000 system block diagram showing VCP and TCP alongside the other peripherals]

Turbo Coprocessor (TCP)
• Supports 35 data channels at 384 kbps
• 3GPP / IS2000 Turbo coder
• Programmable parameters include mode, rate and frame length

Viterbi Coprocessor (VCP)
• Supports >500 voice channels at 8 kbps
• Programmable decoder parameters include constraint length, code rate, and frame length

Page 62: Microprocessors

PLL

[Figure: 'C6000 system block diagram showing the PLL alongside the other peripherals]

PLL
• External clock multiplier
• Reduces EMI and cost
• Pin selectable

Input
• CLKIN

Output
• CLKOUT1: output rate of PLL; instruction (MIP) rate
• CLKOUT2: 1/2 rate of CLKOUT1

Page 63: Microprocessors

'C6000 Peripherals

[Figure: 'C6000 system block diagram with all peripherals: McBSP’s, Utopia, GPIO, VCP, TCP, DMA / EDMA (Boot), Timers, PLL, XB / PCI / Host Port, and EMIF to External Memory]

Page 64: Microprocessors

C6000 Roadmap

[Chart: performance versus time; all devices software compatible]
• 1st generation, C62x™: C6201, C6202, C6203, C6204, C6205, C6211
• 1st generation, C67x™ (floating point): C6701, C6711, C6712
• 2nd generation, C64x™ DSP: C6414 (general purpose), C6415 (media gateway), C6416 (3G wireless infrastructure)
• Highest performance: C64x™ DSP at 1.1 GHz; multi-core

Page 65: Microprocessors

'C6000 Floating-Point

[Chart: floating-point performance versus time]
• C30, C31, C32, C33: up to 150 MFLOPS
• C6712: 600 MFLOPS
• C6711: 900 MFLOPS
• C6701: 1 GFLOPS
• C67x: 3 GFLOPS and beyond

Page 66: Microprocessors

TI Floating-Point Innovation

TI Floating Point - A History of Firsts:
• First commercially-successful floating-point DSP: 'C30 (1987)
• First floating-point DSP with multiprocessing support: 'C40 (1991)
• First $10 floating-point DSP: 'C32 (1995)
• First 1-GFLOPS DSP: 'C6701 (1998)
• First $5 floating-point DSP: 'C33 (1999)
• First 2-level cache floating-point DSP: 'C6711 (1999)
• First to offer 600 MFLOPS for under $10: 'C6712 (2000)

Page 67: Microprocessors

What Problem Are We Trying To Solve?

Digital sampling of an analog signal:

[Figure: analog signal, amplitude A versus time t; signal chain x → ADC → DSP → DAC → Y]

Most DSP algorithms can be expressed as:

$$Y = \sum_{i=1}^{40} a_i \, x_i$$

What are the two primary instructions?

Page 68: Microprocessors

The Core of DSP : Sum of Products

$$y = \sum_{n=1}^{40} a_n \, x_n$$

The 'C6000: designed to handle DSP's math-intensive calculations.

Multiplier (Mult) and ALU:

        MPY        a, x, prod
        ADD        y, prod, y

On the 'C6000, the .M and .L units:

        MPY  .M    a, x, prod
        ADD  .L    y, prod, y

Note: You don't have to specify functional units (.M or .L).

Where are the variables?

Page 69: Microprocessors

Working Variables : The Register File

$$y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: Register File A (16 registers, 32 bits each) holding a, x, prod, y, ...; functional units .M and .L]

        MPY  .M    a, x, prod
        ADD  .L    y, prod, y

How is the number of iterations specified?
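The registers are 32 bits wide, but MPY on the 'C62x multiplies only the signed lower 16 bits of each source register and writes a 32-bit product. The following is a small Python model of that behaviour. The helper names `to_signed16` and `mpy` are hypothetical, chosen for this sketch.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def mpy(src1, src2):
    """Model MPY: signed 16 LSBs x signed 16 LSBs, 32-bit register result."""
    product = to_signed16(src1) * to_signed16(src2)
    return product & 0xFFFFFFFF   # bit pattern held in the 32-bit destination

print(hex(mpy(0x0003, 0xFFFE)))   # 3 * -2 as a 32-bit two's-complement pattern
```

The upper halves of the source registers do not take part in this multiply, which is why 16-bit data fits the sum-of-products example so naturally.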

Page 70: Microprocessors

Loops: Coding on a RISC Processor

1. Program flow: the branch instruction
        B     loop

2. Initialization: setting the loop count
        MVK   40, cnt

3. Decrement: subtract 1 from the loop counter
        SUB   cnt, 1, cnt

Page 71: Microprocessors

The “S” Unit : For Standard Operations

$$y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: Register File A (16 registers, 32 bits each) holding a, x, prod, y, ..., cnt; functional units .M, .L, .S]

        MVK  .S    40, cnt
loop:
        MPY  .M    a, x, prod
        ADD  .L    y, prod, y
        SUB  .L    cnt, 1, cnt
        B    .S    loop

How is the loop terminated?

Page 72: Microprocessors

Conditional Instruction Execution

To minimize branching, all instructions are conditional.

        [condition]   B   loop

Execution is based on the [zero/non-zero] value of the specified variable:

        Code Syntax     Execute if:
        [ cnt ]         cnt ≠ 0
        [ !cnt ]        cnt = 0

Note: If the condition is false, the instruction executes as a NOP.
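The two condition forms can be modelled in a few lines of Python. This is a sketch only: `run_conditional` is a hypothetical helper, and the loop around it shows how a branch guarded by `[cnt]` stops being taken once the counter reaches zero.

```python
def run_conditional(cond_value, negate, operation):
    """Run operation only if the condition holds; otherwise act as a NOP.

    negate=False models [cnt]  (execute when the register is non-zero).
    negate=True  models [!cnt] (execute when the register is zero).
    """
    condition_true = (cond_value == 0) if negate else (cond_value != 0)
    if condition_true:
        return operation()
    return None   # condition false: the instruction has no effect

# Count how often "[cnt] B loop" branches for an initial count of 40.
cnt = 40
branches_taken = 0
while True:
    cnt -= 1                                            # SUB cnt, 1, cnt
    taken = run_conditional(cnt, False, lambda: True)   # [cnt] B loop
    if not taken:
        break
    branches_taken += 1

print(branches_taken, cnt)
```

The loop body runs 40 times, but the branch is taken only 39 times: on the last pass the counter is zero and the branch falls through.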

Page 73: Microprocessors

Loop Control via Conditional Branch

$$y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: Register File A (32 bits wide) holding a, x, prod, y, ..., cnt; functional units .M, .L, .S]

        MVK  .S    40, cnt
loop:
        MPY  .M    a, x, prod
        ADD  .L    y, prod, y
        SUB  .L    cnt, 1, cnt
 [cnt]  B    .S    loop

How are the a and x array values brought in from memory?

Page 74: Microprocessors

Memory Access via “.D” Unit

$$y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: Register File A (16 registers) holding a, x, prod, y, cnt, *ap, *xp, *yp; functional units .D, .M, .L, .S; the .D unit connects to Data Memory: x(40), a(40), y]

        MVK  .S    40, cnt
loop:
        LDH  .D    *ap, a
        LDH  .D    *xp, x
        MPY  .M    a, x, prod
        ADD  .L    y, prod, y
        SUB  .L    cnt, 1, cnt
 [cnt]  B    .S    loop

How do we increment through the arrays?

Page 75: Microprocessors

Auto-Increment of Pointers

$$y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: Register File A (16 registers) holding a, x, prod, y, cnt, *ap, *xp, *yp; functional units .D, .M, .L, .S; Data Memory: x(40), a(40), y]

        MVK  .S    40, cnt
loop:
        LDH  .D    *ap++, a
        LDH  .D    *xp++, x
        MPY  .M    a, x, prod
        ADD  .L    y, prod, y
        SUB  .L    cnt, 1, cnt
 [cnt]  B    .S    loop

How do we store results back to memory?
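A post-incremented halfword load reads 16 bits and then advances the pointer by the size of the access, that is, by 2 bytes. The following Python sketch models this on a byte-addressed memory image. The helper name `ldh_post_increment` is hypothetical, and little-endian byte order is assumed here (the device can be configured for either endianness).

```python
import struct

def ldh_post_increment(memory, pointer):
    """Model LDH *ptr++: load a signed halfword, then advance ptr by one halfword."""
    (value,) = struct.unpack_from("<h", memory, pointer)   # little-endian assumed
    return value, pointer + 2   # the increment is scaled by the 2-byte access size

# Three 16-bit coefficients packed into a byte-addressed memory image.
memory = struct.pack("<3h", 7, -3, 12)
ap = 0
loaded = []
for _ in range(3):
    a, ap = ldh_post_increment(memory, ap)
    loaded.append(a)

print(loaded, ap)
```

Because the pointer update is part of the load itself, no separate ADD is needed to step through the arrays.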

Page 76: Microprocessors

Storing Results Back to Memory

$$y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: Register File A holding a, x, prod, y, cnt, *ap, *xp, *yp; functional units .D, .M, .L, .S; Data Memory: x(40), a(40), y]

        MVK  .S    40, cnt
loop:
        LDH  .D    *ap++, a
        LDH  .D    *xp++, x
        MPY  .M    a, x, prod
        ADD  .L    y, prod, y
        SUB  .L    cnt, 1, cnt
 [cnt]  B    .S    loop
        STW  .D    y, *yp

But wait - that's only half the story...
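To check the listing's logic, it can be stepped through one instruction at a time in Python. This is a sketch under stated assumptions: Python lists stand in for data memory, `y` is assumed to be cleared before the loop (the listing does not show this), and pipeline delay slots are ignored, although hand-written 'C6000 assembly has to account for them after loads, multiplies, and branches.

```python
def dot_product_listing(a_mem, x_mem):
    """Step through the slide's listing; needs at least one element per array."""
    ap = 0                 # index standing in for pointer ap
    xp = 0                 # index standing in for pointer xp
    y = 0                  # accumulator, assumed cleared before the loop
    cnt = len(a_mem)       # MVK .S  40, cnt   (40 on the slide)
    while True:            # loop:
        a = a_mem[ap]      # LDH .D  *ap++, a
        ap += 1
        x = x_mem[xp]      # LDH .D  *xp++, x
        xp += 1
        prod = a * x       # MPY .M  a, x, prod
        y = y + prod       # ADD .L  y, prod, y
        cnt = cnt - 1      # SUB .L  cnt, 1, cnt
        if cnt == 0:       # [cnt] B .S loop   (branch only while cnt != 0)
            break
    return y               # STW .D  y, *yp

a_values = list(range(1, 41))   # 40 illustrative coefficients
x_values = [2] * 40             # 40 illustrative samples
print(dot_product_listing(a_values, x_values))
```

Like the assembly loop, the model always executes the body once before testing the counter.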

Page 77: Microprocessors

Dual Resources : Twice as Nice

[Diagram: Register File A (A0 to A15, 32 bits each) holding a_n, x_n, prd, sum, cnt, ..., *a, *x, *y, served by functional units .M1, .L1, .S1, .D1; Register File B (B0 to B15, 32 bits each), served by functional units .M2, .L2, .S2, .D2]

Our final view of the sum of products example...
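With two register files and two sets of functional units, the sum of products can be split into two partial sums that proceed side by side. The slide does not say how the work is divided; the even/odd split below is one common choice and is an assumption of this Python sketch.

```python
def dual_sided_sum_of_products(a, x):
    """Accumulate even-indexed terms on 'side A' and odd-indexed terms on 'side B'."""
    sum_a = 0   # partial sum held in register file A
    sum_b = 0   # partial sum held in register file B
    for n in range(0, len(a) - 1, 2):
        sum_a += a[n] * x[n]             # .M1 multiply, .L1 add
        sum_b += a[n + 1] * x[n + 1]     # .M2 multiply, .L2 add
    if len(a) % 2:                       # odd length: one leftover term
        sum_a += a[-1] * x[-1]
    return sum_a + sum_b                 # combine the two partial sums at the end

print(dual_sided_sum_of_products(list(range(1, 41)), [2] * 40))
```

Each pass handles two terms, so 40 terms need 20 passes plus one final add.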

Page 78: Microprocessors

'C6000 System Block Diagram

[Block diagram: CPU (functional units .D1, .M1, .L1, .S1, .D2, .M2, .L2, .S2; Register Sets A and B), internal buses, internal memory, external memory, peripherals]

To summarize each unit's instructions ...

Page 79: Microprocessors

'C62x RISC-like instruction set

.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO

.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO

.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH

.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO

No Unit Used: IDLE, NOP

Page 80: Microprocessors

'C67x : Superset of Fixed-Point

.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
    Floating-point additions: ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP

.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
    Floating-point additions: ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP

.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
    Floating-point additions: MPYSP, MPYDP, MPYI, MPYID

.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), LDDW, MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO

No Unit Used: IDLE, NOP

Page 81: Microprocessors

'C64x → Superset of 'C62x

.S Unit: PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4, SADD2, SADDUS2, SADD4, ANDN, SHR2, SHRU2, SHLMB, SHRMB, CMPEQ2, CMPEQ4, CMPGT2, CMPGT4, BDEC, BPOS, BNOP, ADDKPC

.L Unit: ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4, ANDN, SHLMB, SHRMB, MVK (5-bit), PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4

.D Unit: ADD2, SUB2, AND, ANDN, OR, XOR, ADDAD, LDDW, LDNW, LDNDW, STDW, STNW, STNDW, MVK (5-bit)

.M Unit: AVG2, AVG4, ROTL, SSHVL, SSHVR, MVD, BITC4, BITR, DEAL, SHFL, MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2/SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4, XPND2/4

Also new in the 'C64x:
• Double-size register sets (A16-A31) (B16-B31)
• Advanced instruction packing (minimizes code-size)
• Advanced emulation features

Page 82: Microprocessors

Different Needs? Multiple Families!

C2000 (C20x/24x/28x): Lowest Cost
Earlier families: 'C1x, 'C2x
Control Systems
• Motor Control
• Storage
• Digital Ctrl Systems

C5000 (C54x/55x): Efficiency
Earlier family: 'C5x
Best MIPS per Watt / Dollar / Size
• Wireless phones
• Internet audio players
• Digital still cameras
• Modems
• Telephony
• VoIP

C6000 (C62x/64x/67x): Max Performance with Best Ease-of-Use
Earlier families: 'C3x, 'C4x, 'C8x
Multi Channel and Multi Function App's
• Comm Infrastructure
• Wireless Base-stations
• DSL
• Imaging
• Multi-media Servers
• Video

Page 83: Microprocessors

C6000 Roadmap

[Chart: performance versus time; all devices software compatible]
• 1st generation, C62x™: C6201, C6202, C6203, C6204, C6205, C6211
• 1st generation, C67x™ (floating point): C6701, C6711, C6712
• 2nd generation, C64x™ DSP: C6414 (general purpose), C6415 (media gateway), C6416 (3G wireless infrastructure)
• Highest performance: C64x™ DSP at 1.1 GHz; multi-core

Page 84: Microprocessors

For More Information . . .

Internet
Website: www.ti.com, dspvillage.com
FTP: ftp://ftp.ti.com/pub/tms320bbs
FAQ: http://www-k.ext.ti.com/sc/technical_support/knowledgebase.htm
• Device information
• TI & ME
• Application notes
• News and events
• Technical documentation
• Training

USA - Product Information Center ( PIC )
Phone: 972-644-5580
Email: [email protected]
• Information and support for all TI Semiconductor products/tools
• Submit suggestions and errata for tools, silicon and documents

Other Resources
Software Registration/Upgrades: 972-293-5050
Hardware Repair/Upgrades: 281-274-2285
Enroll in Technical Training: www.ti.com/sc/training (choose Design Workshops)

Page 85: Microprocessors

Key C6000 Manuals

Hardware
• SPRU189 - CPU and Instruction Set Ref. Guide
• SPRU190 - Peripherals Ref. Guide
• SPRU401 - Peripherals Chip Support Lib. Ref.

Software
• SPRU198 - Programmer's Guide
• SPRU303 - C6000 DSP/BIOS User's Guide

Code Generation
• SPRU186 - Assembly Language Tools User's Guide
• SPRU187 - Optimizing C Compiler User's Guide

Refer to the C6000 Family Update handout for the full list.

Page 86: Microprocessors

Looking for Literature on DSP?

• “A Simple Approach to Digital Signal Processing” by Craig Marven and Gillian Ewers; ISBN 0-4711-5243-9
• “DSP Primer (Primer Series)” by C. Britton Rorabaugh; ISBN 0-0705-4004-7
• “A DSP Primer : With Applications to Digital Audio and Computer Music” by Ken Steiglitz; ISBN 0-8053-1684-1
• “DSP First : A Multimedia Approach” by James H. McClellan, Ronald W. Schafer, Mark A. Yoder; ISBN 0-1324-3171-8

Page 87: Microprocessors

Looking for Literature on 'C6000 DSP?

• “Digital Signal Processing Implementation using the TMS320C6000™ DSP Platform” by Naim Dahnoun; ISBN 0201-61916-4
• “C6x-Based Digital Signal Processing” by Nasser Kehtarnavaz and Burc Simsek; ISBN 0-13-088310-7

Page 88: Microprocessors

Embedded System Design

[Diagram: MPU and FPGA; ASIC and FPGA]

Page 89: Microprocessors

Embedded System Design

[Diagram: MPU and FPGA]

Page 90: Microprocessors

Microprocessor Unit (MPU)

• Simple – not much area of FPGA
    - Probably not a 32-bit RISC
    - Maybe an 8-bit or 16-bit stack-based design
• Fast
    - Single clock cycle instructions where possible
• Easy to program
    - Probably not C or C++
    - Maybe FORTH or WHYP
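To give a feel for what a stack-based design means for the programmer, here is a minimal Python sketch of a Forth-like evaluator. It is illustrative only: the word set (`+`, `*`, `DUP`, `SWAP`) is a small hypothetical subset and is not the WHYP instruction set.

```python
def run_stack_program(program):
    """Execute a list of Forth-like tokens on a data stack and return the stack."""
    stack = []
    for token in program:
        if isinstance(token, int):
            stack.append(token)              # literal: push onto the data stack
        elif token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "DUP":
            stack.append(stack[-1])          # duplicate the top of the stack
        elif token == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            raise ValueError(f"unknown word: {token}")
    return stack

# "3 4 + DUP *" computes (3 + 4) squared.
result = run_stack_program([3, 4, "+", "DUP", "*"])
print(result)
```

Operands are implicit (always the top of the stack), so instructions need no register fields. This is one reason a stack machine can stay small in an FPGA.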

Page 91: Microprocessors

Embedded System Design

[Diagram: MPU and FPGA; ASIC and FPGA]

Goal: Design entire embedded system as a single FPGA.
• Use VHDL to design all hardware including the MPU.
• Write the MPU software in WHYP and compile to VHDL.

Page 92: Microprocessors

Potential Advantages

• Minimize overall system cost
• Minimize development time
• Minimize per unit cost

Implement only the hardware and software needed for a particular design

Page 93: Microprocessors

Xilinx Spartan FPGAs

Page 94: Microprocessors

Xilinx XC4000E FPGAs

Page 95: Microprocessors

Xilinx Spartan-II FPGAs