TMS320C54x DSP processor. Class presentation of the Custom DSP Implementation course, May 2005. Presented by: Shahab adin Rahmanian, ECE Department, University of Tehran. This is a class presentation. All data are copyrights of their respective authors as listed in the references and have been used here for educational purposes only.

Dec 28, 2015


Sheila Little
Transcript
Page 1:

May 2005

Presented by: Shahab adin Rahmanian

ECE Department – University of Tehran

Class Presentation of Custom DSP Implementation Course on: TMS320C54x DSP processor

This is a class presentation. All data are copyrights of their respective authors as listed in the references and have been used here for educational purposes only.

Page 2:

Outline

• Introduction
• Architecture
• Applications
• Features
• Instruction Set and Addressing
• FIR Filtering
• Accelerating Polynomial Evaluation
• Numerical Issues
• Write code in C
• Conclusion

Page 3:

Introduction

[2]

Page 4:

TMS320C54x

• A fixed-point digital signal processor (DSP) in the TMS320 family
• Low-power DSP: 0.54 mW/MIPS
• Acceleration for FIR and LMS filtering, code book search, polynomial evaluation, Viterbi decoding, and fast Fourier transform

[4]

Page 5:

Some Typical Applications

• General-Purpose
  – Adaptive filtering
  – Digital filtering
  – Fast Fourier transforms
• Control
  – Disk drive control
  – Laser printer control
  – Robotics control
• Military
  – Missile guidance
  – Radar processing
  – Secure communication
• Telecommunications
  – 1200- to 19200-bps modems
  – Adaptive equalizers
  – Cellular telephones
  – Echo cancellation
  – Video conferencing

Page 6:

Software Applications

• Circular Buffers
• Single-Instruction Repeat (RPT) Loops
• Extended-Precision Arithmetic
  – Addition and Subtraction
  – Multiplication
  – Division
  – Square Root
• Floating-Point Arithmetic
• Application-Oriented Operations
  – Symmetric FIR Filters
  – Adaptive Filtering
  – Viterbi Algorithm for Channel Decoding
• Fast Fourier Transforms

Page 7:

Some key features

• CPU
  – Advanced multibus architecture with three separate 16-bit data buses and one program bus
  – 40-bit arithmetic logic unit (ALU), including a 40-bit barrel shifter and two independent 40-bit accumulators
  – 17-bit × 17-bit parallel multiplier coupled to a 40-bit dedicated adder for non-pipelined single-cycle multiply/accumulate (MAC) operation
• Memory
  – 192K words × 16-bit maximum addressable memory space (64K words program, 64K words data, and 64K words I/O)
  – 28K words × 16-bit single-access on-chip ROM with 8K words configurable as program or data memory (’C541 only)

Page 8:

Some key features

• On-chip peripherals
  – On-chip phase-locked loop (PLL) clock generator with internal oscillator or external clock source
  – Two full-duplex serial ports to support 8- and 16-bit transfers (’C541 only)
  – Time-division multiplexed (TDM) serial port (’C542/’C543 only)
  – One 16-bit timer
• Speed: 25/20-ns execution time for a single-cycle fixed-point instruction (40 MIPS/50 MIPS) with 5-V power supply

Page 9:

C54x Addressing Modes

• Immediate
  – Operand is part of the instruction
      ADD #0FFh
• Absolute
  – Address of operand is part of the instruction
      LD *(LABEL), A
• Register
  – Operand is specified in a register
      READA DATA ; (data read from address in accumulator A)

Page 10:

C54x Addressing Modes

• Direct
  – Address of operand is part of the instruction (added to implied memory page)
      ADD 010h,A
• Indirect
  – Address of operand is stored in a register
      ADD *AR1
  – Offset addressing
  – Register offset (ar1+ar0)
  – Autoincrement/decrement
  – Bit-reversed addressing
  – Circular addressing
  Examples of the indirect variants, as listed on the slide:
      ADD *AR1(10)
      ADD *AR1+0
      ADD *AR1+B
      ADD *AR1+
      ADD *AR1+0B
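The circular and bit-reversed modes listed above are easiest to see as index arithmetic. The following C sketch models only the resulting index sequence, not the hardware mechanism (on the 'C54x the wrap is controlled by the BK register and the bit-reversed step by AR0). The function names `circ_advance` and `bit_reverse` are hypothetical, and `circ_advance` assumes the step is no larger in magnitude than the buffer size.

```c
/* Advance a circular-buffer index by `step` (positive or negative),
   wrapping within [0, size). Assumes |step| <= size. */
unsigned circ_advance(unsigned index, int step, unsigned size)
{
    int next = (int)index + step;
    if (next >= (int)size) {
        next -= (int)size;      /* wrapped past the end */
    } else if (next < 0) {
        next += (int)size;      /* wrapped past the start */
    }
    return (unsigned)next;
}

/* Reverse the low `bits` bits of `index`, giving the access order
   used when unscrambling a radix-2 FFT of length 2^bits. */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}
```

With a 4-entry buffer, stepping forward from index 3 lands on index 0, which is why the FIR code later in the deck never has to reinitialize its address registers.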

Page 11:

C54x Instruction Set by Category

Logical: AND, BIT, BITF, CMPL, CMPM, OR, ROL, ROR, SFTA, SFTC, SFTL, XOR

Arithmetic: ADD, MAC, MAS, MPY, NEG, SUB, ZERO

Data Management: LD, MAR, MV(D,K,M,P), ST

Program Control: B, BC, CALL, CC, IDLE, INTR, NOP, RC, RET, RPT, RPTB, RPTZ, TRAP, XC

Application Specific: ABS, ABDST, DELAY, EXP, FIRS, LMS, MAX, MIN, NORM, POLY, RND, SAT, SQDST, SQUR, SQURA, SQURS

Notes
CMPL: complement
CMPM: compare memory
MAR: modify address reg.
MAS: multiply and subtract

Page 12:

Block FIR Filtering

• y[n] = h0 x[n] + h1 x[n-1] + ... + hN-1 x[n-(N-1)]
  – h stored as linear array of N elements (in prog. mem.)
  – x stored as circular array of N elements (in data mem.)

; Addresses: a4 h, a5 N samples of x, a6 input buffer, a7 output buffer
; Modulo addressing prevents need to reinitialize regs each sample
; Moving filter coefficients from program to data memory is not shown
firtask: ld #firDP,dp            ; initialize data page pointer
         stm #frameSize-1,brc    ; compute 256 outputs
         rptbd firloop-1
         stm #N,bk               ; FIR circular buffer size
         ld *ar6+,a              ; load input value to accumulator a
         stl a,*ar4+%            ; replace oldest sample with newest
         rptz a,#(N-1)           ; zero accumulator a, do N taps
         mac *ar4+0%,*ar5+0%,a   ; one tap, accumulate in a
         sth a,*ar7+             ; store y[n]
firloop: ret
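For readers less familiar with the assembly, the same per-sample computation can be sketched in C. This is an illustrative model rather than a translation of the routine above: `fir_state`, `fir_step`, and `N_TAPS` are hypothetical names, the data are assumed to be Q15 with fractional mode on (each product doubled), a 64-bit integer stands in for the 40-bit accumulator, and the final shift assumes the compiler performs an arithmetic right shift on signed values.

```c
#include <stdint.h>

#define N_TAPS 4   /* hypothetical filter length */

typedef struct {
    int16_t  x[N_TAPS];  /* circular delay line of past inputs */
    unsigned newest;     /* index of the most recent sample */
} fir_state;

/* Process one input sample and return one output sample (Q15). */
int16_t fir_step(fir_state *s, const int16_t h[N_TAPS], int16_t in)
{
    /* Overwrite the oldest sample with the newest one. */
    s->newest = (s->newest + 1) % N_TAPS;
    s->x[s->newest] = in;

    int64_t  acc = 0;            /* models the wide accumulator */
    unsigned idx = s->newest;    /* starts at x[n] */
    for (unsigned k = 0; k < N_TAPS; k++) {
        /* One tap: h[k] * x[n-k], doubled as in fractional mode. */
        acc += 2 * (int64_t)h[k] * s->x[idx];
        idx = (idx + N_TAPS - 1) % N_TAPS;   /* step back one sample */
    }
    return (int16_t)(acc >> 16); /* keep the high word, like STH */
}
```

The modulo updates of `newest` and `idx` play the role of the `%` circular addressing in the assembly, and the inner loop corresponds to the repeated MAC.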

Page 13:

Accelerating Symmetric FIR Filtering

[Figure labels: x in two circular buffers; h in program memory]

• Coefficients in linear phase filters are either symmetric or anti-symmetric
• Symmetric coefficients: 2 multiplies and 3 adds
    y[n] = h0 x[n] + h1 x[n-1] + h1 x[n-2] + h0 x[n-3]
    y[n] = h0 (x[n] + x[n-3]) + h1 (x[n-1] + x[n-2])
• Accelerated by FIRS (FIR Symmetric) instruction
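The saving from symmetry can be checked with a small C sketch that computes the 4-tap example both ways. The function names `fir4_direct` and `fir4_folded` are hypothetical, and plain integer arithmetic is used (no Q15 scaling) so that the two forms can be compared exactly.

```c
#include <stdint.h>

/* x[0] = x[n], x[1] = x[n-1], x[2] = x[n-2], x[3] = x[n-3] */

/* Direct form: 4 multiplies, 3 adds. */
int64_t fir4_direct(const int16_t x[4], int16_t h0, int16_t h1)
{
    return (int64_t)h0 * x[0] + (int64_t)h1 * x[1]
         + (int64_t)h1 * x[2] + (int64_t)h0 * x[3];
}

/* Folded form: add the samples that share a coefficient first,
   leaving 2 multiplies and 3 adds. */
int64_t fir4_folded(const int16_t x[4], int16_t h0, int16_t h1)
{
    int32_t outer = (int32_t)x[0] + x[3];   /* pair for h0 */
    int32_t inner = (int32_t)x[1] + x[2];   /* pair for h1 */
    return (int64_t)h0 * outer + (int64_t)h1 * inner;
}
```

The pre-add followed by a multiply-accumulate in `fir4_folded` is the pattern that FIRS performs per repeat, which is why the symmetric routine needs only N/2 iterations.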

Page 14:

Accelerating Symmetric FIR Filtering

; Addresses: a6 input buffer, a7 output buffer
; a4 array with x[n-4], x[n-3], x[n-2], x[n-1] for N = 8
; a5 array with x[n-5], x[n-6], x[n-7], x[n-8] for N = 8
; Modulo addressing prevents need to reinitialize regs each sample
firtask: ld #firDP,dp                 ; initialize data page pointer
         stm #frameSize-1,brc         ; compute 256 outputs
         rptbd firloop-1
         stm #N/2,bk                  ; FIR circular buffer size
         ld *ar6+,b                   ; load input value to accumulator b
         mvdd *ar4,*a5+0%             ; move old x[n-N/2] to new x[n-N/2-1]
         stl b,*ar4%                  ; replace oldest sample with newest
         add *a4+0%,*a5+0%,a          ; a = x[n] + x[n-N/2-1]
         rptz b,#(N/2-1)              ; zero accumulator b, do N/2-1 taps
         firs *ar4+0%,*ar5+0%,coeffs  ; b += a * h[i], do next a
         mar *+a4(2)%                 ; to load the next newest sample
         mar *ar5+%                   ; position for x[n-N/2] sample
         sth b,*ar7+
firloop: ret

Page 15:

Architecture - FIRS

Page 16:

Accelerating Polynomial Evaluation

• Function approximation and spline interpolation
• Fast polynomial evaluation (N coefficients)
  – y(x) = c0 + c1 x + c2 x^2 + c3 x^3 (expanded form)
  – y(x) = c0 + x (c1 + x (c2 + x (c3))) (Horner’s form)
  – POLY reduces 2N cycles using MAC+ADD to N cycles

; ar2 contains address of array [c3 c2 c1 c0]
; poly uses temporary register t for multiplicand x
; first two times poly instruction executes gives
;   1. a = c(3) + x * 0 = c(3); b = c2
;   2. a = c(2) + x * c(3); b = c1
         ld *ar2+,16,b   ; b = c3 << 16
         ld *ar3,t       ; t = x (ar3 contains addr of x)
         rptz a,#3       ; a = 0, repeat next inst. 4 times
         poly *ar2+      ; a = b + x*a || b = c(i-1) << 16
         sth a,*ar4      ; store result (ar4 is addr of y)
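Horner's form, which the POLY loop above iterates, can be written as a short C sketch. It uses `double` purely for readability, whereas the 'C54x works on fixed-point fractions; the function name `poly_horner` is hypothetical, and the coefficient order (highest power first) is chosen to match the [c3 c2 c1 c0] array in the assembly comments.

```c
/* Evaluate a polynomial by Horner's rule.
   c[0] is the highest-order coefficient, c[n-1] is the constant term. */
double poly_horner(const double *c, int n, double x)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        acc = acc * x + c[i];   /* one multiply and one add per coefficient */
    }
    return acc;
}
```

For the coefficients {1, 2, 3, 4} and x = 2 this evaluates 1·8 + 2·4 + 3·2 + 4. Each loop iteration corresponds to one execution of POLY.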

Page 17:

Integer Multiplication

• Integer multiplication yields products larger than the inputs. With single-digit decimal inputs, for example, 9 × 9 = 81.
• Does the user store the lower (1) or upper (8) result?
  – Both must be kept, resulting in additional resources (two cycles, words of code, and RAM locations) to complete the store.
  – Worse, how can the double-sized result be used recursively as an input in later calculations, given that the multiplier inputs are single-width?

Page 18:

Fractional Multiplication

• Multiplication of fractions yields products that never exceed the range of a fraction. With single-digit decimal fractions as inputs, for example, .9 × .9 = .81.
• Don’t we still have a double-sized result to store?
  – In this case, we can store just the upper result (.8)
  – This allows storage of the result with fewer resources
  – Results may be used recursively
• Has accuracy been lost by dropping the lower accumulator value?

Page 19:

Accuracy vs. Precision

• Often the programmer wants to retain the fullest accuracy of a calculation, so dropping the 16 LSBs of the result in the previous example seems a bad choice.
• Note the inputs, though: how much accuracy do they offer?
• The product offers double precision, but its accuracy is based on the single-width inputs.
• Thus, storing a single-precision result is not only an efficient solution, but also represents the limit of the accuracy of the result.
• The accumulator is double-sized for two reasons:
  – To allow for integer operations, which may require the LSBs for the result.
  – So that sum-of-product operations generate cumulative noise at the 32nd bit rather than the 16th bit.

Page 20:

Redundant Sign Bit

• Multiplication of two signed numbers yields a product with two sign bits
• Extra sign bit causes problems if stored to memory as result:
  – Wastes space
  – Creates off-size Q
• Solution: Fractional mode bit!
• When FRCT (mode bit in ST1) is set, the multiplier output is left-shifted by one
• For 16-bit ‘C54x: Q15 × Q15 = Q15
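The effect of the fractional-mode shift can be modeled in C as follows. This is a sketch under stated assumptions: the name `q15_mul` is hypothetical, a 64-bit integer stands in for the 40-bit accumulator, the right shift on a signed value is assumed to be arithmetic (as it is with common compilers), and the one input pair whose product does not fit in Q15, (-1) × (-1), is excluded.

```c
#include <stdint.h>

/* Multiply two Q15 fractions and return a Q15 fraction.
   Not valid for a == b == -32768, whose true product is +1.0. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int64_t product = (int64_t)a * b;   /* Q30: two sign bits */
    int64_t shifted = product * 2;      /* FRCT: shift left by one, now Q31 */
    return (int16_t)(shifted >> 16);    /* store the high word: Q15 */
}
```

For example, 0.5 × 0.5 is `q15_mul(16384, 16384)`, which yields 8192, that is 0.25 in Q15. Without the left shift the stored high word would be 4096, an "off-size" Q14 result.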

Page 21:

Accumulation

• With fractions, we were able to guarantee that no multiplicative overflow could occur, i.e., F*F<=F.
• For addition, this rule does not apply, i.e., F+F>F.
• Therefore, we need additional measures to manage the possibility of overflow for accumulation. Two general methods apply:
  – Guard Bits: the ‘C54x offers an 8-bit extension above the high accumulator to allow valid representation of the result of up to 256 summations.
  – Non-gain Systems: offer additional criteria that allow a simple solution for unlimited-length summations.

Page 22:

Guard Bits and saturation

• Guard Bits: the ‘C54x offers an 8-bit extension above the high accumulator to allow valid representation of the result of up to 256 summations.
• Saturation (SAT)
  – SAT instruction saturates a value exceeding the 32-bit range in the selected accumulator:
      SAT A
      SAT B
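The interplay of guard bits and saturation can be illustrated with a C model in which a 64-bit integer stands in for the 40-bit accumulator. The helper names `fits_in_40_bits` and `sat32` are hypothetical; `sat32` only mirrors the clamping behavior described above and is not a cycle-accurate model of SAT.

```c
#include <stdint.h>

/* True if `value` is representable in a 40-bit two's complement
   accumulator (32 bits plus 8 guard bits). */
int fits_in_40_bits(int64_t value)
{
    const int64_t limit = (int64_t)1 << 39;
    return value >= -limit && value <= limit - 1;
}

/* Clamp an accumulator value to the signed 32-bit range. */
int64_t sat32(int64_t acc)
{
    if (acc > INT32_MAX) {
        return INT32_MAX;   /* positive overflow */
    }
    if (acc < INT32_MIN) {
        return INT32_MIN;   /* negative overflow */
    }
    return acc;             /* already in range */
}
```

Summing 256 copies of the largest 32-bit value still fits in 40 bits, which is where the "up to 256 summations" figure comes from. One more addition can exceed the guard bits.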

Page 23:

Non-gain Systems

• Many systems can be modeled to have no DC gain:
  – Filters with low Q.
  – Any system scaled by its maximum gain value.
• Input values from A/D converters are automatically fractions, if the limits of the A/D are presumed to be +/-1
• Coefficient values can similarly be bounded by making the largest value the scaling factor for all other values.
• For these systems, it is known that the final value of the process is less than or equal to the input values.
• The accumulator can therefore be allowed to temporarily overflow, since the final result is known to be bounded by +/-1.
• Allows maximum usage of selected A/D and D/A converters
  – D/A bits for gain are more expensive than using analog components
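Why a temporary overflow is harmless can be shown with wraparound (modulo 2^32) arithmetic in C. The sketch below uses unsigned addition, which wraps by definition in C, to model a two's complement accumulator without saturation. The function name `wrap_sum` is hypothetical, and the final conversion is only well defined when the true sum fits in 32 bits, which is exactly the non-gain assumption.

```c
#include <stdint.h>

/* Sum `n` signed values with wraparound on overflow.
   Intermediate sums may leave the int32_t range; the result is still
   correct as long as the true final sum fits in int32_t. */
int32_t wrap_sum(const int32_t *values, int n)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (uint32_t)values[i];   /* modulo 2^32 addition */
    }
    return (int32_t)acc;
}
```

Adding INT32_MAX, then 10, then -20 overflows after the second term, yet the returned value is INT32_MAX - 10, the exact sum.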

Page 24:

Division

• The ‘C54x does not have a single-cycle 16-bit divide instruction
  – Divide is a rare function in DSP
  – Division hardware is expensive
• The ‘C54x does have a single-cycle 1-bit divide instruction: conditional subtract, or SUBC
  – Preceded by RPT #15, a 16-bit divide is performed
  – Is much faster than without SUBC
• The SUBC process operates only on unsigned operands, thus software must:
  – Compare the signs of the input operands
    • If they are alike, plan a positive quotient
    • If they differ, plan to negate (NEG) the quotient
  – Strip the signs of the inputs
  – Perform the unsigned division
  – Attach the proper sign based on the comparison of the inputs

Page 25:

Division Routine

[Code listing not preserved; annotations from the slide:]
• B = num*den (tells sign)
• Strip sign of numerator
• Strip sign of denominator
• 16 iterations of 1-bit divide
• If result needs to be negative, invert sign
• Store negative result
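The steps annotated above can be sketched in C. This models the conditional-subtract step as it is documented for SUBC (subtract the divisor shifted left by 15; on a non-negative result keep it, shift left, and set the new low bit; otherwise just shift left). It is an illustration, not the routine from the slide: the names `subc_divide` and `signed_divide` are hypothetical, the divisor is assumed to be nonzero, and neither input may be -32768.

```c
#include <stdint.h>

/* Unsigned 16-bit divide built from 16 conditional-subtract steps.
   After the loop the low word of `acc` holds the quotient and the
   high word holds the remainder. */
uint16_t subc_divide(uint16_t num, uint16_t den, uint16_t *rem)
{
    int64_t acc = num;
    for (int i = 0; i < 16; i++) {            /* like RPT #15 then SUBC */
        int64_t diff = acc - ((int64_t)den << 15);
        if (diff >= 0) {
            acc = (diff << 1) + 1;            /* quotient bit = 1 */
        } else {
            acc = acc << 1;                   /* quotient bit = 0 */
        }
    }
    *rem = (uint16_t)((acc >> 16) & 0xFFFF);
    return (uint16_t)(acc & 0xFFFF);
}

/* Signed divide: decide the sign, divide magnitudes, reattach the sign. */
int16_t signed_divide(int16_t num, int16_t den)
{
    int32_t  sign = (int32_t)num * den;       /* negative if signs differ */
    uint16_t n = (uint16_t)(num < 0 ? -num : num);
    uint16_t d = (uint16_t)(den < 0 ? -den : den);
    uint16_t rem;
    uint16_t q = subc_divide(n, d, &rem);
    return (int16_t)(sign < 0 ? -(int)q : (int)q);
}
```

Dividing 7 by 2 this way leaves 3 in the low word and 1 in the high word, matching quotient and remainder.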

Page 26:

Rounding

• The result of a multiplication can be rounded for MPY, MAC, and MAS operations. This is specified by appending an “R” suffix to the instruction.
• Example: MAC with rounding is MACR. Rounding consists of adding 2^15 to the result and then clearing the low accumulator.
• In a long sum-of-products, only the last MAC operation should specify rounding.
• Rounding can also be achieved with a load operation.
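The rounding rule stated above (add 2^15, then clear the low accumulator) is short enough to write out directly. In this C sketch a 64-bit integer stands in for the 40-bit accumulator and the name `round_acc` is hypothetical.

```c
#include <stdint.h>

/* Round an accumulator value to the nearest multiple of 2^16:
   add half of the low word's range, then clear the low 16 bits. */
int64_t round_acc(int64_t acc)
{
    acc += (int64_t)1 << 15;          /* add 2^15 */
    return acc & ~(int64_t)0xFFFF;    /* clear the low accumulator */
}
```

A low word of 0x8000 or more carries into the high word and anything less is dropped, so the high word stored afterwards is the rounded 16-bit result.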

Page 27:

Sign Extension (SXM)

Page 28:

Write code in C

• Inline Assembly
  – Allows direct access to assembly language from C
  – Useful for operating on components not used by C
• Note: first column after leading quote is label field
• Long operations should be written in ASM and called from C
  – main C file retains portability
  – yields more easily maintained structures
  – eliminates risk of interfering with registers in use by C

Page 29:

Accessing MMRs from C

• Using pointers to access Memory-Mapped Registers:
  – Create a pointer and set its value to the assigned memory address:
      volatile unsigned int *SPC_REG = (volatile unsigned int *) 0x0022;
  – Read and write to the register as with any other pointer:
      *SPC_REG = 0xC8;
• Accessing I/O Ports from C
  – 1. Create the port:
      ioport unsigned port8000;
  – 2. Access the port:
      x = port8000;
      port8000 = y;

Page 30:

Summary and Conclusion

• C54x is a conventional digital signal processor
  – Separate data/program busses (3 reads & 1 write/cycle)
  – Extended precision accumulators
  – Single-cycle multiply-accumulate
  – Saturation and wraparound arithmetic
  – Bit-reversed and circular addressing modes
• C54x has instructions to accelerate algorithms
  – Communications: FIR & LMS filtering, Viterbi decoding
  – Speech coding: vector distances for code book search
  – Interpolation: polynomial evaluation

Page 31:

References

[1] Texas Instruments, TMS320C54x DSP Design Workshop, May 1997
[2] TMS320C54x User’s Guide
[3] www.ti.com
[4] Signal and Image Processing on the TMS320C54x DSP, by Prof. Brian L. Evans
[5] TMS320C54x Assembly Language Tools