
DSP Processors: VLIW processors, Ingrid Verbauwhede

May 16, 2020


  • 1 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    DSP Processors: VLIW processors

    Ingrid Verbauwhede Department of Electrical Engineering University of California Los Angeles

    ingrid@ee.ucla.edu

  • 2 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    References

    • I. Verbauwhede, C. Nicol, “Low Power DSP’s for Wireless Communications,” Proceedings ISLPED 2000, pp. 303-310.

    • P. Faraboschi, G. Desoli, J. Fisher, “The latest word in Digital and Media Processing,” IEEE Signal Processing Magazine, March 1998, pp. 59-85.

    • T. Kumura, M. Ikekawa, M. Yoshida, I. Kuroda, “VLIW DSP for Mobile Applications,” IEEE Signal Processing Magazine, Vol. 19, no. 4, July 2002, pp. 10-19.

    • R. Kolagotla, et al., “High performance Dual-MAC DSP architecture,” IEEE Signal Processing Magazine, Vol. 19, no. 4, July 2002, pp. 42-53. (This is the Blackfin!)

  • 3 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    The DSP Market Splits

    Today’s general purpose, assembly coded DSP: 100 MOPS, 250 mW, $40. The market splits into:

    • Low cost, low power DSPs (mobile terminals): 200-1000 MOPS, < 100 mW, $10

    • High performance DSPs (infrastructure): 1-10 GOPS, 1-5 watts, < $50

  • 4 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    BUT: DSP Software Development

    • Complex DSP architecture not amenable to compiler technology

    • Algorithms are modeled in high level language (e.g. C++)

    • Solutions are implemented and debugged in hand-optimized assembler - large development effort with minimal tool support

    [Figure: HLL algorithmic model -> prototype code -> production code (hand coded assembler, optimize & debug)]

    • Long, frustrating time to market

    • Fragile legacy code

    • Still used in handhelds, but change in basestations

  • 5 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    Mobile Wireless Evolution

    [Figure: four generations plotted against SERVICE and TECHNOLOGY axes.]

    • First Generation (Past): Mobile Telephone Service: Carphone; Analog Cellular Technology; Macrocellular Systems. Standards: NMT, TACS, Analog AMPS.

    • Second Generation (Now): Digital Voice and Messaging/Data Services; Fixed Wireless Loop; Digital Cellular Technology + IN emergence; Microcellular & Picocellular: capacity, quality; Enhanced Cordless Technology. Standards: GSM, IS-54/136 TDMA, IS-95/cdmaOne, PDC, DECT.

    • Third Generation (Year 2000-2005): Integrated High Quality Audio and Data; Narrowband and Broadband Multimedia Services + IN integration; Broader Bandwidth Efficient Radio Transmission; Information Compression; Higher Frequency Spectrum Utilization; IN + Network Management integration. Standards: WCDMA, UWC-136 TDMA, cdma2000.

    • Fourth Generation (Year 2010?): TelePresencing; Education, training and dynamic information access; Wireless-Wireline and Broadband Transparency; Knowledge-Based Network Operations; Unified Service Network.

    Global roaming

    We are entering the decade of wireless data communications - and World-War 3G

    Courtesy: Chris Nicol: Bell Labs Australia

  • 6 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    Mobile Data Services

    • Carriers invest > $500 per subscriber, but subscriber voice calls (and therefore revenues) are declining.

    • Data is currently 3% of wireless traffic, projected to reach > 50% by 2005.

    • Wireless Internet: average internet connection 30 mins.

    • Text Messaging: saturating 2G voice networks.

    2.5 Generation Mobile Standards [1]

    • GPRS: packet data over GSM, with timeslot multiplexing and multiple slots per user.

    • EDGE: 8-PSK modulation + GPRS, 384 kbps max to 1 user.

    3G - IMT2000 Proposals

    • 144 kbps automobile, 384 kbps pedestrian, 2 Mbps stationary.

    • Several proposals: UWC-136 (200 kHz, TDMA, 8-PSK = EDGE); UMTS and CDMA-2000 are both CDMA proposals.

    Courtesy: Chris Nicol: Bell Labs Australia

  • 7 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    Evolution of Mobile Wireless Network Architecture

    [Figure: 2G Network compared with IP-based 3G Network. Blocks shown: Radio Clients; Base Stations; BSC; MSC; Mobile Switches; Circuit Mode Servers (Voice, Low Speed Data, etc.); Packet Connectivity (ATM / IP); Packet Mode Servers (High Speed Data, Multimedia, Voice over IP, etc.); Wireless Control Servers (Feature Control, Network Management, Billing, etc.); Network Servers; Internet / Advanced Services; PSTN.]

    Mobile networks are being upgraded in preparation for the delivery of high speed data services.

    Courtesy: Chris Nicol: Bell Labs Australia

  • 8 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    2G Basestation Baseband Processing

    • Multiple DSPs used for baseband processing.

    • RISC Microcontroller for timing, framing, I/O control

    • Software upgradable over the network

    • DSPs dominate cost and power consumption

    [Figure: Tx/Rx baseband processing board for 2-carrier GSM basestation. Blocks shown: an array of DSPs (Channel Equalization, Channel De/coding, Encryption), RISC Micro Controller, RAM, I/O and I/O ASIC, T1/E1 interface, and AFEs on the Tx and Rx paths.]

    Future trend - integrate baseband processing - low cost Pico BTS

    Courtesy: Chris Nicol: Bell Labs Australia

  • 9 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    High Performance DSP Requirements

    • Very high levels of DSP integer performance

    • Scalability to meet wide range of cost, power, performance.

    • Large memory and I/O bandwidth.

    • Friendly, compiler driven, programming environment.

    • Support for complex real-time synchronous applications (latency, predictable throughput, synchronization)

    • Cost & power efficient solution.

    [Figure: Some DSP Applications. Required MOPS (log scale, 10 to 100K) versus year (1997, 1999, 2001). Applications plotted: V.34, GSM term, K56, PCS term, ADSL 500k, ADSL 6M, 24 ch. modem, DAB rcvr, 16 HR GSM, 1G eth. xcvr, set-top box, MPEGII encode, Soft radio, 3G Wireless, 3-D graphics?. A "traditional DSP" marker is also shown.]

    Courtesy: Chris Nicol: Bell Labs Australia

  • 10 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    Compiler Driven VLIW

    Large orthogonal register set, regular interconnect

    [Figure: Data memory and a single Register Array connected through an Interconnect to execution units ex1 (alu), ex2 (alu), ex3 (mpy), ex4 (ld/st), ..., exn (ld/st).]

    Instruction format: cond/branch | ex1 | ex2 | ex3 | ... | exn

    Atomic RISC-like operations => heavily pipelined, high freq. clock

  • 11 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    Explicitly Parallel Instruction Computing

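    [Figure: Execution Clusters. Data memory feeds two clusters, each with its own Register Array and Interconnect. Execution units shown: ex1 (alu), ex2 (alu), ex3 (ld/st), ex4 (alu), ex5 (mpy), ex6 (ld/st).]

    [Figure: Execution Sets. A fetch set marked with the bits 1 1 1 0 1 0 1 0, and an exec. set within it.]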
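One way to read the bit pattern 1 1 1 0 1 0 1 0 on the fetch set is as per-slot chaining bits. The sketch below assumes the convention used on the TI 'C6x, where a 1 means the next instruction issues in the same cycle and a 0 closes the current execute set. The function name `split_fetch_set` and the slot numbering are illustrative only and do not come from the slides.

```python
def split_fetch_set(chain_bits):
    """Group the instruction slots of one fetch set into execute sets.

    Assumption: chain_bits[i] == 1 means slot i+1 issues in the same
    cycle as slot i; a 0 closes the current execute set.
    """
    exec_sets = []
    current = []
    for slot, bit in enumerate(chain_bits):
        current.append(slot)
        if bit == 0:
            exec_sets.append(current)
            current = []
    if current:
        # Assumption: a trailing 1 is closed at the fetch-set boundary.
        exec_sets.append(current)
    return exec_sets


if __name__ == "__main__":
    sets = split_fetch_set([1, 1, 1, 0, 1, 0, 1, 0])
    print(sets)        # [[0, 1, 2, 3], [4, 5], [6, 7]]
    print(len(sets))   # 3 issue cycles for this fetch set
```

Under that assumption the eight fetched instructions issue as three execute sets of 4, 2 and 2 instructions, which is how a fixed-width fetch can carry variable-length parallel groups without padding NOPs.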

  • 12 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    Texas Instruments ‘C6201

    [Figure: two execution clusters, each with ALU, shift, mpy and add units, attached to Register Bank A (16 x 32) and Register Bank B (16 x 32); Instruction Dispatch & Decode fed over a 256-bit path from Program Memory (16K x 32); Data Memory (32K x 16).]

    8-way VLIW with two execution clusters

    256 bit (8x32) instruction fetch with variable length execute set

    Each 32 bit instruction individually predicated

    11 stage pipeline

    1600 MIPS, 400 MMACs @ 200 MHz
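The peak figures quoted for the 'C6201 follow from the issue width and the clock rate. The short calculation below is a sketch of that arithmetic. The constant names are illustrative, and it assumes each of the two multipliers sustains one multiply-accumulate per cycle, with the adds done on the other units.

```python
CLOCK_MHZ = 200      # clock rate from the slide
ISSUE_WIDTH = 8      # 8-way VLIW: up to 8 instructions per cycle
MULTIPLIERS = 2      # one mpy unit per execution cluster
FETCH_BITS = 256     # instruction fetch width
INSTR_BITS = 32      # size of one instruction

peak_mips = ISSUE_WIDTH * CLOCK_MHZ        # million instructions per second
peak_mmacs = MULTIPLIERS * CLOCK_MHZ       # million MACs per second (assumes 1 MAC per multiplier per cycle)
slots_per_fetch = FETCH_BITS // INSTR_BITS

if __name__ == "__main__":
    print(peak_mips, peak_mmacs, slots_per_fetch)   # 1600 400 8
```

These are peak numbers: they are reached only when every execute set is full, which is the situation in the single-cycle FIR inner loop.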

  • 13 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    FIR Filter on TI ‘C6x

    loop:        ldw   .d1t1  *a4++,a5
        ||       ldw   .d2t2  *b4++,b5
        || [b0]  sub   .s2    b0,1,b0
        || [b0]  b     .s1    loop
        ||       mpy   .m1x   a5,b5,a6
        ||       mpyh  .m2x   a5,b5,b6
        ||       add   .l1    a7,a6,a7
        ||       add   .l2    b7,b6,b7

    • Outer Loop: 23 cycles, 180 bytes – 1 cycle in inner loop

    • All 8 exec units used in inner loop - maximum efficiency – 2 MACs per cycle

    Hand-coded assembly: 32-tap FIR filter

    Assembly syntax more difficult to learn. Hard to get full use of all 8 execution units at once. Software pipelining difficult to implement, and requires longer prolog/epilog (larger code size).

    Courtesy: Gareth Hughes: Bell Labs Australia
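The inner loop above issues two multiplies (mpy, mpyh) and two accumulating adds per cycle, so each cycle consumes two taps. The model below is a plain Python sketch of that data flow, not a cycle-accurate simulation. The function name `fir_dot_dual_mac` is hypothetical, the mapping of even/odd taps onto the low/high 16-bit halves is an assumption, and 16-bit or 32-bit overflow behaviour is ignored.

```python
def fir_dot_dual_mac(samples, coeffs):
    """Dot product of one FIR output, two taps per step.

    Models the two accumulators (a7 and b7 in the assembly) that are
    summed after the loop. Returns (result, number_of_loop_steps).
    """
    if len(samples) != len(coeffs) or len(samples) % 2 != 0:
        raise ValueError("need equal, even-length inputs")
    acc_lo = 0   # products of the low halves  (mpy  + add on side A)
    acc_hi = 0   # products of the high halves (mpyh + add on side B)
    steps = 0
    for i in range(0, len(samples), 2):
        acc_lo += samples[i] * coeffs[i]
        acc_hi += samples[i + 1] * coeffs[i + 1]
        steps += 1
    return acc_lo + acc_hi, steps


if __name__ == "__main__":
    x = list(range(32))   # 32 input samples (illustrative values)
    h = [1] * 32          # 32 taps (illustrative values)
    print(fir_dot_dual_mac(x, h))   # (496, 16)
```

For a 32-tap filter the loop body runs 16 times, which matches the 2 MACs per cycle quoted for the hand-coded kernel once the software pipeline is full.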

  • 14 EE201A, Spring 2003, Lecture 18, Ingrid Verbauwhede

    Viterbi on TI ‘C6x

    LOOP:  [b1]   b      .s1   LOOP
        || [b1]   sub    .s2   b1,1,b1
        || [!a2]  sth    .d1   b12,*+a6[8]
        || [!a2]  add    .d2   b0,b14,b14
        ||        cmpgt  .l1   a11,a10,a1
        ||        cmpgt  .l2   b11,b10,b0
        ||        mpy    .m1x  1,b5,a4

           [a2]   sub    .s1   a2,1,a2
        || [!a2]  sth    .d1   a12,*a6++
        || [a1]   add    .s2   2,b0,b0
        || [b0]   mpy    .m2   1,b11,b12
        ||        mpy    .m1   1,a10,a12
        ||        sub    .l2x  a7,b5,b10
        ||        ldh    .d2   *++b9,b5

                  shl    .s2   b14,2,b14
        || [a1]   mpy    .m1   1,a11,a12
        ||        add    .s1   a7,a4,a10
        ||        sub    .l1x  b13,a4,a11
        ||        add    .l2   b13,b5,b11
        ||        mpy    .m2   1,b1