Top Banner
ICME 2004 © 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton University
61

ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Dec 30, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

ICME 2004 © 2004 Marilyn Wolf

Architectures for Multimedia Computing3: DSPs and Systems-on-Chips

Marilyn Wolf

Dept. of Electrical Engineering

Princeton University

Page 2: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

What is a DSP?

Traditional definition: Harvard architecture. Multiplier-accumulator.

Today: Marketing term (every CPU is a DSP).

Page 3: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

DSP goals

Real-time computation on streaming data. Low latency.

Small code size. Low energy/power.

Page 4: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

TI C54x architectural features

40-bit ALU + barrel shifter. Multiple internal busses: 1

instruction, 3 data, 4 address. 17 x 17 multiplier. Single-cycle exponent encoder. Two address generators with

dedicated registers.

Page 5: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

TI C54x instruction set features

Specialized instructions for Viterbi. Repeat and block repeat

instructions. Instructions that read 2, 3 operands

simultaneously. Conditional store. Fast return from interrupt.

Page 6: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Sample configurations

C5409: 100 MIPS @ 100 MHz. Address space sizes:

Data 64K words. Program 8M words.

On-board memory: RAM 16K words. ROM 32K words.

3 buffered serial ports. 1 timer.

C5510: 400 MIPS @ 200 MHz. Total address space:

8 M Words.

On-board memory: RAM 160K words. ROM 16K words.

3 McBSP serial ports. 2 timers.

Page 7: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Single-instruction repeat loop example

STM #4000h,AR2

; load pointer to source

STM #100h,AR3

; load pointer to destination

RPT #(1024-1)

MVDD *AR2+,*AR3+

; move

Page 8: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C55x organization

Instructionunit

Programflowunit

Addressunit

Dataunit

3 data read busses

3 data read address busses

program address bus

programread bus

2 data write busses

2 data write address busses

16

24

24

16

24

32

InstructionfetchData readfrom memory

D bus

Single operandread

C, D busses

Dual operandread

B bus

Dual-multiplycoefficientWrites

Page 9: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C55x low power features

Parallel execution units---longer idle shutdown times.

Multiple data widths: 16-bit ALU vs. 40-bit ALU.

Instruction caches minimizes main memory accesses.

Power management: Function unit idle detection. Memory idle detection. User-configurable IDLE domains allow

programmer control of what hardware is shut down.

Page 10: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C5x McBSP serial ports

Designed to interface to devices. Full, duplex; double-buffered transmission;

triple-buffered reception. Works with interrupts or DMA. Separate xmit, rcv clock, highly

programmable. -law, A-law companding. Direct interface to codecs, T1/E1 framers,

MVIP framers, etc.

Page 11: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

McBSP block diagram16-bit peripheral bus

Registersfor sync

control andmonitoring

Registers formultichannelcontrol andmonitoringco

mpa

nd

Data rcv, xmit clock

interrupt DMA

Clock/frame sync

Page 12: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Image/video hardware extensions

Available in 5509 and 5510. Equivalent C-callable functions for

other devices.

Available extensions: DCT/IDCT. Pixel interpolation Motion estimation.

Page 13: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

DCT/IDCT

2-D DCT/IDCT is computed from two 1-D DCT/IDCT. Put data in

different banks to maximize throughput.

block

Column DCT

interim DCTRowDCT

Page 14: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C55 DCT/IDCT coprocessor extensions

Load, compute, transfer to accumulators: ACy=copr(k8,ACx,Xmem,Ymem)

Compute, transfer, mem write: ACy=copr(k8,ACx,ACy), Lmem=ACz

Special: ACy=copr(k8,ACx,ACy)

Page 15: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Software pipelined load/compute/store for DCT

Iteration i-1

Dual_load

8compute

4Long_store

4 empty

3Dual_load

empty

Iteration i

Dual_load

8compute

4Long_store

4 empty

3Dual_load

empty

Iteration i+1

Dual_load

8compute

4Long_store

4 empty

3Dual_load

empty

op_i(0), load_i+1(0,1)op_i(1), store_i-1(0,1)op_i(2), store_i-1(2,3)op_i(2), store_i-1(4,5)op_i(2), store_i-1(6,7)op_i(2), load_i+1(2,3)…

Page 16: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C55 motion estimation

Search strategy: Full vs. non-full.

Accuracy: Full-pixel vs. half-pixel.

Number of returned motion vectors: 1 (one 16x16) vs. 4 (four 8x8).

Algorithms: 3-step algorithm (distance 4,2,1). 4-step algorithm (distance 8,4,2,1). 4-step with half-pixel refinement.

Page 17: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Four-step motion estimation breakdown

d = {8,4,2,1};

for (i=0; i<4; i++) {

compute 3 upper differences for d[i];

compute 3 middle differences for d[i];

compute 3 lower differences for d[i];

compute minimum value;

move to next d;

}

X

Page 18: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C55 motion estimation accelerator

Includes 3 16-bit pixel data paths, 3 16-bit absolute differences (ADs).

Basic operation: [ACx,ACy] =

copr(k8,ACx,ACy,Xmem,Ymem,Coeff) K8 = control bits (enable AD units, etc.) ACx, ACy = accumulated absolute differences Xmem, Ymem = pointers to odd, even lines of

the search window Pointer to two adjacent pixels from reference

window

Page 19: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C55 pixel interpolation

Given four pixels A, B, C, D, interpolate three half-pixels:

A B

C D

U

M R

Page 20: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Pixel interpolation coprocessor operations

Load pixels and compute: ACy=copr(k8,AC,Lmem)

Load pixels, compute, and store: ACy=copr(k8,AACx,Lmem) ||

Lmem=ACz

Page 21: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Motorola 56301

24-bit word. 80 MIPS @ 80 MHz, 100 MIPS @ 100 MHz. 24x24 multiplier, 56-bit barrel shifter. On-chip instruction cache controller. Nested hardware DO loops. On-chip memory:

RAM 1024 -> 4096 words. X data RAM 2048 -> 3072 words. Y data RAM 2048 -> 3072 words.

Page 22: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

TI C62/C67

Up to 8 instructions/cycle.32 32-bit registers.Function units:

Two multipliers.Six ALUs.

All instructions execute conditionally.

Page 23: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

TI C6x data operations

8/16/32-bit arithmetic.40-bit operations.Bit manipulation operations.

Page 24: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C6x system

On-chip RAM.32-bit external memory: SDRAM,

SRAM, etc.Host port.Multiple serial ports.Multichannel DMA.32-bit timer.

Page 25: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C6x block diagram

Data path 1/Reg file 1

Data path 2/Reg file 2

Execute DMA

timers

Serial

Program RAM/cache512K bits

Data RAM512K bits

JTAG

PLL

bus

Page 26: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C6x data paths

General-purpose register files (A and B, 16 words each).

Eight function units:.L1, .L2, .S1, .S2, .M1, .M2, .D1, .D2

Two load units (LD1, LD2).Two store units (ST1, ST2).Two register file cross paths (1X and 2X).Two data address paths (DA1 and DA2).

Page 27: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

C6x function units

.L 32/40-bit arithmetic. Leftmost 1 counting. Logical ops.

.S 32-bit arithmetic. 32/40-bit shift and 32-bit field. Branches. Constants.

.M 16 x 16 multiply.

.D 32-bit add, subtract, circular address. Load, store with 5/15-bit constant offset.

Page 28: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Motorola Starcore SC140

DALU includes 4 ALUs, 1 register file. AGU includes 2 address arithmetic

units (AAU), 1 address register file. Program sequencer and control unit

(PSEQ). Performance:

4 million MACs per second per MHz clock. 10 RISC MIPS per MHz clock.

Page 29: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Typical SC140 configuration

Level 1 memory expansionRAM, ROM

peripherals

DMA,Cache,

Interrupts,Level 2 mem,

Etc.

Program sequencer

ALUALUALUALU

AAUAAU

Page 30: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

SC140 Core

Programsequencer

Powermgt

Clock/PLL

AGU DALU

AddressRegister file

2 AAUs BMU

DataRegister file

4 ALUs

Page 31: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Instruction format

16-bit instructions. Up to 6 instructions per cycle. Instructions are grouped to define

allowable simultaneous operations.

MACR –D0,D1,D7 AND D4,D5 MOVE.L (R0),+N0,R6 ADDA R2,R3

DALU AGU

Page 32: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

AGU addressing

Allowable addressing modes: Linear. Modulo. Reverse-carry. Automatic updating during register

indirect. Stack.

Array addressing: base, offsett, modifier registers.

Page 33: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Motorola MC8126 multiprocessor

Four SC140 cores. 224 KB M1 memory with zero wait states.

Shared M2 memory. 476 KB running at core frequency. M2-accessible multi-core bus.

64/32-bit data, 32-bit address 60x bus. Eight-bank memory controller. Hardware semaphores.

Page 34: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

TM-1 characteristics

Characteristics Floating point

support Sub-word

parallelism support If Conversion Additional custom

operations

Page 35: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Trimedia TM-1

memory interface

video in

audio in

I2C

timers

image co-p

PCI

video out

audio out

serial

VLD co-p

VLIW CPU

Page 36: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

TM-1 VLIW CPU

register file

read/write crossbar

FU1 FU27

slot 1 slot 2 slot 3 slot 4 slot 5

...

Page 37: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Embedded vs. scientific applications

Embedded applications provide task-level parallelism.

Embedded applications run many different types of algorithms at once.

Page 38: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Scientific numerical calculation

Partition a large calculation across multiple processors:

CPU CPU CPU…

Page 39: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Task level parallelism

Map tasks on to multiprocessor:

PE 1 PE 2 PE 3+

application architecture

Page 40: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

What is an MPSoC?

Multiple processors. CPUs, DSPs, etc. Hardwired blocks. Mixed-signal.

Custom memory system. Lots of software.

Page 41: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

MPSoC applications

Sophisticated markets: High volume. Demanding performance, power

requirements. Strict price restrictions.

Often standards-driven. Multimedia is an ideal application.

Page 42: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Why multiple platforms?

People still care about cost.People care about power

consumption.Sufficiently general solutions don’t fit

on one chip.

Page 43: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

CD/MP3 player

AudioCPU

amp

Jogmemory

Errorcorrector

ServoCPU

Analogin

Analogout

FE, TE, amp

focus,tracking,sled,motor

head

drive

memory

memory

display

DAC

I2S

Page 44: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

CD mechanism

Laser, lens, sled:

lase

r

CD

detectorsdiffraction

gratingsled

track

track

focus

Page 45: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Servo control

Four main signals:focus (laser) @ 245 kHz;tracking (laser) @ 245 kHz;sled (motor): @ 800 Hz;Disc motor.

Optical pickup

Page 46: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Control and error correction

Skips caused by physical disturbance. Wait for disturbance to subside. Retry.

Read errors caused by disc/servo problems. Detect error. Choose location for retry. Retry. Fail and interpolate.

Page 47: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

DVD player

Yasuda, et al (Matsushita): two-chip solution for DVD back-end.Video MPEG 2 at Main Level (MP2@ML).Audio AC-3.External 16 Mb SDRAM.

Page 48: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

DVD back-end

MPEG2stream

demux

AC-3Audio decoder

MPEG2Video decoder

Subpicturedecoder

Colorencoder

+

SDRAM

Page 49: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Video decoder

Hostinterface

RISCcontroller

Micro-program

Streamparser

Macroblockpipeline

VideoOutput

core

PLL SDRAM interface

system

audio

Page 50: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Macroblock pipeline

Allowed processing time:[30 rows*45 macroblocks/row]/29.97 Hz

=24.71 s.

Three stages in pipeline:Variable-length decode.Inverse quantize/inverse DCT, motion

compenstation.Reconstruction combines IQ/IDCT + MC.

Page 51: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Viper set-top box SoC

MIPS Trimedia

Off-chip SDRAM

MC bridge TC bridge

Bus ctrlBus ctrl

Clocks,DMA,Reset,debug

I2C, Smcard

PCI

USB, 1394

MMIbus

MBS

2D AICP

MPEG

SPDIF

GPIO

C bridge

Page 52: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Bus requirements

Separate cache and register-access traffic.

Separate register-access traffic of the two CPUs.

Each CPU has a high-performance, low-latency path to memory.

Each CPU has low-latency access to its peripherals.

Page 53: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Bus structure

MMI bus goes to bulk memory. Point-to-point design. 64-bit, 143 MHz. Round-robin arbitration. Controller separates on-chip accesses.

PI bus for peripherals. Tri-state design. Compatible with existing IP for peripherals. Fast-PI for high-speed MIPS peripherals. MIPS-PI for generic MIPS peripherals. T-PI for Trimedia peripherals.

Page 54: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

TI OMAP

Targets communications, multimedia.

Multiprocessor with DSP, RISC.

C55x DSP

OMAP 5910:

ARM9

MMU

Memory ctrl

MPUinterface

SystemDMA

control

bridge

I/O

Page 55: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

ST Nomadik

Targets mobile multimedia.

A multiprocessor-of-multiprocessors.

ARM9

Mem

ory

syst

em

I/O

bri

dges

Audioaccelerator

Videoaccelerator

heterogeneousmultiprocessors

Page 56: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

ST MMDSP+

Embedded processor core used in multiple chips: Runs at 175 MHz. 1 cycle per instruction. 2-level instruction cache. 16/24-bit fixed point. 32-bit floating point. C programmed Fully synthesizable.

Page 57: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Nomadik video accelerator

MMDSP+data

RAMinstrRAM

Xbus

Interruptcontroller

Picturepost

processing

Videocodec

Pictureinput

processing

Localdatabus

MasterAHBDMA

Page 58: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Nomadik audio processor

SlaveAHB

Timers, GPIO, etc.

MMDSP+Y RAMX RAMInstrcache

ARMDMA

DMA1 DMA2

MasterAHB

X Bus

Y Bus

Page 59: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Nomadik support

Supports OMAPI standard. Standard includes mid-level driver

APIs.

Nomadik defines: Base operating system. Base drivers. Multimedia accelerators and their

drivers.

Page 60: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

FPGAs

Can be used to provide hardware acceleration. First-generation Replay.

Platform FPGAs provide CPU, FPGA fabric.

FPGA trade-offs: Low design cost. Higher incremental cost in large volumes. Higher power consumption than custom.

Page 61: ICME 2004© 2004 Marilyn Wolf Architectures for Multimedia Computing 3: DSPs and Systems-on- Chips Marilyn Wolf Dept. of Electrical Engineering Princeton.

Xilinx Virtex II Pro

Up to 4 PowerPC 405s per chip:

5 stage pipe, static branch prediction, etc.

Separate instruction, data caches.

MMU. Timers.

PowerPC

FPGA fabric