What Choices Make A Killer Video Processor Architecture?
Jonah Probell, Ultra Data Corp, www.ultradatacorp.com
Mar 29, 2015
© Copyright 2004 Jonah Probell slide 2
Outline
• Overview of Ultra Data UD3000
• Software programmability
• Parallelism
  – VLIW
  – SIMD
  – Multiprocessing
• Appropriate use of on- and off-chip memory
  – Optimal organization of data structures in DRAM
• Deterministic performance
  – 5-port regfile
  – 2-port on-chip memory
  – DMA controller instead of caches
Nobody’s Video Decoder Chip
[Block diagram of a generic video decoder chip. Blocks shown: video decode processor, host / audio processor, SDRAM controller, video post-processing, high-speed interconnect, peripheral bus and bridge, I2C, SATA, timers, and DVD optical interface. External connections: SDRAM; video output (S-video, raw 24-bit RGB, or 8/16-bit YCrCb); audio output (I2S, SPDIF, or raw); audio / video DACs; SATA & I2C busses to the optics sled.]
The Ultra Data UD3000
[Block diagram: Outer Loop Processors 0 and 1, Inner Loop Processors 0, 1, and 2 (several shown with instruction extensions), a Smart 2-D DMA Controller, 2-port DMEMs, FIFOs, a Test & Set unit, and a System Bus Bridge, arranged around a Crossbar Switch Fabric.]
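The UD3000 diagram includes a Test & Set unit next to the FIFOs, presumably for coordinating the processors. The slides do not describe its interface, so the following is only a software analogy: a spinlock built on C11's atomic test-and-set. The type and function names (`ts_lock`, `ts_try_acquire`, `ts_acquire`, `ts_release`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical software analog of a hardware Test & Set lock.
   This does not model the UD3000's actual Test & Set interface. */
typedef struct {
    atomic_flag flag;
} ts_lock;

/* Static initializer: the lock starts out free. */
#define TS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true if the lock was free and is now held. */
static bool ts_try_acquire(ts_lock *l) {
    /* test-and-set returns the previous value; false means it was free */
    return !atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire);
}

/* Spin until the lock is obtained. */
static void ts_acquire(ts_lock *l) {
    while (!ts_try_acquire(l)) {
        /* busy-wait; a real system might back off or sleep here */
    }
}

/* Release the lock; the caller must currently hold it. */
static void ts_release(ts_lock *l) {
    /* Setting an already-set flag changes nothing, so this only checks
       that the lock was held before clearing it. */
    bool was_held = atomic_flag_test_and_set_explicit(&l->flag,
                                                      memory_order_relaxed);
    assert(was_held);
    (void)was_held; /* avoid an unused-variable warning under NDEBUG */
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

Typical use is `ts_lock l = TS_LOCK_INIT;` followed by `ts_acquire(&l); /* critical section */ ts_release(&l);`.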
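H.264 Main Profile Decode
[Diagram: H.264 Main Profile decode tasks mapped onto the Outer Loop Processors (OLP 0, OLP 1), the Inner Loop Processors (ILP 0, ILP 1, ILP 2), and the DMA controller. Tasks shown: CABAC / CAVLC, interpolation, inverse transform, apply deltas, deblocking thresholds, deblocking filter, load prediction source, and store block.]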
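One stage in the diagram is "apply deltas". Assuming this means adding the decoded residual (the inverse-transform output) to the prediction block, here is a scalar C sketch for one 4x4 block of 8-bit samples, with each result clipped to [0, 255] as H.264 requires for 8-bit video. The function name `apply_deltas_4x4` and its strided-buffer layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clip an intermediate value to the 8-bit sample range [0, 255]. */
static uint8_t clip_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical "apply deltas" step: reconstruct a 4x4 block by adding the
   residual to the prediction and clipping. pred and out are addressed with
   row strides in bytes; residual is a packed 4x4 array in row order. */
static void apply_deltas_4x4(const uint8_t *pred, int pred_stride,
                             const int16_t residual[16],
                             uint8_t *out, int out_stride) {
    assert(pred != NULL && residual != NULL && out != NULL);
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int v = pred[y * pred_stride + x] + residual[y * 4 + x];
            out[y * out_stride + x] = clip_u8(v);
        }
    }
}
```

On a SIMD unit the same add-and-clip would typically be done several pixels at a time with saturating arithmetic; the scalar loop just makes the per-pixel rule explicit.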
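The Inner Loop Processor
[Block diagram: IMEM; a Control Unit (32-bit RISC, program counter, loads & stores); a Vector Unit (64-bit SIMD data, multiply-accumulate, data packing); a 3-port regfile and a 5-port regfile; a Data Aligner; and a connection to the Switch Fabric. Data path widths labeled 32, 32, 32, and 64.]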
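The slides do not say what the Data Aligner does internally. One plausible job for such a unit in video code is extracting an unaligned run of eight 8-bit pixels (for example, a reference block at an arbitrary horizontal position) from two aligned 64-bit words. The sketch below models that as a funnel shift, assuming little-endian byte order; `align64` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical data-aligner model: given two consecutive aligned 64-bit
   words (lo at the lower address, hi at the next) and a byte offset 0..7,
   return the 8 bytes starting at that offset. Little-endian order is
   assumed, so lower addresses occupy the less significant bytes. */
static uint64_t align64(uint64_t lo, uint64_t hi, unsigned offset) {
    assert(offset < 8);
    if (offset == 0) {
        return lo; /* shifting a 64-bit value by 64 would be undefined */
    }
    unsigned shift = offset * 8;
    return (lo >> shift) | (hi << (64 - shift));
}
```

With bytes 0x00 through 0x0F spread across the two words, an offset of 3 yields bytes 0x03 through 0x0A.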
Video Codec Standards
[Timeline figure, 1984 to 2004:]
• ITU-T standards: H.261, H.263
• ITU-T / MPEG joint standards: H.262 / MPEG-2, H.264 / MPEG-4 Part 10 AVC
• MPEG standards: MPEG-1, MPEG-4
• On2 Technologies standards: VP3, VP4, VP5, VP6
• DivX Networks standard: DivX
• Microsoft standard: Windows Media Video
VLIW Parallelism
[Diagram: program sequencer, regfile, data memory, and ALU (+ - x & | ! >> <<).]
Sequential DSP program (10 instructions): load, multiply, load, multiply, load, multiply, shift, store, add, branch
VLIW DSP program (5 instruction words): load | multiply, load | multiply, load | multiply, store | shift, branch | add
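The VLIW program packs the 10 sequential instructions into 5 instruction words. A toy C model of that packing follows, assuming a 2-slot instruction word with one slot for control-unit operations (load, store, branch) and one for vector-unit operations (multiply, shift, add). That slot assignment is an assumption, as is the packer's decision to ignore data dependencies: it presumes the compiler has already ordered the code so each adjacent pair is independent (for example, the store writes a previously computed result).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_LOAD, OP_STORE, OP_BRANCH, OP_MUL, OP_SHIFT, OP_ADD } op_t;
typedef enum { SLOT_CTRL, SLOT_VEC } slot_t;

/* Hypothetical slot assignment for a 2-issue machine. */
static slot_t slot_of(op_t op) {
    switch (op) {
    case OP_LOAD:
    case OP_STORE:
    case OP_BRANCH:
        return SLOT_CTRL;
    default:
        return SLOT_VEC;
    }
}

/* Greedy in-order packing: each instruction word takes the next op, plus
   the following op if it needs the other slot. Data dependencies are NOT
   checked. Returns the number of instruction words (issue cycles). */
static size_t pack_bundles(const op_t *ops, size_t n) {
    assert(ops != NULL || n == 0);
    size_t words = 0;
    size_t i = 0;
    while (i < n) {
        if (i + 1 < n && slot_of(ops[i]) != slot_of(ops[i + 1])) {
            i += 2; /* both slots filled */
        } else {
            i += 1; /* one slot left empty */
        }
        words++;
    }
    return words;
}
```

For the slide's sequence (load, multiply ×3, then shift, store, add, branch), this returns 5 words versus 10 sequential issues, the 2× gain the slide illustrates. Three back-to-back loads, by contrast, cannot share a word and still take 3.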
SIMD Parallelism
[Diagram: a frame of macroblocks, a macroblock of pixels, an 8x8 block of pixels, and a 4x4 block of pixels.]
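The Inner Loop Processor's vector unit works on 64-bit SIMD data, which holds eight 8-bit pixels. As a portable illustration (SIMD within a register in plain C, not the ILP's instruction set), here is a rounding average of eight pixel pairs at once: each lane becomes (a + b + 1) >> 1, the averaging H.264 uses for quarter-sample interpolation. The name `avg_round_u8x8` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte mask that clears the low bit of every byte lane. */
#define LANE_HIGH7 0xFEFEFEFEFEFEFEFEULL

/* Average eight packed 8-bit pixels in a with eight in b, rounding up.
   Uses ceil((x + y) / 2) = (x | y) - ((x ^ y) >> 1); the mask stops the
   shift from moving bits across lane boundaries, and the subtraction never
   borrows between lanes because (x | y) >= (x ^ y) >> 1 in every lane. */
static uint64_t avg_round_u8x8(uint64_t a, uint64_t b) {
    assert(LANE_HIGH7 >> 1 == 0x7F7F7F7F7F7F7F7FULL); /* lanes stay 7-bit */
    return (a | b) - (((a ^ b) & LANE_HIGH7) >> 1);
}
```

A native SIMD unit does this in one instruction; the sketch needs a handful of ALU operations but still processes all eight pixels in parallel.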
Multiprocessor Parallelism
[Two diagrams, each mapping the video codec software (motion estimation / prediction, transform & compression, deblocking) onto system hardware with three CPUs (CPU 0, CPU 1, CPU 2):]
• Symmetric parallel multiprocessing
• Pipelined multiprocessing
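The two mappings trade off differently. In pipelined multiprocessing each CPU runs one stage, so steady-state throughput is set by the slowest stage; in symmetric multiprocessing each CPU runs every stage on different data, so the total work ideally divides evenly. This back-of-envelope model uses made-up per-macroblock cycle counts (hypothetical numbers, not UD3000 measurements) and ignores synchronization and inter-macroblock dependencies.

```c
#include <assert.h>
#include <stddef.h>

/* Steady-state cycles per macroblock when each stage runs on its own CPU:
   the slowest stage limits throughput. */
static double pipelined_cycles_per_mb(const double *stage_cycles, size_t n) {
    assert(n > 0);
    double worst = stage_cycles[0];
    for (size_t i = 1; i < n; i++) {
        if (stage_cycles[i] > worst) worst = stage_cycles[i];
    }
    return worst;
}

/* Ideal cycles per macroblock when n_cpus each run all stages on different
   macroblocks; synchronization and data dependencies are ignored. */
static double symmetric_cycles_per_mb(const double *stage_cycles, size_t n,
                                      size_t n_cpus) {
    assert(n > 0 && n_cpus > 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) total += stage_cycles[i];
    return total / (double)n_cpus;
}
```

With hypothetical costs of 600, 300, and 300 cycles for motion estimation / prediction, transform & compression, and deblocking, the pipeline delivers one macroblock per 600 cycles while the idealized symmetric split delivers one per 400. With balanced stages the two tie in this model, so pipelining mainly gives up load balance in exchange for each CPU running a smaller, specialized piece of code.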
Data Bandwidths
[Diagram: data paths between the bitstream source, the video chip, SDRAM used for temporary data storage, and the display device.]
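To get a feel for the SDRAM side of these flows, the arithmetic below estimates traffic for uncompressed 8-bit 4:2:0 frames (1.5 bytes per pixel). The frame size, frame rate, and the assumption of three full-frame passes per decoded frame (write it, read it for display, read it once as a prediction reference) are illustrative choices, not figures from the slides; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one 8-bit 4:2:0 frame: a full-size luma plane plus two
   quarter-size chroma planes, i.e. 1.5 bytes per pixel. */
static uint64_t frame_bytes_420(uint32_t width, uint32_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    uint64_t luma = (uint64_t)width * height;
    return luma + luma / 2;
}

/* Rough SDRAM traffic in bytes per second under a hypothetical model in
   which each decoded frame is written once, read once for display, and
   read once more as a prediction reference. */
static uint64_t sdram_bytes_per_sec(uint32_t width, uint32_t height,
                                    uint32_t fps) {
    const uint64_t passes = 3; /* write + display read + reference read */
    return frame_bytes_420(width, height) * fps * passes;
}
```

At 720x480 and 30 frames/s (roughly DVD video), a frame is 518,400 bytes and the model gives about 46.7 MB/s, before counting the extra reference reads needed for interpolation and bidirectional prediction.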
DRAM Optimal Data Ordering
[Diagram: DRAM organized as 1k-byte rows.]
• Frame mapped to DRAM rows as a C-style two-dimensional array
• Frame mapped to DRAM rows as square groups
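Opening a new DRAM row costs extra latency, so the layout matters for block fetches. The sketch counts how many 1k-byte DRAM rows a 16x16 block fetch touches under each mapping. It assumes 8-bit luma only, a 1920-byte frame width, and square groups of 32x32 bytes (exactly one 1k-byte row per group); the tile size and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRAM_ROW_BYTES 1024u
#define TILE 32u /* assumed 32x32-byte square groups: one per DRAM row */

/* Byte address of pixel (x, y) when the frame is a C-style 2-D array. */
static uint32_t addr_linear(uint32_t x, uint32_t y, uint32_t width) {
    return y * width + x;
}

/* Byte address of pixel (x, y) when the frame is stored as 32x32 tiles,
   each tile contiguous in memory. Width must be a multiple of TILE. */
static uint32_t addr_tiled(uint32_t x, uint32_t y, uint32_t width) {
    assert(width % TILE == 0);
    uint32_t tiles_per_line = width / TILE;
    uint32_t tile = (y / TILE) * tiles_per_line + (x / TILE);
    return tile * TILE * TILE + (y % TILE) * TILE + (x % TILE);
}

/* Count distinct DRAM rows touched by a 16x16 block at (x0, y0). */
static size_t rows_touched(uint32_t (*addr)(uint32_t, uint32_t, uint32_t),
                           uint32_t x0, uint32_t y0, uint32_t width) {
    uint32_t seen[256]; /* at most one new row per pixel */
    size_t count = 0;
    for (uint32_t y = y0; y < y0 + 16; y++) {
        for (uint32_t x = x0; x < x0 + 16; x++) {
            uint32_t row = addr(x, y, width) / DRAM_ROW_BYTES;
            size_t i = 0;
            while (i < count && seen[i] != row) i++;
            if (i == count) seen[count++] = row;
        }
    }
    return count;
}
```

Because each 1920-byte picture line is wider than a DRAM row, every line of a block lands in a different row in the C-style layout (16 rows for an aligned block), whereas the tiled layout needs one row for an aligned block and at most four for an unaligned one.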
Deterministic Performance
[Chart: use of processor time, split into contention latency, transfer latency, and processing, for three cases:]
• statistical-performance processor (worst case)
• statistical-performance processor (best case)
• deterministic-performance latency-hiding processor
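A latency-hiding processor of the kind the deck pairs with a DMA controller and 2-port on-chip memory usually relies on double buffering: while the processor works on one on-chip buffer, the DMA fills the other. The sketch shows that loop structure. `dma_start` and `dma_wait` are hypothetical stand-ins simulated synchronously with `memcpy`, so nothing actually overlaps here; the block size and the per-block work are also placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 64u

/* Hypothetical DMA interface, simulated with memcpy. On real hardware
   dma_start would return at once and the transfer would run in parallel
   with processing until dma_wait. */
static void dma_start(uint8_t *dst, const uint8_t *src, size_t n) {
    memcpy(dst, src, n);
}
static void dma_wait(void) { /* nothing outstanding in the simulation */ }

/* Placeholder per-block work: sum the bytes of a block. */
static uint32_t process_block(const uint8_t *blk) {
    uint32_t sum = 0;
    for (size_t i = 0; i < BLOCK_BYTES; i++) sum += blk[i];
    return sum;
}

/* Double-buffered loop over n_blocks blocks stored contiguously in src
   (standing in for off-chip DRAM); buf models two on-chip buffers. */
static uint32_t process_all(const uint8_t *src, size_t n_blocks) {
    static uint8_t buf[2][BLOCK_BYTES];
    uint32_t total = 0;
    assert(src != NULL || n_blocks == 0);
    if (n_blocks == 0) return 0;
    dma_start(buf[0], src, BLOCK_BYTES);          /* prefetch block 0 */
    for (size_t i = 0; i < n_blocks; i++) {
        dma_wait();                               /* block i is on chip */
        if (i + 1 < n_blocks) {                   /* fetch block i + 1 ... */
            dma_start(buf[(i + 1) % 2], src + (i + 1) * BLOCK_BYTES,
                      BLOCK_BYTES);
        }
        total += process_block(buf[i % 2]);       /* ... while processing i */
    }
    return total;
}
```

With a truly concurrent DMA, each block costs roughly the larger of its processing and transfer times rather than their sum, and that cost does not depend on cache hit rates, which is the predictable behavior the chart contrasts with the statistical-performance cases.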
A Killer Video Processor Architecture
• Software programmability
• Parallelism
  – VLIW
  – SIMD
  – Multiprocessing
• Appropriate use of on- and off-chip memory
  – Optimal organization of data structures in DRAM
• Deterministic performance
  – 5-port regfile
  – 2-port on-chip memory
  – DMA controller instead of caches
Acknowledgements
This presentation is © Copyright 2004 Jonah Probell ALL RIGHTS RESERVED. Certain information for this document was derived from publicly available documents of Ultra Data Corp., UB Video Inc., On2 Technologies Inc., and Wikipedia. All trademarks mentioned in this document are property of their respective owners and are hereby acknowledged.
Jonah Probell
(781) 209-0886