Emerging Architectures for HD Video Transcoding Jeremiah Golston CTO, Digital Entertainment Products Texas Instruments
Overview
The Need for Transcoding
System Challenges
Transcoding Approaches and Issues
Optimization Approaches
Conclusions
Connected Home Vision
Consumers want their devices to work together and share content
– MEDIA: Pre-Recorded Content, Personal Media
– MOBILE MULTIMEDIA: Entertainment, Personal Pictures and Video, Services
– BROADCAST: Services, Entertainment
– BROADBAND: Entertainment, E-Business, Services
Codec Trends By Application
Application | Current Algorithms | Future Codec Considerations
Cellular Media | MPEG-4 simple profile | Real Video, H.264 baseline, AVS-M
Digital Video Camcorders | DV-25, MPEG-2 | MPEG-4, H.264
Digital Still Cameras | Motion JPEG and MPEG-4 simple profile | H.264 baseline
DSL-Based Video on Demand | MPEG-1, low-res MPEG-2 (bandwidth limitations) | WMV9, H.264 main profile, On2 VP6
Satellite | MPEG-2 | Moving to H.264 high profile to boost HD channel capacity
Digital Terrestrial TV | MPEG-2 MP@ML, MP@HL | Opportunity for advanced codecs in regions without installed base
DVD | MPEG-2 MP@ML | H.264, VC-1 required for HD-DVD and Blu-Ray DVD
Internet Streaming | Windows Media, Real Video, DivX, MPEG-4 | Frequent updates, PC platform has allowed support for proprietary codecs
Videophone/Videoconferencing | H.263 and H.261 | H.264 baseline
Security/Surveillance | Motion JPEG, H.263, MPEG-4 simple profile | JPEG2000, H.264 baseline, WMV9
Transcoding: Conversions between codec formats, bit rates and resolutions
Comparison of Codecs
Features | H.261 | MPEG-1 | MPEG-2 | H.263 | MPEG-4 | H.264 | WMV/VC-1 | AVS
Picture coding type | I, P | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B
Entropy Coding | VLC | VLC | VLC | VLC, SAC | VLC | UVLC, CAVLC, CABAC | Multiple table VLC | Adaptive VLC
MV resolution | Int. pel | ½ pel | ½ pel | ½ pel | ¼ pel | ¼ pel | ¼ pel | ¼ pel
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT | 8x8 DCT | 8x8 DCT | 4x4 & 8x8 integer | 8x8, 8x4, 4x8, 4x4 int DCT | 8x8 integer
Vector Block size | 16x16 | 16x16 | 16x16, 16x8 | 16x16, 8x8 | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Spatial Intra Prediction | No | No | No | No | No | Yes | No | Yes
Formats supported | Prog | Prog | Prog/Intr | Prog | Prog/Intr | Prog/Intr | Prog/Intr | Prog/Intr
Prediction Modes | Frame | Frame | Field & Frame | Frame | Field & Frame | Field & Frame | Field & Frame | Field & Frame
De-blocking filter | In-loop | None | Post | Annex J in-loop | Post | In-loop | In-loop | In-loop
TMS320DM6446 Processor: Video Encode and Decode Application Processing
Features
Core
– ARM926EJ-S™ (MPU) Core
– TMS320C64x+™ DSP Core
Memory
– On-Chip L1/SRAM: 112 KB DSP, 40 KB ARM
– On-Chip L2/SRAM: 64 KB DSP
Peripherals
– Video Encode/Decode
• H.264 BP D1 encoding, simultaneous H.264 BP CIF coding
• H.264 MP@L3, 30-fps SD decoding, VC1/WMV9 full D1 SD decoding, MPEG-2 MP@ML SD decoding, MPEG-4 ASP full D1 SD decoding
– Video Processing Subsystem
• Front end – Resizer, Image processing engine, 16-bit digital input
• Back end – Integrated OSD, four video DACs, 24-bit digital RGB output
– The Right Peripherals for Your Video, Audio, Storage and Connectivity Needs
• Package: 361-Pin BGA
Benefits
• The highly integrated DM6446 Digital Video processor enables OEMs and ODMs to quickly bring new products to market at low consumer price points
Applications
• Video conferencing, video phones, video surveillance, digital media adaptors and IP set-top boxes
[Block diagram: TMS320DM6446]
– ARM Subsystem: ARM926EJ-S CPU, 300 MHz
– DSP Subsystem: C64x+™ DSP Core, 600 MHz; Video-Imaging Coprocessor
– Switched Central Resource (SCR), EDMA
– Video Processing Subsystem
• Front End: CCD Controller Video Interface, Preview, Histogram/3A, Resizer
• Back End: On-Screen Display (OSD), Video Enc (VENC), four 10b DACs
– Peripherals
• Program/Data Storage: DDR2 Controller (16b/32b), Async EMIF/NAND/SmartMedia, ATA/Compact Flash, MMC/SD
• Connectivity: EMAC with MDIO, USB 2.0 PHY, VLYNQ
• Serial Interfaces: Audio Serial Port, UART x3, SPI, I2C
• System: Timer x2, WD Timer, PWM x3
DM6446 DSP MHz Consumption
Video Codec | Encoder | Decoder
H.263 / MPEG-4 SP | 250 MHz | 100 MHz
H.264 Base Profile | 410 MHz | 300 MHz
H.264 Main Profile | 590 MHz | 450 MHz
WMV9/VC-1 Main Profile | 350 MHz | 260 MHz
– TI DM6446 platform, D1 (720x480) at 30 frames/sec, YUV 4:2:0 video
– Decoder performance numbers are for typical bitstreams
– Encoder performance can vary as a result of feature set used
– Video camcorder quality assumed in examples above
– The C64x+™ on the DM6446 can be clocked at 594 MHz
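As a rough illustration of what the DSP MHz figures above imply for transcoding, the sketch below adds a decoder load to an encoder load and compares the sum with the 594 MHz clock. The additive model, the function names, and the idea of running both on one DSP with no other overhead are assumptions made for illustration only; the slides do not state how the loads combine.

```python
# DSP load figures (MHz) from the DM6446 table: D1, 30 fps, 4:2:0.
ENCODER_MHZ = {
    "H.263 / MPEG-4 SP": 250,
    "H.264 Base Profile": 410,
    "H.264 Main Profile": 590,
    "WMV9/VC-1 Main Profile": 350,
}
DECODER_MHZ = {
    "H.263 / MPEG-4 SP": 100,
    "H.264 Base Profile": 300,
    "H.264 Main Profile": 450,
    "WMV9/VC-1 Main Profile": 260,
}
DSP_CLOCK_MHZ = 594  # C64x+ clock quoted for the DM6446


def transcode_load_mhz(source_codec, target_codec):
    """Assumed model: decode load and encode load simply add."""
    return DECODER_MHZ[source_codec] + ENCODER_MHZ[target_codec]


def fits_on_one_dsp(source_codec, target_codec, clock_mhz=DSP_CLOCK_MHZ):
    """True if the combined load stays within the DSP clock budget."""
    return transcode_load_mhz(source_codec, target_codec) <= clock_mhz


if __name__ == "__main__":
    for src in DECODER_MHZ:
        for dst in ENCODER_MHZ:
            load = transcode_load_mhz(src, dst)
            verdict = "fits" if load <= DSP_CLOCK_MHZ else "exceeds budget"
            print(f"{src} -> {dst}: {load} MHz ({verdict})")
```

Under this simple model only a few SD combinations fit on a single DSP, for example MPEG-4 SP decode plus H.264 Base Profile encode at 510 MHz.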
HD Versus SD Decode: Video Reference Memory Requirements Comparison
Minimum Reference Frame Buffer Requirements
– SD MPEG-4: single reference frame. One 720x480 4:2:0 frame = ~0.5 MByte
– HD H.264: reference index selects from 3 reference frames. Three 1920x1080 4:2:0 frames = ~9 MBytes
18x increase in memory requirement
NOTE: Neither figure includes additional display buffering and other decoder buffers like stream buffer, tables, etc.
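The reference-memory figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming 8-bit samples and no padding or alignment, which a real decoder would add; the function name is illustrative.

```python
def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """Size of one 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


sd_reference = 1 * yuv420_frame_bytes(720, 480)     # single reference frame
hd_reference = 3 * yuv420_frame_bytes(1920, 1080)   # three reference frames

if __name__ == "__main__":
    print(f"SD: {sd_reference / 1e6:.2f} MBytes")
    print(f"HD: {hd_reference / 1e6:.2f} MBytes")
    print(f"Ratio: {hd_reference / sd_reference:.0f}x")
```

With these assumptions the SD buffer is 518,400 bytes, the HD buffer is 9,331,200 bytes, and the ratio is 18.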
Real-Time Transcoding
Typical STB Application
– MPEG2 HD input to storage: V: MPEG2, A: MPEG2 AAC-LC 5.1; 2 hours = 16 GB
Real-time HD Transcoding STB Application
– MPEG2 HD input through real-time HD transcoding to HD storage: V: H.264 HP, A: AC3 5.1; 2 hours = 8 GB
– Other transcode outputs shown:
• V: H.264 MP QVGA, A: MPEG4 AAC-LC
• V: H.264 BP VGA, A: MPEG4 AAC-HE
• V: WMV9 MP D1, A: WMA
System Challenges
– Requires multi-format HD decode and encode capabilities
– Achieving high quality re-encode on low-cost device
– Huge I/O bandwidth requirements
• e.g., H.264 HD decoder by itself requires ~1.4 GBytes/s of I/O
• Broadcast encoder uses 10s of GBytes/s of I/O for high-quality motion estimation
– Artifacts in original bitstream can get compounded
HD Encode System Tradeoffs
HD Encode Application | 2006 Processor Requirements | Video Bitrate | Typical Resolution | Typical Codec | Key Priorities | Solution Memory Requirements
Digital Video Camcorder | Single-chip 450 MHz low-power SOC | 4-8 Mbps | 720p30 | H.264 Baseline Profile | Low-complexity encoder | 32-64 MBytes
Video-Conferencing | Multiple 720 MHz DSPs | >1 Mbps | 720p30 | H.264 Baseline Profile | Low latency, best resolution for available bandwidth | 100s of MBytes
Broadcast | 10s of 1 GHz DSPs & FPGAs | 10-20 Mbps | 1080i | MPEG-2, H.264 High Profile | High quality for high-action sports | Multiple GBytes
Brute-Force Transcoding
[Diagram: Encoded Bitstream → Video Decoder → Video Encoder → Transcoded Bitstream]
Pros
– Simple to implement
Cons
– Loses key information needed to maintain best quality
• Frame type and mode information
• High-quality motion vectors created by head-end professional encoder
– High computational demands
• Doesn't leverage available complexity shortcuts
• I/O bandwidth requirements can be too high for embedded systems
Transcoding MPEG-2 to MPEG-4 for Wireless Video
Wireless device has limited resources
– Processing power
– Memory
– Display capability
Change GOP structure in MPEG2 to IPPP… structure
– Save memory
– Reduce decoding complexity
– Smooth bit rate
Frame size down-sampling
– Large bit-rate reduction
– Fit the display size of most mobile devices
[Diagram: an MPEG-2 GOP with I, B and P frames is converted to an I P P P… sequence; frame size is down-sampled from SD 720 x 480 to QVGA 320 x 240]
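The two changes on this slide can be sketched as follows. The helper names are illustrative, and the GOP function only re-labels frame types in display order; an actual transcoder would also have to re-encode each former B frame as a forward-predicted P frame.

```python
def to_ippp(frame_types):
    """Keep I frames and re-label every B or P frame as P (display order assumed)."""
    return ["I" if t == "I" else "P" for t in frame_types]


def pixel_reduction(src_w, src_h, dst_w, dst_h):
    """How many source pixels map onto each output pixel after down-sampling."""
    return (src_w * src_h) / (dst_w * dst_h)


if __name__ == "__main__":
    mpeg2_gop = list("IBBPBBPBBPBB")
    print("".join(mpeg2_gop), "->", "".join(to_ippp(mpeg2_gop)))
    print("SD to QVGA pixel reduction:", pixel_reduction(720, 480, 320, 240))
```

Going from 720 x 480 to 320 x 240 leaves 4.5 times fewer pixels to code per frame, which is where most of the bit-rate reduction comes from.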
Optimized Transcoding
[Block diagram: Encoded Bitstream → Video Decoder → Resize → Video Encoder → Transcoded Bitstream]
– Video Encoder blocks: Coding Control, Rate Control, Motion Estimation, Frame Prediction, Frame Buffer, Forward Transform, Forward Quantization, Inverse Quantization, Inverse Transform, Entropy Encoder
– Information passed from the decoder to the encoder (through Resize): Motion Vectors; Frame Type & MB Modes; Rate Allocation, Quant Levels
– Legend: I/O, Full Encode Processing, Optimized Transcode Function, Memory
Baseball 1: Broadcast Encoder Source
– Motion vector is stable and motion vector refinement is adequate
– Source encoder: 10s of GHz DSPs and FPGAs for encoding
– Search ranges: +/-500 horizontal, +/-250 vertical
High transcode quality obtained with simple motion vector refinement
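One common way to refine a re-used motion vector is a small SAD search around it. The sketch below assumes integer-pel vectors, frames stored as lists of rows, and a +/-1 search radius; the slides do not describe the refinement algorithm actually used, so treat every name and parameter here as illustrative.

```python
def sad(cur, ref, bx, by, mvx, mvy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + mvy][bx + x + mvx])
    return total


def refine_mv(cur, ref, bx, by, start_mv, size=4, radius=1):
    """Search +/-radius integer pels around the decoded vector; keep the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mvx, mvy = start_mv[0] + dx, start_mv[1] + dy
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= bx + mvx and bx + mvx + size <= width
                    and 0 <= by + mvy and by + mvy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, mvx, mvy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost


if __name__ == "__main__":
    # Synthetic frames: the current frame is the reference shifted by (2, 1).
    ref = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
    cur = [[ref[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)] for y in range(12)]
    # Start from a slightly wrong vector, as if taken from the source bitstream.
    print(refine_mv(cur, ref, bx=4, by=4, start_mv=(1, 1)))
```

The search evaluates only nine candidates per block here, compared with the very large search ranges of the original broadcast encoder.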
Baseball 2: Software Encoder Source
– Motion vector is random in still area
– Encoder is not considering motion vector penalty
– Even simple MV recovery algorithm yields some benefit
Simulation: MV recovery reduces bit rate by about 0.5 Mbps
– 9.19 Mbps @ 36.4 dB → 8.66 Mbps @ 36.88 dB
– 5.80 Mbps @ 34.8 dB → 5.34 Mbps @ 35.17 dB
Bit rate improvements possible with additional motion vector recovery beyond simple refinement
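To illustrate what a "motion vector penalty" means, the sketch below picks a vector by minimizing SAD plus a weighted estimate of the bits needed to code it. This is a generic rate-constrained selection, not the MV recovery algorithm from the simulation, which the slides do not describe. The signed Exp-Golomb length is used as an assumed bit-cost model, and `lam` and the candidate SAD values are made-up numbers.

```python
def mv_bits(component):
    """Length in bits of a signed Exp-Golomb code for one vector component."""
    code_num = 2 * abs(component) - (1 if component > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def pick_mv(candidates, predictor=(0, 0), lam=4):
    """candidates maps (mvx, mvy) to SAD; return the vector with lowest SAD + lam * bits."""
    def cost(item):
        (mvx, mvy), sad_value = item
        bits = mv_bits(mvx - predictor[0]) + mv_bits(mvy - predictor[1])
        return sad_value + lam * bits
    return min(candidates.items(), key=cost)[0]


if __name__ == "__main__":
    # A still area: all candidates match almost equally well, so noise decides the SAD winner.
    still_area = {(0, 0): 103, (7, -5): 100, (-3, 9): 101}
    print("No penalty:  ", pick_mv(still_area, lam=0))
    print("With penalty:", pick_mv(still_area, lam=4))
```

Without the penalty the noisy vector (7, -5) wins by a few SAD counts; with it, the zero vector wins because it costs far fewer bits to code.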
Mid-Filter Functions
[Diagram: Bitstream → MPEG-2 Decoder → Reconstructed macroblock → De-ringing Filter and De-blocking Filter → H.264 Encoder; Info (motion type, motion vectors, DCT type, Q scale) passes from the decoder to the encoder]
Best transcoding quality and bitrate require filtering between decode and re-encode
– De-ringing reduces mosquito noise in the source
– De-blocking reduces block edge artifacts
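The idea behind de-blocking can be shown on a single row of pixels. The sketch below is a toy filter, not the H.264 loop filter and not the mid-filter used in this work: it assumes 8-pixel blocks and a fixed threshold that separates small steps (treated as coding artifacts) from large ones (treated as real edges).

```python
def deblock_row(row, block=8, threshold=12):
    """Soften small steps across block boundaries in one row of pixels."""
    out = list(row)
    for edge in range(block, len(row), block):
        p0, q0 = row[edge - 1], row[edge]  # pixels on either side of the boundary
        step = q0 - p0
        # A small step is assumed to be a blocking artifact; a large one a real edge.
        if 0 < abs(step) <= threshold:
            delta = step // 4 if step > 0 else -((-step) // 4)
            out[edge - 1] = p0 + delta
            out[edge] = q0 - delta
    return out


if __name__ == "__main__":
    artifact = [100] * 8 + [108] * 8   # small step at the block boundary
    real_edge = [100] * 8 + [200] * 8  # large step, left untouched
    print(deblock_row(artifact))
    print(deblock_row(real_edge))
```

Only the two pixels next to each boundary are modified, so genuine detail inside a block is left alone.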
De-ringing Example
[Figure: MPEG-2 decoded image (magnified) beside the de-ringing filtered image (magnified)]
– MPEG-2 decoded image: 30.86 dB @ 10.34 Mbps
– Filtered image (de-ringing): 31.85 dB @ 10.34 Mbps
1 dB gain from using the mid-filter in transcoding
Potential Transcode Solutions
Combine Decoder and Encoder Devices at System Level
– Consumer-class encoders typically don't support broadcast quality
– Throwing lots of key information away
Fixed Combination Transcoder ASIC
– HD MPEG-2 -> HD H.264
– Doesn't support universal multi-format decoder
– Only supports 1 of the critical emerging transcode requirements
Integrate Multi-format Decoder + Encoder Hardware Blocks
– Fixed rate control, mode decisions, vector scaling, etc.
– Very difficult due to number of transcode scenarios and maturity of R&D on transcode algorithms
– Multi-format encoders not common on market
High Performance Media DSP + Accelerator Combination
– Rate control, motion estimation control including vector re-use algorithms, and mode decisions in programmable DSP
– High-performance accelerators for multi-format decode and encode
Transcode Task Partitioning
[Diagram: tasks split between the DSP and the HD decode and encode accelerators; MB info (Mode, MVs, etc) and MB data pass between the blocks]
DSP
– Decode Control
– Rate Control
– Encode Control
– ME Decisions
– Mode Decisions
– Picture Layer Processing
– Error Concealment
HD Decode Acceleration
– Entropy Decoding
– IDCT/Inverse Quant
– Motion Compensation
– Loop De-blocking
HD Encode Acceleration
– Motion Estimation
– Intra Prediction
– DCT/Quant
– IDCT/Iquant
– Entropy Encoding
– Loop De-blocking
STB DVR/DVD Recorder Transcode System Diagram Concept
[System diagram. Blocks and interfaces shown: Main CPU STB / DVD SOC, Video Decoder, Transcoder, DDR2-533 (32bit), PCI, Stream I/O, BT656 in, BT656 out, Digital Tuner/demodulator/CAS/Demux, HDD, DVD BD, DVD SD, MS/SD i/f, Ethernet, Composite, S-Video, Video out, HD/SD DISPLAY]
– MPEG-2 at 18 Mbps requires ~8 GBytes/hour to store; a 200 GByte HDD allows 25 hours of recording
– H.264 at 9 Mbps increases recording time to 50 hours for the same size HDD
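The storage figures quoted for this system follow from the bit rates. The sketch below is a rough check that assumes decimal units (1 GByte = 10^9 bytes) and counts video only, ignoring audio and container overhead; the function names are illustrative.

```python
def gbytes_per_hour(mbps):
    """Storage consumed per hour at a constant bit rate, in decimal GBytes."""
    bytes_per_second = mbps * 1e6 / 8
    return bytes_per_second * 3600 / 1e9


def recording_hours(hdd_gbytes, mbps):
    """Hours of video that fit on a drive of the given size."""
    return hdd_gbytes / gbytes_per_hour(mbps)


if __name__ == "__main__":
    for label, rate in (("MPEG-2", 18), ("H.264", 9)):
        print(f"{label} at {rate} Mbps: {gbytes_per_hour(rate):.2f} GBytes/hour, "
              f"{recording_hours(200, rate):.1f} hours on a 200 GByte HDD")
```

The exact results are about 8.1 GBytes/hour and 24.7 hours for MPEG-2, and about 4.05 GBytes/hour and 49.4 hours for H.264, in line with the rounded 25 and 50 hours on the slide.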