Emerging Architectures for HD Video Transcoding Jeremiah Golston CTO, Digital Entertainment Products Texas Instruments


Aug 19, 2018


Emerging Architectures for HD Video Transcoding

Jeremiah Golston
CTO, Digital Entertainment Products
Texas Instruments


Overview

The Need for Transcoding

System Challenges

Transcoding Approaches and Issues

Optimization Approaches

Conclusions


Connected Home Vision

Consumers want their devices to work together and share content

– MEDIA: Pre-Recorded Content, Personal Media
– MOBILE MULTIMEDIA: Entertainment, Personal Pictures and Video, Services
– BROADCAST: Services, Entertainment
– BROADBAND: Entertainment, E-Business, Services


Codec Trends By Application

| Application | Current Algorithms | Future Codec Considerations |
|---|---|---|
| Security/Surveillance | Motion JPEG, H.263, MPEG-4 simple profile | JPEG2000, H.264 baseline, WMV9 |
| Videophone/Videoconferencing | H.263 and H.261 | H.264 baseline |
| Internet Streaming | Windows Media, Real Video, DivX, MPEG-4 | Frequent updates, PC platform has allowed support for proprietary codecs |
| DVD | MPEG-2 MP@ML | H.264, VC-1 required for HD-DVD and Blu-Ray DVD |
| Digital Terrestrial TV | MPEG-2 MP@ML, MP@HL | Opportunity for adv CODECs in regions without installed base |
| Satellite | MPEG-2 | Moving to H.264 high profile to boost HD channel capacity |
| DSL-Based Video on Demand | MPEG-1, low-res MPEG-2 (bandwidth limitations) | WMV9, H.264 main profile, On2 VP6 |
| Digital Still Cameras | Motion JPEG and MPEG-4 simple profile | H.264 baseline |
| Digital Video Camcorders | DV-25, MPEG-2 | MPEG-4, H.264 |
| Cellular Media | MPEG-4 simple profile | Real Video, H.264 baseline, AVS-M |

Transcoding: Conversions between codec formats, bit rates and resolutions


Comparison of Codecs

| Features | H.261 | MPEG-1 | MPEG-2 | H.263 | MPEG-4 | H.264 | WMV/VC-1 | AVS |
|---|---|---|---|---|---|---|---|---|
| Picture coding type | I, P | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B |
| Entropy Coding | VLC | VLC | VLC | VLC, SAC | VLC | UVLC, CAVLC, CABAC | Multiple table VLC | Adaptive VLC |
| MV resolution | Int. pel | ½ pel | ½ pel | ½ pel | ¼ pel | ¼ pel | ¼ pel | ¼ pel |
| Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT | 8x8 DCT | 8x8 DCT | 4x4 & 8x8 integer | 8x8, 8x4, 4x8, 4x4 int DCT | 8x8 integer |
| Vector Block size | 16x16 | 16x16 | 16x16, 16x8 | 16x16, 8x8 | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 |
| Spatial Intra Prediction | No | No | No | No | No | Yes | No | Yes |
| Formats supported | Prog | Prog | Prog/Intr | Prog | Prog/Intr | Prog/Intr | Prog/Intr | Prog/Intr |
| Prediction Modes | Frame | Frame | Field & Frame | Frame | Field & Frame | Field & Frame | Field & Frame | Field & Frame |
| De-blocking filter | In-loop | None | Post | Annex J in-loop | Post | In-loop | In-loop | In-loop |


TMS320DM6446 Processor
Video Encode and Decode, Application Processing

Features

Core
– ARM926EJ-S™ (MPU) Core
– TMS320C64x+™ DSP Core

Memory
– On-Chip L1/SRAM: 112 KB DSP, 40 KB ARM
– On-Chip L2/SRAM: 64 KB DSP

Peripherals
– Video Encode/Decode
  • H.264 BP D1 encoding, simultaneous H.264 BP CIF coding
  • H.264 MP@L3, 30-fps SD decoding, VC1/WMV9 full D1 SD decoding, MPEG-2 MP@ML SD decoding, MPEG-4 ASP full D1 SD decoding
– Video Processing Subsystem
  • Front end – Resizer, Image processing engine, 16-bit digital input
  • Back end – Integrated OSD, four video DACs, 24-bit digital RGB output
– The Right Peripherals for Your Video, Audio, Storage and Connectivity Needs
  • Package: 361-Pin BGA

Benefits
• The highly integrated DM6446 Digital Video processor enables OEMs and ODMs to quickly bring new products to market at low consumer price points

Applications
Video conferencing, video phones, video surveillance, digital media adaptors and IP set-top boxes

[Figure: TMS320DM6446 block diagram. Blocks shown:]

– ARM Subsystem: ARM926EJ-S 300 MHz CPU
– DSP Subsystem: C64x+™ DSP 600 MHz Core, Video-Imaging Coprocessor
– Switched Central Resource (SCR)
– Video Processing Subsystem
  • Front End: CCD Controller Video Interface, Preview, Histogram/3A, Resizer
  • Back End: On-Screen Display (OSD), Video Enc (VENC), four 10b DACs
– Peripherals
  • EDMA
  • Serial Interfaces: Audio Serial Port, SPI, I2C, UART x3
  • Connectivity: USB 2.0 PHY, VLYNQ, EMAC with MDIO
  • System: Timer x2, WD Timer, PWM x3
  • Program/Data Storage: DDR2 Controller (16b/32b), Async EMIF/NAND/SmartMedia, ATA/Compact Flash, MMC/SD


DM6446 DSP MHz Consumption

For 4:2:0 video, 30 frames/sec

| Video Codec | Encoder | Decoder |
|---|---|---|
| H.263 / MPEG-4 SP | 250 MHz | 100 MHz |
| H.264 Base Profile | 410 MHz | 300 MHz |
| H.264 Main Profile | 590 MHz | 450 MHz |
| WMV9/VC-1 Main Profile | 350 MHz | 260 MHz |

– TI DM6446 platform for D1 30 fps (720x480) for YUV 4:2:0
– Decoder performance numbers are for typical bitstreams
– Encoder performance can vary as a result of feature set used
– Video camcorder quality assumed in examples above
– The C64x+™ on the DM6446 can be clocked at 594 MHz
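As a rough illustration of what these figures imply for a decode-plus-encode transcode on a single DSP, the sketch below adds the decoder load of the source codec to the encoder load of the target codec and compares the sum with the 594 MHz clock. The simple additive model, and the function names `transcode_load_mhz` and `fits_on_dsp`, are assumptions for illustration only; the slide gives only the per-codec figures.

```python
# DSP load in MHz for D1 30 fps 4:2:0 video, copied from the slide.
DSP_MHZ = {
    "H.263 / MPEG-4 SP": {"encoder": 250, "decoder": 100},
    "H.264 Base Profile": {"encoder": 410, "decoder": 300},
    "H.264 Main Profile": {"encoder": 590, "decoder": 450},
    "WMV9/VC-1 Main Profile": {"encoder": 350, "decoder": 260},
}

# Maximum C64x+ clock on the DM6446 quoted on the slide.
DSP_CLOCK_MHZ = 594


def transcode_load_mhz(source, target):
    """Decoder load of the source codec plus encoder load of the target codec.

    Assumption: the two loads simply add, with no shared work and no
    overhead for other tasks running on the DSP.
    """
    return DSP_MHZ[source]["decoder"] + DSP_MHZ[target]["encoder"]


def fits_on_dsp(source, target, clock_mhz=DSP_CLOCK_MHZ):
    """True if the summed load does not exceed the available DSP clock."""
    return transcode_load_mhz(source, target) <= clock_mhz


if __name__ == "__main__":
    for src in DSP_MHZ:
        for dst in DSP_MHZ:
            load = transcode_load_mhz(src, dst)
            print(f"{src} -> {dst}: {load} MHz, fits: {fits_on_dsp(src, dst)}")
```

Under this model only a few SD combinations fit in one DSP, which hints at why HD transcoding needs more than a brute-force decode and re-encode on the same core.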


HD Versus SD Decode Video Reference Memory Requirements Comparison

Minimum Reference Frame Buffer Requirements

– SD MPEG-4: single reference frame. One 720x480 4:2:0 frame = ~0.5 MByte
– HD H.264: reference index selects from 3 reference frames. Three 1920x1080 4:2:0 frames = ~9 MBytes

18x increase in memory requirement

NOTE: Neither figure includes additional display buffering and other decoder buffers like stream buffer, tables, etc.
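The frame-buffer figures can be checked with a few lines of arithmetic. The sketch below assumes 8 bits per sample, so a 4:2:0 frame takes 1.5 bytes per pixel (one full-resolution luma plane plus two quarter-resolution chroma planes); the helper name `frame_bytes_420` is hypothetical.

```python
def frame_bytes_420(width, height):
    """Bytes in one 4:2:0 frame, assuming 8 bits per sample (1.5 bytes/pixel)."""
    return width * height * 3 // 2


# SD MPEG-4: a single 720x480 reference frame.
sd_bytes = frame_bytes_420(720, 480)

# HD H.264: three 1920x1080 reference frames.
hd_bytes = 3 * frame_bytes_420(1920, 1080)

ratio = hd_bytes / sd_bytes

if __name__ == "__main__":
    print(f"SD reference memory: {sd_bytes / 1e6:.2f} MBytes")
    print(f"HD reference memory: {hd_bytes / 1e6:.2f} MBytes")
    print(f"Increase: {ratio:.0f}x")
```

With these assumptions the SD case comes to about 0.52 MBytes and the HD case to about 9.3 MBytes, matching the 18x figure on the slide.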


Real-Time Transcoding

[Figure: two set-top box (STB) data flows]

Typical STB Application
– MPEG2 HD input -> HD Storage (V: MPEG2, A: MPEG2 AAC-LC 5.1); 2 hours = 16 GB

Real-time HD Transcoding STB Application
– MPEG2 HD input -> Real-time HD Transcoding -> HD Storage (V: H.264 HP, A: AC3 5.1); 2 hours = 8 GB
– Additional transcoded outputs:
  • V: H.264 MP QVGA, A: MPEG4 AAC-LC
  • V: H.264 BP VGA, A: MPEG4 AAC-HE
  • V: WMV9 MP D1, A: WMA


System Challenges

Requires multi-format HD decode and encode capabilities

Achieving high quality re-encode on low-cost device

Huge I/O bandwidth requirements
– e.g., H.264 HD decoder by itself requires ~1.4 GBytes/s of I/O
– Broadcast encoder uses 10s of GBytes/s of I/O for high-quality motion estimation

Artifacts in original bitstream can get compounded


HD Encode System Tradeoffs

| HD Encode Application | 2006 Processor Requirements | Video Bitrate | Typical Resolution | Typical Codec | Key Priorities | Solution Memory Requirements |
|---|---|---|---|---|---|---|
| Broadcast | 10s of 1 GHz DSPs & FPGAs | 10-20 Mbps | 1080i | MPEG-2, H.264 High Profile | High quality for high-action sports | Multiple GBytes |
| Video-Conferencing | Multiple 720 MHz DSPs | >1 Mbps | 720p30 | H.264 Baseline Profile | Low latency, best resolution for available bandwidth | 100s of MBytes |
| Digital Video Camcorder | Single-chip 450 MHz Low-power SOC | 4-8 Mbps | 720p30 | H.264 Baseline Profile | Low-complexity encoder | 32-64 MBytes |


Brute-Force Transcoding

[Figure: Encoded Bitstream -> Video Decoder -> Video Encoder -> Transcoded Bitstream]

Pros
– Simple to implement

Cons
– Lose key information needed to maintain best quality
  • Frame type and mode information
  • High-quality motion vectors created by head-end professional encoder
– High computational demands
  • Don’t leverage available complexity shortcuts
– I/O bandwidth requirements can be too high for embedded systems


Transcoding MPEG-2 to MPEG-4 for Wireless Video

Wireless device has limited resources
– Processing power
– Memory
– Display capability

Change GOP structure in MPEG2 to IPPP… structure
– Save memory
– Reduce decoding complexity
– Smooth bit rate

Frame size down-sampling
– Large bit-rate reduction
– Fit the display size of most mobile devices

[Figure: an MPEG-2 GOP containing I, B and P frames is converted to an I P P P … I structure; frame size is reduced from SD 720 x 480 to QVGA 320 x 240]


Optimized Transcoding

[Figure: optimized transcoder block diagram. Blocks shown:]

– Encoded Bitstream -> Video Decoder -> Resize -> Video Encoder -> Transcoded Bitstream
– Video Encoder blocks: Motion Estimation, Frame Prediction, Frame Buffer, Forward Transform, Forward Quantization, Inverse Quantization, Inverse Transform, Entropy Encoder, Rate Control, Coding Control
– Information passed from the decoder to the encoder:
  • Motion Vectors (through a Resize block)
  • Frame Type & MB Modes
  • Rate Allocation, Quant Levels
– Legend: I/O, Full Encode Processing, Optimized Transcode Function, Memory
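The diagram routes the decoder's motion vectors through a Resize block before the encoder reuses them. A minimal sketch of one way such a resize could work is given below: each vector component is scaled by the ratio of target to source frame dimensions and rounded to the nearest integer. This scaling rule and the name `scale_motion_vector` are assumptions for illustration; the slides do not specify the method used.

```python
from fractions import Fraction


def scale_motion_vector(mv, src_size, dst_size):
    """Scale a motion vector (mvx, mvy) from source to target resolution.

    mv       -- (mvx, mvy) in whatever units the decoder reports
    src_size -- (width, height) of the decoded frame
    dst_size -- (width, height) of the resized frame

    Assumption: each component is scaled by the resize ratio and rounded
    to the nearest integer in the same units as the input.
    """
    scale_x = Fraction(dst_size[0], src_size[0])
    scale_y = Fraction(dst_size[1], src_size[1])
    return (round(mv[0] * scale_x), round(mv[1] * scale_y))


if __name__ == "__main__":
    # SD 720x480 source resized to QVGA 320x240.
    print(scale_motion_vector((18, -8), (720, 480), (320, 240)))
```

A scaled vector is only a starting point. When several source macroblocks map onto one target macroblock, their vectors also have to be merged, which this sketch does not attempt.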


Baseball 1: Broadcast Encoder Source

Motion vector is stable and motion vector refinement is adequate

– 10s of GHz DSPs and FPGAs for encoding
– Search ranges +/-500 horizontal, +/-250 vertical

High transcode quality obtained with simple motion vector refinement
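Simple motion vector refinement can be pictured as a small search around the vector inherited from the source bitstream, instead of a full-range motion search. The sketch below tests every integer offset within a small radius of the inherited vector and keeps the one with the lowest sum of absolute differences (SAD). The SAD cost, the block size, the radius and the function names are illustrative assumptions, not details taken from the slides.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def refine_motion_vector(cur, ref, bx, by, mv, n=4, radius=1):
    """Search +/- radius pixels around the inherited vector `mv` = (dx, dy).

    Returns (best_mv, best_cost), or (None, None) if no candidate block
    lies fully inside the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(mv[1] - radius, mv[1] + radius + 1):
        for dx in range(mv[0] - radius, mv[0] + radius + 1):
            # Skip candidates whose reference block leaves the frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


if __name__ == "__main__":
    # Synthetic 8x8 frames: `cur` is `ref` shifted by one pixel in x and y.
    ref = [[10 * y + x for x in range(8)] for y in range(8)]
    cur = [[10 * (y + 1) + (x + 1) for x in range(8)] for y in range(8)]
    print(refine_motion_vector(cur, ref, 2, 2, (0, 1)))
```

With radius 1 only nine candidates are tested per block, far fewer than an exhaustive search over a range such as +/-500 by +/-250.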


Baseball 2: Software Encoder Source

Motion vector is random in still area
– Encoder is not considering motion vector penalty
– Even simple MV recovery algorithm yields some benefit

Simulation: MV recovery reduces bit rate by about 0.5 Mbps
– 9.19 Mbps @ 36.4 dB becomes 8.66 Mbps @ 36.88 dB
– 5.80 Mbps @ 34.8 dB becomes 5.34 Mbps @ 35.17 dB

Bit rate improvements possible with additional motion vector recovery beyond simple refinement


Mid-Filter Functions

[Figure: Bitstream -> MPEG-2 Decoder -> reconstructed macroblock -> De-ringing Filter -> De-blocking Filter -> H.264 Encoder. Info (motion type, motion vectors, DCT type, Q scale) passes from the MPEG-2 decoder to the filters and encoder.]

Best transcoding quality and bitrate requires filtering between decode and re-encode
– De-ringing reduces mosquito noise in the source
– De-blocking reduces block edge artifacts


De-ringing Example

– MPEG-2 decoded image (magnified): 30.86 dB @ 10.34 Mbps
– Filtered image (magnified), after de-ringing: 31.85 dB @ 10.34 Mbps

1 dB gain from using the mid-filter in transcoding
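The slide does not name the quality metric behind the dB figures. Assuming they are peak signal-to-noise ratio (PSNR) values measured against the original video, which is the usual convention for such comparisons, the metric can be sketched as follows for 8-bit samples. The function name `psnr_db` is hypothetical.

```python
import math


def psnr_db(reference, test, peak=255):
    """PSNR in dB between two equally sized 2-D lists of samples.

    Assumption: 8-bit samples, so the peak value defaults to 255.
    Returns math.inf when the two images are identical.
    """
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    original = [[100, 102], [98, 101]]
    decoded = [[101, 100], [98, 103]]
    print(f"PSNR: {psnr_db(original, decoded):.2f} dB")
    # Gain quoted on the slide at the same 10.34 Mbps bit rate.
    print(f"Mid-filter gain: {31.85 - 30.86:.2f} dB")
```

Because PSNR is logarithmic, a gain of about 1 dB at an unchanged bit rate corresponds to roughly a 20% reduction in mean squared error.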


Potential Transcode Solutions

Combine Decoder and Encoder Devices at System Level
– Consumer-class encoders typically don’t support broadcast quality
– Throwing lots of key information away

Fixed Combination Transcoder ASIC
– HD MPEG-2 -> HD H.264
– Doesn’t support universal multi-format decoder
– Only supports 1 of the critical emerging transcode requirements

Integrate Multi-format Decoder + Encoder Hardware Blocks
– Fixed rate control, mode decisions, vector scaling, etc
– Very difficult due to # of transcode scenarios and maturity of R&D on transcode algorithms
– Multi-format encoders not common on market

High Performance Media DSP+Accelerator Combination
– Rate control, motion estimation control including vector re-use algorithms, & mode decisions in programmable DSP + high-performance accelerators for multi-format decode & encode


Transcode Task Partitioning

[Figure: partitioning of transcode tasks between the DSP and the HD accelerators. Blocks shown:]

– DSP: Decode Control, Rate Control, Encode Control, ME Decisions, Mode Decisions, Picture Layer Processing, Error Concealment
– HD Decode Acceleration: Entropy Decoding, IDCT/Inverse Quant, Motion Compensation, Loop De-blocking
– HD Encode Acceleration: Motion Estimation, Intra Prediction, DCT/Quant, IDCT/Iquant, Entropy Encoding, Loop De-blocking
– Data exchanged: MB info (Mode, MVs, etc), MB data


STB DVR/DVD Recorder Transcode System Diagram Concept

[Figure: system diagram. Blocks shown:]

– Main CPU, STB / DVD SOC
– Transcoder with DDR2-533 (32-bit) memory, connected over PCI, with Stream I/O, BT656 in and BT656 out
– Inputs: Digital Tuner / demodulator / CAS / Demux; Video Decoder (Composite, S-Video)
– Storage and connectivity: HDD, DVD BD, DVD SD, MS/SD i/f, Ethernet
– Output: Video out to HD/SD display

MPEG-2 at 18 Mbps requires ~8 GBytes/hour to store; a 200 GByte HDD allows 25 hours of recording

H.264 at 9 Mbps increases recording time to 50 hours for the same size HDD
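The recording-time figures follow from the bit rates. The sketch below converts a video bit rate to storage per hour and then to recording hours for a given disk size. It assumes decimal units (1 GByte = 10^9 bytes) and counts video only, ignoring audio and container overhead; the function names are hypothetical.

```python
def gbytes_per_hour(mbps):
    """Storage in GBytes consumed by one hour of video at `mbps` megabits/s.

    Assumption: decimal units, 1 Mbit = 10^6 bits and 1 GByte = 10^9 bytes.
    """
    bytes_per_second = mbps * 1e6 / 8
    return bytes_per_second * 3600 / 1e9


def recording_hours(disk_gbytes, mbps):
    """Hours of video at `mbps` that fit on a disk of `disk_gbytes`."""
    return disk_gbytes / gbytes_per_hour(mbps)


if __name__ == "__main__":
    for codec, rate in (("MPEG-2", 18), ("H.264", 9)):
        print(f"{codec} at {rate} Mbps: {gbytes_per_hour(rate):.2f} GBytes/hour, "
              f"{recording_hours(200, rate):.1f} hours on a 200 GByte HDD")
```

The exact results are 8.1 GBytes/hour and about 24.7 hours for MPEG-2, and 4.05 GBytes/hour and about 49.4 hours for H.264. The slide rounds to 8 GBytes/hour, which gives the quoted 25 and 50 hours.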


Conclusions

HD transcoding presents major challenges for emerging video processing architectures

Intelligent transcoding enables best quality within embedded I/O budgets

Combining programmable DSP w/ HD video acceleration provides optimized architecture for transcoding