Recent Developments in Video Compression Standards and their …peace.snu.ac.kr/ESTIMedia/pdf/keynote_B.pdf · 2017-01-31 · Recent Developments in Video Compression Standards and

Recent Developments in Video Compression Standards and their Impact on Embedded Platforms: from Scalable to Multi-view Video Coding

Iole Moccagatta, PhD
Multimedia Group, IMEC, Kapeldreef 75, B-3001 Leuven

ESTIMedia 2006, October 26th, 2006


EstiMedia 2006 - © imec 2006

Outline

• The MPEG-4 video standards
• H.264/MPEG-4 AVC
• Scalability in H.264 Ann. G/MPEG-4 SVC
• MPEG-4 MVC: context, motivation, and coding principles
• FVV system and application scenarios
• H.264/MPEG-4 AVC complexity
• H.264 Ann. G/MPEG-4 SVC and MVC complexity
• Impact on embedded platforms
• Conclusions



MPEG, VCEG, and the Joint Video Team (JVT)

[Figure: ISO/IEC SC29/WG11 (MPEG): MPEG-4, MPEG-7, MPEG-21; ITU-T SG16/Q.6 (VCEG): H.261, H.262/MPEG-2, H.263, H.263+, ...; joint work in the JVT: H.264/AVC, SVC, MVC]



The H.264/MPEG-4 Video Standards

• MPEG-4 Part 2 – IS: 2004 (status: 3rd edition)

• ITU-T Rec. H.264/MPEG-4 Part 10 Advanced Video Coding (AVC) – IS: 2005 (status: 3rd edition)

• AVC Amd. 1: Support of colour spaces – FDIS: October 2006 (status: FPDAM)

• AVC Amd. 2: Advanced 4:4:4 profiles – FDIS: January 2007 (status: PDAM)

• ITU-T Rec. H.264 Ann. G/AVC Amd. 3: Scalable Video Coding (SVC) – FDIS: January 2007 (status: PDAM)

• AVC Amd. 4: Multi-view Video Coding (MVC) – FDIS: January 2008 (status: WD)



H.264/MPEG-4 AVC Codec Structure



H.264/MPEG-4 AVC: Motion Compensated Prediction



Additional Features of Mot. Comp.



H.264/MPEG-4 AVC: Multiple Reference Frames

• Multiple picture buffer
– FIFO or sliding window

– adaptive memory control

– 16 pictures max (memory is constrained)

• Reference picture selection per 8x8 block
• Bi-predicted pictures: 2 sets of motion vectors per block
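The sliding-window mode of the reference buffer behaves like a bounded FIFO. The following is a minimal sketch under stated assumptions: `SlidingWindowDPB` is a hypothetical helper that tracks only short-term references by frame number, and adaptive memory control and long-term references are not modeled.

```python
from collections import deque


class SlidingWindowDPB:
    """Hypothetical sketch of sliding-window reference marking (short-term refs only)."""

    MAX_REFS = 16  # upper bound on stored reference pictures

    def __init__(self, num_ref_frames: int) -> None:
        if not 1 <= num_ref_frames <= self.MAX_REFS:
            raise ValueError("num_ref_frames must be between 1 and 16")
        # A deque with maxlen evicts the oldest entry on overflow (FIFO behaviour).
        self.refs = deque(maxlen=num_ref_frames)

    def add_reference(self, frame_num: int) -> None:
        """Mark a newly decoded picture as a short-term reference."""
        self.refs.append(frame_num)

    def candidates(self) -> list[int]:
        """Reference candidates, most recently decoded first."""
        return list(reversed(self.refs))


dpb = SlidingWindowDPB(num_ref_frames=4)
for frame_num in range(6):
    dpb.add_reference(frame_num)
print(dpb.candidates())  # [5, 4, 3, 2]
```

Frames 0 and 1 have slid out of the window; adaptive memory control would instead let the encoder choose which pictures to drop.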



New Types of Temporal Referencing

• Known dependencies (MPEG-1, MPEG-2, etc.)

• New dependencies
– referencing order and display order are decoupled

• e.g. IBBPBBP... vs. IBBPBBBBPBP...

– referencing type and picture type are decoupled

• B frames can be used as reference



There is more ....

• More coding tools
– Motion vector prediction using motion vectors from 4 neighboring blocks

– Adaptive weighted prediction (generalized B slices)

• each prediction sample can be weighted
• an offset can be added

– Interlaced coding

• field or frame coding
• macroblock-adaptive frame/field coding

– Context-Adaptive Binary Arithmetic Coding (CABAC)

• More error resilience and network adaptation tools
– Parameter set structure

– Network Abstraction Layer (NAL) syntax structure

– Arbitrary Slice Ordering (ASO)

– Data Partitioning (DP)

– Redundant Slices

– SP/SI synchronization/switching slices

• More sideband information
– Supplemental Enhancement Information (pan-scan, cropping, etc.)

– Video Usability Information (sample aspect ratio, overscan, etc.)
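To make "each prediction sample can be weighted, an offset can be added" concrete, here is a minimal per-sample sketch of explicit weighted prediction for 8-bit video, following my reading of the H.264 weighted sample prediction formulas. The function names are hypothetical, and the per-slice signalling of the weights `w`, offsets `o` and the weight denominator `log_wd` is not modeled.

```python
def clip1(x: int, bit_depth: int = 8) -> int:
    """Clip a sample to the valid range [0, 2^bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, x))


def weighted_uni(pred: int, w: int, o: int, log_wd: int) -> int:
    """Explicit weighted prediction of one sample from a single reference picture."""
    if log_wd >= 1:
        # scale with rounding, then add the offset
        return clip1(((pred * w + (1 << (log_wd - 1))) >> log_wd) + o)
    return clip1(pred * w + o)


def weighted_bi(p0: int, p1: int, w0: int, w1: int, o0: int, o1: int, log_wd: int) -> int:
    """Explicit weighted bi-prediction of one sample (generalized B)."""
    scaled = (p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1)
    return clip1(scaled + ((o0 + o1 + 1) >> 1))


# Fade: w = 16 with log_wd = 5 scales the reference by 0.5, then an offset of 10 is added.
print(weighted_uni(100, w=16, o=10, log_wd=5))  # 60
# Equal weights reduce to the plain average of the two predictions.
print(weighted_bi(100, 200, 32, 32, 0, 0, log_wd=5))  # 150
```

Fades and cross-fades are the typical use case: a single weight and offset per reference can track a global brightness change that plain motion compensation cannot.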



H.264/MPEG-4 AVC Coding Efficiency Performance

[Sullivan, SPIE, Aug. 2004]

Fig. 7: (a) – (e) Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 Part 2 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3)



Dramatic Increase of Heterogeneous Devices



Solution: from Simulcast to Scalable Video Coding




Requirement: (Extended) Spatial scalability




Requirement: Temporal Scalability




Requirement: Quality Scalability




History and Current Status

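[Timeline figure: MPEG meetings 66, Brisbane, Oct ‘03; 71, Hong Kong, Jan ‘05; 76, Montreux, Apr ‘06; 79, Marrakech, Jan ‘07. Milestones: CfP [N5958], WD, N8043, FDIS. Wavelet Exploration Group; MPEG-21 Part 13; MPEG-4 Part 10 Amd 3]

• Temporal scalability: hierarchical B-frames

• Spatial scalability: layered approach; ESS (Extended Spatial Scalability)

• Quality scalability: layered approach for CGS (Coarse Grain Scalability) and MGS (Medium Grain Scalability); bit-plane coding for FGS (Fine Grain Scalability)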



SVC Block Diagram

[Figure: SVC encoder block diagram. Each lower spatial layer is obtained by spatial decimation (2) and coded with MC & intra prediction and base layer coding (texture, motion); inter-layer prediction techniques connect consecutive layers; progressive SNR refinement texture coding generates M (max=3) FGS layers, while layered coding generates N CGS layers; a mux combines all layers into one bitstream whose lowest layer is an H.264/AVC compatible bitstream]



Temporal Scalability in MPEG-4 SVC: Hierarchical B-frames

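[Figure: hierarchical B-frames. A group of pictures (GOP) lies between two key pictures (the first one is an IDR picture); the pictures are organized into temporal layers T0 (key pictures), T1, T2 and T3 along the time axis]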
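For a dyadic GOP, the temporal layer of each picture and a valid decoding order follow directly from its display position. This is a minimal sketch assuming the GOP size is a power of two; the function names are hypothetical, and the level-by-level decoding order is only one of several valid orders an encoder may use.

```python
def temporal_level(poc: int, gop_size: int) -> int:
    """Temporal layer (0 = T0) of the picture at display position poc in a dyadic GOP."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    pos = poc % gop_size
    if pos == 0:
        return 0  # key picture
    log2_gop = gop_size.bit_length() - 1
    trailing_zeros = (pos & -pos).bit_length() - 1
    return log2_gop - trailing_zeros


def b_references(pos: int, gop_size: int) -> tuple[int, int]:
    """Past and future reference (display positions) of the B picture at pos in one GOP."""
    if not 0 < pos < gop_size:
        raise ValueError("pos must be strictly inside the GOP")
    step = pos & -pos  # distance to the two nearest pictures of lower temporal levels
    return pos - step, pos + step


def decoding_order(gop_size: int) -> list[int]:
    """One valid decoding order for a GOP: the next key picture, then level by level."""
    inner = sorted(range(1, gop_size), key=lambda p: (temporal_level(p, gop_size), p))
    return [gop_size] + inner


print([temporal_level(p, 8) for p in range(9)])  # [0, 3, 2, 3, 1, 3, 2, 3, 0]
print(decoding_order(8))  # [8, 4, 2, 6, 1, 3, 5, 7]
print(b_references(6, 8))  # (4, 8)
# Dropping T2 and T3 keeps every fourth picture, i.e. a quarter of the frame rate.
print([p for p in range(9) if temporal_level(p, 8) <= 1])  # [0, 4, 8]
```

Because every B picture only references pictures of lower temporal levels, discarding the highest layers never breaks prediction, which is what makes the temporal scalability "free".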



Spatial Scalability in MPEG-4 SVC


[Figure: spatial scalability encoder. The input is spatially decimated for the lower layer; each layer codes texture and motion followed by entropy coding; a multiplexer combines the layers into one bitstream]



Inter-layer Prediction Techniques

• Layered approach

• Three techniques:
– Inter-layer Intra Texture Prediction

• unconstrained (multi-loop decoding: motion compensation is performed in every layer up to the target layer) and constrained (single-loop decoding: only intra-coded lower-layer macroblocks are used, so motion compensation is needed only in the target layer)

– Inter-layer Motion Prediction

• macroblock partitioning, scaled motion vectors and reference indices of the base layer are reused in the enhancement layer (base layer mode)

• for each motion vector, a quarter-sample refinement is additionally transmitted and added to the derived motion vector (quarter-pel refinement mode)

– Inter-layer Residual Prediction

• only the difference between the current-layer residual and the (upsampled) lower-layer residual is coded

• Some concepts already existed in MPEG-2/4 for spatial scalability
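Inter-layer motion and residual prediction can be illustrated for a dyadic (2:1) spatial ratio. This is a simplified sketch: the helper names are hypothetical, motion vectors are in quarter-sample units, and nearest-neighbour upsampling stands in for the normative SVC upsampling filters.

```python
def scale_base_mv(mv_base: tuple[int, int], refinement: tuple[int, int] = (0, 0)) -> tuple[int, int]:
    """Derive an enhancement-layer MV from the base layer for a 2:1 spatial ratio.

    The base MV is doubled; a transmitted quarter-sample refinement may then be added."""
    return 2 * mv_base[0] + refinement[0], 2 * mv_base[1] + refinement[1]


def upsample2x(block: list[list[int]]) -> list[list[int]]:
    """Nearest-neighbour 2x upsampling (a stand-in for SVC's normative upsampling filter)."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def residual_to_code(enh_residual: list[list[int]], base_residual: list[list[int]]) -> list[list[int]]:
    """Inter-layer residual prediction: subtract the upsampled lower-layer residual."""
    predicted = upsample2x(base_residual)
    return [[e - p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(enh_residual, predicted)]


print(scale_base_mv((3, -2), refinement=(1, 0)))  # (7, -4)
print(residual_to_code([[5, 6, 0, 1], [4, 5, 0, 0]], [[5, 0]]))  # [[0, 1, 0, 1], [-1, 0, 0, 0]]
```

When the layers are well correlated, the coded difference is mostly zeros, which is where the bit savings of inter-layer prediction come from.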



Quality Scalability in MPEG-4 SVC: Coarse Grain Scalability



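[Figure: CGS encoder with two layers at the same spatial resolution. Texture and motion of each layer are entropy coded, inter-layer prediction techniques link the layers, and a multiplexer forms the bitstream. The base layer is coded with QP1 and the enhancement layer with QP2, where QP2 < QP1 (QP = Quantization Parameter)]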
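The effect of QP2 < QP1 can be shown on a single transform coefficient. This is a conceptual sketch: the step sizes are the H.264 quantizer step sizes (doubling every 6 QP), but real H.264/SVC quantization uses integer scaling tables and rounding offsets rather than floating-point division, and the helper names are hypothetical.

```python
# H.264 quantizer step sizes for QP 0..5; the step size doubles every 6 QP values.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantizer step size for a given QP."""
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(value: float, qp: int) -> float:
    """Uniform scalar quantization followed by reconstruction (conceptual, no dead zone)."""
    step = qstep(qp)
    return round(value / step) * step


def cgs_reconstruct(coeff: float, qp1: int, qp2: int) -> tuple[float, float]:
    """Base-layer and enhanced reconstruction of one coefficient (QP2 < QP1)."""
    if qp2 >= qp1:
        raise ValueError("the enhancement layer needs a finer quantizer (QP2 < QP1)")
    base = quantize(coeff, qp1)                    # base layer at QP1
    enhanced = base + quantize(coeff - base, qp2)  # refinement of the base error at QP2
    return base, enhanced


print(qstep(22), qstep(34))  # 8.0 32.0
print(cgs_reconstruct(37.3, qp1=34, qp2=22))  # (32.0, 40.0)
```

The base layer alone reconstructs the coefficient with an error of 5.3; adding the enhancement layer reduces it to 2.7.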



Bringing it Together: Combined Scalability

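[Figure [Schwarz, ICME’05]: 3D scalability space spanned by spatial, temporal and quality scalability, with example operating points QCIF@30Hz, CIF@15Hz and CIF@30Hz; there is no strict notion of layer]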
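Selecting one operating point in this space amounts to dropping NAL units. SVC NAL units carry `dependency_id`, `temporal_id` and `quality_id` in their header extension; the sketch below assumes a hypothetical `NalUnit` record holding only those three identifiers and applies a simplified rule (keep every unit at or below the target on all three axes). The normative extraction process is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical model of an SVC NAL unit: only the scalability identifiers are kept."""
    dependency_id: int  # D: spatial or CGS layer
    temporal_id: int    # T: temporal layer
    quality_id: int     # Q: quality refinement
    payload: bytes = b""


def extract(units: list[NalUnit], d: int, t: int, q: int) -> list[NalUnit]:
    """Simplified sub-bitstream extraction for the operating point (d, t, q)."""
    return [u for u in units if u.dependency_id <= d and u.temporal_id <= t and u.quality_id <= q]


stream = [NalUnit(0, 0, 0), NalUnit(0, 1, 0), NalUnit(1, 0, 0), NalUnit(1, 1, 0)]
# QCIF@30Hz, assuming D=0 is QCIF and T=1 doubles a 15 Hz base frame rate
print(len(extract(stream, d=0, t=1, q=0)))  # 2
# CIF@15Hz, assuming D=1 is CIF
print(len(extract(stream, d=1, t=0, q=0)))  # 2
```

The extraction is a pure filter on header fields, so it can run in a network node without decoding any video.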



Embedded Scalability: Coding Efficiency Cost

[Schwarz, ICME’05]

The difference between the two points is the cost of embedded scalability!

Note: the original target (from the MPEG requirements) was a coding-efficiency loss of at most 10% in exchange for embedded scalability
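The cost of embedded scalability is usually expressed as the relative bit-rate increase at equal quality. A minimal sketch with hypothetical helper names; the rates in the usage line are invented for illustration and are not results from [Schwarz, ICME’05].

```python
def scalability_overhead(rate_scalable: float, rate_single_layer: float) -> float:
    """Relative bit-rate increase of the scalable stream over single-layer coding at equal quality."""
    return rate_scalable / rate_single_layer - 1.0


def meets_target(overhead: float, target: float = 0.10) -> bool:
    """Compare against the 10% target (small tolerance for float rounding)."""
    return overhead <= target + 1e-9


# Hypothetical rates in kbit/s at the same PSNR, for illustration only.
overhead = scalability_overhead(1080.0, 1000.0)
print(f"{overhead:.1%}", meets_target(overhead))  # 8.0% True
```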



MPEG-4 Multi-view Video Coding - Context and Motivations

[Figure: evolution from passive single-view video (2D color TV, 1-view) to interactive multi-view video (3D color TV, N-view), along the axes 2D to 3D, passive to interactive, and 1-view to N-view]

Courtesy of Philips and The Matrix



Interactive Multi-view Realization: Free View-point Video (FVV)

Courtesy of HHI and Microsoft Research



FVV System and MVC Codec

• Basic components of an example FVV system:
Video Capture → Correction → MVC Encoder → MVC Decoder → View Generation → Display

• Example architecture of a FVV decoder:
[Figure: MVC Decoder and View Generation share a Shared Memory; side information: Video Resource Management Info, MVC Video Elementary Stream Info, Timing Info, Camera Parameters Info]



Application Scenarios

• Entertainment, e.g. concert, sport, movie, game...
• Education, e.g. instruction video, cultural archives
• Medical surgery
• Viewing with exploration, e.g. museum, shopping
• Surveillance
• Immersive video conference
• Advertisements
• Event broadcasting



Synchronized Multi-view Video Streams

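[Figure: synchronized streams from views S0, S1, S2, ..., SN (View axis) captured at time instants T0, T1, T2, ..., Tn-1, Tn, Tn+1, Tn+2 (Time axis); the pictures of all views at one time instant form a multi-view image, and the full view-time array forms the multi-view video]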



Promising Coding Tools Currently Considered for MVC Standard

• Hierarchical B pictures for temporal dependencies and an adapted prediction scheme (HHI proposal)

• MVC encoder optimization

• Block-level illumination compensation (5 competing proposals), to compensate for illumination differences between views caused by:
– imperfectly calibrated cameras
– different perspective projection directions
– different reflection effects

• View synthesis prediction (2 different proposals)

[Figure: temporal prediction along the time axis and view prediction along the view axis; view synthesis by view interpolation or view warping]
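The proposals differ in detail; this minimal sketch assumes the simplest model, a per-block DC offset between views, with mean-removed SAD used for matching. It does not reproduce any specific proposal, and the helper names are hypothetical.

```python
def block_mean(block: list[list[int]]) -> float:
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def sad(cur: list[list[int]], ref: list[list[int]]) -> int:
    """Plain sum of absolute differences."""
    return sum(abs(c - r) for c_row, r_row in zip(cur, ref) for c, r in zip(c_row, r_row))


def mr_sad(cur: list[list[int]], ref: list[list[int]]) -> float:
    """Mean-removed SAD: insensitive to a constant brightness offset between the blocks."""
    mc, mr = block_mean(cur), block_mean(ref)
    return sum(abs((c - mc) - (r - mr)) for c_row, r_row in zip(cur, ref) for c, r in zip(c_row, r_row))


def ic_predict(cur: list[list[int]], ref: list[list[int]]) -> tuple[int, list[list[int]]]:
    """DC offset the encoder would signal, and the offset-compensated prediction."""
    offset = round(block_mean(cur) - block_mean(ref))
    return offset, [[r + offset for r in row] for row in ref]


ref_view = [[10, 20], [30, 40]]  # block from a neighbouring camera
cur_view = [[25, 35], [45, 55]]  # same content, 15 levels brighter
print(sad(cur_view, ref_view), mr_sad(cur_view, ref_view))  # 60 0.0
print(ic_predict(cur_view, ref_view))  # (15, [[25, 35], [45, 55]])
```

Plain SAD would penalize the correct match only because of the brightness mismatch; MR-SAD finds it, and the signalled offset restores the brightness in the prediction.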



Hierarchical B Pictures for MVC: Coding Structure (GOP size 8)

[Figure: MVC coding structure with hierarchical B pictures (GOP size 8), combining temporal prediction, inter-view prediction and combined prediction] [Smolić, JVT-T100, July 2006]
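The reference candidates of a picture in such a view-time grid can be enumerated programmatically. This sketch assumes a simplified structure (hierarchical-B temporal references inside each view, plus one inter-view reference to the previous view at the same time instant for non-base views); it is not the exact structure of [Smolić, JVT-T100, July 2006], and the function names are hypothetical.

```python
def temporal_refs(t: int, gop_size: int = 8) -> list[int]:
    """Hierarchical-B temporal references (time indices) of a picture within its own view."""
    pos = t % gop_size
    if pos == 0:
        return []  # key picture: no temporal reference in this simplified scheme
    step = pos & -pos  # distance to the nearest pictures of lower temporal levels
    return [t - step, t + step]


def mvc_refs(view: int, t: int, gop_size: int = 8) -> list[tuple[int, int]]:
    """(view, time) references: temporal refs in the same view plus, for non-base views,
    the picture of the previous view at the same time instant (inter-view prediction)."""
    refs = [(view, r) for r in temporal_refs(t, gop_size)]
    if view > 0:
        refs.append((view - 1, t))
    return refs


print(mvc_refs(0, 0))  # []  (base-view key picture: intra)
print(mvc_refs(1, 0))  # [(0, 0)]  (key picture predicted from the base view only)
print(mvc_refs(1, 4))  # [(1, 0), (1, 8), (0, 4)]  (combined temporal and inter-view)
```

Each extra reference candidate is another picture the decoder may need to fetch from memory, which links this structure to the bandwidth concerns discussed later.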



Complexity Issues for H.264/AVC BP Decoder (1/2)

• Intra prediction
– 4x4 intra prediction depends on previously reconstructed 4x4 blocks, which breaks the macroblock-based pipeline (scheduling issues)

– intra prediction requires pixel-level access

– heavy computational complexity (interpolation)

• Inter prediction
– small blocks (2x2 chroma) increase memory bandwidth (small bursts cannot hide the penalty for switching memory lines)

– small random fetches waste memory bandwidth with a wide memory organization

– high computational complexity and increased memory bandwidth of the 6-tap filter for ½-pel interpolation (a 4x4 block needs a 9x9 reference area)

• De-blocking filter
– complex state machine

– pixel-dependent computation

– independent luma and chroma computation requires a fast engine



Complexity Issues for H.264/AVC BP Decoder (2/2)

• Too many book-keeping/low-bandwidth operations
– variable geometry in neighboring blocks

– reference picture re-ordering

– etc.

• ASO and FMO (Flexible Macroblock Ordering)
– de-blocking must be implemented as a second pass (double speed, scheduling issues)

– linked list of buffers during bitstream parsing (delay)



Complexity Issues for H.264/AVC BP Encoder

• Intra prediction
– selecting the best 4x4 mode requires reconstructing the previous blocks

• Inter prediction
– selecting the best geometry and motion vector precision requires testing (3+4*4)*3 cases

– shortcuts are required (ex: select the best mode based on 1 or ½ pel, then compute ¼ pel on the winner, reduce the search range, etc.)

– complexity increases linearly with the number of reference frames

• Coding gain vs. complexity trade-off: adding more tools increases complexity while coding efficiency saturates
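One reading of the (3+4*4)*3 count: three macroblock partitions (16x16, 16x8, 8x16), plus four 8x8 sub-macroblocks with four sub-partitions each, tested at three motion vector precisions. The sketch below encodes that interpretation, which is an assumption, and contrasts it with the coarse-then-refine shortcut; it treats each reference frame as an independent search to show the linear growth.

```python
MB_PARTITIONS = ("16x16", "16x8", "8x16")
SUB_MB_PARTITIONS = ("8x8", "8x4", "4x8", "4x4")  # chosen for each of the four 8x8 sub-macroblocks
MV_PRECISIONS = ("integer", "half", "quarter")


def exhaustive_cases(num_ref_frames: int = 1) -> int:
    """(3 + 4*4) * 3 geometry/precision cases per reference frame."""
    per_ref = (len(MB_PARTITIONS) + 4 * len(SUB_MB_PARTITIONS)) * len(MV_PRECISIONS)
    return per_ref * num_ref_frames


def shortcut_cases(num_ref_frames: int = 1) -> int:
    """All geometries at one coarse precision, then a single quarter-pel pass on the winner."""
    per_ref = len(MB_PARTITIONS) + 4 * len(SUB_MB_PARTITIONS) + 1
    return per_ref * num_ref_frames


for n in (1, 5):
    print(n, exhaustive_cases(n), shortcut_cases(n))
# 1 57 20
# 5 285 100
```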



H.264/AVC BP Decoder: Complexity Estimation from Profiling

• Assumption: complexity measured as time complexity (i.e. number of operations required to execute a specific implementation of an algorithm)

• Profiling performed on a 600-MHz Pentium III PC
• Fair comparison is claimed
– Comparable SW optimization in both decoders

– Similar motion estimation and mode decision processes for both encoders

[Horowitz, IEEE CSVT, July 2003]

* I- and P-frames * UVLC (~CAVLC) * five ref. frames



H.264/AVC BP Decoder: Memory Requirement from Theoretical Evaluation

• Memory requirements (bytes)
– Frame buffers (ex: reconstructed and ref. frames, etc.)

– Buffers storing one line of macroblocks (ex: intra prediction, etc.)

– Buffers at MB level (ex: transform coeff., etc.)

– Constant data (ex: tables, etc.)

• Note: formulas are derived from algorithmic analysis

[Horowitz, IEEE CSVT, July 2003]

w = pict. width, h = pict. height, n = # ref. frames
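The formulas from the original table are not reproduced here. As a rough illustration of the dominant frame-buffer term only, this sketch assumes 8-bit 4:2:0 pictures and n reference frames plus the frame being reconstructed; the helper names and this simple model are assumptions, not the cited analysis.

```python
def frame_bytes(w: int, h: int) -> int:
    """One 8-bit 4:2:0 picture: w*h luma bytes plus two (w/2)*(h/2) chroma planes."""
    return w * h + 2 * (w // 2) * (h // 2)


def frame_buffer_bytes(w: int, h: int, n: int) -> int:
    """n reference frames plus the frame currently being reconstructed."""
    return (n + 1) * frame_bytes(w, h)


# CIF with the five reference frames used in the profiling setup above
print(frame_buffer_bytes(352, 288, n=5))  # 912384 (about 0.87 MiB)
```

The frame buffers scale with w * h * (n + 1), whereas the macroblock-line buffers scale only with w, so at higher resolutions or with more reference frames the frame buffers dominate.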



H.264/AVC BP Encoder: Complexity Estimation from Profiling

• Profiled on a reduced instruction set computing (RISC) platform (one PE, 1GHz Ultra Sparc II CPU and 8 Gbytes RAM)

• Disclaimer: non-optimized SW, no algorithm optimization (ex: integer-pel full-search ME)
– Note: complexity analysis based on MPEG ref. SW (JM) typically overstates the actual complexity of the H.264/AVC encoder by an order of magnitude, and that of the decoder by a factor of 2 to 3 [Shafer, EBU Tech Review, Jan ‘03]

• Requirements
– memory transfer req. = 460 GB/s

– computational req. = 300 GIPS

[Chien, IEEE Comm. Mag., Aug. 05]



Complexity Issues for MPEG-4 SVC (1/2)

• Inherited motion compensation memory access complexity from H.264/AVC
– multiple ref. frames

– hierarchical B-frames

• Inter-layer prediction
– unconstrained inter-layer intra texture prediction: multiple motion compensation (multi-loop, one mot. comp. per layer)

• motion compensation memory access complexity * (# layers), scaled by the spatial decimation factor across layers

– inter-layer motion prediction: not a concern

– inter-layer residual prediction: memory access complexity to fetch the lower layer’s residual

[Figure: SVC encoder structure annotated with temporal scalability, spatial and quality (CGS) scalability, spatial decimation, inter-layer prediction techniques, entropy coding of each layer, and multiplexers]
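The "complexity * (# layers), scaled by the spatial decimation factor" rule can be quantified with sample counts. This sketch assumes dyadic spatial layers and uses the number of motion-compensated luma samples as a proxy for memory access; the function name is hypothetical.

```python
def mc_samples_per_frame(width: int, height: int, num_layers: int, multi_loop: bool) -> int:
    """Luma samples that pass through motion compensation per decoded target-layer frame.

    Layers are assumed dyadic: each lower layer has half the width and height."""
    areas = [(width >> s) * (height >> s) for s in range(num_layers)]  # target layer first
    return sum(areas) if multi_loop else areas[0]


multi = mc_samples_per_frame(704, 576, num_layers=3, multi_loop=True)
single = mc_samples_per_frame(704, 576, num_layers=3, multi_loop=False)
print(multi, single, multi / single)  # 532224 405504 1.3125
```

With dyadic layers the extra cost is bounded (1 + 1/4 + 1/16 + ...), but for CGS layers at the same resolution every layer adds a full-size motion compensation loop, so multi-loop decoding of N CGS layers costs N times the single-loop case.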





Complexity Issues for MPEG-4 SVC (2/2)

• Efforts to reduce complexity are under way
– multi-loop vs. single-loop decoding (# of MC loops): multi-loop decoding was recently removed from the standard to reduce decoder complexity

– Motion Compensated Temporal Filtering (MCTF) = hierarchical B-frames + update step

• removing the update step at the decoder side removed the requirement to support MCTF at the decoder side

• MCTF may still be used as a pre-processing and/or enhancement tool at the encoder side
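The relation "MCTF = hierarchical B-frames + update step" can be seen in a lifting implementation. This sketch assumes the simplest (Haar) lifting on scalar "frames" with motion compensation omitted; SVC's MCTF used motion-compensated, bi-directional lifting, and the function names are hypothetical. Without the update step, only the prediction step remains, which is what a hierarchical-B decoder already performs.

```python
def mctf_analysis(frames: list[float], update: bool = True) -> tuple[list[float], list[float]]:
    """One Haar lifting level over pairs of frames (motion compensation omitted)."""
    even, odd = frames[0::2], frames[1::2]
    high = [o - e for e, o in zip(even, odd)]  # prediction step: residual of odd frames
    if update:
        low = [e + h / 2 for e, h in zip(even, high)]  # update step: temporal low-pass
    else:
        low = list(even)  # no update: even frames pass through unchanged
    return low, high


def mctf_synthesis(low: list[float], high: list[float], update: bool = True) -> list[float]:
    """Inverse lifting; without the update step only the prediction needs to be undone."""
    even = [lo - h / 2 for lo, h in zip(low, high)] if update else list(low)
    odd = [e + h for e, h in zip(even, high)]
    frames = []
    for e, o in zip(even, odd):
        frames += [e, o]
    return frames


frames = [1.0, 3.0, 5.0, 9.0]
for update in (True, False):
    low, high = mctf_analysis(frames, update)
    print(update, low, high, mctf_synthesis(low, high, update) == frames)
# True [2.0, 7.0] [2.0, 4.0] True
# False [1.0, 5.0] [2.0, 4.0] True
```

Both variants reconstruct perfectly; dropping the update step simply removes one inverse pass (and its memory accesses) from the decoder.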



Complexity issues for MPEG-4 MVC

• Inter-viewpoint prediction + inter-viewpoint & temporal prediction
– introduced to enhance compression of the simultaneous, multiple video streams by exploiting inter-viewpoint interpolation

– adds to “classical” temporal interpolation

– stresses memory bandwidth requirements: think H.264/AVC B-frames, but with references coming from neighboring views as well

• memory hierarchy is very important!

• Intermediate view synthesis
– required by FVV systems that use MPEG-4 MVC as key coding technology

– goes against the traditional encoder vs. decoder complexity distribution (view synthesis adds load on the decoder side)

– new technology for consumer market apps., where cost is a key factor for success

• new solutions are needed!

• Simultaneous decoding of multiple frames to enable “Matrix-like bullet time” visual effects

• Efforts to reduce complexity are under way
– algorithm complexity reduction

• ex: simplified prediction structures to reduce the number of reference candidates

– speed-up approaches

• ex: speed-up of block-level illumination compensation



Impact on Embedded Platforms (1/3)

• Memory access complexity (i.e. memory bandwidth)
– sources of the increased complexity:

• multiple ref. frames
• hierarchical B-frames
• inter-layer prediction
• inter-viewpoint prediction

– impact of memory access on energy: computation doesn’t cost much, but bandwidth feasts on energy [Wilson, EDN, Sept. 06]

• How to address memory bandwidth requirements
– reduce data transfer between processing components, and reduce storage requirements

• ex1: optimal memory organization/hierarchy
• ex2: memory organization that increases locality to maximize cache ($) performance

– Data Transfer and Storage Exploration (DTSE) approaches can help to explore the space of possible solutions and find the best trade-offs
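The benefit of a locality-increasing memory organization can be illustrated by counting DRAM pages opened for one motion-compensation fetch. This sketch compares a raster layout with a tiled layout; the page size, tile size and picture width are hypothetical parameters, and the helper names are illustrative only.

```python
def sample_address(r: int, c: int, pitch: int, tile: int) -> int:
    """Byte address of 8-bit sample (row r, column c); tile == 0 selects a raster layout."""
    if tile == 0:
        return r * pitch + c
    tiles_per_row = pitch // tile
    tile_index = (r // tile) * tiles_per_row + (c // tile)
    return tile_index * tile * tile + (r % tile) * tile + (c % tile)


def pages_touched(x: int, y: int, w: int, h: int, pitch: int, page_bytes: int, tile: int = 0) -> int:
    """Distinct DRAM pages opened to fetch a w x h block whose top-left sample is (x, y)."""
    return len({sample_address(r, c, pitch, tile) // page_bytes
                for r in range(y, y + h) for c in range(x, x + w)})


# 9x9 reference area (4x4 block + 6-tap filter) in a 1920-sample-wide picture,
# assuming 1 KiB DRAM pages and 32x32 tiles (32 * 32 = 1024 bytes = one page).
print(pages_touched(100, 100, 9, 9, pitch=1920, page_bytes=1024))           # 9
print(pages_touched(100, 100, 9, 9, pitch=1920, page_bytes=1024, tile=32))  # 1
```

In the raster layout every row of the block lands in a different page, so each row pays the page-switch penalty; with tiles matched to the page size, the whole block is usually served from one to four pages.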




• Throughput requirement from high spatial resolution
– operation frequency of a generic algorithmic component, assuming an efficiency/processing capability of 1 clk./sample for this component:
1 clk./sample = 1.5 clk./pixel (4:2:0 sampling: 1.5 samples per pixel)
frequency = 1.5 clk./pixel * 256 pixels/MB * #MB/sec, ex: 1.14 MHz @ CIF

• Computational power (OPS) requirement from algorithmic complexity
– Ex: 300 GIPS from H.264/AVC BP Encoder at CIF 30fps
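The throughput estimate can be written as a small function. This sketch assumes 4:2:0 video, 256 luma pixels per macroblock, and picture dimensions that are multiples of 16; the function name is hypothetical. Under these assumptions CIF at 30 fps comes to about 4.56 MHz, whereas the 1.14 MHz example corresponds to QCIF at 30 fps (or CIF at 7.5 fps).

```python
def required_clock_hz(width: int, height: int, fps: float,
                      clk_per_sample: float = 1.0, samples_per_pixel: float = 1.5) -> float:
    """Clock rate of a component that spends clk_per_sample cycles on every sample.

    samples_per_pixel = 1.5 for 4:2:0 video (one luma plus two quarter-size chroma samples)."""
    mbs_per_sec = (width // 16) * (height // 16) * fps
    return clk_per_sample * samples_per_pixel * 256 * mbs_per_sec  # 256 luma pixels per MB


print(f"{required_clock_hz(352, 288, 30) / 1e6:.2f} MHz")   # 4.56 MHz (CIF@30Hz)
print(f"{required_clock_hz(176, 144, 30) / 1e6:.2f} MHz")   # 1.14 MHz (QCIF@30Hz)
print(f"{required_clock_hz(352, 288, 7.5) / 1e6:.2f} MHz")  # 1.14 MHz (CIF@7.5Hz)
```

Since the estimate scales linearly with the pixel rate, moving to higher resolutions multiplies it directly, which is why throughput and per-sample algorithmic cost must be considered jointly.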




• How to address joint throughput and computational power requirements:
– these requirements cannot be satisfied with Task Level Parallelism (TLP, i.e. functional split of the pipeline) alone

• Data Level Parallelism (DLP, such as inner loop-level parallelism) and/or Instruction Level Parallelism (ILP, i.e. VLIW) is a must have
– DLP ex: multimedia instruction set extensions (e.g., Intel’s MMX and SSE)

• H/W acceleration is a maybe
– ex1: Application Specific Instruction Set Processor (ASIP)
– ex2: H/W for CABAC, CA-VLC, etc.

– multi-core architectures to beat energy/power consumption

• load balancing is a key issue (see clock islands in H/W)
– need clever partitioning
» make full use of all the resources
» switch off what is not needed when it is not needed

– design-time vs. run-time approach vs. combined approach
» design time: TL and DL parallelism
» run time: RTOS with multi-threading

"Von Neumann is a poor use of scaling: all the energy is going on the communication between the processor and the memory. It’s much better to use 20 microprocessors running at 100MHz than one at 2GHz" [Hugo De Man]
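Design-time load balancing can be illustrated with the classic longest-processing-time-first (LPT) heuristic, named here as one simple choice rather than the method used in the talk. The stage names and cycle counts below are hypothetical, and the sketch ignores data dependencies between pipeline stages.

```python
import heapq


def lpt_partition(tasks: dict[str, int], num_cores: int) -> tuple[list[list[str]], int]:
    """Longest-processing-time-first: give each task, largest first, to the least-loaded core."""
    heap = [(0, core) for core in range(num_cores)]  # (current load, core id)
    assignment: list[list[str]] = [[] for _ in range(num_cores)]
    for name, cost in sorted(tasks.items(), key=lambda item: item[1], reverse=True):
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical per-macroblock cycle counts, for illustration only.
stages = {"mc": 600, "entropy": 400, "deblock": 350, "inv_transform": 200, "intra_pred": 150}
print(lpt_partition(stages, num_cores=2))
# ([['mc', 'inv_transform'], ['entropy', 'deblock', 'intra_pred']], 900)
```

The makespan (900 cycles here, against 1700 on one core) sets the clock each core needs; an unbalanced split would force a higher clock on the busiest core while the others sit idle, which is exactly what clock islands and switching off unused resources try to exploit.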



Conclusions

• H.264/MPEG-4 Part 10 AVC
• SVC extension of H.264/MPEG-4 AVC (FDIS Jan ’07)
• MVC extension of H.264/MPEG-4 AVC (FDIS Jan ’08)

• Memory access complexity significantly increased due to frame/layer/view-point prediction
– need to minimize data transfer between processing components as well as storage requirements

• Computational complexity increased due to the combination of high spatial resolution and the algorithmic complexity of the codec’s tools
– DLP and ILP are a must have

– H/W acceleration may be necessary

– multi-core architecture to reduce power
