Video Compression 2004


April 22, 2004 Page 1 John G. Apostolopoulos

VideoCoding

Video Compression

MIT 6.344, Spring 2004

John G. Apostolopoulos

Streaming Media Systems Group

Hewlett-Packard [email protected]



Overview of Next Three Lectures

Video Compression (Thurs, 4/22) [Today]
  Principles and practice of video coding
  Basics behind MPEG compression algorithms
  Current image & video compression standards

Video Communication & Video Streaming I (Tues, 4/27)
  Video application contexts & examples: DVD and Digital TV
  Challenges in video streaming over the Internet
  Techniques for overcoming these challenges

Video Communication & Video Streaming II (Thurs, 4/29)
  Video over lossy packet networks and wireless links
  Error-resilient video communications



Outline of Today's Lecture

Motivation for compression
Brief review of generic compression system (from prior lecture)
Brief review of image compression (from last lecture)
Video compression
  Exploit temporal dimension of video signal
  Motion-compensated prediction
  Generic (MPEG-type) video coder architecture
  Scalable video coding
Overview of current video compression standards
  What do the standards specify?
  Frame-based video coding: MPEG-1/2/4, H.261/3/4
  Object-based video coding: MPEG-4



Motivation for Compression: Example of HDTV Video Signal

Problem:
  Raw video contains an immense amount of data
  Communication and storage capabilities are limited and expensive

Example HDTV video signal:
  720x1280 pixels/frame, progressive scanning at 60 frames/s:

  $$720 \times 1280 \ \frac{\text{pixels}}{\text{frame}} \times 60 \ \frac{\text{frames}}{\text{sec}} \times 3 \ \frac{\text{colors}}{\text{pixel}} \times 8 \ \frac{\text{bits}}{\text{color}} = 1.3 \ \text{Gb/s}$$

  20 Mb/s HDTV channel bandwidth
  Requires compression by a factor of 70 (equivalent to .35 bits/pixel)
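The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative, and the exact results (about 66 and 0.36) are what the slide rounds to 70 and .35.

```python
# Raw (uncompressed) HDTV bit rate, using the figures quoted on the slide.
WIDTH, HEIGHT = 1280, 720      # pixels per frame
FRAME_RATE = 60                # frames per second, progressive scanning
COLORS_PER_PIXEL = 3           # three color components per pixel
BITS_PER_COLOR = 8             # bits per color component
CHANNEL_RATE = 20e6            # HDTV channel bandwidth in bits per second

pixels_per_second = WIDTH * HEIGHT * FRAME_RATE
raw_rate = pixels_per_second * COLORS_PER_PIXEL * BITS_PER_COLOR

compression_factor = raw_rate / CHANNEL_RATE
bits_per_pixel = CHANNEL_RATE / pixels_per_second

print(f"raw rate:           {raw_rate / 1e9:.2f} Gb/s")
print(f"compression factor: {compression_factor:.1f}")
print(f"bits per pixel:     {bits_per_pixel:.2f}")
```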



Achieving Compression

Reduce redundancy and irrelevancy

Sources of redundancy
  Temporal: Adjacent frames highly correlated
  Spatial: Nearby pixels are often correlated with each other
  Color space: RGB components are correlated among themselves
  Relatively straightforward to exploit

Irrelevancy
  Perceptually unimportant information
  Difficult to model and exploit



Spatial and Temporal Redundancy

Why can video be compressed? Video contains much spatial and temporal redundancy.

  Spatial redundancy: Neighboring pixels are similar
  Temporal redundancy: Adjacent frames are similar

Compression is achieved by exploiting the spatial and temporal redundancy inherent to video.











Generic Compression System

A compression system is composed of three key building blocks:
  Representation
    Concentrates important information into a few parameters
  Quantization
    Discretizes parameters
  Binary encoding
    Exploits non-uniform statistics of quantized parameters
    Creates bitstream for transmission

[Block diagram: Original Signal -> Representation (Analysis) -> Quantization -> Binary Encoding -> Compressed Bitstream]



Generic Compression System (cont.)

Generally, the only operation that is lossy is the quantization stage

The fact that all the loss (distortion) is localized to a single operation greatly simplifies system design

Can design loss to exploit human visual system (HVS) properties

[Block diagram: Original Signal -> Representation (Analysis) [generally lossless] -> Quantization [lossy] -> Binary Encoding [lossless] -> Compressed Bitstream]
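The three-stage structure can be illustrated with a toy one-dimensional coder. Everything below is an assumption made for illustration and is not taken from the lecture: the representation is a pairwise sum/difference transform, the quantizer is a uniform rounding quantizer, and `zlib` stands in for the binary (entropy) encoder. The point is only that the first and last stages invert exactly, so all of the distortion comes from quantization.

```python
import zlib

def analyze(samples):
    """Representation: map each pair (a, b) to (a + b, a - b). Exactly invertible."""
    params = []
    for a, b in zip(samples[0::2], samples[1::2]):
        params += [a + b, a - b]
    return params

def synthesize(params):
    """Inverse of analyze()."""
    out = []
    for s, d in zip(params[0::2], params[1::2]):
        out += [(s + d) / 2, (s - d) / 2]
    return out

def quantize(params, step):
    """Quantization: the only lossy stage."""
    return [round(p / step) for p in params]

def dequantize(indices, step):
    return [i * step for i in indices]

def binary_encode(indices):
    """Binary encoding: lossless (zlib used here as a stand-in entropy coder)."""
    raw = b"".join(i.to_bytes(2, "big", signed=True) for i in indices)
    return zlib.compress(raw)

def binary_decode(bitstream):
    raw = zlib.decompress(bitstream)
    return [int.from_bytes(raw[k:k + 2], "big", signed=True)
            for k in range(0, len(raw), 2)]

signal = [100, 102, 101, 99, 150, 152, 151, 149]
step = 4

indices = quantize(analyze(signal), step)
bitstream = binary_encode(indices)
decoded = synthesize(dequantize(binary_decode(bitstream), step))
max_error = max(abs(x - y) for x, y in zip(signal, decoded))
print(decoded, max_error)
```

With this transform the reconstruction error of each sample is bounded by half the quantizer step, which is one way to see how confining the loss to a single stage makes the distortion easy to reason about.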



Video Compression

Video: Sequence of frames (images) that are related
  Related along the temporal dimension
  Therefore temporal redundancy exists

Main addition over image compression
  Temporal redundancy
  Video coder must exploit the temporal redundancy



Temporal Processing

Usually high frame rate: Significant temporal redundancy

Possible representations along temporal dimension:
  Transform/subband methods
    Good for textbook case of constant velocity uniform global motion
    Inefficient for nonuniform motion, i.e. real-world motion
    Requires large number of frame stores
      Leads to delay (memory cost may also be an issue)
  Predictive methods
    Good performance using only 2 frame stores
    However, simple frame differencing is not enough



Video Compression

Goal: Exploit the temporal redundancy
Predict current frame based on previously coded frames
Three types of coded frames:
  I-frame: Intra-coded frame, coded independently of all other frames
  P-frame: Predictively coded frame, coded based on previously coded frame
  B-frame: Bi-directionally predicted frame, coded based on both previous and future coded frames

[Figure: I-frame, P-frame, B-frame]



Temporal Processing: Motion-Compensated Prediction

Simple frame differencing fails when there is motion
Must account for motion
  Motion-compensated (MC) prediction
MC-prediction generally provides significant improvements
Questions:
  How can we estimate motion?
  How can we form MC-prediction?



Temporal Processing: Motion Estimation

Ideal situation:
  Partition video into moving objects
  Describe object motion
  Generally very difficult

Practical approach: Block-Matching Motion Estimation
  Partition each frame into blocks, e.g. 16x16 pixels
  Describe motion of each block
  No object identification required
  Good, robust performance



Block-Matching Motion Estimation

[Figure: Reference Frame and Current Frame, each divided into 16 numbered blocks; a Motion Vector (mv1, mv2) points from a block in the current frame to its best matching block in the reference frame]

Assumptions:
  Translational motion within block:

  $$f(n_1, n_2, k_{cur}) = f(n_1 - mv_1, n_2 - mv_2, k_{ref})$$

  All pixels within each block have the same motion

ME Algorithm:
  1) Divide current frame into non-overlapping N1xN2 blocks
  2) For each block, find the best matching block in reference frame

MC-Prediction Algorithm:
  Use best matching blocks of reference frame as prediction of blocks in current frame
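The MC-prediction step can be sketched as follows, assuming the motion vectors are already known. The function name, the dictionary used for the motion vector field, the storage convention `frame[n2][n1]` (n1 horizontal, n2 vertical), and the clamping at the frame border are all assumptions of this sketch rather than details from the lecture.

```python
def mc_predict(ref, mv_field, block):
    """Predict a frame: pixel (n1, n2) is copied from ref at (n1 - mv1, n2 - mv2).

    ref      : reference frame as a list of rows, ref[n2][n1]
    mv_field : dict mapping block index (b1, b2) to a motion vector (mv1, mv2)
    block    : block size in pixels (square blocks assumed)
    Positions that fall outside the reference frame are clamped to its border.
    """
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for n2 in range(height):
        for n1 in range(width):
            mv1, mv2 = mv_field[(n1 // block, n2 // block)]
            r1 = min(max(n1 - mv1, 0), width - 1)
            r2 = min(max(n2 - mv2, 0), height - 1)
            pred[n2][n1] = ref[r2][r1]
    return pred

# A bright vertical bar moves one pixel to the right between the two frames.
ref = [[0, 0, 9, 0],
       [0, 0, 9, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 0, 9],
       [0, 0, 0, 9],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
mv_field = {(b1, b2): (1, 0) for b1 in range(2) for b2 in range(2)}

pred = mc_predict(ref, mv_field, block=2)
frame_diff_energy = sum(abs(c - r) for cr, rr in zip(cur, ref) for c, r in zip(cr, rr))
mc_residual_energy = sum(abs(c - p) for cr, pr in zip(cur, pred) for c, p in zip(cr, pr))
print(frame_diff_energy, mc_residual_energy)
```

In this toy case simple frame differencing leaves a large residual while the motion-compensated residual vanishes, which is the behavior the preceding slides describe.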



Block Matching: Determining the Best Matching Block

For each block in the current frame, search for best matching block in the reference frame

Metrics for determining best match:

  $$MSE = \sum_{(n_1, n_2) \in Block} \left[ f(n_1, n_2, k_{cur}) - f(n_1 - mv_1, n_2 - mv_2, k_{ref}) \right]^2$$

  $$MAE = \sum_{(n_1, n_2) \in Block} \left| f(n_1, n_2, k_{cur}) - f(n_1 - mv_1, n_2 - mv_2, k_{ref}) \right|$$

Candidate blocks: All blocks in, e.g., (±32, ±32) pixel area

Strategies for searching candidate blocks for best match
  Full search: Examine all candidate blocks
  Partial (fast) search: Examine a carefully selected subset

Estimate of motion for best matching block: motion vector
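A minimal full-search block matcher, written directly from the two metrics above, might look like the sketch below. The function names, the frame layout `frame[n2][n1]`, and the choice to skip candidates that fall outside the reference frame are assumptions made here; the error measures are plain sums over the block, as in the slide's formulas.

```python
def block_error(cur, ref, top, left, size, mv1, mv2, metric="mae"):
    """Sum of absolute (mae) or squared (mse) differences for one candidate."""
    total = 0
    for n2 in range(top, top + size):
        for n1 in range(left, left + size):
            diff = cur[n2][n1] - ref[n2 - mv2][n1 - mv1]
            total += abs(diff) if metric == "mae" else diff * diff
    return total

def full_search(cur, ref, top, left, size, search_range, metric="mae"):
    """Examine every candidate within +/- search_range and return (mv, error)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_err = None, None
    for mv2 in range(-search_range, search_range + 1):
        for mv1 in range(-search_range, search_range + 1):
            # Skip candidate blocks that are not fully inside the reference frame.
            if not (0 <= top - mv2 and top - mv2 + size <= height):
                continue
            if not (0 <= left - mv1 and left - mv1 + size <= width):
                continue
            err = block_error(cur, ref, top, left, size, mv1, mv2, metric)
            if best_err is None or err < best_err:
                best_mv, best_err = (mv1, mv2), err
    return best_mv, best_err

def make_frame(top, left):
    """8x8 frame of zeros with a textured 2x2 patch at (top, left)."""
    frame = [[0] * 8 for _ in range(8)]
    patch = [[10, 20], [30, 40]]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = patch[r][c]
    return frame

ref = make_frame(top=2, left=1)
cur = make_frame(top=3, left=3)   # patch moved 2 pixels right and 1 pixel down

mv, err = full_search(cur, ref, top=3, left=3, size=2, search_range=2)
print(mv, err)
```

The cost of full search grows with the square of the search range, which is what motivates the partial (fast) search strategies.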



Motion Vectors and Motion Vector Field

Motion vector
  Expresses the relative horizontal and vertical offsets (mv1, mv2), or motion, of a given block from one frame to another
  Each block has its own motion vector

Motion vector field
  Collection of motion vectors for all the blocks in a frame



Example of Fast Motion Estimation Search: 3-Step (Log) Search

Goal: Reduce number of search points

Example: (±7, ±7) search area
  Dots represent search points
  Search performed in 3 steps (coarse-to-fine):
    Step 1: (±4) pixels
    Step 2: (±2) pixels
    Step 3: (±1) pixels

Best match is found at each step

Next step: Search is centered around the best match of prior step

Speedup increases for larger search areas
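The three steps can be sketched as a generic coarse-to-fine search over any error function. The function names and the toy quadratic error surface are assumptions for illustration; on real video the error surface can have local minima, so this kind of search may miss the best match that a full search would find.

```python
def three_step_search(cost, steps=(4, 2, 1)):
    """Coarse-to-fine search for the (mv1, mv2) that minimizes cost(mv1, mv2).

    At each step the 3x3 grid of points spaced `step` pixels apart around the
    current center is examined, and the best point becomes the next center.
    Returns the final motion vector and the number of distinct points examined.
    """
    evaluated = {}                      # cache: motion vector -> error

    def cached_cost(mv):
        if mv not in evaluated:
            evaluated[mv] = cost(*mv)
        return evaluated[mv]

    center = (0, 0)
    for step in steps:
        candidates = [(center[0] + d1 * step, center[1] + d2 * step)
                      for d2 in (-1, 0, 1) for d1 in (-1, 0, 1)]
        center = min(candidates, key=cached_cost)
    return center, len(evaluated)

def toy_cost(mv1, mv2):
    """Hypothetical smooth error surface with its minimum at (5, -3)."""
    return (mv1 - 5) ** 2 + (mv2 + 3) ** 2

best_mv, points_examined = three_step_search(toy_cost)
full_search_points = (2 * 7 + 1) ** 2   # every integer offset in a (±7, ±7) area
print(best_mv, points_examined, full_search_points)
```

For the (±7, ±7) area the sketch examines 25 points instead of 225, and the gap widens as the search area grows.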



Practical Half-Pixel Motion Estimation Algorithm

Half-pixel ME (coarse-fine) algorithm:

1) Coarse step: Perform integer motion estimation on blocks; find best integer-pixel MV

2) Fine step: Refine estimate to find best half-pixel MV
  a) Spatially interpolate the selected region in reference frame
  b) Compare current block to interpolated reference frame block
  c) Choose the integer or half-pixel offset that provides best match

Typically, bilinear interpolation is used for spatial interpolation
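The fine step can be sketched as below, assuming the integer-pixel MV from the coarse step is already available. The function names, the half-pixel-unit coordinates, the frame layout `frame[n2][n1]`, and the use of a sum of absolute differences are assumptions of this sketch; it also assumes the block lies far enough from the frame border that every interpolated sample exists.

```python
def sample_half_pel(ref, x2, y2):
    """Bilinearly interpolated value of ref at (x2 / 2, y2 / 2).

    x2 and y2 are non-negative coordinates in half-pixel units.
    """
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + x2 % 2, y0 + y2 % 2     # neighbor is used only at half positions
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4

def refine_half_pel(cur_block, ref, left, top, int_mv):
    """Test the integer MV and its 8 half-pixel neighbors; return (mv, error)."""
    size = len(cur_block)
    best_mv, best_err = None, None
    for d2 in (-1, 0, 1):
        for d1 in (-1, 0, 1):
            mv1_h = 2 * int_mv[0] + d1    # candidate MV in half-pixel units
            mv2_h = 2 * int_mv[1] + d2
            err = 0
            for r in range(size):
                for c in range(size):
                    x2 = 2 * (left + c) - mv1_h
                    y2 = 2 * (top + r) - mv2_h
                    err += abs(cur_block[r][c] - sample_half_pel(ref, x2, y2))
            if best_err is None or err < best_err:
                best_mv, best_err = (mv1_h / 2, mv2_h / 2), err
    return best_mv, best_err

# Reference frame with horizontal and vertical variation.
ref = [[10 * x + y * y for x in range(6)] for y in range(6)]
# Current 2x2 block at (left=2, top=2): the reference shifted right by half a pixel.
cur_block = [[19, 29],
             [24, 34]]

mv, err = refine_half_pel(cur_block, ref, left=2, top=2, int_mv=(0, 0))
print(mv, err)
```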



Example: MC-Prediction for Two Consecutive Frames

[Figure: Previous Frame (Reference Frame) and Current Frame (To be Predicted); the Reference Frame and the Predicted Frame are each shown divided into 16 numbered blocks]



Example: MC-Prediction for Two Consecutive Frames (cont.)

[Figure: Prediction of Current Frame, and Prediction Error (Residual)]



Block Matching Algorithm: Summary

Issues:
  Block size?
  Search range?
  Motion vector accuracy?

Motion typically estimated only from luminance

Advantages:
  Good, robust performance for compression
  Resulting motion vector field is easy to represent (one MV per block) and useful for compression
  Simple, periodic structure, easy VLSI implementations

Disadvantages:
  Assumes translational motion model; breaks down for more complex motion
  Often produces blocking artifacts (OK for coding with Block DCT)



Video Compression

Main addition over image compression:
  Exploit the temporal redundancy

Predict current frame based on previously coded frames

Three types of coded frames:
  I-frame: Intra-coded frame, coded independently of all other frames
  P-frame: Predictively coded frame, coded based on previously coded frame
  B-frame: Bi-directionally predicted frame, coded based on both previous and future coded frames

[Figure: I-frame, P-frame, B-frame]



Example Use of I-, P-, B-frames: MPEG Group of Pictures (GOP)

Arrows show prediction dependencies between frames

MPEG GOP: I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
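Because a B-frame depends on a future frame, that future frame has to be coded and sent before the B-frames that use it. The slide shows only the display order and the dependencies, so the reordering below is a sketch of one common arrangement and the function name is an assumption.

```python
def coding_order(display_order):
    """Reorder frames so each B-frame follows both frames it is predicted from."""
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)      # hold until the next I- or P-frame is sent
        else:
            out.append(frame)            # I- or P-frame goes first
            out.extend(pending_b)        # then the B-frames that needed it
            pending_b = []
    return out + pending_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
print(coding_order(display))
```

The reordering is also the source of the extra delay and frame storage that B-frames require at both encoder and decoder.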



Summary of Temporal Processing

Use MC-prediction (P and B frames) to reduce temporal redundancy

MC-prediction usually performs well; in compression we have a second chance to recover when it performs badly

MC-prediction yields:
  Motion vectors
  MC-prediction error or residual: code error with conventional image coder

Sometimes MC-prediction may perform badly
  Examples: Complex motion, new imagery (occlusions)
  Approach:
    1. Identify frame or individual blocks where prediction fails
    2. Code without prediction



Basic Video Compression Architecture

Exploiting the redundancies:
  Temporal: MC-prediction (P and B frames)
  Spatial: Block DCT
  Color: Color space conversion

Scalar quantization of DCT coefficients

Zigzag scanning, runlength and Huffman coding of the nonzero quantized DCT coefficients
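The zigzag scan and runlength step can be sketched as follows. The function names and the `(run of zeros, value)` pair format with an end-of-block marker are assumptions of this sketch, and the Huffman coding of the resulting pairs is not shown. A 4x4 block is used to keep the example short; the same code works for 8x8.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    def key(rc):
        r, c = rc
        diagonal = r + c
        # Odd anti-diagonals run top-right to bottom-left, even ones the other way.
        return (diagonal, r if diagonal % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_length(block):
    """Encode nonzero coefficients as (zeros skipped, value) pairs plus 'EOB'."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                  # trailing zeros are implied
    return pairs

quantized = [[12, 3, 5, 0],
             [-2, 0, 0, 0],
             [ 0, 0, 0, 0],
             [ 0, 0, 0, 0]]
print(run_length(quantized))
```

Zigzag ordering tends to place the low-frequency coefficients, which usually survive quantization, first, so the long tail of zeros collapses into the single end-of-block symbol.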



Example Video Decoder

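[Block diagram: Input Bitstream -> Buffer -> Huffman Decoder. The decoded coefficients pass through Inverse Quantize and Inverse DCT to give the Residual. The MV data drives Motion Compensation, which uses the Previous Reconstructed Frame held in the Frame Store to form the MC-Prediction. Residual + MC-Prediction = Reconstructed Frame, which is written to the Frame Store and converted from YUV to RGB to give the Output Video Signal.]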





Motivation for Scalable Coding

Basic situation:
  1. Diverse receivers may request the same video
       Different bandwidths, spatial resolutions, frame rates, computational capabilities
  2. Heterogeneous networks and a priori unknown network conditions
       Wired and wireless links, time-varying bandwidths

When you originally code the video you don't know which client or network situation will exist in the future

Probably have multiple different situations, each requiring a different compressed bitstream

Need a different compressed video matched to each situation

Possible solutions:
  1. Compress & store MANY different versions of the same video
  2. Real-time transcoding (e.g. decode/re-encode)
  3. Scalable coding



Scalable Video Coding

Scalable coding:
  Decompose video into multiple layers of prioritized importance
  Code layers into base and enhancement bitstreams
  Progressively combine one or more bitstreams to produce different levels of video quality

Example of scalable coding with base and two enhancement layers: Can produce three different qualities (in order of increasing quality)
  1. Base layer
  2. Base + Enh1 layers
  3. Base + Enh1 + Enh2 layers

Scalability with respect to: Spatial or temporal resolution, bitrate, computation, memory



Example of Scalable Coding

Encode image/video into three layers:
  Encoder -> Base, Enh1, Enh2

Low-bandwidth receiver: Send only Base layer
  Base -> Decoder -> Low Res

Medium-bandwidth receiver: Send Base & Enh1 layers
  Base, Enh1 -> Decoder -> Med Res

High-bandwidth receiver: Send all three layers
  Base, Enh1, Enh2 -> Decoder -> High Res

Can adapt to different clients and network situations



Scalable Video Coding (cont.)

Three basic types of scalability (refine video quality along three different dimensions):
  Temporal scalability -> Temporal resolution
  Spatial scalability -> Spatial resolution
  SNR (quality) scalability -> Amplitude resolution

Each type of scalable coding provides scalability of one dimension of the video signal

Can combine multiple types of scalability to provide scalability along multiple dimensions



Scalable Coding: Temporal Scalability

Temporal scalability: Based on the use of B-frames to refine the temporal resolution
  B-frames are dependent on other frames
  However, no other frame depends on a B-frame
  Each B-frame may be discarded without affecting other frames

MPEG GOP: I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
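The claim that B-frames can be discarded safely can be checked on the GOP above. The sketch assumes the usual MPEG-style dependencies (each P-frame is predicted from the previous I- or P-frame, each B-frame from the nearest I- or P-frame on either side) and a frame list that starts and ends with an I- or P-frame; the function names are illustrative.

```python
def references(frames):
    """Map each frame to the list of frames it is predicted from."""
    anchor_positions = [i for i, f in enumerate(frames) if f[0] in "IP"]
    refs = {}
    for i, frame in enumerate(frames):
        earlier = [frames[a] for a in anchor_positions if a < i]
        later = [frames[a] for a in anchor_positions if a > i]
        if frame[0] == "I":
            refs[frame] = []
        elif frame[0] == "P":
            refs[frame] = [earlier[-1]]
        else:                                   # B-frame
            refs[frame] = [earlier[-1], later[0]]
    return refs

def is_decodable(kept, refs):
    """True if every kept frame still has all of its reference frames."""
    return all(r in kept for f in kept for r in refs[f])

gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
refs = references(gop)

base_layer = [f for f in gop if not f.startswith("B")]   # drop every B-frame
without_p3 = [f for f in gop if f != "P3"]               # drop a P-frame instead

print(base_layer, is_decodable(base_layer, refs))
print(is_decodable(without_p3, refs))
```

Dropping all B-frames here leaves a decodable stream at one third of the original frame rate, whereas dropping a P-frame breaks every frame predicted from it.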



Scalable Coding: Spatial Scalability

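Spatial scalability: Based on refining the spatial resolution
  Base layer is low resolution version of video
  Enh1 contains coded difference between upsampled base layer and original video
  Also called: Pyramid coding

[Block diagram: the Original Video is downsampled by 2 and encoded (Enc) as the Base layer, which decodes (Dec) to the Low-Res Video. The decoded base layer is upsampled by 2 and its difference from the original video is encoded as the Enh layer. The decoder adds the decoded enhancement layer to the upsampled base layer to produce the High-Res Video.]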
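The pyramid structure can be sketched on a one-dimensional signal. The averaging downsampler, the sample-repeating upsampler, and the omission of the actual encoders (so both layers are carried losslessly) are simplifying assumptions made here for illustration.

```python
def downsample2(x):
    """Halve the resolution by averaging neighboring pairs of samples."""
    return [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]

def upsample2(x):
    """Double the resolution by repeating each sample."""
    return [v for v in x for _ in range(2)]

original = [10, 12, 20, 24, 30, 30, 8, 4]

# Encoder side: base layer plus the difference from the upsampled base layer.
base = downsample2(original)
enh = [o - u for o, u in zip(original, upsample2(base))]

# Decoder side: the base layer alone gives the low-res signal;
# base plus enhancement gives the high-res signal.
low_res = base
high_res = [u + e for u, e in zip(upsample2(base), enh)]

print(low_res)
print(enh)
print(high_res)
```

In a real coder both layers would be quantized and coded, and the enhancement residual would be computed against the decoded base layer so that encoder and decoder stay in step.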



Scalable Coding: SNR (Quality) Scalability

SNR (Quality) Scalability: Based on refining the amplitude resolution
  Base layer uses a coarse quantizer
  Enh1 applies a finer quantizer to the difference between the original DCT coefficients and the coarsely quantized base layer coefficients

[Figure: base layer I-frame and P-frame; enhancement layer EI-frame and EP-frame]

Note: Base & enhancement layers are at the same spatial resolution
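The two-quantizer idea can be sketched on a handful of coefficients. The coefficient values, the step sizes (16 and 4), and the uniform rounding quantizer are assumptions chosen for illustration.

```python
def quantize(values, step):
    """Uniform quantizer: round each value to the nearest multiple of step."""
    return [round(v / step) * step for v in values]

coeffs = [103, -41, 18, 7, -3, 0]          # hypothetical DCT coefficients

# Base layer: coarse quantizer applied to the original coefficients.
base = quantize(coeffs, 16)

# Enhancement layer: finer quantizer applied to what the base layer missed.
residual = [c - b for c, b in zip(coeffs, base)]
enh = quantize(residual, 4)

refined = [b + e for b, e in zip(base, enh)]

base_error = max(abs(c - b) for c, b in zip(coeffs, base))
refined_error = max(abs(c - r) for c, r in zip(coeffs, refined))
print(base, base_error)
print(refined, refined_error)
```

A receiver that gets only the base layer reconstructs with error up to half the coarse step, while adding the enhancement layer tightens this to half the fine step at the same spatial resolution.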



Summary of Scalable Video Coding

Three basic types of scalable video coding:
  Temporal scalability
  Spatial scalability
  SNR (quality) scalability

Scalable coding produces different layers with prioritized importance

Prioritized importance is key for a variety of applications:
  Adapting to different bandwidths, or client resources such as spatial or temporal resolution or computational power
  Facilitates error-resilience by explicitly identifying most important and less important bits



Motivation for Standards

Goal of standards:
  Ensuring interoperability: Enabling communication between devices made by different manufacturers
  Promoting a technology or industry
  Reducing costs



What do the Standards Specify?

[Diagram: Encoder -> Bitstream -> Decoder]



What do the Standards Specify?

Not the encoder
Not the decoder
Just the bitstream syntax and the decoding process (e.g. use IDCT, but not how to implement the IDCT)
  Enables improved encoding & decoding strategies to be employed in a standard-compatible manner

[Diagram: Encoder -> Bitstream -> Decoder; the Scope of Standardization covers the bitstream and the decoding process]



Current Image and Video Compression Standards

Standard             Application                                             Bit Rate
JPEG                 Continuous-tone still-image compression                 Variable
H.261                Video telephony and teleconferencing over ISDN          p x 64 kb/s
MPEG-1               Video on digital storage media (CD-ROM)                 1.5 Mb/s
MPEG-2               Digital Television                                      2-20 Mb/s
H.263                Video telephony over PSTN                               33.6-? kb/s
MPEG-4               Object-based coding, synthetic content, interactivity   Variable
JPEG-2000            Improved still image compression                        Variable
H.264 / MPEG-4 AVC   Improved video compression                              10s to 100s kb/s



Comparing Current Video Compression Standards

Based on the same fundamental building blocks
  Motion-compensated prediction (I, P, and B frames)
  2-D Discrete Cosine Transform (DCT)
  Color space conversion
  Scalar quantization, runlengths, Huffman coding

Additional tools added for different applications:
  Progressive or interlaced video
  Improved compression, error resilience, scalability, etc.

MPEG-1/2/4, H.261/3/4: Frame-based coding
MPEG-4: Object-based coding and synthetic video



MPEG Group of Pictures (GOP) Structure

Composed of I, P, and B frames
Arrows show prediction dependencies
Periodic I-frames enable random access into the coded bitstream
Parameters: (1) Spacing between I frames, (2) number of B frames between I and P frames

MPEG GOP: I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
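The two parameters are enough to generate the frame-type pattern. The function and parameter names below are assumptions, and the sketch assumes the I-frame spacing is a multiple of the number of B-frames plus one so that the pattern stays regular.

```python
def gop_pattern(i_spacing, num_b, num_frames):
    """Frame types in display order.

    i_spacing  : spacing between I-frames, in frames
    num_b      : number of B-frames between consecutive I- or P-frames
    num_frames : how many frames to label
    """
    frames = []
    for k in range(num_frames):
        position = k % i_spacing
        if position == 0:
            frame_type = "I"
        elif position % (num_b + 1) == 0:
            frame_type = "P"
        else:
            frame_type = "B"
        frames.append(f"{frame_type}{k}")
    return frames

print(gop_pattern(i_spacing=9, num_b=2, num_frames=10))
```

Shorter I-frame spacing gives finer random access at the cost of more intra-coded frames, while setting the number of B-frames to zero yields an I/P-only stream.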



MPEG Structure

MPEG codes video in a hierarchy of layers. The sequence layer is not shown.

[Figure: hierarchy of layers. GOP Layer (I B B P B B P) -> Picture Layer -> Slice Layer -> Macroblock Layer (4 8x8 DCT, 1 MV) -> Block Layer (8x8 DCT)]



MPEG-2 Profiles and Levels

Goal: To enable more efficient implementations for different applications (interoperability points)
  Profile: Subset of the tools applicable for a family of applications
  Level: Bounds on the complexity for any profile

[Figure: grid of profiles (Simple, Main, High) against levels (Low, Main, High)]

DVD & SD Digital TV: Main Profile at Main Level (MP@ML)
HDTV: Main Profile at High Level (MP@HL)



MPEG-4 Natural Video Coding

Extension of MPEG-1/2-type algorithms to code arbitrarily shaped objects

[Figure: Frame-based Coding versus Object-based Coding. Source: MPEG Committee]

Basic Idea: Extend Block-DCT and Block-ME/MC-prediction to code arbitrarily shaped objects



Example of MPEG-4 Scene (Object-based Coding)

[Figure: example MPEG-4 scene. Source: MPEG Committee]



Example MPEG-4 Object Decoding Process

[Figure: example MPEG-4 object decoding process. Source: MPEG Committee]



Sprite Coding (Background Prediction)

Sprite: Large background image
  Hypothesis: Same background exists for many frames, changes resulting from camera motion and occlusions

One possible coding strategy:
  1. Code & transmit entire sprite once
  2. Only transmit camera motion parameters for each subsequent frame

Significant coding gain for some scenes



Sprite Coding Example

[Figure: Sprite (background), Foreground Object, and Reconstructed Frame. Source: MPEG Committee]



Review of Today's Lecture

Motivation for compression
Brief review of generic compression system (from prior lecture)
Brief review of image compression (from last lecture)
Video compression
  Exploit temporal dimension of video signal
  Motion-compensated prediction
  Generic (MPEG-type) video coder architecture
  Scalable video coding



References and Further Reading

General Video Compression References:
  J.G. Apostolopoulos and S.J. Wee, "Video Compression Standards", Wiley Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, Inc., New York, 1999.
  V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Boston, Massachusetts: Kluwer Academic Publishers, 1997.
  J.L. Mitchell, W.B. Pennebaker, C.E. Fogg, and D.J. LeGall, MPEG Video Compression Standard, New York: Chapman & Hall, 1997.
  B.G. Haskell, A. Puri, A.N. Netravali, Digital Video: An Introduction to MPEG-2, Kluwer Academic Publishers, Boston, 1997.

MPEG web site: http://drogo.cselt.stet.it/mpeg
