1 © 1995-99 Sarnoff Corporation MPEG-2 Video Compression Reproduction in any form requires written permission from the Sarnoff Corporation. November 29, 1999 Michael Isnardi Sarnoff Corporation e-mail: [email protected]
Page 1: mpeg-2VideoCompression

© 1995-99 Sarnoff Corporation

MPEG-2 Video Compression

Reproduction in any form requires written permission from the Sarnoff Corporation.

November 29, 1999

Michael Isnardi, Sarnoff Corporation

e-mail: [email protected]

Page 2: mpeg-2VideoCompression

MPEG Video Outline

• Introduction
– Video Basics
– Human Vision Basics
– Colorimetry Basics
– Video Compression Basics
• MPEG-1 Video
• MPEG-2 Video
• Rate Control, VBV, Stat Mux
• Practicing the Art of MPEG
• ATSC Video Constraints and Extensions

Page 3: mpeg-2VideoCompression

Video Basics

...dissection of image into scanning lines...

[Figure: a video camera feeds a video monitor over a video cable. The waveform of a single scan line plots voltage (proportional to brightness) against time: active video, with regions labeled wall, hair and forehead, sits between sync and blanking intervals.]

Page 4: mpeg-2VideoCompression

The Scanning Raster

[Figure: the raster, with active video surrounded by horizontal blanking and vertical blanking. 525 lines (NTSC), 625 lines (PAL-Europe).]

Page 5: mpeg-2VideoCompression

The Progressive Raster

[Figure: active video and vertical blanking on x, y and time axes, with scan lines viewed edge-on and one frame period marked.]

Note: All scan lines are sampled at each time instant.

Page 6: mpeg-2VideoCompression

The Interlaced Raster

[Figure: active video and vertical blanking on x, y and time axes, with scan lines viewed edge-on and the field period and frame period marked.]

Note: Alternate scan lines are sampled at each time instant.

Nominal Frame Rates: 30 Hz (NTSC), 25 Hz (PAL-Europe)

Page 7: mpeg-2VideoCompression

Common Rasters for Video Coding

Luminance values shown (active video).

• “601”: 720 pixels x 480 lines (NTSC) or 576 lines (PAL). Interlaced raster (30 frames/sec NTSC, 25 frames/sec PAL).
• “CIF” (Common Intermediate Format): 360 pixels x 288 lines. Progressive raster (30 frames/sec).
• “SIF” (Source Input Format): 360 pixels x 240 lines (NTSC) or 288 lines (PAL). Progressive raster (30 frames/sec NTSC, 25 frames/sec PAL).

Page 8: mpeg-2VideoCompression

Why Interlace?

• Background
– In the 1930s, interlaced scanning was developed as a bandwidth-saving technique.
– Persistence of vision causes two fields to fuse into a single image, without flicker.
– All broadcasting today uses interlaced scanning.
• Advantages:
– High vertical detail retained for still portions of the scene.
• Drawbacks:
– Reduced vertical detail for moving areas
– Flicker at edges of objects (e.g., text), which is why the computer industry uses progressive scanning for monitors.
– More complicated signal processing for resizing, frame rate conversion, etc.

Page 9: mpeg-2VideoCompression

Human Vision Basics

• Human Visual System (HVS) has limitations that can be exploited for video system design:
– limited response to black-and-white detail
– even more limited response to color detail
– image motion appears fluid at rates above 24 Hz
– foveal flicker not annoying at picture rates above 24 Hz
– limited ability to track rapidly moving objects
– insensitivity to “noise”
  • at object edges
  • in highly detailed areas of a scene
  • in bright areas of a scene
  • immediately after scene changes

Page 10: mpeg-2VideoCompression

Colorimetry Basics

[Figure: signal chain. Color video camera → R’G’B’ (gamma-corrected signals) → RGB to YC1C2 → YC1C2 over the transmission channel(s) → YC1C2 to RGB → R’G’B’ → color video monitor.]

• In broadcast and studio applications, the gamma-corrected RGB “taking” primaries are transformed to YC1C2 “transmission” primaries.
• Y is the luminance (luma) component; C1 and C2 are the chrominance (chroma, or color difference) components.
• To exploit the HVS’ reduced spatial response to chroma, C1 and C2 are further bandlimited in spatial frequency compared to Y.
• The exact transformation matrix is system-dependent.

Page 11: mpeg-2VideoCompression

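CCIR Rec. 601 Transformation

YCrCb601:

\[
\begin{bmatrix} Y \\ C_r \\ C_b \end{bmatrix}
=
\begin{bmatrix} 0.30 & 0.59 & 0.11 \\ 0.50 & -0.42 & -0.08 \\ -0.17 & -0.33 & 0.50 \end{bmatrix}
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
\]

\[
\begin{bmatrix} C_r \\ C_b \end{bmatrix}
=
\begin{bmatrix} 0.00 & 0.71 \\ 0.56 & 0.00 \end{bmatrix}
\begin{bmatrix} B'-Y \\ R'-Y \end{bmatrix}
\]

[Figure: color-difference axes R’-Y (Cr) and B’-Y (Cb).]

• In 8-bit implementations,
– Y occupies 220 levels: [16, 235]
– Cr, Cb occupy 225 levels: [16, 240]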
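The transformation above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides: it uses the two-decimal coefficients as printed on the slide, and the 8-bit scaling (offset 16 with gain 219 for Y, offset 128 with gain 224 for Cr and Cb) is an assumption chosen to reproduce the [16, 235] and [16, 240] ranges quoted above. The function names are hypothetical.

```python
# Coefficients as rounded on the slide (two decimals), not full-precision values.
M = [
    [0.30, 0.59, 0.11],    # Y
    [0.50, -0.42, -0.08],  # Cr
    [-0.17, -0.33, 0.50],  # Cb
]

def rgb_to_ycrcb(r, g, b):
    """Map gamma-corrected R'G'B' in [0, 1] to (Y, Cr, Cb)."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in M)

def to_8bit(y, cr, cb):
    """Assumed 8-bit scaling: Y into [16, 235], Cr/Cb into [16, 240] around 128."""
    return (round(16 + 219 * y), round(128 + 224 * cr), round(128 + 224 * cb))

white = to_8bit(*rgb_to_ycrcb(1.0, 1.0, 1.0))
black = to_8bit(*rgb_to_ycrcb(0.0, 0.0, 0.0))
red = to_8bit(*rgb_to_ycrcb(1.0, 0.0, 0.0))
print(white, black, red)
```

White and black land on the ends of the luma range with neutral chroma, while saturated red drives Cr to the top of its range.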

Page 12: mpeg-2VideoCompression

Video Compression Basics

Page 13: mpeg-2VideoCompression

What is Video Compression? (Orange Juice Analogy)

[Figure: water (H2O) is removed from fresh-squeezed OJ to make concentrate, which is shipped, stored and sold; water is added back so that it tastes like fresh-squeezed. Water is the redundant element.]

In video compression, the encoder removes spatial and temporal redundancy; the decoder puts it back in.

Page 14: mpeg-2VideoCompression

Video Compression Techniques

• Remove spatial and temporal redundancy that exist in natural video imagery
– correlation itself can be removed in a lossless fashion
– important for medical applications
– only realizes about 2:1 compression efficiency
• Exploit limitations in Human Visual System
– limited luminance and very limited color response
– reduced sensitivity to noise in high frequencies (e.g., edges of objects)
– reduced sensitivity to noise in brighter areas
– goal is to throw away bits in a psychovisually lossless manner
– can realize 50:1 or more compression efficiency

Page 15: mpeg-2VideoCompression

Major Image and Video Compression Technologies

• DCT Based (Int’l Standards, Economy of Scale)
– Motion JPEG: Studio Applications
– H.261: Videoconferencing
– MPEG-1: CD-ROM Multimedia
– MPEG-2: DTV Broadcast, DVD
• Subband/Wavelet
– EZW: VLBR and browsing applications
• Other
– DVI/Indeo: Multimedia
– Fractal: Multimedia
– DPCM: Broadcast
– Lossless (e.g., special JPEG mode): Medical

Page 16: mpeg-2VideoCompression

Evolution of Video Compression Standards

• JPEG (Joint Photographic Experts Group)
- mostly used for coding still images
- introduced DCT and Quantization as part of "Tool Kit"
- "Motion JPEG" is intra frame only, low compression, and low delay
• H.261 (px64)
- used for video teleconferencing
- p x 64 kbps (p = 1, ..., 32)
- introduced motion compensated DCT (I and P frames)
- medium compression, low delay
• MPEG-1, MPEG-2
- used for digital storage media and broadcast
- 1-15+ Mbps
- introduced concept of B frames and field modes
- high compression, medium delay

Page 17: mpeg-2VideoCompression

Coding Efficiency

• How does one compare the efficiency of various video compression methods?
• For example, the following video encoders all have the same quality. Which has the best coding efficiency? Which one has the worst?

Parameter          Coder 1    Coder 2    Coder 3    Coder 4
Image Size (HxV)   720x480    544x480    480x480    1920x1080
Bit Rate (R)       6 Mbps     4 Mbps     6 Mbps     19 Mbps
Frame Rate (F)     29.97 fps  30 fps     24 fps     29.97 fps
Chroma Format      4:2:0      4:2:2      4:4:4      4:2:0

Page 18: mpeg-2VideoCompression

Normalized Bit Rate

• A meaningful comparative metric is the normalized bit rate, in units of bits/color pixel.

\[
\text{Normalized Bit Rate} = \frac{C \cdot R}{H \cdot V \cdot F} \quad \text{bits/color pixel}
\]

where
C = Chroma Format Factor (C = 1/3 for 4:4:4, 1/2 for 4:2:2, 2/3 for 4:2:0)
R = Bit Rate (bits/second)
F = Frame Rate (frames/second)
H, V = Horizontal and Vertical Size (luma pixels/frame)

Page 19: mpeg-2VideoCompression

Coding Efficiency Example

• Now let’s compare the four coders using Normalized Bit Rate:

Parameter          Coder 1    Coder 2    Coder 3    Coder 4
Image Size (HxV)   720x480    544x480    480x480    1920x1080
Bit Rate (R)       6 Mbps     4 Mbps     6 Mbps     19 Mbps
Frame Rate (F)     29.97 fps  30 fps     24 fps     29.97 fps
Chroma Format      4:2:0      4:2:2      4:4:4      4:2:0
Norm. Bit Rate     0.39       0.26       0.36       0.20

Coder 1 has the worst coding efficiency. It uses the most bits/pixel.

Coder 4 has the best coding efficiency. It uses the fewest bits/pixel.
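As a quick check of the comparison, the normalized bit rate can be computed directly from the four parameter sets. The sketch below is illustrative only; the function and dictionary names are hypothetical, and the chroma factors are the ones given on the Normalized Bit Rate slide.

```python
# Chroma format factor C from the slide: 1/3 for 4:4:4, 1/2 for 4:2:2, 2/3 for 4:2:0.
CHROMA_FACTOR = {"4:4:4": 1 / 3, "4:2:2": 1 / 2, "4:2:0": 2 / 3}

def normalized_bit_rate(h, v, bit_rate, frame_rate, chroma):
    """Return C * R / (H * V * F), in bits per color pixel."""
    return CHROMA_FACTOR[chroma] * bit_rate / (h * v * frame_rate)

coders = {
    "Coder 1": (720, 480, 6e6, 29.97, "4:2:0"),
    "Coder 2": (544, 480, 4e6, 30.0, "4:2:2"),
    "Coder 3": (480, 480, 6e6, 24.0, "4:4:4"),
    "Coder 4": (1920, 1080, 19e6, 29.97, "4:2:0"),
}

rates = {name: round(normalized_bit_rate(*p), 2) for name, p in coders.items()}
worst = max(rates, key=rates.get)  # most bits per color pixel
best = min(rates, key=rates.get)   # fewest bits per color pixel
print(rates, worst, best)
```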

Page 20: mpeg-2VideoCompression


MPEG Video

Page 21: mpeg-2VideoCompression

What is MPEG Video?

• MPEG = Moving Picture Experts Group
• Part of the International Organization for Standardization (ISO)
• Aim was to create the best video compression standards for multimedia and broadcast applications
• MPEG-1 Video aimed at SIF resolution
– 352x240, 30 Hz, non-interlaced, 1.5 Mb/s
– CD-ROM applications
• MPEG-2 Video aimed at CCIR-601 resolution
– 720x480, 30 Hz, interlaced, 4-10 Mb/s
– broadcast applications, including HDTV
• MPEG-1 and MPEG-2 are International Standards

Page 22: mpeg-2VideoCompression

MPEG-2 Video: Background

• MPEG-2 work started in November, 1991
• Standard optimized at “NTSC quality” CCIR-601 video @ 10 Mbps. 39 algorithms competed in subjective tests, some very different from MPEG-1.
• Large attendance, typically 175-200 participants. More than 75 organizations, including representatives of CE, telco, computer, broadcasting and universities.
• Design focus on interlaced CCIR-601 (720x480 pixels) video @ 4 to 9 Mbps.
• Targeted at broadcast and DVD applications.
• Extensible to lower and higher resolutions
1) downward compatibility with MPEG-1
2) includes support of HDTV formats
• MPEG-2 Video (ISO/IEC 13818-2) promoted to International Standard in November, 1995.

Page 23: mpeg-2VideoCompression

MPEG International Standards

• MPEG-1 (ISO/IEC 11172)
– 11172-1: Systems
– 11172-2: Video
– 11172-3: Audio
– 11172-4: Conformance
– 11172-5: Software
• MPEG-2 (ISO/IEC 13818)
– 13818-1: Systems
– 13818-2: Video
– 13818-3: Audio
– 13818-4: Conformance
– 13818-5: Software
– 13818-6: Digital Storage Media - Command & Control (DSM-CC)
– 13818-7: Non-Backward Compatible Audio
– 13818-9: Real-Time Interface
– 13818-10: DSM-CC Conformance

These standards are available from ISO and ANSI.

Page 24: mpeg-2VideoCompression

MPEG-1 vs. MPEG-2 Operating Points

[Figure: chart of image size and frame rate (360x240, 720x480, 1280x720 and 1920x1080, all at 30 Hz) against bit rate (5 to 20 Mb/s), with operating regions marked for MPEG-1 CD-ROM, MPEG-2 Standard Definition Broadcast, MPEG-2 Standard Definition Production and MPEG-2 HDTV Broadcast.]

Page 25: mpeg-2VideoCompression

MPEG-2: a superset of MPEG-1

MPEG-2 = MPEG-1 Syntax Elements
+ Interlace Tools
+ New Syntax Structures
+ Scalable Modes
+ Profiles & Levels

Page 26: mpeg-2VideoCompression

MPEG-2 Interlace Tools

• Broadcast video is interlaced
• MPEG-1 does not handle interlaced video efficiently
• MPEG-2 adds key interlace tools:
– Field Picture Structure
– Field DCT
– Field Prediction Modes
– Alternate Zig-Zag Scan
– 3:2 Pulldown Support
– Field-Based Pan-and-Scan Support

Page 27: mpeg-2VideoCompression

Key Points about MPEG Video

• MPEG only specifies bitstream syntax and decoding process
• Encoding algorithms (e.g., Motion Estimation, Rate Control and Mode Decisions) are open to invention and proprietary techniques
• MPEG is asymmetric in that much less computational power is required in the decoder.
• Example:
– SDTV MPEG-2 encode: 20 GIPS
– SDTV MPEG-2 decode: 600 MIPS

Page 28: mpeg-2VideoCompression

MPEG Building Blocks

[Figure: the building blocks: DCT, Q, Q-1, DCT-1, VLC, VLD, Rate Control, Motion Est, Motion Comp and MPEG Syntax.]

Page 29: mpeg-2VideoCompression

MPEG Video Layers

[Figure: the layer hierarchy, from top to bottom: Sequence (display order); GOP (display order, N=12, M=3), made of I, P and B pictures; Picture; Slice; Macroblock, made of four Y blocks (0 to 3), one Cr block and one Cb block (4 and 5).]

Note: Y = Luma, Cr = Red-Y, Cb = Blue-Y

Page 30: mpeg-2VideoCompression

MPEG Video Layers (cont’d)

• Important syntax elements in each layer:
– Sequence: Picture Size; Frame Rate; Bit Rate; Buffering Requirements; Programmable Coding Parameters
– GOP: Random Access Unit; SMPTE Time-Code
– Picture: Timing information (buffer fullness, temporal reference); Coding type (I, P, or B)
– Slice: Intra-frame addressing information; Coding re-initialization (error resilience)
– Macroblock: Basic coding structure; Coding method; Motion Vectors; Quantization
– Block: DCT coefficients

Page 31: mpeg-2VideoCompression

Key Concepts

• For a given bit rate, the following coding parameters greatly affect picture quality:
– GOP Structure
  • longer GOP’s improve picture quality but decrease random access (i.e., lengthen channel change time)
  • dynamic GOP’s can be used creatively to handle scene changes and other effects
– MV Search Range
  • Wider searches are better, but more costly
  • A large search range is a must for fast action (e.g., sports)
– Rate Control
  • Mode decisions greatly affect number of coded bits
  • Proprietary schemes will continue to dominate

Page 32: mpeg-2VideoCompression

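Typical MPEG Encoder Structure

[Figure: encoder block diagram. The predicted image (or "0" in intra mode, selected by an intra/inter switch) is subtracted from the re-sequenced input to give the prediction error, which passes through DCT and Q (quantization parameters from the rate controller); the coefficients go to the VLC encoder. An embedded decoder (Q-1, DCT-1 and an adder) builds the reconstructed image, stored in Frame Memory 1 and Frame Memory 2. The motion estimator produces motion vectors, which drive the motion compensated prediction and are also sent to the VLC encoder.]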

Page 33: mpeg-2VideoCompression

Sequence

• For CD-ROM applications, sequences can be used to indicate relatively long clips (e.g. shots, scenes or entire movies)
• For broadcast applications, sequence headers are usually sent frequently (e.g., every GOP) so that key bitstream info is obtained at channel changes

[Figure: two bitstreams, Video 1 and Video 2, each divided into GOP’s that begin with SEQ Header + GOP Header + I Frame Pic Header. The viewer changes channels partway through a GOP, but the decoder must wait until the next SEQ header to start decoding.]

Page 34: mpeg-2VideoCompression

MPEG-2 Structures

• Sequence Structures
– Progressive Sequences: contain only frame pictures
– Non-Progressive Sequences: may contain frame and field pictures
• Frame Structures
– Progressive Frame: its two fields come from the same time instant
– Non-Progressive Frame: its two fields come from different times
• Picture Structures
– Frame Picture
– Field Picture: must occur in pairs; a frame = two field pictures
– Both frame and field pictures may be used in the same non-progressive sequence.

Page 35: mpeg-2VideoCompression

Sequence Types

[Figure: a Progressive Frame Picture beside a Non-Progressive Frame Picture composed of two Field Pictures.]

• MPEG-2 allows both Progressive and Non-Progressive Sequences.
• A Non-Progressive Sequence may contain both Frame Pictures and Field Pictures.

Page 36: mpeg-2VideoCompression

Group of Pictures (GOP)

• Contains three types of pictures:
- Intra (I) pictures: intraframe-only spatial DCT
- Predicted (P) pictures: DCT with forward prediction
- Bi-directional (B) pictures: DCT with bi-directional prediction

I B B P B B P B B P B B I    (time →)

[Figure: arrows over the sequence mark forward prediction and bi-directional prediction.]

Page 37: mpeg-2VideoCompression

Anchor Pictures

• I and P pictures
– stored in two frame buffers in encoder and decoder
– form the basis for prediction of P and B pictures

I B B P B B P B B P B B I    (time →)

[Figure: the I and P pictures in the sequence are marked as anchor pictures.]

Page 38: mpeg-2VideoCompression

I Pictures

– DCT coded without reference to any other pictures
– stored in a frame buffer in encoder and decoder
– used as basis of prediction for entire GOP

I B B P B B P B B P B B I    (time →)

[Figure: the first I picture is marked; all the P and B pictures that follow depend on the preceding I picture.]

Page 39: mpeg-2VideoCompression

P Pictures

– DCT coded with reference to the preceding anchor picture
– stored in a frame buffer in encoder and decoder
– use forward prediction only

I B B P B B P B B P B B I    (time →)

[Figure: forward prediction arrows. One P picture depends on the preceding I picture; a later P picture depends on the preceding P picture.]

Page 40: mpeg-2VideoCompression

B Pictures

– DCT coded with reference to either the preceding anchor picture, the following anchor picture, or both
– use forward, backward or bi-directional prediction

I B B P B B P B B P B B I    (time →)

[Figure: bi-directional prediction arrows. One B picture depends on an I picture and a P picture; another B picture depends on two P pictures.]

Page 41: mpeg-2VideoCompression

Forward Prediction

– a forward-predicted macroblock depends on decoded pixels from the immediately preceding anchor picture
– can be used to code macroblocks in P and B pictures

I B B P B B P B B P B B I    (time →)

– the arrows, as shown, indicate direction of motion
– if arrows are reversed, they indicate coding dependencies

Page 42: mpeg-2VideoCompression

Backward Prediction

– a backward-predicted macroblock depends on decoded pixels from the immediately following anchor picture
– can only be used to code macroblocks in B pictures

I B B P B B P B B P B B I    (time →)

Page 43: mpeg-2VideoCompression

Bi-directional (Interpolated) Prediction

– a bi-directionally-predicted macroblock depends on decoded pixels from the anchor pictures immediately following and immediately preceding
– can only be used to code macroblocks in B pictures

I B B P B B P B B P B B I    (time →)

Page 44: mpeg-2VideoCompression

GOP Rules

• A GOP must contain at least one I picture
• This I picture may be followed by any number of I and P pictures
• Any number of B pictures may occur between anchor pictures, and B pictures may precede the first I picture
• A GOP, in coding order, must start with an I picture
• A GOP, in display order, must start with an I or B picture and must end with an I or P picture
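The rules above lend themselves to a small validity check. The following Python sketch is an illustration only, not something taken from the MPEG specification: the function name is hypothetical, a GOP is represented as a string of picture types in display order, and the "first anchor is an I picture" test is how this sketch expresses the coding-order rule.

```python
def check_gop_display_order(gop):
    """Return a list of violated GOP rules for a GOP given in display order."""
    problems = []
    if not gop or any(p not in "IPB" for p in gop):
        return ["GOP must be a non-empty string of I, P and B"]
    if "I" not in gop:
        problems.append("must contain at least one I picture")
    if gop[0] not in "IB":
        problems.append("display order must start with an I or B picture")
    if gop[-1] not in "IP":
        problems.append("display order must end with an I or P picture")
    # In coding order the GOP starts with an I picture, so the first anchor
    # (non-B picture) met in display order has to be an I picture.
    anchors = [p for p in gop if p != "B"]
    if anchors and anchors[0] != "I":
        problems.append("first anchor picture must be an I picture")
    return problems

print(check_gop_display_order("BBIBBPBBPBBP"))  # regular GOP, N=12, M=3
print(check_gop_display_order("PBBI"))          # starts with a P picture
```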

Page 45: mpeg-2VideoCompression

Regular and Irregular GOP’s

• Regular GOP’s are defined by N and M*:
– N is the I picture interval
– M is the anchor picture interval. There are M-1 B pictures between anchor pictures
• Irregular GOP’s are not defined by N and M, but are still allowed as long as they follow the GOP Rules.

[Figure: example GOP’s, all in display order: Regular, N=1, M=1 (12 GOP’s shown, all I pictures); Regular, N=6, M=2 (2 GOP’s shown); Regular, N=12, M=3 (1 GOP shown); Irregular.]

*N and M are not MPEG syntax elements and are not used in any way by the specification.

Page 46: mpeg-2VideoCompression

Closed and Open GOP’s

• Closed GOP’s can be decoded independently, without using decoded pictures in previous GOP’s.
• Open GOP’s require such pictures to be available.

[Figure: two rows of regular GOP’s with N=4, M=2 (3 GOP’s shown in each row), one row closed and one row open.]

Closed GOP’s: note that the first B picture must be restricted to use backward prediction only.

Open GOP’s: note that the first B picture depends on the last anchor picture from the previous GOP.

Page 47: mpeg-2VideoCompression

GOP Picture Orderings

• Two Distinct Picture Orderings
– Display Order (input to encoder, output of decoder)
– Coding Order (output of encoder, input to decoder)
– These are different if B frames are present
– B frames must be reordered so that “future” anchor pictures are available for prediction. Note that reordering causes DELAY!

[Figure: the same GOP of I, P and B pictures shown three times: in display order at the input to the encoder, in coding order at the output of the encoder, and in display order again at the output of the decoder.]
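The reordering described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, pictures are given as a string of types in display order, and B pictures left over at the end (which in a real stream wait for the next GOP's anchor) are simply appended.

```python
def display_to_coding_order(gop):
    """Reorder pictures so each anchor precedes the B pictures that need it.

    Returns a list of (display_index, picture_type) tuples in coding order.
    """
    coded, pending_b = [], []
    for index, ptype in enumerate(gop):
        if ptype == "B":
            # A B picture cannot be coded until its following anchor is known.
            pending_b.append((index, ptype))
        else:
            coded.append((index, ptype))
            coded.extend(pending_b)
            pending_b = []
    coded.extend(pending_b)  # simplification: trailing B pictures appended as-is
    return coded

coded = display_to_coding_order("BBIBBPBBPBBP")
print("".join(ptype for _, ptype in coded))
print([index for index, _ in coded])
```

In this example the I picture at display position 2 is coded first, so an encoder has to hold the two B pictures ahead of it, which is the delay the slide warns about.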

Page 48: mpeg-2VideoCompression

Slice Structures

• A slice is a collection of macroblocks in raster scan order.
• Restriction on slice sizes:
- MPEG-1 has none. Can be single MB or entire picture.
- MPEG-2 restricts a slice to be contained within a row of macroblocks
• MPEG-2 allows gaps between slices in “General Slice Structure”
• MPEG-2 defines “Restricted Slice Structure”, in which no gaps are allowed. This is used in most Profiles and Levels.

[Figure: example of Restricted Slice Structure, a picture completely covered by slices labeled A through Z, each lying within one row of macroblocks.]

Page 49: mpeg-2VideoCompression

Chroma Formats and Picture Sizes

• 4:2:0 (Required in MPEG-1): Y is 2H x 2V, Cr is H x V, Cb is H x V
• 4:2:2 (Option in MPEG-2): Y is 2H x 2V, Cr is H x 2V, Cb is H x 2V
• 4:4:4 (Option in MPEG-2): Y is 2H x 2V, Cr is 2H x 2V, Cb is 2H x 2V

Applications marked on the figure: CD-ROM and Broadcast Apps.; Studio Apps.

Page 50: mpeg-2VideoCompression

Macroblock Structures

• 4:2:0: 6 Blocks (Y blocks 0 to 3; chroma blocks 4 and 5)
• 4:2:2: 8 Blocks (Y blocks 0 to 3; chroma blocks 4 to 7)
• 4:4:4: 12 Blocks (Y blocks 0 to 3; chroma blocks 4 to 11)

[Figure: Spatial Sampling Relationship, showing the positions of luma samples and of chroma samples, with MPEG-1 chroma and MPEG-2 chroma sited differently.]

Page 51: mpeg-2VideoCompression

Discrete Cosine Transform (DCT)

• DCT is an orthogonal transformation
• 2-D DCT is separable in x and y dimensions
• Has good energy compaction properties
• Close to Karhunen-Loeve Transform (KLT), which is optimal but depends on image statistics.
• Efficient hardware realization
• Theoretically lossless, but slightly lossy in practice due to round off errors

[Figure: Image (spatial domain, 8x8 pixels) → 8x8 Forward DCT → transform domain (8x8 coefficients) → 8x8 Inverse DCT → Reconstructed Image (spatial domain, 8x8 pixels).]

Page 52: mpeg-2VideoCompression

Discrete Cosine Transform (cont’d)

• Transforms 8x8 pixel block into 8x8 frequency coefficient matrix
• Organizes video information in a way that is easy to compress and manipulate
• DCT applied to Intra blocks as well as motion-compensated blocks

pixels:

255 255 255 255 255 255 255 255
255 187 204 255 255 255 255 255
255 122  20 102 230 255 255 255
255 153   0   0  35 136 213 255
255 196   0   0   0   0  17  94
255 247  43   0   0   0   0   0
255 255  82   0   0   0   0   0
255 255 128   0   0   0   0   0

→ 8x8 Forward DCT →

DCT coefficients (“DC” at top left; horizontal frequency runs low to high from left to right, vertical frequency runs low to high from top to bottom):

1105  238  358  158   30  -56  -49  -31
 548 -379 -143   19   71   66   32    9
 207  103 -171  -81  -58    7   24   31
 -52  162  -34  -66  -18  -20  -20  -21
 -33   13   71  -52  -18   -3    9   -4
  11  -56   56   23  -28   -3   -6    1
  -5  -14  -11   49   -1  -18   -9    8
 -27    9  -24   28   34  -24   -4    3

Page 53: mpeg-2VideoCompression

8x8 Blocks and Their Transforms

[Figure: the MPEG Flower Garden image with one 8x8 block of pixels and a single pixel marked, followed by example blocks and their DCT coefficients (DC at top left): Flat Area, Vertical Edge, Horizontal Edge and Diagonal Line.]

Page 54: mpeg-2VideoCompression

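DCT and IDCT Formulas

[Figure: the 2-D DCT maps pixels f(x,y) to DCT coefficients F(u,v); the top-left coefficient is the DC coefficient and the rest are AC coefficients.]

Forward DCT:

\[
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right]
\]

Inverse DCT:

\[
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v) \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right]
\]

where: C(u), C(v) = 1/\sqrt{2} for u, v = 0; 1 otherwise. N = 8.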
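The two formulas can be transcribed almost literally into Python. The sketch below is a direct, unoptimized illustration (real codecs use fast separable algorithms); the function names are hypothetical, and the sample block is the pixel block from the “Discrete Cosine Transform (cont’d)” slide as read from this transcript, which is an assumption about the original layout.

```python
import math

N = 8

def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(a, k):
    """cos[(2a + 1) k pi / 2N], shared by the forward and inverse transforms."""
    return math.cos((2 * a + 1) * k * math.pi / (2 * N))

def dct2(f):
    """Forward 8x8 DCT. f[y][x] holds pixels; result F[v][u], u horizontal, v vertical."""
    return [[(2 / N) * c(u) * c(v) * sum(f[y][x] * basis(x, u) * basis(y, v)
                                         for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]

def idct2(F):
    """Inverse 8x8 DCT, returning f[y][x]."""
    return [[(2 / N) * sum(c(u) * c(v) * F[v][u] * basis(x, u) * basis(y, v)
                           for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]

# Pixel block as read from the earlier example slide (assumed row order).
block = [
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255, 187, 204, 255, 255, 255, 255, 255],
    [255, 122, 20, 102, 230, 255, 255, 255],
    [255, 153, 0, 0, 35, 136, 213, 255],
    [255, 196, 0, 0, 0, 0, 17, 94],
    [255, 247, 43, 0, 0, 0, 0, 0],
    [255, 255, 82, 0, 0, 0, 0, 0],
    [255, 255, 128, 0, 0, 0, 0, 0],
]

F = dct2(block)
restored = idct2(F)
max_error = max(abs(restored[y][x] - block[y][x]) for y in range(N) for x in range(N))
print(round(F[0][0]), round(F[0][1]), round(F[1][0]), max_error)
```

With this block, the DC term and the first horizontal and vertical AC terms come out close to the 1105, 238 and 548 shown on that slide, and the round trip reproduces the pixels up to floating-point error.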

Page 55: mpeg-2VideoCompression

2-D DCT Basis Images

[Figure: 8x8 grid of basis images, indexed by u (horizontal frequency, 0 to 7) across and v (vertical frequency, 0 to 7) down.]

Page 56: mpeg-2VideoCompression

Quantization

• Quantization can be thought of as dividing each transform coefficient by a frequency-dependent value, and then rounding or truncating to the nearest integer
• Inverse quantization is like multiplication
• Quantization coefficients can be tailored to noise sensitivity of Human Visual System
• Quantization is LOSSY! Reconstructed pixels usually differ in value from original
• Quantization causes information to be irretrievably lost

[Figure: Image → DCT → Q → quantized coefficients → Q-1 → DCT-1 → Reconstructed Image.]

Page 57: mpeg-2VideoCompression

Quantization Tools

• Quantization Matrix (QM)
– 8x8 matrix can be shaped so that coarser quantization of high spatial frequencies occurs
– coarser quantization of high spatial frequencies saves bits but causes little or no subjective degradation
– In MPEG-2, up to four QM’s (luma intra/non-intra and chroma intra/non-intra) can be changed at the picture rate
– Default matrices are specified and need not be sent, but different ones can be downloaded
• Quantizer Scale (QS)
– QS can change on a macroblock basis
– rate control’s job is to modify QS in a way that keeps picture quality high for a given bit rate

Page 58: mpeg-2VideoCompression

MPEG-2 Quantizer Scale Types

[Figure: quantizer scale (vertical axis, 0 to 120) plotted against quantizer_scale_code [1, 31] (sent in bitstream; horizontal axis, 0 to 40), with one curve for the Linear Quantizer Scale (q_scale_type = 0) and one for the Nonlinear Quantizer Scale (q_scale_type = 1).]
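The two curves can be tabulated with a short Python sketch. This is illustrative only: the function name is hypothetical, the linear mapping (twice the code) and the nonlinear table values are quoted from memory of the MPEG-2 Video specification (ISO/IEC 13818-2, quantiser scale table) and should be checked against the standard before being relied on.

```python
# Nonlinear quantizer scale for codes 1..31 (assumed values; verify against the spec).
NONLINEAR_SCALE = [
    1, 2, 3, 4, 5, 6, 7, 8,
    10, 12, 14, 16, 18, 20, 22, 24,
    28, 32, 36, 40, 44, 48, 52, 56,
    64, 72, 80, 88, 96, 104, 112,
]

def quantizer_scale(code, q_scale_type):
    """Map quantizer_scale_code (1..31) to a quantizer scale value."""
    if not 1 <= code <= 31:
        raise ValueError("quantizer_scale_code must be in [1, 31]")
    if q_scale_type == 0:
        return 2 * code              # linear: 2, 4, ..., 62
    return NONLINEAR_SCALE[code - 1] # nonlinear: fine steps first, coarse steps last

for code in (1, 8, 16, 24, 31):
    print(code, quantizer_scale(code, 0), quantizer_scale(code, 1))
```

The nonlinear scale gives finer control at low values and reaches a coarser maximum than the linear scale, which is consistent with the shape of the plotted curves.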

Page 59: mpeg-2VideoCompression

Quantization Example

DCT Frequency Coefficients T[u][v] (DC at top left), multiplied by 16 to give A:

276  59  89  39   7 -13 -12  -7
137 -94 -35   4  17  16   7   2
 51  25 -42 -20 -14   1   5   7
-12  40  -8 -16  -4  -4  -5  -5
 -8   3  17 -13  -4   0   2  -1
  2  14  14   5  -7   0  -1   0
 -1  -3  -2  12   0  -4  -2   1
 -6   2  -6   6   8  -5  -1   0

Default Intra Quantization Matrix QM[u][v] (DC at top left), multiplied by the Quantizer Scale QS = 40 (from Rate Controller) to give B:

 8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83

Pointwise Division and Rounding (A/B) gives the Quantized DCT Coefficients T’[u][v] (DC at top left):

35  1  2  1  0  0  0  0
 3  2 -1  0  0  0  0  0
 1  0 -1  0  0  0  0  0
 0  1  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Note: Quantization of DC term is fixed and does not depend on QM or QS.

Page 60: mpeg-2VideoCompression

Default Quantization Matrices

Intra Matrix: QMI[u][v] (DC at top left)

 8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83

Note: AC coefficients (all coefficients except DC) are first multiplied by 16, then divided by QS*QMI[u][v]. DC term is treated specially.

Non-Intra Matrix: QMN[u][v]

16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16

Note: All coefficients are first multiplied by 16, then divided by QS*QMN[u][v].
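The intra quantization rule in the notes above can be sketched in Python as follows. This is a simplified illustration, not an encoder implementation: the function names are hypothetical, rounding half away from zero is an assumption (encoders are free to choose their own rounding), and dividing the DC term by 8 is an assumption taken from the Quantization Example slide, where a DC value of 276 becomes 35.

```python
import math

# Default intra quantization matrix, rows top to bottom, DC at [0][0].
QMI = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]

def round_half_away(x):
    """Round to nearest integer, with halves rounded away from zero."""
    magnitude = int(math.floor(abs(x) + 0.5))
    return magnitude if x >= 0 else -magnitude

def quantize_intra(T, qs, qm=QMI):
    """Quantize an 8x8 block of DCT coefficients: 16*T / (QS*QM), rounded."""
    out = [[round_half_away(16 * T[v][u] / (qs * qm[v][u])) for u in range(8)]
           for v in range(8)]
    out[0][0] = round_half_away(T[0][0] / 8)  # assumed fixed DC step of 8
    return out

# First and third coefficient rows of the example slide; other rows left at zero.
T = [[0] * 8 for _ in range(8)]
T[0] = [276, 59, 89, 39, 7, -13, -12, -7]
T[2] = [51, 25, -42, -20, -14, 1, 5, 7]

quantized = quantize_intra(T, qs=40)
print(quantized[0])
print(quantized[2])
```

With QS = 40, most high-frequency coefficients quantize to zero, which is what makes the subsequent run length coding effective.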

Page 61: mpeg-2VideoCompression

Downloadable Quant Matrices

• For improved quality in certain coding situations, quantization matrices for Intra and Non-Intra macroblocks can be downloaded.
• The decoder uses these instead of the defaults (which are not sent in the bitstream)
• The example below shows an improved Non-Intra Quant Matrix used by the MPEG-2 Test Model 5 (TM5)

Example of Downloadable Matrix (TM5 Non-Intra Matrix):

16 17 18 19 20 21 22 23
17 18 19 20 21 22 23 24
18 19 20 21 22 23 24 25
19 20 21 22 23 24 26 27
20 21 22 23 25 26 27 28
21 22 23 24 26 27 28 30
22 23 24 26 27 28 30 31
23 24 25 27 28 30 31 33

Page 62: mpeg-2VideoCompression

Quant Matrix Effect

[Figure: two rows of plots, one for a Flat Matrix and one for a Tilted Matrix. Each row shows coefficient amplitude against “freq” (starting at DC) before quantization and after quantization, with the reconstruction levels marked.]

Page 63: mpeg-2VideoCompression

Quantization Artifacts

Shown after DCT, Quantization, Inverse Quantization and Inverse DCT using default Intra Quantization Matrix and Linear Quantizer Scale.

[Figure: three 8x8 blocks (Vertical Edge, Corner Edge, Diagonal Edge), each shown as the original and after coding with QS = 2, QS = 5, QS = 10 and QS = 15.]

Page 64: mpeg-2VideoCompression

Variable Length Coding (VLC) and Decoding (VLD)

• Quantization zeros out many DCT coefficients
• Zig-Zag scanning of the quantized DCT coefficients yields runs of zeros
• Non-Zero Levels and Runs of Zeros can be coded efficiently using VLC's
• VLC causes variable bit rate output!

[Figure: Image → DCT → Q → VLC → (variable bit rate) → VLD → Q-1 → DCT-1 → Reconstructed Image.]

Page 65: mpeg-2VideoCompression

Run Length Coding

• Zeros of the 8x8 block are run length coded
• To optimize the runs, the block is zig-zag scanned

Zig-zag scan (MPEG-1 pattern) through quantized DCT coefficients (DC at top left):

35  1  2  1  0  0  0  0
 3  2 -1  0  0  0  0  0
 1  0 -1  0  0  0  0  0
 0  1  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Corresponding Run/Level Pairs:

DC: 35
(0, 1) (0, 3) (0, 1) (0, 2) (0, 2) (0, 1) (0, -1) (3, 1) (0, -1)
End of Block

• Common Run/Level Pairs are VLC’d
• DC Coefficients are differenced from block to block and VLC’d
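The zig-zag scan and run/level pairing for this example can be sketched in Python. The code is an illustration with hypothetical function names; it generates the normal zig-zag order by walking the anti-diagonals of the block, and it returns the DC value separately because the slide notes that DC is coded differentially.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in normal zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def run_level_pairs(block):
    """Return (dc, pairs): pairs are (run of zeros, non-zero level) for the AC terms."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for value in scan[1:]:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return scan[0], pairs  # trailing zeros are covered by the End of Block code

block = [
    [35, 1, 2, 1, 0, 0, 0, 0],
    [3, 2, -1, 0, 0, 0, 0, 0],
    [1, 0, -1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

dc, pairs = run_level_pairs(block)
print(dc, pairs)
```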

Page 66: mpeg-2VideoCompression

MPEG-2 Enhancements

[Figure: the encoder block diagram (DCT, Q, IQ, IDCT, Motion Estimator, Frame Mem 1, Frame Mem 2, Motion Comp, "0" intra input, VLC & Bitstream Packer producing the MPEG-2 Video Bitstream from DCT coefficients, motion vectors, quantization parameters and headers), annotated with the MPEG-2 additions: Field and Frame Pictures; Field & Frame Prediction; Field & Frame DCT; Linear & Nonlinear QS; Alternate Zig-Zag and VLC coding.]

Page 67: mpeg-2VideoCompression

MPEG-2 Zig-Zag Scan Options

[Figure: two 8x8 blocks of quantized DCT coefficients, each with DC at top left, showing the two scan paths.]

• Normal Zig-Zag Scan. Mandatory in MPEG-1. Option in MPEG-2.
• Alternate Zig-Zag Scan. Not used in MPEG-1. Option in MPEG-2.

For Frame DCT coding of interlaced video, more energy exists in the region marked on the alternate-scan block, so run length coding is more efficient.

Page 68: mpeg-2VideoCompression

MPEG-2 Field/Frame DCT Coding

• Frame DCT: Normal MPEG-1 mode of coding
• Field DCT: Split into top and bottom fields
• MPEG-2 encoder may choose Field DCT on any macroblock.
• Decoder must interpret coding flag correctly, or severe errors will occur.

[Figure: a luminance macroblock on x and y axes, with its lines grouped for Field DCT Coding and for Frame DCT Coding.]

Note: Chrominance blocks in 4:2:0 mode are always DCT coded in Frame order

Page 69: mpeg-2VideoCompression

Variable Length Coding

• Huffman type “entropy” coding
• Shorter codewords assigned to more probable symbols (like Morse Code)
• Used for motion vectors, run/level pairs, type of macroblocks, etc.

Example: DCT AC coefficients

Run, Level   Codeword
0, 1         110
1, 1         0110
0, -1        111
7, -1        0001001
EOB          10

Example: Vectors (delta coded)

Delta   Codeword
0       1
1       010
2       0010
3       00010
4       0000110
5       00001010
...
15      000000011010
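To show how such codewords are used, here is a small Python sketch that encodes and decodes run/level pairs with only the five AC codewords listed in the example above. It is an illustration under that assumption: the real MPEG tables contain many more entries plus an escape mechanism, and the function names are hypothetical.

```python
# Partial table: only the run/level codewords quoted in the example.
AC_VLC = {(0, 1): "110", (1, 1): "0110", (0, -1): "111", (7, -1): "0001001"}
EOB = "10"  # End of Block

def encode(pairs):
    """Concatenate the codeword of each (run, level) pair, then End of Block."""
    return "".join(AC_VLC[pair] for pair in pairs) + EOB

def decode(bits):
    """Read bits until a codeword matches; works because no codeword prefixes another."""
    inverse = {code: pair for pair, code in AC_VLC.items()}
    pairs, word = [], ""
    for bit in bits:
        word += bit
        if word == EOB:
            return pairs
        if word in inverse:
            pairs.append(inverse[word])
            word = ""
    raise ValueError("bitstream ended before End of Block")

bits = encode([(0, 1), (1, 1), (0, -1)])
print(bits, decode(bits))
```

The most common pairs take three or four bits, while the rare (7, -1) pair takes seven, which is the Morse Code idea from the slide.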

Page 70: mpeg-2VideoCompression

Rate Control

[Figure: Image → DCT → Q → VLC → Buffer → (constant bit-rate) → Buffer → VLD → Q-1 → DCT-1 → Reconstructed Image, with a Rate Controller feeding back from the encoder buffer to Q.]

• A buffer is used to smooth out the bit rate
• Rate controller adjusts quantizer to control buffer fullness and prevent overflow and underflow of decoder’s buffer (Video Buffering Verifier)
• Buffer size affects image quality and overall delay
• Rate control algorithm is crucial for high quality compression
• Shown above is basic structure for:
- Motion JPEG
- Intraframe H.261
- Intraframe MPEG

Page 71: mpeg-2VideoCompression

Temporal Prediction

• To exploit redundancy in still portions of an image sequence, the difference between the input and the reconstructed previous frame is coded
• Encoder gets more complex and includes copy of decoder (called an embedded decoder)
• Moving areas are not coded well using this scheme, so MPEG uses Motion Compensated Prediction.

[Figure: the predicted image is subtracted from the input image; the difference passes through DCT, Q, VLC and a buffer (CBR, under the Rate Controller) to the decoder's buffer, VLD, Q-1 and DCT-1. In both the encoder's embedded decoder (Q-1, DCT-1) and the decoder, the decoded difference is added to the predicted image, and a Frame Delay turns the reconstructed image into the next predicted image.]

Page 72: mpeg-2VideoCompression

Motion Compensated Prediction

[Figure: as for temporal prediction, but the Frame Delay is replaced by a Motion Compensator in both encoder and decoder. Motion Estimation in the encoder produces Motion Vectors, which are used by both Motion Compensators and sent in the bitstream (CBR, via BUF) along with the coded Residual Image.]

• Most motion is predictable, and motion compensation exploits this fact.
• Motion Estimation is the process by which motion vectors are computed in the encoder. It can be quite computationally intensive.
• Motion vectors are used by the Motion Compensators in the encoder and decoder to produce Predicted Images from Reconstructed Images.
• We now have P frames.

Page 73: mpeg-2VideoCompression


A Typical Motion Estimation Architecture

[Block diagram: input image → Coarse ME → coarse motion vectors → Fine ME → refined motion vectors → MC → predicted image. The predicted image is subtracted from the input image and passed to DCT/Q and VLC; Q⁻¹/DCT⁻¹ plus the predicted image gives the recon. image.]

• Coarse motion vectors computed from input images.
• “Refined” motion vectors, e.g., half-pel refinement, computed from reconstructed images.
• Good compromise between “true motion” and small error.
• Used in MPEG-2 Test Model 5.

Page 74: mpeg-2VideoCompression


How Does Motion Compensated Prediction Save Bits?

[Figure: Current Macroblock X in the Current P or B Picture is predicted from region F in the Previous I or P Picture; MVF is the motion vector between them]

• Instead of sending quantized DCT coefficients of X, send:
  1. quantized DCT coefficients of X-F (prediction error). If prediction is good, error will be near zero and will code with fewer bits.
  2. MVF, the motion vector. This will be differentially coded with respect to its neighboring vector, and will code efficiently.
• This will typically result in 50% - 80% savings in bits.

Page 75: mpeg-2VideoCompression


Gray-Scale Statistics of Prediction Error

[Figure: two panels, each with its Histogram. Left: One Frame of Original Image Pair (histogram axis -100 to 400, vertical scale 0 to 0.02). Right: Prediction Error (histogram axis -250 to 250, vertical scale 0 to 0.25).]

Page 76: mpeg-2VideoCompression


Forward Motion Estimation
... used in P and B frames ...

[Figure: Previous I or P Picture and Current P or B Picture along the Time axis, both overlaid with the MB Grid, with these labels:]
– Position of current Macroblock (aligned to MB grid)
– Search Area
– Position of "zero motion vector" MB (center of search area)
– Position of "best match" MB (to half-pixel accuracy; need not be aligned to MB grid)
– Motion Vector (e.g., [-20.5, +20.5])

Page 77: mpeg-2VideoCompression


ME Matching Metrics

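• Minimum Mean Absolute Error: $\mathrm{MMAE} = \min_{k,l} \frac{1}{256} \sum_{i,j} \left| X_{i,j} - X'_{i,j} \right|$

• Minimum Mean Squared Error: $\mathrm{MMSE} = \min_{k,l} \frac{1}{256} \sum_{i,j} \left( X_{i,j} - X'_{i,j} \right)^2$

X = 16x16 current MB
X’ = 16x16 prediction MB at offset (k,l)
i, j index the 256 pixels of the macroblock; the minimum is taken over the offsets (k,l) in the search area.

[Figure: current MB X on the MB grid; prediction MB X’ at offset (k,l) inside the search area]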
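The mean-absolute-error metric on this slide can be sketched as an exhaustive (full) search over integer offsets. This is an illustrative, unoptimized sketch: it works on plain nested lists, ignores half-pel refinement, and breaks ties in favour of the shorter vector, which is an assumption made here rather than anything stated on the slide. The function names and the small 4x4 demo block are also assumptions, chosen to keep the example short; MPEG macroblocks are 16x16.

```python
def mean_abs_error(cur, ref, top, left, n):
    """Mean absolute error between the n x n block `cur` and the
    n x n region of `ref` whose top-left corner is (top, left)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[i][j] - ref[top + i][left + j])
    return total / (n * n)


def full_search(cur, ref, mb_top, mb_left, n=16, search=7):
    """Try every integer offset (dy, dx) within +/-search pels and
    return (best_error, (dy, dx))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            top, left = mb_top + dy, mb_left + dx
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue  # candidate lies partly outside the reference picture
            err = mean_abs_error(cur, ref, top, left, n)
            key = (err, abs(dy) + abs(dx))  # tie-break: shorter vector wins
            if best is None or key < best[0]:
                best = (key, (dy, dx))
    return best[0][0], best[1]


# Demo: a 12x12 reference picture with a non-repeating pattern.
ref = [[3 * y * y + 5 * x * x + x * y for x in range(12)] for y in range(12)]
# The "current" 4x4 block sits on the grid at (4, 4) but its content
# actually comes from (6, 3) in the reference, i.e. offset (+2, -1).
cur = [row[3:7] for row in ref[6:10]]
print(full_search(cur, ref, mb_top=4, mb_left=4, n=4, search=3))
# -> (0.0, (2, -1))
```

Swapping `abs(...)` for a squared difference in `mean_abs_error` gives the mean-squared-error variant.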

Page 78: mpeg-2VideoCompression


Example of Forward Motion Estimation
Case: Good prediction for still objects.

[Figure labels: Search Area, Macroblock Grid]

Previous I or P Picture. Within the search area, a good match is found for this still object.

Current P Picture. Current MB is shown with heavy outline. Since a match is found, this MB is intercoded.

Page 79: mpeg-2VideoCompression


Example of Forward Motion Estimation
Case: Dealing with featureless regions.

[Figure labels: Search Area, Macroblock Grid]

Previous I or P Picture. Within the search area, many good matches are found. Encoder must pick one and send appropriate motion vector.

Current P Picture. Current MB is shown with heavy outline. Since a match is found, this MB is intercoded.

Page 80: mpeg-2VideoCompression


Example of Forward Motion Estimation
Case: Good prediction for linearly translating objects.

[Figure labels: Search Area, Macroblock Grid]

Previous I or P Picture. Within the search area, a good match is found for this moving object. Encoder sends appropriate forward motion vector.

Current P Picture. Current MB is shown with heavy outline. Since a match is found, this MB is intercoded.

Page 81: mpeg-2VideoCompression


Example of Forward Motion Estimation
Case: A good prediction might be missed because it is outside the search area.

[Figure labels: Search Area, Macroblock Grid]

Previous I or P Picture. Within the search area, no good match is found. Note that a good match would be found with a larger search area. Search area is an important encoder design parameter.

Current P Picture. Current MB is shown with heavy outline. Since no match is found, this MB is intracoded.

Page 82: mpeg-2VideoCompression


Example of Forward Motion Estimation
Case: A good prediction might come from an unrelated object.

[Figure labels: Search Area, Macroblock Grid]

Previous I or P Picture. Within the search area, a good match is found, but within a different object. There is no requirement that motion vectors represent true motion of objects.

Current P Picture. Current MB is shown with heavy outline. Since a match is found, this MB is intercoded.

Page 83: mpeg-2VideoCompression


Example of Forward Motion Estimation
Case: Prediction Error should have low energy.

[Figure: three panels over the Macroblock Grid: Previous I or P Picture; Current P Picture; Prediction Error Picture, with MB Type and Motion Vectors Superimposed (I = Intra, P = Inter). Nearly every macroblock in the grid is labeled P; one is labeled I.]

Page 84: mpeg-2VideoCompression


Example of Backward Motion Estimation
Case: Handles uncovered objects missed by forward prediction.

Previous I or P Picture. Searching here finds no good match because some features are partially hidden.

Current B Picture. Current MB is shown with heavy outline.

Next I or P Picture. Searching here finds a good match because features are now uncovered.

Page 85: mpeg-2VideoCompression


Forward/Backward/Interpolated Decision
...must be made for every non-intra macroblock in a B picture...

[Figure: X in the Current B Picture; F in the Previous I or P Picture, reached by MVF; B in the Next I or P Picture, reached by MVB]

Define:
  X = Current MB
  F = “Best” MB in previous I or P Picture
  B = “Best” MB in next I or P Picture
  MVF = MV corresponding to F’s displacement from X
  MVB = MV corresponding to B’s displacement from X

Compute: “Goodness” of F, B and (F+B)/2 as predictors for X

Decide:
  If F is best, send MVF (Forward Prediction)
  If B is best, send MVB (Backward Prediction)
  If (F+B)/2 is best, send MVF and MVB (Interpolated Prediction)
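A minimal sketch of the Define / Compute / Decide steps on this slide. The slide does not say how “goodness” is measured; this sketch assumes the sum of absolute differences (SAD), rounds the (F+B)/2 average to the nearest integer, and prefers forward, then backward, on ties. All three choices, and the function names, are assumptions made for illustration. Blocks are passed as flat lists of pixel values.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b))


def choose_b_prediction(x, f, b):
    """Return (mode, error) for the best of F, B and (F+B)/2 as predictor of X."""
    interp = [(p + q + 1) // 2 for p, q in zip(f, b)]  # rounded average
    candidates = [
        ("forward", sad(x, f)),        # send MVF
        ("backward", sad(x, b)),       # send MVB
        ("interpolated", sad(x, interp)),  # send MVF and MVB
    ]
    # min() keeps the first candidate on ties, so forward is preferred.
    return min(candidates, key=lambda c: c[1])


x = [10, 20, 30, 40]
f = [10, 20, 30, 44]   # forward candidate is slightly too bright
b = [10, 20, 30, 36]   # backward candidate is slightly too dark
print(choose_b_prediction(x, f, b))  # -> ('interpolated', 0)
```

In the demo the two one-sided errors cancel in the average, which is the situation where interpolated prediction pays for the cost of its second motion vector.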

Page 86: mpeg-2VideoCompression


Motion Vector Coding Example

MV Field: Motion Vectors (MV’s) shown for 8 successive macroblocks.
Assume all [x, y] for picture in RANGE [-32, 31] => f_code = 2, MODULUS = 64.

  MB          1     2     3     4     5     6     7     8
  MV x        3    10    30    30   -14   -16    27    24
  MV y      -10   -10    -9    -9   -11   -11   -10   -10

ΔMV = Differential MV. [0,0] used as predictor for first MV.

  ΔMV x       3     7    20     0   -44    -2    43    -3
  ΔMV y     -10     0     1     0    -2     0     1     0

ΔMV’: Add or subtract MODULUS if out of RANGE. Keeps all values in RANGE.

  ΔMV’ x      3     7    20     0    20    -2   -21    -3
  ΔMV’ y    -10     0     1     0    -2     0     1     0

VLC: Convert to VLC’s using Table 2-B.4 in the MPEG-1 Video spec. VLC’s used in this example are for illustration only.

  VLC x: 0101, 000101, 00000100100, 10, 00000100100, 0110, 00000100110, 0110
  VLC y: 000010110, 10, 11, 10, 0111, 10, 11, 10

• Note that the vertical components of the MV’s are much more correlated than the horizontal components.
• Therefore, the MV differentials for the vertical components code with fewer bits.
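The differential and modulus steps of this example can be checked with a short sketch. It assumes the slide's RANGE of [-32, 31] (MODULUS 64) and a predictor of 0 for the first vector; the function names are illustrative only. The VLC lookup itself is left out.

```python
def wrap(value, low=-32, high=31):
    """Add or subtract MODULUS so that value falls inside [low, high]."""
    modulus = high - low + 1  # 64 for RANGE [-32, 31]
    if value < low:
        value += modulus
    elif value > high:
        value -= modulus
    return value


def differential_mvs(components, low=-32, high=31):
    """Encoder side: difference from the previous vector, wrapped into RANGE."""
    out, pred = [], 0  # 0 is the predictor for the first MV
    for v in components:
        out.append(wrap(v - pred, low, high))
        pred = v
    return out


def reconstruct_mvs(deltas, low=-32, high=31):
    """Decoder side: accumulate the differentials, wrapping the same way."""
    out, pred = [], 0
    for d in deltas:
        pred = wrap(pred + d, low, high)
        out.append(pred)
    return out


mv_x = [3, 10, 30, 30, -14, -16, 27, 24]
mv_y = [-10, -10, -9, -9, -11, -11, -10, -10]
print(differential_mvs(mv_x))  # -> [3, 7, 20, 0, 20, -2, -21, -3]
print(differential_mvs(mv_y))  # -> [-10, 0, 1, 0, -2, 0, 1, 0]
print(reconstruct_mvs(differential_mvs(mv_x)) == mv_x)  # -> True
```

The wrap is lossless because the decoder applies the same modulus: 30 + 20 = 50 is out of RANGE and wraps back to -14, the true vector.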

Page 87: mpeg-2VideoCompression


MPEG-2 Prediction Modes

• Frame Prediction
  – in a frame picture, field prediction or frame prediction is selected on a macroblock basis
• Field Prediction
  – predictions are made independently for each field
  – in a field picture, all predictions are field predictions
• Dual Prime
  – can be used in field pictures or frame pictures
  – can only be used in P pictures
  – one MV plus a differential MV sent per macroblock
• 16x8 Motion Compensation
  – can only be used in field pictures
  – two MV’s are sent for forward or backward prediction
  – first MV used for upper 16x8 region, second MV for lower
  – four MV’s are sent for bi-directional prediction

Page 88: mpeg-2VideoCompression


Allowable MPEG-2 Prediction Modes

  Field Pictures               Frame Pictures
  Field Prediction             Field Prediction
  16x8 Motion Compensation     Frame Prediction
  Dual Prime                   Dual Prime

Page 89: mpeg-2VideoCompression


Prediction in Frame Pictures

[Figure: Reference Frame and Predicted Frame, each shown as Top Field and Bottom Field]

Frame Prediction: Best 16x16 region in Reference Picture determines frame MV for 16x16 MB (the Current MB). Only mode allowed in MPEG-1.

Field Prediction: Best 16x8 region in Top or Bottom field in Reference Picture determines field MV’s for Top and Bottom portions of 16x16 MB (Top Field of Current MB and Bottom Field of Current MB).

Page 90: mpeg-2VideoCompression


Dual-Prime Prediction

In Frame Pictures: Single MV (heavy arrow) sent in bitstream; this represents predictions from fields of same parity. Small differential MV’s are also sent; these represent offset predictions from fields of opposite parity. Same and opposite field predictions are averaged to form final prediction for each 16x8 region of current MB.

[Figure: Reference Frame and Predicted Frame, each with Top Field and Bottom Field. Two 16x8 reference regions are averaged for the Top Field of Current MB, and two for the Bottom Field of Current MB.]

In Field Pictures: Single MV (heavy arrow) sent in bitstream; this represents prediction from field of same parity. A small differential MV is also sent; this represents an offset prediction from field of opposite parity. Same and opposite field predictions are averaged to form final prediction for current 16x16 MB.

[Figure: First Field and Second Field. Two 16x16 reference regions are averaged for the current 16x16 MB. One field is marked “This Field not yet decoded.”]

Page 91: mpeg-2VideoCompression


Dual-Prime Prediction in V-T

[Figure: Top and Bottom fields of the Reference Picture and the Predicted Picture, with these labels:]
– Vector Transmitted in Bitstream for Same Parity Fields
– Vector Derived at Decoder for Opposite Parity Fields
– Differential Vector Transmitted in Bitstream (limited to values -1, 0, +1)

Page 92: mpeg-2VideoCompression


Concealment Motion Vectors

• An MPEG-2 enhancement; not a requirement
• Helps in concealing errors when data is lost
• Concealment motion vectors (CMV’s), if sent, are coded with Intra macroblocks (MB’s)
• CMV’s should be used in MB’s immediately below the one in which the CMV occurs

[Figure: Group of Intra-coded Macroblocks with CMV’s. One row is labeled “Use CMV’s in this row for MB’s below”; another is labeled “Macroblocks in this row are lost”.]

Page 93: mpeg-2VideoCompression


Inter/Intra Decision

• On a macroblock basis, decide whether it's more efficient to code original signal or motion compensated prediction error
• Some pictures are coded entirely intraframe (I-pictures). This is useful for resetting prediction loop and for editing
• Basic structure of H.261 codec

[Block diagram: the motion compensated codec (Image, DCT, Q, VLC, BUF, CBR channel, BUF, VLD, Q⁻¹, DCT⁻¹, Reconstructed Image, Motion Estimation, Motion Compensators, Motion Vectors, Rate Controller) with an Intra/Inter Decider added. The Intra/Inter Mode sets switches, labeled 0 and 1, in both the encoder and the decoder.]

Page 94: mpeg-2VideoCompression


Selection of Macroblock Type
...following the MPEG-1 simulation model...

1. MC vs. No MC
  – if Motion Compensation is best, select “MC” and transmit motion vector(s); if B picture, select forward, backward or interpolated
  – otherwise, select “No MC”; do not transmit motion vector; it is assumed to be 0
2. Intra vs. Inter
  – should MV found in step 1 be used? If so, select “Inter”
3. Coded vs. Not Coded
  – if quantized prediction error is zero, select “Not Coded”
4. Quant vs. No Quant
  – if quantizer scale needs to be changed, select “Quant”

Page 95: mpeg-2VideoCompression


Example of MB Type Selection for P Pictures

Begin
  MC
    Non Intra
      Coded
        Quant      → pred-mcq
        No Quant   → pred-mc
      Not Coded    → pred-m
    Intra
      Quant        → intra-q
      No Quant     → intra-d
  No MC
    Non Intra
      Coded
        Quant      → pred-cq
        No Quant   → pred-c
      Not Coded    → skipped
    Intra (same Intra branch as under MC)
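The decision tree for P pictures can be written as a small function. This is a sketch of the tree as reconstructed from this slide; the four boolean inputs stand for the outcomes of the MC / Intra / Coded / Quant decisions, and the function name is an assumption. How each decision is actually made is left to the encoder.

```python
def p_picture_mb_type(use_mc, intra, coded, new_quant):
    """Map the four encoder decisions to a P-picture macroblock type name.

    use_mc    -- motion compensation chosen (a non-zero MV is sent)
    intra     -- macroblock is intra coded (MC and Coded are then irrelevant)
    coded     -- quantized prediction error is non-zero
    new_quant -- quantizer scale changes at this macroblock
    """
    if intra:
        return "intra-q" if new_quant else "intra-d"
    if not coded:
        # Nothing but (possibly) a motion vector to send.
        return "pred-m" if use_mc else "skipped"
    base = "pred-mc" if use_mc else "pred-c"
    return base + ("q" if new_quant else "")


print(p_picture_mb_type(use_mc=True, intra=False, coded=True, new_quant=True))     # -> pred-mcq
print(p_picture_mb_type(use_mc=False, intra=False, coded=False, new_quant=False))  # -> skipped
```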

Page 96: mpeg-2VideoCompression


Example of MB Type Selection for B Pictures

Begin
  MC
    Forward
      Coded
        Quant      → pred-fcq
        No Quant   → pred-fc
      Not Coded    → pred-f or skipped
    Backward
      Coded
        Quant      → pred-bcq
        No Quant   → pred-bc
      Not Coded    → pred-b or skipped
    Interpolated
      Coded
        Quant      → pred-icq
        No Quant   → pred-ic
      Not Coded    → pred-i or skipped
  No MC
    Intra
      Quant        → intra-q
      No Quant     → intra-d

Page 97: mpeg-2VideoCompression


Macroblocks and Quantizer Scale Codes

• Quantizer Scale Codes are 5-bit integers sent in every slice header and selected MB headers
• Decoder uses most recent value for all subsequent MB’s until another Quantizer Scale Code is encountered.

[Figure: a single slice, shown as a Slice Header followed by eight MB’s]
  These quant scales coded in bit stream:    9, 5, 4, 6
  Decoder uses values shown in parentheses:  (9) (9) (9) (5) (4) (6) (6) (6)
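The “most recent value” rule can be sketched in a few lines. The sketch assumes that, in the slide's example, the 9 is carried in the slice header and the 5, 4 and 6 arrive in the headers of the fourth, fifth and sixth macroblocks; that placement is inferred from the decoder values in parentheses. `None` marks a macroblock whose header carries no Quantizer Scale Code. The function name is illustrative.

```python
def effective_quant_scales(slice_code, mb_codes):
    """Return the quantizer scale code the decoder uses for each MB.

    slice_code -- code sent in the slice header
    mb_codes   -- per-MB code, or None when the MB header sends none
    """
    current = slice_code
    used = []
    for code in mb_codes:
        if code is not None:
            current = code  # a new code replaces the most recent value
        used.append(current)
    return used


print(effective_quant_scales(9, [None, None, None, 5, 4, 6, None, None]))
# -> [9, 9, 9, 5, 4, 6, 6, 6]
```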

Page 98: mpeg-2VideoCompression


Skipped Macroblocks

• MB’s cannot be skipped in I Pictures
• MB’s can be skipped in P and B pictures if certain rules apply

[Figure: a slice within a portion of a P or B Picture]
– The first MB of a slice must be coded
– The last MB of a slice must be coded
– The MB’s in between can be skipped if:
  1) all quantized DCT coeffs = 0, and
  2) all MV’s = 0 (in P pictures), or all MV differentials = 0 (in B pictures)

Page 99: mpeg-2VideoCompression


Forward Analysis and Resequencing

• B frames must be resequenced from display to coding order
• Basic structure of MPEG codec

Forward Analysis is a look-ahead technique that can be used to help the Rate Controller adjust quantization in a more optimal fashion

[Block diagram: the Intra/Inter codec (Image, DCT, Q, VLC, BUF, CBR channel, BUF, VLD, Q⁻¹, DCT⁻¹, Reconstructed Image, Motion Estimation, Motion Compensators, Motion Vectors, Rate Controller, Intra/Inter Decider with its 0/1 mode switches) with a Forward Analyzer and two Reseq (resequencing) blocks added]
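Resequencing from display order to coding order can be sketched as follows: each B picture has to wait until the I or P picture that follows it in display order has been coded, because that picture is one of its references. The function name and the simple handling of trailing B pictures (appended at the end, whereas a real encoder would wait for the next anchor picture) are assumptions made for this sketch.

```python
def display_to_coding_order(display_types):
    """Reorder pictures from display order to coding order.

    display_types -- picture types in display order, e.g. "IBBPBBP"
    Returns a list of (display_index, type) tuples in coding order.
    """
    coded, waiting_b = [], []
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            waiting_b.append((index, ptype))  # needs a future anchor first
        else:
            coded.append((index, ptype))      # I or P: code it now ...
            coded.extend(waiting_b)           # ... then the B's that preceded it
            waiting_b = []
    coded.extend(waiting_b)  # simplification for B's with no following anchor
    return coded


order = display_to_coding_order("IBBPBBP")
print("".join(t for _, t in order))  # -> IPBBPBB
print([i for i, _ in order])         # -> [0, 3, 1, 2, 6, 4, 5]
```

The decoder's Reseq block performs the inverse mapping before display.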

Page 100: mpeg-2VideoCompression


MPEG Bit Stream Structure

• Sequence layer
• GOP layer

[Figure: bit stream hierarchy]
  Bit stream:       Sequence Header, Sequence, Sequence Header, Sequence, ...
  Sequence Header:  Picture width, Picture height, Aspect ratio, Bitrate, Picture rate, ...
  Sequence:         GOP Header, Picture Header, Picture, Picture Header, Picture, ...
  Picture Header:   Temporal Reference, Picture Type, VBV Delay, ..., Extension Start Code, Picture Structure, ...

Page 101: mpeg-2VideoCompression


MPEG Bit Stream Structure (Cont'd.)

• Picture layer
• Slice layer
• Macroblock layer
• Block Layer

[Figure: bit stream hierarchy, continued]
  Picture:     Picture Header, Slice, Slice, ...
  Slice:       Slice Header, Macroblock, Macroblock, ...
  Macroblock:  Address, Type, Quantizer Scale, Motion Vectors, Coded Block Pattern, Block, Block, ...

Page 102: mpeg-2VideoCompression


3:2 Pulldown

• MPEG-2 provides a mechanism for film-originated content to be coded at 24 frame/sec but displayed at 30 frames/sec
• The lower frame rate of film means it can be coded at the same quality as 30 frame/sec video, but at a lower bit rate.
• The repeat_first_field (rff) and top_field_first (tff) flags allow decoders to recreate the 3:2 pulldown sequence for display.

[Figure: Film Frames coded as progressive frames at 24 frames/sec (1/24 sec each). 3:2 pulldown alternately creates 3 and 2 displayed fields (1/60 sec each, 1/30 sec per displayed frame) for each input frame. Flags for four successive frames: rff=1 tff=1 (repeat first field); rff=0 tff=0; rff=1 tff=0 (repeat first field); rff=0 tff=1.]
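A minimal sketch of how a decoder could expand the rff/tff flags into displayed fields, using the four-frame flag pattern from this slide. The function name and the frame labels A to D are illustrative assumptions.

```python
def displayed_fields(frames):
    """Expand coded progressive frames into the displayed field sequence.

    frames -- list of (name, rff, tff) tuples, one per coded frame
    Returns a list of (name, parity) tuples, parity being "top" or "bottom".
    """
    fields = []
    for name, rff, tff in frames:
        first, second = ("top", "bottom") if tff else ("bottom", "top")
        fields.append((name, first))
        fields.append((name, second))
        if rff:
            fields.append((name, first))  # repeat_first_field: show it again
    return fields


# Flag pattern from the slide: (rff, tff) = (1,1), (0,0), (1,0), (0,1)
pattern = [("A", 1, 1), ("B", 0, 0), ("C", 1, 0), ("D", 0, 1)]
fields = displayed_fields(pattern)
print(len(fields))                     # -> 10
print([p[0] for _, p in fields])       # -> ['t', 'b', 't', 'b', 't', 'b', 't', 'b', 't', 'b']
```

Four film frames become ten fields, so 24 frames/sec becomes 60 fields/sec, and the tff flags keep top and bottom fields strictly alternating across frame boundaries.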

Page 103: mpeg-2VideoCompression


Pan-and-Scan

[Figure: a 4:3 Display Rectangle positioned within a 16:9 Reconstructed Frame, with frame_centre_horizontal_offset marked. In this example the horizontal frame center offset is a positive number.]

• MPEG-2 provides a mechanism for panning a display rectangle around a reconstructed frame
• Horizontal and vertical offsets are specified to 1/16 pixel resolution and can be sent for every displayed field.
• This allows widescreen material to be viewed on 4:3 displays.

Page 104: mpeg-2VideoCompression


MPEG-2 Video Decoding Process

NOTE: This is a simplified, high-level functional diagram that integrates several separate diagrams in the MPEG-2 Video Spec (ISO/IEC 13818-2).

[Block diagram. Coefficient path: MPEG-2 Bitstream, Parsing, VLD (DCT Coeffs), Inv Scan (with Zig-Zag Scan Mode), Q⁻¹ (with Quant Scale Factor & Quant Matrices), DCT⁻¹. Motion path: VLD (Motion Vectors), Vector Predictors, Dual Prime Arithmetic, Chroma Scaling, Framestore Addressing, Frame Stores, Half-Pel Prediction Filtering (with Half-Pel Info), Combine Predictions, with Field/Frame Prediction Selection. The two paths are summed (+) and saturated (Sat.) to give Decoded Pixels.]

Page 105: mpeg-2VideoCompression


Special Topics

• More About Rate Control
• The Video Buffer Verifier
• MPEG-2 Profiles and Levels
• Statistical Multiplexing
• Practicing the Art of MPEG

Page 106: mpeg-2VideoCompression


Rate-Distortion Curve

[Figure: Rate plotted against Distortion for three sources of increasing complexity. The rates at zero distortion are marked R1, R2, R3; the distortions at zero rate are marked σ1², σ2², σ3².]

• As the rate increases, the distortion decreases.
• For a given distortion, the rate increases with complexity.
• At zero distortion, the source is coded at its entropy, Rn.
• At zero rate, the source is not coded. The distortion is equal to the source energy, σn².

Page 107: mpeg-2VideoCompression


Distortion and Quant Scale

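[Figure: Distortion plotted against Quantizer Scale Code (axis marked 1, 5, 10, 15, 20, 25), one curve per source in order of increasing complexity; σn² is marked on the distortion axis]

• As quant scale increases, so does distortion.
• For a given quant scale, the distortion generally increases with complexity.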

Page 108: mpeg-2VideoCompression


Bit Rate vs. Quant Scale

[Figure: Rate (e.g., bits/picture) plotted against Quantizer Scale Code (axis marked 1, 5, 10, 15, 20, 25), one curve per source in order of increasing complexity; R1, R2, R3 are marked on the rate axis]

• As quant scale decreases, the bit rate increases.
• For a given quant scale, the bit rate increases with complexity.
• For minimum distortion, use the smallest quant scale.

Page 109: mpeg-2VideoCompression


Constant Quality Encoding

• For a given picture type (I, P or B), constant quality is achieved with a fixed quant scale.
• For sequences with mixed picture types, B pictures can be coded with somewhat lower picture quality, since they are not used as the basis for prediction.

[Figure: Quant Scale Code (axis marked 5, 10, 15) for a sequence of I, P and B frames in display order. Example showing B pictures with higher quant scale (i.e., lower quality).]

Page 110: mpeg-2VideoCompression


Constant Quality => VBR

• With a fixed quant scale, the bit rate increases with complexity.
• This implies variable bit rate (VBR) encoding.

[Figure: Constant Quality Encoding for All I-Frame Sequence (Fixed Quant Scale). Bits/Picture (kbits; axis marked 100, 300, 500) for I frames in display order, covering a simple scene, a moderately complex scene and a complex scene.]

Page 111: mpeg-2VideoCompression


CBR => Variable Quality

• For many applications, constant bit rate (CBR) encoding is required.
• This can lead to highly variable image quality.

[Figure: 300 kbit/picture (CBR) Encoding for All I-Frame Sequence (Variable Quant Scale). Bits/Picture (kbits; axis marked 100, 300, 500) for I frames in display order:]
– simple scene: these pictures need more bits (lower quant scale or add stuffing)
– moderately complex scene: these pictures are just about right
– complex scene: these pictures need fewer bits (increase quant scale)

Page 112: mpeg-2VideoCompression


CBR Rate Control

• Goal is to achieve high quality at constant bit rate.
• To achieve a constant bit rate, a buffer is used to smooth out high variability in bits/frame.
• In practice, I frames are often given highest quality, since they form the basis of prediction for all other pictures in the GOP.
• As complexity increases, the quant scale, on average, is increased to avoid buffer overflow.
• To approach constant quality from frame to frame, bits are “stolen” from simple frames and given to complex frames.
• To approach constant quality within a frame, bits are “stolen” from simple areas and given to complex areas.

Page 113: mpeg-2VideoCompression


What is the Video Buffer Verifier (VBV)?

[Block diagram: Video → MPEG Encoder → Output Rate Buffer → MPEG Video Bitstream → Input Rate Buffer (VBV) → MPEG Decoder → Video]

• The VBV is a hypothetical input rate buffer for the video decoder, which is connected to the output of an encoder.
• The encoder keeps track of the VBV fullness, and must ensure that it does not overflow or underflow.
• Assuming constant end-to-end delay, the encoder buffer is the mirror image of the VBV.

Page 114: mpeg-2VideoCompression


MPEG's Video Buffer Verifier Water Tank Analogy (Normal Operation)

[Figure: a water tank of capacity B with a Shuttered Bottom, filled by a Constant Flow; a volume of water is extracted instantaneously through the bottom. Levels B1 and B2 are marked. A plot shows Tank Fullness against time (0, T, 2T, 3T, 4T, 5T, 6T).]

Tank fills at constant rate B2/2T until fullness B2 is reached. (Slope = flow rate)

Volume of water (B2-B1) is extracted instantaneously every T seconds starting at 2T.

MPEG Analogs:
  Tank = Video Buffer Verifier (Hypothetical Decoder Buffer)
  B = VBV Buffer Size (in Bits)
  T = Output Frame Period
  Constant Flow = Constant Input Bit Rate = B2/2T bits/sec
  Extracted Volume = Coded Bits in Each Picture (B2-B1)
  2T = VBV Delay for Each Picture
NOTE: In general, coded bits per picture varies greatly!

Page 115: mpeg-2VideoCompression


MPEG's Video Buffer Verifier Water Tank Analogy (Overflow Condition)

[Figure: the same water tank (capacity B, levels B1 and B2, Shuttered Bottom, Constant Flow) with a plot of Tank Fullness against time (0, T, 2T, 3T, 4T, 5T, 6T). The fullness climbs past B: Overflow!]

Tank fills at constant rate B2/2T.

Volume of water (B2-B1)/2 is extracted instantaneously every T seconds starting at 2T.

Page 116: mpeg-2VideoCompression


MPEG's Video Buffer Verifier Water Tank Analogy (Underflow Condition)

[Figure: the same water tank (capacity B, levels B1 and B2, Shuttered Bottom, Constant Flow) with a plot of Tank Fullness against time (0, T, 2T, 3T, 4T, 5T, 6T). The fullness falls to empty: Underflow!]

Tank fills at constant rate B2/2T.

Volume of water 3*(B2-B1)/2 is extracted instantaneously every T seconds starting at 2T.
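The three water-tank cases (normal, overflow, underflow) can be reproduced with a small simulation. This is a simplified sketch of the analogy, not the normative VBV procedure from the MPEG specification: bits arrive at a constant rate, each picture's bits are removed instantaneously, and the first removal happens after an initial delay. The function name, its return convention, and the numbers in the demo (rate 400 bits per frame period, B = 1000, B2 = 800, B1 = 400) are all assumptions chosen to mirror the slides.

```python
def simulate_vbv(buffer_size, bit_rate, frame_period, picture_bits, initial_delay):
    """Simulate a constant-rate fill with instantaneous per-picture removal.

    Returns ("ok", None), or ("overflow", n) / ("underflow", n) where n is the
    index of the picture at whose removal time the violation is detected.
    """
    fullness = 0
    now = 0
    for n, bits in enumerate(picture_bits):
        removal_time = initial_delay + n * frame_period
        fullness += bit_rate * (removal_time - now)  # constant-rate fill
        now = removal_time
        if fullness > buffer_size:
            return ("overflow", n)    # tank spilled before this removal
        if bits > fullness:
            return ("underflow", n)   # picture not fully received yet
        fullness -= bits              # instantaneous extraction
    return ("ok", None)


# T = 1, flow = B2/2T = 400, so B2 = 800 is reached at 2T; B1 = 400; B = 1000.
common = dict(buffer_size=1000, bit_rate=400, frame_period=1, initial_delay=2)
print(simulate_vbv(picture_bits=[400] * 10, **common))  # (B2-B1)     -> ('ok', None)
print(simulate_vbv(picture_bits=[200] * 10, **common))  # (B2-B1)/2   -> ('overflow', 2)
print(simulate_vbv(picture_bits=[600] * 10, **common))  # 3*(B2-B1)/2 -> ('underflow', 2)
```

Pictures that are consistently too small let the buffer fill up, while pictures that are consistently too large drain it; a rate controller has to steer between the two.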

Page 117: mpeg-2VideoCompression


VBV Buffer Size and VBV Delay

[Figure: VBV fullness against time (-T/2, 0, T, 2T, ... 8T), rising with Slope = R between picture removals and bounded by B = vbv_buffer_size (bits). Fullness levels b(1), b(2), b(3) and the intervals vbv_delay(1), vbv_delay(2), vbv_delay(3) are marked, along with “All bits for Picture 1” and “All bits for Picture 4”. Below the plot the bit stream is shown as: Sequence Header, GOP Header, Picture Header, Coded Bits for Pict 1, Picture Header, Coded Bits for Pict 2, Picture Header, Coded Bits for Pict 3, Picture Header, Coded Bits for Pict 4.]

vbv_buffer_size is coded in units of 16*1024 bits.

vbv_delay(1), vbv_delay(2), vbv_delay(3) are coded in units of 90kHz clocks.

vbv_delay(n) tells decoder how long to wait before extracting bits for n’th picture, assuming initially empty buffer.

vbv_delay(n) = 90,000*b(n)/R, where R = bit rate in bits/sec.

Note that vbv_delay(n) is therefore proportional to fullness.

NOTE: Slopes are all equal in Constant Bit Rate operation!
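The two unit conversions on this slide can be worked through in a few lines. The function names are illustrative, and the sample values (a vbv_buffer_size of 112, a fullness of 1,000,000 bits, a bit rate of 4 Mbit/s) are assumptions chosen for the arithmetic.

```python
VBV_SIZE_UNIT_BITS = 16 * 1024  # vbv_buffer_size is coded in these units
CLOCK_HZ = 90_000               # vbv_delay is coded in 90 kHz clock ticks


def vbv_buffer_bits(vbv_buffer_size):
    """Buffer size in bits for a coded vbv_buffer_size value."""
    return vbv_buffer_size * VBV_SIZE_UNIT_BITS


def vbv_delay_ticks(fullness_bits, bit_rate):
    """vbv_delay(n) = 90,000 * b(n) / R, rounded to whole clock ticks."""
    return round(CLOCK_HZ * fullness_bits / bit_rate)


print(vbv_buffer_bits(112))                    # -> 1835008
print(vbv_delay_ticks(1_000_000, 4_000_000))   # -> 22500
```

A coded value of 112 gives 1,835,008 bits, the same figure as the Main Level maximum buffer size of 1.835008 Mbits quoted elsewhere in this deck. The 22,500 ticks in the second line correspond to 0.25 sec: at 4 Mbit/s it takes a quarter of a second to accumulate 1,000,000 bits.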

Page 118: mpeg-2VideoCompression


CBR vs. VBR: VBV Models

[Figure: two plots of VBV Fullness against Time]

CBR: VBV fills at actual bit rate (Slope = Ract)

VBR: VBV fills at max bit rate until full, then waits (Slope = Rmax)

Page 119: mpeg-2VideoCompression


Profiles and Levels

Problem: A Decoder that could decode any MPEG-2 bitstream would be prohibitive in terms of memory and performance. Decoder manufacturers might choose proprietary subsets of the syntax, preventing interoperability.

Solution: Pre-defined subsets of the syntax: Profiles & Levels create “compliance points”

Profile: A defined subset of syntax elements in MPEG-2 (e.g., 4:2:0 only, I/P frames only, field DCT, etc.)

Level: Parameter constraints on those syntax elements (e.g., max Picture Size, max Bit Rate, max Vertical Motion Vector, max Buffer Size, etc.)

Page 120: mpeg-2VideoCompression


Profiles and Levels

• Profiles: Simple, Main, SNR, Spatial, High, 4:2:2
• Levels: Low, Main, High-1440, High
• Not all Profile/Level combinations are allowed.
• Main Profile:
  - B Frames supported (not so in Simple Profile)
  - 4:2:2 and 4:4:4 not supported
  - Scalable Modes not supported
  - Restricted slice structure
• Main Level:
  - max Picture size: 720x576, 30 frames/sec
  - max Bitrate: 15 Mbps
  - max Buffer size: 1.835008 Mbits
• A Compliance Point is a Profile at a Level,
  - e.g., Main Profile at Main Level, “MP@ML”

Page 121: mpeg-2VideoCompression


Profiles and Levels

Key: each entry is Max H Size x Max V Size @ Max Frame Rate (Hz). Rows are Levels, columns are Profiles.

  Level       Simple       Main           SNR          Spatial                     High                        4:2:2
  High        -            1920x1152@60   -            -                           1920x1152@60 / 960x576@30   -
  High-1440   -            1440x1152@60   -            1440x1152@60 / 720x576@30   1440x1152@60 / 720x576@30   -
  Main        720x576@30   720x576@30     720x576@30   -                           720x576@30 / 352x288@30     720x512/608@30
  Low         -            352x288@30     352x288@30   -                           -                           -

The figure also carries the annotations “ATSC Formats” and “SMPTE 308M”.

Notes:
1) A split box (shown here as “a / b”) shows constraints on Enhancement Layer (left) and Base Layer (right)
2) In general, a compliant decoder must also handle all lower Profile and Level compliance points.

Page 122: mpeg-2VideoCompression


Statistical Multiplexing (Stat Mux)

• Stat mux exploits the fact that the coding complexities of a selection of video sources, at any given time, are usually quite different.

• For a large group of video sources, there might be only one or two “difficult” scenes at any given time.

• Stat mux uses variable bit rate (VBR) encoding to give more bits to the more difficult scenes.

Page 123: mpeg-2VideoCompression


Typical Stat Mux Encoder

• The bit rates of the individual encoders are adjusted so that the total bit rate is constant.
• Depending on the algorithm, the individual bit rates can be adjusted at, for instance, a picture or GOP level.

[Block diagram: Video 1, Video 2, Video 3, ... feed Encoder 1, Encoder 2, Encoder 3, ..., which produce VBR Bitstream 1, 2, 3, ... into a Mux, together with a Stat Mux Controller. The Mux output is a CBR Bitstream (Multi-Program Multiplex).]

Page 124: mpeg-2VideoCompression


Bit Rate and Buffer Issues

• The bit rates and buffer sizes in a stat mux system cannot be arbitrarily chosen.

• To prevent buffer underflow or overflow, it is sufficient that the following relationship hold:

$D_{size} = E_{size} \cdot \dfrac{r_{max}}{r_{min}}$

where
  $D_{size}$ = decoder buffer size
  $E_{size}$ = encoder buffer size
  $r_{max}$ = maximum instantaneous bit rate
  $r_{min}$ = minimum instantaneous bit rate
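As a worked example of the relationship on this slide, the decoder buffer size follows from the encoder buffer size and the ratio of the extreme bit rates. The function name, the input check and the sample numbers are assumptions made for illustration.

```python
def decoder_buffer_size(encoder_buffer_bits, r_max, r_min):
    """Dsize = Esize * rmax / rmin, with both rates in the same units."""
    if r_min <= 0 or r_max < r_min:
        raise ValueError("expected 0 < r_min <= r_max")
    return encoder_buffer_bits * r_max / r_min


# An encoder buffer of 500,000 bits with the rate allowed to swing
# between 2 Mbit/s and 6 Mbit/s:
print(decoder_buffer_size(500_000, 6_000_000, 2_000_000))  # -> 1500000.0
```

The wider the range of bit rates the controller may assign to one program, the larger the decoder buffer must be relative to the encoder buffer; with a fixed rate (rmax = rmin) the two sizes are equal.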

Page 125: mpeg-2VideoCompression


Why Use Stat Mux?

• Stat Mux can increase the number of coded programs in a fixed bandwidth, without decreasing the quality of any program.

• Broadcasters love this, since it means squeezing even more programs into a channel or transponder!

• Stat Mux R&D is still in its infancy, and algorithms are highly proprietary.

• Existing Stat Mux products achieve this goal with varying degrees of success.

Page 126: mpeg-2VideoCompression


Practicing the Art of MPEG

Page 127: mpeg-2VideoCompression


MPEG Artifacts: What to look for

• Blocky Artifacts
  – seen when the eye tracks a fast-moving, detailed object
  – may also be seen during dissolves and fades
  – blocky grid remains fixed while the object moves under it
  – caused by poor motion estimation and/or insufficient allocation of bits

• “Mosquito Noise”
  – may be seen at the edges of text, logos and other sharply defined objects
  – the edge causes high frequency DCT terms, which are coarsely quantized and spread spatially when transformed back into the pixel domain

Page 128: mpeg-2VideoCompression


MPEG Artifacts: What to look for (cont’d)

• Dirty Window
  – streaks or noise appear to remain stationary while objects move beneath them (like looking through a dirty window)
  – the encoder may not be sending enough bits to code the residual (prediction) error in P and B frames

• “Wavy Noise”
  – often seen during slow pans across highly detailed objects, such as crowds in a stadium
  – the coarsely quantized high frequency terms cause reconstruction errors to modulate spatially as details shift within the DCT blocks.

Page 129: mpeg-2VideoCompression


Where MPEG Compression Can Perform Poorly

• For types of motion that don’t fit the linear translation model
  – zooms
  – rotations
  – transparent/translucent moving objects
  – dissolves containing moving objects

• For other things that can’t be predicted well
  – shadows
  – changes in brightness (fade-ins, fade-outs)
  – scene cuts
  – highly detailed, uncovered regions
  – noise effects
  – additive noise

Page 130: mpeg-2VideoCompression


Tips for Higher Quality Coding

• Remove Noise
  – coding noise wastes valuable bits!
  – consider using preprocessing technology that can remove Gaussian noise, impulse noise, NTSC/PAL decoding artifacts, film grain, film streaks, etc.

• Code film material at its original frame rate.
  – Use high-quality inverse telecine algorithms

• Code material at proper image size
  – for the same bit rate, a reduction in coding noise can be achieved by simply reducing the horizontal image size
  – because of interlace, use care when reducing vertical image size

• Use high-quality Stat Mux algorithms

Page 131: mpeg-2VideoCompression


Tips for Higher Quality Encoding (cont’d)

• Rate Control
  – over time, improved rate control techniques will become available
  – ultimately, we would like to evaluate the perceptual impact of each mode decision, and choose the modes that result in the fewest bits with the lowest perceptual degradation

• Motion Estimation
  – the larger the search area, the faster the motion that can be well predicted
  – this comes at a price: full search is good, but is usually too expensive
  – new, hierarchical techniques are being developed that can approach full search in terms of quality, are closer to “true motion”, and are not fooled by brightness changes

Page 132: mpeg-2VideoCompression


Major Application Areas

MPEG-1 Video:
  1 - 3 Mbps:       CD-ROM Multimedia; Telecommunications and Near Video on Demand

MPEG-2 Video:
  3 - 15 Mbps:      SDTV Broadcast (e.g., ATSC and DVB); Digital Video Disk (DVD)
  15 - 20 Mbps:     HDTV Broadcast (e.g., ATSC)
  25 - 50 Mbps:     SDTV Production
  100 - 300 Mbps:   HDTV Production

Page 133: mpeg-2VideoCompression


What is ATSC Video?

ATSC Video = MPEG-2 Video + ATSC Constraints + ATSC Extensions

Page 134: mpeg-2VideoCompression


ATSC Video Constraints

• Sequence Layer
  – Video Formats as per Table 3 in ATSC Doc. A/53, Annex A
  – Bit Rate ≤ 19.4 Mb/s
  – VBV Buffer Size ≤ 7.99 Mbit
  – Chroma Format 4:2:0
  – Component Video Format

• Picture Layer
  – VBV Delay ≤ 0.5 sec (limits channel change delay)

Page 135: mpeg-2VideoCompression


ATSC Video Formats

HDTV
  1920 x 1080, 16:9
  1280 x 720, 16:9

SDTV
  704 x 480, 16:9
  704 x 480, 4:3
  640 x 480, 4:3 (VGA)

24, 30 and 60 frames/sec allowed
18 formats all together

Page 136: mpeg-2VideoCompression


ATSC Video Extension: DTV CC

• DTV Closed Captioning Sent in Video User Data

• 9600 bps: 10x more capacity than analog VBI method (EIA-608)

• DTV CC Descriptor indicates
  – number of services
  – language of each service
  – whether CC is limited to Line 21 data
  – whether text is tailored to beginning readers
  – whether text is formatted to widescreen displays

• See EIA-708B for more details

Page 137: mpeg-2VideoCompression


Concluding Remarks

• The MPEG video compression standard is the result of many years of competitive and, ultimately, collaborative effort among many commercial and academic laboratories

• MPEG video compression can increase a broadcaster’s channel capacity by 8x or more

• MPEG video compression is being used successfully in many application areas, such as:
  – CD-ROM and DVD multimedia
  – Satellite Broadcast
  – Terrestrial Broadcast
  – Cable Broadcast
  – Telco Video-on-Demand Systems

Page 138: mpeg-2VideoCompression


MPEG-2 Video References

• MPEG-2 Books
  – Mitchell, J.L., Pennebaker, W.B., Fogg, C.E., and LeGall, D.J., MPEG Video Compression Standard, Chapman & Hall, 1997.
  – Haskell, B.G., Puri, A. and Netravali, A.N., Digital Video: An Introduction to MPEG-2, Chapman & Hall, 1997.
  – Rao, K.R. and Hwang, J.J., Techniques & Standards for Image, Video and Audio Coding, Prentice Hall, 1996.
  – Weiss, S. Merrill, Issues in Advanced Television Technology, Focal Press, 1996.

• MPEG-2 Video Specification
  – ISO/IEC IS 13818-2, “Generic Coding of Moving Pictures and Associated Audio Information: Video”, January 20, 1995.

• MPEG Web sites
  – MPEG Pointers and Resources: http://www.mpeg.org
  – ATSC: http://www.atsc.org
  – DVB: http://www.dvb.org