Page 1: VIDEO COMPRESSION  FUNDAMENTALS

1

VIDEO COMPRESSION FUNDAMENTALS

Pamela C. Cosman

Page 2: VIDEO COMPRESSION  FUNDAMENTALS

2

Compressing Digital Video

Exploit spatial redundancy within frames (like JPEG: transforming, quantizing, variable length coding)

Exploit temporal redundancy between frames

Only the sun has changed position between these 2 frames

[Figure: Previous Frame, Current Frame]

Page 3: VIDEO COMPRESSION  FUNDAMENTALS

3

Simplest Temporal Coding - DPCM

Frame 0 (still image)

Difference frame 1 = Frame 1 - Frame 0

Difference frame 2 = Frame 2 - Frame 1

If no movement in the scene, all difference frames are 0. Can be greatly compressed!

If movement, can see it in the difference images

[Figure: frames 0, 1, 2, 3]
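The frame-differencing scheme on this slide can be sketched in a few lines. This is a minimal illustration, assuming grayscale frames stored as lists of rows; the function names and the tiny sample frames are made up for the example.

```python
def frame_difference(current, previous):
    """Pixel-wise difference between two equally sized grayscale frames."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def dpcm_encode(frames):
    """Keep frame 0 as a still image; replace each later frame by its
    difference from the frame before it."""
    diffs = [frame_difference(frames[i], frames[i - 1])
             for i in range(1, len(frames))]
    return frames[0], diffs


def dpcm_decode(first, diffs):
    """Rebuild the sequence by adding each difference to the previous frame."""
    frames = [first]
    for diff in diffs:
        prev = frames[-1]
        frames.append([[p + d for p, d in zip(prev_row, diff_row)]
                       for prev_row, diff_row in zip(prev, diff)])
    return frames


frame0 = [[10, 10, 10], [10, 50, 10]]
frame1 = [[10, 10, 10], [10, 50, 10]]   # nothing moved
frame2 = [[10, 10, 10], [10, 10, 50]]   # the bright pixel moved one step right
first, diffs = dpcm_encode([frame0, frame1, frame2])
```

The first difference frame is all zeros, while the second is nonzero only where the bright pixel left and where it arrived, which is the "can see it in the difference images" effect.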

Page 4: VIDEO COMPRESSION  FUNDAMENTALS

4

Difference Frames

Differences between two frames can be caused by:

Camera motion: the outlines of background or stationary objects can be seen in the Diff Image

Object motion: the outlines of moving objects can be seen in the Diff Image

Illumination changes (sun rising, headlights, etc.)

Scene cuts: lots of stuff in the Diff Image

Noise

Page 5: VIDEO COMPRESSION  FUNDAMENTALS

5

Difference Frames

If the only difference between two frames is noise (nothing moved), then you won’t recognize anything in the Difference Image

But, if you can see something in the Diff Image and recognize it, there’s still correlation in the difference image

Goal: remove the correlation by compensating for the motion

Page 9: VIDEO COMPRESSION  FUNDAMENTALS

9

Types of Motion

Translation: simple movement of typically rigid objects

Camera pans vs. movement of objects

[Figure: Frame n, Frame n+1]

Rotation: spinning about an axis

Camera versus object rotation

Zooms in/out

Camera zoom vs. object zoom (movement in/out)

[Figure: Frame n, Frame n+1 (Rotation), Frame n+2 (Zoom)]

Page 10: VIDEO COMPRESSION  FUNDAMENTALS

10

Describing Motion

Translational: move (object) from (x,y) to (x+dx,y+dy)

Rotational: rotate (object) by (r rads) (counter/clockwise)

Zoom: move (in/out) from (object) to increase its size by (t times)

Which is easiest? Which are we most likely to encounter?

Page 11: VIDEO COMPRESSION  FUNDAMENTALS

11

Motion Estimation

Determining parameters for the motion descriptions

For some portion of the frame, estimate its movement between 2 frames: the current frame and the reference frame

What is "some portion"?

Individual pixels (all of them)?

Lines/edges (have to find them first)

Objects (must define them)

Uniform regions (just chop up the frame)

Page 12: VIDEO COMPRESSION  FUNDAMENTALS

12

General Idea

For a region PC in the current frame, find a region PR in the search window in reference frame so that Error(PR,PC) is minimized

Issues: Error measures, search techniques, choice of search window, choice of reference frame, choice of region PC

[Figure: the portion of interest PC in the current frame, and the search window in the reference frame]

Page 13: VIDEO COMPRESSION  FUNDAMENTALS

13

Block-based Motion Estimation

PC is a block of pixels (in the current frame)

The search window is a rectangular segment (in the reference frame)

[Figure: T=1 (reference), T=2 (current)]

Page 14: VIDEO COMPRESSION  FUNDAMENTALS

14

Motion Vectors

A motion vector (MV) describes the offset between the location of the block being coded (in the current frame) and the location of the best-match block in the reference frame

[Figure: T=1 (reference), T=2 (current)]

Page 15: VIDEO COMPRESSION  FUNDAMENTALS

15

Motion Compensation

The blocks being predicted are on a grid

The blocks used for prediction are NOT

[Figure: blocks 1 to 16 of the frame being predicted lie on a regular grid; the reference-frame blocks 1 to 16 used to predict them do not]

Page 16: VIDEO COMPRESSION  FUNDAMENTALS

Motion Vector Search

1. Mean squared error: select a block in the reference frame to minimize Σ (b(Bref) - b(Bcurr))^2

2. Mean abs. error: select block to minimize Σ |b(Bref) - b(Bcurr)|

Given error measure, how to efficiently determine best-match block in search window?

Full search: best results, most computation

Logarithmic search: heuristic, faster

Hierarchical motion estimation
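The two error measures can be written out directly. This is a small sketch, assuming blocks are lists of rows of pixel values; the function names are illustrative. The slide's sums are divided here by the pixel count to give means, which does not change which candidate block wins.

```python
def mean_squared_error(block_ref, block_cur):
    """Mean of squared pixel differences between two equally sized blocks."""
    diffs = [r - c for ref_row, cur_row in zip(block_ref, block_cur)
             for r, c in zip(ref_row, cur_row)]
    return sum(d * d for d in diffs) / len(diffs)


def mean_absolute_error(block_ref, block_cur):
    """Mean of absolute pixel differences between two equally sized blocks."""
    diffs = [r - c for ref_row, cur_row in zip(block_ref, block_cur)
             for r, c in zip(ref_row, cur_row)]
    return sum(abs(d) for d in diffs) / len(diffs)


block_ref = [[1, 2], [3, 4]]
block_cur = [[1, 4], [3, 0]]
mse = mean_squared_error(block_ref, block_cur)   # differences are 0, -2, 0, 4
mae = mean_absolute_error(block_ref, block_cur)
```

The squared measure penalizes the single large difference more heavily, while the absolute measure avoids multiplications, which is one reason it is a common choice in fast encoders.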

Page 17: VIDEO COMPRESSION  FUNDAMENTALS

17

Motion Vector Search

Full search: evaluate every position in the search window

Logarithmic search:

First examine positions marked 1.

Choose best of these (lowest error measure) and examine positions marked 2 surrounding it

Choose the best of these, and examine the positions marked 3

Final result = best of these
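Both searches can be sketched as follows. The slide's figure with the positions marked 1, 2, 3 is not available, so the logarithmic search below assumes a common three-stage pattern: a 3 x 3 set of candidates spaced 4, then 2, then 1 pixels apart, each stage centred on the best position so far. The sum of absolute differences, the function names, and the synthetic gradient frames are all assumptions made for illustration.

```python
def sad(ref, cur, ry, rx, cy, cx, n):
    """Sum of absolute differences between the n x n block of ref at
    (row ry, col rx) and the n x n block of cur at (row cy, col cx)."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(n) for j in range(n))


def in_bounds(frame, y, x, n):
    """True if an n x n block at (row y, col x) lies inside the frame."""
    return 0 <= y <= len(frame) - n and 0 <= x <= len(frame[0]) - n


def full_search(ref, cur, cy, cx, n, radius):
    """Evaluate every offset within +/- radius; return the best (dx, dy)."""
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            if in_bounds(ref, y, x, n):
                cost = sad(ref, cur, y, x, cy, cx, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]


def log_search(ref, cur, cy, cx, n, step=4):
    """Coarse-to-fine search: 3 x 3 candidates at spacing step, step/2, ..., 1."""
    best_dx, best_dy = 0, 0
    best_cost = sad(ref, cur, cy, cx, cy, cx, n)
    while step >= 1:
        centre_dx, centre_dy = best_dx, best_dy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = cy + centre_dy + dy, cx + centre_dx + dx
                if in_bounds(ref, y, x, n):
                    cost = sad(ref, cur, y, x, cy, cx, n)
                    if cost < best_cost:
                        best_cost = cost
                        best_dx, best_dy = centre_dx + dx, centre_dy + dy
        step //= 2
    return best_dx, best_dy


SIZE = 16
# Synthetic gradient frame; cur is the same gradient displaced so that the
# block at row 6, col 6 of cur matches the block at row 2, col 12 of ref.
ref = [[x + 16 * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[(x + 6) + 16 * (y - 4) for x in range(SIZE)] for y in range(SIZE)]
mv_full = full_search(ref, cur, cy=6, cx=6, n=4, radius=7)
mv_log = log_search(ref, cur, cy=6, cx=6, n=4)
```

Here the full search visits up to 225 offsets and the staged search at most 28. Because the staged search only follows the locally best candidate, it can settle on a worse match than the full search on other inputs, which is why the slide calls it a heuristic.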

Page 18: VIDEO COMPRESSION  FUNDAMENTALS

18

Hierarchical Motion Estimation

Use an averaging filter on the image, then downsample by a factor of 2

Conduct a search on the downsampled image (only ¼ of the size)

Given the results of the search on the downsampled image, return to the full resolution image and refine the search there
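A two-level version of this idea might look like the sketch below. It assumes a 2 x 2 box average as the averaging filter, an exhaustive search at the coarse level, and a +/- 1 pixel refinement at full resolution around twice the coarse vector; these choices, the function names, and the synthetic frames are illustrative assumptions, not something the slides prescribe.

```python
def downsample(frame):
    """Average each 2 x 2 neighbourhood, halving both dimensions."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def sad(ref, cur, ry, rx, cy, cx, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(n) for j in range(n))


def search(ref, cur, cy, cx, n, radius, centre=(0, 0)):
    """Exhaustive search of offsets within radius of centre; returns (dx, dy)."""
    best = None
    for dy in range(centre[1] - radius, centre[1] + radius + 1):
        for dx in range(centre[0] - radius, centre[0] + radius + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y <= len(ref) - n and 0 <= x <= len(ref[0]) - n:
                cost = sad(ref, cur, y, x, cy, cx, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]


def hierarchical_search(ref, cur, cy, cx, n, radius):
    """Search at half resolution, then refine by +/- 1 pixel at full resolution."""
    coarse = search(downsample(ref), downsample(cur),
                    cy // 2, cx // 2, n // 2, radius // 2)
    return search(ref, cur, cy, cx, n, 1,
                  centre=(2 * coarse[0], 2 * coarse[1]))


SIZE = 16
# Synthetic gradient frame; cur is displaced so that the block at row 6,
# col 6 of cur matches the block at row 2, col 12 of ref.
ref = [[x + 16 * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[(x + 6) + 16 * (y - 4) for x in range(SIZE)] for y in range(SIZE)]
mv = hierarchical_search(ref, cur, cy=6, cx=6, n=4, radius=7)
```

The coarse search covers the same area with a quarter of the candidate positions and a quarter of the pixels per block, at the risk of locking onto a wrong coarse vector that the small refinement step cannot undo.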

Page 19: VIDEO COMPRESSION  FUNDAMENTALS

19

Motion Compensation

The standards do not specify HOW the encoder will find the motion vectors (MVs)

The encoder can use exhaustive/fast search, MSE /MAE/other error metric, etc.

The standard DOES specify:

The allowable syntax for specifying the MVs

What the decoder will do with them

What the decoder does is to grab the indicated block from reference frame, and glue it in place
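The "grab and glue" step can be sketched as below. The frame layout (lists of rows), the per-block grid of (dx, dy) vectors, and the function name are assumptions for illustration; the sketch also assumes every vector points fully inside the reference frame, which a real decoder would have to check or handle.

```python
def motion_compensate(reference, motion_vectors, block_size):
    """Build the predicted frame: for each block on the grid, copy the
    reference block that its motion vector points at.

    motion_vectors[by][bx] is the (dx, dy) offset for the block in grid
    row by, grid column bx. Assumes all vectors stay inside the frame.
    """
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for by, vector_row in enumerate(motion_vectors):
        for bx, (dx, dy) in enumerate(vector_row):
            top, left = by * block_size, bx * block_size
            for i in range(block_size):
                for j in range(block_size):
                    predicted[top + i][left + j] = \
                        reference[top + dy + i][left + dx + j]
    return predicted


reference = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
# Two 2 x 2 blocks: the left one is copied in place, the right one is
# fetched from two pixels to its left in the reference frame.
vectors = [[(0, 0), (-2, 0)]]
predicted = motion_compensate(reference, vectors, block_size=2)
```

Note that the destination blocks sit on the grid while the source blocks can start at any pixel position, matching the earlier slide on motion compensation.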

Page 20: VIDEO COMPRESSION  FUNDAMENTALS

20

Standard specifies bit stream

[Figure: ENCODER, bit stream, DECODER. "Standard defines this" labels the bit stream; "not this" labels the encoder and the decoder]

The video compression standards define syntax and semantics for the bit stream between encoder and decoder

Encoder is not specified by MPEG except that it produces a compliant bit stream

Compliant decoder must interpret all legal MPEG bit streams

This allows future encoders of better performance to remain compatible with existing decoders.

Also allows for commercially secret encoders to be compatible with standard decoders

[Figure: Today's Ho-Hum Encoder, Tomorrow's Nifty Encoder, and a Very Secret Encoder all feed Today's Decoder. Today's decoder still works!]

Page 21: VIDEO COMPRESSION  FUNDAMENTALS

21

Motion Compensation Example

[Figure: Frame n-1, Frame n, and the motion compensated Frame n]

Motion vectors for the grid of blocks:

(0,0)     (-16,0)   (5,0)       (0,0)
(0,0)     (16,7)    (5,2)       (0,0)
(20,-24)  (0,0)     (-20,-18)   (0,0)

Page 22: VIDEO COMPRESSION  FUNDAMENTALS

22

Objects versus Macroblocks

Real moving objects will not coincide with boundaries of macroblocks

If encoder sends MV=(MotX,MotY), object well coded, but background poorly coded

If encoder sends MV=(0,0), background well coded, but moving object poorly coded

Either approach is valid

[Figure: a macroblock containing background and part of a moving object. Case 1: moving object well encoded with motion vector, prediction error remains in the background. Case 2: background well encoded (no motion vector), prediction error remains in the moving object]

Page 23: VIDEO COMPRESSION  FUNDAMENTALS

23

Motion Compensation

This glued together frame is called the motion compensated frame

The encoder can also form the difference between the motion compensated frame and the actual frame.

This is called the motion compensated difference frame

This difference frame formed using MC should have less correlation between pixels than the difference frame formed without using MC

Page 24: VIDEO COMPRESSION  FUNDAMENTALS

Motion Compensated Difference Frames

24

Suppose we are doing lossless coding

Encoder has sequence of frames: …, F(n-2), F(n-1)

Next: encode F(n)

Past frames have been losslessly encoded, so the decoder knows F(n-1) perfectly already

Encoder sends the motion vectors for frame F(n) relative to frame F(n-1), to form motion compensated frame M(n)

Encoder knows M(n), Decoder knows M(n)

Page 25: VIDEO COMPRESSION  FUNDAMENTALS

25

Motion Compensation Example

[Figure: F(n-1), F(n), and the motion compensated frame M(n)]

Motion vectors for the grid of blocks:

(0,0)     (-16,0)   (5,0)       (0,0)
(0,0)     (16,7)    (5,2)       (0,0)
(20,-24)  (0,0)     (-20,-18)   (0,0)

Page 26: VIDEO COMPRESSION  FUNDAMENTALS

Encoding Difference Frames

With motion compensation:

Encoder forms motion compensated diff frame: MCD(n) = F(n) - M(n)

Encoder losslessly encodes MCD(n)

Decoder can then do F(n) = MCD(n) + M(n) → knows F(n) exactly

With no motion compensation:

Encoder could do frame diff: FD(n) = F(n) - F(n-1)

Encoder losslessly encodes FD(n)

Decoder can then do F(n) = FD(n) + F(n-1) → knows F(n) exactly

If successive frames are very similar:

fewer bits to send Motion Vectors + MCD(n) instead of FD(n)

fewer bits to send FD(n) instead of F(n)

Page 27: VIDEO COMPRESSION  FUNDAMENTALS

27

Motion compensated difference frames

Decoder knows F(n-1) and, once you send the motion vectors, it knows M(n)

[Figure: Reference Frame F(n-1) and Original Frame F(n). Without motion compensation: Difference Image FD(n) = F(n) - F(n-1), send FD(n). With motion compensation: send Motion Vectors, giving Motion compensated frame M(n) and Motion compensated difference image MCD(n) = F(n) - M(n), send MCD(n)]

Page 28: VIDEO COMPRESSION  FUNDAMENTALS

Motion Compensated Difference Frames

28

But we are NOT doing lossless coding

Encoder has sequence of frames: …, F(n-2), F(n-1)

Next: encode F(n)

Past frames have been lossy encoded, so the decoder has versions …, G(n-2), G(n-1)

Encoder knows …, G(n-2), G(n-1) also

Encoder sends the motion vectors for frame F(n) relative to frame G(n-1), to form motion compensated frame M(n)

Page 29: VIDEO COMPRESSION  FUNDAMENTALS

Encoding Difference Frames

Encoder forms motion compensated difference frame: MCD(n) = F(n) - M(n)

Encoder lossy encodes MCD(n)

Call the decoder version MCD*(n)

If the decoder received MCD(n) exactly, could do: F(n) = MCD(n) + M(n)

But with MCD*(n), decoder can do G(n) = MCD*(n) + M(n) → knows F(n) approximately
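The relation G(n) = MCD*(n) + M(n) can be tried on a toy example. In this sketch the lossy stage is a plain uniform quantizer applied to the residual, used only as a stand-in for the transform-and-quantize stage of a real codec; the step size, frame values, and function names are invented for illustration.

```python
def subtract(a, b):
    """Pixel-wise a - b for two equally sized frames."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Pixel-wise a + b for two equally sized frames."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def quantize(residual, step):
    """Round every residual value to the nearest multiple of step (lossy)."""
    return [[step * round(v / step) for v in row] for row in residual]


F_n = [[100, 104], [96, 119]]     # frame being coded
M_n = [[101, 100], [99, 110]]     # motion compensated prediction, known to both sides

MCD = subtract(F_n, M_n)          # exact residual, encoder side only
MCD_star = quantize(MCD, step=4)  # what the decoder actually receives
G_n = add(MCD_star, M_n)          # decoder's reconstruction of F(n)
error = subtract(F_n, G_n)
```

The reconstruction error equals the quantization error of the residual, so with this quantizer it stays within half a step. Since the decoder only has G(n), the encoder predicts the next frame from G(n) as well, which keeps the two sides in step.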

29

Page 30: VIDEO COMPRESSION  FUNDAMENTALS

30

Motion estimation philosophy

Goal of motion estimation is NOT to provide a careful analysis of the actual motion

Goal is to achieve a given quality of representation of the video while globally minimizing the bit rate required to send:

The motion information

The prediction error information

Most of the time, for a given representation quality:

fewer bits to send MV+MCD(n) instead of sending FD(n)

fewer bits to send FD(n) instead of sending F(n) itself.

Page 31: VIDEO COMPRESSION  FUNDAMENTALS

31

Motion Compensation for Chrominance

Luminance is highly correlated, more so than chrominance

The "best" motion vectors are available by searching in the luminance plane

Motion vectors for chrominance are not computed separately, simply scaled as needed
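One way to read "scaled as needed" is that the luminance vector is divided by the chrominance subsampling factor in each direction. The sketch below assumes exactly that and returns fractional values; how a particular standard rounds or interpolates fractional chrominance vectors is not covered here, and the table and function name are illustrative.

```python
# (horizontal, vertical) subsampling factor of the chrominance planes
# relative to luminance, for three common formats.
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),   # full resolution chrominance
    "4:2:2": (2, 1),   # half horizontal resolution
    "4:2:0": (2, 2),   # half horizontal and half vertical resolution
}


def scale_mv_for_chroma(mv, chroma_format):
    """Scale a luminance motion vector (dx, dy) to chrominance-plane units."""
    dx, dy = mv
    sx, sy = CHROMA_SUBSAMPLING[chroma_format]
    return dx / sx, dy / sy


luma_mv = (16, 7)
chroma_mv_420 = scale_mv_for_chroma(luma_mv, "4:2:0")
chroma_mv_422 = scale_mv_for_chroma(luma_mv, "4:2:2")
```

A luminance displacement of 16 pixels corresponds to 8 chrominance samples when the chrominance plane has half the horizontal resolution, so no separate chrominance search is needed.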

Page 32: VIDEO COMPRESSION  FUNDAMENTALS

32

Motion Estimation/Compensation Summary

At the encoder:

For each block in the frame being coded, examine the search window(s) in the reference frame to find the best match block (do this for luminance only)

Form the MC difference image = original image minus motion compensated image

Scale the motion vectors for the chrominance, form the motion compensated chrominance frames, and form chrominance difference image

Page 33: VIDEO COMPRESSION  FUNDAMENTALS

33

Motion Estimation/Compensation Summary

At the decoder:

Decode the reference frames (Y,Cr,Cb)

For each block in a temporally coded Y frame, use the motion vector to select a block from the reference frame and glue it in place

Add the Y difference image

For each block in temporally coded Cr,Cb frames, first scale the motion vector, then do the previous 2 steps with Cr and Cb data

Page 34: VIDEO COMPRESSION  FUNDAMENTALS

34

Progress of Video Compression

Page 35: VIDEO COMPRESSION  FUNDAMENTALS

35

Progress of Video Compression

Page 36: VIDEO COMPRESSION  FUNDAMENTALS

36

Progress of Video Compression

Page 37: VIDEO COMPRESSION  FUNDAMENTALS

37

Temporal Location of Reference

The reference frame need not occur before the temporally coded frames which use it

Why? Scene changes, allow better matches

Page 38: VIDEO COMPRESSION  FUNDAMENTALS

38

Flavors of Motion Estimation

1. Forward predicted blocks: the best-match block occurs in the reference frame before the block’s frame

2. Backward predicted blocks: the best-match block occurs in the reference frame after the block’s frame

3. Interpolatively predicted blocks: the best-match block is the average of the best-match blocks from reference frames before & after

The motion compensation direction can be selected independently for each block in a frame.
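The per-block choice among the three flavors can be sketched as picking whichever prediction has the lowest error. The sum of absolute differences as the criterion, the plain (unrounded) average, and all names below are assumptions for illustration; real codecs define their own rounding and may weigh bit cost as well.

```python
def average_blocks(forward_block, backward_block):
    """Interpolative prediction: pixel-wise average of the two best matches."""
    return [[(f + b) / 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward_block, backward_block)]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a_row, b_row in zip(block_a, block_b)
               for a, b in zip(a_row, b_row))


def choose_direction(current, forward_block, backward_block):
    """Return the name of the prediction closest to the current block."""
    candidates = {
        "forward": forward_block,
        "backward": backward_block,
        "interpolated": average_blocks(forward_block, backward_block),
    }
    return min(candidates, key=lambda name: sad(candidates[name], current))


current = [[10, 20], [30, 40]]
# Gradual brightening: the earlier match is darker, the later one brighter.
fading = choose_direction(current, [[8, 18], [28, 38]], [[12, 22], [32, 42]])
# Scene cut just before this frame: only the later reference resembles it.
after_cut = choose_direction(current, [[200, 200], [200, 200]],
                             [[10, 20], [30, 41]])
```

Because the function is called once per block, different blocks of the same frame can end up with different directions.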

Page 39: VIDEO COMPRESSION  FUNDAMENTALS

39

MPEG Frame Types

Intra (I) pictures: coded by themselves, as still images. No temporal coding. No motion vectors.

Page 40: VIDEO COMPRESSION  FUNDAMENTALS

40

MPEG Frame Types

Forward Motion Compensated predicted (P) pictures – forward motion compensated from the previous I or P frame

Page 41: VIDEO COMPRESSION  FUNDAMENTALS

41

MPEG Frame Types

Motion Compensated interpolated (B) pictures: forward, backward, and interpolatively motion compensated from previous/next I/P frames

Page 42: VIDEO COMPRESSION  FUNDAMENTALS

42

Motion Vector Coding

How are the motion vectors actually encoded for transmission to the decoder?

Start by taking the difference between the current motion vector and the most recent previous one of the same type (forward/backward/interpolative)

Encode the difference using variable length coding

Horizontal and vertical components coded separately
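The differencing step can be sketched as below; the variable length coding of the differences is left out. The sketch assumes the predictor starts at (0, 0) for the first vector, and the function names are illustrative; where a real bit stream resets the predictor is defined by the standard, not by this example.

```python
def mv_differences(motion_vectors):
    """Replace each (dx, dy) by its difference from the previous vector.

    The two components are differenced independently, and the predictor
    is assumed to start at (0, 0).
    """
    prev_dx, prev_dy = 0, 0
    diffs = []
    for dx, dy in motion_vectors:
        diffs.append((dx - prev_dx, dy - prev_dy))
        prev_dx, prev_dy = dx, dy
    return diffs


def mv_reconstruct(diffs):
    """Decoder side: accumulate the differences back into motion vectors."""
    dx, dy = 0, 0
    vectors = []
    for ddx, ddy in diffs:
        dx, dy = dx + ddx, dy + ddy
        vectors.append((dx, dy))
    return vectors


# Neighbouring blocks of one moving object tend to have similar vectors.
vectors = [(5, 0), (5, 2), (6, 2), (6, 2)]
diffs = mv_differences(vectors)
```

Most differences are zero or close to it, so a variable length code that gives short codewords to small values spends few bits on them.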

Page 43: VIDEO COMPRESSION  FUNDAMENTALS

43

MPEG Frame Structure Terminology

A block contains 8x8 pixels (the DCT unit)

A macroblock (MB) contains 4 blocks from the luminance, plus the corresponding chrominance blocks (the motion compensation unit):

4 blocks from each of Cr/Cb if 4:4:4 format

2 blocks from each of Cr/Cb if 4:2:2 format

1 block from each of Cr/Cb if 4:1:1 or 4:2:0 format
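The block counts above translate into simple arithmetic. The sketch below takes the per-format counts from the slide and assumes the 4 luminance blocks form a 16 x 16 pixel macroblock; rounding partial macroblocks up at the frame edges, the names, and the 352 x 288 frame size are illustrative choices.

```python
import math

LUMA_BLOCKS = 4  # four 8 x 8 luminance blocks per macroblock
CHROMA_BLOCKS_PER_COMPONENT = {"4:4:4": 4, "4:2:2": 2, "4:2:0": 1, "4:1:1": 1}


def blocks_per_macroblock(chroma_format):
    """Total 8 x 8 blocks (Y + Cr + Cb) in one macroblock."""
    return LUMA_BLOCKS + 2 * CHROMA_BLOCKS_PER_COMPONENT[chroma_format]


def macroblocks_per_frame(width, height, mb_size=16):
    """Number of macroblocks needed to cover a width x height luminance frame."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


blocks_420 = blocks_per_macroblock("4:2:0")
mbs_cif = macroblocks_per_frame(352, 288)
```

For a 352 x 288 frame in 4:2:0 format this gives 396 macroblocks of 6 blocks each, so 2376 DCT blocks per frame.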

Page 44: VIDEO COMPRESSION  FUNDAMENTALS

44

MPEG Frame Structure Terminology

A slice is a collection of macroblocks, tracing in a raster scan from upper left to lower right (the resynchronization unit)

A picture is a frame, either progressive (non-interlaced) or interlaced (the primary coding unit)

A Group of Pictures (GOP) contains ≥ 1 frame (the unit for random access into the sequence)

Page 45: VIDEO COMPRESSION  FUNDAMENTALS

45

MPEG GOP Structure

A Group of Pictures (GOP) may contain:

All I pictures

I & P pictures only

I, P, & B pictures

A common GOP format for 30 frames/sec:

I-picture spacing 15 frames (1/2 second)

P-picture spacing 3 frames (1/10 second)

Type:   I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I
Frame:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

Page 46: VIDEO COMPRESSION  FUNDAMENTALS

46

Frame Ordering

Display order (encoder input order):

Type:   B  B  I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I
Frame: -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

But consider coding dependencies: Frame 2 (B) needs frame 4 (P) to be decoded first, etc. So better transmit frame 4 before frame 2

Transmission order:

Type:   I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I  B  B
Frame:  1 -1  0  4  2  3  7  5  6 10  8  9 13 11 12 16 14 15
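The reordering shown on this slide follows one rule: hold each B picture until the next I or P picture has been sent. A small sketch of that rule, with frames represented as (index, type) pairs and a function name chosen for the example:

```python
def transmission_order(display_frames):
    """Reorder (index, type) pairs from display order to transmission order.

    B pictures are held back until the next I or P picture (the reference
    they depend on) has been emitted.
    """
    out, held_b = [], []
    for frame in display_frames:
        if frame[1] == "B":
            held_b.append(frame)
        else:
            out.append(frame)       # send the I/P reference first
            out.extend(held_b)      # then the B pictures that waited for it
            held_b = []
    out.extend(held_b)              # trailing B pictures with no later reference
    return out


types = "BBIBBPBBPBBPBBPBBI"
display = list(zip(range(-1, 17), types))
sent = transmission_order(display)
sent_indices = [index for index, _ in sent]
sent_types = "".join(kind for _, kind in sent)
```

Running this on the display order from the slide reproduces the second sequence shown there. The held-back pictures are also why B pictures add latency and decoder memory.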

Page 47: VIDEO COMPRESSION  FUNDAMENTALS

47

Types of Coding Modes

What if the best-match block in the reference frame is a great match?

Then the motion vector is all you need to send

What if it is a terrible match?

Then don't use the motion vector at all, just code the block by itself, with something like JPEG (called intra mode coding)

What if it is a so-so match?

Then you can send the MV, and also send the frame difference information for that macroblock

Page 48: VIDEO COMPRESSION  FUNDAMENTALS

48

Coding Mode I (Inter-Coding)

[Figure: a macroblock in the current frame, with a motion vector pointing to its match in the previous frame]

Inter coding refers to coding with motion vectors

Page 49: VIDEO COMPRESSION  FUNDAMENTALS

49

Coding Mode II (Intra-Coding)

[Figure: a macroblock in the current frame, with no link to the previous frame]

INTRA coding refers to coding without motion vectors

The MB is coded all by itself, in a manner similar to JPEG

Page 50: VIDEO COMPRESSION  FUNDAMENTALS

50

I-Picture Coding

Two possible coding modes for macroblocks in I-frames:

Intra: code the 4 blocks with the current quantization parameters

Intra with modified quantization: scale the quantization matrix before coding this MB

All macroblocks in intra pictures are coded

Quantized DC coefficients are losslessly DPCM coded, then Huffman as in JPEG

Page 51: VIDEO COMPRESSION  FUNDAMENTALS

51

I-Picture Coding

Use of macroblocks modifies block-scan order:

[Figure: a macroblock made of 8 x 8 blocks, showing the block-scan order]

Quantized coefficients are zig-zag scanned and run-length/Huffman coded as in JPEG

Very similar to JPEG except (1) scaling Q matrix separately for each MB, and (2) order of blocks

Page 52: VIDEO COMPRESSION  FUNDAMENTALS

P-Picture Coding: many coding modes

Motion compensated coding: Motion Vector only

Motion compensated coding: MV plus difference macroblock

Motion compensation: MV & difference MB with modified quant. scaling

MV = (0,0), just send difference block

MV = (0,0), just send diff block, with modified quantization scaling

Intra: the MB is coded with DCTs (no difference is computed)

Intra with modified quantization scaling

Page 53: VIDEO COMPRESSION  FUNDAMENTALS

53

How to choose a coding mode?

MPEG does not specify how to choose mode

Full search = try everything…the different possibilities will lead to different rate/distortion outcomes for that macroblock

[Figure: rate versus distortion plot with one point for each of: Intra; MV, no difference; MV, plus difference]

Page 54: VIDEO COMPRESSION  FUNDAMENTALS

54

How to choose a coding mode?

Tree search: use a decision tree For example:

First find the best-match block in the search window. If it’s a very good match, then use motion compensation. Otherwise, don’t.

If you decided to use motion compensation, then need to decide whether or not to send the difference block as well. Make decision based on how good a match it is.

If you decided to send the difference block, then have to decide whether or not to scale the quantization parameter… check the current rate usage…
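The decision tree described above might be sketched like this. The thresholds, the bit budget, and the mode names are hypothetical placeholders; MPEG does not define them, and a real encoder would tune such values or use a different rule altogether.

```python
def choose_p_mode(match_error, use_mc_threshold=20.0, skip_diff_threshold=2.0,
                  bits_used=0, bit_budget=1000):
    """Pick a P-picture macroblock mode with a simple decision tree.

    match_error is the error of the best-match block found by the motion
    search. All thresholds are illustrative, not values from any standard.
    """
    # Step 1: is the best match good enough to use motion compensation?
    if match_error > use_mc_threshold:
        return "intra"
    # Step 2: is the match so good that no difference block is needed?
    if match_error <= skip_diff_threshold:
        return "mv_only"
    # Step 3: sending a difference block; coarsen quantization if over budget.
    if bits_used > bit_budget:
        return "mv_plus_diff_modified_quant"
    return "mv_plus_diff"


modes = [choose_p_mode(1.0), choose_p_mode(10.0), choose_p_mode(50.0)]
over_budget = choose_p_mode(10.0, bits_used=1500)
```

Compared with trying every mode and measuring rate and distortion, a tree like this evaluates each macroblock once, trading some coding efficiency for much less computation.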

Page 55: VIDEO COMPRESSION  FUNDAMENTALS

55

B-Picture Coding

B pictures have even more possible modes:

Forward prediction MV, no difference block

Forward prediction MV, plus difference block

Backward prediction MV, no difference block

Backward prediction MV, plus difference block

Interpolative prediction MV, no difference block

Interpolative prediction MV, plus difference block

Intra coding

Some of above with modified Quant parameter

Page 56: VIDEO COMPRESSION  FUNDAMENTALS

56

Group of Pictures

IIIII…: Every picture is intra-coded.

Fully decodable without reference to any other picture

Editing is straightforward

Requires about 2.5 times more bit rate than bidirectional

IBBPBBPB…: Forward and bidirectional

Best compression factor

Needs large decoder memory

Hard to edit

Most useful for final delivery of post-produced material (e.g., broadcast) because no editing requirement

Page 57: VIDEO COMPRESSION  FUNDAMENTALS

57

Group of Pictures

IPPPPIPP…: Forward predicted only.

Needs less decoder memory

IBIBIB…: bidirectional compromise

Some of the bit rate advantage of bidirectional coding

Not nearly the full latency penalty of bidirectional

Editable with moderate processing.

For example, if the video after a B picture needs to be deleted, the B frame would not be decodable.

Solution is to decode the B frame first, re-encode it using forward prediction only. Some quality loss.

