MPEG-4 AVC (H.264)

MPEG-4 AVC (H.264). Introduction The H.264 is aimed at very low bit rate, real- time, low end-to-end delay, and mobile applications such as conversational.

Jan 01, 2016


Osborne Murphy

Introduction

H.264 is aimed at very low bit rate, real-time, low end-to-end delay, and mobile applications such as conversational services and Internet video.

It offers enhanced visual quality at very low bit rates, particularly at rates below 24 kb/s.


Structure of H.264/AVC Video Coder

VCL (Video Coding Layer): designed to efficiently represent the video content.

NAL (Network Abstraction Layer): formats the VCL representation of the video and provides header information for conveyance by a variety of transport layers or storage media.


Video Coding Layer


Basic Structure of VCL

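[Figure: block diagram of the H.264 video coder. The input video signal is split into macroblocks of 16×16 pixels. Labeled blocks: Coder Control, Transform/Scaling/Quantization, Scaling & Inverse Transform, De-blocking Filter, Intra-frame Prediction, Motion Compensation, Motion Estimation, Intra/Inter selection, Decoder, and Entropy Coding. Labeled signals: control data, quantized transform coefficients, motion data, and the output video signal.]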


Intra-frame Prediction


Intra-frame encoding in H.264 supports three macroblock types: Intra_4×4, Intra_16×16, and I_PCM.

I_PCM allows the encoder to send the sample values directly, bypassing prediction and transform coding.

Intra_4×4 and Intra_16×16 use intra prediction.


Intra 4×4
– 9 modes
– Used in textured areas

Intra 16×16
– 4 modes
– Used in flat areas


Four modes of Intra_16×16
– Mode 0 (vertical): extrapolation from upper samples (H)
– Mode 1 (horizontal): extrapolation from left samples (V)
– Mode 2 (DC): mean of upper and left-hand samples (H+V)
– Mode 3 (plane): a linear “plane” function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly varying luminance.
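The first three Intra_16×16 modes listed above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the function names, the rounding of the DC mean, and the use of the sum of absolute differences (SAD) as the mode-selection cost are assumptions, and mode 3 (plane) is left out.

```python
def intra16x16_predict(mode, upper, left):
    """Return an n-by-n prediction block (n = len(upper)) for modes 0 to 2.

    upper: the row of samples H above the block
    left:  the column of samples V to the left of the block
    """
    n = len(upper)
    if mode == 0:   # vertical: every row repeats the samples above
        return [list(upper) for _ in range(n)]
    if mode == 1:   # horizontal: every column repeats the samples to the left
        return [[left[r]] * n for r in range(n)]
    if mode == 2:   # DC: rounded mean of upper and left samples (rounding is an assumption)
        dc = (sum(upper) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("only modes 0 to 2 are sketched here")


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))


def best_mode(block, upper, left):
    """Pick the mode (0, 1 or 2) with the smallest SAD (assumed cost measure)."""
    return min(range(3), key=lambda m: sad(block, intra16x16_predict(m, upper, left)))
```

A block whose rows all equal the row above it is predicted exactly by mode 0, so `best_mode` returns 0 for it.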


Example: Original image


Nine modes of Intra_4×4
– The prediction block P is calculated based on the samples labeled A-M.
– The encoder may select, for each block, the prediction mode that minimizes the residual between P and the block to be encoded.


Example:

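Consider a 4×4 block and its neighbors labeled below.

Suppose we use mode 4 (diagonal down-right) for prediction.

Then the top-left predicted sample a is computed from the sample A above it, the sample I to its left, and the corner sample M:

a = (A + 2M + I + 2)/4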
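As a hedged sketch of how the formula for sample a extends to the whole block: in diagonal down-right prediction, every sample on the same down-right diagonal shares one value, obtained with the same (1, 2, 1) filter applied along the edge of neighbors. Only the formula for a appears in the slides; the generalization, the function name, and the argument layout (above = A..D, left = I..L, corner = M) are assumptions.

```python
def intra4x4_diag_down_right(above, left, corner):
    """Mode 4 sketch. above = [A, B, C, D], left = [I, J, K, L], corner = M."""
    # Walk the edge from the bottom-left neighbor up to M, then along the top row:
    # edge = [L, K, J, I, M, A, B, C, D]
    edge = list(reversed(left)) + [corner] + list(above)
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            k = 4 + x - y   # position on the edge hit by this diagonal
            # (1, 2, 1) filter with rounding; integer division by 4 done as a shift
            pred[y][x] = (edge[k - 1] + 2 * edge[k] + edge[k + 1] + 2) >> 2
    return pred
```

For x = y = 0 this reduces to (I + 2M + A + 2) >> 2, the expression given for a.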


Motion Estimation/Compensation


Features of the H.264 motion estimation

– Various block sizes
– ¼-sample accuracy
  • 6-tap filtering to ½-sample accuracy
  • Simplified filtering to ¼-sample accuracy
– Multiple reference pictures
– Generalized B-frames


Variable Block Size Block-Matching

– In H.264, a video frame is first split into fixed-size macroblocks.

– Each macroblock may then be segmented into subblocks with different block sizes.

– A macroblock has a dimension of 16×16 pixels. The size of the smallest subblock is 4×4.

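[Figure: macroblock partitions, with partitions numbered in coding order.]

MB types (partitions of a 16×16 macroblock):
– 16×16: one partition (0)
– 16×8: two partitions (0, 1)
– 8×16: two partitions (0, 1)
– 8×8: four partitions (0, 1, 2, 3)

8×8 types (sub-partitions of an 8×8 block):
– 8×8: one partition (0)
– 8×4: two partitions (0, 1)
– 4×8: two partitions (0, 1)
– 4×4: four partitions (0, 1, 2, 3)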


Example:

This example shows the effectiveness of block matching operations with smaller sizes.

Frame 1


Frame 2


Difference between Frame 1 and Frame 2


Results of block-matching operation with size 16×16


Results of block-matching operation with size 8×8


Results of block-matching operation with size 4×4


To use a subblock smaller than 8×8, it is necessary to first split the macroblock into four 8×8 subblocks.


Encoding a motion vector for each subblock can cost a significant number of bits, especially if small block sizes are chosen.

Motion vectors for neighboring subblocks are often highly correlated and so each motion vector is predicted from vectors of nearby, previously coded subblocks.

The difference between the motion vector of the current block and its prediction is encoded and transmitted.


The method of forming the prediction depends on the block size and on the availability of nearby vectors.

Let E be the current block, let A be the subblock immediately to the left of E, let B be the subblock immediately above E, and let C be the subblock above and to the right of E.

It is not necessary that A, B, C, and E have the same size.

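[Figure: positions of the neighboring blocks relative to the current block E; D is the block above and to the left of E.]

D  B  C
A  E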


There are two modes for the prediction of motion vectors:

• Median prediction: used for all block sizes except 16×8 and 8×16

• Directional segmentation prediction: used for 16×8 and 8×16


Median prediction

– If C does not exist, then C = D.
– If B and C do not exist, then prediction = VA.
– If A and C do not exist, then prediction = VB.
– If A and B do not exist, then prediction = VC.
– Otherwise, prediction = median(VA, VB, VC).
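The median rules can be sketched as follows. Vectors are (x, y) tuples and `None` marks a neighbor that does not exist. Two points are assumptions rather than statements from the slides: the median is taken separately for the x and y components, and when exactly one neighbor is missing it is treated as the zero vector. The function names are illustrative.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(va, vb, vc, vd=None):
    """Median motion vector prediction; None marks an unavailable neighbor."""
    if vc is None:               # "If C does not exist, then C = D"
        vc = vd
    present = [v for v in (va, vb, vc) if v is not None]
    if not present:              # no neighbor at all: assume a zero prediction
        return (0, 0)
    if len(present) == 1:        # only one neighbor exists: use it directly
        return present[0]
    # Assumption: a single missing neighbor counts as the zero vector.
    a, b, c = (v if v is not None else (0, 0) for v in (va, vb, vc))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


def mv_difference(mv, prediction):
    """The difference that is actually encoded and transmitted."""
    return (mv[0] - prediction[0], mv[1] - prediction[1])
```

With neighbors (4, -2), (6, 0) and (5, 8), the prediction is (5, 0), so a current vector of (7, 1) is sent as the difference (2, 1).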


Directional segmentation prediction

• Vector block size 8×16
  – Left: prediction = VA
  – Right: prediction = VC

• Vector block size 16×8
  – Up: prediction = VB
  – Down: prediction = VA


Fractional Motion Estimation

In H.264, the motion vector between the current block and a candidate block has ¼-pel resolution.

The samples at sub-pel positions do not exist in the reference frame and so it is necessary to create them using interpolation from nearby image samples.


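[Figure: grid of integer-position samples (upper-case labels A to U) with the half-pel positions between them (lower-case labels aa, bb, b, s, gg, hh and cc, dd, h, m, ee, ff, with j at the center).]

b = round((E - 5F + 20G + 20H - 5I + J)/32)

h = round((A - 5C + 20G + 20M - 5R + T)/32)

j = round((aa - 5bb + 20b + 20s - 5gg + hh)/32)

Interpolation of ½-pel samples.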


Interpolation of ¼-pel samples.

a=round((G+b)/2)

d=round((G+h)/2)

e=round((b+h)/2)
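The two interpolation steps can be sketched with integer arithmetic. The 6-tap weights (1, -5, 20, 20, -5, 1) and the averaging come from the formulas above; implementing round() as "add half, then shift" and clipping the result to the 8-bit range 0..255 are assumptions, as are the function names.

```python
def clip255(v):
    """Clamp to the 8-bit sample range (an assumption about the sample depth)."""
    return max(0, min(255, v))


def half_pel(p0, p1, p2, p3, p4, p5):
    """6-tap filter for a half-pel sample lying between p2 and p3."""
    total = p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5
    return clip255((total + 16) >> 5)      # round(total / 32)


def quarter_pel(s0, s1):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (s0 + s1 + 1) >> 1              # round((s0 + s1) / 2)


# Example row E, F, G, H, I, J around a bright two-sample edge.
E, F, G, H, I, J = 0, 0, 100, 100, 0, 0
b = half_pel(E, F, G, H, I, J)   # half-pel sample between G and H
a = quarter_pel(G, b)            # quarter-pel sample between G and b
```

The filter overshoots slightly at sharp edges (here b comes out above both G and H), which is why a clipping step is usually needed.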


Multiple Reference Frames


Motion estimation based on multiple reference frames provides opportunities for more precise inter prediction, and also improved robustness to lost picture data.

The drawback of multiple reference frames is that both the encoder and decoder have to store the reference frames used for Inter-frame prediction in a multi-frame buffer.


[Figure: rate-distortion curves for Mobile & Calendar (CIF, 30 fps). Horizontal axis: R [Mbit/s]; vertical axis: PSNR Y [dB]. Curves: PBB... with generalized B pictures; PBB... with classic B pictures; PPP... with 5 previous references; PPP... with 1 previous reference. Annotation on the plot: ~15%.]


Generalized B Frames

Basic B-frames: The basic B-frames cannot be used as reference frames.


Generalized B-frames: The generalized B-frames can be used as reference frames.


[Figure, shown on two consecutive slides: rate-distortion curves for Mobile & Calendar (CIF, 30 fps). Horizontal axis: R [Mbit/s]; vertical axis: PSNR Y [dB]. Curves: PBB... with generalized B pictures; PBB... with classic B pictures; PPP... with 5 previous references; PPP... with 1 previous reference. Annotations on the two slides: >25% and ~40%.]


Transformation/Quantization


Transformation

The discrete cosine transform (DCT) operates on x, a block of N×N samples, and creates X, an N×N block of coefficients.

The forward DCT:

$$X = A x A^{T}$$

The inverse DCT:

$$x = A^{T} X A$$


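The elements of A are:

$$A(n,m) = C_n \cos\frac{(2m+1)\,n\pi}{2N}$$

where

$$C_n = \sqrt{\frac{1}{N}} \;\; (n = 0), \qquad C_n = \sqrt{\frac{2}{N}} \;\; (n > 0)$$

That is,

$$X(u,v) = C_u C_v \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} x(n,m) \cos\frac{(2n+1)\,u\pi}{2N} \cos\frac{(2m+1)\,v\pi}{2N}$$

$$x(n,m) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C_u C_v X(u,v) \cos\frac{(2n+1)\,u\pi}{2N} \cos\frac{(2m+1)\,v\pi}{2N}$$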


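Example:

The transform matrix A for a 4×4 DCT is:

$$A = \begin{bmatrix}
\frac{1}{2}\cos(0) & \frac{1}{2}\cos(0) & \frac{1}{2}\cos(0) & \frac{1}{2}\cos(0) \\
\sqrt{\frac{1}{2}}\cos\frac{\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{3\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{5\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{7\pi}{8} \\
\sqrt{\frac{1}{2}}\cos\frac{2\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{6\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{10\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{14\pi}{8} \\
\sqrt{\frac{1}{2}}\cos\frac{3\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{9\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{15\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{21\pi}{8}
\end{bmatrix}$$

That is,

$$A = \begin{bmatrix}
\frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\
\sqrt{\frac{1}{2}}\cos\frac{\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{3\pi}{8} & -\sqrt{\frac{1}{2}}\cos\frac{3\pi}{8} & -\sqrt{\frac{1}{2}}\cos\frac{\pi}{8} \\
\frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\
\sqrt{\frac{1}{2}}\cos\frac{3\pi}{8} & -\sqrt{\frac{1}{2}}\cos\frac{\pi}{8} & \sqrt{\frac{1}{2}}\cos\frac{\pi}{8} & -\sqrt{\frac{1}{2}}\cos\frac{3\pi}{8}
\end{bmatrix}$$

or

$$A = \begin{bmatrix}
a & a & a & a \\
b & c & -c & -b \\
a & -a & -a & a \\
c & -b & b & -c
\end{bmatrix}$$

where

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{1}{2}}\cos\frac{\pi}{8}, \qquad c = \sqrt{\frac{1}{2}}\cos\frac{3\pi}{8}$$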
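As a quick numerical check of the 4×4 example, the matrix can be generated directly from the definition of its elements, C_n·cos((2m+1)nπ/2N), and compared with the a, b, c pattern. This is a small sketch; the function name is illustrative.

```python
import math


def dct_matrix(n):
    """N-by-N DCT matrix with elements C_i * cos((2j + 1) * i * pi / (2N))."""
    rows = []
    for i in range(n):
        c = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([c * math.cos((2 * j + 1) * i * math.pi / (2 * n))
                     for j in range(n)])
    return rows


A = dct_matrix(4)
a = 0.5
b = math.sqrt(0.5) * math.cos(math.pi / 8)
c = math.sqrt(0.5) * math.cos(3 * math.pi / 8)

# The compact form of A in terms of a, b and c.
expected = [[a, a, a, a],
            [b, c, -c, -b],
            [a, -a, -a, a],
            [c, -b, b, -c]]
```

The rows of A are orthonormal, which is why the inverse transform only needs the transpose of A.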


The H.264 transform is based on the 4×4 DCT but with some fundamental differences:

1. It is an integer transform.
2. The core part of the transform can be implemented using only additions and shifts.
3. A scaling multiplication is integrated into the quantizer, reducing the total number of multiplications.


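Recall that

$$X = A x A^{T} =
\begin{bmatrix}
a & a & a & a \\
b & c & -c & -b \\
a & -a & -a & a \\
c & -b & b & -c
\end{bmatrix}
x
\begin{bmatrix}
a & b & a & c \\
a & c & -a & -b \\
a & -c & -a & b \\
a & -b & a & -c
\end{bmatrix}$$

where

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{1}{2}}\cos\frac{\pi}{8}, \qquad c = \sqrt{\frac{1}{2}}\cos\frac{3\pi}{8}$$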


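$$X = (C x C^{T}) \otimes E =
\left(
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & d & -d & -1 \\
1 & -1 & -1 & 1 \\
d & -1 & 1 & -d
\end{bmatrix}
x
\begin{bmatrix}
1 & 1 & 1 & d \\
1 & d & -1 & -1 \\
1 & -d & -1 & 1 \\
1 & -1 & 1 & -d
\end{bmatrix}
\right)
\otimes
\begin{bmatrix}
a^2 & ab & a^2 & ab \\
ab & b^2 & ab & b^2 \\
a^2 & ab & a^2 & ab \\
ab & b^2 & ab & b^2
\end{bmatrix}$$

(where d = c/b)

1. We call (C x C^T) the core 2D transform.
2. E is a matrix of scaling factors (post-scaling).
3. ⊗ indicates that each element of (C x C^T) is multiplied by the scaling factor in the same position in matrix E (i.e., element-by-element multiplication rather than matrix multiplication).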


To simplify the implementation of the transform, d is approximated by 0.5.

In order to ensure that the transform remains orthogonal, b also needs to be modified so that:

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$


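The final forward transform becomes

$$X = (C_f x C_f^{T}) \otimes E_f =
\left(
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
x
\begin{bmatrix}
1 & 2 & 1 & 1 \\
1 & 1 & -1 & -2 \\
1 & -1 & -1 & 2 \\
1 & -2 & 1 & -1
\end{bmatrix}
\right)
\otimes
\begin{bmatrix}
a^2 & ab/2 & a^2 & ab/2 \\
ab/2 & b^2/4 & ab/2 & b^2/4 \\
a^2 & ab/2 & a^2 & ab/2 \\
ab/2 & b^2/4 & ab/2 & b^2/4
\end{bmatrix}$$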
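Because the entries of the forward core matrix are only ±1 and ±2, the core transform needs nothing beyond additions, subtractions and one-bit shifts. The following sketch shows one possible butterfly arrangement (the decomposition into sums and differences and the function names are assumptions, not taken from the slides) applied first to the rows and then to the columns of a 4×4 block.

```python
def core_1d(v):
    """One 4-point pass of the core matrix [[1,1,1,1],[2,1,-1,-2],[1,-1,-1,1],[1,-2,2,-1]]."""
    s0, s1 = v[0] + v[3], v[1] + v[2]     # sums
    d0, d1 = v[0] - v[3], v[1] - v[2]     # differences
    return [s0 + s1,                      # row (1, 1, 1, 1)
            (d0 << 1) + d1,               # row (2, 1, -1, -2)
            s0 - s1,                      # row (1, -1, -1, 1)
            d0 - (d1 << 1)]               # row (1, -2, 2, -1)


def forward_core_transform(x):
    """Core 2D transform of a 4x4 block: rows first, then columns."""
    rows = [core_1d(r) for r in x]                    # transform each row
    cols = [core_1d(list(c)) for c in zip(*rows)]     # transform each column
    return [list(r) for r in zip(*cols)]              # back to row-major order


x = [[5, 11, 8, 10],
     [9, 8, 4, 12],
     [1, 10, 11, 4],
     [19, 6, 15, 7]]
W = forward_core_transform(x)
```

The result is still unscaled; the scaling factors of the matrix E_f are applied later, inside the quantizer.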


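The inverse transform is given by:

$$x = C_i^{T} (X \otimes E_i) C_i =
\begin{bmatrix}
1 & 1 & 1 & 1/2 \\
1 & 1/2 & -1 & -1 \\
1 & -1/2 & -1 & 1 \\
1 & -1 & 1 & -1/2
\end{bmatrix}
\left(
X \otimes
\begin{bmatrix}
a^2 & ab & a^2 & ab \\
ab & b^2 & ab & b^2 \\
a^2 & ab & a^2 & ab \\
ab & b^2 & ab & b^2
\end{bmatrix}
\right)
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1/2 & -1/2 & -1 \\
1 & -1 & -1 & 1 \\
1/2 & -1 & 1 & -1/2
\end{bmatrix}$$

Here the multiplication by E_i is the pre-scaling step.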


Quantization

H.264 uses a scalar quantizer.

The quantization should satisfy the following requirements:

(a) avoid division and/or floating-point arithmetic
(b) incorporate the post- and pre-scaling matrices Ef and Ei.


The basic forward quantizer operation is

Z(u,v)= round( X(u,v)/QStep )

where X(u,v) is a transform coefficient, Z(u,v) is a quantized coefficient, and QStep is a quantizer step size.


There are 52 quantizers, selected by the quantization parameter (QP = 0 to 51).

An increase of 1 in QP means an increase of QStep by approximately 12%.

An increase of 6 in QP means an increase of QStep by a factor of 2.
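The relation between QP and QStep can be sketched as a six-entry base table that doubles every six steps. The base values for QP = 0 to 5 are not listed in the slides; the ones used here are the commonly tabulated H.264 step sizes and should be read as an assumption. They are consistent with the value QStep = 1.0 at QP = 4 used later in the slides.

```python
# Assumed base step sizes for QP = 0..5 (commonly tabulated values, not from the slides).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size: base value for QP mod 6, doubled for every 6 steps of QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

For example, `qstep(4)` is 1.0 and `qstep(10)` is 2.0. The ratio between neighboring entries is only approximately 12% because the base table is rounded to convenient values.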


•Post-Scaling

The post-scaling factor (PF) (i.e., a², ab/2 or b²/4) is incorporated into the forward quantizer in the following way:

1. The input block x is transformed to give a block of unscaled coefficients W = Cf x Cf^T.
2. Then, each coefficient in W is quantized and scaled in a single operation:

Z(u,v) = round( W(u,v)×PF / QStep )

where PF is a², ab/2 or b²/4 depending on the position (u,v).

Why? The scaled coefficient is X(u,v) = W(u,v)×PF, and substituting this into the basic quantizer Z(u,v) = round( X(u,v)/QStep ) gives the expression above.


In order to simplify the arithmetic, the factor (PF/QStep) is implemented as a multiplication by a factor MF and a right shift, avoiding any division operations.

Z(u,v) = round( W(u,v)×MF / 2^qbits )

where

MF / 2^qbits = PF / QStep

and

qbits = 15 + floor(QP/6)


Note that the round operation does not have to be the nearest integer operation. In the reference model software, the round operation is realized by

|Z(u,v)| = (|W(u,v)|×MF + f) >> qbits

sign(Z(u,v)) = sign(W(u,v))

where f is 2^qbits/3 for Intra blocks and 2^qbits/6 for Inter blocks.
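The integer quantizer for a single coefficient can be sketched as below. The function name and argument layout are illustrative, and truncating the offset f to an integer with floor division is an assumption.

```python
def quantize_coeff(w, mf, qbits, intra=True):
    """Quantize one unscaled coefficient w using multiplier mf and a right shift."""
    f = (1 << qbits) // (3 if intra else 6)   # rounding offset: 2^qbits/3 or 2^qbits/6
    z = (abs(w) * mf + f) >> qbits            # magnitude: one multiply, add and shift
    return -z if w < 0 else z                 # restore the sign of w
```

With MF = 8192 and qbits = 16, a coefficient of 22 quantizes to 3 in an Intra block but to 2 in an Inter block: the smaller offset f pushes borderline values toward zero.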


Example:

Suppose QP=4 and (u,v)=(0,0).

Therefore, QStep = 1.0, PF = a² = 0.25, and qbits = 15.

From

MF / 2^qbits = PF / QStep

we have

MF = 2^15 × 0.25 / 1.0 = 8192


The MF values for QP ≤ 5 are shown below.

[Table: MF values for QP = 0 to 5]

For QP > 5, the MF factors remain unchanged (MF depends only on QP mod 6), but qbits increases by 1 for each increment of six in QP. That is, qbits = 16 for 6 ≤ QP ≤ 11, qbits = 17 for 12 ≤ QP ≤ 17, and so on.


•Pre-Scaling

The de-quantized coefficient is given by

X′(u,v) = Z(u,v) × QStep

The inverse transform involving pre-scaling operations proceeds in the following way:

1. The dequantized block X′ is pre-scaled to a block W′ for the core 2D inverse transform.

2. The reconstructed block x′ is then given by

x′ = Ci^T W′ Ci


The pre-scaling factor (PF) (i.e., a², ab or b²) is incorporated in the computation of W′, together with a constant scaling factor of 64 to avoid rounding errors.

W′(u,v) = Z(u,v) × QStep × PF × 64

The values at the output of the inverse transform should be divided by 64 to remove the constant scaling factor.


The H.264 standard does not specify QStep or PF directly. Instead, the parameter V = QStep×PF×64 is defined.

The V values for QP ≤ 5 are shown below.

[Table: V values for QP = 0 to 5]


For QP > 5, the V value increases by a factor of 2 for each increment of six in QP.

That is,

W′(u,v) = Z(u,v) × V(u,v)

where

V(u,v) = Table_for_V[QP mod 6] × 2^floor(QP/6)


The Complete Transformation, Quantization, Rescaling and Inverse Transformation

Encoding:
1. Input 4×4 block: x
2. Forward core transform: W = Cf x Cf^T
3. Post-scaling and quantization: Z(u,v) = round( W(u,v)×MF / 2^qbits )

Decoding:
1. Pre-scaling: W′(u,v) = Z(u,v) × V(u,v)
2. Inverse core transform: x′ = Ci^T W′ Ci
3. Re-scaling: x″ = round( x′ / 64 )


Example:

1. Suppose QP = 10, and input block x =

5 11 8 10
9 8 4 12
1 10 11 4
19 6 15 7

2. Forward core transform: W =

140 -1 -6 7
-19 -39 7 -92
22 17 8 31
-27 -32 -59 -21


3. MF = 8192, 3355 or 5243 (depending on the position), qbits = 16 and f = 2^qbits/3. Z =

17 0 -1 0
-1 -2 0 -5
3 1 1 2
-2 -1 -5 -1

4. V = 32, 50 or 40 (depending on the position) because 2^floor(QP/6) = 2. W′ =

544 0 -32 0
-40 -100 0 -250
96 40 32 80
-80 -50 -200 -50


5. Output of the inverse core transform after division by 64 is x″ =

4 13 8 10
8 8 4 12
1 10 10 3
18 5 14 7
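The worked example can be reproduced end to end with a short script. It is a sketch under stated assumptions: plain matrix products stand in for the add-and-shift implementation, the MF and V triples are the ones quoted in the example for QP = 10 (ordered as in the text: positions with both indices even, positions with both indices odd, all other positions), and the helper names are illustrative.

```python
import math

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(r) for r in zip(*m)]


def pos_class(u, v):
    """0: both indices even, 1: both odd, 2: all other positions."""
    if u % 2 == 0 and v % 2 == 0:
        return 0
    if u % 2 == 1 and v % 2 == 1:
        return 1
    return 2


def encode(x, mf, qbits, intra=True):
    """Forward core transform followed by combined post-scaling and quantization."""
    w = matmul(matmul(CF, x), transpose(CF))
    f = (1 << qbits) // (3 if intra else 6)
    z = [[(1 if w[u][v] >= 0 else -1)
          * ((abs(w[u][v]) * mf[pos_class(u, v)] + f) >> qbits)
          for v in range(4)] for u in range(4)]
    return w, z


def decode(z, v_table):
    """Pre-scaling, inverse core transform, and division by 64 with rounding."""
    w2 = [[z[u][v] * v_table[pos_class(u, v)] for v in range(4)] for u in range(4)]
    x2 = matmul(matmul(transpose(CI), w2), CI)
    x3 = [[int(math.floor(val / 64 + 0.5)) for val in row] for row in x2]
    return w2, x3


x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
W, Z = encode(x, mf=(8192, 3355, 5243), qbits=16)   # values for QP = 10
W2, x_rec = decode(Z, v_table=(32, 50, 40))         # values for QP = 10
```

The reconstructed block differs from the input by at most 2 in this example; the difference is the quantization error introduced in step 3.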


Entropy Coding


Here we present two basic variable length coding (VLC) techniques used by H.264: the Exp-Golomb code and context adaptive VLC (CAVLC).

Exp-Golomb code is used universally for all symbols except for transform coefficients.

CAVLC is used for coding of transform coefficients.

•No end-of-block symbol; instead, the number of coefficients is decoded.
•Coefficients are scanned backward.
•Contexts are built dependent on transform coefficients.


Exp-Golomb code

Exp-Golomb codes are variable-length codes with a regular construction.

[Table: first 9 codewords of Exp-Golomb codes]


Each codeword of Exp-Golomb codes is constructed as follows:

[M zeros][1][INFO]

where INFO is an M-bit field carrying information.

Therefore, the length of a codeword is 2M+1.


Given a code_num, the corresponding Exp-Golomb codeword can be obtained by the following procedure:

(a) M = floor( log2[code_num+1] )
(b) INFO = code_num + 1 - 2^M

Example:

code_num = 6
M = floor( log2[6+1] ) = 2
INFO = 6 + 1 - 2^2 = 3

The corresponding Exp-Golomb codeword = [M zeros][1][INFO] = 00111


Given an Exp-Golomb codeword, its code_num can be found as follows:

(a) Read in M leading zeros followed by 1.
(b) Read the M-bit INFO field.
(c) code_num = 2^M + INFO - 1

Example:

Exp-Golomb codeword = 00111
(a) M = 2
(b) INFO = 3
(c) code_num = 2^2 + 3 - 1 = 6
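Both directions of the procedure fit in a few lines. In this sketch codewords are handled as strings of '0' and '1' characters for readability (a real bitstream writer would pack bits); the function names are illustrative.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1          # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Recover code_num from a single codeword given as a bit string."""
    m = bits.index("1")                          # count the leading zeros
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1                   # code_num = 2^M + INFO - 1


first_nine = [exp_golomb_encode(k) for k in range(9)]
```

`first_nine` lists the codewords for code_num = 0 to 8, starting with the single bit `1` for code_num = 0; each codeword has length 2M+1.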


A parameter v to be encoded is mapped to code_num in one of 3 ways:

ue(v) : Unsigned direct mapping, code_num=v. (Mainly used for macroblock type and reference frame index)

se(v): Signed mapping. v is mapped to code_num as follows.

code_num = 2|v| (v ≤ 0)
code_num = 2v - 1 (v > 0)

(Mainly used for motion vector difference and delta QP)
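The signed mapping interleaves positive and negative values so that small magnitudes get short codewords. A minimal sketch of the mapping and its inverse (the function names are illustrative):

```python
def se_to_code_num(v):
    """Signed mapping: positive v goes to odd code_num, zero and negative v to even."""
    return 2 * v - 1 if v > 0 else 2 * abs(v)


def code_num_to_se(code_num):
    """Inverse of the signed mapping."""
    if code_num % 2:                  # odd code_num encodes a positive value
        return (code_num + 1) // 2
    return -(code_num // 2)           # even code_num encodes zero or a negative value
```

The values 0, 1, -1, 2, -2 map to code_num 0, 1, 2, 3, 4 respectively.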


me(v): Mapped symbols. Parameter v is mapped to code_num according to a table specified in the standard.

This mapping is used for coded_block_pattern parameters. An example of such a mapping is shown below.


CAVLC

This is the method used to encode residual blocks of transform coefficients in zig-zag order.


The CAVLC is designed to take advantage of several characteristics of quantized 4×4 blocks:

• After prediction, transformation and quantization, blocks are typically sparse (containing mostly zeros).

• The highest-frequency non-zero coefficients after the zig-zag scan are often sequences of ±1.

• The number of non-zero coefficients in neighboring blocks is correlated.

• The level (magnitude) of non-zero coefficients tends to be higher at the start of the zig-zag scan, and lower towards the high frequencies.


The procedure described below is based on the document entitled

JVT Document JVT-C028, Gisle Bjøntegaard and Karl Lillevold, “Context-adaptive VLC (CVLC) coding of coefficients,” Fairfax, VA, May 2002.

The H.264 CAVLC is an extension of this work.


The CAVLC encoding of a block of transform coefficients proceeds as follows.

1. Encode the number of coefficients and trailing ones.
2. Encode the sign of each trailing one.
3. Encode the levels of the remaining non-zero coefficients.
4. Encode the total number of zeros before the last coefficient.
5. Encode each run of zeros.


•Encode the number of coefficients and trailing ones

The first step is to encode the number of coefficients (NumCoef) and trailing ones (T1s).

NumCoef can be anything from 0 (no coefficient in the block) to 16 (16 non-zero coefficients).

T1s can be anything from 0 to 3.

If there are more than 3 trailing ±1s, only the last 3 are treated as “special cases” and the others are coded as normal coefficients.


Example:

Consider the 4×4 block shown below

-2 4 0 -1

3 0 0 0

0 0 1 0

-1 1 0 0

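After the zig-zag scan, NumCoef = 7 and T1s = 3 (the scan ends with four ±1s, of which only the last three count as trailing ones).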
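The two counts for this block can be checked with a short script. The scan order is not listed in the slides; the table below is the usual 4×4 zig-zag order, given as raster indices, and is an assumption, as are the function names.

```python
# Assumed 4x4 zig-zag scan order, as indices into the row-major (raster) block.
ZIGZAG_4X4 = (0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15)


def zigzag_scan(block):
    """Reorder a 4x4 block (list of rows) into a 16-element zig-zag list."""
    flat = [c for row in block for c in row]
    return [flat[i] for i in ZIGZAG_4X4]


def count_coeffs(scanned):
    """Return (NumCoef, T1s) for a zig-zag ordered coefficient list."""
    nonzero = [c for c in scanned if c != 0]
    t1s = 0
    for c in reversed(nonzero):            # walk back from the highest frequency
        if abs(c) != 1 or t1s == 3:        # stop at a larger level or at the cap of 3
            break
        t1s += 1
    return len(nonzero), t1s


block = [[-2, 4, 0, -1],
         [3, 0, 0, 0],
         [0, 0, 1, 0],
         [-1, 1, 0, 0]]
scanned = zigzag_scan(block)
num_coef, t1s = count_coeffs(scanned)
```

Zeros between the trailing ±1s do not interrupt the count, because only the non-zero coefficients are examined.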


Three tables can be used for the encoding of NumCoef and T1s: Num-VLC0, Num-VLC1 and Num-VLC2.

[Table: Num-VLC0]


The selection of tables depends on the number of non-zero coefficients in the upper and left-hand previously coded blocks, NU and NL. A parameter N is calculated as follows:

– If blocks U and L are both available (i.e., in the same coded slice), N = (NU + NL)/2.
– If only block U is available, N = NU.
– If only block L is available, N = NL.
– If neither is available, N = 0.


The selection of table is based on N in the following way:

N            Selected Table
0, 1         Num-VLC0
2, 3         Num-VLC1
4, 5, 6, 7   Num-VLC2
8 or above   FLC

The FLC is of the following form:

xxxxyy (i.e., 6 bits)

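where xxxx and yy represent Num_Coeff - 1 and T1, respectively (four bits cannot hold Num_Coeff itself when it reaches 16); the otherwise unused codeword 000011 signals Num_Coeff = 0.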
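A minimal sketch of the table choice and of the 6-bit fixed-length code follows. The function names are hypothetical. Because four bits cannot represent every value from 0 to 16, the sketch assumes the convention of the published standard: xxxx carries the number of coefficients minus one, and the otherwise unused pattern 000011 marks an empty block.

```python
def select_num_table(n: int) -> str:
    """Pick the coding table for (NumCoeff, T1s) from the parameter N."""
    if n < 2:
        return "Num-VLC0"
    if n < 4:
        return "Num-VLC1"
    if n < 8:
        return "Num-VLC2"
    return "FLC"


def flc_codeword(num_coeff: int, t1s: int) -> str:
    """Build the 6-bit fixed-length code xxxxyy used when N >= 8.

    Assumption: xxxx = num_coeff - 1 and yy = t1s, with 000011 reserved
    for num_coeff == 0 (that pattern cannot occur otherwise, because a
    block with one coefficient cannot have three trailing ones).
    """
    if not 0 <= num_coeff <= 16:
        raise ValueError("num_coeff must be in 0..16")
    if not 0 <= t1s <= min(3, num_coeff):
        raise ValueError("t1s must be in 0..min(3, num_coeff)")
    if num_coeff == 0:
        return "000011"
    return format(num_coeff - 1, "04b") + format(t1s, "02b")


print(select_num_table(0), select_num_table(5), select_num_table(9))
print(flc_codeword(5, 3))
```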


For each T1, a single bit encodes the sign (0=+,1=-).

These are encoded in reverse order, starting with the highest frequency T1.

•Encode the sign of each trailing one
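As a sketch of this step, the function below emits one sign bit per trailing one, highest frequency first. The function name is hypothetical, and the input is the zigzag sequence 0,3,0,1,-1,-1,0,1,0,... with T1s = 3 from the worked example in this section.

```python
def trailing_one_sign_bits(scan, t1s):
    """Return the sign bits (0 = +, 1 = -) of the trailing ones,
    written in reverse scan order (highest frequency first)."""
    nonzero = [v for v in scan if v != 0]
    trailing = nonzero[::-1][:t1s]
    if any(abs(v) != 1 for v in trailing):
        raise ValueError("t1s exceeds the number of trailing +/-1 values")
    return "".join("0" if v > 0 else "1" for v in trailing)


scan = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
print(trailing_one_sign_bits(scan, 3))
```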


The level (sign and magnitude) of each remaining non-zero coefficient in the block is encoded in reverse order.

There are 5 VLC tables to choose from, Lev_VLC0 to Lev_VLC4.

Lev_VLC0 is biased towards lower magnitudes; Lev_VLC1 is biased towards slightly higher magnitudes, and so on.

•Encode the levels of the remaining non-zero coefficients


This is used only when it is impossible for a coefficient to have values +/-1. It will happen when T1s<3.


To improve coding efficiency, the tables are changed along with the coding process based on the following procedure.


The following shows the table for encoding the total number of zeros before the last coefficient (TotZeros)

•Encode the total number of zeros before the last coefficient


•Encode each run of zeros

At this stage it is known how many zeros are left to distribute (call this ZerosLeft). When encoding or decoding a non-zero coefficient for the first time, ZerosLeft begins at TotZeros, and decreases as more non-zero coefficients are encoded or decoded.

The number of preceding zeros before each non-zero coefficient (called RunBefore) needs to be coded to properly locate that coefficient.

Before coding the next RunBefore, ZerosLeft is updated and used to select one out of 7 tables.
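The bookkeeping of ZerosLeft and RunBefore can be sketched as follows. The function name is hypothetical; the input is the zigzag sequence from the worked example in this section. The sketch lists a (ZerosLeft, RunBefore) pair for every non-zero coefficient, although the last pair needs no codeword because the remaining zeros are implied.

```python
def runs_before(scan):
    """Return (TotZeros, steps), where steps lists (ZerosLeft, RunBefore)
    for each non-zero coefficient in reverse scan order."""
    positions = [i for i, v in enumerate(scan) if v != 0]
    if not positions:
        return 0, []
    # Zeros located before the last (highest-frequency) non-zero coefficient.
    tot_zeros = positions[-1] + 1 - len(positions)
    zeros_left = tot_zeros
    steps = []
    for k in range(len(positions) - 1, -1, -1):
        previous = positions[k - 1] if k > 0 else -1
        run = positions[k] - previous - 1  # zeros directly before this coefficient
        steps.append((zeros_left, run))
        zeros_left -= run
    return tot_zeros, steps


scan = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
tot_zeros, steps = runs_before(scan)
print("TotZeros =", tot_zeros)
for zeros_left, run in steps:
    print("ZerosLeft =", zeros_left, "RunBefore =", run)
```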


Why is the maximum number 14?

zero-left


Example:

Consider the following interframe residual 4×4 block:

0 3 -1 0
0 -1 1 0
1 0 0 0
0 0 0 0

The zigzag re-ordering of the block is:
0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0

Therefore, NumCoeff = 5, TotZeros = 3, T1s = 3.

Assume N=0


Encoding:

Value                      Code      Comments
NumCoeff=5, T1s=3          0001011   Use Num-VLC0
Sign of T1 (+1)            0         Starting at highest frequency
Sign of T1 (-1)            1
Sign of T1 (-1)            1
Level = +1                 1         Use Lev-VLC0
Level = +3                 0010      Use Lev-VLC1
TotZeros=3                 1110      Also depends on NumCoeff
ZerosLeft=3; RunBefore=1   00        RunBefore of the 1st coeff
ZerosLeft=2; RunBefore=0   1         RunBefore of the 2nd coeff
ZerosLeft=2; RunBefore=0   1         RunBefore of the 3rd coeff
ZerosLeft=2; RunBefore=1   01        RunBefore of the 4th coeff
ZerosLeft=1; RunBefore=1   (none)    No code required; last coeff

The transmitted bitstream for this block is 0001011011100101110001101
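As a quick consistency check, the codewords of the encoding table can be concatenated in order. The codewords are copied from the table above; the list name is only illustrative.

```python
# (element, codeword) pairs in transmission order, copied from the table.
codewords = [
    ("NumCoeff=5, T1s=3", "0001011"),
    ("sign of T1 (+1)", "0"),
    ("sign of T1 (-1)", "1"),
    ("sign of T1 (-1)", "1"),
    ("Level = +1", "1"),
    ("Level = +3", "0010"),
    ("TotZeros=3", "1110"),
    ("RunBefore=1 (ZerosLeft=3)", "00"),
    ("RunBefore=0 (ZerosLeft=2)", "1"),
    ("RunBefore=0 (ZerosLeft=2)", "1"),
    ("RunBefore=1 (ZerosLeft=2)", "01"),
    # The last coefficient needs no RunBefore codeword.
]

bitstream = "".join(code for _, code in codewords)
print(bitstream, len(bitstream), "bits")
```

The result agrees with the 25-bit stream quoted above.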


Decoding:

Code      Value               Output Array          Comments
0001011   NumCoeff=5, T1s=3   Empty
0         +                   1                     T1 sign
1         -                   -1,1                  T1 sign
1         -                   -1,-1,1               T1 sign
1         +1                  1,-1,-1,1             Level value
0010      +3                  +3,1,-1,-1,1          Level value
1110      TotZeros=3          +3,1,-1,-1,1
00        RunBefore=1         +3,1,-1,-1,0,1        RunBefore of the 1st coeff
1         RunBefore=0         +3,1,-1,-1,0,1        RunBefore of the 2nd coeff
1         RunBefore=0         +3,1,-1,-1,0,1        RunBefore of the 3rd coeff
01        RunBefore=1         +3,0,1,-1,-1,0,1      RunBefore of the 4th coeff
(none)    ZerosLeft=1         0,+3,0,1,-1,-1,0,1
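The placement stage of the decoder can be sketched as below, assuming the levels, TotZeros and RunBefore values have already been parsed from the bitstream (the VLC tables themselves are not reproduced here). The function name and argument layout are illustrative.

```python
def rebuild_scan(levels_reverse, tot_zeros, run_before_list, length=16):
    """Rebuild the zigzag-ordered coefficient list.

    levels_reverse:  non-zero coefficients, highest frequency first
                     (trailing ones followed by the remaining levels).
    run_before_list: decoded RunBefore values in the same order; it may be
                     shorter than levels_reverse because the final
                     coefficient has no RunBefore codeword.
    """
    scan = [0] * length
    if not levels_reverse:
        return scan
    # Index of the highest-frequency non-zero coefficient.
    pos = len(levels_reverse) + tot_zeros - 1
    zeros_left = tot_zeros
    for i, level in enumerate(levels_reverse):
        scan[pos] = level
        if i < len(run_before_list):
            run = run_before_list[i]
        else:
            run = zeros_left  # leftover zeros are implied, not coded
        zeros_left -= run
        pos -= run + 1
    return scan


decoded = rebuild_scan([1, -1, -1, 1, 3], tot_zeros=3,
                       run_before_list=[1, 0, 0, 1])
print(decoded)
```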


De-block Filter

[Figure: block diagram of the H.264/AVC coder. Labels: Input Video Signal; Split into Macroblocks (16x16 pixels); Coder Control; Control Data; Transform/Scal./Quant.; Quant. Transf. coeffs; Decoder; Scaling & Inv. Transform; De-blocking Filter; Intra-frame Prediction; Motion Compensation; Intra/Inter; Motion Estimation; Motion Data; Entropy Coding; Output Video Signal]


The deblocking filter improves subjective visual quality. The filter is highly context adaptive. It operates on the boundaries of 4×4 sub-blocks, as shown below.

[Figure: samples q3 q2 q1 q0 | p0 p1 p2 p3 on either side of a vertical and a horizontal 4×4 block boundary]


The choice of filtering outcome depends on the boundary strength and on the gradient of image samples across the boundary.

The boundary strength parameter Bs is selected according to the following rules.


A group of samples from the set (p2,p1,p0,q0,q1,q2) is filtered only if:

(a) Bs > 0 and (b) |p0-q0| < α and |p1-p0| < β and |q1-q0| < β

where α and β are thresholds defined in the standard.

The threshold values increase with the average quantizer parameter QP of two blocks q and p.
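The filtering condition can be sketched as a single predicate. The two thresholds are written `alpha` and `beta` here, following the usual notation of the standard; the sample values and threshold values in the usage lines are made up for illustration and are not taken from the standard's tables.

```python
def should_filter(p1, p0, q0, q1, bs, alpha, beta):
    """Return True if the samples around the boundary should be filtered.

    p1, p0 and q0, q1 are the samples nearest the boundary on each side.
    alpha and beta are QP-dependent thresholds (supplied by the caller).
    """
    return (
        bs > 0
        and abs(p0 - q0) < alpha
        and abs(p1 - p0) < beta
        and abs(q1 - q0) < beta
    )


# Small step across the boundary: treated as blocking distortion.
print(should_filter(60, 62, 66, 67, bs=2, alpha=10, beta=4))
# Large step across the boundary: treated as a real image edge.
print(should_filter(60, 62, 120, 121, bs=2, alpha=10, beta=4))
```

A larger QP would raise both thresholds, so more boundaries would pass the test.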


When QP is small, anything other than a very small gradient across the boundary is likely to be due to image features that should be preserved, and so the thresholds α and β are low.

When QP is larger, blocking distortion is likely to be more significant, and α and β are higher so that more boundary samples are filtered.


[Figure: the same decoded frame without deblock filtering and with deblock filtering]


Data Partitioning and Network Abstraction Layer


A video picture is coded as one or more slices.

Each slice contains an integral number of macroblocks, from 1 to the total number of macroblocks in a picture.

The number of macroblocks per slice need not be constant within a picture.


There are five slice modes. Three commonly used modes are:

1. I-slice: A slice where all macroblocks of the slice are coded using intra prediction.

2. P-slice: In addition to the coding types of the I-slice, some macroblocks of the P-slice can be coded using inter-prediction (predicted from one reference picture buffer only).

3. B-slice: In addition to the coding types available in a P-slice, some macroblocks of the B-slice can be predicted from two reference picture buffers.


Note that the coded data in a slice can be placed in three separate Data Partitions (A, B and C) for robust transmission.

Partition A contains the slice header and header data for each macroblock in the slice.

Partition B contains coded residual data for Intra slice macroblocks.

Partition C contains coded residual data for Inter slice macroblocks.


In H.264, the VCL data are mapped into NAL units prior to transmission or storage.

Each NAL unit contains a Raw Byte Sequence Payload (RBSP), a set of data corresponding to coded video data or header information.

The NAL units can be delivered over a packet-based network or a bitstream transmission link or stored in a file.

[Figure: sequence of NAL units: NAL header | RBSP | NAL header | RBSP | NAL header | RBSP]


RBSP type                              Description
Parameter Set                          Global parameters for a sequence, such as picture dimensions and video format.
Supplemental Enhancement Information   Side messages that are not essential for correct decoding of the video sequence.
Picture Delimiter                      Boundary between pictures (optional). If not present, the decoder infers the boundary based on the frame number contained within each slice header.
Coded Slice                            Header and data for a slice; this RBSP contains actual coded video data.
Data Partition A, B or C               Three units containing Data Partitioned slice layer data (useful for error-resilient decoding).
End of Sequence
End of Stream
Filler Data                            Contains ‘dummy’ data.
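The slides do not show how a decoder tells these RBSP types apart. As a hedged sketch, the code below follows the one-byte NAL unit header of the H.264 specification (forbidden_zero_bit, nal_ref_idc, nal_unit_type); the dictionary labels are paraphrased to match the table above, and the function name is hypothetical.

```python
# nal_unit_type values as assigned in the H.264 specification,
# labelled with the RBSP names used in the table above.
NAL_UNIT_TYPES = {
    1: "Coded Slice (non-IDR picture)",
    2: "Data Partition A",
    3: "Data Partition B",
    4: "Data Partition C",
    5: "Coded Slice (IDR picture)",
    6: "Supplemental Enhancement Information",
    7: "Sequence Parameter Set",
    8: "Picture Parameter Set",
    9: "Picture Delimiter",
    10: "End of Sequence",
    11: "End of Stream",
    12: "Filler Data",
}


def parse_nal_header(first_byte: int):
    """Split the first byte of a NAL unit into its three header fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("header must be a single byte")
    forbidden_zero_bit = first_byte >> 7
    nal_ref_idc = (first_byte >> 5) & 0x3
    nal_unit_type = first_byte & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


for byte in (0x67, 0x68, 0x65, 0x41):
    _, ref_idc, unit_type = parse_nal_header(byte)
    print(hex(byte), ref_idc, NAL_UNIT_TYPES.get(unit_type, "other"))
```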


Example:

The following figure shows an example of a sequence of RBSP elements.

[Figure: Sequence parameter set | SEI | Picture parameter set | I Slice (Coded slice) | Picture delimiter | P Slice (Coded slice) | P Slice (Coded slice) | ...]


Profiles

Baseline Main Extended High
