Electrical Engineering National Central University Video-Audio Processing Laboratory Overview of H.264/AVC 2003.9.x M.K.Tsai.





Outline
- Abstract
- Applications
- Network Abstraction Layer (NAL)
- Conclusion (I)
- Design feature highlight
- Conclusion (II)
- Video Coding Layer (VCL)
- Profile and potential application
- Conclusion (III)




Abstract
- H.264/AVC is the newest video coding standard
- Its main goals are enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications
- H.264/AVC has achieved a significant improvement in rate-distortion efficiency
- The scope of standardization is illustrated below




Applications
- Broadcast over cable, cable modem, …
- Interactive or serial storage on optical devices and DVD, …
- Conversational services over LAN, modem, …
- Video-on-demand or streaming services over ISDN, wireless networks, …
- Multimedia messaging services (MMS) over DSL, mobile networks, …

How can one design handle this variety of applications and networks?




Applications
To address this need for flexibility and customizability, the H.264/AVC design covers a video coding layer (VCL) and a network abstraction layer (NAL). The structure of an H.264/AVC encoder is shown below.




Applications
- VCL (video coding layer): designed to efficiently represent the video content
- NAL (network abstraction layer): formats the VCL representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media




Network Abstraction Layer
- Designed to provide "network friendliness", enabling simple and effective customization of the use of the VCL
- Facilitates the mapping of H.264/AVC data to transport layers such as
  - RTP/IP for any kind of real-time Internet service
  - File formats, e.g., ISO MP4, for storage
  - H.32X for conversational services
  - MPEG-2 systems for broadcasting services
- The design of the NAL anticipates a variety of such mappings




Network Abstraction Layer
Key concepts of the NAL are NAL units, the byte-stream and packet-format uses of NAL units, parameter sets, and access units.
- NAL units
  - A NAL unit is a packet that contains an integer number of bytes
  - The first byte is a header byte that indicates the type of data in the NAL unit
  - The remaining bytes contain the payload data
  - Payload data is interleaved as necessary with emulation prevention bytes, which prevent a start code prefix from being generated accidentally inside the payload
  - The NAL unit structure specifies a format for use in both packet-oriented and bitstream-oriented transport systems
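As a rough illustration of the header byte and the emulation prevention mechanism described above, the following Python sketch may help. The function names are hypothetical. The field names (`forbidden_zero_bit`, `nal_ref_idc`, `nal_unit_type`) and the rule of inserting a `0x03` byte after two zero bytes follow the H.264 syntax as commonly described, but special cases such as trailing zero bytes are not handled.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # 1 bit
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # 2 bits
        "nal_unit_type": first_byte & 0x1F,              # 5 bits
    }


def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 after two zero bytes when the next byte is 0x00..0x03.

    This keeps the byte pattern 00 00 01 (a start code prefix) from
    appearing inside the payload.
    """
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

A decoder would reverse the second step by dropping each `0x03` that follows two zero bytes before parsing the payload.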




Network Abstraction Layer
- NAL units in byte-stream format
  - Some systems (such as H.320 and MPEG-2/H.222.0) require delivery of the NAL unit stream as an ordered stream of bytes
  - Each NAL unit is prefixed by a unique start code that identifies its boundary
- NAL units in packet-transport systems
  - Coded data is carried in packets framed by the system transport protocol
  - NAL units can be carried in data packets without start code prefixes
  - In such systems, including start code prefixes in the data would be a waste
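To make the byte-stream framing concrete, here is a minimal Python sketch that splits a byte stream into NAL units at each start code prefix. The function name is hypothetical, and the three-byte prefix `00 00 01` (optionally preceded by extra zero bytes) is assumed from the common description of the byte-stream format. A packet-based system would skip this step, because the transport protocol already frames each NAL unit.

```python
START_CODE = b"\x00\x00\x01"


def split_byte_stream(stream: bytes) -> list[bytes]:
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes just before the next start code are padding,
        # not part of the NAL unit, so they are stripped.
        units.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units
```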




Network Abstraction Layer
- VCL and non-VCL NAL units
  - VCL NAL units contain the data that represent the values of the samples in the video pictures
  - Non-VCL NAL units contain additional data such as parameter sets and supplemental enhancement information (SEI)
    - Parameter sets: important header data that can apply to a large number of VCL NAL units
    - SEI: timing information and other supplemental data that may enhance the usability of the decoded video signal but are not necessary for decoding the values of the samples in the pictures




Network Abstraction Layer
- Parameter sets
  - Contain information that is expected to change rarely and that applies to the decoding of a large number of VCL NAL units
  - Divided into two types
    - Sequence parameter sets apply to a series of consecutive coded video pictures
    - Picture parameter sets apply to the decoding of one or more individual pictures within a coded video sequence
  - The two mechanisms decouple the transmission of infrequently changing information from the transmission of the coded picture data
  - Parameter sets can be sent well ahead of the VCL NAL units and repeated to provide robustness against data loss





- A small amount of data (an identifier) can be used to refer to a larger amount of information (the parameter set)
- In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission)





- In other applications, it can be advantageous to convey the parameter sets "out of band" using a more reliable transport mechanism




Network Abstraction Layer
- Access units
  - The format of an access unit is shown below
  - An access unit contains a set of VCL NAL units that together compose a primary coded picture
  - It may be prefixed with an access unit delimiter to aid in locating the start of the access unit
  - SEI contains data such as picture timing information
  - The primary coded picture consists of VCL NAL units containing slices that represent the samples of the video picture
  - Redundant coded pictures are available for use by a decoder in recovering from loss of data





- For the last coded picture of a video sequence, an end of sequence NAL unit may be present to indicate the end of the sequence
- For the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to indicate that the stream is ending
- Decoders are not required to decode redundant coded pictures if they are present
- Decoding each access unit results in one decoded picture




Network Abstraction Layer
- Coded video sequences
  - A coded video sequence consists of a series of access units that use only one sequence parameter set
  - It can be decoded independently of any other coded video sequence, given the necessary parameter sets
  - An instantaneous decoding refresh (IDR) access unit is at the beginning and contains an intra picture
  - The presence of an IDR access unit indicates that no subsequent picture will reference pictures prior to the intra picture it contains




Conclusion (I)
H.264/AVC represents a number of advances in standard video coding technology in terms of flexibility for effective use over a broad variety of network types and application domains.




Design feature highlight
- Variable block-size motion compensation with small block sizes
  - The minimum luma block size is as small as 4x4
  - The matching chroma block is half the length and width




Design feature highlight
- Quarter-sample-accurate motion compensation
  - Half-sample positions are generated using a 6-tap FIR filter
  - First found in an advanced profile of MPEG-4, but H.264/AVC further reduces the complexity of the interpolation
- Multiple reference picture motion compensation
  - Extends the enhanced reference picture selection technique found in H.263++
  - The encoder selects among a large number of pictures that have been decoded and stored in the decoder for motion-compensated prediction
  - The same extension applies to bi-prediction, which is restricted in MPEG-2




Design feature highlight
- Decoupling of referencing order from display order
  - Prior standards imposed a strict dependency between the ordering of pictures for referencing and for display
  - H.264/AVC allows the encoder to choose the ordering of pictures for referencing and display purposes with a high degree of flexibility
  - The flexibility is constrained by the total memory capacity
  - Removing the restriction also allows removing the extra delay associated with bi-predictive coding




Design feature highlight
- Motion vectors over picture boundaries
  - Motion vectors are allowed to point outside the picture
  - Especially useful for small pictures and camera movement
- Decoupling of picture representation methods from picture referencing capability
  - In prior standards, bi-predictively encoded pictures could not be used as references
  - Gives the encoder more flexibility to use, for referencing, a picture that is closer to the picture being coded




Design feature highlight
- Weighted prediction
  - Allows the motion-compensated prediction signal to be weighted and offset by amounts specified by the encoder
  - Improves coding efficiency for scenes containing fades
- Figure note: one grid cell represents one pixel
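A minimal Python sketch of weighting and offsetting a single prediction sample is given below. The function name and the parameter names (`weight`, `offset`, `log_wd`) are illustrative assumptions, and the fixed-point form with a rounding shift is a simplification of how explicit weighting is usually described, not a transcription of the standard.

```python
def weighted_prediction(sample: int, weight: int, offset: int,
                        log_wd: int = 5, max_val: int = 255) -> int:
    """Scale a motion-compensated sample by weight / 2**log_wd, then add offset."""
    if log_wd >= 1:
        rounding = 1 << (log_wd - 1)
        value = ((sample * weight + rounding) >> log_wd) + offset
    else:
        value = sample * weight + offset
    # Clip the result to the valid sample range.
    return max(0, min(max_val, value))
```

For a fade to black, an encoder could pick a weight below `2**log_wd` so that the reference picture is dimmed before it is used for prediction, which leaves a smaller residual to code.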




Design feature highlight
- Improved "skipped" and "direct" motion inference
  - In prior standards, a "skipped" area of a predictively coded picture could not move in the scene content, which is detrimental for video containing global motion
  - H.264/AVC instead infers motion in "skipped" areas
  - For bi-predictively coded areas, it improves further on the "direct" prediction found in H.263+ and MPEG-4




Design feature highlight
- Directional spatial prediction for intra coding
  - Edges of previously decoded parts of the current picture are extrapolated and applied in intra-coded regions of the picture
  - Improves the quality of the prediction signal
  - Allows prediction from neighboring areas that were not intra-coded




Design feature highlight
- In-the-loop deblocking filtering
  - Block-based video coding produces artifacts known as blocking artifacts, which can originate from both the prediction and the residual difference coding stages of the decoding process
  - Because the filter operates inside the prediction loop, the improvement in quality can be used in inter-picture prediction to improve the ability to predict other pictures




Design feature highlight
In addition to improved prediction methods, other parts of the design also enhance coding efficiency, including the following.
- Small block-size transform
  - All major prior video coding standards used a transform block size of 8x8, while the new design is based primarily on a 4x4 transform
  - Allows the encoder to represent the signal in a more locally adaptive fashion and reduces artifacts
- Short word-length transform
  - Arithmetic processing needs only 16 bits instead of 32 bits




Design feature highlight
- Hierarchical block transform
  - Extends the effective block size for low-frequency chroma information to an 8x8 array and for low-frequency luma information to a 16x16 array




Design feature highlight
- Exact-match inverse transform
  - Previously, the transform was specified only within an error tolerance bound, because obtaining an exact match to the ideal inverse transform was impractical
  - Each decoder would therefore produce slightly different decoded video, causing "drift" between encoder and decoder
- Arithmetic entropy coding
  - Previously found as an optional feature of H.263
  - H.264/AVC uses a more powerful method, context-adaptive binary arithmetic coding (CABAC)




Design feature highlight
- Context-adaptive entropy coding
  - Both CAVLC (context-adaptive variable-length coding) and CABAC use context-based adaptivity to improve performance




Design feature highlight
Robustness to data errors and losses, and flexibility for operation over a variety of network environments, are enabled by features including the following.
- Parameter set structure
  - Key information is separated so that it can be handled in a more flexible and specialized manner
  - Provides robust and efficient conveyance of header information
- Flexible slice size
  - The rigid slice structure in MPEG-2 reduces coding efficiency by increasing the quantity of header data and decreasing the effectiveness of prediction




Design feature highlight
- NAL unit syntax structure
  - Each syntax structure in H.264/AVC is placed into a logical data packet called a NAL unit
  - Allows greater customization of the method of carrying the video content in a manner appropriate for each specific network
- Redundant pictures
  - Enhance robustness to data loss
  - Enable sending a redundant representation of regions of pictures, for use when the primary representation has been lost




Design feature highlight
- Flexible macroblock ordering (FMO)
  - Partitions the picture into regions called slice groups, with each slice becoming an independently decodable subset of a slice group
  - Can significantly enhance robustness by managing the spatial relationship between the regions that are coded in each slice
- Arbitrary slice ordering (ASO)
  - Enables sending and receiving the slices of a picture in any order relative to each other, a capability first found in H.263+
  - Can improve end-to-end delay in real-time applications, particularly on networks with out-of-order delivery behavior




Design feature highlight
- Data partitioning
  - Allows the syntax of each slice to be separated into up to three different partitions (a header data partition, an intra partition, and an inter partition), depending on a categorization of syntax elements
- SP/SI synchronization/switching pictures
  - Allow exact synchronization of the decoding process of some decoders with an ongoing video stream
  - Enable switching a decoder between video streams that use different data rates, and recovery from data loss or errors
  - Enable switching between different kinds of video streams








Conclusion (II)
H.264/AVC represents a number of advances in standard video coding technology, in terms of both coding efficiency enhancement and flexibility for effective use over a broad variety of network types and application domains.




Video Coding Layer
- Pictures, frames, and fields
  - A picture can represent either an entire frame or a single field
  - If the two fields of a frame were captured at different time instants, the frame is referred to as an interlaced frame; otherwise it is referred to as a progressive frame




Video Coding Layer
- YCbCr color space and 4:2:0 sampling
  - Y represents brightness
  - Cb and Cr represent the extent to which the color deviates from gray toward blue and red, respectively
- Division of the picture into macroblocks
- Slices and slice groups
  - A slice is a sequence of macroblocks processed in raster-scan order when FMO is not used
  - Some information from other slices may be needed to apply the deblocking filter across slice boundaries




Video Coding Layer
- A picture may be split into one or more slices; the case without FMO is shown below
- FMO modifies the way pictures are partitioned into slices and MBs by using slice groups
- A slice group is a set of MBs defined by an MB-to-slice-group map, which is specified by the picture parameter set and some information from the slice headers




Video Coding Layer
- Each slice group can be partitioned into one or more slices, such that a slice is a sequence of MBs within the same slice group, processed in raster-scan order
- Using FMO, a picture can be split into many macroblock scanning patterns, such as those shown below
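One scanning pattern often used to illustrate FMO is a checkerboard with two slice groups. The Python sketch below builds such an MB-to-slice-group map and lists the MBs of one group in raster-scan order. Both function names are hypothetical, and the checkerboard is only one example pattern, chosen here for illustration.

```python
def checkerboard_map(width_mbs: int, height_mbs: int) -> list[list[int]]:
    """MB-to-slice-group map with two groups arranged as a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def mbs_in_group(mb_map: list[list[int]], group: int) -> list[int]:
    """Raster-scan MB addresses that belong to the given slice group."""
    width = len(mb_map[0])
    return [y * width + x
            for y, row in enumerate(mb_map)
            for x, g in enumerate(row)
            if g == group]
```

With this map, if the slices of one group are lost, every missing MB still has neighbors from the other group, which helps error concealment.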




Video Coding Layer
- Each slice can be coded using one of the following coding types
  - I slice: all MBs of the slice are coded using intra prediction
  - P slice: in addition to intra prediction, MBs can be coded using inter prediction with at most one motion-compensated prediction signal
  - B slice: in addition to the coding types of a P slice, MBs can be coded using inter prediction with two motion-compensated prediction signals
  - SP (switching P) slice: enables efficient switching between different pre-coded pictures
  - SI (switching I) slice: allows an exact match of a macroblock in an SP slice for random access and error recovery




Video Coding Layer
- When a decoder switches to stream B and all slices in stream B are P slices, the decoder will not have the correct reference frame; one solution is to code the frame at the switching point as an I slice, as shown below
- I slices result in a peak in the coded bit rate at each switching point




Video Coding Layer
- SP slices are designed to support switching without the increased bit-rate penalty of I slices
- Unlike in a "normal" P slice, the subtraction of the prediction occurs in the transform domain




Video Coding Layer
- A simplified diagram of the encoding and decoding process for SP slices A2, B2, and AB2 is shown below (A' denotes a reconstructed frame)




Video Coding Layer
- If streams A and B are versions of the same original sequence coded at different bit rates, the SP slice AB2 should be efficient to code




Video Coding Layer
- Another use of SP slices is to provide random access and "VCR-like" functionality (e.g., a decoder can fast-forward from frame A0 directly to frame A10 by first decoding A0 and then decoding SP slice A0-10)
- A second type of switching slice, the SI slice, may be used to switch from one sequence to a completely different sequence




Video Coding Layer
- Encoding and decoding process for macroblocks
  - All luma and chroma samples of an MB are either spatially or temporally predicted
  - Each color component of the prediction residual is subdivided into 4x4 blocks; each block is transformed using an integer transform, and the coefficients are then quantized and encoded using entropy coding methods
  - The input video signal is split into MBs, and the association of MBs to slice groups and slices is selected
  - Efficient parallel processing of MBs is possible when there are several slices in the picture




Video Coding Layer
- Encoding and decoding process for macroblocks
  - The block diagram of the VCL for an MB is shown below




Video Coding Layer
- Adaptive frame/field coding operation
  - In regions of moving objects or camera motion, two adjacent rows show a reduced degree of statistical dependency in interlaced frames compared with progressive frames
  - To provide high coding efficiency, H.264/AVC allows the following decisions when coding a frame
    - Combine the two fields and code them as one single frame (frame mode)
    - Do not combine the two fields, and code them as separate coded fields (field mode)
    - Combine the two fields and compress them as a single frame, but before coding, split each pair of vertically adjacent MBs into either a pair of field MBs or a pair of frame MBs




Video Coding Layer
- The choice among the three options can be made adaptively; the choice between the first two is referred to as picture-adaptive frame/field (PAFF) coding
- When a frame is coded as two fields, each field is coded in ways similar to a frame, with the following exceptions
  - Motion compensation uses reference fields rather than reference frames
  - The zig-zag scan is different
  - Strong deblocking is not used for filtering horizontal edges of MBs in fields
- If a frame consists of mixed regions, it is more efficient to code the nonmoving regions in frame mode and the moving regions in field mode




Video Coding Layer
- The frame/field encoding decision can be made independently for each vertical pair of MBs in a frame. This coding option is referred to as macroblock-adaptive frame/field (MBAFF) coding. The figure below shows the MBAFF MB pair concept.




Video Coding Layer
- An important distinction between PAFF and MBAFF is that in MBAFF, one field cannot use the MBs in the other field of the same frame as a reference for motion prediction
- Sometimes PAFF coding can be more efficient than MBAFF coding, particularly in the case of rapid global motion, scene change, or intra picture refresh




Video Coding Layer
- Intra-frame prediction
  - In all slice coding types, the following intra coding types are supported: Intra_4x4 or Intra_16x16 together with chroma prediction, and the I_PCM prediction mode
  - Intra_4x4 mode is based on predicting each 4x4 luma block separately and is suited for parts of a picture with significant detail
  - In this mode, each 4x4 block is predicted from neighboring samples, as shown below





- 4x4 block prediction modes
  - Apart from the "DC" prediction mode, each mode is suited to predicting textures with structures in the specified direction





- In an earlier draft, the four samples below L were also used for some prediction modes. They were dropped to reduce memory access
- Intra modes for neighboring 4x4 blocks are highly correlated. For example, if previously encoded 4x4 blocks A and B were predicted using mode 2, it is likely that the best mode for block C is also mode 2





- Intra_16x16 mode is suited for smooth areas of a picture
- This mode offers four prediction modes: vertical, horizontal, DC, and plane prediction
- Plane prediction works well in areas of smoothly varying luminance





- The chroma samples of an MB are predicted using a technique similar to Intra_16x16 prediction (the same four modes)
- I_PCM mode allows the encoder to bypass the prediction and transform coding processes and instead directly send the values of the encoded samples
- I_PCM mode serves the following purposes
  - Allows the encoder to represent the values of the samples precisely
  - Provides a way to accurately represent the values of anomalous picture content
  - Enables placing a hard limit on the number of bits a decoder must handle for an MB without harm to coding efficiency





- A constrained intra coding mode allows prediction only from intra-coded neighboring MBs
- Intra prediction across slice boundaries is not used
- These restrictions matter because referring to neighboring samples of previously coded blocks may cause error propagation in environments with transmission errors




Video Coding Layer
- Inter-frame prediction in P slices
  - Each P MB type corresponds to a specific partition of the MB into blocks, as shown below
  - This method of partitioning MBs is known as tree-structured motion compensation





- Choosing a larger partition size means
  - A small number of bits is required to signal the choice of MVs and the type of partition
  - The motion-compensated residual may contain a significant amount of energy in frame areas with high detail
- Choosing a smaller partition size means
  - A lower-energy residual after motion compensation
  - A larger number of bits is required to signal the MVs and the type of partition
- The accuracy of motion compensation is in units of one quarter of the distance between two luma samples





- Half-sample values are obtained by applying a one-dimensional 6-tap FIR filter horizontally and vertically
- The 6-tap interpolation filter is relatively complex but produces a more accurate fit to the integer-sample data and hence better motion compensation performance
- Quarter-sample values are generated by averaging samples at integer- and half-sample positions








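Video Coding Layer
- The figure above illustrates the half-sample interpolation. Uppercase letters denote integer-position samples; cc, dd, ee, ff, and $m_1$ denote intermediate half-sample values obtained in the same way as $h_1$

$b_1 = E - 5F + 20G + 20H - 5I + J$
$h_1 = A - 5C + 20G + 20M - 5R + T$
$b = (b_1 + 16) \gg 5$
$h = (h_1 + 16) \gg 5$
$j_1 = cc - 5\,dd + 20\,h_1 + 20\,m_1 - 5\,ee + ff$
$j = (j_1 + 512) \gg 10$
$a = (G + b + 1) \gg 1$
$e = (b + h + 1) \gg 1$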





The following illustrates the luma quarter-pel positions

a = round ((G+b)/2)

d = round ((G+h)/2)

e = round ((h+b)/2)
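The luma interpolation can be sketched in Python as follows. The function names are hypothetical. The tap weights (1, -5, 20, 20, -5, 1), the rounding shifts, and the final clipping to the 8-bit range are taken from the usual description of the H.264 luma interpolation, so treat this as an illustration rather than a conformant implementation.

```python
def clip255(value: int) -> int:
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def six_tap(p0: int, p1: int, p2: int, p3: int, p4: int, p5: int) -> int:
    """Unrounded 6-tap FIR output (e.g., b1 from E, F, G, H, I, J)."""
    return p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5


def half_sample(samples: list[int]) -> int:
    """Half-sample value such as b or h from six integer samples."""
    return clip255((six_tap(*samples) + 16) >> 5)


def center_half_sample(intermediates: list[int]) -> int:
    """Value j from six unrounded intermediates (cc, dd, h1, m1, ee, ff)."""
    return clip255((six_tap(*intermediates) + 512) >> 10)


def quarter_sample(s0: int, s1: int) -> int:
    """Quarter-sample value as the rounded average of two neighbors."""
    return (s0 + s1 + 1) >> 1
```

For example, with E..J = 10, 20, 30, 40, 50, 60, `half_sample` gives b = 35, and `quarter_sample(30, 35)` gives a = 33 for the position between G and b.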




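Video Coding Layer
- The prediction values for the chroma components are obtained by bilinear interpolation
- The displacements used for chroma have one-eighth-sample position accuracy
- In the formula below, A, B, C, and D are the four surrounding integer-position chroma samples, and dx and dy are the horizontal and vertical offsets in eighths of a sample

a = round([(8-dx)(8-dy)A + dx(8-dy)B + (8-dx)dyC + dxdyD]/64)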
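The bilinear chroma formula translates almost directly into integer arithmetic. In this sketch the function name is hypothetical, and rounding the division by 64 is implemented by adding 32 before a right shift by 6.

```python
def chroma_sample(a: int, b: int, c: int, d: int, dx: int, dy: int) -> int:
    """Bilinear interpolation between integer chroma samples A, B, C, D.

    dx and dy are the fractional offsets in eighths of a sample (0..7).
    """
    total = ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
             + (8 - dx) * dy * c + dx * dy * d)
    return (total + 32) >> 6  # rounded division by 64
```

With `dx = dy = 0` the function returns A unchanged, and with `dx = 4, dy = 0` it returns the rounded midpoint of A and B.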





- Motion prediction using full-, half-, and quarter-sample accuracy improves on previous standards for two reasons
  - More accurate motion representation
  - More flexibility in prediction filtering
- MVs are allowed to point over picture boundaries
- No MV prediction takes place across slice boundaries
- Motion compensation for regions smaller than 8x8 uses the same reference index for the prediction of all blocks within the 8x8 region





- The choice of neighboring partitions of the same and of different sizes is shown below
  - For transmitted partitions, excluding the 16x8 and 8x16 partition sizes: MVp is the median of the MVs for partitions A, B, and C




Video Coding Layer
  - For 16x8 partitions: MVp for the upper 16x8 partition is predicted from B, and MVp for the lower 16x8 partition is predicted from A
  - For 8x16 partitions: MVp for the left 8x16 partition is predicted from A, and MVp for the right 8x16 partition is predicted from C
  - For skipped macroblocks: a 16x16 vector MVp is generated as in the median case for transmitted partitions
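The median rule for MVp can be sketched in Python as below. The function names and the tuple representation of a motion vector are assumptions for illustration. The median is taken separately for the horizontal and vertical components, and handling of unavailable neighbors is omitted.

```python
def median_mv_prediction(mv_a: tuple[int, int],
                         mv_b: tuple[int, int],
                         mv_c: tuple[int, int]) -> tuple[int, int]:
    """Component-wise median of the MVs of neighboring partitions A, B, C."""
    xs = sorted((mv_a[0], mv_b[0], mv_c[0]))
    ys = sorted((mv_a[1], mv_b[1], mv_c[1]))
    return (xs[1], ys[1])


def mv_difference(mv: tuple[int, int], mvp: tuple[int, int]) -> tuple[int, int]:
    """Difference between the actual MV and its prediction."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])
```

Only the difference between the actual vector and MVp needs to be coded, which is small when neighboring partitions move together.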




Video Coding Layer
- A P MB can also be coded in the P_Skip type, so that large areas with no change or with constant motion, such as slow panning, can be represented with very few bits
- Multi-picture motion-compensated prediction is supported, as shown below




Video Coding Layer
- Inter-frame prediction in B slices
  - Intra coding is also supported
  - Four other prediction types are supported: list 0, list 1, bi-predictive, and direct prediction
  - In bi-predictive mode, the prediction signal is formed by a weighted average of the motion-compensated list 0 and list 1 prediction signals
  - Direct mode can be list 0 or list 1 prediction, or bi-predictive
  - Multi-frame motion compensation is supported




Video Coding Layer
- Transform, scaling, and quantization
  - The transform is applied to 4x4 blocks
  - Instead of a DCT, a separable integer transform with properties similar to those of a DCT is used
  - Inverse-transform mismatches are avoided
  - At the encoder, the process includes a forward transform, scanning, scaling, and rounding as the quantization process, followed by entropy coding
  - At the decoder, the inverse of the encoding process is performed, except for the rounding
  - The inverse transform is implemented using only additions and bit-shifting operations on 16-bit values
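As an illustration of the integer transform, the Python sketch below computes the forward 4x4 "core" transform as a pair of matrix products. The helper names are hypothetical, and the matrix `CF` is the forward transform matrix as it is commonly given for H.264. The scaling that makes the transform orthonormal is folded into quantization and is not shown.

```python
# Forward core transform matrix (integer approximation of a 4x4 DCT).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m: list[list[int]]) -> list[list[int]]:
    return [list(row) for row in zip(*m)]


def forward_core_transform(x: list[list[int]]) -> list[list[int]]:
    """Return W = CF * X * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, x), transpose(CF))
```

A flat residual block produces a single nonzero coefficient at position (0, 0), which is the energy compaction the transform is meant to provide.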




Video Coding Layer
- Reasons for using a smaller transform size
  - Removes statistical correlation efficiently
  - Has visual benefits, resulting in less noise around edges
  - Requires fewer computations and a smaller processing word length
- The quantization parameter (QP) can take 52 values
- Qstep doubles in size for every increment of 6 in QP
- An increase of 1 in QP therefore increases Qstep by approximately 12% (a factor of 2^(1/6))
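The relation between QP and Qstep can be tabulated with a few lines of Python. The six base step sizes for QP 0 to 5 are the values commonly quoted for H.264. They are an assumption here because the slide does not list them, and the function name is hypothetical.

```python
# Step sizes for QP = 0..5; every further 6 QP steps double the value.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Quantizer step size for a QP in the range 0..51 (52 values)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Because the base values are rounded, the ratio between neighboring step sizes varies slightly around the ideal factor of 2^(1/6).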




Video Coding Layer
- The wide range of quantizer step sizes makes it possible for an encoder to control the trade-off between bit rate and quality accurately and flexibly
- The values of QP may differ between luma and chroma; QPchroma is derived from QPY using a user-defined offset
- 4x4 luma DC coefficient transform and quantization (16x16 intra mode only)
  - The DC coefficient of each 4x4 block is transformed again using a 4x4 Hadamard transform
  - In an intra-coded MB, much of the energy is concentrated in the DC coefficients, and this extra transform helps to de-correlate the 4x4 luma DC coefficients




Video Coding Layer
- 2x2 chroma DC coefficient transform and quantization: as with the Intra luma DC coefficients, the extra transform helps to de-correlate the 2x2 chroma DC coefficients and improves compression performance
- The complete process

  - Encoding
    - Input: 4x4 residual samples, $X$
    - Forward "core" transform: $W = C_f X C_f^T$ (followed by a forward transform for Chroma DC or Intra-16 Luma DC coefficients)
    - Post-scaling and quantization: $Z = W \cdot \dfrac{PF}{Q_{step} \cdot 2^{qbits}}$ (modified for Chroma DC or Intra-16 Luma DC)




Video Coding Layer
- Decoding
  - (Inverse transform for Chroma DC or Intra-16 Luma DC coefficients)
  - Re-scaling (incorporating inverse transform pre-scaling): $W' = Z \cdot Q_{step} \cdot PF \cdot 64$ (modified for Chroma DC or Intra-16 Luma DC coefficients)
  - Inverse "core" transform: $X' = C_i^T W' C_i$
  - Post-scaling: $X'' = \mathrm{round}(X'/64)$
  - Output: 4x4 residual samples, $X''$




Video Coding Layer
- Flow chart
- An additional 2x2 transform is also applied to the DC coefficients of the four 4x4 blocks of each chroma component




Video Coding Layer
- Entropy coding
  - The simpler method uses a single infinite-extent codeword table for all syntax elements except the residual data
  - Only the mapping to the codeword table is customized according to the data statistics
  - The codeword table chosen is an Exp-Golomb code with simple and regular decoding properties
  - In CAVLC, the VLC tables for various syntax elements are switched depending on already transmitted syntax elements
  - In CAVLC, the number of nonzero quantized coefficients and the actual size and position of the coefficients are coded separately
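The regular structure of the Exp-Golomb code makes it easy to sketch. In the Python below, codewords are represented as strings of "0" and "1" for readability. The function names are hypothetical, and the signed mapping shown is the usual one in which positive values take odd code numbers.

```python
def encode_ue(code_num: int) -> str:
    """Exp-Golomb codeword: N leading zeros, then the (N+1)-bit value code_num + 1."""
    value = code_num + 1
    prefix_len = value.bit_length() - 1
    return "0" * prefix_len + format(value, "b")


def signed_to_code_num(v: int) -> int:
    """Map a signed value to a code number (1 -> 1, -1 -> 2, 2 -> 3, ...)."""
    return 2 * v - 1 if v > 0 else -2 * v


def decode_ue(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one codeword starting at pos; return (code_num, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end
```

The decoder only has to count leading zeros to know how many more bits to read, which is the "simple and regular decoding property" noted above.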





- The VLC tables are designed to match the corresponding conditioned statistics
- CAVLC encoding of a block of transform coefficients proceeds as follows
  - Encode the number of nonzero coefficients and "trailing 1s"
    - coeff_token encodes both the total number of nonzero coefficients (TotalCoeffs) and the number of trailing +/-1 values (T1)
    - TotalCoeffs: 0 to 16; T1: 0 to 3
    - There are 4 look-up tables for coeff_token (3 VLC and 1 FLC)
  - Encode the sign of each T1
    - Coded in reverse order, starting with the highest-frequency T1





- CAVLC encoding of a block, remaining steps
  - Encode the levels of the remaining nonzero coefficients
    - Coded in reverse order
    - There are 7 VLC tables to choose from
    - The choice of table adapts depending on the magnitude of each coded level
  - Encode the total number of zeros before the last coefficient
    - TotalZeros is the sum of all zeros preceding the highest nonzero coefficient in the reordered array
    - Coded with a VLC
  - Encode each run of zeros
    - Encoded in reverse order
    - The VLC is chosen depending on ZerosLeft and run_before
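The quantities that CAVLC codes can be derived from a zig-zag ordered coefficient array as in the following Python sketch. It extracts the symbols only and does not perform the VLC table look-ups. The function name and the returned dictionary keys are illustrative assumptions.

```python
def cavlc_symbols(coeffs: list[int]) -> dict:
    """Derive the CAVLC symbols from zig-zag ordered coefficients."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing +/-1 values, counted from the highest frequency, at most 3.
    trailing_ones = 0
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Zeros that precede the highest-frequency nonzero coefficient.
    total_zeros = nonzero[-1] + 1 - total_coeffs if nonzero else 0

    # Remaining levels and zero runs, both in reverse (high to low) order.
    reversed_levels = [coeffs[i] for i in reversed(nonzero)]
    runs = [nonzero[k] - nonzero[k - 1] - 1
            for k in range(len(nonzero) - 1, 0, -1)]

    return {
        "TotalCoeffs": total_coeffs,
        "T1": trailing_ones,
        "T1_signs": ["-" if v < 0 else "+" for v in reversed_levels[:trailing_ones]],
        "levels": reversed_levels[trailing_ones:],
        "TotalZeros": total_zeros,
        "run_before": runs,
    }
```

For the array 0, 3, 0, 1, -1, -1, 0, 1 followed by zeros, this gives TotalCoeffs = 5, T1 = 3, TotalZeros = 3, remaining levels 1 and 3, and zero runs 1, 0, 0, 1. The run before the lowest-frequency coefficient is not listed because it is implied by the zeros that remain.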









Video Coding Layer
- CABAC allows the assignment of a non-integer number of bits to each symbol of an alphabet
- The use of adaptive codes permits adaptation to non-stationary symbol statistics
- Statistics of already coded syntax elements are used to estimate conditional probabilities, which are used for switching between several estimated probability models
- The arithmetic coding core engine and its associated probability estimation are specified as multiplication-free, low-complexity methods using only shifts and table look-ups




Video Coding Layer
- Coding a data symbol involves the following stages (taking MVDx as an example)
  - Binarization
    - For |MVDx| < 9, binarization is carried out using the following table; larger values are binarized using an Exp-Golomb codeword
    - The first bit of the binarized codeword is bin 1, the second bit is bin 2, and so on





  - Context model selection
    - A context model is selected using the following table
  - Arithmetic encoding
    - The selected context model supplies two probability estimates (for 1 and 0) that determine the sub-range the arithmetic coder uses





  - Probability update
    - For example, if the value of bin 1 is "0", the frequency count of "0" is incremented
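The binarization stage for small values can be sketched as a unary code. This assumes the table referred to above maps 0 to "0", 1 to "10", 2 to "110", and so on, which is how it is commonly presented. The table itself is not reproduced here, so this mapping and the function name are assumptions. Larger magnitudes, which use an Exp-Golomb codeword, are not handled.

```python
def binarize_small_mvd(abs_mvd: int) -> str:
    """Unary binarization for |MVDx| < 9: that many "1" bins, then a "0"."""
    if not 0 <= abs_mvd < 9:
        raise ValueError("only |MVDx| < 9 is handled in this sketch")
    return "1" * abs_mvd + "0"
```

Each character of the returned string is one bin. Bin 1 is the first character, and it is coded with the context model selected for it before the model is updated.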




Video Coding Layer
- In-loop deblocking filter
  - Applied after the inverse transform and before reconstruction of the MB
  - A particular characteristic of block-based coding is the accidental production of visible block structures
  - Block edges are reconstructed with less accuracy than interior pixels, and "blocking" is one of the most visible artifacts
  - The filter has two benefits
    - Block edges are smoothed
    - The filtered MB gives a smaller residual after prediction
  - The filter is adaptive: the filtering strength is controlled by several syntax elements





- The basic idea is that if a relatively large absolute difference between samples near a block edge is measured, it is quite likely a blocking artifact and should be reduced
- If the magnitude of the difference is so large that it cannot be explained by coarse quantization, the edge more likely reflects the actual behavior of the picture and should not be smoothed
- Filtering is applied to the edges of 4x4 blocks





- The choice of filtering outcome depends on the boundary strength and on the gradient of the image samples across the boundary




Video Coding Layer
- In-loop deblocking filter
  - The boundary strength Bs is chosen according to the following table
  - Filter implementation
    - Bs in {1, 2, 3}: a 4-tap linear filter is applied
    - Bs = 4: a 3-, 4-, or 5-tap linear filter may be used




Video Coding Layer
- The figure below shows the principle using a one-dimensional edge
- Whether samples p0 and q0, as well as p1 and q1, are filtered is determined using quantization parameter (QP) dependent thresholds α(QP) and β(QP), where β(QP) is smaller than α(QP)




Video Coding Layer
- Filtering of p0 and q0 takes place only if each of the following conditions is satisfied
  1. |p0 - q0| < α(QP)
  2. |p1 - p0| < β(QP)
  3. |q1 - q0| < β(QP)
- Filtering of p1 or q1 takes place if the corresponding condition below is satisfied
  1. |p2 - p0| < β(QP) (for p1)
  2. |q2 - q0| < β(QP) (for q1)
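The filtering decisions above can be expressed as simple predicates. In this Python sketch the thresholds `alpha` and `beta` are passed in directly, because the QP-dependent tables are not reproduced here. The function names and the `bs` parameter (a boundary strength of 0 is assumed to mean the edge is not filtered) are illustrative assumptions.

```python
def should_filter_edge(p: list[int], q: list[int],
                       alpha: int, beta: int, bs: int = 1) -> bool:
    """Decide whether p0 and q0 are filtered.

    p = [p0, p1, p2] and q = [q0, q1, q2] are the samples on either side
    of the edge, ordered from the edge outward.
    """
    return (bs > 0
            and abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


def should_filter_second_sample(side: list[int], beta: int) -> bool:
    """Decide whether p1 (or q1) is filtered, given that side's samples."""
    return abs(side[2] - side[0]) < beta
```

A small step across the edge with flat samples on both sides passes all three tests and is smoothed, while a step larger than `alpha` is treated as a real image edge and left alone.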





Figure panels: Foreman (QCIF, 10 Hz) and Foreman (CIF, 30 Hz)




Video Coding Layer
- Hypothetical reference decoder (HRD)
  - For a standard, it is not sufficient to provide only a coding algorithm
  - In a real-time system, it is important to specify how bits are fed to a decoder and how the decoded pictures are removed from the decoder
  - This is done by specifying input and output buffer models and developing an implementation-independent model of a receiver, called the HRD
  - The HRD specifies the operation of two buffers
    - Coded picture buffer (CPB)
    - Decoded picture buffer (DPB)




Video Coding Layer
- The CPB models the arrival and removal times of the coded bits
- The HRD design is more flexible than that of MPEG-2 in supporting the sending of video at a variety of bit rates without excessive delay
- The HRD specifies a model of DPB management to ensure that excessive memory capacity is not needed in a decoder




Profile and potential application
- Profiles
  - Three profiles are defined: the Baseline, Main, and Extended profiles
  - The Baseline profile supports all features except the following two sets
    - Set 1: B slices, weighted prediction, CABAC, field coding, and picture- or MB-adaptive switching between frame and field coding
    - Set 2: SP/SI slices and slice data partitioning
  - The Main profile supports the first set above, but not FMO, ASO, or redundant pictures
  - The Extended profile supports all features of the Baseline profile and both sets above, except for CABAC




Profile and potential application
- Areas in which the profiles of the new standard may be used; possible application areas are listed below
  - Conversational services
    - H.320 conversational video services that use circuit-switched ISDN-based video conferencing
    - H.323 conversational services over the Internet with best-effort IP/RTP protocols
  - Entertainment video applications
    - Broadcast via satellite, cable, or DSL
    - DVD for standard
    - VOD (video on demand) via various channels




Profile and potential application
  - Streaming services
    - 3GPP streaming using IP/RTP for transport and RTSP for session setup
    - Streaming over the wired Internet using IP/RTP protocols and RTSP for session setup
  - Other services
    - 3GPP multimedia messaging services
    - Video mail




Conclusion (III)
The VCL design is based on conventional block-based hybrid video coding concepts, but with some differences relative to prior standards:
- Enhanced motion-prediction capability
- Use of a small block-size exact-match transform
- Adaptive in-loop deblocking filter
- Enhanced entropy coding methods
