Video Coding Standards


Heejune AHN
Embedded Communications Laboratory

Seoul National Univ. of Technology
Fall 2011

Last updated 2011. 5. 13

Heejune AHN: Image and Video Compression p. 2

Agenda

History and Concepts
JPEG and JPEG-2000
MPEG-1 and MPEG-2
MPEG-4
H.261 and H.263
H.264
Beyond H.264


1. Standards and Standards Bodies

VCEG (Video Coding Experts Group) in ITU (formerly CCITT)
• Focus on real-time, two-way video communication

MPEG/JPEG (Moving Picture Experts Group / Joint Photographic Experts Group) in ISO
• Focus on multimedia storage and distribution for entertainment

Some standards overlap between the two bodies

ISO MPEG/JPEG: JPEG, JPEG-2000, MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21
ITU VCEG: H.261, H.263, H.264
Overlap: MPEG-2 => H.262, and H.264 => MPEG-4/AVC
Later work: H.264 High Profile, H.264 SVC, H.264 MVC, HEVC (H.265)


History of Video Coding Standards

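[Figure: timeline of video coding standards up to 2011; the latest entries are H.264 High Profile (HP), SVC, MVC, and HEVC]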


ISO MPEG/JPEG
• JPEG (1992): compression of still images (DCT)
• MPEG-1 (1993): real-time playback of VHS-quality video on Video CD (1.4 Mbps)
• MPEG-2 (1995): broadcast-quality video service (3~5 Mbps)
• MPEG-4 (1998): wide range of bandwidths (from 20 kbps upward) and object-oriented coding
• JPEG-2000 (2000): better-quality still images

ITU VCEG
• H.261 (1990): video telephony over ISDN (p x 64 kbps)
• H.263 (1995): video telephony over circuit and packet networks, from 20 kbps to high bandwidth
• H.264 (2003): multipurpose, better-quality video coding

Others
• MPEG-7 (Multimedia Content Description Interface): search and retrieval in multimedia databases
• MPEG-21 (Multimedia Framework): interoperable multimedia delivery


Standards process and usage

Standards process

Understanding standards
• Only the syntax and the decoder are defined in a standard.
• The encoder, the application, and the implementation are left open to users.
• Standards provide "profiles and levels" and recommended usage to help users choose among the many technical options.

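[Figure: standards process flow]
Scope & aim of the standard -> Proposals from companies and universities -> Test model (documents & reference SW) -> Draft standard -> International standard
The test model is refined in a loop: improvement proposals are followed by performance & complexity evaluation.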


2. JPEG

ISO IS-10918, by ISO/IEC JTC1/SC29/WG10 (1984~1992)
Widely used on the WWW and in digital photography
Motion-JPEG is just a successive stream of JPEG images


Baseline JPEG Codec

Input: RGB or YCbCr, with the components coded either separately or in interleaved order

[Figure: baseline JPEG encoder block diagram]
input image -> 8x8 blocks -> level offset ([0,255] => [-128,127]) -> 8x8 DCT -> uniform scalar quantization (quantization tables)
• DC quantization indices -> differential coding -> VLC (DC Huffman tables, SSSS-value) -> bits
• AC quantization indices -> zig-zag scan -> run-level coding -> VLC (AC Huffman tables, RRRRSSSS-value) -> bits

Example quantization table:

16 11 10 16  24  40  51  61
12 12 14 19  26  58  60  55
14 13 16 24  40  57  69  56
14 17 22 29  51  87  80  62
18 22 37 56  68 109 103  77
24 36 55 64  81 104 113  92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103  99
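The baseline encoder path for a single 8x8 block can be sketched as below. This is a minimal illustration, not the reference codec: the function names (`dct_8x8`, `zigzag_order`, `encode_block`, `run_level`) are invented for this example, the DCT is the direct slow form, and Huffman coding, DC differential coding, and the handling of zero runs longer than 15 are left out.

```python
import math

# Example quantization table, as listed on the slide.
QUANT = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 36, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (direct form, for clarity only)."""
    n = 8

    def c(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    return [[c(u) * c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def zigzag_order(n=8):
    """(row, col) positions in zig-zag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Odd anti-diagonals run top-right to bottom-left, even ones the other way.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def encode_block(pixels):
    """Level offset, DCT, uniform quantization, zig-zag scan of one 8x8 block."""
    shifted = [[p - 128 for p in row] for row in pixels]   # [0,255] => [-128,127]
    coeffs = dct_8x8(shifted)
    indices = [[round(coeffs[u][v] / QUANT[u][v]) for v in range(8)] for u in range(8)]
    return [indices[r][c] for r, c in zigzag_order()]


def run_level(ac):
    """Run-level pairs for the 63 AC indices; 'EOB' marks trailing zeros."""
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    if run:
        pairs.append("EOB")
    return pairs
```

For a flat block only the DC index is non-zero, so `run_level(encode_block(block)[1:])` collapses to a single end-of-block symbol, which is where most of the compression of smooth regions comes from.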


Lossless JPEG
• DPCM is used, with prediction from 3 neighboring pixels

Optional modes
Progressive encoding
• Store image data in the order DC only, low-frequency AC, high-frequency AC
Hierarchical encoding
• Store image data from low resolution to high resolution

Motion-JPEG
• Just a sequence of JPEG still images
• Low complexity, error tolerance, market awareness
• Used for video conferencing and surveillance before cheap MPEG-1/2/4 solutions became widely available on the market


JPEG-2000

Features
Better compression performance than JPEG
• At high compression ratios, no blocking effects
Good compression for both continuous-tone and bi-level (text) images
Both lossless and lossy compression in one framework
ROI (region of interest) support
Error resilience support (data partitioning)
Rather slow on current embedded systems due to complexity

Encoding process
image -> (Tiling) -> Wavelet transform -> Quantizer -> Arithmetic encoder -> bits


Comparison between JPEG vs. JPEG-2000

[Figure: two decoded images at the same file size]
• Lenna, 256x256 RGB, baseline JPEG: 4572 bytes
• Lenna, 256x256 RGB, JPEG-2000: 4572 bytes


MPEG-1/2

MC-DCT Hybrid Coding

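[Figure: MC-DCT hybrid coder block diagram. Blocks: coder control, intra/inter switch (with 0 as the intra prediction), intra-frame DCT coder (Quant), decoder (DeQ, intra-frame decoder), motion-compensated predictor, motion estimator, entropy coder. Signals passed to the entropy coder: control data, DCT coefficients, motion data.]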


MPEG-1

MPEG-1
Targeted VHS quality (352x288 at 25 fps or 352x240 at 30 fps, YCbCr 4:2:0) on VCD (600 MB)
1.4 Mbps (1.2 Mbps video + 0.2 Mbps audio); a VCD holds 70 minutes
Three parts: Part 1 Systems, Part 2 Video, Part 3 Audio

Technology
MC-DCT hybrid
• Macroblock (16x16 pixels): motion estimation unit
• Block (8x8 pixels): DCT and quantization unit
GOP structure
• I, P, B pictures
• Trade-off between random access and coding efficiency
Asymmetric complexity
• Larger memory and more computation required at the encoder
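Most of the encoder-side cost comes from motion estimation, which the standard leaves to the implementer. A minimal sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost is shown below; the function names, the search range, and the choice of SAD are assumptions made for illustration, not something MPEG-1 prescribes.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n=16, search=7):
    """Try every integer displacement within +/-search and keep the cheapest.

    Frames are lists of rows of luma samples. Returns ((dx, dy), cost).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The decoder only has to copy the block that the transmitted vector points to, which is why the complexity is asymmetric.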


MPEG-1 Structure

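Syntax Hierarchy
Sequence layer
GOP layer
Picture layer
Slice layer
MB layer
Block layer

[Figure: MPEG-1 bit stream structure]
• Sequence: SH GOP SH GOP SH GOP ... (SH: Sequence Header, GOP: Group of Pictures)
• GOP: I B B P B B P ... P
• Picture: divided into slices; each slice is a run of macroblocks (MB MB MB MB ...)
• Macroblock (4:2:0): 16x16 Y as four 8x8 blocks (1-4), plus one 8x8 Cb block (5) and one 8x8 Cr block (6)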


Picture Coding
• I picture: no interframe prediction
• P picture: interframe prediction from one causal (previous) reference picture
• B picture: interframe prediction from one previous and one future picture

GOP and picture order
• Display order (input at encoder): I1 B1 B2 P1 B4 B5 P2 B6 B7 I2
• Transmission order (encoding/decoding order): I1 P1 B1 B2 P2 B4 B5 I2 B6 B7
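Because a B picture needs its future reference before it can be decoded, the encoder sends each I or P picture ahead of the B pictures that precede it in display order. The reordering can be sketched as follows; the function name and the string labels are illustrative only.

```python
def display_to_transmission(display):
    """Reorder pictures so that every B picture follows both of its references.

    `display` is a list of labels such as "I1", "B1", "P1"; the first
    character gives the picture type.
    """
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)      # wait for the next I or P picture
        else:
            out.append(pic)            # the reference goes first
            out.extend(pending_b)      # then the B pictures that depend on it
            pending_b = []
    out.extend(pending_b)              # trailing B pictures, if the input ends on them
    return out


print(display_to_transmission(
    ["I1", "B1", "B2", "P1", "B4", "B5", "P2", "B6", "B7", "I2"]))
```

The decoder applies the inverse reordering, which costs one extra picture of buffering and delay.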


MPEG-2

Major target application
• Digital television quality (720x576 at 25 fps or 720x480 at 30 fps) at 3~4 Mbps

Interlaced video support
• Frame picture vs. field picture: the motion compensation unit
• Frame DCT vs. field DCT within a frame picture

[Figure: a frame picture and its two field pictures; frame DCT vs. field DCT block organization]


Scalability Support
Spatial scalability
• Low resolution at the base layer (BL) and high resolution at the enhancement layer (EL)
• BL is used for prediction of EL
• E.g. SD resolution at BL, HD resolution at EL
Temporal scalability
• 30 fps at BL, 60 fps at EL
SNR scalability
• Same resolution but different quality
Data partitioning
• Coded data is packed into different streams

[Figure: two-layer scalable codec. The input video is down-sampled and fed to the BL encoder, which produces the BL bit stream; the EL encoder produces the EL bit stream. Decoding the BL alone gives lower quality; decoding the EL as well gives higher quality.]


Profile & Level
MPEG-2 has many options; not every implementation needs all of them

Profiles
• Simple: 4:2:0 input, I and P pictures only, low complexity and low performance
• Main: 4:2:0 input, I, P, B pictures, interlaced
• 4:2:2: 4:2:2 input (chroma keeps the same vertical resolution as luma)
• SNR: SNR scalable
• Spatial: spatially scalable
• High: spatial scalability and 4:2:2

Levels
• Low (352x288), Main (720x576), High 1440 (1440x1152), High (1920x1152)

E.g.
• MPEG-1: Main profile & Low level
• SD DTV, DVD: Main profile & Main level
• HDTV: Main profile & High level (historically the target application of MPEG-3)


MPEG-4

Features
Support for low bit rates (from 20 kbps)
Support for object-based coding
• Reuse of components, composition, and interactivity support. In practice, object-based coding is not widely used.

Object-based Coding
Video object (VO)
Shape coding: transparent/opaque regions, binary or grey scale
Texture coding with arbitrary shape
• DCT after zero filling in inter blocks and extrapolation in intra blocks

[Figure: a scene composed of video objects VO1, VO2, and VO3]


Visual data structure

VS: visual sequence / video session
• VO (video object): VO1, VO2, and 2-D/3-D synthetic objects
  • VOL (video object layer): VOL1, VOL2
    • GOV (group of VOPs): GOV1, GOV2
      • VOP (video object plane): VOP1, VOP2
        • MB (macroblock)


H.261

ITU
Mostly focuses on real-time communication

H.261
First video coding standard (1990)
N-ISDN (1990s)
• p x 64 kbps (p = 1, ..., 30), typically 64~384 kbps
• Circuit-switched network: low delay, reliable

H.261 key features
YCbCr 4:2:0
CIF and QCIF input
MC-DCT
Integer-pel motion
Optional loop filter (for deblocking)
• Filtering at 8x8 block boundaries
FEC used


H.261 syntax structure

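H.261 bit structure (header fields are either fixed-length or variable-length codes)

Picture layer: PSC, TR, PTYPE, PEI, PSPARE, then the GOB layer
GOB (group of blocks) layer: GBSC, GN, GQUANT, GEI, GSPARE, then the macroblock layer
Macroblock layer: MBA (or MBA stuffing), MTYPE, MQUANT, MVD, CBP, then the block layer
Block layer: TCOEFF, EOB

[Figure: H.261 picture structure]
• Picture: CIF (352x288) has 12 GOBs, numbered 1 to 12 in two columns; QCIF (176x144) has 3 GOBs, numbered 1, 3, 5
• GOB: 33 macroblocks, numbered 1 to 33 in 3 rows of 11
• Macroblock: 16x16 Y samples plus one 8x8 Cb block and one 8x8 Cr block
• Block: 8x8 samples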


H.263

H.263 Versions
Version 1 (1995)
• Improvement on H.261
• 4 optional modes
Version 2 (1998, H.263+)
• 12 optional modes
Version 3 (2000, H.263++)
• 19 optional modes

Key Features
Targets bit rates down to 20 kbps, and packet-based networks as well
Half-pel prediction
Redesigned 3-D VLC code


H.263 Optional Modes
Annex D: Unrestricted motion vectors
Annex E: Syntax-based arithmetic coding
Annex F: Advanced prediction
Annex G: PB frames
Annex I: Advanced intra coding
Annex J: Deblocking filter
Annex K: Slice structured mode
Annex L: Supplemental enhancement information
Annex M: Improved PB frames
Annex N: Reference picture selection
Annex O: Scalability
Annex P: Reference picture resampling
Annex Q: Reduced resolution update
Annex R: Independent segment decoding
Annex S: Alternative inter VLC
Annex T: Modified quantization
Annex U: Enhanced reference picture selection
Annex V: Data partitioned slice
Annex W: Additional supplemental enhancement information


Performance


H.264

Name
ITU H.264 = ISO MPEG-4 Part 10 / AVC
H.26L: long-term enhancement, not compatible with H.263
Now adopted in DMB-T/S and IPTV, replacing many MPEG-2 solutions
Targets a 50% gain over H.263+


Key features
Smaller processing units (down to 4x4 pixel blocks)
Intra prediction
Inter prediction
• Macroblock-based interframe prediction selection
• ¼-pixel motion vector support
• Motion vector options for sub-blocks
4x4 integer DCT
Deblocking filter
Universal VLC
CABAC (context-based adaptive binary arithmetic coding)


Intra-frame Prediction

Luma
- 4x4: 9 modes
- 16x16: 4 modes

Chroma
- 8x8: 4 modes
- The same prediction mode is always applied to both chroma blocks

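[Figure: intra prediction modes. For a 4x4 luma block, the neighboring samples are A-D above, E-H above-right, I-L to the left, and M above-left; the panels show vertical, horizontal, DC (mean of A-D and I-L), and the directional modes. For 16x16 luma and 8x8 chroma blocks, H is the row above and V is the column to the left; the panels show vertical, horizontal, DC (mean(H, V)), and plane prediction.]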
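Three of the nine 4x4 luma modes (vertical, horizontal, DC) are simple enough to sketch directly. The code below is an illustration under stated assumptions: the function names are invented, only these three modes are covered, all neighbors are assumed available, and the mode is picked by SAD, whereas a real encoder is free to use any decision rule.

```python
def predict_4x4(mode, above, left):
    """Build a 4x4 prediction from neighbors A-D (above) and I-L (left)."""
    if mode == "vertical":        # each column repeats the sample above it
        return [list(above) for _ in range(4)]
    if mode == "horizontal":      # each row repeats the sample to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":              # rounded mean of the eight neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def best_mode(block, above, left):
    """Pick the mode with the smallest sum of absolute differences."""
    def cost(mode):
        pred = predict_4x4(mode, above, left)
        return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))

    mode = min(("vertical", "horizontal", "dc"), key=cost)
    return mode, cost(mode)
```

Only the chosen mode and the residual (block minus prediction) are coded, so a well-predicted block costs very few bits.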


Inter-frame Prediction

Comparison: H.264 vs. MPEG-1/2/4 and H.261/3

References
• H.264: permits up to 15 reference pictures (mostly 2 are used); bi-predictive B-slices; a P-slice may reference a picture that has B-slices; supports explicit weighting coefficients as well as (a+b)/2 averaging
• MPEG-1/2/4, H.261/3: a P picture references only one previous I- or P-picture; bi-directional B pictures; only (a+b)/2 prediction weighting is permitted

Block sizes
• H.264: tree-structured (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4)
• MPEG-1/2/4, H.261/3: either 16x16 or 8x8

Motion estimation
• H.264: half- or ¼-pixel accuracy; 6-tap interpolation filter for half-pixel positions and 2-point linear interpolation for ¼-pixel positions
• MPEG-1/2/4, H.261/3: MPEG-2 permits half-pixel accuracy and MPEG-4 permits ¼-pixel accuracy; 2-point linear interpolation

[Figure: I, B, and P pictures with their prediction references]
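The H.264 luma half-pixel samples come from a 6-tap filter with weights (1, -5, 20, 20, -5, 1)/32, and quarter-pixel samples from a rounded average of two neighbors. A one-dimensional sketch is given below; the function names are illustrative, and the two-dimensional case (the center half-sample) and picture-edge padding are left out.

```python
def half_pel(samples, i):
    """Half-sample value between samples[i] and samples[i + 1].

    Uses the 6-tap weights (1, -5, 20, 20, -5, 1)/32 with rounding and
    clipping to 8 bits. Requires 2 <= i <= len(samples) - 4.
    """
    e, f, g, h, k, m = samples[i - 2:i + 4]
    value = (e - 5 * f + 20 * g + 20 * h - 5 * k + m + 16) >> 5
    return max(0, min(255, value))


def quarter_pel(a, b):
    """Quarter-sample value: rounded 2-point average of two neighbors,
    one of which is typically a half-sample value."""
    return (a + b + 1) >> 1


row = [0, 10, 20, 30, 40, 50]
h = half_pel(row, 2)          # between the samples 20 and 30
q = quarter_pel(row[2], h)    # between the sample 20 and the half-sample
```

The longer filter keeps more high-frequency detail than the 2-point average used in the earlier standards, at the cost of more arithmetic per sample.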



Transform and Quantization

Integer DCT
• No encoder/decoder mismatch

Three types of transform, each followed by quantization
- Type 1: for the 4x4 array of luma DC coefficients in intra MBs predicted in 16x16 mode (block -1)
- Type 2: for the 2x2 arrays of chroma DC coefficients (blocks 16-17)
- Type 3: for all other 4x4 blocks (blocks 0-15 and 18-25)

[Figure: block numbering within a macroblock; each numbered block is 4x4 pixels]
*Data is transmitted in the numbered order

Luma DC (16x16 intra mode only): -1

Luma:
 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Chroma DC: 16 17

Chroma (one 2x2 group of blocks per component):
18 19      22 23
20 21      24 25


Transform and Quantization

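4×4 DCT (X: input, Y: output)

$$Y = A X A^{T}, \qquad A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}, \qquad a = \tfrac{1}{2},\; b = \sqrt{\tfrac{1}{2}}\cos\tfrac{\pi}{8},\; c = \sqrt{\tfrac{1}{2}}\cos\tfrac{3\pi}{8}$$

4×4 integer transform

- forward

$$Y = \left(C_f X C_f^{T}\right) \otimes E_f = W \otimes PF, \qquad C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad E_f = \begin{bmatrix} a^2 & \tfrac{ab}{2} & a^2 & \tfrac{ab}{2} \\ \tfrac{ab}{2} & \tfrac{b^2}{4} & \tfrac{ab}{2} & \tfrac{b^2}{4} \\ a^2 & \tfrac{ab}{2} & a^2 & \tfrac{ab}{2} \\ \tfrac{ab}{2} & \tfrac{b^2}{4} & \tfrac{ab}{2} & \tfrac{b^2}{4} \end{bmatrix}$$

- backward

$$X = C_i^{T} \left(Y \otimes E_i\right) C_i, \qquad C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \tfrac{1}{2} & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}, \qquad E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

with $a = \tfrac{1}{2}$ and $b = \sqrt{\tfrac{2}{5}}$. Here $W = C_f X C_f^{T}$ is the integer core transform, $\otimes$ is element-by-element multiplication, and $E_f$ is the post-scaling factor (PF), which is folded into the quantizer.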
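The split between an integer core transform W and a post-scaling factor PF can be checked numerically. The sketch below assumes the commonly quoted H.264 core matrices and the constants a = 1/2, b = sqrt(2/5); the helper names are invented for this example, and the scaling is done in floating point here, whereas a codec folds it into integer quantization.

```python
import math

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]
A, B = 0.5, math.sqrt(2 / 5)


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scale(m, d):
    """Element-wise product of m with the outer product d d^T."""
    return [[m[i][j] * d[i] * d[j] for j in range(4)] for i in range(4)]


def forward_core(x):
    """W = Cf X Cf^T: additions, subtractions, and doublings only."""
    return matmul(matmul(CF, x), transpose(CF))


def forward(x):
    """Y = W (x) PF, with PF built from (a, b/2, a, b/2)."""
    return scale(forward_core(x), [A, B / 2, A, B / 2])


def inverse(y):
    """X = Ci^T (Y (x) Ei) Ci, with Ei built from (a, b, a, b)."""
    return matmul(matmul(transpose(CI), scale(y, [A, B, A, B])), CI)


x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
w = forward_core(x)            # all entries are integers
x_back = inverse(forward(x))   # matches x up to floating-point error
```

Because the core transform uses exact integer arithmetic, encoder and decoder compute identical results, which is the reason for the "no mismatch" property.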


Entropy Coding

Parameters to be coded, by entropy coding mode

entropy_coding_mode = 0: Variable Length Coding (VLC)
• Macroblock type (intra/inter), coded block pattern, quantizer parameter, reference frame index, motion vector: Exponential Golomb codes (Exp-Golomb)
• Residual data: Context-Adaptive Variable Length Coding (CAVLC)

entropy_coding_mode = 1
• All of the parameters above, including residual data: Context-based Adaptive Binary Arithmetic Coding (CABAC)
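Exp-Golomb codes are simple enough to show in full: a code number k is written as the binary form of k + 1, preceded by as many zeros as that form has bits minus one. The sketch below uses bit strings for readability; the function names are illustrative, and the signed mapping shown is the usual one (positive values to odd code numbers, zero and negative values to even ones).

```python
def ue(code_num):
    """Unsigned Exp-Golomb code word for code_num >= 0, as a bit string."""
    bits = bin(code_num + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def se(value):
    """Signed Exp-Golomb: map the value to a code number, then encode it."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue(code_num)


def decode_ue(bitstring, pos=0):
    """Decode one unsigned code word starting at pos.

    Returns (code_num, position just after the code word).
    """
    zeros = 0
    while bitstring[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bitstring[pos + zeros:end], 2) - 1, end


for k in range(5):
    print(k, ue(k))
```

Small values get short code words, which suits syntax elements such as motion vector differences that cluster around zero.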


Deblocking Filters

A boundary-strength (BS) parameter is assigned to every 4×4 block
• BS = 0: no filtering
• BS = 1-3: slight filtering
• BS = 4: strong filtering

Filters only when all of the following hold
• |P0-Q0| < α
• |P1-P0| < β
• |Q1-Q0| < β
Thresholds α and β depend on the average quantization parameter (QP)

The deblocking filter accounts for about 1/3 of the computational complexity of a decoder.

Block modes and conditions, with the resulting boundary-strength parameter (BS)
• One of the blocks is intra-coded and the edge is a MB edge: 4
• One of the blocks is intra-coded: 3
• One of the blocks has coded residuals: 2
• Difference of block motion ≥ one luma sample distance: 1
• Motion compensation from different reference frames: 1
• Else: 0

Samples across the block edge: P3 P2 P1 P0 | Q0 Q1 Q2 Q3
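The boundary-strength rules and the filter-enable test translate directly into code. In the sketch below the function and parameter names are invented for illustration, and α and β are passed in by the caller: in the standard they come from QP-indexed tables that are not reproduced here.

```python
def boundary_strength(p_intra, q_intra, mb_edge, coded_residual,
                      mv_diff_ge_one_sample, different_refs):
    """Boundary strength for the edge between neighboring blocks P and Q."""
    if p_intra or q_intra:
        return 4 if mb_edge else 3
    if coded_residual:
        return 2
    if mv_diff_ge_one_sample or different_refs:
        return 1
    return 0


def should_filter(bs, p1, p0, q0, q1, alpha, beta):
    """Filter only real blocking artifacts: the step across the edge and the
    gradients on each side must all be below their thresholds."""
    return (bs > 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)


bs = boundary_strength(False, False, True, True, False, False)   # -> 2
smooth = should_filter(bs, 100, 101, 105, 106, alpha=10, beta=4)
edge = should_filter(bs, 100, 101, 180, 181, alpha=10, beta=4)
```

A large step across the edge (the second call) is treated as genuine image content and left unfiltered, which is how the filter avoids blurring true edges.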


Network Adaptation

VCL & NAL
VCL (video coding layer)
NAL (network abstraction layer)

Error Resilient Tools
Flexible macroblock ordering (FMO)
• Allows MBs to be assigned to slices in an order other than scan order
Arbitrary slice ordering (ASO)
• Improves end-to-end delay in real-time applications
Redundant slices (RS)
• Redundant representations are coded using different coding parameters

[Figure: FMO example with macroblocks divided into Slice Group #0 and Slice Group #1]


Profile & Level

Main applications
• Baseline: video telephony
• Main: DTV and storage
• Extended: streaming

Profile & tools


Performance comparison


Contributions of the VCL Tools

• Spatial prediction for intra-coded macroblocks: saves 6-9% bits
• Temporal prediction: saves around 50% bits
• Transforms: PSNR difference of less than 0.02 dB
• Logarithmic quantization: a change in step size by 12% also saves 12% bits
• CAVLC: saves 5-8% bits
• CABAC: saves 5-15% bits over CAVLC
• Picture-adaptive frame/field (PAFF) coding: saves 16-20% bits
• MB-adaptive frame/field (MBAFF) coding: saves 14-16% bits over PAFF
• Deblocking filter: saves 5-10% bits


Conclusion

Many video coding standards
Standards reflect coding technology and implementation technology
Coding performance has improved more than 4 times since H.261 (1990)

What's next
SVC (Scalable Video Coding) in H.264 (done)
H.264ext (further improvement of H.264)
3-D and MVC (Multi-View Coding) are ongoing
UDTV (Ultra Definition TV: 3840x2160)
And what's next?
