Page 1:

Video Concepts and Techniques

Wen-Shyang Hwang, KUAS EE.

Page 2:

Outline

Fundamental Concepts

Basic Video Compression Techniques

MPEG Video Coding I – MPEG-1 and 2

MPEG Video Coding II – MPEG-4, 7, and Beyond

Page 3:

Types of Video Signals

Three types: Component Video, Composite Video, and S-Video.

Component Video - 3 signals: uses three separate video signals for the red, green, and blue image planes. Most computer systems use it. It gives the best color reproduction, since there is no crosstalk between the channels; however, it requires more bandwidth and good synchronization.

Composite Video - 1 signal: chrominance and luminance signals are mixed into a single carrier. Chrominance is a composition of two color components (I and Q, or U and V). A color subcarrier puts chrominance at the high-frequency end of the signal shared with the luminance signal, so there is some interference between the luminance and chrominance signals.

S-Video - 2 signals: uses two wires, one for luminance and one for the composite chrominance signal, so there is less crosstalk between them.

Page 4:

Analog Video

Interlaced scanning: odd-numbered lines are traced first, then even-numbered lines. Horizontal retrace: the jump from Q to R (in the raster-scan figure), during which the electron beam in the CRT is blanked. Vertical retrace: the jump from T to U, or from V to P.

NTSC (National Television System Committee): the TV standard used in North America and Japan. 4:3 aspect ratio (ratio of picture width to height); 525 scan lines per frame at 30 frames per second (fps).

Page 5:

Digital Video

Advantages: video can be stored in memory, ready to be processed (noise removal, cut and paste) and integrated into various multimedia applications; repeated recording does not degrade image quality; ease of encryption and better tolerance to channel noise.

Chroma Subsampling: humans see color with much less spatial resolution than black and white, so how many pixel values actually need to be sent?

Scheme 4:4:4: no chroma subsampling is used; each pixel's Y, Cb, and Cr values are sent.
Scheme 4:2:2: horizontal subsampling of the Cb and Cr signals by a factor of 2. All Ys are sent, and every second pixel's Cb and Cr values are sent.
Scheme 4:1:1: subsamples horizontally by a factor of 4.
Scheme 4:2:0: subsamples in both the horizontal and vertical dimensions by a factor of 2 (used in JPEG and MPEG); a small sketch of this case follows.
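To make the sampling ratios concrete, here is a minimal NumPy sketch of 4:2:0 subsampling (not from the slides): Y is kept at full resolution while Cb and Cr are reduced by a factor of 2 in each dimension. The function name and the simple 2x2 averaging are illustrative assumptions.

```python
import numpy as np

def subsample_420(y, cb, cr):
    """Illustrative 4:2:0 chroma subsampling: Y stays at full resolution;
    Cb and Cr are averaged over 2x2 blocks, halving them in both dimensions."""
    h, w = cb.shape
    cb_sub = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr_sub = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, cb_sub, cr_sub

# A 4x4 frame: 16 Y samples but only 4 Cb and 4 Cr samples survive.
y = np.arange(16, dtype=float).reshape(4, 4)
cb = np.full((4, 4), 128.0)
cr = np.full((4, 4), 128.0)
_, cb2, cr2 = subsample_420(y, cb, cr)
print(cb2.shape, cr2.shape)   # (2, 2) (2, 2)
```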

Page 6:

Video Compression

A video consists of a time-ordered sequence of frames, i.e., images.

Video compression uses (static) predictive coding based on previous frames. Temporal redundancy: consecutive frames in a video are similar, so subtract images in time order and code the residual error. Simply deriving the difference image (subtracting one frame from the other) is ineffective because of object motion (see the sketch after the steps below).

Steps of video compression based on Motion Compensation (MC):
1. Motion Estimation (motion vector search).
2. MC-based Prediction.
3. Derivation of the prediction error, i.e., the difference.
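A small numerical sketch (frame sizes and pixel values are invented for illustration) of why plain frame differencing is ineffective under motion, while subtracting a motion-compensated prediction leaves almost nothing to code:

```python
import numpy as np

def residual_energy(target, prediction):
    """Sum of absolute differences between a target frame and its prediction."""
    return np.abs(target.astype(int) - prediction.astype(int)).sum()

# Previous frame with a bright 2x2 'object'; in the target frame the
# object has moved two pixels to the right.
prev = np.zeros((8, 8), dtype=np.uint8)
prev[3:5, 1:3] = 200
target = np.zeros((8, 8), dtype=np.uint8)
target[3:5, 3:5] = 200

# Plain frame differencing: the residual still contains the object twice.
print(residual_energy(target, prev))          # 1600: large error
# 'Ideal' motion compensation: shift the previous frame by the true motion.
compensated = np.roll(prev, 2, axis=1)
print(residual_energy(target, compensated))   # 0: nothing left to code
```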

Page 7:

Motion Compensation

For efficiency, each image is divided into macroblocks of size N x N. The current image frame is referred to as the Target frame. A match is sought between each macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s), referred to as Reference frame(s).

Motion vector (MV): the displacement of the reference macroblock relative to the target macroblock.

Prediction error: the difference between the two corresponding macroblocks.
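The motion vector search can be illustrated with a toy full-search block-matching routine using the sum of absolute differences (SAD). The tiny block size, search range, and function name are assumptions for this sketch, not values from any standard.

```python
import numpy as np

def motion_vector(target, reference, ty, tx, n=4, p=2):
    """Full-search block matching: find the displacement (dy, dx) within a
    +/- p search window whose n x n reference block best matches (minimum
    SAD) the target macroblock at (ty, tx)."""
    block = target[ty:ty + n, tx:tx + n].astype(int)
    best, best_sad = (0, 0), None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            ry, rx = ty + dy, tx + dx
            if ry < 0 or rx < 0 or ry + n > reference.shape[0] or rx + n > reference.shape[1]:
                continue
            cand = reference[ry:ry + n, rx:rx + n].astype(int)
            sad = np.abs(block - cand).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Reference frame with a small pattern; in the target frame it has moved.
ref = np.zeros((12, 12), dtype=np.uint8)
ref[4:8, 3:7] = np.arange(16, dtype=np.uint8).reshape(4, 4)
tgt = np.zeros((12, 12), dtype=np.uint8)
tgt[5:9, 5:9] = np.arange(16, dtype=np.uint8).reshape(4, 4)

mv, err = motion_vector(tgt, ref, ty=5, tx=5)
print(mv, err)   # (-1, -2) 0 : the matching block sits one row up, two columns left
```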

Page 8:

Video Coding Evolution

Page 9:

H.261

An early digital video compression standard; its principle of MC-based compression is retained in all later video compression standards.

Designed for videophone, video conferencing, and other audiovisual services over ISDN.

The video codec supports bit-rates of p x 64 kbps, where p ranges from 1 to 30 (64 kbps to 1.92 Mbps). The delay of the video encoder must be less than 150 msec so that the video can be used for real-time bidirectional video conferencing.

H.261 Frame Sequence:

Page 10:

H.261 Frame Sequence

Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames). I-frames are treated as independent images; a transform coding method similar to JPEG is applied within each I-frame, hence “Intra”. P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed, not just from a previous I-frame).

Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.

The interval between pairs of I-frames is variable. Usually, an ordinary digital video has a couple of I-frames per second.

Page 11:

Intra-frame (I-frame) Coding

Macroblocks are of size 16X16 pixels for the Y frame, and 8X8 for Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8X8 blocks.

For each 8X8 block a DCT is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
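A sketch of this pipeline for a single 8x8 block, assuming SciPy's dctn for the 2-D DCT and a flat quantizer step chosen only for illustration (H.261 and JPEG signal their quantizers differently than this):

```python
import numpy as np
from scipy.fft import dctn

def zigzag_indices(n=8):
    """Indices of an n x n block in zigzag order (anti-diagonals, alternating direction)."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

# Intra coding of one 8x8 block: level shift, 2-D DCT, uniform quantization,
# zigzag scan. The block values and the quantizer step are made up.
block = np.round(128 + 50 * np.cos(np.arange(8)[:, None] * 0.4)
                 + 30 * np.sin(np.arange(8)[None, :] * 0.7))
coeffs = dctn(block - 128, norm="ortho")           # 2-D DCT of the shifted block
step = 16                                          # illustrative quantizer step
quantized = np.round(coeffs / step).astype(int)    # uniform quantization
scanned = [quantized[i, j] for i, j in zigzag_indices()]
print(scanned[:10])   # low-frequency coefficients come first; the tail is mostly zeros
```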

Page 12:

Inter-frame (P-frame) Predictive Coding

The H.261 P-frame coding scheme is based on motion compensation: for each macroblock in the Target frame, a motion vector is allocated by a search method. After the prediction, a difference macroblock is derived to measure the prediction error.

Each of these 8X8 blocks goes through the DCT, quantization, zigzag scan, and entropy coding procedures.

Sometimes a good match cannot be found; the macroblock is then encoded as an intra macroblock (see the sketch below).

The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock.
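One way to picture the intra/inter decision is a threshold on the motion-estimation error; the threshold value and function below are purely illustrative assumptions, not H.261 syntax:

```python
import numpy as np

# Hypothetical decision sketch: if motion estimation cannot find a
# sufficiently good match (SAD above a chosen threshold), the macroblock
# is coded as an intra macroblock instead of a difference macroblock.
SAD_THRESHOLD = 512   # illustrative value only

def code_macroblock(target_mb, best_match_mb, best_sad):
    if best_sad > SAD_THRESHOLD:
        return "intra", target_mb                                          # code the block itself
    return "inter", target_mb.astype(int) - best_match_mb.astype(int)      # code the residual

mb = np.full((16, 16), 100, dtype=np.uint8)
match = np.full((16, 16), 98, dtype=np.uint8)
sad = np.abs(mb.astype(int) - match.astype(int)).sum()
mode, payload = code_macroblock(mb, match, best_sad=sad)
print(mode, payload.mean())   # inter 2.0 : a small residual is coded
```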

Page 13:

H.261 encoder and decoder

Page 14:

A Glance at Syntax of H.261 Video Bitstream

A hierarchy of four layers: Picture, Group of Blocks (GOB), Macroblock, and Block.

Page 15:

Syntax of H.261

Picture layer: PSC (Picture Start Code) delineates boundaries between pictures. TR (Temporal Reference) provides a time-stamp for the picture.

GOB layer: H.261 pictures are divided into regions of 11X3 macroblocks, each of which is called a Group of Blocks (GOB). In case a network error causes a bit error or the loss of some bits, H.261 video can be recovered and resynchronized at the next identifiable GOB.

Macroblock layer: each Macroblock (MB) has its own Address indicating its position within the GOB, a Quantizer (MQuant), and six 8X8 image blocks (4 Y, 1 Cb, 1 Cr).

Block layer: for each 8X8 block, the bitstream starts with the DC value, followed by pairs of the length of the zero-run (Run) and the subsequent non-zero value (Level) for the AC coefficients, and finally the End of Block (EOB) code.
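The (Run, Level) pairs of the Block layer can be sketched as follows for the zigzag-scanned AC coefficients of one block (toy values, and a plain string standing in for the EOB code):

```python
def run_level_encode(ac_coeffs):
    """Sketch of (Run, Level) coding: each non-zero AC value is sent together
    with the number of zeros preceding it, followed by an end-of-block marker."""
    pairs, run = [], 0
    for c in ac_coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")
    return pairs

# 63 AC coefficients after the zigzag scan (illustrative values).
ac = [5, 0, 0, -3, 2] + [0] * 58
print(run_level_encode(ac))   # [(0, 5), (2, -3), (0, 2), 'EOB']
```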

Page 16:

H.263

An improved video coding standard for video conferencing and other audiovisual services transmitted over Public Switched Telephone Networks (PSTN). It aims at low bit-rate communications, at bit-rates of less than 64 kbps. It uses predictive coding for inter-frames to reduce temporal redundancy, and transform coding of the remaining signal to reduce spatial redundancy (for both intra-frames and inter-frame prediction).

One difference from H.261 is that GOBs in H.263 do not have a fixed size, and they always start and end at the left and right borders of the picture.

Page 17:

Optional H.263 Coding Modes

H.263 specifies many negotiable coding options:
1. Unrestricted motion vector mode
2. Syntax-based arithmetic coding mode
3. Advanced prediction mode
4. PB-frames mode

The PB-frames mode introduces a B-frame (predicted bidirectionally) to improve the quality of prediction. It yields satisfactory results for videos with moderate motion; under large motions, PB-frames do not compress as well as B-frames.

Page 18:

MPEG

MPEG (Moving Picture Experts Group) was established in 1988 for the development of digital video.

MPEG-1 adopts the CCIR601 digital TV format: SIF (Source Input Format). It supports only non-interlaced video.

Normally, MPEG-1 picture resolution is 352X240 for NTSC video at 30 fps, or 352X288 for PAL video at 25 fps. It uses 4:2:0 chroma subsampling (a raw bit-rate estimate follows below).

The MPEG-1 standard has 5 parts: ISO/IEC 11172-1 Systems, 11172-2 Video, 11172-3 Audio, 11172-4 Conformance, and 11172-5 Software.
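A back-of-the-envelope calculation (not from the slides) shows why compression is needed: the raw bit-rate of SIF video at 352x240, 30 fps, 4:2:0 subsampling and 8 bits per sample is roughly 30 Mbit/s.

```python
# Raw (uncompressed) bit-rate of MPEG-1 SIF video: 352x240 luma, 30 fps,
# 4:2:0 subsampling, 8 bits per sample -- a rough figure for comparison only.
width, height, fps, bits = 352, 240, 30, 8
samples_per_frame = width * height + 2 * (width // 2) * (height // 2)  # Y + Cb + Cr
raw_bps = samples_per_frame * bits * fps
print(raw_bps / 1e6)   # about 30.4 Mbit/s before compression
```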

Page 19:

Motion Compensation in MPEG-1

Motion Compensation (MC) based video encoding in H.261 works as follows: in Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best-matching MB from the previously coded I- or P-frame; this is the prediction. The prediction error, i.e. the difference between the MB and its matching MB, is sent to the DCT and its subsequent encoding steps. Since the prediction is from a previous frame, it is called forward prediction.

An MB containing part of a ball in the Target frame may not find a good matching MB in the previous frame, because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.

Page 20:

Motion Compensation in MPEG-1 (Cont'd)

MPEG introduces a third frame type, B-frames, and its accompanying bi-directional motion compensation. Each MB of a B-frame will have up to two motion vectors (MVs), one from the forward and one from the backward prediction.

If matching in both directions is successful, then two MVs will be sent, and the two corresponding matching MBs are averaged (indicated by '%' in the figure) before being compared to the Target MB for generating the prediction error.

If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB will be used from either the forward or backward prediction.
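A minimal sketch of this bidirectional case, assuming a rounded integer average of the two matching MBs (the exact rounding convention here is an illustrative choice):

```python
import numpy as np

def b_prediction(forward_mb, backward_mb):
    """Sketch of B-frame prediction: average the forward and backward matching
    macroblocks when both are available, otherwise use whichever one exists."""
    if forward_mb is None:
        return backward_mb
    if backward_mb is None:
        return forward_mb
    return (forward_mb.astype(int) + backward_mb.astype(int) + 1) // 2  # rounded average

fwd = np.full((16, 16), 100, dtype=np.uint8)
bwd = np.full((16, 16), 110, dtype=np.uint8)
target = np.full((16, 16), 104, dtype=np.uint8)

prediction = b_prediction(fwd, bwd)
error = target.astype(int) - prediction          # prediction error sent to the DCT
print(prediction[0, 0], error.mean())            # 105 -1.0
```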

Page 21:

B-frame Coding Based on Bidirectional Motion Compensation.

MPEG Frame Sequence

MPEG-1

Page 22:

Other Major Differences from H.261

Instead of GOBs as in H.261, an MPEG-1 picture can be divided into one or more slices. Slices may contain variable numbers of macroblocks in a single picture and may start and end anywhere as long as they fill the whole picture. Each slice is coded independently, which gives flexibility in bit-rate control. The slice concept is also important for error recovery.

Page 23:

Typical Sizes of MPEG-1 Frames

The size of compressed P-frames is significantly smaller than that of I-frames, and B-frames are smaller than P-frames (B-frames have the lowest priority).

Page 24:

MPEG-2

MPEG-2: for higher-quality video at a bit-rate of more than 4 Mbps. It defines 7 profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, and Multiview. Within each profile, up to 4 levels are defined.

The DVD video specification allows only 4 display resolutions: 720X480, 704X480, 352X480, and 352X240 (a restricted form of the MPEG-2 Main profile at the Main and Low levels).

Four Levels in the Main Profile of MPEG-2

Page 25:

Supporting Interlaced Video

MPEG-2 supports interlaced video for digital broadcast TV and HDTV.

In interlaced video, each frame (picture) consists of two fields. If each field is treated as a separate picture, it is called a Field-picture.

Five modes of prediction (the wide range of applications means that requirements for the accuracy and speed of motion compensation vary): Frame Prediction for Frame-pictures, Field Prediction for Field-pictures, Field Prediction for Frame-pictures, 16X8 MC for Field-pictures, and Dual-Prime for P-pictures.

Page 26:

MPEG-2 Scalabilities

Layered coding: a base layer and one or more enhancement layers.

MPEG-2 supports the following scalabilities:

SNR Scalability - enhancement layer provides higher SNR.

Spatial Scalability - enhancement layer provides higher spatial resolution (sketched below).

Temporal Scalability - enhancement layer facilitates higher frame rate.

Hybrid Scalability - combination of any two of the above three scalabilities.

Data Partitioning - quantized DCT coefficients are split into partitions.
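Spatial scalability can be pictured with a two-layer toy example: a half-resolution base layer plus an enhancement layer holding the residual. The 2x2 averaging and nearest-neighbour upsampling are simplifying assumptions, not the filters used by MPEG-2.

```python
import numpy as np

def spatial_layers(frame):
    """Illustrative two-layer spatial scalability: the base layer is a
    half-resolution version of the frame (2x2 averaging); the enhancement
    layer is the residual between the frame and the upsampled base layer."""
    h, w = frame.shape
    base = frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    upsampled = np.kron(base, np.ones((2, 2)))      # nearest-neighbour upsampling
    enhancement = frame - upsampled
    return base, enhancement

frame = np.arange(64, dtype=float).reshape(8, 8)
base, enh = spatial_layers(frame)
# A base-layer decoder shows the upsampled base; adding the enhancement
# layer restores the full-resolution frame exactly in this toy example.
print(np.allclose(np.kron(base, np.ones((2, 2))) + enh, frame))   # True
```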

Page 27:

MPEG-4

MPEG-4 adopts a new object-based coding approach (rather than frame-based compression coding).

Object-based coding gives a higher compression ratio and is good for digital video composition, manipulation, indexing, and retrieval.

Its 6 parts are Systems, Video, Audio, Conformance, Software, and DMIF (Delivery Multimedia Integration Framework).

The bit-rate range for MPEG-4 video is now between 5 kbps and 10 Mbps.

Page 28:

Comparison of interactivities in MPEG standards:

The MPEG-4 standard is for: composing media objects to create desirable audiovisual scenes; multiplexing and synchronizing the bitstreams so that they can be transmitted with guaranteed Quality of Service (QoS); and interacting with the audiovisual scene at the receiving end. It provides a toolbox of advanced coding modules and algorithms for audio and video compression.

Reference models in MPEG-1 and 2 (interaction in dashed lines supported only by MPEG-2)

MPEG-4 reference model

Page 29:

Hierarchical structure of MPEG-4 visual bitstreams

Video-object Sequence (VS) - delivers the complete MPEG-4 visual scene, which may contain 2-D or 3-D natural or synthetic objects.

Video Object (VO) - a particular object in the scene, which can be of arbitrary (non-rectangular) shape corresponding to an object or background of the scene.

Video Object Layer (VOL) - facilitates a way to support (multi-layered) scalable coding. A VO can have multiple VOLs under scalable coding, or have a single VOL under non-scalable coding.

Group of Video Object Planes (GOV) - groups Video Object Planes together (optional level).

Video Object Plane (VOP) - a snapshot of a VO at a particular moment.

Each VS will have one or more VOs, each VO will have one or more VOLs, and so on.
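The hierarchy can be mirrored by simple container types; the Python classes and field names below are illustrative only and do not correspond to syntax elements of the standard.

```python
from dataclasses import dataclass, field
from typing import List

# Toy containers mirroring the MPEG-4 visual bitstream hierarchy
# described above (VS -> VO -> VOL -> GOV -> VOP).

@dataclass
class VOP:                     # snapshot of a video object at one time instant
    time: float
    coding_type: str           # "I", "P" or "B"

@dataclass
class GOV:                     # optional grouping of VOPs
    vops: List[VOP] = field(default_factory=list)

@dataclass
class VOL:                     # one (scalability) layer of a video object
    govs: List[GOV] = field(default_factory=list)

@dataclass
class VO:                      # an arbitrarily shaped object or background
    vols: List[VOL] = field(default_factory=list)

@dataclass
class VS:                      # the complete visual scene
    vos: List[VO] = field(default_factory=list)

scene = VS(vos=[VO(vols=[VOL(govs=[GOV(vops=[VOP(0.0, "I"), VOP(0.04, "P")])])])])
print(len(scene.vos[0].vols[0].govs[0].vops))   # 2
```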

Page 30:

VOP-based Coding

MPEG-1 and -2 do not support the VOP concept, and hence their coding method is referred to as frame-based (block-based) coding.

MPEG-4 VOP-based coding also employs the Motion Compensation technique: an intra-frame coded VOP is called an I-VOP; inter-frame coded VOPs are called P-VOPs (forward prediction) or B-VOPs (bi-directional prediction).

(a) A video sequence; (b) MPEG-1 and 2 block-based coding; (c) two potential matches in MPEG-1 and 2; (d) object-based coding in MPEG-4.

Page 31:

ISO MPEG-4 Part 10 / ITU-T H.264

It offers up to 50% better compression than MPEG-2, and up to 30% better than H.263+ and the MPEG-4 advanced simple profile.

It is a leading candidate to carry High Definition TV (HDTV) video content in many potential applications.

Core features: entropy decoding, motion compensation (P-prediction), intra-prediction (I-prediction), transform, scan, quantization, and in-loop deblocking filters.

Baseline profile features: arbitrary slice order (ASO), flexible macroblock order (FMO), redundant slices.

Main profile features: B slices, context-adaptive binary arithmetic coding (CABAC), weighted prediction.

Extended profile features: B slices, weighted prediction, slice data partitioning, SP and SI slice types.

Page 32:

MPEG-7

MPEG-7 serves the need for audiovisual content-based retrieval (or audiovisual object retrieval) in applications such as digital libraries.

Its formal name is Multimedia Content Description Interface.

Page 33:

MPEG-7 and Multimedia Content Description

MPEG-7 has developed Descriptors (D), Description Schemes (DS), and a Description Definition Language (DDL). The following are some of the important terms:

Feature - a characteristic of the data.

Description - a set of instantiated Ds and DSs that describes the structural and conceptual information of the content, the storage and usage of the content, etc.

D - the definition (syntax and semantics) of the feature.

DS - the specification of the structure and relationships between Ds and between DSs.

DDL - syntactic rules to express and combine DSs and Ds.

The scope of MPEG-7 is to standardize the Ds, DSs and DDL for descriptions. The mechanism and process of producing and consuming the descriptions are beyond the scope of MPEG-7.
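As a toy illustration (deliberately not MPEG-7 syntax) of how a Description is a set of instantiated Ds organized by a DS, one might sketch the relationships as plain container types; all names and fields here are invented for the sketch.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Descriptor:              # D: representation of one feature of the content
    feature: str               # e.g. "dominant colour"
    value: object

@dataclass
class DescriptionScheme:       # DS: structure relating several Ds (and other DSs)
    name: str
    descriptors: List[Descriptor]

# A toy description of one video shot: two instantiated Ds grouped by a DS.
shot = DescriptionScheme(
    name="VideoShot",
    descriptors=[Descriptor("dominant colour", (200, 30, 30)),
                 Descriptor("duration_seconds", 12.0)],
)
print(shot.name, len(shot.descriptors))   # VideoShot 2
```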