
Wavelet-based Scalable Video Coding

A Joint Application Assignment Report for the course of Wavelets: EE-678

by

TEAM-1

under the guidance of

Prof. Vikram M. Gadre

Department of Electrical Engineering
Indian Institute of Technology, Bombay

Powai, Mumbai - 400 076.

April, 2012


Abstract

Scalable video coding (SVC) aims at a single bitstream that is adaptable to heterogeneous devices. For example, a mobile phone may need a lower-resolution video than a digital TV. SVC aims to generate one bitstream that can be decoded by both the mobile phone and the TV, each according to its requirements. In this assignment we have simulated a simple model for wavelet-based scalable video coding (WSVC). Our initial attempt was to implement spatially scalable video coding; we later extended it, with a few changes, to quality (PSNR) scalability. The core of the model is based on the set partitioning in hierarchical trees (SPIHT) algorithm.


Contents

1 Introduction
  1.1 Spatial Scalability
    1.1.1 GR coding
  1.2 PSNR Scalability
    1.2.1 SPIHT Algorithm [22]
  1.3 Temporal Scalability

2 Literature Survey
  2.1 Reviewer: Ankit Bhurane (114070004)
    2.1.1 State-of-the-Art and Trends in Scalable Video Compression With Wavelet-Based Approaches [1]
    2.1.2 Wavelet Based Rate Scalable Video Compression [25]
    2.1.3 Transforms for the motion compensation residual [12]
    2.1.4 Advances on Transforms for High Efficiency Video Coding [9]
  2.2 Reviewer: Prateek Chaplot (08D07011)
    2.2.1 Adaptive Bit-Rate Control for Region-of-Interest Scalable Video Coding
    2.2.2 Multiplexing/De-Multiplexing DIRAC Video with AAC Audio Bit-Stream
    2.2.3 Performance analysis of Dirac video codec with H.264/AVC
  2.3 Reviewer: Raj Doshi (08D07031)
    2.3.1 Spatio-Temporal Scalability for MPEG Video Coding [10]
    2.3.2 Scalable video coding using motion-compensated temporal filtering [3]
    2.3.3 Adaptive Temporal Scalability of H.264-compliant Video Conferencing in Heterogeneous Mobile Environments [8]
    2.3.4 An Efficient Inter-Prediction Mode Decision Method for Temporal Scalability Coding With Hierarchical B-Picture Structure [15]
  2.4 Reviewer: Tripti Meena (08D07015)
    2.4.1 Embedded Zerotree Wavelet Based Algorithm For Video Compression [18]
    2.4.2 A Scalable Video Compression Technique Based on Wavelet Transform and MPEG Coding [5]
    2.4.3 A Resolution And Frame Rate Scalable Subband/Wavelet Video Coder [31]
    2.4.4 Bottom-Up Motion Compensated Prediction In Wavelet Domain For Spatially Scalable Video Coding [29]
  2.5 Reviewer: Saket Porwal (11307R006)
    2.5.1 Embedded image coding using zerotrees of wavelet coefficients [24]
    2.5.2 A new, fast, and efficient image codec based on set partitioning in hierarchical trees [21]
    2.5.3 Scalable motion vector coding [2]
    2.5.4 Quantifying the Coding Performance of Zerotrees of Wavelet Coefficients: Degree-k Zerotree [6]
  2.6 Reviewer: Vijayan (08D07026)
    2.6.1 Robust Digital Watermarking for Wavelet-based Compression [11]
    2.6.2 A DWT-based Digital Video Watermarking Scheme with Error Correcting Code [4]
    2.6.3 A Comparison of Temporal Scalability Techniques [7]
  2.7 Reviewer: Rahul Bharadwaj (08D07012)
    2.7.1 Temporal-Scalable Coding Based on Image Content [13]
    2.7.2 Three-dimensional Subband Coding of Video [19]
    2.7.3 Three-Dimensional Wavelet Coding of Video with Global Motion Compensation [30]
    2.7.4 A Scalable Video Compression Technique Based on Wavelet Transform and MPEG Coding [5]
    2.7.5 Improvements in Wavelet-Based Rate Scalable Video Compression by Eduardo Asbun
  2.8 Reviewer: Shrikant Bagde (113230019)
    2.8.1 A Scalable Wavelet Based Video Distortion Metric and Applications [17]
    2.8.2 A Wireless Video Streaming System Based on OFDMA with Multi-Layer H.264 Coding and Adaptive Radio Resource Allocation [17]
    2.8.3 Highly Scalable Video Compression With Scalable Motion Coding [26]
    2.8.4 Rate Control of H.264/AVC Scalable Extension [16]
  2.9 Reviewer: Newton (113079018)
    2.9.1 An embedded wavelet hierarchical image coder [23]
    2.9.2 Highly scalable wavelet-based video codec for very low bit-rate environment [27]
    2.9.3 Image Compression Using the Spatial-Orientation Tree [20]
    2.9.4 Low Bit-Rate Scalable Video Coding with 3-D Set Partitioning in Hierarchical Trees (3-D SPIHT) [14]

3 Work Done and Results
  3.1 Spatial Scalability
  3.2 Temporal Scalability
    3.2.1 Motion Vectors
    3.2.2 Reconstruction
    3.2.3 Difference Encoding

4 MATLAB code

Bibliography


List of Figures

1.1 Spatial-cum-PSNR scalable video coder
1.2 Data Structure used in SPIHT
1.3 Group of frames and bidirectional prediction

3.1 Performance for sequence: CITY_704×576_30_orig_01.yuv
3.2 Performance for sequence: CREW_704×576_30_orig_01.yuv
3.3 Performance for sequence: HARBOUR_704×576_30_orig_01.yuv
3.4 Performance for sequence: ICE_704×576_30_orig_01.yuv
3.5 Performance for sequence: SOCCER_704×576_30_orig_01.yuv
3.6 Performance for sequence: VMG_352×288_25_orig_01.yuv
3.7 Motion estimation and compensation
3.8 Finding the best match and the motion vectors


Chapter 1

Introduction

For video streaming applications, the server normally has to serve heterogeneous devices, ranging from mobile phones and iPads to digital HD TVs. These devices have different resolutions and processing capabilities. Scalable video coding allows adaptation of the codec to such a variety of devices. There are many variants of introducing scalability in a video codec; a few of them are listed below.

• Spatial Resolution Scalability

• Temporal Scalability

• Quantization Noise (SNR) Scalability

• Sharpness/Frequency Scalability

• Robustness Scalability

• View Scalability

• Bit Depth Scalability

• Chroma Scalability

• Interlace-to-Progressive Scalability

• Complexity Scalability

We can go on extending the list as per the requirements. In this assignment we have considered the following three popular scalability techniques.

1.1 Spatial Scalability

A simple two-level (base layer + one enhancement layer) model for wavelet-based spatial-cum-PSNR scalability is shown in Figure 1.1. Here we have considered a spatial resolution factor of "2", i.e., the spatial resolution is doubled as we go up the layers. It should also be noted that initially we consider an "all-intra" frame structure, so the motion-estimation and motion-compensation (ME/MC) scheme is not incorporated in the model shown here; the ME/MC module is brought in later in this report.


Figure 1.1: Spatial-cum-PSNR scalable video coder

Initially, at the encoder side, a raw video frame is given as input to a downsampler, which resizes the frame to half its original size. The resized frame is given as input to the set partitioning in hierarchical trees (SPIHT) algorithm; details of SPIHT are given later in this report. The resultant bitstream constitutes the base layer (BL), i.e., the layer with the lowest resolution. The BL bitstream can then be decoded to get the lowest-resolution version of the video frame. To encode the enhancement-layer information, the BL bitstream is decoded and resized back to the original resolution at the encoder side, and the difference (residue) between the original frame and the decoded-and-resized frame is calculated. This difference is given as input to a three-level conventional DWT, and the resultant subbands of the residual image are quantized by an optimal quantizer. For an L-level 2D DWT of an image with a bit budget of R bpp, the number of bits for the quantizer in the i-th subband can be obtained from [28]:

$$R_i = R + \frac{1}{2}\log_2\frac{\sigma_i^2}{\left(\prod_{k=1}^{3L+1}\sigma_k^2\right)^{1/(3L+1)}}, \qquad 1 \le i \le 3L+1 \qquad (1.1)$$
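As an illustration, the allocation of Eq. (1.1) reduces to a few lines of MATLAB. This is a minimal sketch, assuming sigma2 is a vector holding the 3L+1 subband variances and R is the overall budget in bits per pixel (both hypothetical inputs):

    gmean = exp(mean(log(sigma2)));          % geometric mean of the subband variances
    Ri    = R + 0.5 * log2(sigma2 ./ gmean); % per-subband bit budget, Eq. (1.1)
    Ri    = max(Ri, 0);                      % negative allocations clipped (common practice)

Clipping negative allocations to zero is a common practical step, not part of Eq. (1.1) itself.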

The quantized coefficients contain a large number of zeros, so they can be compacted by run-length encoding (RLE). The coefficients can be further compressed using either Huffman or arithmetic coding. Here, as the image to be encoded is a residual (difference) image, it has an exponential distribution. For such images, Golomb-Rice (GR) coding is found to be more effective than Huffman or arithmetic coding. Moreover, unlike Huffman coding, GR coding does not require any table information to be sent to the decoder.
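To make the data flow of Figure 1.1 concrete, the following is a minimal MATLAB sketch of the two-layer encoder. It assumes the Image Processing and Wavelet Toolboxes for imresize and dwt2; spiht_encode and spiht_decode are hypothetical stand-ins for the SPIHT coder of Section 1.2.1:

    f       = double(frame);                      % original-resolution luma frame
    f_low   = imresize(f, 0.5);                   % downsample by 2 for the base layer
    bl_bits = spiht_encode(f_low);                % base-layer bitstream (hypothetical coder)
    f_rec   = imresize(spiht_decode(bl_bits), 2); % decode and resize back at the encoder
    residue = f - f_rec;                          % residue carried by the enhancement layer
    [cA, cH, cV, cD] = dwt2(residue, 'bior4.4');  % first of the three DWT levels
    % Each subband is then quantized with the budget of Eq. (1.1), run-length
    % encoded, and GR coded into the enhancement-layer bitstream.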

1.1.1 GR coding

• GR Coding Procedure: Given an integer n to be coded and m = 2^k, where k is a positive integer:

1. Map n to n̂ using
$$\hat{n} = M(n) = \begin{cases} 2n, & \text{if } n \ge 0 \\ 2|n| - 1, & \text{if } n < 0 \end{cases} \qquad (1.2)$$

2. Compute the quotient $Q = \lfloor \hat{n}/m \rfloor$ and the remainder $R = \hat{n} - mQ$.

3. Concatenate the unary code of Q (Q zeros followed by a "1") with the k-bit binary code of R.

• GR Decoding Procedure: Given the GR code of an integer and the value of m, decoding proceeds as follows.

1. Compute $k = \log_2 m$.

2. Count the number of zeros starting from the most significant bit position until a "1" is encountered. The number of zeros gives the value of Q. Compute P = m × Q.

3. Treat the remaining k digits of the GR code as the k-bit binary representation of the integer R, so that n̂ = P + R.

4. The encoded integer is
$$n = \begin{cases} \dfrac{\hat{n}}{2}, & \text{if } \hat{n} \text{ is even} \\ -\dfrac{\hat{n}+1}{2}, & \text{if } \hat{n} \text{ is odd} \end{cases} \qquad (1.3)$$
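The two procedures above translate directly into MATLAB. This is a minimal sketch, with m = 2^k, the unary code written as Q zeros followed by a one, and bit strings represented as char vectors of '0'/'1' purely for illustration:

    function code = gr_encode(n, k)
    % GR-encode the integer n with parameter m = 2^k.
        m = 2^k;
        if n >= 0
            nhat = 2*n;                    % Eq. (1.2): map signed n to nhat
        else
            nhat = 2*abs(n) - 1;
        end
        Q = floor(nhat/m);                 % quotient -> unary part
        R = nhat - m*Q;                    % remainder -> k-bit binary part
        code = [repmat('0', 1, Q), '1', dec2bin(R, k)];
    end

    function n = gr_decode(code, k)
    % Recover the integer from its GR code, given k.
        m = 2^k;
        Q = find(code == '1', 1) - 1;      % count of leading zeros gives Q
        R = bin2dec(code(Q+2 : Q+1+k));    % next k bits give R
        nhat = m*Q + R;                    % nhat = P + R, with P = m*Q
        if mod(nhat, 2) == 0
            n = nhat/2;                    % Eq. (1.3): invert the mapping
        else
            n = -(nhat + 1)/2;
        end
    end

For example, gr_encode(-3, 2) maps -3 to n̂ = 5 and yields '0101'; gr_decode('0101', 2) recovers -3.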

1.2 PSNR Scalability

PSNR scalability can be obtained inherently from the SPIHT algorithm. For the base layer we may transmit only a few significant coefficients; enhancement-layer information can be formed by further allowing refinement of the SPIHT coefficients.

PSNR scalability is added into an encoder's bitstream using progressive transmission of bits. Bits corresponding to significant (larger-magnitude) coefficients of the transformed image are transmitted earlier than those of smaller wavelet coefficients. The decoder at the receiver side first displays a coarse image using the important bits and, as more refinement bits are received, renders a finer, higher-quality image with higher PSNR. Hence, due to progressive transmission, the initially rendered image has low PSNR, and as refinement bits are added the PSNR improves. Depending on its requirements and capability, the receiving device can choose how many further refinement bits it needs.

1.2.1 SPIHT Algorithm [22]

The data structure used by the SPIHT algorithm is similar to that used by the EZW algorithm. The wavelet coefficients are again divided into trees originating from the lowest resolution band (band I in our case). The coefficients are grouped into 2 × 2 arrays that, except for the coefficients in band I, are offspring of a coefficient of a lower resolution band. The coefficients in the lowest resolution band are also divided into 2 × 2 arrays. However, unlike the EZW case, all but one of them are root nodes: the coefficient in the top-left corner of each array does not have any offspring. The data structure is shown pictorially in Figure 1.2 for a seven-band decomposition. The trees are further partitioned into four types of sets, which are sets of coordinates of coefficients:

• O(i, j): This is the set of coordinates of the offspring of the wavelet coefficient at location (i, j). As each node can either have four offspring or none, the size of O(i, j) is either zero or four. For example, in Figure 1.2 the set O(0, 1) consists of the coordinates of the coefficients b1, b2, b3, and b4.

• D(i, j): This is the set of all descendants of the coefficient at location (i, j). Descendants include the offspring, the offspring of the offspring, and so on. For example, in Figure 1.2 the set D(0, 1) consists of the coordinates of the coefficients b1, ..., b4, b11, ..., b14, ..., b44. Because the number of offspring can either be zero or four, the size of D(i, j) is either zero or a sum of powers of four.

• H: This is the set of all root nodes; essentially, band I in the case of Figure 1.2.

• L(i, j): This is the set of coordinates of all the descendants of the coefficient at location (i, j) except for its immediate offspring; in other words, L(i, j) = D(i, j) − O(i, j). In Figure 1.2 the set L(0, 1) consists of the coordinates of the coefficients b11, ..., b14, ..., b44.

A set D(i, j) or L(i, j) is said to be significant if any coefficient in the set has a magnitude greater than the threshold. Finally, the thresholds used for checking significance are powers of two, so in essence the SPIHT algorithm sends the binary representation of the integer values of the wavelet coefficients. The bits are numbered with the least significant bit being the zeroth bit, the next bit being the first significant bit, and the kth bit being referred to as the (k − 1)th most significant bit.


Figure 1.2: Data Structure used in SPIHT

With these definitions, the algorithm makes use of three lists: the list of insignificant pixels (LIP), the list of significant pixels (LSP), and the list of insignificant sets (LIS). The LIP and LSP lists contain the coordinates of individual coefficients, while the LIS contains the coordinates of the roots of sets of type D or L. We start by determining the initial value of the threshold. We do this by calculating $n = \lfloor \log_2 C_{\max} \rfloor$, where $C_{\max}$ is the maximum magnitude of the coefficients to be encoded. The LIP list is initialized with the set H. Those elements of H that have descendants are also placed in the LIS as type-D entries. The LSP list is initially empty.

In each pass, we first process the members of the LIP, then the members of the LIS; this is essentially the significance-map encoding step. We then process the elements of the LSP in the refinement step.


We begin by examining each coordinate contained in the LIP. If the coefficient at that coordinate is significant (that is, its magnitude is at least $2^n$), we transmit a 1 followed by a bit representing the sign of the coefficient (we will assume 1 for positive, 0 for negative). We then move that coefficient to the LSP list. If the coefficient at that coordinate is not significant, we transmit a 0.

After examining each coordinate in the LIP, we begin examining the sets in the LIS. If the set at coordinate (i, j) is not significant, we transmit a 0. If the set is significant, we transmit a 1. What we do after that depends on whether the set is of type D or L.

If the set is of type D, we check each of the offspring of the coefficient at that coordinate; in other words, we check the four coefficients whose coordinates are in O(i, j). For each coefficient that is significant, we transmit a 1 and the sign of the coefficient, and then move the coefficient to the LSP. For the rest we transmit a 0 and add their coordinates to the LIP. Now that we have removed the coordinates of O(i, j) from the set, what is left is simply the set L(i, j). If this set is not empty, we move it to the end of the LIS and mark it to be of type L; note that this new entry in the LIS has to be examined during this pass. If the set is empty, we remove the coordinate (i, j) from the list. If the set is of type L, we add each coordinate in O(i, j) to the end of the LIS as the root of a set of type D. Again, note that these new entries in the LIS have to be examined during this pass. We then remove (i, j) from the LIS.

Once we have processed each of the sets in the LIS (including the newly formed ones), we proceed to the refinement step. In the refinement step we examine each coefficient that was in the LSP prior to the current pass and output the nth most significant bit of $|c_{i,j}|$. We ignore the coefficients that have been added to the list in this pass because, by declaring them significant at this particular level, we have already informed the decoder of the value of the nth most significant bit. This completes one pass. Depending on the availability of more bits or external factors, if we decide to continue with the coding process, we decrement n by one and continue.
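A minimal MATLAB sketch of the initialization and the LIP sorting pass follows (not a full coder). C is the matrix of wavelet coefficients; bandI_coords and emit are hypothetical, the former holding the coordinates of the set H and the latter appending one bit to the output stream:

    n   = floor(log2(max(abs(C(:)))));     % initial bit-plane index
    LSP = zeros(0, 2);                     % list of significant pixels, initially empty
    LIP = bandI_coords;                    % LIP initialized with the set H (hypothetical)
    keep = true(size(LIP, 1), 1);
    for idx = 1:size(LIP, 1)               % sorting pass over the LIP
        i = LIP(idx, 1);  j = LIP(idx, 2);
        if abs(C(i, j)) >= 2^n             % significance test at this bit plane
            emit(1);  emit(C(i, j) >= 0);  % a 1 followed by the sign bit
            LSP(end+1, :) = [i j];         % move the coefficient to the LSP
            keep(idx) = false;
        else
            emit(0);                       % coefficient stays in the LIP
        end
    end
    LIP = LIP(keep, :);                    % drop the entries moved to the LSP
    % ...the LIS sets are then processed, the old LSP entries refined,
    % and n is decremented for the next pass.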

1.3 Temporal Scalability

After the removal of spatial redundancy, removing temporal redundancy can result in further compression. To do this, only the parts of a new frame that have changed from the previous frame are sent. In most cases, changes between frames are due to movement in the scene that can be approximated as simple linear motion. From the previously transmitted frames, we can predict the motion of regions and send only the prediction error (motion prediction). This way the video bit rate is further reduced. Three standard temporal coding techniques are as follows.

1. Temporal Subband Coding (TSB): The most basic technique. It gives a natural multiresolution decomposition into frames that are halved at each analysis level, thereby providing lower frame-rate video by decoding only the low-pass subbands (a two-line sketch of this temporal halving follows this list). However, motion is blurred in the lower frame-rate video in this technique, since frames are generated by a linear combination of full-frame-rate video frames.

2. Motion-Compensated Temporal Subband Coding (MC-TSB): TSB with motion compensation. Motion compensation is usually performed on multi-pixel blocks, in half-pel or full-pel modes, to avoid the computational complexity involved in pixel-level computations. In full-pel motion compensation, the unaltered frames are used at the nominal pixel resolution; in half-pel estimation, the resolution of all frames is doubled, creating a half-pixel grid.

3. Motion Compensated Prediction (MCP): Compensation improves the efficiency of prediction. Two major prediction techniques have been explored, viz. telescopic prediction and recursive prediction.
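As referenced in the TSB item above, the temporal halving can be sketched in a couple of MATLAB lines. This minimal sketch assumes a simple two-tap (Haar) temporal filter without motion compensation, with F a H×W×T array of frames and T even:

    L = (F(:,:,1:2:end) + F(:,:,2:2:end)) / sqrt(2);  % temporal low-pass frames
    H = (F(:,:,1:2:end) - F(:,:,2:2:end)) / sqrt(2);  % temporal high-pass frames
    % Decoding only L yields half-frame-rate video; splitting L again
    % yields quarter rate, and so on down the analysis levels.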

Motion estimation based only on the previous frame does not work in every situation. For example, a scene may contain movements where the moving objects reveal parts of the background that were previously hidden; these hidden parts cannot be successfully predicted from the previous frame. In such cases, prediction can be improved by combining motion prediction from the previous frame with prediction from a future frame. This is known as interpolated or bi-directional motion prediction: the encoder searches for a matching block in both the previous and the future frame, as shown in Figure 1.3. A sketch of the underlying block-matching search follows the figure.

Figure 1.3: Group of frames and bidirectional prediction. [The figure shows a dyadic hierarchical prediction structure: a group of pictures (GOP), bounded by intra-coded key pictures at temporal layer T0, contains hierarchically predicted B-pictures at layers T1-T3, yielding frame rates of 3.75, 7.5, 15 and 30 fps with a structural delay of 7 frames.]
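The block-matching search underlying ME/MC can be sketched as follows. best_match is a hypothetical helper using the sum-of-absolute-differences (SAD) criterion; for bi-directional prediction it would be called once against the previous frame and once against the future frame:

    function [dy, dx] = best_match(ref, cur, bi, bj, B, S)
    % Exhaustive search for the motion vector of the BxB block of the
    % current frame at (bi, bj) within a +/-S window of the reference frame.
        blk  = cur(bi:bi+B-1, bj:bj+B-1);
        best = inf;  dy = 0;  dx = 0;
        for y = -S:S
            for x = -S:S
                i = bi + y;  j = bj + x;
                if i < 1 || j < 1 || i+B-1 > size(ref,1) || j+B-1 > size(ref,2)
                    continue;          % candidate block falls outside the frame
                end
                sad = sum(sum(abs(blk - ref(i:i+B-1, j:j+B-1))));  % matching cost
                if sad < best
                    best = sad;  dy = y;  dx = x;   % keep the best displacement
                end
            end
        end
    end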


Chapter 2

Literature Survey

2.1 Reviewer: Ankit Bhurane (114070004)

2.1.1 State-of-the-Art and Trends in Scalable Video Compression With Wavelet-Based Approaches [1]

This paper presents one of the promising solutions for wavelet-based scalable video compression (WSVC). Starting from the basics, the paper also serves as a tutorial for beginners in WSVC. It describes the VidWav (Video Wavelet) model, which uses the STP (spatio-temporal) tool and is the joint effort of ITU and ISO/MPEG on WSVC; the current VidWav model seems to give a promising solution for WSVC. A few problems with wavelets are also addressed at the end. The discussion emphasizes the lack of a deep understanding of WSVC and its components and encourages further work on wavelet-based approaches.

2.1.2 Wavelet Based Rate Scalable Video Compression [25]

In this paper a new Scalable Adaptive Motion Compensated Wavelet (SAMCoW) codec is presented. SAMCoW uses motion compensation to reduce temporal redundancy. The proposed algorithm provides a wide range of continuous rate scalability. Simulation results show performance comparable to MPEG-1 and H.263.

2.1.3 Transforms for the motion compensation residual [12]

Most video compression standards use the DCT to encode both video frames and the motion-compensated (MC) residual. However, the MC residual possesses different statistical characteristics than an image. This paper explores these differences and proposes a set of block transforms for the MC residual.


2.1.4 Advances on Transforms for High Efficiency Video Coding [9]

This thesis gives a brief overview of the upcoming high efficiency video coding (HEVC) standard. A novel transform coding module is proposed and integrated into the existing software model. The experiments show encouraging results (a 16.0% bitrate saving relative to the DCT), especially for HD content.

2.2 Reviewer: Prateek Chaplot (08D07011)

2.2.1 Adaptive Bit-Rate Control for Region-of-Interest Scalable Video Coding

In this paper a new adaptive bit-rate control technique is suggested for region-of-interest (ROI) scalable video coding (SVC). As technology progresses there is an increasing demand to simultaneously transmit or store video in a variety of spatial/temporal resolutions and qualities, leading to video bit-stream scalability. Due to constraints on bandwidth and computation, various video applications demand high-quality video content in a specific region of interest which can be extracted from the encoded video bit stream. Recent research in the SVC field considers neither video content-based layer coding nor decoder complexity-based layer coding. Therefore, there is a continuing need to allow jointly video content-adaptive and user-adaptive scalable video coding, while considering a variety of different user devices (decoders having different computational capabilities). This paper works towards providing a solution to these needs. An introduction to ROI encoding is given, along with the different methods adopted by various papers to achieve it. ROI encoding has a significant impact on the quality of the decoded video sequence, but this effect is smaller in the case of a fixed camera and static background. This realization opens up various avenues for the implementation of the ROI approach in video conferencing and video surveillance. The paper shows that by implementing the proposed adaptive bit-rate control for ROI SVC, smaller quantization parameters can be used while obtaining the same compression rate. As a result, the SVC video quality at the decoder side is significantly improved even when the available bandwidth and computational resources are low, which can be very useful for mobile devices such as cellular phones. Further, the paper suggests that the proposed adaptive bit-rate control scheme is especially useful for future Internet and 4G applications with limited computational resources and/or a limited channel bandwidth, such as video conferencing (between two or more mobile-device users), video transrating, video transcoding between video coding standards, and many other applications.


2.2.2 Multiplexing/De-Multiplexing DIRAC Video with AAC Audio Bit-Stream

With the tremendous growth in network infrastructure, storage capacity and computing power, a large number of video applications used in day-to-day life employ a variety of transmission and storage systems. With such extensive usage of video applications, the development of a good compression scheme is vital. The Dirac video codec, developed by the BBC, is an open technology and does not require any licensing fees. Dirac is a state-of-the-art video codec which matches the performance of the industry-standard H.264. Thus Dirac is used as the video codec in this thesis, along with advanced audio coding (AAC) as the audio codec. The thesis discusses encoding video using Dirac and audio using AAC, multiplexing the two coded bit-streams for transmission, packetization, de-multiplexing the two coded bit-streams at the receiver's end, and decoding the video (Dirac) and audio (AAC) while maintaining lip sync during playback.

2.2.3 Performance analysis of Dirac video codec with H.264/AVC

With significant growth in network infrastructure, storage capacity and computing power, a large number of video applications used in day-to-day life employ a variety of transmission and storage systems. Developing compression codecs which provide optimum savings in data rate while maintaining a desired quality has become a matter of prime importance. H.264/AVC is a standard for video compression and is currently one of the most widely used codecs for recording, compression and transmission of video data. It has proven superior to earlier standards in terms of compression ratio, quality, bitrate and error resilience. However, unlike Dirac, it requires payment of license/patent fees. This paper briefly discusses the architectures of the Dirac and H.264 codecs. The objective of the paper is to implement the Dirac video codec (encoder and decoder) on input test sequences and compare its performance with H.264/AVC. The analysis was done on Dirac and H.264 using QCIF video test sequences as input, and the results have been recorded graphically for various parameters including compression ratio, bit rate, PSNR and MSE. The results of these various comparisons show that H.264/AVC outperforms the Dirac codec.

2.3 Reviewer: Raj Doshi (08D07031)

2.3.1 Spatio-Temporal Scalability for MPEG Video Coding [10]

The goal of this paper is to improve the spatial scalability of MPEG-2 for progressive video. It proposes spatio-temporal scalability to avoid the very large bitstreams produced by some previously proposed spatially scalable decoders. In the base layer, temporal resolution reduction is obtained by B-frame data partitioning, i.e., by placing every second frame (a B-frame) in the enhancement layer. Subband (wavelet) analysis is used to provide spatial decomposition of the signal.

The problem that the paper seeks to address is the unacceptably high bitrate overhead associated with MPEG-2 spatial scalability as compared to single-layer MPEG-2 encoding of video. The approach is developed for progressive video but can be extended to interlaced formats. The approach of wavelet decomposition, where the idea is to split each image into four spatial subbands, often leads to allocation of a much higher bit rate to the base layer than to the enhancement layer, which is disadvantageous for practical applications. So the paper proposes spatio-temporal scalability to answer this problem. The base layer corresponds to the bitstream of pictures with both spatial and temporal resolutions reduced. Therefore, in the base layer, the bit rate is decreased as compared to an encoder with spatial scalability only. The enhancement layer is used to transmit the information needed for restoration of the full spatial and temporal resolution.

The paper studies two approaches to spatio-temporal scalability: 3D subband analysis and B-frame data partitioning. In the first approach, the analysis is done in three consecutive steps: temporal, horizontal, and vertical. Temporal analysis results in two subbands, Lt and Ht, of low and high temporal frequencies, respectively. These two subbands are then partitioned into four spatial subbands each: LL, LH, HL and HH. The three high-spatial-frequency subbands corresponding to the high temporal frequency are rejected, since they are less relevant to the human visual system; the LL subband corresponding to Lt forms the base layer, and the remaining four form the enhancement layer. In the second approach, reduction of temporal resolution is obtained by removal of every second frame, and reduction of spatial resolution is obtained by subband decomposition. As for the scalable encoder, the base-layer encoder is implemented as a standard motion-compensated hybrid MPEG-2 encoder, while in the enhancement-layer encoder motion is estimated on full-resolution images and full-frame motion compensation is performed.

The paper begins with several assumptions about the shortcomings of spatial scalability and proposes the spatio-temporal scalability approach. It dedicates sufficient explanation to the two approaches of 3D subband analysis and B-frame data partitioning. It then delves deeper into the implementation of the scalable encoder with an illustrated structure. The authors verify the results of this approach by implementing it in C++. The first goal, of the base-layer bitstream not exceeding that of the enhancement layer, was reached for all bit rates and video test sequences tested. The bit-rate overhead of spatio-temporal scalability is much lower than that of standard spatial scalability. Also, the complexity of the new encoder is only 30% greater than that of a single-layer MPEG-2 encoder.

Overall, the paper is easy to understand, and sufficient elaboration has been provided for most of the things that need explanation.


2.3.2 Scalable video coding using motion-compensated temporal filtering [3]

The paper proposes a new scalable video coder that combines motion-compensated temporal filtering with an embedded wavelet-based coder. It begins with a reasonable introduction to the challenges in video coding and the various aspects which need to be taken care of. It mentions that the problem with the conventional standards is that, even though they already support scalability functionality, the scalable profiles of those standards have rarely been used due to the significant loss in coding efficiency and the large increase in decoder complexity. It outlines the limitations of the other standards and approaches, such as simulcast and H.264/AVC.

The paper describes interframe wavelet video coding and illustrates the working of the Motion-Compensated Embedded Zero Block Coding (MC-EZBC) algorithm. However, the explanation provided is not easy to understand, and the build-up of the ideas seems rather discontinuous: ideas are simply stated as being true or feasible rather than built up through logical explanation and discussion. The authors propose a scalable video coding paradigm based on MCTF and the Embedded Block Coding with Optimal Truncation (EBCOT) coder that exploits the intra-band dependencies between the wavelet coefficients. EBCOT itself is only cited, and to understand the ideas presented one needs to go through the referenced EBCOT paper, which makes this paper difficult to read and grasp. The paper then covers half-pel motion estimation, achieved using a fast hierarchical variable-size block matching algorithm (HVSBM), from 16×16 blocks down to 4×4 blocks; again, HVSBM is not clearly introduced or explained. After this temporal analysis, the description of the spatial analysis is taken up: Daubechies 9/7 analysis/synthesis filters are adopted for the spatial filtering. The rate-allocation principle used is that, rather than adjusting an average bitrate over the entire encoded sequence, the algorithm performs rate allocation independently for each GOP. The algorithm is explained in a series of steps.

The experimental results and conditions are described, and the algorithm is compared with other methods. Preliminary simulation results indicate that the algorithm outperforms the conventional MC-EZBC codec and performs competitively with JSVM v.12. In future work, the authors plan to extend the scalable representation to the motion information, rather than just the subband sample data.


2.3.3 Adaptive Temporal Scalability of H.264-compliant Video Conferencing in Heterogeneous Mobile Environments [8]

This paper aims to cater to multipoint video conferencing with mobile devices. The authors build their system on the low-complexity scalable extension of their H.264 codec DAVC. The paper explains the various challenges associated with video conferencing, such as rigid real-time constraints (because conferencing tends to be very conversational), degradation of video conferencing performance at weak endpoints leading to an alienating experience for users, heterogeneity due to differences between fixed and mobile devices, and so on. It then moves on to explain the ideas of scalable coding and adaptive video distribution. The usual modes of scalability are temporal, spatial, and quality scalability. Selection of layers can be static or dynamic. Dynamic adaptation faces the problems of reliably detecting network conditions and enabling a sender reaction in time, so that media data neither accumulate in buffers nor drop, nor under-utilize the available transmission resources.

Discussion of the implementation of the DSVC dyadic and non-dyadic layer decomposition leads to a reduced weight of the base layer; this reduces the data rate of the base layer down to 50%. Frames of the base layer are coded with the highest fidelity because they are used as the references for all the temporal enhancement layers. It is explained that temporal scalability introduces no additional overhead and does not decrease run-time performance.

The DSVC has real-time capabilities, as against the JSVM encoder. Next, the effects of layering and QP cascading are discussed. It turns out that their mobile-based scalable video encoder produces acceptable video quality while conforming to the tight resource constraints of the mobility regime. To cater to network conditions changing at run time, an adaptation layer is introduced. Its objective is to achieve dynamic scaling of the video transmission appropriate to the network resources at the sender, without explicitly involving receivers. Quick responses need to be provided by immediately detecting network congestion and overload. The temporal scaling in their conferencing system is adjusted according to the available bandwidth between sender and receivers. Extending the experimental analysis and optimizations to maximize video performance, and extending the scalability of their codec to include a spatial scaling option, will constitute their future work.

2.3.4 An Efficient Inter-Prediction Mode Decision Method for Temporal Scalability Coding With Hierarchical B-Picture Structure [15]

This paper identifies that few efforts have been made to reduce the complexity of temporal scalability coding with the hierarchical B-picture structure, although sufficient research has been done to reduce the encoding complexity for the conventional group of pictures. It mentions some existing work in this area, such as that of Liu et al., and points out that these methods are not efficient for temporal scalability coding, since they do not utilize the characteristics of the hierarchical B-picture structure. In this paper, the authors introduce a fast inter-prediction mode decision based on statistical hypothesis testing. They begin with an explanation of the hierarchical B-picture structure, indicating that frames in lower temporal scalability levels should maintain higher image fidelity by selecting finer-partitioned block modes, because they are more frequently referenced by B-pictures in higher temporal scalability levels. The idea of their fast inter-prediction mode decision is that the mode decision can be terminated early when the two null hypotheses are accepted for both the mean and variance tests; this results in considerable computational saving. The variance test and the mean test are performed as the two hypothesis tests.

The results of the experiments on the proposed method are well tabulated. The proposed fast inter-mode decision method exhibited up to 64.8% encoding-time reduction with a quality degradation of 0.11 dB and an average 2.54% bit increase. A fallback check to reduce false-alarm rates is incorporated with the hypothesis testing; it improves the detection power for homogeneous regions in 16 × 16 blocks and provides an effective compromise between encoding-time saving and coding efficiency. The confidence intervals are adjusted across temporal scalability levels by taking into account the proportion of large block-mode selections in the hierarchical B-picture GOP structure. To conclude, statistical hypothesis testing is used for early termination of the inter-prediction mode decision on hierarchical B-pictures for temporal scalability. To make the early termination efficient, two sets of hypotheses on mean and variance are tested to check the similarity between the pixel values of the current block and those of a reference block for 16 × 16 and 8 × 8 block sizes.

2.4 Reviewer: Tripti Meena (08D07015)

2.4.1 Embedded Zerotree Wavelet Based Algorithm For Video Compression [18]

In this paper the main focus is on reducing memory bandwidth and devising an efficient architecture for real-time EZW coding. A modified 2-D subband decomposition scheme with a parallelized block-based EZW coder is used. A disadvantage of the DCT (discrete cosine transform), i.e., its vulnerability to block noise and restricted scalability, is overcome by the DWT (discrete wavelet transform). Using EZW (embedded zerotree wavelet) coding of the DWT coefficients enables precise rate control. The coding algorithm consists of three main parts: 2D-DWT, EZW and arithmetic coding. A novel scheme for 2-D subband decomposition is devised: all levels of the vertical 1-D DWT are executed first, the horizontal 1-D DWT is applied only to the lowest-frequency subband obtained by the series of vertical transforms, and the transposition operations are executed only for these coefficients. Thus, the amount of memory needed is reduced. A lifting algorithm is devised for implementing the DWT (instead of a set of low-pass and high-pass filters), in which the DWT and scaling coefficients are determined using a prediction step (high-frequency coefficients are calculated as prediction-error signals) and an updating step (low-frequency scaling coefficients are predicted). This algorithm reduces the number of arithmetic operations. The EZW algorithm encodes the DWT coefficients bitplane by bitplane, from MSB to LSB. Bitplane coding is done in two phases: dominant-path and subordinate-path coding. In the proposed EZW scheme, all bitplanes of a single coefficient are coded simultaneously and stored in the output buffer to reduce the memory bandwidth. The dominant-path symbol sequence is encoded with a quasi-arithmetic coder. Thus, the proposed scheme achieves both memory-bandwidth reduction and low complexity.
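The predict/update lifting structure mentioned above can be illustrated with a generic lifting step. The sketch below uses the standard LeGall 5/3 integer wavelet with symmetric edge extension as an example, not the specific architecture of [18]; x is a 1-D signal of even length:

    xe = x(1:2:end);                   % even samples
    xo = x(2:2:end);                   % odd samples
    xn = [xe(2:end), xe(end)];         % right neighbours, symmetric edge
    d  = xo - floor((xe + xn)/2);      % predict: high-pass (detail) coefficients
    dp = [d(1), d(1:end-1)];           % left neighbours, symmetric edge
    s  = xe + floor((dp + d + 2)/4);   % update: low-pass (scaling) coefficients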

2.4.2 A Scalable Video Compression Technique Based on Wavelet Transform and MPEG Coding [5]

This paper presents scalable video coding using the discrete wavelet transform (DWT) and MPEG coding. The DWT is used to decompose the image into several bands, and then each band is compressed using motion-compensated MPEG coding. A 9/7 biorthogonal wavelet filter is used, which decomposes the image into 3 layers with 7 non-uniform bands. Since the lowest band (LL2) has statistical features similar to those of the original image, MPEG with default settings is applied to it, while for the other, higher bands a modified MPEG scheme is used. Due to the high correlation of motion activity between the different bands, hierarchical fixed-size multi-resolution motion estimation is used to reduce search time. In this scheme, the motion vector of LL2 is reused for the high-frequency bands: in LH2, HL2 and HH2 it is used directly as the initial value in both directions, since they have the same resolution as LL2; for the LH1, HL1 and HH1 bands, the LL2 motion vector multiplied by 2 is taken as an initial guess, followed by a search in a small neighbourhood. This results in low computational complexity. Efficient coding is obtained with custom-designed quantization matrices in the MPEG coder for the different bands, since the statistical properties of the bands differ significantly. The quantization tables in DWT-MPEG are designed based on the energy distribution of still images, and more bits are given to parts with larger energy. Scanning is done as follows: zigzag scan for LL2, vertical scan for the LH1 and LH2 bands, horizontal scan for the HL1 and HL2 bands, and inverse zigzag scan for the HH1 and HH2 bands. Perceptual bit allocation is done by giving different weights to each band based on the sensitivity of the human visual system. It has been observed that DWT-MPEG provides better results than the original MPEG.


2.4.3 A Resolution And Frame Rate Scalable Subband/Wavelet Video Coder [31]

This paper puts forward a source coding algorithm for producing a spatio-temporally encoded bitstream with no redundant data common to the different resolutions produced from a given video. In this scheme, motion-compensated temporal filtering and a spatial subband/wavelet pyramid are used to provide an efficient 3D multi-resolution representation. The spatial analysis precedes the temporal analysis, and the two do not commute. Error-feedback hierarchical coding and adaptive conditional coding of the quantizer are used to increase efficiency. The image is first decomposed into various spatio-temporal resolutions by the spatio-temporal analysis block. The lower-resolution and lower frame-rate videos are generated using a 3-D subband/wavelet filter bank. Temporal subband analysis is achieved using two-tap motion-compensated temporal filtering (MCTF) to avoid undesirable delays. The motion fields at each spatial resolution are estimated to half-pixel accuracy using hierarchical block-matching motion estimation. The resulting multi-resolution collection of videos is encoded into a single fully-embedded bitstream by the multi-resolution encoding block. After codeword packing, channel coding and transmission, the multi-resolution decoding and spatio-temporal synthesis blocks reconstruct the desired resolution video from the subset of codewords extracted from the bitstream by the selective data reception block. In order to provide near-optimal coding of each subvideo, error-feedback hierarchical coding is used to generate non-redundant data packets containing information across both resolutions and frame rates. The results in the paper show that the algorithm provides excellent results for medium-to-high spatial resolution data at each temporal rate and relatively good results for the lowest spatial resolution.

2.4.4 Bottom-Up Motion Compensated Prediction In Wavelet Domain For Spatially Scalable Video Coding [29]

This paper introduces the bottom-up prediction (BUP) algorithm, an in-band motion compensation algorithm for video coding that overcomes the periodic shift-variance of the discrete wavelet transform (DWT); it is formalized using prediction rules for the filtering operations. Combining these rules, the bottom-up overcomplete DWT (BUP ODWT) is obtained, which is shift-invariant. When using the ODWT in the temporal prediction loop of an in-band video codec, the IDWT is normally first performed on the wavelet-subband reference frame. In the bottom-up prediction method the IDWT is skipped, making a direct link between the subbands of the reference frame and the coefficients of the redundant subbands that would be obtained by applying the ODWT. The prediction rules of the BUP algorithm allow the in-band motion compensation process to reach zero prediction error. The BUP ODWT calculates the overcomplete subbands from the critically sampled subbands. The algorithm achieves a significant reduction in the total multiplication budget and latency.


2.5 Reviewer: Saket Porwal (11307R006)

2.5.1 Embedded image coding using zerotrees of wavelet coefficients [24]

The embedded zerotree wavelet (EZW) coder is an algorithm for image compression using wavelets. The basic idea behind this compression scheme is to use the energy compaction provided by the wavelet transform. Most images have relatively more low-frequency content; in the spatial domain this content is spread over the whole image, whereas in the DWT domain most of the energy is concentrated in the highest-level LL band, and in the other bands the coefficients are very close to zero. In other words, EZW uses the ideas of decaying spectral power density and successive approximation. In this scheme, a significance map is constructed for each coding pass; this map contains the significance information of every coefficient with respect to a previously set threshold. If a coefficient's magnitude exceeds or equals the threshold, it is said to be significant and a "1" is emitted; otherwise, it is insignificant and a "0" is emitted. The threshold is decreased by a factor of 2 after every pass, so the most significant pixels are mapped first. An initial threshold is decided according to the maximum pixel value of the image, and the pixel magnitudes are compared with the threshold. The scanning is done in such a way that the parent root is compared first, and the appropriate symbol is emitted after checking all the descendants of the root; this is the first dominant pass. After this, the significant pixels are refined in the subordinate pass, and after each dominant and subordinate pass the threshold is halved and the loop continues. In this paper the compression results of EZW are compared with JPEG: for the same compression ratio, EZW gives better PSNR than JPEG.
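The threshold-halving scan described above reduces to a short loop. This is a minimal sketch of the pass structure only, not the full zerotree coder of [24]; C holds the DWT coefficients and numPasses is a hypothetical bit-budget parameter:

    T = 2^floor(log2(max(abs(C(:)))));   % initial threshold from the largest magnitude
    for p = 1:numPasses
        sigmap = abs(C) >= T;            % significance map for this pass
        % dominant pass: code sigmap with zerotree symbols;
        % subordinate pass: refine the previously significant coefficients.
        T = T/2;                         % halve the threshold after each pass
    end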

2.5.2 A new, fast, and efficient image codec based on set partitioning in hierarchical trees [21]

SPIHT stands for set partitioning in hierarchical trees. It is an image compression technique using wavelets in which the image in the wavelet domain is encoded in a specific way. SPIHT is the successor of EZW: it progressively transforms the wavelet transform of an image into a bit stream. As with EZW, this stream can be cut off anywhere during decoding and the wavelet coefficients can still be reconstructed. The proposed coding and decoding procedures are claimed to be extremely fast.

2.5.3 Scalable motion vector coding [2]

A prediction-based architecture for quality-scalable motion vector coding is introduced in this paper. Scalable spatial-domain motion-compensated temporal filtering (SDMCTF) based video codecs generally employ non-scalable motion vector codecs (MVCs). Achieving a low bit rate for the base layer is difficult, and if a low-bit-rate base layer is somehow achieved, then the high-bit-rate data also suffer; hence this significantly affects codec performance at all data rates. Using a quality-scalable MVC, lower rates can be achieved. The quality-scalable prediction-based motion vector coding was proposed and evaluated experimentally. The MVC proposed in the paper was found to be superior to the wavelet-based MVCs; moreover, the lower-bit-rate base layer was achieved without sacrificing overall performance, which was not the case with wavelet-based coding.

2.5.4 Quantifying the Coding Performance of Zerotrees of Wavelet Coefficients: Degree-k Zerotree [6]

The DWT of an image contains zerotrees, and these zerotrees allow a whole tree to be coded by a single symbol. A tree is called a degree-k zerotree if it is all zeros except for its top k levels, and a degree-k zerotree coder can code degree-0 through degree-k zerotrees. In this paper, the gain of a higher-degree versus a lower-degree zerotree coder is quantified in terms of bit savings. Both EZW and SPIHT take advantage of zerotrees for image compression: EZW is a degree-0 zerotree coder, whereas SPIHT is a degree-2 zerotree coder. One can exploit the efficiency of a higher-degree zerotree coder only if the wavelet decomposition is deep and higher-degree zerotrees are numerous in the decomposed image; otherwise it may result in poor efficiency in terms of computation and processing time, without much advantage in bits per pixel. In this paper, an experimental analysis was carried out on the standard 512 × 512 Lena image. The images were decomposed using the 9/7 JPEG filter bank, and it was found that degree-0, degree-1 and degree-2 zerotrees are more numerous than higher-degree zerotrees, which explains why SPIHT is more efficient than EZW.

2.6 Reviewer: Vijayan (08D07026)

2.6.1 Robust Digital Watermarking for Wavelet-based Compression [11]

Since a digital watermark may be used for applications such as fingerprinting, digital-rights enforcement and authentication, it needs to be tolerant of image processing and lossy-compression type operations. Most standard watermarking techniques do not survive wavelet-based compression and may also not be compatible with the scalability feature of wavelet-based compression. This paper proposes and experimentally validates a blind watermarking technique which shows superior robustness to wavelet-based compression.

Experiments have shown that both spread-spectrum and QIM watermarking schemes do not survive wavelet-based compression. When the level of scaling was increased beyond three, it resulted in quantization at almost all significant frequencies within the image, thus removing any watermark embedded using a spread-spectrum or a QIM technique.

The technique discussed in this paper embeds the watermark in such a manner that it survives the quantization associated with the compression process; a proportional part of the watermark information is placed in each level of the wavelet decomposition.

Steps of implementation:

• Watermark: decomposed into a residual difference pyramid, each level of which is embedded in the corresponding level of the image pyramid.

• Limited-resolution watermark: extracted from the correspondingly limited-resolution host image.

• Extracted watermark: compared against a down-sampled version of the original watermark for authentication.

• The Discrete Fourier Transform (DFT) is used to modulate the watermark information onto the wavelet coefficients of the host image.

• Since the phase angle has been shown to be more resilient to quantization than the magnitude, the watermark information is added to the phase angles of the DFT coefficients.

A watermark benchmarking tool has been used to compare the performance of the techniques, as well as their efficacy against standard signal-processing attacks. Observations showed that the proposed technique performed far better than the other techniques compared.

2.6.2 A DWT-based Digital Video Watermarking Scheme with Error Correcting Code [4]

This paper proposes and experimentally validates a DWT-based blind digital video watermarking scheme with a scrambled watermark and an error-correcting code. Different parts of the watermark, refined using error-correcting codes, are embedded into different scenes of a video in the wavelet domain.

The algorithm is robust against frame dropping, averaging, statistical analysis, lossy compression and video cropping. The proposed scheme consists of four parts:

1. Watermark preprocess:

• Video watermark: A 256-gray-level watermark image is scrambled into small parts, which are embedded into different scenes.


• Audio watermark: Using error-correction coding techniques such as Reed-Solomon coding and turbo coding, an ECC is extracted from the video watermark, and this ECC is embedded in the audio channel as an audio watermark (the audio watermark is later used to refine the extracted video watermark).

2. Video preprocess: Scene changes are detected, and randomly chosen independent watermarks are embedded in the video frames of different scenes.

3. Watermark embedding: The watermark is embedded by exchanging DWT coefficients at particular indices, based on conditions posed by the pixels of the watermark.

4. Watermark detection:

• Video watermark detection: Scene changes are detected in the received video, and the watermark embedding process is reversed after transforming each frame of the video to the wavelet domain. Since an identical watermark is used for all frames within a scene, multiple copies of each part of the watermark may be obtained; the watermark is recovered by averaging the watermarks extracted from the different frames.

• Audio watermark detection and refinement: As mentioned earlier, the audio watermark is extracted and used to refine the video watermark obtained in the previous step.

Also, a similarity measurement between the extracted and reference watermarks (the cross-correlation normalized by the reference watermark energy, so that the peak correlation is unity) is used for objective judgment of the extraction fidelity. The authors have conducted experiments with frame dropping, lossy compression, and averaging and statistical attacks. The scheme is observed to be robust compared to other DWT-based techniques.

2.6.3 A Comparison of Temporal Scalability Techniques [7]

Temporal scalability, as the name suggests, is a frame-rate-variation-based SVC technique wherein the same video, at different frame rates, can be extracted from a single coded bit-stream. In this paper, the authors compare three temporal coding techniques:

1. Temporal Subband Coding (TSB): The most basic technique. It gives a natural multiresolution decomposition into frames that are halved at each analysis level, thereby providing lower frame-rate video by decoding only the low-pass subbands. However, motion is blurred in the lower frame-rate video in this technique, since frames are generated by a linear combination of full-frame-rate video frames.


2. Motion-Compensated Temporal Subband Coding (MC-TSB): TSB with motion compensation. Motion compensation is usually performed on multi-pixel blocks, in half-pel or full-pel modes, to avoid the computational complexity involved in pixel-level computations. In full-pel motion compensation, the unaltered frames are used at the nominal pixel resolution; in half-pel estimation, the resolution of all frames is doubled, creating a half-pixel grid.

3. Motion-Compensated Prediction (MCP): Motion compensation improves the efficiency of prediction. Two major prediction techniques are explored, viz. telescopic prediction and recursive prediction.

For experimentation, the rate-distortion performance of the three techniques was compared at lower frame rates, using quantities measured from several sequences to analyze the effects of sequence motion and of the quality of the motion compensation. For full-frame-rate video, the performance of MCP and MC-TSB is approximately equivalent. However, MCP clearly provides the best performance for lower frame-rate video in terms of visual quality, quantitative quality, and bit rate. It has also been demonstrated experimentally that MCP requires a smaller percentage of the bitstream to decode lower frame-rate video, giving it an advantage in both distortion and bit rate. The experiments verify the theoretical results: MCP and MC-TSB always outperform TSB, and MCP provides superior performance at lower frame rates.

2.7 Reviewer: Rahul Bharadwaj (08D07012)

2.7.1 Temporal-Scalable Coding Based on Image Content [13]

This paper proposes object-based temporal scalability using shape coding, a new motion estimation/compensation method, weighting techniques, and background composition. Simulation results show the following:

• the half-zero ME/MC improves the image quality, especially when large movement is contained in the sequence,

• the weighting techniques save the bits around the object edges and improve the image quality for the center portion of the selected objects, and

• the proposed temporal scalability technique conveniently smooths the movement of the selected objects hierarchically.


2.7.2 Three-dimensional Subband Coding of Video [19]

The paper prescribes Geometric Vector Quantization (GVQ), in which the codevectors are inspired by edge-related features of the high-frequency subbands, in contrast with the traditional design based on a training set. A codebook of 3 × 3 codevectors is chosen, and for a given input vector an adaptive procedure modulates the two intensities of each codevector to maximize the match to the input. This procedure is repeated for each codevector in the codebook, and the (intensity-modulated) codevector with the best match is used to reproduce the input image block. The GVQ method is very effective in coding the higher-frequency subbands, where the data is very sparse and structured, but is ineffective for encoding subband 1. An effective VQ scheme for encoding subband 1 is based on an unbalanced tree-structured vector quantization (UTSVQ) technique, which further reduces the bit-rate. At each step of UTSVQ, only the node that corresponds to the largest distortion reduction is split.

2.7.3 Three-Dimensional Wavelet Coding of Video with Global Motion Compensation [30]

The paper introduces global motion compensation for 3-D subband video coders and finds a 0.5–2.0 dB gain on sequences with dominant background motion. The motion coefficients are solved for in a single shot using least squares on feature correspondences, which has essentially the same complexity as ordinary block-matching motion estimation in traditional video coding; this restricts the method to an affine motion model.

2.7.4 A Scalable Video Compression Technique Based on Wavelet Transform and MPEG Coding [5]

This paper presents a modified MPEG encoder with the same architecture as the original MPEG, including DCT, scalar quantization, scanning, and variable-length coding, the differences being hierarchical motion estimation/compensation, custom-designed quantization matrices, and a distinct scanning direction. Motion estimation in the lowest band, LL2, is performed as in MPEG. Simulation results show that the DWT-MPEG coding method improves the image quality over ordinary MPEG coding by 0.3–1.5 dB.

2.7.5 Improvements in Wavelet-Based Rate Scalable Video Compression by Eduardo Asbun

The thesis first reviews the various high and low data rate compression standards. It then presents the Scalable Adaptive Motion Compensated Wavelet (SAMCoW) technique for video compression, which uses the Color Embedded Zerotree Wavelet (CEZW) technique for still image compression. CEZW exploits the interdependence between colour components to achieve a higher degree of compression. The thesis suggests modifications to SAMCoW to improve its performance at data rates of 64 kbps.

2.8 Reviewer: Shrikant Bagde (113230019)

2.8.1 A Scalable Wavelet Based Video Distortion Metric and Applications [17]

This paper explains what is meant by a video distortion metric and describes its applications. The metric is based on models of the human visual system. The paper presents a computationally efficient algorithm that can operate in full-reference or reduced-reference mode. In full-reference mode, the distance between the reference and distorted sequences is computed directly, whereas reduced-reference mode uses a lower-bandwidth representation of the reference sequence. Of the two, the full-reference metric based on the human visual system is the more commonly used; here the reference sequence is processed at the server side and the distorted sequence at the client side. Applications of the video distortion metric include rate control and continuous quality evaluation.

2.8.2 A Wireless Video Streaming System Based on OFDMA with Multi-Layer H.264 Coding and Adaptive Radio Resource Allocation [17]

In this paper, an adaptive OFDMA-based method for multi-user video transmission over a wireless medium using multi-layer H.264 video coding is proposed. A multi-user channel is assumed, and sub-carrier allocation and bit loading are performed jointly with the transmission of H.264 Scalable Video Coding encoded sequences, so as to increase the number of sub-carriers (and hence the number of users sharing the channel at the same time) and to provide the best quality of service to the users.

A multi-layer H.264 scenario is assumed in which two bit streams are obtained: a base layer, which carries the important information at very low bit rates so that the error probabilities are low, and an enhancement layer, whose bit rate is relatively higher than the base layer's and whose content is less important. Based on the required bit error rate (BER) of the video signal, bandwidth is allotted and the type of layering is chosen accordingly.

A theoretical model is presented in which the quality of the video signal received over a noisy, error-prone channel is estimated for a specific BER. This is done with a quality measurement method based on the peak signal-to-noise ratio (PSNR), in which the received video signal is compared with the original video signal.


A modified algorithm for adaptive radio resource allocation is then proposed, in which different bit streams are allotted to different sub-carriers based on the base-layer and enhancement-layer channels. This is done with a bit-loading algorithm in which the Shannon capacity is used to calculate the number of bits for each sub-carrier from that sub-carrier's SNR: for a given maximum probability of error and a given SNR, the number of bits for a particular sub-carrier is calculated, and this is repeated until all users are allocated sub-carriers based on the required BER and the SNR the channel can support; a sketch of this style of bit loading follows below. The maximum number of bits that can be transmitted in a given sub-carrier is limited to 8. The time-domain transmit signal is generated by computing the inverse fast Fourier transform (IFFT) of the complex frequency-domain signal; this signal is then transmitted over the frequency-selective wireless channel considered, and the reverse operation is performed at the receiver end.
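To make the bit-loading idea concrete, the classical gap-approximation form assigns each sub-carrier b = floor(log2(1 + SNR/gap)) bits, where the SNR gap is set by the target error probability. A minimal MATLAB sketch of this generic scheme (not the paper's exact algorithm; the gap value and SNR data are assumed for illustration):

N = 16;                           % number of sub-carriers
SNR_dB = 25*rand(1,N);            % per-sub-carrier SNR (example data)
SNR = 10.^(SNR_dB/10);
gap = 10^(6/10);                  % SNR gap for the target BER (assumed 6 dB)
b = floor(log2(1 + SNR/gap));     % bits loaded on each sub-carrier
b = min(b, 8);                    % cap at 8 bits per sub-carrier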

The performance evaluation considers the spectral efficiency and a cost function in which the peak signal-to-noise ratio is taken into account. The simulations reported in the paper show that the quality obtained from real video transmission is close to the theoretically estimated quality.

2.8.3 Highly Scalable Video Compression With Scalable Motion Coding [26]

This paper proposes a scalable video coding scheme with rate-scalable motion information. In this scheme, video frames are compressed using unquantized motion parameters, but the decoder receives and uses only the most appropriate motion-parameter quality layer.

2.8.4 Rate Control of H.264/AVC Scalable Extension [16]

This paper introduces a switched model for MAD prediction in the enhancement layer: the MAD of a frame in the enhancement layer can be predicted either from the previous frame in the enhancement layer or from the same frame in the base layer. The paper also introduces a new bit-allocation scheme for hierarchical B frames that weights them by their importance.

2.9 Reviewer: Newton (113079018)

2.9.1 An embedded wavelet hierarchical image coder [23]

EZW is an image compression algorithm in which the bits of the output stream are ordered according to their importance. The image's wavelet coefficients are parsed multiple times, and the coefficients are approximated increasingly accurately using a successive approximation algorithm.

It is called an embedded zerotree wavelet coder because the generated bit stream has the property that the bits corresponding to lower bit rates are already part of, or embedded in, the higher bit-rate stream. Such an embedded bit stream gives the decoder the flexibility of choosing the decoding bit rate: the decoder can stop decoding at any desired point and still generate an image of the same quality it would have obtained had the image been encoded at the bit rate corresponding to the truncated bit stream.

The algorithm is based on 3 major concepts:

1. Using wavelets to decompose the image into hierarchical subbands.

2. Using the property of similarity across subbands to predict the absence of significant coefficients at finer scales.

3. A hierarchical entropy-coded quantizer that uses the coarse-to-fine philosophy inherent in successive approximation.

The wavelet-decomposed image is translated into a significance map indicating the locations of significant values, and a large fraction of the bits in the encoded stream corresponds to these significant coefficients. During each pass, all coefficients of the decomposed image are compared against a threshold, and those whose magnitudes exceed it are called significant; the threshold is halved for each successive pass. Each pass is divided into two sub-passes, the dominant pass and the subordinate pass: significant coefficients are encoded in the dominant pass and refined to more accurate values in the subordinate pass.

EZW uses a data structure called the zerotree. It is based on the hypothesis that if a root coefficient is insignificant, then all wavelet coefficients related to it at the same spatial location at finer scales are also very likely to be insignificant. A tree is called a zerotree if all its descendants are insignificant for the threshold chosen in that pass. Because the algorithm runs for multiple passes, the time taken to encode all the bits is longer; on the other hand, the precise rate control achieved by the algorithm is an important feature of EZW. The threshold schedule is sketched below.
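As an illustration of the pass structure, a minimal MATLAB sketch of the significance test and threshold schedule (the zerotree bookkeeping, dominant/subordinate encoding, and entropy coding are omitted; the coefficient data is a stand-in):

X = randn(64);                         % stand-in for wavelet coefficients
T = 2^floor(log2(max(abs(X(:)))));     % initial threshold
for pass = 1:4
    sig = abs(X) >= T;                 % dominant pass: significance map
    fprintf('pass %d: T = %g, %d significant\n', pass, T, nnz(sig));
    T = T/2;                           % halve the threshold for the next pass
end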

2.9.2 Highly scalable wavelet-based video codec for very low bit-rate environment [27]

In this paper, an "unrestricted center-biased diamond search" (UCBDS) method is used to estimate the motion vectors. UCBDS is found to give a 31% speed improvement over the fast four-step search proposed by Po and Ma. However, with block-based motion estimation, artifacts such as blockiness are observed in the reconstructed video.


2.9.3 Image Compression Using the Spatial-Orientation Tree [20]

The SPIHT algorithm forms a spatial-orientation tree. The PSNR results obtained using SPIHT are better than those of EZW encoding; the method is also simpler to implement and consumes less CPU time than EZW.

2.9.4 Low Bit-Rate Scalable Video Coding with 3-D Set Partitioning in Hierarchical Trees (3-D SPIHT) [14]

A 3-D extension of the SPIHT algorithm is proposed. The 3-D spatio-temporal orientation tree combined with the 3-D SPIHT algorithm provides video coding results comparable with H.263. The coder's output bit stream contains embedded bit streams corresponding to different video qualities, and the codec shows results comparable to H.263 even without motion compensation.


Chapter 3

Work Done and Results

3.1 Spatial Scalability

For spatial scalability, the following settings were used:

seq = 'FOREMAN_352x288_30_orig_01.yuv';
Nf = 30;                   % No. of frames to be encoded/decoded
[Y,U,V] = yuv_import(strcat('D:\New folder\',seq),[352 288],Nf,0,'YUV420_8');
BLD  = [128 128];          % BL dimension
ELD1 = [256 256];          % EL1 dimension
ELD2 = [512 512];          % EL2 dimension
wName = 'db2';             % Wavelet name
L = 3;                     % DWT decomposition levels for residue
loop = 11;                 % Total loops for SPIHT
R1 = 0.2; R2 = 0.3;        % Avg bits allocated to EL1 and EL2

3.2 Temporal Scalability

For temporal scalability, we provide as input the number of enhancement layers we wish to obtain. For example, if the input is 1, all odd frames remain intact and all even frames are predicted. If the input is 2, frames 1,5,9,... remain intact from the original video and frames 3,7,11,... are reconstructed in the first iteration; frames 2,4,6,8,... are then reconstructed from the initially available frames (1,5,9,...) and the frames reconstructed in the first iteration (3,7,11,...). In the present application we use only the I frames for prediction, i.e., we make only forward-directional predictions; had it been bi-directional prediction, multiple enhancement layers would have worked better. A small sketch of this frame-index bookkeeping is given below.
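The dyadic grouping just described can be written compactly. A minimal MATLAB sketch (illustrative only, not part of the submitted code) that lists which frames stay intact and which are predicted at each iteration:

Nf = 16;                      % total number of frames (example value)
levels = 2;                   % number of enhancement layers requested
intact = 1:2^levels:Nf;       % I frames kept as-is (here 1,5,9,13)
fprintf('Intact frames: %s\n', num2str(intact));
for k = 1:levels
    step = 2^(levels-k+1);            % spacing of frames known before iteration k
    predicted = (1+step/2):step:Nf;   % frames predicted in iteration k
    fprintf('Iteration %d predicts frames: %s\n', k, num2str(predicted));
end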

Real motion vectors need not be integer multiples of a pixel. To allow sub-pixel motion vectors, the search step size must be less than one pixel, so motion compensation has also been done with half-pixel prediction, in which the half-pixel values are obtained by interpolation between neighbouring pixel values.


[Plots omitted: (a) PSNR/frame for BL; (b) PSNR/frame for EL-1; (c) PSNR/frame for EL-2; (d) average PSNR vs. bitrate for BL, EL1 and EL2; (e)–(g) 30th frame of BL, EL-1 and EL-2.]

Figure 3.1: Performance for sequence: CITY_704×576_30_orig_01.yuv


[Plots omitted: (a) PSNR/frame for BL; (b) PSNR/frame for EL-1; (c) PSNR/frame for EL-2; (d) average PSNR vs. bitrate for BL, EL1 and EL2; (e)–(g) 30th frame of BL, EL-1 and EL-2.]

Figure 3.2: Performance for sequence: CREW_704×576_30_orig_01.yuv


[Plots omitted: (a) PSNR/frame for BL; (b) PSNR/frame for EL-1; (c) PSNR/frame for EL-2; (d) average PSNR vs. bitrate for BL, EL1 and EL2; (e)–(g) 30th frame of BL, EL-1 and EL-2.]

Figure 3.3: Performance for sequence: HARBOUR_704×576_30_orig_01.yuv


[Plots omitted: (a) PSNR/frame for BL; (b) PSNR/frame for EL-1; (c) PSNR/frame for EL-2; (d) average PSNR vs. bitrate for BL, EL1 and EL2; (e)–(g) 30th frame of BL, EL-1 and EL-2.]

Figure 3.4: Performance for sequence: ICE_704×576_30_orig_01.yuv


[Plots omitted: (a) PSNR/frame for BL; (b) PSNR/frame for EL-1; (c) PSNR/frame for EL-2; (d) average PSNR vs. bitrate for BL, EL1 and EL2; (e)–(g) 30th frame of BL, EL-1 and EL-2.]

Figure 3.5: Performance for sequence: SOCCER_704×576_30_orig_01.yuv


[Plots omitted: (a) PSNR/frame for BL; (b) PSNR/frame for EL-1; (c) PSNR/frame for EL-2; (d) average PSNR vs. bitrate for BL, EL1 and EL2; (e)–(g) 25th frame of BL, EL-1 and EL-2.]

Figure 3.6: Performance for sequence: VMG_352×288_25_orig_01.yuv


Half-pixel interpolation gives improved motion estimation and hence a lower prediction error, so we have also tried fractional-pixel prediction, which gives better results than integer-pixel prediction. A sketch of the half-pixel grid construction follows.
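The half-pixel grid mentioned above can be generated by simple bilinear interpolation. A minimal MATLAB sketch under that assumption (the actual interpolation used in our code may differ):

F = double(imread('cameraman.tif'));   % any grayscale test frame
H = zeros(2*size(F,1)-1, 2*size(F,2)-1);
H(1:2:end,1:2:end) = F;                              % integer-pel samples
H(1:2:end,2:2:end) = (F(:,1:end-1) + F(:,2:end))/2;  % horizontal half-pels
H(2:2:end,1:2:end) = (F(1:end-1,:) + F(2:end,:))/2;  % vertical half-pels
H(2:2:end,2:2:end) = (F(1:end-1,1:end-1) + F(1:end-1,2:end) + ...
                      F(2:end,1:end-1) + F(2:end,2:end))/4;  % diagonal half-pels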

3.2.1 Motion Vectors

On the encoder side, motion vectors are generated for every alternate frame when the frame rate is halved; in the other cases, motion vectors are generated accordingly for the intermediate frames. The frame is divided into blocks of size 4 × 4, 8 × 8, or 16 × 16 pixels, and each block of the frame to be predicted is searched for within a search area in the previous frame, as illustrated in Figure 3.7.

[Diagram omitted: a block of pixels in Frame 2 is compared with the search region surrounding the corresponding block in Frame 1.]

Figure 3.7: Motion estimation and compensation

The best match for a block in frame 2 is searched for within an appropriate search area around the corresponding block in frame 1, and the block with the minimum mean squared error is selected. To define the search area, we define spanz, which determines the span of pixels in the search region; this span is an odd number. For example, if spanz is 5, the present block in frame 2 is compared with 25 candidate blocks in frame 1, with the block moved by up to 2 pixels in each of the 4 directions. Out of these, the block with the least difference is chosen, and the motion vector simply gives the address (displacement) of this best-matched block.


[Diagram omitted: the best-matched block in Frame 1 is displaced by one pixel up and one pixel left relative to the block in Frame 2.]

Figure 3.8: Finding the best match and the motion vectors

The motion vector for the case shown in Figure 3.8 is (−1,−1). One can also see that, for the particular case of span size 5, there are 25 blocks to compare against. The motion vectors for all such blocks are calculated in this way, and the encoder sends the original frames and the motion vectors to the decoder so that it can reconstruct the predicted frames. A sketch of this search follows.
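To make the search concrete, here is a minimal MATLAB sketch of the exhaustive block matching described above (variable names are ours, not from the submitted code; the frames are synthetic so that the expected motion vector is (−1,−1)):

F1 = rand(128);                      % reference frame (example data)
F2 = circshift(F1,[1 1]);            % current frame: F1 shifted down-right
bs = 8; spanz = 5; s = (spanz-1)/2;  % block size, span, half-span
r0 = 65; c0 = 65;                    % top-left corner of the current block
blk = F2(r0:r0+bs-1, c0:c0+bs-1);    % block to be predicted
bestMSE = inf; mv = [0 0];
for dr = -s:s
    for dc = -s:s
        cand = F1(r0+dr:r0+dr+bs-1, c0+dc:c0+dc+bs-1);
        mse = mean((blk(:)-cand(:)).^2);   % matching criterion: MSE
        if mse < bestMSE
            bestMSE = mse; mv = [dr dc];   % motion vector of best match
        end
    end
end
mv                                   % gives [-1 -1] for this example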

3.2.2 Reconstruction

The decoder uses the I frames as they are received from the encoder; for the reconstruction of each predicted P frame, it uses the motion vectors to copy the best-matching blocks from the most recently received I frame, as sketched below. It then plays out the frames in sequence for the video.
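Continuing the previous sketch, the decoder-side motion compensation for one block is simply a displaced copy (again with illustrative variable names):

P = zeros(size(F1));                 % predicted frame being built
P(r0:r0+bs-1, c0:c0+bs-1) = ...      % copy the block indicated by mv
    F1(r0+mv(1):r0+mv(1)+bs-1, c0+mv(2):c0+mv(2)+bs-1);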

3.2.3 Difference Encoding

At the encoder side, we take the pixel-wise difference between the reconstructed frame and the original video frame; this forms the difference frame. It is then encoded using the DCT and quantized with a suitable quantization factor.

The decoder decodes the input stream, de-quantizes it, and adds the result to the reconstructed frame that was built from the motion vectors as described above. This brings the reconstructed video to an almost perfect match with the original frames. A sketch of this residual coding step follows.
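A minimal MATLAB sketch of the difference encoding, assuming a single uniform quantization factor Q (our actual quantizer settings may differ; Forig is the original frame and P the motion-compensated reconstruction from the sketches above):

Q = 16;                      % quantization factor (assumed value)
D = Forig - P;               % difference (residual) frame
Dq = round(dct2(D)/Q);       % encoder: 2-D DCT followed by quantization
Dhat = idct2(Dq*Q);          % decoder: de-quantization and inverse DCT
Frec = P + Dhat;             % refined reconstructed frame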


Chapter 4

MATLAB code

% =========================================================================
% Title: Scalable Video Coding using SPIHT + DCT
% Code by: Team-1
% Code repository: http://code.google.com/p/wsvc-team-1/downloads/list
% Last updated: 09 Apr 2012
% =========================================================================
% Abstract: Scalable video coding (SVC) aims at a single bitstream which
% is adaptable to heterogeneous devices. For example, a mobile may need a
% lower-resolution video as compared to a digital TV. Here, SVC aims to
% generate a bitstream which can be decoded by both mobile and TV as per
% the requirements. In this assignment, we have tried to simulate one of
% the simple models for wavelet-based scalable video coding (WSVC). Our
% initial attempt was to implement spatially scalable video coding. Later
% we extended it with a few changes for quality (PSNR) scalability. The
% core part of the model is based on the set partitioning in hierarchical
% trees (SPIHT) algorithm.
% =========================================================================

%% Specifications
clc; clear all; close all;
BR = zeros(1,3);                        % Bitrate
Avg_PSNR = zeros(1,3);
seq = 'FOREMAN_352x288_30_orig_01.yuv';
Nf = 30;                                % No. of frames to be encoded/decoded
[Y,U,V] = yuv_import(strcat('D:\New folder\',seq),[352 288],Nf,0,'YUV420_8');
BLD  = [128 128];                       % BL dimension
ELD1 = [256 256];                       % EL1 dimension
ELD2 = [512 512];                       % EL2 dimension
wName = 'db2';                          % Wavelet name
L = 3;                                  % DWT decomposition levels for residue
loop = 11;                              % Total loops for SPIHT
R1 = 0.2; R2 = 0.3;                     % Avg bits allocated to EL1 and EL2

%% Space reserved
BitStr_e = cell(1,3*Nf);                % Encoder frames
BLF_d  = cell(1,Nf);                    % Base layer frames at decoder
ELF1_d = cell(1,Nf);                    % EL1 frames at decoder
ELF2_d = cell(1,Nf);                    % EL2 frames at decoder

F_B  = cell(1,Nf);                      % Actual BL frames to be encoded
F_E1 = cell(1,Nf);                      % Actual EL1 frames to be encoded
F_E2 = cell(1,Nf);                      % Actual EL2 frames to be encoded

%% PSNR calculations
PSNR_B  = zeros(1,Nf);
PSNR_E1 = zeros(1,Nf);
PSNR_E2 = zeros(1,Nf);

%% Encoding/decoding
for n = 1:Nf                            % Total number of frames to be processed
    F_B{n}  = imresize(Y{n},BLD);       % Frame resolution of BL at encoder
    F_E1{n} = imresize(Y{n},ELD1);      % Frame resolution of EL1 at encoder
    F_E2{n} = imresize(Y{n},ELD2);      % Frame resolution of EL2 at encoder

    % Compress F_B
    wcompress('c',F_B{n},'BASE.bin','spiht','wname',wName,'maxloop',loop);

    % Open & read compressed base layer frame
    fid_B = fopen('BASE.bin'); B = fread(fid_B); fclose(fid_B);
    F_B_U = wcompress('u','BASE.bin');  % BL frame uncompressed

    BLF_d{n} = F_B_U;                   % Save uncompressed BL frame in cell BLF_d

    F_B_U_R1 = imresize(F_B_U,ELD1);    % Uncompressed BL frame resized
    % Residual-1: (actual enhanced frame - reconstructed BL frame)
    RR1 = F_E1{n} - F_B_U_R1;
    % L-level 2-D DWT of RR1 (difference between EL1 and BL)
    [C1,S1] = wavedec2(RR1,L,wName);
    D_RR1 = Optimal_Quantizer(C1,S1,wName,R1);  % Optimal quantizer
    Q_ID_RR1 = waverec2(D_RR1,S1,wName);        % IDWT of quantized DWT coeffs
    ELF1_d{n} = F_B_U_R1 + Q_ID_RR1;            % Decoded BL + enhancement residue-1

    F_E1_U_R = imresize(ELF1_d{n},ELD2);
    % Residual-2: (actual frame at EL2) -
    % (decoded and resized BL + enhancement residue-1)
    RR2 = F_E2{n} - F_E1_U_R;
    % L-level 2-D DWT of RR2
    [C2,S2] = wavedec2(RR2,L,wName);
    D_RR2 = Optimal_Quantizer(C2,S2,wName,R2);
    Q_ID_RR2 = waverec2(D_RR2,S2,wName);        % IDWT of quantized DWT coeffs
    ELF2_d{n} = F_E1_U_R + Q_ID_RR2;

    BitStr_e{3*n-2} = B;
    BitStr_e{3*n-1} = rle(D_RR1);       % Run-length coding
    BitStr_e{3*n}   = rle(D_RR2);       % Run-length coding

    % PSNR calculations
    A_B = BLF_d{1,n};                   % A-frame of BL
    B_B = F_B{1,n};                     % B-frame of BL

    A_E1 = ELF1_d{1,n};                 % A-frame of EL1
    B_E1 = F_E1{1,n};                   % B-frame of EL1

    A_E2 = ELF2_d{1,n};                 % A-frame of EL2
    B_E2 = F_E2{1,n};                   % B-frame of EL2

    x_B = size(A_B,2);                  % X-dimension of BL frame
    y_B = size(A_B,1);                  % Y-dimension of BL frame

    x_E1 = size(A_E1,2);                % X-dimension of EL1 frame
    y_E1 = size(A_E1,1);                % Y-dimension of EL1 frame

    x_E2 = size(A_E2,2);                % X-dimension of EL2 frame
    y_E2 = size(A_E2,1);                % Y-dimension of EL2 frame

    R_B  = A_B - B_B;                   % Difference between A and B frames of BL
    R_E1 = A_E1 - B_E1;                 % Difference between A and B frames of EL1
    R_E2 = A_E2 - B_E2;                 % Difference between A and B frames of EL2

    MSE_B  = sum(sum(R_B.^2))/(x_B*y_B);        % MSE for BL
    MSE_E1 = sum(sum(R_E1.^2))/(x_E1*y_E1);     % MSE for EL1
    MSE_E2 = sum(sum(R_E2.^2))/(x_E2*y_E2);     % MSE for EL2

    PSNR_B(n)  = 10*log10(255^2/MSE_B);         % PSNR for BL
    PSNR_E1(n) = 10*log10(255^2/MSE_E1);        % PSNR for EL1
    PSNR_E2(n) = 10*log10(255^2/MSE_E2);        % PSNR for EL2
end

%% Average PSNR calculations
Avg_PSNR(1) = mean(PSNR_B);             % Average PSNR of all frames in BL
Avg_PSNR(2) = mean(PSNR_E1);            % Average PSNR of all frames in EL1
Avg_PSNR(3) = mean(PSNR_E2);            % Average PSNR of all frames in EL2

% Encoded bitstream reshaped for conveniently calculating bytes required
BitStr_e_r = reshape(BitStr_e',3,Nf);

BitStr_e_r_B = BitStr_e_r(1,:);         % Encoded bitstream of BL
% Specifications (size, bytes, etc.) of encoded bitstream of BL
Spec_B = whos('BitStr_e_r_B');
BR(1) = Spec_B.bytes;                   % Bytes required for encoded bitstream of BL

BitStr_e_r_E1 = BitStr_e_r(1:2,:);      % Encoded bitstream of EL1
% Specifications (size, bytes, etc.) of encoded bitstream of EL1
Spec_E1 = whos('BitStr_e_r_E1');
BR(2) = Spec_E1.bytes;                  % Bytes required for encoded bitstream of EL1

BitStr_e_r_E2 = BitStr_e_r(1:3,:);      % Encoded bitstream of EL2
% Specifications (size, bytes, etc.) of encoded bitstream of EL2
Spec_E2 = whos('BitStr_e_r_E2');
BR(3) = Spec_E2.bytes;                  % Bytes required for encoded bitstream of EL2

%% Display frames
figure
for i = 1:Nf
    imshow(mat2gray(BLF_d{i}))
    drawnow
end
title(strcat('Base Layer of sequence:',seq));

figure
for i = 1:Nf
    imshow(mat2gray(ELF1_d{i}))
    drawnow
end
title(strcat('Enhancement Layer-1 of sequence:',seq));

figure
for i = 1:Nf
    imshow(mat2gray(ELF2_d{i}))
    drawnow
end
title(strcat('Enhancement Layer-2 of sequence:',seq));

%% Average PSNR vs bitrate
figure
plot(BR,Avg_PSNR,'--rs','LineWidth',2,'MarkerEdgeColor','k', ...
     'MarkerFaceColor','g','MarkerSize',10)
hold on; grid on;
% axis([0 5e7 0 50])
xlabel('Bitrate');
ylabel('Average PSNR');
title('Performance for BL, EL1 and EL2');

%% Compression ratio
Spec_F_E2 = whos('F_E2');
Spec_BitStr_e = whos('BitStr_e_r_E2');
CR = Spec_F_E2.bytes/Spec_BitStr_e.bytes    % Compression ratio for the EL2 bitstream


Bibliography

[1] N. Adami, A. Signoroni, and R. Leonardi. State-of-the-art and trends in scalable video compression with wavelet-based approaches. Circuits and Systems for Video Technology, IEEE Transactions on, 17(9):1238–1255, Sept. 2007.

[2] J. Barbarien, A. Munteanu, F. Verdicchio, Y. Andreopoulos, J. Cornelis, and P. Schelkens. Scalable motion vector coding. In Image Processing, 2004. ICIP '04. 2004 International Conference on, volume 2, pages 1321–1324, Oct. 2004.

[3] B. Ben Fradj and A. O. Zaid. Scalable video coding using motion-compensated temporal filtering. In Visual Information Processing (EUVIP), 2011 3rd European Workshop on, pages 50–55, July 2011.

[4] Pat Pik-Wah Chan and Michael R. Lyu. A DWT-based digital video watermarking scheme with error correcting code. In Sihan Qing, Dieter Gollmann, and Jianying Zhou, editors, ICICS, volume 2836 of Lecture Notes in Computer Science, pages 202–213. Springer, 2003.

[5] Pao-Chi Chang and Ta-Te Lu. A scalable video compression technique based on wavelet transform and MPEG coding. Consumer Electronics, IEEE Transactions on, 45(3):788–793, Aug. 1999.

[6] Yushin Cho and W. A. Pearlman. Quantifying the coding performance of zerotrees of wavelet coefficients: Degree-k zerotree. Signal Processing, IEEE Transactions on, 55(6):2425–2431, June 2007.

[7] G. J. Conklin and S. S. Hemami. A comparison of temporal scalability techniques. Circuits and Systems for Video Technology, IEEE Transactions on, 9(6):909–919, Sept. 1999.

[8] H. L. Cycon, V. George, G. Hege, D. Marpe, M. Palkow, T. C. Schmidt, and M. Wählisch. Adaptive temporal scalability of H.264-compliant video conferencing in heterogeneous mobile environments. In Global Telecommunications Conference (GLOBECOM 2010), 2010 IEEE, pages 1–5, Dec. 2010.

[9] Miguel Lobato de Faria Pereira Capelo. Advances on transforms for high efficiency video coding. Master's thesis, April 2011.

[10] M. Domanski, A. Luczak, and S. Mackowiak. Spatio-temporal scalability for MPEG video coding. Circuits and Systems for Video Technology, IEEE Transactions on, 10(7):1088–1093, Oct. 2000.

[11] S. A. R. Jafri and S. Baqai. Robust digital watermarking for wavelet-based compression. In Multimedia Signal Processing, 2007. MMSP 2007. IEEE 9th Workshop on, pages 377–380, Oct. 2007.

[12] F. Kamisli and J. S. Lim. Transforms for the motion compensation residual. In Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on, pages 789–792, April 2009.

[13] H. Katata, N. Ito, and H. Kusao. Temporal-scalable coding based on image content. Circuits and Systems for Video Technology, IEEE Transactions on, 7(1):52–59, Feb. 1997.

[14] Beong-Jo Kim, Zixiang Xiong, and W. A. Pearlman. Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT). Circuits and Systems for Video Technology, IEEE Transactions on, 10(8):1374–1387, Dec. 2000.

[15] B. Lee and M. Kim. An efficient inter-prediction mode decision method for temporal scalability coding with hierarchical B-picture structure. Broadcasting, IEEE Transactions on, PP(99):1, 2012.

[16] Yang Liu, Zheng Guo Li, and Yeng Chai Soh. Rate control of H.264/AVC scalable extension. Circuits and Systems for Video Technology, IEEE Transactions on, 18(1):116–121, Jan. 2008.

[17] M. Masry, S. S. Hemami, and Y. Sermadevi. A scalable wavelet-based video distortion metric and applications. Circuits and Systems for Video Technology, IEEE Transactions on, 16(2):260–273, Feb. 2006.

[18] R. Y. Omaki, G. Fujita, T. Onoye, and I. Shirakawa. Embedded zerotree wavelet based algorithm for video compression. In TENCON 99. Proceedings of the IEEE Region 10 Conference, volume 2, pages 1343–1346, Dec. 1999.

[19] C. I. Podilchuk, N. S. Jayant, and N. Farvardin. Three-dimensional subband coding of video. Image Processing, IEEE Transactions on, 4(2):125–139, Feb. 1995.

[20] A. Said and W. A. Pearlman. Image compression using the spatial-orientation tree. In Circuits and Systems, 1993. ISCAS '93, 1993 IEEE International Symposium on, volume 1, pages 279–282, May 1993.

[21] A. Said and W. A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. Circuits and Systems for Video Technology, IEEE Transactions on, 6(3):243–250, June 1996.

[22] Khalid Sayood. Introduction to Data Compression. Morgan Kaufmann Publishers, 2000.

[23] J. M. Shapiro. An embedded wavelet hierarchical image coder. In Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on, volume 4, pages 657–660, March 1992.

[24] J. M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. Signal Processing, IEEE Transactions on, 41(12):3445–3462, Dec. 1993.

[25] Ke Shen and E. J. Delp. Wavelet based rate scalable video compression. Circuits and Systems for Video Technology, IEEE Transactions on, 9(1):109–122, Feb. 1999.

[26] D. Taubman and A. Secker. Highly scalable video compression with scalable motion coding. In Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on, volume 3, pages III-273–276, Sept. 2003.

[27] Jo Yew Tham, S. Ranganath, and A. A. Kassim. Highly scalable wavelet-based video codec for very low bit-rate environment. Selected Areas in Communications, IEEE Journal on, 16(1):12–27, Jan. 1998.

[28] K. S. Thyagarajan. Still Image and Video Compression with MATLAB. John Wiley and Sons, Inc., 2011.

[29] G. Van der Auwera, A. Munteanu, P. Schelkens, and J. Cornelis. Bottom-up motion compensated prediction in wavelet domain for spatially scalable video coding. Electronics Letters, 38(21):1251–1253, Oct. 2002.

[30] A. Wang, Zixiang Xiong, P. A. Chou, and S. Mehrotra. Three-dimensional wavelet coding of video with global motion compensation. In Data Compression Conference, 1999. Proceedings. DCC '99, pages 404–413, March 1999.

[31] J. W. Woods and G. Lilienfield. A resolution and frame-rate scalable subband/wavelet video coder. Circuits and Systems for Video Technology, IEEE Transactions on, 11(9):1035–1044, Sept. 2001.