

Journal of Electronic Imaging 16(4), 043008 (Oct–Dec 2007)


Robust digital video watermarking scheme for H.264 advanced video coding standard

Adarsh Golikeri
Panos Nasiopoulos
Z. Jane Wang
Department of Electrical and Computer Engineering
The University of British Columbia
2332 Main Mall
Vancouver, BC, V6T 1Z4, Canada
E-mail: [email protected]
Abstract. Digital video watermarking has attracted a great deal of research interest in the past few years in applications such as digital fingerprinting and owner identification. The H.264/AVC (advanced video coding) standard is the latest and most advanced video coding standard, but to this date, very few watermarking schemes have been designed for it. This is mainly due to its complexity and compression efficiency, which present a major challenge for any video watermarking approach. We developed a new, quantization-based video watermarking scheme that is designed to work with H.264/AVC. We propose a locally adaptive, rate-distortion optimized watermark that is inserted in the transform coefficients of the macro blocks. We use a unique perceptual mask in order to control the levels of spatial and temporal distortion. Our scheme is designed with a built-in bit allocation mechanism that ensures optimum distribution of watermark bits between different macro blocks. Our watermark offers constant robustness to H.264 video compression at different bit rates, without significantly affecting the overall bit rate and quality of the video stream. Experimental results show that our scheme outperforms existing watermarking methods under compression, transcoding, filtering, scaling, rotation, and collusion attacks. © 2007 SPIE and IS&T. [DOI: 10.1117/1.2816054]

Paper 07008R received Jan. 18, 2007; revised manuscript received Apr.2, 2007; accepted for publication Jun. 10, 2007; published online Dec.1, 2007.
1017-9909/2007/16(4)/043008/14/$25.00 © 2007 SPIE and IS&T.

1 Introduction

In the past few years, the need for watermarking has gained significant attention due to the spread of illegal redistribution and unauthorized use of digital multimedia.1 Watermarking is an information-embedding technique that embeds a secret imperceptible signal directly into original data (often called host signal) in a robust manner. The watermark should be resistant to both malicious attacks and common signal processing operations (e.g., filtering, noise, and video compression). In general, the watermark contains information that uniquely identifies the content owner.

A great variety of watermarking schemes have been proposed in the literature. Among them, one class of watermarking schemes that has drawn a lot of attention was inspired by communication with side information.19 The scalar Costa scheme (SCS) is a reliable information-embedding technique, which is based on Costa’s original, theoretical scheme.2 SCS has been shown to outperform the related dither modulation (DM) techniques for low watermark-to-noise ratios (WNR). SCS also performs significantly better than the state-of-the-art blind spread spectrum (SS) watermarking.3 This is mainly due to the host-interference rejection properties of SCS. Spread transform (ST) coding, in which the host signal is projected onto a pseudorandom vector, has been effectively used along with SCS (hence called ST-SCS), resulting in lower bit-error rates.4,5 Similar to SCS in principle, a continuous periodic self-noise suppression (CP-SNS) scheme with threshold was proposed in Ref. 18 for multimedia steganography. However, ST-SCS is a generic watermarking framework and hence has certain inherent limitations when used for video watermarking. These limitations include a fixed watermark embedding strength, which is based only on the WNR, and no rate-distortion optimization. Moreover, ST-SCS does not provide any way of controlling the spatial and temporal distortions caused by the watermark insertion. Finally, the distribution of the watermark bits is not dependent on the visual importance of the different regions of the video frame.

An important challenge for video watermarking is that, for real-time applications, the watermarking process must be part of the encoder and should be relatively simple and inexpensive to implement. Moreover, the watermark should be robust to video compression at different bit rates. To this end, several methods have been designed to work with specific compression standards such as MPEG-2 and MPEG-4.6–8 However, to this date, very few watermarking schemes have been designed for H.264/AVC, the latest and most advanced video coding standard of the ITU-T and ISO/IEC.9 This is mainly due to its complexity and compression efficiency, which present challenges for any video watermarking approach.

We have developed a new, quantization-based video watermarking scheme, which is designed to work with H.264/AVC. Unlike other existing schemes, our method offers constant robustness to H.264 compression at different bit rates without affecting the overall bit rate and quality of the video stream. Our proposed method borrows ideas from ST-SCS and is extremely robust to compression, transcoding, filtering, scaling restoration, rotation restoration, and average collusion. This is achieved by offering a locally adaptive watermark embedding and robustness-distortion optimization. A unique perceptual mask controls the levels of spatial and temporal distortion, while a built-in bit-allocation mechanism is used to ensure optimum watermark bit distribution within a video frame.

This paper is organized as follows: Section 2 offers a brief overview of the scalar Costa scheme (SCS). Section 3 describes our watermarking scheme, while Section 4 shows how our method is designed to work within H.264/AVC. Experimental results are presented in Section 5 and conclusions in Section 6.

2 The Scalar Costa Scheme (SCS)

The block diagram of a typical watermarking scenario is shown in Fig. 1. The watermarking process can be considered as a communications system with side information at the encoder side.2–5 Henceforth, bold text ($\mathbf{x}$) denotes a vector, while normal text and italics (e.g., x and $x$) denote a scalar. Using a secure key $K$, the watermark message $m$ is embedded into the cover work $\mathbf{x}$ (which is modeled as independent identically distributed data) of variance $\sigma_x^2$. The watermark is defined as $\mathbf{w} = \mathbf{s} - \mathbf{x}$ with variance $\sigma_w^2$. The watermarked signal $\mathbf{s}$ is then transmitted over a channel, where for mathematical simplicity the attack is usually modeled as an additive white Gaussian noise (AWGN) $\mathbf{v}$ of variance $\sigma_v^2$, resulting in an attacked work $\mathbf{r}$. The watermark-to-noise ratio (WNR) is defined as $10\log_{10}(\sigma_w^2/\sigma_v^2)$. The decoder receives the signal $\mathbf{r}$ and, using the same key $K$ that was used during embedding, extracts the watermark message estimate $\hat{m}$ (Fig. 1). Costa had shown that the channel capacity for the scenario shown in Fig. 1 is independent of $\sigma_x^2$.2 However, Costa’s scheme involved a huge random codebook, which makes the implementation of this scheme very impractical. Instead, the scalar Costa scheme (SCS) uses a structured codebook, constructed by a concatenation of scalar uniform quantizers.3

Fig. 1 Typical blind watermarking scenario.

For a Costa-type embedding of a watermark message $m$, SCS determines an intermediate sequence $\mathbf{q}$ that is nearly orthogonal to the cover work $\mathbf{x}$. This message $m$ is encoded into watermark letters $\mathbf{d}$ that belong to a $D$-ary alphabet (e.g., $D=2$ represents a binary alphabet). For embedding watermark information, the following sample-wise operation is performed:

$$q_n = Q_\Delta\{x_n - \Delta(d_n/D + k_n)\} - (x_n - \Delta(d_n/D + k_n)), \tag{1}$$

where $q_n$, $x_n$, and $d_n$ are elements of the vectors $\mathbf{q}$, $\mathbf{x}$, and $\mathbf{d}$, respectively, $Q_\Delta\{\cdot\}$ denotes scalar uniform quantization with step size $\Delta$, and $k_n \in [0,1)$ are the elements of a secure pseudo-random sequence $\mathbf{k}$, derived from the watermark key $K$. The embedded watermark sequence is given by


$$\mathbf{w} = \alpha\mathbf{q}, \tag{2}$$

where $\alpha$ ($0 \le \alpha \le 1$) is the watermark scale factor. SCS does not provide an analytical formula for determining the optimum value of $\alpha$. Instead, an optimum $\alpha$ (in the capacity sense) is derived by the following numeric expression:

$$\alpha = \sqrt{\frac{\sigma_w^2}{\sigma_w^2 + 2.71\,\sigma_v^2}}. \tag{3}$$

The final watermarked data are represented by

$$\mathbf{s} = \mathbf{x} + \mathbf{w} = \mathbf{x} + \alpha\mathbf{q}. \tag{4}$$

For watermark decoding, the received data $\mathbf{r}$ are quantized to the nearest codebook entry. The sample-wise extraction rule is

$$y_n = Q_\Delta\{r_n - \Delta k_n\} - (r_n - \Delta k_n), \tag{5}$$

where $y_n$, $r_n$, and $k_n$ are elements of $\mathbf{y}$, $\mathbf{r}$, and $\mathbf{k}$, respectively. For binary SCS, $|y_n|$ should be close to zero if $d_n=0$ and close to $\Delta/2$ for $d_n=1$. This is known as minimum distance decoding. For a detailed geometric interpretation of SCS, interested readers are referred to the figures in Ref. 5.
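The embedding and extraction rules of Eqs. (1)–(5) can be illustrated with a minimal Python sketch for the binary case ($D=2$). The function names, the use of Python's `random` module as a stand-in for the secure key sequence, and the decision threshold of $\Delta/4$ are assumptions made for illustration only; they are not taken from the paper.

```python
import math
import random


def quantize(value, step):
    """Scalar uniform quantizer Q_step: nearest multiple of step."""
    return step * math.floor(value / step + 0.5)


def scs_embed(x, bits, key, step, alpha):
    """Embed binary letters into host samples x following Eqs. (1), (2), (4)."""
    rng = random.Random(key)  # illustrative stand-in for the key-derived sequence k
    s = []
    for x_n, d_n in zip(x, bits):
        k_n = rng.random()                                   # k_n in [0, 1)
        offset = step * (d_n / 2 + k_n)                      # Delta * (d_n / D + k_n)
        q_n = quantize(x_n - offset, step) - (x_n - offset)  # Eq. (1)
        s.append(x_n + alpha * q_n)                          # Eqs. (2) and (4)
    return s


def scs_decode(r, key, step):
    """Hard-decision decoding based on Eq. (5)."""
    rng = random.Random(key)  # regenerate the same sequence k from the key
    bits = []
    for r_n in r:
        k_n = rng.random()
        u = r_n - step * k_n
        y_n = quantize(u, step) - u                          # Eq. (5)
        # |y_n| near 0 -> bit 0, near step/2 -> bit 1 (threshold is an assumption)
        bits.append(0 if abs(y_n) < step / 4 else 1)
    return bits
```

Without channel noise and with $\alpha > 0.5$, the residual $|y_n|$ stays below $\Delta/4$ for a "0" and above it for a "1", so the sketch recovers the embedded bits exactly; the behavior under attack depends on the noise level and is not modeled here.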

3 Proposed Watermarking Scheme

We developed a new watermarking method that borrows ideas from spread transform coding and the scalar Costa scheme and is specifically designed for video. Traditional SCS has certain limitations if used for video watermarking, and our scheme tries to overcome them while remaining extremely robust to a wide range of attacks. A proper perceptual mask controls the levels of spatial and temporal distortion, while a built-in bit-rate control ensures optimum watermark bit allocation. We also use a locally adaptive watermark embedding and robustness-distortion optimization.

3.1 Generate the Perceptual Mask Sequences

In traditional spread transform scalar Costa scheme (ST-SCS) watermarking, the cover work $\mathbf{x}$ is projected onto a pseudo-random vector.3–5 The disadvantage of this approach is that it does not account for perceptual masking effects of the human visual system (HVS). Since imperceptibility is only possible if an efficient perceptual mask is used during watermark embedding, our scheme derives a perceptual mask sequence $\mathbf{t}$ from the host signal $\mathbf{x}$ itself, based on the popular Watson perceptual model,1 in order to achieve imperceptibility (see Fig. 2).

Fig. 2 Spread transform (ST) watermarking.


The generation of $\mathbf{t}$ differs depending on which type of macro block is watermarked. In case of Intra macro blocks, spatial masking effects are considered. First, a Gaussian low-pass filter is applied to the macro block in order to mitigate the effect of noise. Then, for each given macro block, a perceptual mask is computed using Watson’s model. The Watson model estimates the perceptibility of changes in DCT coefficients. This model consists of a frequency-sensitivity function as well as luminance and contrast masking components.1 The frequency-sensitivity function consists of a table defined by the model. Each table entry represents the smallest magnitude of the corresponding DCT coefficient in a block that can be perceived by the human eye. We use the standard frequency-sensitivity table of the Watson model in our implementation.1 Luminance masking accounts for the effect of the DC component (i.e., the average brightness of the block) on the frequency-sensitivity table. Contrast masking takes into account how the visibility of a change in one frequency is affected by the energy present in that particular frequency. After accounting for these effects, the thresholds or slacks for the individual DCT coefficients are obtained. These slacks represent the amounts by which the individual coefficients may be changed before resulting in a perceptible change in the DCT block. Together they form the perceptual mask, denoted by $\mathbf{p}$. For Intra macro blocks, we derive the perceptual mask sequence $\mathbf{t}$ as follows:

$$\mathbf{t} = \mathbf{p}/\|\mathbf{p}\|. \tag{6}$$

In case of Inter macro blocks, the effect of both spatial and temporal masking must be considered. Previous research has shown that watermark artifacts, such as “mosquito” effects and flicker, are visible in the fast-moving regions of a frame.8 These artifacts correspond to regions with large motion vector values. For this reason, we want to reduce the strength of the watermark in such regions. In other words, we sacrifice some amount of robustness in favor of imperceptibility. This is achieved by weighting the perceptual mask by the inverse of the motion vector magnitude. Thus, for Inter macro blocks, the perceptual mask is given by

$$\mathbf{t} = \mathbf{p}/|mv|, \tag{7}$$

where $\mathbf{p}$ is Watson’s perceptual mask and $|mv|$ is the absolute magnitude of the motion vector. The elements of $\mathbf{t}$ in Eq. (7) are then normalized.

3.2 Watermark Embedding

Chen recommended that watermark embedding be carried out on the projection of the host signal $\mathbf{x}$ onto a pseudo-random vector $\mathbf{v}$ (i.e., the $\mathbf{x}^T\mathbf{v}$ domain), so that the watermark distortion will be spread over space/time and frequency.5 In our case, a perceptual mask $\mathbf{t}$ instead of a pseudo-random vector $\mathbf{v}$ is used. Therefore, we embed in the projected domain $\mathbf{x}^T\mathbf{t}$. Once the perceptual mask $\mathbf{t}$ is generated, the projection of $\mathbf{x}$ onto $\mathbf{t}$ is found. This operation yields a scalar quantity $\tilde{x}$:

$$\tilde{x} = \mathbf{x}^T\mathbf{t}. \tag{8}$$

In our scheme, $\mathbf{x}$ represents the transform domain coefficients of Intra and Inter coded macro blocks. The watermark key $K$ is used to generate the random scalar value $k \in [0,1)$. The equation for embedding a “0” bit is obtained by putting $d=0$ and $D=2$ in Eq. (1):

$$\tilde{q} = Q_\Delta\{\tilde{x} - \Delta k\} - (\tilde{x} - \Delta k). \tag{9}$$

Similarly, embedding a “1” bit is possible by setting $d=1$ and $D=2$ in Eq. (1):

$$\tilde{q} = Q_\Delta\{\tilde{x} - \Delta(0.5 + k)\} - (\tilde{x} - \Delta(0.5 + k)). \tag{10}$$

The components of $\mathbf{x}$ that are orthogonal to $\mathbf{t}$, i.e., $\mathbf{x} - \tilde{x}\mathbf{t}$, are not altered during the embedding process and, for this reason, they are added back to the watermarked data. Therefore, the final watermarked data $\mathbf{s}$ are obtained by combining Eq. (4) with the orthogonal components:

$$\mathbf{s} = (\tilde{x} + \alpha\tilde{q})\,\mathbf{t} + (\mathbf{x} - \tilde{x}\mathbf{t}). \tag{11}$$
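As a rough illustration of Eqs. (8)–(11), the following Python sketch embeds one bit into the projection of a coefficient vector onto a unit-norm mask and reads it back. It assumes that the mask has unit Euclidean norm, which is what makes the leftover term in Eq. (11) orthogonal to the mask; the function names and the $\Delta/4$ decision threshold are hypothetical choices for this example.

```python
import math


def quantize(value, step):
    """Scalar uniform quantizer: nearest multiple of step."""
    return step * math.floor(value / step + 0.5)


def st_embed(x, t, bit, k, step, alpha):
    """Embed one bit in the projection of x onto the unit-norm mask t."""
    x_proj = sum(a * b for a, b in zip(x, t))                     # Eq. (8)
    offset = step * (bit / 2 + k)                                 # bit 0: Eq. (9), bit 1: Eq. (10)
    q_proj = quantize(x_proj - offset, step) - (x_proj - offset)
    # Eq. (11): (x_proj + alpha*q_proj) t + (x - x_proj t) simplifies to x + alpha*q_proj t
    return [a + alpha * q_proj * b for a, b in zip(x, t)]


def st_decode(r, t, k, step):
    """Project the received coefficients onto t and apply Eq. (5) to the scalar."""
    y_proj = sum(a * b for a, b in zip(r, t))
    u = y_proj - step * k
    y = quantize(u, step) - u
    return 0 if abs(y) < step / 4 else 1   # threshold is an assumption
```

Because only the component along the mask changes, the distortion per coefficient is proportional to the corresponding mask entry, which is how the mask shapes the watermark in this sketch.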

3.3 Selection of the Watermark Scale Factor $\alpha$ Using Robustness-Distortion Optimization

Traditional ST-SCS uses a fixed watermark scale factor $\alpha$ that is precomputed from global statistics [see Eq. (3)]. In contrast, our method uses a locally adaptive value for $\alpha$, which is computed by robustness (of the watermark)-distortion (of the cover) optimization at the macroblock level. For notational simplicity, this optimization is referred to as robustness-distortion optimization in this paper, though it differs in meaning from the R-D optimization of H.264. As a result, we have better control over the watermark scale factor, and our watermark adapts more closely to the host signal characteristics.

During video encoding, several coding parameters such as macroblock prediction modes, motion vectors, and transform coefficient quantization levels have to be determined. Since natural video has widely varying spatial and temporal (motion) content, it is necessary to select different coding options for different parts of the image. Therefore, the task of a video encoder is to find a set of coding parameters so that a tradeoff between the video bit rate and distortion is achieved. This means that for a given video bit rate, the encoder has to find the combination of coding options that minimizes picture distortion. Lagrangian bit-allocation techniques for rate-distortion coding have been widely accepted due to their effectiveness and simplicity. Adding a watermark to a video stream may also affect the bit rate and quality of the picture. It is, therefore, highly desirable that the watermark-embedding procedure incorporates rate-distortion optimized coding in order to compute the optimum watermark for different regions of a video frame.

To this end, we use the Lagrangian multiplier technique to compute the locally optimum value of $\alpha$ at the macroblock level. The simplified Lagrangian cost function for a particular value of $\alpha$ is

$$J_\alpha = D_\alpha + \lambda_w E_\alpha, \tag{12}$$

where $J_\alpha$ represents the cost of encoding a given macro block, $D_\alpha$ is the distortion (sum of squared differences) between the cover work $\mathbf{x}$ and the watermarked work $\mathbf{s}$, $\lambda_w$ is the Lagrangian parameter and is dependent on the choice of the video standard used for encoding,10,11 and $E_\alpha$ is the decoding error, $E_\alpha = \big||D_d| - |D_e|\big|$. A detailed explanation of each component in Eq. (12) can be found in Ref. 11. We define $D_d$ as the decoded distance and $D_e$ as the expected distance. $D_e$ is equal to 0 if the embedded message bit $d=0$, and it is equal to $\pm\Delta/2$ if $d=1$. To obtain $D_d$, the watermarked data are projected onto the perceptual mask $\mathbf{t}$, which results in the scalar $\tilde{e}$. Quantization of $\tilde{e}$ yields $D_d$:

$$D_d = Q_\Delta\{\tilde{e} - \Delta k\} - (\tilde{e} - \Delta k). \tag{13}$$

For each macro block, we compute the value of $\alpha$ that minimizes the Lagrangian cost function in Eq. (12). This value of $\alpha$ is the rate-distortion optimum watermark scale factor, which we use in our method. The cost function in Eq. (12) is designed to guarantee a minimum bit-error rate, i.e., maximum watermark robustness.

Briefly, the process of choosing $\alpha$ to minimize the Lagrangian cost function is as follows. To obtain this cost, we first encode the macro block with a selected mode, say “M0” (e.g., Intra_16×16). The difference between the original macro block “O” and the prediction “P0” gives the macro block residual “R0.” We then apply the lossless H.264 integer transform and subsequent lossy quantization with a selected QP, with the quantized coefficients entropy coded. With the macro block overhead included, the number of bits is counted and is stored as “B0.” Then, the residual is reconstructed by performing entropy decoding, inverse quantization, and inverse transform to give the reconstructed residual R0′. The distortion D is generally the SAD (sum of absolute differences) or the SSD (sum of squared differences) between R0 and R0′. Thus, we end up with the cost “J0” for encoding this macro block with mode “M0.” In a similar fashion, we find the cost J corresponding to a different mode M. Then, we select the coding mode that yields the minimum cost as the rate-distortion optimized mode for this macro block.

Fig. 3 Our watermark embedding scheme (gray-shaded blocks) inside the H.264/AVC encoder.

Fig. 4 The relationships between the quantization parameter (QP) used in H.264 and (a) the H.264 quantization step size, and (b) proposed watermarking quantization step size.
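The selection of the scale factor by minimizing the cost of Eq. (12) can be sketched as a simple search over candidate values of $\alpha$ for one projected coefficient. This is only an illustration under several assumptions: the mask has unit norm (so the squared-difference distortion equals $(\alpha\tilde{q})^2$), no attack is applied before computing the decoded distance of Eq. (13), the decoding error is taken as the absolute difference between $|D_d|$ and $|D_e|$, and the candidate grid and function names are hypothetical.

```python
import math


def quantize(value, step):
    """Scalar uniform quantizer: nearest multiple of step."""
    return step * math.floor(value / step + 0.5)


def lagrangian_cost(x_proj, bit, k, step, alpha, lam):
    """Cost of Eq. (12) for one projected coefficient and one candidate alpha."""
    offset = step * (bit / 2 + k)
    q = quantize(x_proj - offset, step) - (x_proj - offset)   # Eq. (9) or (10)
    s_proj = x_proj + alpha * q                               # watermarked projection
    distortion = (alpha * q) ** 2                             # D_alpha for a unit-norm mask
    u = s_proj - step * k
    d_d = quantize(u, step) - u                               # decoded distance, Eq. (13)
    d_e = 0.0 if bit == 0 else step / 2                       # expected distance
    error = abs(abs(d_d) - d_e)                               # E_alpha (assumed form)
    return distortion + lam * error


def best_alpha(x_proj, bit, k, step, lam, candidates=None):
    """Grid search for the alpha in [0, 1] with the smallest cost."""
    if candidates is None:
        candidates = [i / 20 for i in range(21)]              # 0.00, 0.05, ..., 1.00
    return min(candidates,
               key=lambda a: lagrangian_cost(x_proj, bit, k, step, a, lam))
```

In this simplified setting a larger $\lambda_w$ pushes the chosen $\alpha$ toward 1 (stronger embedding, lower decoding error), while a smaller $\lambda_w$ favors low distortion.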

Fig. 5 Rate distortion coding and bit-allocation mechanism in the H.264 encoder that includes our watermarking scheme (gray-shaded blocks).

Fig. 6 Our watermark decoding scheme (gray-shaded blocks) inside the H.264/AVC decoder.

3.4 Watermark Bit-Rate Control and Watermark Decoder

An important facet of any video watermarking scheme is optimum watermark bit allocation between different macro blocks.6–8 During video encoding, the source coder has to decide how to allocate bits to different regions of the picture. The video coder achieves the target bit rate by distributing the video bits depending on the scene content. For example, highly textured regions are assigned more bits, while smooth regions are assigned fewer bits. For watermarked video, it is desirable to have a scheme that would determine the best allocation of the watermark bits between the different macro blocks. In video coding, the most important factor for controlling the bit rate is the residual signal coding fidelity, which is controlled by choosing a suitable quantization step size for the transform coefficients. We designed a scheme that finds the best distribution of the watermark bits by simply changing the quantization step $\Delta$. This step is used to embed the watermark in Eqs. (9), (10), and (13). The major advantage of this bit-allocation mechanism is that since the watermark bits are allocated in proportion to the video bits, the overall video bit rate is not adversely affected. As our watermarking scheme is specifically designed for H.264/AVC, we defer the detailed explanation of our bit allocation scheme until Section 4.

At the decoder side, complexity is kept to a minimum. Since the watermark is embedded in the transform coefficients, detection is possible by partially decoding the compressed video stream. Thus, the watermark decoding process is computationally inexpensive. Decoding of the watermark requires knowledge of the secure key $K$, which is needed to generate the pseudorandom scalar $k$. The perceptual mask $\mathbf{t}$ is computed for each macro block as explained at the beginning of this section. The transform coefficients are projected onto $\mathbf{t}$ to obtain the scalar projection $\tilde{y}$. This projection is then quantized using Eq. (5), and simple hard decision decoding is used to extract the message $\hat{m}$.

Fig. 7 (a)–(d) Watermarked frames (LHS) and the corresponding embedded watermark data (RHS).

Fig. 8 H.264 compression attack at different bit rates.

Fig. 9 H.264 compression attack at 512 kb/s, with different watermark-to-noise ratios (WNRs).

4 Watermarking of H.264/AVC Video

Although H.264/AVC is the latest and most advanced video coding standard, to this date, there are very few watermarking schemes designed for it. This is mainly due to the compression efficiency of H.264, which presents a major challenge for any video watermarking approach. One of the main challenges is that in H.264 even the Intra frames consist mainly of residual data that have very small initial values. This means that after quantization, the majority of the coefficients have zero values. Therefore, adding a watermark without affecting the picture quality or the bit rate is extremely difficult. The residual macroblocks in H.264 Intra frames are obtained by using spatial prediction within a frame, a major departure from previous coding standards like MPEG-2 and MPEG-4.9


H.264 supports 3 types of Intra coding: Intra_4×4, Intra_16×16, and I_PCM. In Intra_16×16, the entire 16×16 macro block is predicted from the 16 top and left neighboring pixels. There are 4 Intra_16×16 modes: Vertical, Horizontal, DC, and Plane. In Intra_4×4, each 4×4 luma block is separately predicted using the top and left pixels of previously encoded neighbors. There are 9 directional Intra_4×4 modes. The I_PCM coding type is used to bypass the prediction and transform steps.

For Inter macro blocks, on the other hand, variable block-size motion compensation is used in order to obtain residual information. The supported sizes include 16×16, 16×8, 8×16, and 8×8. The 8×8 partition can be further divided into 8×4, 4×8, or 4×4 blocks.12 One motion vector is transmitted for each partition. This means that, in cases of high motion complexity, a macro block may be divided into several smaller partitions, each with its own motion vector.

Another challenge that H.264 poses to watermarking is that it uses an entirely integer transform.13 This presents a major concern for traditional spread spectrum watermarking schemes, which embed watermarks drawn from a Gaussian distribution.6–8 In summary, H.264 uses 2 types

of transforms: an integer transform for the luma residual data and an additional Hadamard transform for the 4×4 array of the luma DC coefficients (only in Intra_16×16 mode). This is because in Intra_16×16 mode, much of the energy is concentrated in the DC coefficients of each 4×4 block. The additional transform further concentrates the energy into a smaller number of significant coefficients. After transform, the coefficients undergo scalar uniform quantization, with step size defined by the quantization parameter (QP). QP can take values between 0–51, the quantization step (Qstep) doubling for every increase of 6 in QP.

4.1 Watermark Embedding Inside the H.264/AVC Encoder

Our algorithm addresses all the above issues and is designed to work efficiently within H.264. First, to address the complexity introduced by the multiple sizes of motion-estimation macro blocks supported in H.264, we have to adjust the value of our perceptual mask $\mathbf{p}$. For each subdivided macroblock in Eq. (7), we divide Watson’s perceptual mask $\mathbf{p}$ by the motion vectors of the corresponding regions.

We watermark only the luma components of both the Intra and Inter macro blocks. Our scheme operates on the integer transform coefficients of the macroblock residual data. This is possible since we designed our method not to make any assumptions about the nature of the host signal $\mathbf{x}$ [see embedding equations (6) to (13) in Section 3]. For macro blocks predicted using the Intra_16×16 mode, only the Hadamard coefficients are watermarked, since they contain most of the significant energy of the macroblock residual. For macro blocks predicted using the Intra_4×4 mode as well as the Inter macro blocks, we watermark the integer transform coefficients.

Table 1 Transcoding attack.

Sequence   Bit rate   Proposed scheme            Existing scheme
           (kb/s)     PSNR (dB)   BER (×10^-3)   PSNR (dB)   BER (×10^-3)
Football   128        26.5770     5.2121         26.4825     28.2020
           384        30.3005     10.2626        30.8525     46.7475
           768        35.8705     4.2424         34.5555     70.7879
           Average    30.9160     6.5724         30.6302     48.5791
News       128        33.9495     4.2829         33.7465     14.5050
           384        45.5590     8.7475         44.4605     50.8283
           768        50.7900     33.8586        50.6510     45.0707
           Average    43.4328     15.6296        42.9527     36.8014
Paris      128        32.9835     2.9939         33.0845     5.6811
           384        41.5175     2.5452         37.5255     66.4974
           768        48.7240     42.4606        48.6150     46.4260
           Average    41.0750     15.9999        39.7417     39.5348
Tennis     128        35.0400     10.0563        34.5145     35.2977
           384        41.4275     14.4811        40.1875     69.9920
           768        45.9150     35.1468        45.6350     52.9968
           Average    40.7942     19.8948        40.1123     52.7621
Overall               38.5359     14.5242        38.0207     44.4194

For both Intra and Inter macro blocks, H.264 uses rate-distortion optimized coding to select the best prediction modes.10 The best prediction is subtracted from the original values to obtain the residual data. The challenge here is that the embedded watermark should not affect the rate-distortion tradeoff determined by the video encoder. Our intuitive objective is to control the effect our watermark has on the encoder’s rate-distortion optimized decisions and at the same time maintain constant watermark robustness to video compression at different bit rates. This is a very attractive feature for a watermarking scheme, since in video applications, compression itself is regarded as the most common watermark attack. Figure 3 shows how our proposed watermarking scheme is implemented inside the

H.264 encoder (gray-shaded blocks represent our watermarking scheme). It can be observed that our scheme is implemented in the loop. The perceptual mask $\mathbf{t}$ is first computed from the transform domain data at the macroblock level. Next, the robustness-distortion optimized local watermark scale factor $\alpha$ is computed. A bit-rate control scheme related to the H.264 encoder bit-rate control mechanism is used, thereby eliminating the need for an external bit-rate controller. Using all these blocks, the watermark is embedded into the transform coefficients. The watermarked coefficients are then passed on to the quantization and entropy coding blocks of the encoder.

Table 2 3×3 Gaussian low-pass filtering attack.

Sequence   Bit rate   Proposed scheme            Existing scheme
           (kb/s)     PSNR (dB)   BER (×10^-3)   PSNR (dB)   BER (×10^-3)
Football   128        20.051      1.4141         19.600      23.0707
           384        25.001      4.9293         22.356      34.4242
           768        33.224      6.0202         31.476      64.3232
           Average    26.0920     4.1212         24.4773     40.6060
News       128        18.817      3.0707         18.543      11.6161
           384        42.483      14.1208        41.470      54.3232
           768        48.777      62.6464        48.426      72.9697
           Average    36.6923     26.6126        36.1463     46.3030
Paris      128        16.077      2.0565         15.900      4.4142
           384        37.999      4.0563         34.079      63.0376
           768        46.156      73.2181        45.979      76.8710
           Average    33.4107     26.4437        31.9860     48.1076
Tennis     128        22.605      6.3354         21.981      27.9565
           384        38.326      11.5647        36.837      64.9135
           768        43.769      77.3330        43.460      94.6299
           Average    34.9        31.7444        34.0927     62.5000
Overall               32.4223     22.2305        31.3074     49.3792

4.1.1 Selection of scalar quantizer step size $\Delta$

Let $QP_{H.264}$ denote the quantization parameter (QP) value used in H.264, $Q_{step}$ denote the H.264 quantization step size, $QP_w$ denote our watermark quantization parameter related to $\Delta$ (which is introduced to bridge with the quantization parameter in H.264), and $\Delta$ denote our watermark quantization step size. The variables $QP_{H.264}$ and $Q_{step}$ are related as follows [Fig. 4(a)]:

$$Q_{step} = 0.6282\,\exp(0.1155\,QP_{H.264}), \quad 0 \le QP_{H.264} \le 51. \tag{14}$$
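The exponential fit of Eq. (14) can be checked against the stated property that the H.264 quantization step doubles for every increase of 6 in QP, since $\exp(6 \times 0.1155) \approx 2$. The short Python sketch below evaluates the fit; the function name `qstep_fit` is a hypothetical label for this example.

```python
import math


def qstep_fit(qp):
    """Quantization step size from the exponential fit of Eq. (14)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in the range 0 to 51")
    return 0.6282 * math.exp(0.1155 * qp)


if __name__ == "__main__":
    # Print the fitted step size and the ratio to the value six QP steps earlier.
    for qp in range(6, 52, 6):
        ratio = qstep_fit(qp) / qstep_fit(qp - 6)
        print(qp, round(qstep_fit(qp), 3), round(ratio, 4))
```

The printed ratio stays very close to 2 for every row, which is the doubling behavior that the fit is meant to capture.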

In order to adapt the robustness of the watermark, the $\Delta$ value is empirically set to be a function of $QP_{H.264}$, where a high value of $QP_{H.264}$ implies a high value of $\Delta$. First, we try to establish a relationship between our watermark quantization parameter $QP_w$ and $QP_{H.264}$. We can see that Fig. 4(a) and Eq. (14) are independent of the video content. In a similar manner, we wanted our watermark quantizer to behave independently of the video content and to be tied to the QP selected by the H.264/AVC encoder. Though we could not provide a theoretical solution, we provide a heuristic solution as follows. Evaluations over a large set of video sequences, which represent various levels of texture, motion, and spatial activity (as mentioned in Section 5), indicated that the minimum BER is obtained when

$$QP_w = 48, \quad 0 \le QP_{H.264} \le 30, \tag{15}$$

$$QP_w = 1.329\,QP_{H.264} + 6.768, \quad 31 \le QP_{H.264} \le 51. \tag{16}$$

Second, the relationship between $QP_w$ and $\Delta$ is exactly the same as that between $QP_{H.264}$ and $Q_{step}$. Therefore, equivalently,

$$\Delta = 160, \quad 0 \le QP_{H.264} \le 30, \tag{17}$$

$$\Delta = 1.373\,\exp(0.1535\,QP_{H.264}), \quad 31 \le QP_{H.264} \le 51. \tag{18}$$

Equation (18) is shown in Fig. 4(b). We observe that Eqs. (14) and (18) are very closely related. Both equations represent exponential curves, with an initial slow ascent between 0–30 and then a rapid increase in the range 31–51. This design guarantees that our watermark robustness is constant at different compression rates (i.e., bit rates).

Table 3 75% downscaling attack with bilinear sampling.

Sequence   Bit rate   Proposed scheme            Existing scheme
           (kb/s)     PSNR (dB)   BER (×10^-3)   PSNR (dB)   BER (×10^-3)
Football   128        24.6380     0.6061         24.1825     22.4242
           384        30.9530     3.8384         30.1050     33.5757
           768        37.3665     1.6970         36.0410     60.6464
           Average    30.9858     2.0471         30.1095     38.8821
News       128        29.1530     1.5960         28.8200     11.4747
           384        45.5480     7.4546         44.5095     49.4343
           768        51.0715     41.0907        50.9015     52.8285
           Average    41.9242     16.7137        41.4103     37.9125
Paris      128        27.2625     1.1249         27.3795     3.8348
           384        41.5440     0.4999         37.7385     63.2307
           768        48.7925     51.6471        48.6920     55.3736
           Average    39.1997     17.7573        37.9367     40.8130
Tennis     128        35.2550     8.1958         34.6905     29.4650
           384        44.2060     13.4753        42.9885     66.3717
           768        49.2400     73.5111        48.9820     92.8710
           Average    42.9003     31.7274        42.2203     62.9026
Overall               37.9562     17.0614        37.1789     45.1275
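The statement that Eqs. (17) and (18) follow from applying the curve of Eq. (14) to $QP_w$ can be verified numerically. The Python sketch below, with hypothetical function names, computes the watermark step size both directly from Eqs. (17) and (18) and indirectly through Eqs. (15), (16), and (14), so that the two routes can be compared.

```python
import math


def qp_watermark(qp_h264):
    """Watermark quantization parameter QP_w from Eqs. (15) and (16)."""
    if not 0 <= qp_h264 <= 51:
        raise ValueError("QP must lie in the range 0 to 51")
    if qp_h264 <= 30:
        return 48.0                                  # Eq. (15)
    return 1.329 * qp_h264 + 6.768                   # Eq. (16)


def delta_direct(qp_h264):
    """Watermark step size from Eqs. (17) and (18)."""
    if not 0 <= qp_h264 <= 51:
        raise ValueError("QP must lie in the range 0 to 51")
    if qp_h264 <= 30:
        return 160.0                                 # Eq. (17)
    return 1.373 * math.exp(0.1535 * qp_h264)        # Eq. (18)


def delta_via_qpw(qp_h264):
    """Watermark step size obtained by applying the Eq. (14) curve to QP_w."""
    return 0.6282 * math.exp(0.1155 * qp_watermark(qp_h264))
```

The two routes agree to within about half a percent over the whole QP range; the small gap comes from the rounded constants in the published equations.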


4.1.2 Watermark bit-rate control and selection of the scalar factor $\alpha$

Our watermark is embedded on the transform coefficients before the quantization process. The reason for this is that if the watermark was embedded in the nonzero coefficients obtained after quantization, it would have an adverse effect on the bit rate of the video. In H.264, video bit-rate control is achieved through proper selection of the quantization parameter ($QP_{H.264}$), which controls the quantization step size for the transform coefficients. It has been shown that there is a strong relationship between the Lagrangian parameter $\lambda$ used for rate-distortion coding and $QP_{H.264}$:10

$$\lambda = 0.85 \cdot 2^{(QP_{H.264} - 12)/3}. \tag{19}$$

Therefore, bit-rate control in H.264 is achieved by controlling $QP_{H.264}$ and accordingly adjusting the value of $\lambda$ used for rate-distortion coding. Similarly, in our method, the watermark bit allocation is controlled by choosing the step size of the scalar uniform quantizer $\Delta$ (or, equivalently, $QP_w$) and adjusting the value of $\lambda_w$ used in Eq. (12). Due to the similarity in our problem and the fact that Eqs. (14) and (18) are closely related, the Lagrangian parameter $\lambda_w$ is

Table 4 5° Counterclockwise rotatio

Proposed Scheme

SequenceBit Rate�kb/s� PSNR �dB� BE

Football 128 28.5860

384 33.1200

768 39.7035

Average 33.8032

News 128 32.0565

384 46.9700 1

768 52.8655 6

Average 43.9640 2

Paris 128 30.5190

384 43.3755

768 50.5625 7

Average 41.4857 2

Tennis 128 31.7040

384 42.0270

768 46.8560 3

Average 40.1957 1

Overall 39.8144 1

mpirically set to be function of QPw as


�w = 0.85 * pow�2,�QPw − 12�/3� . �20�

This value of �w in then used in Eq. �12� to deduce thelocally optimum watermark scale factor �. Such � is cho-sen to ensure that our watermark will not affect the rate-distortion optimized coding decisions of the H.264 encoder.When we vary QPH.264 in the H.264 encoder in order toachieve the desired overall video bit rate, QPw also changesproportionally, as in Eqs. �15� and �16�. Thus, the water-mark bits are allocated in proportion to the H.264 encoder’sbit-rate control algorithm. This ensures that the overallvideo bit rate is not adversely affected. Therefore, ourscheme has a built-in mechanism for watermark bit alloca-tion through the parameters QPw and �w. This is clearlydifferent from existing schemes, such as those in Refs. 6–9,which require an explicit bit-rate controller.

Figure 5 illustrates the entire rate-distortion coding andbit-allocation mechanism of our proposed watermarkingscheme. First, the H.264 bit-rate controller adjusts QPH.264based on the target bit-rate and residual signal fidelity cri-teria. Next, using Eqs. �15� and �16�, we determine theappropriate QPw to be used for watermark embedding.Then, for this value of QPw, the Lagrangian parameter �w is

ration attack with bilinear sampling.

Existing Scheme

−3Bit Rate�kb/s� PSNR �dB� BER�10−3

128 27.8005 23.5959

384 32.5375 34.3030

768 38.5150 62.7879

32.9510 40.2290

128 31.8580 12.5050

384 45.8785 53.4344

768 52.7015 76.6265

43.4793 47.5220

128 30.7320 4.5051

384 39.4385 63.4125

768 50.4590 78.6661

40.2098 48.8612

128 31.0265 26.5487

384 40.8080 62.9022

768 46.6315 58.6785

39.4887 49.3765

39.1261 46.4972

n-resto

R�10

1.8990

5.5354

5.9799

4.4714

3.1515

3.6568

5.5152

7.4411

2.1247

4.0279

4.1216

6.7581

5.7323

8.4973

8.6161

7.6153

9.0715

determined using �20�. Rate-distortion coding is used to

Oct–Dec 2007/Vol. 16(4)1

dtTvcm

4

Tdatmtce

5TvmpciT�HipS

S

F

N

P

T

O


J

etermine the locally optimum watermark scale factor � forhis macro block, which minimizes the cost function J�.he watermark step size � �corresponding to the computedalue of QPw�, the watermark scale factor �, and the per-eptual mask sequence t are used to embed the watermarkessage m, with the secure key K.
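The mapping in Eqs. (19) and (20) can be sketched in a few lines. The function name `lagrangian_from_qp` is hypothetical, and the derivation of QP_w from QP_H.264 through Eqs. (15) and (16) is not reproduced here, so the sketch only shows how a given QP value translates into a Lagrangian parameter.

```python
def lagrangian_from_qp(qp: float) -> float:
    """Eqs. (19) and (20): lambda = 0.85 * 2 ** ((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12.0) / 3.0)


if __name__ == "__main__":
    # The same mapping is applied to QP_H.264 (Eq. 19) and to QP_w (Eq. 20).
    for qp in (12, 24, 36, 51):
        print(f"QP={qp:2d}  lambda={lagrangian_from_qp(qp):10.2f}")
```

Because the exponent is (QP − 12)/3, the Lagrangian parameter doubles for every increase of 3 in QP and equals 0.85 at QP = 12.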

4.2 Watermark Decoding Inside the H.264/AVC Decoder

The entire watermark decoding process inside the H.264 decoder is shown in Fig. 6. First, the transform coefficients are reconstructed after inverse entropy coding and dequantization. Next, the perceptual mask sequence t for this macro block is computed. Using the secure key K, an estimate of the watermark message m is extracted using simple hard-decision decoding [see Eq. (5)]. Thus, the watermark can easily be extracted from the partially decoded bitstream.
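The decoding step can be illustrated with a minimal sketch of textbook spread-transform scalar Costa scheme (ST-SCS) embedding and hard-decision decoding. Everything in it is an assumption made for illustration: the function names, the use of `t` as a unit-norm projection vector, the way the dither is drawn from a seeded generator standing in for the key K, and the parameter values. The paper's own Eq. (5) and its perceptual mask are not reproduced.

```python
import math
import random


def quantize(value: float, step: float) -> float:
    """Uniform scalar quantizer with step size `step`."""
    return step * math.floor(value / step + 0.5)


def project(x, t):
    """Spread transform: project the coefficient vector x onto direction t."""
    return sum(xi * ti for xi, ti in zip(x, t))


def embed_bit(x, t, bit, step, alpha, dither):
    """Embed one bit into the projection of x onto the unit-norm vector t."""
    p = project(x, t)
    offset = step * (bit / 2.0 + dither)
    q = quantize(p - offset, step) - (p - offset)  # quantization error
    delta_p = alpha * q                            # scaled watermark sample
    return [xi + delta_p * ti for xi, ti in zip(x, t)]


def decode_bit(r, t, step, dither):
    """Hard decision: 0 if the projection is near the dithered lattice, else 1."""
    p = project(r, t) - step * dither
    y = quantize(p, step) - p
    return 0 if abs(y) < step / 4.0 else 1


if __name__ == "__main__":
    rng = random.Random(1234)  # stands in for the secure key K (assumption)
    n = 16
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in raw))
    t = [v / norm for v in raw]                    # unit-norm spreading direction
    x = [rng.uniform(-50.0, 50.0) for _ in range(n)]
    dither = rng.random()
    for bit in (0, 1):
        marked = embed_bit(x, t, bit, step=8.0, alpha=0.8, dither=dither)
        print("embedded", bit, "decoded", decode_bit(marked, t, step=8.0, dither=dither))
```

With this decision threshold of one quarter of the step size, decoding without any attack recovers the bit whenever the scale factor is larger than 0.5; smaller scale factors trade this margin for lower embedding distortion.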

5 Experimental Results

The performance of our scheme was tested on 10 standard video sequences (Carphone, Coastguard, Football, Foreman, Flower Garden, Mother Daughter, News, Paris, Tempete, and Tennis), which represent a wide variety of video content. The sequences were watermarked and encoded using H.264 at various bit rates, with a frame rate of 25 fps. The group-of-pictures (GOP) structure consisted of an Intra (I-) frame followed by 4 Inter (P-) frames. We used the H.264/AVC reference software version JM9.3 for our implementation (Ref. 14). Under the same picture quality, we compared the robustness of our method against the traditional ST-SCS scheme in the following attack categories:

1. H.264 compression at different bit rates,
2. H.264 compression at a fixed bit rate of 512 kb/s, with different watermark-to-noise ratios (WNR),
3. transcoding: H.264 bitstreams decompressed and recompressed at the same bit rate but using a different GOP structure (Intra period = 10),
4. filtering: 3×3 Gaussian filter of variance 0.5,
5. scaling restoration: spatial scaling with a factor of 75%,
6. rotation restoration: 5°, with bilinear sampling,
7. average collusion: averaging attack using 5 different watermarked copies of a video sequence at a fixed bit rate of 512 kb/s.

In the current study, a correct QP value is critical and is needed in the decoder. Our simulations were carried out as follows: During encoding, the video sequences were compressed using different QP values (and hence different distortion levels) each time. The watermark was inserted during encoding, using a quantization step proportional to H.264's QP value. Later, during the decoding process, the H.264 decoder (and thus also our decoder) used the same value of QP as that used during encoding. For simplicity, we assume that the watermarked/attacked video is perfectly registered (i.e., spatially aligned) with the original unwatermarked video.

For both schemes, we fixed the watermark message bit rate to 1 bit per macro block. We also ensured that the same watermark power σ_w², spread transform size (256 elements), and pseudorandom key K were used for both watermarking methods. For traditional ST-SCS, the watermark scale factor α was calculated at the macro block level using Eq. (3). The left-hand sides of Figs. 7(a)–7(d) show four representative frames watermarked by our proposed scheme. We observe that the resulting watermarked video maintains excellent subjective quality. The corresponding watermark data that were embedded in these frames are shown on the right-hand sides. It can be seen that the watermark signal is distributed (as expected) in accordance with the perceptual importance of the different regions of the frame.

Table 5 Average collusion attack.

            Proposed Scheme                               Existing Scheme
Sequence    Bit Rate (kb/s)   PSNR (dB)   BER (×10⁻³)     Bit Rate (kb/s)   PSNR (dB)   BER (×10⁻³)
Football    512               25.0010     5.1313          512               22.3560     67.475
News        512               42.4830     31.010          512               41.4700     47.131
Paris       512               37.9990     8.3967          512               34.0790     69.554
Tennis      512               38.3260     21.269          512               36.8370     81.255
Overall     Average           35.6820     16.452          Average           33.4510     66.354

Figure 8 shows the bit error rates (BER) caused by H.264 compression at various bit rates, for 4 representative streams. In order to obtain video compression at different bit rates, the quantization parameter QP_H.264 was varied from 0 to 51. We observe that our method significantly outperforms ST-SCS. As expected, the robustness of our method is almost constant throughout the different bit rates and results in dramatic improvements over ST-SCS at low bit rates. This indicates that the watermark bit-allocation mechanism in Eqs. (17) and (18) and the rate-distortion optimization in Eq. (20) are performing very well by adapting the watermark step size Δ and scale factor α to the compression rate. On the other hand, the performance of traditional ST-SCS suffers in spite of using the same watermark power σ_w² during embedding. On average, our scheme achieves bit error rates about 2 orders of magnitude lower than those of ST-SCS for the 10 tested video sequences.
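The bit error rate used throughout the experiments is the fraction of embedded watermark bits that are extracted incorrectly. A minimal sketch, assuming one bit per macro block and using the hypothetical helper name `bit_error_rate`:

```python
def bit_error_rate(sent, received):
    """Fraction of positions where the extracted bit differs from the embedded bit."""
    if len(sent) != len(received) or not sent:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)


if __name__ == "__main__":
    embedded = [1, 0, 1, 1, 0, 0, 1, 0]
    extracted = [1, 0, 0, 1, 0, 0, 1, 1]
    # Two of the eight bits differ in this made-up example.
    print(bit_error_rate(embedded, extracted))
```

The tables report this quantity in units of 10⁻³, so a tabulated value of 5.0 corresponds to 5 erroneous bits per 1000 embedded bits.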

In the second attack category, the sequences were watermarked at different watermark-to-noise ratios (WNRs), while the video bit rate was fixed at 512 kb/s. Figure 9 shows the BER plots for this category of attacks. The “Football” sequence has a high amount of spatial detail and motion. As expected in this case, our scheme embeds the watermark data much more effectively due to the perceptual mask and the locally optimum watermark scale factor. In fact, our scheme requires about 3 dB less WNR than ST-SCS in order to achieve minimum BER. “News,” which represents the other extreme in terms of scene content, yields about 1 dB less WNR for our method than ST-SCS. This reduction in the improvement is due to the fact that “News” has a low level of spatial activity, thus making it harder to embed the watermark without affecting perceptual quality. On average, taking into account results from the 10 different streams, our scheme requires about 2 dB less WNR in order to achieve minimum BER. We also observe that, for the same WNR, the BER achieved by our scheme is about 2 orders of magnitude lower than that of ST-SCS.
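To put the decibel figures above in perspective, the sketch below uses the common definition of the watermark-to-noise ratio as ten times the base-10 logarithm of the ratio of watermark power to noise power. This definition and the helper names `wnr_db` and `power_ratio_for_db` are assumptions, since the section does not restate how the WNR is computed.

```python
import math


def wnr_db(watermark_power: float, noise_power: float) -> float:
    """Watermark-to-noise ratio in dB (assumed definition: 10*log10(Pw/Pn))."""
    return 10.0 * math.log10(watermark_power / noise_power)


def power_ratio_for_db(gain_db: float) -> float:
    """Linear power ratio that corresponds to a gain expressed in dB."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    print(f"WNR for equal powers: {wnr_db(1.0, 1.0):.2f} dB")
    for gain in (1.0, 2.0, 3.0):
        print(f"{gain:.0f} dB corresponds to a power ratio of {power_ratio_for_db(gain):.3f}")
```

Under this definition, a saving of about 2 dB means that the same minimum BER is reached with roughly 1/1.58, or about 63%, of the watermark power at a fixed noise level.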

Table 1 shows the results from the transcoding attack for four representative sequences. For this attack, the video sequences were first watermarked at the given bit rate, with an Intra period of 5 frames. Next, the sequences were decompressed and recompressed at the same bit rate, but with a different GOP structure (Intra period = 10). This attack significantly changes the spatial and temporal residual data, when compared to the original compressed stream. Three different bit rates are listed here for our comparisons: 128, 384, and 768 kb/s. We observe that, for the same picture quality (PSNR), the BERs obtained by our method are about 3 times lower than those achieved by ST-SCS. This is also the average result obtained from all 10 sequences.

Table 6 Visual quality comparison (Structural Similarity Index).

            Proposed Scheme                 Existing Scheme
Sequence    Bit Rate (kb/s)   SSIM (0–1)    Bit Rate (kb/s)   SSIM (0–1)
Football    128               0.54299       128               0.52896
Football    384               0.74624       384               0.64665
Football    768               0.93665       768               0.91701
Football    Average           0.74196       Average           0.69754
News        128               0.56201       128               0.55225
News        384               0.98727       384               0.98498
News        768               0.99571       768               0.99551
News        Average           0.84833       Average           0.84425
Paris       128               0.46781       128               0.46313
Paris       384               0.9816        384               0.95452
Paris       768               0.99605       768               0.99597
Paris       Average           0.81515       Average           0.80454
Tennis      128               0.63542       128               0.6154
Tennis      384               0.94445       384               0.93007
Tennis      768               0.9806        768               0.97982
Tennis      Average           0.85349       Average           0.84176
Overall     Average           0.82613       Average           0.80854

Table 2 summarizes the results of the 3×3 Gaussian low-pass filtering attacks on both watermarking schemes. It can be observed that our scheme shows superior performance on the “Football” sequence, for which it achieves a BER that is one-tenth that of ST-SCS. For sequences having less spatial activity, the BERs achieved by our scheme are less than half those obtained by ST-SCS. This is also the case when we consider the average performance of our scheme over the 10 test sequences.
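The filtering attack can be sketched as follows. The 3×3 size and the variance of 0.5 come from the text; sampling the Gaussian at integer offsets, normalizing the kernel to unit sum, replicating border pixels, and the function names are assumptions made for this illustration.

```python
import math


def gaussian_kernel_3x3(variance: float = 0.5):
    """Sampled 3x3 Gaussian kernel, normalized so that its weights sum to 1."""
    offsets = (-1, 0, 1)
    weights = [[math.exp(-(dx * dx + dy * dy) / (2.0 * variance)) for dx in offsets]
               for dy in offsets]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def filter_frame(frame, kernel):
    """Apply the 3x3 kernel to a 2-D list of samples, replicating edge pixels."""
    height, width = len(frame), len(frame[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), height - 1)
                    xx = min(max(x + kx - 1, 0), width - 1)
                    acc += kernel[ky][kx] * frame[yy][xx]
            out[y][x] = acc
    return out


if __name__ == "__main__":
    kernel = gaussian_kernel_3x3(0.5)
    for row in kernel:
        print(" ".join(f"{w:.4f}" for w in row))
    frame = [[0.0, 0.0, 0.0], [0.0, 90.0, 0.0], [0.0, 0.0, 0.0]]
    print(filter_frame(frame, kernel)[1][1])  # the isolated peak is attenuated
```

Because the kernel is a low-pass filter, it attenuates high-frequency content, which is where a watermark embedded in detailed regions tends to concentrate.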

In Table 3, the results of the 75% downscaling attacks are summarized for 4 streams. Bilinear sampling was used for the scaling operation. In order to avoid synchronization errors, the downscaled video was restored to its original dimensions before performing watermark detection. On average, our scheme yields a BER that is more than 2 times lower than that of ST-SCS. In fact, for “Football,” the BER is about 20 times lower than that of ST-SCS.

Table 4 shows BER results for the 5° counterclockwise rotation attack with bilinear sampling. Here, too, the original dimensions of the video were restored before watermark detection. We observe that, on average, our scheme achieves BERs that are less than half those of ST-SCS.

For the collusion attack, five copies of each video sequence are compressed at 512 kb/s and watermarked using different keys, resulting in five different watermark sequences. Next, the five watermarked video sequences are averaged in order to obtain a sixth video sequence. Then, using the five original watermark keys, we extract the watermark from the colluded video sequence. Table 5 shows the average BER results obtained from this attack. We observe that, on average, our scheme achieves BERs that are less than one-fourth those of ST-SCS. This result indicates that our scheme is inherently more robust to collusion than traditional ST-SCS. This can be attributed to the locally adaptive watermark scale factor α and rate-distortion optimum coding, which ensure that our watermark is strongly adapted to the video content. This is in accordance with the basic rule for preventing video collusion attacks (Refs. 15 and 16).
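The averaging step of the collusion attack can be sketched as a sample-wise mean over the watermarked copies. The frame representation as nested lists, the toy host frame, the ±2 watermark samples, and the function name `average_collusion` are assumptions for illustration only.

```python
import random


def average_collusion(copies):
    """Sample-wise average of equally sized frames (2-D lists), one per copy."""
    if not copies:
        raise ValueError("need at least one copy")
    height, width = len(copies[0]), len(copies[0][0])
    count = len(copies)
    return [[sum(c[y][x] for c in copies) / count for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    host = [[100.0] * 4 for _ in range(2)]
    copies = []
    for key in range(5):  # five different keys give five different watermarks
        rng = random.Random(key)
        copies.append([[p + rng.choice((-2.0, 2.0)) for p in row] for row in host])
    colluded = average_collusion(copies)
    for row in colluded:
        print([round(v, 2) for v in row])
```

Averaging scales each individual watermark by one over the number of copies, which is why a watermark that is tied closely to the video content is harder to remove this way.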

Finally, since it is well known that PSNR alone is not a good measure of perceptual quality, we also measured the visual quality of the watermarked video using the Structural Similarity Index (SSIM; Ref. 17). The SSIM score is given on a scale of 0–1, by comparing the watermarked against the original (unwatermarked) video. A higher value indicates that the watermarked video is perceptually closer to the original sequence. The visual quality obtained by the two watermarking schemes is tabulated in Table 6. The results indicate that our scheme always maintains a higher perceptual quality score than ST-SCS.
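For reference, the SSIM formula of Ref. 17 can be sketched from global statistics of two sample sequences. This is a simplification: the reference method computes the index over local windows and averages the results, whereas the hypothetical helper `ssim_global` below uses a single window covering all samples. The constants use the default values K1 = 0.01 and K2 = 0.03 from Ref. 17.

```python
def ssim_global(x, y, dynamic_range: float = 255.0) -> float:
    """SSIM computed once from global means, variances, and covariance."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / (n - 1)
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    original = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
    watermarked = [53.0, 54.0, 62.0, 65.0, 71.0, 60.0, 65.0, 72.0]
    print(ssim_global(original, original))
    print(ssim_global(original, watermarked))
```

Identical inputs give a score of 1, and any distortion lowers the score, which matches the interpretation of the values reported in Table 6.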

In summary, our scheme significantly outperforms ST-SCS under compression, transcoding, geometric, and collusion attacks. Further, our scheme maintains a higher perceptual quality of the watermarked video, compared to ST-SCS.

6 Conclusion

We have developed a new, quantization-based video watermarking scheme that is designed to work with H.264/AVC. Our scheme uses a locally adaptive, rate-distortion optimized watermark that is inserted in the transform coefficients of macroblock residuals. A unique perceptual mask limits the spatial and temporal distortion, while a built-in bit-allocation mechanism ensures optimum watermark bit distribution between different macroblocks. Our watermark offers constant robustness to H.264 video compression at different bit rates without adversely affecting the overall bit rate and quality of the video stream. Our proposed watermark scheme operates in real time and can easily be implemented within the existing H.264 encoder-decoder framework. Experimental results over 10 standard video sequences showed that, in the category of H.264 compression, our scheme yields bit-error rate improvements of more than two orders of magnitude compared to traditional ST-SCS. Our scheme maintains a constant robustness to compression at different bit rates. In addition, our scheme achieves the same BER as ST-SCS, using 2 dB less watermark-to-noise ratio (WNR). For the same level of WNR, the BER improvement is more than two orders of magnitude. In the case of transcoding, our scheme achieves BERs 3 times lower than those of ST-SCS. Furthermore, in the categories of 3×3 Gaussian low-pass filtering, 75% downscaling, and 5° rotation attacks, the BERs obtained by our scheme are less than half those of ST-SCS. In the case of the linear collusion attack with 5 watermarked sequences, our scheme is four times more robust than ST-SCS. Finally, the perceptual quality of our watermarked video is consistently better than that of ST-SCS.

References

1. I. J. Cox, M. L. Miller, and J. A. Bloom, Digital Watermarking, Academic Press, New York (2002).

2. M. H. M. Costa, “Writing on dirty paper,” IEEE Trans. Info. Thy. 29(3), 439–441 (1983).

3. J. J. Eggers, R. Bäuml, R. Tzschoppe, and B. Girod, “Scalar Costa scheme for information embedding,” IEEE Trans. Signal Process. 51(4), 1003–1019 (2003).

4. B. Chen and G. W. Wornell, “Quantization index modulation: A class of provably good methods for digital watermarking and information embedding,” IEEE Trans. Info. Thy. 47(4), 1423–1443 (2001).

5. B. Chen, “Design and analysis of digital watermarking, information embedding and data hiding systems,” PhD thesis, MIT, Cambridge, MA (2000).

6. F. Hartung and B. Girod, “Watermarking of uncompressed and compressed video,” Signal Process. 66(3), 283–301 (1998).

7. G. C. Langelaar, R. L. Lagendijk, and J. Biemond, “Real-time labeling of MPEG-2 compressed video,” J. Visual Commun. Image Represent. 9(4), 256–270 (1998).

8. A. M. Alattar, E. T. Lin, and M. U. Celik, “Digital watermarking of low bit-rate advanced simple profile MPEG-4 compressed video,” IEEE Trans. Circuits Syst. Video Technol. 13, 787–800 (2003).

9. Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), Joint Video Team (JVT), Doc. JVT-G050 (2003).

10. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol. 13(7) (2003).

11. G. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Process. Mag. 15, 74–90 (Nov. 1998).

12. T. Wedi and H. Musmann, “Motion- and aliasing-compensated prediction for hybrid video coding,” IEEE Trans. Circuits Syst. Video Technol. 13, 577–586 (July 2003).

13. H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-complexity transform and quantization in H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol. 13, 598–603 (July 2003).

14. K. Sühring, H.264/AVC Software Coordination (Online). Available at http://bs.hhi.de/~suehring/tml/.

15. G. Doërr and J.-L. Dugelay, “A guide tour of video watermarking,” Signal Process. Image Commun. 18(4), 263–282 (2003).

16. K. Su, D. Kundur, and D. Hatzinakos, “A novel approach to collusion-resistant video watermarking,” in Security and Watermarking of Multimedia Content IV, Proc. SPIE 4675, 491–502 (2002).

17. Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004).

18. M. Ramkumar and A. Akansu, “Signaling methods for multimedia steganography,” IEEE Trans. Signal Process. 52(4), 1100–1111 (2004).

19. H. Sencar, M. Ramkumar, and A. Akansu, “An analysis of quantization based embedding-detection techniques,” in Proc. ICASSP 2004, Montreal, Canada (2004).

Biographies and photographs of the authors not available.
