Wavelet Video Coding

Wavelet Video Coding: Principles, Applications and Standardization

Mihaela van der Schaar

Electrical and Computer Engineering Department

University of California, Davis

Outline

- Introduction
- Scalable coding – principles (review)
- Basic principles of wavelets (review)
- Motion Compensated Wavelet Coding – basic principles and classification
- Motion Compensated Temporal Filtering (MCTF)
- Overcomplete Motion Compensated Wavelet Coding
- Encoding of spatio-temporal wavelet coefficients
- Scalable coding of motion information
- Error resilience aspects
- Current status in MPEG standardization
- Comparisons with state-of-the-art non-scalable coding techniques

Introduction

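Challenges for ubiquitous multimedia communication

[Figure: an encoder and server deliver video over IP-based networks (Internet/Internet2) to receivers on links of very different capacity: < 64 k, < 512 k, 802.11 (< 2 M), 802.11b (< 11 M), 802.11a (6 M), and 3G/4G (64 k to 2 M).]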

Sample of concrete problems/questions

Signal processing
- compression efficiency versus quality of signal reproduction (rate-distortion tradeoffs)
- compression efficiency versus robustness to losses

Networking
- realistic channel models for effective joint source/channel coding
- source-channel interface control strategies for efficient network resource usage and high-quality signal reproduction

Computer architecture
- compression efficiency versus computational complexity

Possible solution: compression meets the network

Do not require the transport mechanism (modulation, channel coding, transmission protocol, etc.) to be flawless; just design the coding system and the transmission jointly.

Do not design for the worst-case scenario; just adapt on the fly based on the network and device characteristics.

Hence:

A. Scalable Coding

B. Adaptive Streaming

Principles of Scalable Coding

Encoding of the video signal at different resolution scales

Downscaling of the video signal by
- coding noise insertion: SNR scalability
- spatial subsampling: spatial scalability
- sharpness reduction: frequency scalability
- temporal subsampling: temporal scalability
- selection of content: content-related scalability

[Figure: the video input passes through scale conversion and encoding, producing low, medium and high rate/resolution outputs.]

The Simple Way – Advance Scaling

- Requires feedback about channel / decoder status
- Only point-to-point connections supported
- Example: stream switching

[Figure: block diagram with coder, scale converter, network and decoder.]

The Parallel Way - Simulcast

- Run independent encoders in parallel
- Requires a priori knowledge about network and decoder capabilities to select optimum scaling levels
- Point-to-multipoint connections possible

[Figure: low, medium and high scale coders run in parallel and feed a multiplexer.]

Simulcast

Multiplexed transmission of streams

- Loss in efficiency due to multiple streams
- Can cause network overload
- Restricted number of scales

[Figure: the multiplex stream carries the low rate, medium rate and high rate streams side by side.]

The Embedded Way – Layered Coding

"Chain of layers": information from a lower resolution is utilized to encode the next-higher resolution

[Figure: layered coder and decoder with T layers. The input x passes through preprocessing stages to coder layers 1 to T with quantizers Q1 to QT; layer 1 is the base layer and the remaining layers are enhancement layers. Midprocessing stages pass each decoded lower layer on to the next layer, where it is subtracted at the coder and added back at the decoder. Decoder layers 1 to T deliver the outputs y1, y2, ..., yT.]

Layered Coding

- Layered coding supports embedded streams
- Re-configuration of the bit stream for reconstruction with different spatial/temporal/quality resolution
- Possible loss in efficiency depends on the coding scheme
- In theory, an arbitrary number of scales could be achieved

[Figure: full multiplex = high rate stream; partial multiplex = medium rate stream; low rate stream.]

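SNR Scalability – Re-quantisation

Example: 2-stage quantizer

[Figure: the input is quantized by Q1 (large steps) to give the base layer. The Q1 output is subtracted from the input and the difference is quantized by Q2 (small steps, ≤ Q1/2) to give the enhancement layer. A second panel marks the reconstruction values and decision (threshold) values of Q1 and Q2.]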
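The two-stage structure can be sketched in a few lines. This is only an illustration, assuming uniform mid-tread quantizers and hypothetical names (`two_stage_quantize`, `reconstruct`, step sizes `q1` and `q2`) that do not appear on the slide:

```python
import numpy as np

def two_stage_quantize(x, q1, q2):
    """Return (base indices, enhancement indices) for step sizes q1 > q2."""
    base_idx = np.round(x / q1).astype(int)        # coarse quantizer Q1 (base layer)
    residual = x - base_idx * q1                   # quantization error of the base layer
    enh_idx = np.round(residual / q2).astype(int)  # fine quantizer Q2 (enhancement layer)
    return base_idx, enh_idx

def reconstruct(base_idx, enh_idx, q1, q2, use_enhancement=True):
    """Decode with the base layer only, or with base plus enhancement."""
    y = base_idx * q1
    if use_enhancement:
        y = y + enh_idx * q2
    return y

x = np.array([-3.7, -0.4, 0.9, 2.2, 5.1])
q1, q2 = 2.0, 0.5                                  # q2 <= q1 / 2, as on the slide
b, e = two_stage_quantize(x, q1, q2)
base_only = reconstruct(b, e, q1, q2, use_enhancement=False)
full = reconstruct(b, e, q1, q2)
```

A decoder that receives only the base indices reconstructs with an error of at most `q1 / 2`; adding the enhancement indices reduces the bound to `q2 / 2`.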

SNR Scalability – Bit-plane Coding

Quantization related to bit planes

[Figure: three quantizer characteristics over bit planes 1 to 3, each marking reconstruction values and decision (threshold) values: no zero reconstruction, unsigned (range MIN to MAX); zero reconstruction, sign/magnitude (range 0 to MAX); zero reconstruction, sign/magnitude with dead zone (range 0 to MAX).]

SNR Scalability – Bit-plane Coding

- Magnitude of the MSB encoded by run-length or binary entropy coding
- Sign and remaining bits encoded in binary, conditional on the MSB

[Figure: example with 15 samples, each shown as a sign bit and magnitude bits 5 (MSB) to 1. The MSB positions in the successive bit planes are run-length coded (4,9; 2,10; 3,5,2; 3,4,1; 0,1,1,0,2); the sign and the remaining bits are binary coded.]
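The bit-plane idea can be illustrated with a small sign/magnitude sketch. It is not the entropy coder from the slide (no run-length or conditional coding); it only shows how planes are split off MSB first and how decoding a prefix of the planes gives a coarser reconstruction with a dead zone around zero. The function names and the sample values are hypothetical:

```python
import numpy as np

def to_bitplanes(coeffs, num_planes):
    """Split integers into a sign array and magnitude bit planes (MSB first)."""
    signs = coeffs < 0
    mags = np.abs(coeffs)
    planes = [(mags >> p) & 1 for p in range(num_planes - 1, -1, -1)]
    return signs, planes

def from_bitplanes(signs, planes, num_planes, planes_received):
    """Rebuild coefficients from only the first `planes_received` planes."""
    mags = np.zeros_like(planes[0])
    for i in range(planes_received):
        mags = mags | (planes[i] << (num_planes - 1 - i))
    return np.where(signs, -mags, mags)

coeffs = np.array([9, -3, 0, 17, -22, 4, 1, -12])
signs, planes = to_bitplanes(coeffs, num_planes=5)
coarse = from_bitplanes(signs, planes, 5, planes_received=2)  # two most significant planes
exact = from_bitplanes(signs, planes, 5, planes_received=5)   # all planes
```

Truncating the lower planes maps every coefficient with magnitude below 8 to zero here, which is the dead-zone behaviour of sign/magnitude bit-plane truncation.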

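Spatial Scalability

Base-to-enhancement prediction

[Figure: the input is low-pass filtered and decimated N:1, then quantized by Q1 to form the base layer. The base layer is interpolated 1:N through a low-pass filter and subtracted from the input; the difference is quantized by Q2 to form the enhancement layer.]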

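Temporal Scalability

- Temporal downsampling with a temporal anti-alias filter or by frame skipping
- Temporal upsampling by MC prediction

[Figure: the input is (optionally) low-pass filtered and temporally subsampled N:1, then quantized by Q1 to form the base layer. The base layer is upsampled 1:N by a motion-compensated low-pass filter and subtracted from the input; the difference is quantized by Q2 to form the enhancement layer.]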

Frequency Scalability / "Data Partitioning"

- Popular in the context of transform coding
- Allocation of coefficients to different layers depending on frequency
- Very low complexity

[Figure: the original video enters a single-layer encoder. Data partitioning splits its output stream at a priority break point into a base-layer stream (base-layer coefficients) and an enhancement-layer stream (enhancement-layer coefficients), which are then multiplexed (MUX).]
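As a rough illustration of data partitioning, the sketch below splits one block of transform coefficients at a break point. It assumes the coefficients are already in low-to-high frequency (zigzag) order; `partition`, `merge` and the sample block are hypothetical, not taken from any standard's syntax:

```python
def partition(zigzag_coeffs, break_point):
    """Split one block at the priority break point: low frequencies go to the base layer."""
    base = zigzag_coeffs[:break_point]
    enhancement = zigzag_coeffs[break_point:]
    return base, enhancement

def merge(base, enhancement, block_len):
    """Rebuild a block; missing enhancement coefficients are treated as zero."""
    coeffs = list(base) + list(enhancement)
    return coeffs + [0] * (block_len - len(coeffs))

block = [52, -9, 7, 3, 0, -2, 1, 0]        # coefficients in zigzag order
base, enh = partition(block, break_point=3)
base_only = merge(base, [], len(block))    # decoder received the base layer only
full = merge(base, enh, len(block))        # decoder received both layers
```

The low complexity noted on the slide shows up here: the split is a reordering of already coded data, with no second quantization or prediction loop.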

Multiresolution Concepts

- Generate different resolution levels by successive down/upsampling operations
- Resolution pyramids, example: spatial resolution reduction by factors of 2

[Figure: resolution pyramid from full resolution down to the lowest resolution, with layers labelled c0, c1, c2, ..., cU-1.]

Multiresolution Concepts – Pyramids

Gaussian Pyramid
- Each layer is self-contained
- Corresponds to the Simulcast concept
- More samples to be encoded

[Figure: x(m,n) is repeatedly low-pass filtered by H(z1,z2) and subsampled 4:1, giving the layers c0, c1, ..., cU-2, cU-1.]

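Laplacian Pyramid (Differential Pyramid)
- All lower-resolution layers are required to reconstruct high-resolution layers
- Corresponds to the Layered Coding concept
- Not critically sampled: more samples than the original

[Figure: at each stage the signal, starting from x(m,n), is filtered by H(z1,z2) and subsampled 4:1; the lower-resolution result is upsampled 1:4, interpolated by G(z1,z2) and subtracted from the signal of the stage above, giving the layers c0, c1, ..., cU-1.]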
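A minimal sketch of a differential pyramid follows. The filters are placeholders chosen for brevity (2x2 block averaging for H with 4:1 subsampling, pixel replication for 1:4 upsampling with G); the slide does not specify them, and all function names are hypothetical:

```python
import numpy as np

def down(x):
    """Stand-in for H(z1,z2) followed by 4:1 subsampling: average each 2x2 block."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):
    """Stand-in for 1:4 upsampling followed by G(z1,z2): pixel replication."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def laplacian_pyramid(x, levels):
    layers, cur = [], x
    for _ in range(levels - 1):
        low = down(cur)
        layers.append(cur - up(low))   # difference layer at this resolution
        cur = low
    layers.append(cur)                 # lowest-resolution layer
    return layers

def reconstruct(layers):
    cur = layers[-1]
    for detail in reversed(layers[:-1]):
        cur = up(cur) + detail         # each layer needs all lower layers
    return cur

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8)).astype(float)
pyr = laplacian_pyramid(img, levels=3)
n_samples = sum(layer.size for layer in pyr)   # 64 + 16 + 4 samples
```

The sample count exceeds the 64 samples of the input, which is the "not critically sampled" property listed above.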

Advantages:
- Pyramids can be combined with any coding scheme for the different resolution levels
- Downsampling can be made alias-free

Disadvantages:
- Number of pixels higher than in the original signal
- Higher data rate than one-layer coding

Possible solution:
- Critically sampled pyramids (wavelets)
- Disadvantage: downsampled signals bear alias

Basic Principles of Wavelets

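Filter Pairs

Critically sampled filter bank with 2 bands

Analysis low-/highpass filter pair H0/H1

Synthesis low-/highpass filter pair G0/G1

The number of samples c in the frequency bands equals the total number of samples in the signal x

[Figure: two-band filter bank. Analysis: x(n) is filtered by H0(z) and H1(z), each followed by 2:1 subsampling, giving c0 and c1. Synthesis: c0 and c1 are upsampled 1:2, filtered by G0(z) and G1(z), and summed to give y(n).]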

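Filter Pairs

Perfect reconstruction is possible

Subsampled signals c usually bear alias!

$$Y(z) = \frac{1}{2}\underbrace{\left[H_0(z)\,G_0(z) + H_1(z)\,G_1(z)\right]}_{=\,2\cdot z^{-k}} X(z) + \frac{1}{2}\underbrace{\left[H_0(-z)\,G_0(z) + H_1(-z)\,G_1(z)\right]}_{=\,0} X(-z)$$

[Figure: two-band analysis/synthesis filter bank with H0(z), H1(z), 2:1 subsampling, bands c0 and c1, 1:2 upsampling, G0(z), G1(z), input x(n) and output y(n).]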

Biorthogonality Principle

Perfect reconstruction conditions

$$Y(z) = \frac{1}{2}\left[H_0(z)\,G_0(z) + H_1(z)\,G_1(z)\right] X(z) + \frac{1}{2}\left[H_0(-z)\,G_0(z) + H_1(-z)\,G_1(z)\right] X(-z)$$

$$G_0(z) = z^{k}\,H_1(-z), \qquad G_1(z) = -z^{k}\,H_0(-z)$$

$$\Rightarrow\; H_0(-z)\,z^{k}\,H_1(-z) - H_1(-z)\,z^{k}\,H_0(-z) = 0$$

$$Y(z) = \frac{1}{2}\underbrace{\left[H_0(z)\,H_1(-z) - H_1(z)\,H_0(-z)\right]}_{=\,2\cdot z^{-k}} X(z)\,z^{k}$$

$$\Rightarrow\; P(z) - P(-z) = 2\cdot z^{-k} \quad\text{with}\quad P(z) = H_0(z)\,H_1(-z)$$

- H0(z)/G1(-z) and H1(z)/G0(-z) constitute orthogonal pairs
- Low-/highpass transfer functions are not symmetric
- Linear-phase or non-linear-phase filters are possible
- Low-/highpass impulse responses may have different lengths

[Figure: two-band analysis/synthesis filter bank with H0(z), H1(z), 2:1 subsampling, bands c0 and c1, 1:2 upsampling, G0(z), G1(z), input x(n) and output y(n).]

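A simple biorthogonal filter pair (5/3 integer)

$$H_0^{(5/3)}(z) = \frac{1}{8}\left(-z^{2} + 2z + 6 + 2z^{-1} - z^{-2}\right)$$

$$H_1^{(5/3)}(z) = \frac{1}{2}\left(-1 + 2z^{-1} - z^{-2}\right)$$

$$G_0^{(5/3)}(z) = \frac{1}{2}\left(z + 2 + z^{-1}\right)$$

$$G_1^{(5/3)}(z) = \frac{1}{8}\left(-z - 2 + 6z^{-1} - 2z^{-2} - z^{-3}\right)$$

[Figure: two-band analysis/synthesis filter bank with H0(z), H1(z), 2:1 subsampling, bands c0 and c1, 1:2 upsampling, G0(z), G1(z), input x(n) and output y(n).]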

Lifting Filters

- Biorthogonal filter pairs can be factorized so that they are implementable in a "ladder structure"
- "Prediction" and "update" steps using very short filter kernels are then performed iteratively
- The "lifting scheme" is the most efficient implementation of wavelet filters available so far

[Figure: analysis lifting structure. The input (..ABAB..) is split by the lazy transform (even/odd sample separation) into branches A and B; prediction steps P1(z) ... PK(z) and update steps U1(z) ... UK(z) alternate between the branches, and scaling by KL and KH gives the L and H outputs.]

Lifting Filters

- The synthesis filter pair is implemented by the inverse signal flow
- Perfect reconstruction is obvious
- Quantization of the signals in the ladder branches gives an integer realization of analysis and synthesis

[Figure: synthesis lifting structure. The L and H inputs are scaled by 1/KL and 1/KH, the update steps UK(z) ... U1(z) and prediction steps PK(z) ... P1(z) are undone in reverse order, and the lazy transform (even/odd sample grouping) merges branches A and B into the output (..ABAB..).]

Lifting Filters

Signal flow diagrams of lifting implementations for (5/3) filters and Haar (2/2) filters

[Figure: two signal flow graphs, a) and b), mapping the input samples x(2m') ... x(2m'+6) to the highpass outputs c1(m') ... c1(m'+2) and the lowpass outputs c0(m') ... c0(m'+3). One graph uses branch weights −1 and 1/2, the other branch weights −1/2 and 1/4.]
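The (5/3) lifting steps can be sketched as an integer-to-integer transform. This is a minimal illustration under several assumptions not stated on the slides: even-length 1-D input, floor rounding in the ladder branches, symmetric extension at the borders, and no KL/KH scaling. The function names are hypothetical:

```python
import numpy as np

def lift53_forward(x):
    """One level of integer (5/3) lifting: returns (lowpass, highpass)."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Prediction step (weights -1/2): odd sample minus the rounded mean of its even neighbours
    right = np.append(even[1:], even[-1])        # symmetric extension at the end
    high = odd - ((even + right) >> 1)
    # Update step (weights 1/4): even sample plus a rounded quarter of neighbouring highpass values
    left = np.insert(high[:-1], 0, high[0])      # symmetric extension at the start
    low = even + ((left + high + 2) >> 2)
    return low, high

def lift53_inverse(low, high):
    """Inverse signal flow: undo the update step, then the prediction step."""
    left = np.insert(high[:-1], 0, high[0])
    even = low - ((left + high + 2) >> 2)
    right = np.append(even[1:], even[-1])
    odd = high + ((even + right) >> 1)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

signal = np.array([10, 12, 14, 13, 9, 4, 2, 3])
low, high = lift53_forward(signal)
restored = lift53_inverse(low, high)
```

Because the inverse subtracts exactly the rounded quantities that the forward transform added, reconstruction is exact regardless of the rounding rule, which is why perfect reconstruction is "obvious" in the ladder structure.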

Motion Compensated Wavelet Coding – basic principles and classification

Wavelet Video Coding - Classification

- Intraframe coding (e.g. MJPEG)
- 3D wavelet coding without MC
- Hybrid video coding using wavelet-based texture coding
- In-Band Motion Compensated Prediction
- Motion Compensated Temporal Filtering
- In-Band Motion Compensated Temporal Filtering

History

- Using transforms for interframe coding goes back to the 1970s/1980s (e.g. Karlsson/Vetterli)
- The drawback was the lack of motion compensation; a first approach to filter over motion trajectories was proposed by Kronander (1990)
- A solution avoiding an overcomplete transform was developed by Ohm (1991, 1994)
- Solution for perfect reconstruction in the case of half-pel motion by Ohm/Rümmler (1997) and Hsiang and Woods (1999)
- Different researchers proposed the combination with a temporal-axis lifting scheme, which makes virtually any MC possible: Pesquet/Bottreau, Luo/Li/Zhang, Secker/Taubman (2001)

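Three-dimensional Wavelet

Temporal decomposition of a group of 8 frames (3 levels of wavelet transform)

[Figure: the video sequence is split into L and H frames at the 1st temporal level; the L frames are split into LL and LH at the 2nd temporal level, and the LL frames into LLL and LLH at the 3rd temporal level.]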

Three-dimensional Wavelet Coding

- Extension of the zero-tree approach to the temporal dimension
- Non-recursive coding structure
- Examples:
  - "3D SPIHT" by Pearlman et al.
  - Layered Zero Coding (LZC) by Taubman and Zakhor (only constant-displacement motion compensation)

Wavelets and Motion Compensation

Motion compensation is key
- to achieve good compression performance
- to guarantee visual quality: non-MC/interframe coding with the same SNR usually looks worse

Motion-compensated wavelet video coding
- Temporal MC prediction followed by wavelet transform
- Wavelet transform followed by temporal MC prediction in the wavelet domain
- 3D wavelet with MC

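Hybrid Video Coding using Wavelets

Replacement of the DCT by a wavelet transform for 2D encoding in the MC prediction loop

[Figure: hybrid coder and decoder. Encoder: the MC prediction is subtracted from the input; the residual passes through DWT, quantizer Q and coder C, while a local loop with IDWT, adder and MC, driven by motion estimation (ME), rebuilds the reference. Decoder: D, IDWT and an adder with MC.]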

Hybrid Video Coding using Wavelets

Problems and possible solutions:

- Wavelet analysis is block-overlapping; discontinuities in the motion vector field cause problems
  - Overlapping-block MC
- Local switching between intra/inter modes is not block-wise
  - Symmetric extension at block discontinuities
- The drift problem in the MC loop is not solved
  - This is not a truly scalable solution

Motion compensation in the wavelet domain

- The multi-resolution nature of the wavelet decomposition is ideal for providing spatially scalable video (QCIF, CIF, SD, and HD)
- Subbands are highly correlated in the temporal direction
  - Motion estimation and compensation can significantly reduce the temporal correlation
- Classical approach
  - MRMC (multi-resolution motion compensation scheme)
  - Ref: Y. Zhang and S. Zafar, IEEE CSVT, Sept. 1992.


Multi-Resolution Motion Compensation

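MC in Wavelet Domain - Encoder

[Figure: the input passes through a 2D DWT. Each resolution level (low, medium, high) has its own MC prediction loop with quantizer Q, motion compensation MC and frame store FS; the quantized coefficients go to coefficient coding. Motion estimation supplies the motion vectors, scaled by :2 between levels, to the MC blocks and to motion coding; a MUX combines coefficient and motion data.]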

MC in Wavelet Domain

Variable block size for the subbands of the m-th layer of an M-level decomposition:

$$p\,2^{M-m} \times p\,2^{M-m}$$

Motion vector for each subband (j = 1, ..., 3):

$$V^{(m)}_{i,j}(x,y) = 2^{M-m}\,V^{(M)}_{i,0}(x,y) + \Delta^{(m)}_{i,j}(x,y)$$

Adaptive search range for each subband

i: frame number, j: subband index (j = 0, ..., 3), m: layer number
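The block-size and motion-vector relations can be read as a coarse-to-fine recipe: scale the vector found in the lowest-resolution band and add a small refinement per subband. The sketch below assumes that reading; `block_size`, `predict_subband_mv` and the example numbers are hypothetical:

```python
def block_size(p, M, m):
    """Block edge length p * 2^(M-m) for layer m of an M-level decomposition."""
    return p * 2 ** (M - m)

def predict_subband_mv(base_mv, M, m, refinement=(0, 0)):
    """Scale the coarsest-level vector by 2^(M-m) and add the subband refinement."""
    scale = 2 ** (M - m)
    return (scale * base_mv[0] + refinement[0],
            scale * base_mv[1] + refinement[1])

# Example: 3-level decomposition, coarsest-level vector (1, -2), refinement (1, 0) at layer 1
mv_layer1 = predict_subband_mv((1, -2), M=3, m=1, refinement=(1, 0))
size_layer1 = block_size(p=2, M=3, m=1)
```

Only the refinement has to be searched within each subband, which is what makes an adaptive (small) search range per subband possible.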

MC in Wavelet Domain – Advantages and Drawbacks

- Multiple (separate) MC loops for the wavelet bands
  - one set of motion parameters may be used for all
- No drift problem in spatial scalability
  - Possible to skip higher frequency bands
- Switching to "intra" coding mode without penalty
  - Inverse DWT is applied to images (not differences)
- Inefficiency of MC prediction in the high bands
  - Significant performance loss compared to ME/MC in the spatial domain (e.g. 1-2 dB)
  - The shift-variant property of the wavelet decomposition

Motion-Compensated Temporal Filtering

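Motion Compensated Haar Filters

Non-orthonormal Haar filter basis

$$H_0(\mathbf{z}) = \frac{1}{2}\left(1 + z_1^{\tilde{k}} \cdot z_2^{\tilde{l}} \cdot z_3^{-1}\right)$$

$$H_1(\mathbf{z}) = -z_1^{k} \cdot z_2^{l} + z_3^{-1}.$$

The factors in z1 and z2 are the MC shift; z3^{-1} is the delay by one frame.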

This motion-compensated filtering is no problem whenever unique sample-wise correspondences exist between two frames

$$\tilde{k}(m_B,n_B) = -k(m_A,n_A), \quad \tilde{l}(m_B,n_B) = -l(m_A,n_A) \quad\text{with}\quad \begin{cases} m_B = m_A + k(m_A,n_A) \\ n_B = n_A + l(m_A,n_A). \end{cases}$$

Real motion vector fields are discontinuous, such that correspondences may not be unique

[Figure: motion trajectories between two frames. Pixels are marked as covered/multiple connected or uncovered/unconnected, with question marks where the origin of the motion trajectory is ambiguous.]

Substitution technique for covered/uncovered areas allows perfect reconstruction at motion discontinuities

$$L(m,n) = 0.5 \cdot B(m,n) + 0.5 \cdot A\big(m+\tilde{k}(m,n),\, n+\tilde{l}(m,n)\big)$$

$$H(m,n) = A(m,n) - B\big(m+k(m,n),\, n+l(m,n)\big)$$

$$L(m,n) = B(m,n) \quad\text{if 'unconnected'}$$

$$H(m,n) = A(m,n) - \hat{A}(m,n) \quad\text{if 'multiple connected'}$$

$$\hat{A}(m,n) = B\big(m+k(m,n),\, n+l(m,n)\big) \quad\text{'backward mode'}$$

$$\hat{A}(m,n) = B_{-1}\big(m-k(m,n),\, n-l(m,n)\big) \quad\text{'forward mode'}$$

[Figure: frames A and B and the resulting H and L frames. Covered ("multiple connected") pixels use MC prediction from the previous frame B-1; uncovered ("unconnected") pixels insert the original values from B.]

Synthesis is straightforward in case of full-pixel correspondences

$$\tilde{A}(m,n) = L\big(m+k(m,n),\, n+l(m,n)\big) + 0.5 \cdot H(m,n)$$

$$\tilde{B}(m,n) = L(m,n) - 0.5 \cdot H\big(m+\tilde{k}(m,n),\, n+\tilde{l}(m,n)\big),$$

$$\tilde{B}(m,n) = L(m,n) \quad\text{if 'unconnected'}$$

$$\tilde{A}(m,n) = \hat{A}(m,n) + H(m,n) \quad\text{if 'multiple connected'.}$$
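The motion-compensated Haar analysis and synthesis for connected pixels can be checked numerically. The sketch below is a simplification under stated assumptions: 1-D frames, one global integer displacement `k` (so that the reverse displacement is `-k`), and periodic extension so that every pixel is connected. The covered/uncovered substitution is not modelled, and the function names are hypothetical:

```python
import numpy as np

def shift(f, k):
    """Return g with g[m] = f[m + k], using periodic extension."""
    return np.roll(f, -k)

def mc_haar_analysis(A, B, k):
    H = A - shift(B, k)                  # H(m) = A(m) - B(m + k)
    L = 0.5 * B + 0.5 * shift(A, -k)     # L(m) = 0.5 B(m) + 0.5 A(m - k)
    return L, H

def mc_haar_synthesis(L, H, k):
    A = shift(L, k) + 0.5 * H            # A(m) = L(m + k) + 0.5 H(m)
    B = L - 0.5 * shift(H, -k)           # B(m) = L(m) - 0.5 H(m - k)
    return A, B

rng = np.random.default_rng(2)
B = rng.normal(size=16)
k = 3
A = shift(B, k) + 0.01 * rng.normal(size=16)   # A is B displaced by k, plus small noise
L, H = mc_haar_analysis(A, B, k)
A_rec, B_rec = mc_haar_synthesis(L, H, k)
```

When the motion model fits, `H` contains only the small residual and `L` is close to frame `B`, while synthesis recovers both frames exactly.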

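2-band temporal Haar analysis filter

[Figure: block diagram of the analysis filter (branches H0 and H1). Frames A, B and the previous frame B-1 (delay z^-1) feed motion estimation, a connection and mode switch analysis, and motion compensation blocks; after 2:1 temporal subsampling the outputs are the L and H frames plus the motion information M. Switch positions: U - unconnected, I - intraframe, F/B - forward/backward.]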

2-band temporal Haar synthesis filter

[Figure: block diagram of the synthesis filter (branches G0 and G1). The L and H frames and the motion information M pass through 1:2 upsampling, motion compensation blocks and a connection and mode switch control to rebuild frames A and B (and B-1 through a z^-1 delay) as the output x. Switch positions: U - unconnected, M - multiple connected, I - intraframe, F/B - forward/backward. *) Switch open for I.]

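Motion Compensated Temporal Wavelet Tree

[Figure: the input x passes through a tree of temporal analysis stages, each producing L and H frames and motion information M. The scaling and wavelet coefficients from the temporal analysis (arranged as 2D images) go through 2-D wavelet decomposition, quantization and encoding; the motion information is coded separately. The decoder decodes the motion information and runs the mirrored synthesis tree to produce the output y.]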

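Motion-compensated Lifting Filters

Signal flow diagram

[Figure: signal flow graph of the lifting steps between frames "A" and "B" giving "H" and "L", for a vertical shift by l+β pixels. Samples shown include B(m,n-1), B(m,n), B(m,n+1), A(m-k,n-l) and A(m-k,n-l+1); the prediction branches carry the weights −β and β−1, the update branches the weights (1−β)/2 and β/2; the positions B* and A* are labelled in the graph.]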

Motion-compensated Lifting Filters

Extensible to longer interpolation filters, e.g. (9/7)

[Figure: signal flow graph from "A" and "B" through the intermediate signals "I1" and "I2" to "H" and "L". The two prediction steps carry the weights 2(1−β)p1, 2βp1 and 2(1−β)p2, 2βp2; the two update steps carry the weights 2(1−β)u1, 2βu1 and 2(1−β)u2, 2βu2.]

With β = 0.5: equivalent to the half-pel P.R. method proposed in [Ohm, Rümmler 97] and used in [Hsiang, Woods 99]

Motion-compensated Lifting filters

The principle is straightforwardly extensible to
- longer wavelet filters
- separable (or non-separable) 2D filters
- change of a with any position (e.g. MC based on affine model, dense motion vector fields)

Further points
- Coincidence of motion correspondences in adjacent prediction and update steps must be observed
- The lifting implementation of temporal wavelet filtering also leads to an elegant interpretation of the previous covered/uncovered pixel substitution
- Very efficient implementation

Adaptation at motion boundaries: "uncovered/unconnected" case

Additional "lazy" pixel(s) in frame B

[Figure: lifting signal flow graph ("A", "B", "H", "L") across a motion boundary, with sub-pixel parameters β1 and β2 on the two sides of the boundary (weights −β1, β1−1, (1−β1)/2, β1/2 and −β2, β2−1, (1−β2)/2, β2/2); the affected pixel is marked #.]

Adaptation at motion boundaries: "covered/multiple connected" case

Additional prediction pixel(s) in frame A/H

[Figure: lifting signal flow graph ("A", "B", "H", "L") across a motion boundary, with sub-pixel parameters β1 and β2 on the two sides of the boundary; the affected pixel is marked #. Annotation: this pixel might also take the 'unconnected' role!]

- Frame-wise or localized implementation of intra coding is a key concept in MC prediction coders
- Switching to intra mode is applied whenever no motion correspondence can be found, e.g. at scene changes or in uncovered areas
- In MC temporal filtering
  - the equivalent is an adaptation of the wavelet tree depth
  - but intra coding could also be applied individually in the prediction and update steps
- In general, localized mode switching can be included in a simple way in the lifting structure

More Flexibility in MC Lifting Filters

- Different view of one transform level: temporal-axis lifting filters, including 2D MC in the cross paths
- MC and IMC should be related such that pixels from A correspond to L

[Figure: the video sequence (frames B A B A B A) is mapped to the highpass sequence (H H H) by MC prediction steps with weights −1 and 1, and to the lowpass sequence (L L L) by IMC update steps with weights 1/2.]

Extension to longer temporal filters (5/3)
- H frames equivalent to bidirectional prediction
- Forward/backward switching possible
- Better coding efficiency
- No temporal blocking
- More memory
- Higher delay
- More motion vectors

[Figure: the video sequence (frames B A B A B A B) is mapped to the highpass sequence (H H H) by bidirectional MC prediction steps with weights −1/2, and to the lowpass sequence (L L L L) by IMC update steps with weights 1/4. At one position the weights change to 0 and −1 (prediction) and 0 and 1/2 (update), marking the switch to uni-directional.]

- Non-dyadic decomposition
- Temporal block units of 3 frames
- E.g. 30/10 Hz temporal scalability
- Can be extended to bidirectional MC in the prediction step

[Figure: groups of three frames (B A B) of the video sequence. MC prediction steps with weights −1 and 1 give two H frames per group (highpass sequence); IMC update steps, labelled with the weights 1/4 and 1/2, give one L frame per group (lowpass sequence).]

Low-Delay modes in MC Temporal Filtering

Temporal pyramid decomposition with omission of the update step ("A" frames left as originals)

[Figure: frames 1 to 4 over two levels. At level 1 some frames are left as originals ("A") and the others become H frames; level 2 filters the A frames from the previous level.]

Modified MCTF Scheme

"A" frames can be inserted at arbitrary locations, so the sequence can be decoded at non-dyadic lower frame rates

[Figure: frame pattern A H H A; the A frames are left as original.]

- A frames allow implementation of a low-delay mode
  - A frames can be encoded and transmitted immediately, but must be stored for future reference
  - H frames can be encoded and transmitted immediately in any of the schemes
- Disadvantage: lower coding efficiency
  - May be compensated by improved prediction

Inclusion of bi-directional prediction

Choice between 3 modes:

- Use backward prediction
- Use forward prediction
- Use the average block of the backward and forward predictions while filtering

[Figure: frame pattern A H H A; the A frames are left as original.]

Prediction step can be enriched by selecting multiple reference frames

[Figure: reference frame 1, reference frame 2 and the current frame.]

Advantages
- Improved coding efficiency
- Easy to incorporate the advanced MC & ME options used by predictive coders (H.264/AVC, MPEG-4 etc.)
- Reduced number of unconnected pixels

Disadvantages
- Sacrifices temporal scalability
- Prediction drift can become a problem when decoding at lower bit-rates

Different configurations at any level of the pyramid

[Figure: four example configurations built from A and H frames: AHAH scheme, bi-directional AHAH scheme, AHHA scheme, bi-directional AHHA scheme.]

Overcomplete Motion Compensated Wavelet Coding

• Shift-variant property of wavelets
• Frame theory - overcomplete wavelets
• Low band shifting method
• Inband motion compensated temporal filtering
• Simulation results

The Shift Variance Property of Wavelets

Haar filter output of step edges

[Figure: Haar DWT of a step edge ("Signal") and of the same edge shifted by one pixel, showing the L1 and H1 bands of each.]

Low pass channel: prediction by linear interpolation

High pass channel: no prediction possible!
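The effect is easy to reproduce. The sketch below uses an unnormalized Haar pair (mean and difference of each sample pair), which is an assumption about scaling only; the names `haar_dwt`, `step` and `shifted` are hypothetical:

```python
import numpy as np

def haar_dwt(x):
    """One-level critically sampled Haar transform: pairwise mean and difference."""
    x = np.asarray(x, dtype=float)
    low = (x[0::2] + x[1::2]) / 2
    high = x[1::2] - x[0::2]
    return low, high

step = np.array([0, 0, 0, 0, 1, 1, 1, 1])
shifted = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # same edge, moved by one pixel

low_a, high_a = haar_dwt(step)      # edge falls between two sample pairs
low_b, high_b = haar_dwt(shifted)   # edge falls inside a sample pair
```

For the original edge the highpass band is all zero, while for the shifted edge a nonzero coefficient appears. No displacement of the first highpass band can produce the second, whereas the lowpass band of the shifted signal (0, 0.5, 1, 1) is an interpolated version of the original lowpass band.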

[Equations not recoverable from the extraction. The slide derives the spectrum of the subsampled band for an input shifted by v pixels.]

Aliasing components (zero only when v is even)

Optimal Aliasing Reduction Filter Approach

In order to minimize the aliasing in wavelet-domain ME/MC (X. Yang, K. Ramchandran, IEEE-TIP, May 2000)

L: aliasing reduction filter

Still not as efficient as motion estimation in the spatial domain

Any ultimate solution? Shift-invariant overcomplete wavelets

Frame Theory – Overcomplete Wavelets

Properties of a redundant frame
- Noise reduction
- More sparse representation: matching pursuit
- Redundant representation: multiple description coding
- Shift-invariant property

Improved motion estimation/compensation in the wavelet domain
- Only motion references need to be overcomplete
- Texture coding is still in the complete wavelet domain

Shift Invariance of Overcomplete Wavelets

Overcomplete representation without downsampling

[Figure: Haar-DWT and Haar-ODWT bands (L1, H1) of a signal and of the signal shifted by one pixel. Haar-DWT: low pass channel, prediction by linear interpolation; high pass channel, no prediction possible. Haar-ODWT: prediction possible in any case.]


Low-Band-Shift Method

• Optimal way of generating overcomplete wavelet coefficients for every shift

Low Band Shift Method for 2-D

[Figure: the original image is decomposed into LL, HL, LH and HH bands for each of the shifts (0,0), (1,0), (0,1) and (1,1). The LL bands are decomposed again into LLLL, LLHL, LLLH and LLHH for the shifts (0,0) to (3,3). A marker highlights the bands used for the complete wavelet expansion.]

(x,y): shift by (x,y) pixels in the original image

Number of reference frames = 3n+1

Conventional Wavelet Transform

[Figure: the original image is decomposed into LL, HL, LH and HH; LL is decomposed further into LLLL, LLHL, LLLH and LLHH.]

Overcomplete Wavelet Transform by Low-Band Shift Method

[Figure: overcomplete decomposition of the original frame into LL, HL, LH and HH and, from LL, into LLLL, LLHL, LLLH and LLHH.]

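Overcomplete Wavelet MC Coding - Coder

[Figure: the input passes through a 2D DWT. Each resolution level (low, medium, high) has an MC prediction loop with quantizer Q, motion compensation MC and frame store FS; the medium and high loops additionally contain IDWT and ODWT blocks. Motion estimation feeds the MC blocks (vectors scaled by :2 between levels) and motion coding; coefficient coding and motion coding are combined in a MUX.]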

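Overcomplete Wavelet MC Coding - Decoder

[Figure: a DMUX feeds the coefficient decoder and the motion decoder. Each resolution level (low, medium, high) has an MC loop with frame store FS; the medium and high loops additionally contain IDWT and ODWT blocks, and the motion vectors are scaled by :2 between levels. The outputs are the low, medium and high resolution reconstructions.]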

Overcomplete Wavelet MC Coding

- ODWT is a wavelet transform without subsampling
  - More samples than the original, like a pyramid representation
- Allows wavelet-domain MC for high-frequency components
  - the signal does not bear a frequency-inversion alias component
- Still only necessary to encode critically sampled coefficients
  - Overcomplete transform coefficients can be generated locally within the decoder
- Still does not resolve the drift in SNR scalability
  - may be solved by multiple loops in each wavelet band

Solution: In-Band MCTF (IBMCTF)

Spatial-Domain MCTF (SDMCTF)

[Figure: the video enters the MCTF stage (ME and temporal filtering of the current frame); the filtered frames pass through DWT, SBC and EC to transmission, while the MVs and reference frame numbers go through the MVC.]

DWT: Discrete Wavelet Transform; SBC: Sub-Band Coder; EC: Entropy Coder; ME: Motion Estimation; MVC: Motion Vector Coder

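In-band MCTF (IBMCTF)

[Figure: the video first passes through the DWT. The MCTF stage (ME and temporal filtering) then works in the wavelet domain, using the CODWT of the current frame; the result passes through SBC and EC to transmission, while the MVs and reference frame numbers go through the MVC.]

DWT: Discrete Wavelet Transform; SBC: Sub-Band Coder; EC: Entropy Coder; CODWT: Complete to Overcomplete DWT; ME: Motion Estimation; MVC: Motion Vector Coder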

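IBMCTF: concept

[Figure: 3-D decomposition along the temporal (temp), horizontal (hor) and vertical (ver) axes.]

IBMCTF Wavelet Video

[Figure: 3-D decomposition along the temporal (temp), horizontal (hor) and vertical (ver) axes.]

For efficient IBMCTF, ME should be performed in the overcomplete wavelet domain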

3-D decomposition structure

[Figure: 3-D decomposition structures (axes temp, hor, ver) for SD-MCTF and for in-band MCTF, side by side.]

Block diagram of IBMCTF coder

[Figure: the input video passes through the wavelet transform and is split into bands 1 to N. Each band is broken into GOFs and processed by its own unit (IBMCTF 1 ... IBMCTF N) comprising motion estimation, temporal filtering and texture coding; the MVs and reference frame numbers accompany the texture data into the bitstream.]

IBMCTF coding

- Allows wavelet-domain MC using shift-invariant overcomplete wavelets generated by the Low-Band Shift method
- Still only necessary to encode critically sampled coefficients
- Advantages in spatial scalability
- Resolves the drift in SNR scalability
- Adaptive processing for each subband
  - Different ME accuracy, interpolation filter, temporal filter taps, etc.
- Very general framework which can be combined with other existing techniques (intra mode, UMCTF, etc.)

Results

[Plot: Foreman, 300 frames, full-pel ME/MC, 30 fps, CIF. PSNR Y (dB, 30 to 36) versus bitrate (500 to 1500 Kbps) for IBMCTF with level-by-level CODWT, SDMCTF, and IBMCTF with full CODWT.]

Results

[Plot: Foreman, 300 frames, full-pel ME/MC, 30 fps, CIF. PSNR Y (dB, 30 to 36) versus bitrate (500 to 1500 Kbps) for IBMCTF with level-by-level CODWT, SDMCTF, and IBMCTF with full CODWT.]

[Plot: Foreman, 300 frames, full-pel ME/MC, 15 fps, QCIF. PSNR Y (dB, 31 to 39) versus bitrate (150 to 500 Kbps).]


Results

[Plot: Foreman, 300 frames, full-pel ME/MC, 7.5 fps, Q-QCIF. PSNR Y (dB, 36 to 48) versus bitrate (100 to 170 Kbps).]


Generation of Wavelet Blocks

- A wavelet block provides a direct association between the wavelet coefficients and what they represent spatially
- ME is done based on wavelet blocks
  - No motion vector overhead, because the number of motion vectors to be coded is the same as for SDMCTF
  - Perfectly aligned with a tree-structured entropy coder
- Entropy-based motion estimation criterion!

Proposed interleaving of overcomplete wavelet coefficients

[Figure: the coefficients for shift = 0 and the coefficients for shift = 1 are merged into one sequence of interleaved coefficients.]
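For a one-level 1-D Haar highpass band the interleaving can be checked directly: alternating the critically sampled coefficients of the unshifted and the one-sample-shifted signal gives the same sequence as filtering without downsampling. The sketch assumes periodic extension and an unnormalized Haar difference filter; the function names are hypothetical:

```python
import numpy as np

def haar_high(x):
    """Critically sampled Haar highpass band: differences within sample pairs."""
    return x[1::2] - x[0::2]

def undecimated_haar_high(x):
    """Haar highpass band without downsampling: h[n] = x[n+1] - x[n] (periodic)."""
    return np.roll(x, -1) - x

rng = np.random.default_rng(3)
x = rng.normal(size=16)

h_shift0 = haar_high(x)                   # coefficients for shift = 0
h_shift1 = haar_high(np.roll(x, -1))      # coefficients for shift = 1

interleaved = np.empty(x.size)
interleaved[0::2] = h_shift0
interleaved[1::2] = h_shift1

overcomplete = undecimated_haar_high(x)
overcomplete_of_shifted = undecimated_haar_high(np.roll(x, -1))
```

In the interleaved band a one-pixel displacement of the input is again a one-pixel displacement of the coefficients, so an ordinary block-matching search can operate on it.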

Overcomplete Wavelet Transform with Interleaving

[Figure: the original frame and its interleaved overcomplete subbands; band labels as extracted: L, HL, LH, HH and LLL, LLH, LLL, LLHH.]

Advantages of Interleaving

- The interleaving algorithm enables optimal sub-pixel accuracy motion estimation and compensation in IBMCTF
- By interleaving, any existing ME module (HVSBM, FSBM, intra mode, etc.) with any fractional-pel accuracy can be used
- Can easily be used for the MCTF framework with any fractional-pel accuracy using the lifting structure

3-D Lifting Structure for IBMCTF

Direct extension of SD-MCTF lifting to IBMCTF:

[Equation garbled in extraction: a prediction step that forms H[m,n] from B[m,n] and a motion-compensated, interpolated version of A, per subband, for i = 0, ..., 3.]

The interpolation operation for frame A (per subband) is not optimal (no cross-phase dependencies incorporated)

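3-D Lifting Structure for IBMCTF

[Equations garbled in extraction: the prediction step forms H[m,n] from B[m,n] and a low-band-shift (LBS) interpolated version of A; the update step forms L from A and an LBS-interpolated version of H; both hold per subband, for i = 0, ..., 3.]

[Figure: subbands B with index pairs (1,1), (2,1), (3,1) and (0,2), (1,2), (2,2), (3,2).]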

Results

[Plot: "Foreman", 300 frames, 30 fps, CIF. PSNR (dB, 26 to 42) versus bitrate (0 to 2,500,000 bps) for SDMCTF (1/8), IBMCTF (1/8), SDMCTF and IBMCTF.]

Overcomplete wavelet coding using standard-compliant DCT base-layers

[Figure: block diagram with DWT, MPEG-compliant coding/decoding of the low-frequency band, residual information, DWT of the decoded pictures, CODWT, the high-frequency bands, the MCTF stage (ME and temporal filtering), MVC, SBC and EC, leading to transmission.]

DWT: Discrete Wavelet Transform; SBC: Sub-Band Coder; EC: Entropy Coder; CODWT: Complete to Overcomplete DWT; ME: Motion Estimation; MVC: Motion Vector Coder

Proposed by Andreopoulos, van der Schaar, et al. (ICIP 2003)


Results

Current Status in MPEG Standardization

MPEG's Scalable Coding History

- Development of scalable video coding solutions has a long history in MPEG, starting from MPEG-2
  - Spatial, temporal and SNR scalability with at most 3 levels
  - MPEG-4 Fine Granularity Scalability
- So far, all standardized solutions have shown deficiencies in coding performance, mainly due to the recursive MC structure
  - Drift occurs when not all information is available
  - Drift-free structures are less coding efficient

MPEG's Interframe Wavelet Coding Exploration

- New embedded wavelet solutions were proposed in the Digital Cinema Call for Proposals and in the Call for Proposals on improved coding efficiency (both due July 2001)
- At the Pattaya meeting (Dec. 2001), MPEG started an Ad hoc Group to explore Interframe Wavelet Coding
- Different methods were investigated
  - MC prediction with intraframe (2D) wavelet
  - In-band MC prediction based on overcomplete 2D wavelet decomposition
  - 3D (spatio/temporal) wavelet coding based on MCTF
- 3D (t+2D, 2D+t) proved most promising, providing excellent coding efficiency while being fully scalable in temporal, spatial and quality resolution
- Experimental software was used

MPEG's Interframe Wavelet Coding Exploration

- The Interframe Wavelet exploration was successfully completed in October 2002
- 9 Call for Evidence on Scalable Coding Advances (July 2003)
- 24 Call for Proposal Responses (March 2004)

Some less good results (out of 10 sequences)

SNR Results from MPEG's Interframe Wavelet Coding Exploration

[Plots: two comparisons of AVC 1, AVC 2 and MCTF; annotated PSNR gaps: 3.5 dB, 1.45 dB, 1.75 dB and 0.75 dB.]

Some more good results (out of 10 sequences)

SNR Results from MPEG's Interframe Wavelet Coding Exploration

[Plots: two comparisons of AVC 1, AVC 2 and MCTF; annotated PSNR gaps: 3.6 dB, 0.01 dB, 1.42 dB and 0.4 dB.]

Acknowledgements

- Jens Ohm
- Yiannis Andreopoulos
- Jong Ye
- Konstantin Hanke
- Claudia Mayer
- Thomas Rusert
