Wavelet Video Coding – Principles, Applications and Standardization
Mihaela van der Schaar
Electrical and Computer Engineering Department
University of California Davis
Outline
Introduction
Scalable coding – principles (review)
Basic principles of wavelets (review)
Motion Compensated Wavelet Coding – basic principles and classification
Motion Compensation Temporal Filtering (MCTF)
Overcomplete Motion Compensated Wavelet Coding
Encoding of spatio-temporal wavelet coefficients
Scalable coding of motion information
Error resilience aspects
Current status in MPEG standardization
Comparisons with state-of-the-art non-scalable coding techniques
Introduction
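Challenges for ubiquitous multimedia communication
[Figure: an encoder and server deliver video over IP-based networks (Internet/Internet2) to clients on access links of different rates: < 64 k, < 512 k, 802.11 (< 2 M), 802.11b (< 11 M), 802.11a (6 M) and 3G/4G (64 k - 2 M).]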
Sample of concrete problems/questions
Signal processing
- compression efficiency versus quality of signal reproduction (rate-distortion tradeoffs)
- compression efficiency versus robustness to losses
Networking
- realistic channel models for effective joint source/channel coding
- source-channel interface control strategies for efficient network resource usage and high quality signal reproduction
Computer Architecture
- compression efficiency versus computational complexity
Possible solution: compression meets the network
Do not require the transport mechanism to be flawless (modulation,
channel coding, transmission protocol etc.), just design the coding
system and transmission jointly
Do not design for worst-case scenario - just adapt on the fly based on
the network and device characteristics
Hence:
A. Scalable Coding
B. Adaptive Streaming
Principles of Scalable Coding
Encoding of video signal with different resolution scales
Downscaling of video signal by
- Coding noise insertion – SNR Scalability
- Spatial subsampling – Spatial Scalability
- Sharpness reduction – Frequency Scalability
- Temporal subsampling – Temporal Scalability
- Selection of content – Content related Scalability
[Figure: the video input passes through scale conversion and encoding, which outputs low, medium and high rate/resolution streams.]
The Simple Way – Advance Scaling
Requires feedback about channel / decoder status
Only point-to-point connection supported
Example: Stream switching
[Figure: block diagram with coder, scale converter, network and decoder.]
The Parallel Way - Simulcast
Run independent encoders in parallel
Requires a priori knowledge about network and decoder capabilities to select optimum scaling levels
Point-to-multipoint connections possible
[Figure: low, medium and high scale coders run in parallel and feed a multiplexer.]
Simulcast
Multiplexed transmission of streams
Loss in efficiency due to multiple streams
Can cause network overload
Restricted number of scales
[Figure: the multiplex stream carries the low rate, medium rate and high rate streams side by side.]
The Embedded Way – Layered Coding
"Chain of layers" - information from low resolution utilized to encode next-higher resolution
[Figure: layered coder and decoder with T layers. The input x passes through preprocessing stages to coder layers 1 to T with quantizers Q1, Q2, ..., QT; the decoded output of each layer goes through a midprocessing stage and is subtracted from the input of the next-higher layer. The decoder mirrors this chain with decoder layers 1 to T and outputs y1 (base layer) and y2, ..., yT (enhancement layers).]
Layered Coding
Layered coding supports embedded streams
Re-configuration of bit stream for reconstruction with different spatial/temporal/quality resolution
Possible loss in efficiency depends on coding scheme
In theory, arbitrary number of scales could be achieved
[Figure: full multiplex = high rate stream; partial multiplex = medium rate stream; low rate stream.]
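SNR Scalability – Re-quantisation
Example: 2-stage quantizer
[Figure: quantizer Q1 (large steps) produces the base layer; the difference between the input and the Q1 output is quantized by Q2 (small steps, ≤ Q1/2) to produce the enhancement layer. The characteristic curves of Q1 and Q2 mark reconstruction values and decision (threshold) values.]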
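The two-stage quantizer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both stages are uniform quantizers that round to the nearest multiple of their step size, and the names `quantize`, `encode_two_stage` and `decode` are hypothetical, not taken from any codec.

```python
import math


def quantize(value: float, step: float) -> int:
    """Uniform quantizer (assumed): index of the nearest multiple of step."""
    return math.floor(value / step + 0.5)


def encode_two_stage(x: float, step1: float, step2: float) -> tuple[int, int]:
    """Base index from the coarse stage Q1, enhancement index from Q2."""
    base = quantize(x, step1)
    residual = x - base * step1          # quantization error left by Q1
    enhancement = quantize(residual, step2)
    return base, enhancement


def decode(base: int, enhancement: int, step1: float, step2: float,
           use_enhancement: bool = True) -> float:
    """Reconstruct from the base layer alone or from base plus enhancement."""
    value = base * step1
    if use_enhancement:
        value += enhancement * step2
    return value


if __name__ == "__main__":
    b, e = encode_two_stage(7.3, step1=4.0, step2=1.0)
    print(decode(b, e, 4.0, 1.0, use_enhancement=False))  # base layer only
    print(decode(b, e, 4.0, 1.0))                          # base + enhancement
```

With these assumptions, a decoder that receives only the base index reconstructs within half of the large step, and one that also receives the enhancement index reconstructs within half of the small step.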
SNR Scalability – Bit-plane Coding
Quantization related to bit planes
[Figure: reconstruction values and decision (threshold) values over bit planes Bit 1 to Bit 3 for three quantizer types: no zero reconstruction, unsigned (range MIN to MAX); zero reconstruction, sign/magnitude (range 0 to MAX); zero reconstruction, sign/magnitude, dead zone (range 0 to MAX).]
SNR Scalability – Bit-plane Coding
Magnitude of MSB encoded by run-length or binary entropy coding
Sign and remaining bits encoded binary, conditional on MSB
[Figure: bit-plane table for samples 1 to 15 with rows Bit 5 to Bit 1 and Sign. The leading bits are run-length coded (codes shown: 4,9; 2,10; 3,5,2; 3,4,1; 0,1,1,0,2); the remaining bits are binary coded.]
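The bit-plane idea can be illustrated with a small Python sketch. It assumes a sign/magnitude representation and a run-length code that counts the zeros preceding each 1 in a plane; the helper names `bit_planes`, `zero_runs` and `reconstruct` are hypothetical. Decoding only the first planes gives a coarser reconstruction, which is the SNR scalability.

```python
def bit_planes(samples, num_bits):
    """Split integers into a sign list and magnitude bit planes (MSB first)."""
    signs = [1 if s < 0 else 0 for s in samples]
    mags = [abs(s) for s in samples]
    planes = [[(m >> b) & 1 for m in mags] for b in range(num_bits - 1, -1, -1)]
    return signs, planes


def zero_runs(plane):
    """Run-length code of one plane: number of zeros before each 1."""
    runs, run = [], 0
    for bit in plane:
        if bit:
            runs.append(run)
            run = 0
        else:
            run += 1
    return runs


def reconstruct(signs, planes, num_bits, planes_used):
    """Rebuild the samples from the first planes_used planes only."""
    mags = [0] * len(signs)
    for idx in range(planes_used):
        shift = num_bits - 1 - idx
        for i, bit in enumerate(planes[idx]):
            mags[i] |= bit << shift
    return [-m if s else m for s, m in zip(signs, mags)]


if __name__ == "__main__":
    samples = [0, 3, -5, 0, 16, 1, -9, 0]
    signs, planes = bit_planes(samples, num_bits=5)
    print(zero_runs(planes[0]))                # runs of zeros in the MSB plane
    print(reconstruct(signs, planes, 5, 2))    # coarse: two planes only
    print(reconstruct(signs, planes, 5, 5))    # all planes: exact
```

Under this convention, a 15-sample plane with ones at positions 5 and 15 gives the runs 4 and 9.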
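Spatial Scalability
Base-to-enhancement prediction
[Figure: the input is low-pass filtered and decimated (N:1), then quantized by Q1 to form the base layer. The base layer is interpolated (1:N, low-pass filter) and subtracted from the input; the difference is quantized by Q2 to form the enhancement layer.]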
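Temporal Scalability
Temporal downsampling with temporal anti-alias filter or by frame skipping
Temporal upsampling by MC prediction
[Figure: two-layer coder. An optional temporal low-pass filter with N:1 temporal subsampling precedes Q1 (base); a motion-compensated (MC) low-pass filter with 1:N upsampling forms the prediction that is subtracted before Q2 (enhancement).]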
Frequency Scalability / "Data Partitioning"
Popular in context of Transform Coding
Allocation of coefficients to different layers depending on frequency
Very low complexity
[Figure: the original video enters a single-layer encoder. Data partitioning splits its output stream at a priority break point into base-layer coefficients (base-layer stream) and enhancement-layer coefficients (enhancement-layer stream), which are multiplexed (MUX).]
Multiresolution Concepts
Generate different resolution levels by successive down/upsampling operations
Resolution pyramids example: Spatial resolution reduction by factors of 2
[Figure: resolution pyramid with levels c0, c1, c2, ..., cU-1 between lowest resolution and full resolution.]
Multiresolution Concepts – Pyramids
Gaussian Pyramid
- Each layer is self-contained
- Corresponds to Simulcast concept
- More samples to be encoded
[Figure: x(m,n) is repeatedly low-pass filtered with H(z1,z2) and subsampled 4:1, giving the layers cU-1, cU-2, ..., c1, c0.]
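Multiresolution Concepts – Pyramids
Laplacian Pyramid (Differential Pyramid)
- All lower-resolution layers required to reconstruct high-resolution layers
- Corresponds to Layered Coding concept
- Not critically sampled – more samples than original
[Figure: x(m,n) is low-pass filtered with H(z1,z2) and subsampled 4:1 at each level. Each lower-resolution signal is upsampled 1:4, interpolated with G(z1,z2) and subtracted from the higher-resolution signal, giving the layers cU-1, ..., c1, c0.]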
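A minimal 1-D sketch of the differential pyramid is given below. The filters are assumptions chosen for brevity, not the H and G of the slide: downsampling averages pairs of samples and upsampling repeats each sample, and the reduction factor is 2:1 instead of the 2-D 4:1. The sketch shows the two properties listed above: the pyramid holds more samples than the signal, and all lower layers are needed to rebuild the full resolution.

```python
def downsample(signal):
    """Assumed low-pass + 2:1 decimation: average of each sample pair."""
    return [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]


def upsample(signal):
    """Assumed 1:2 interpolation: repeat every sample."""
    out = []
    for value in signal:
        out.extend([value, value])
    return out


def build_laplacian(signal, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarsest]."""
    layers, current = [], list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        prediction = upsample(coarse)
        layers.append([c - p for c, p in zip(current, prediction)])
        current = coarse
    layers.append(current)
    return layers


def reconstruct(layers):
    """Start at the coarsest layer and add the detail layers back in."""
    current = layers[-1]
    for detail in reversed(layers[:-1]):
        current = [p + d for p, d in zip(upsample(current), detail)]
    return current


if __name__ == "__main__":
    signal = [4, 6, 10, 12, 8, 8, 2, 0]
    layers = build_laplacian(signal, levels=2)
    print([len(layer) for layer in layers])  # total exceeds len(signal)
    print(reconstruct(layers))
```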
Multiresolution Concepts – Pyramids
Advantages:
- Pyramids can be combined with any coding scheme for the different resolution levels
- Downsampling can be made alias-free
Disadvantages:
- Number of pixels higher than in original signal
- Higher data rate than one-layer coding
Possible solution:
- Critically sampled pyramids (Wavelets)
- Disadvantage: Downsampled signals bear alias
Basic Principles of Wavelets
Filter Pairs
Critically sampled filter bank with 2 bands
Analysis low-/highpass filter pairs H0/H1
Synthesis low-/highpass filter pairs G0/G1
Number of samples c in frequency bands equal to total number of samples in signal x
[Figure: two-band filter bank. Analysis: x(n) is filtered by H0(z) and H1(z) and each branch is subsampled 2:1, giving c0 and c1. Synthesis: c0 and c1 are upsampled 1:2, filtered by G0(z) and G1(z) and summed to give y(n).]
Filter Pairs
Perfect reconstruction is possible
Subsampled signals c usually bear alias!
$$Y(z) = \frac{1}{2}\underbrace{\left[H_0(z)\,G_0(z) + H_1(z)\,G_1(z)\right]}_{=\,2\cdot z^{-k}}\,X(z) + \frac{1}{2}\underbrace{\left[H_0(-z)\,G_0(z) + H_1(-z)\,G_1(z)\right]}_{=\,0}\,X(-z)$$
Biorthogonality Principle
Perfect reconstruction conditions
$$Y(z) = \frac{1}{2}\left[H_0(z)\,G_0(z) + H_1(z)\,G_1(z)\right]X(z) + \frac{1}{2}\left[H_0(-z)\,G_0(z) + H_1(-z)\,G_1(z)\right]X(-z)$$
$$G_0(z) = z^{k}\cdot H_1(-z), \qquad G_1(z) = -z^{k}\cdot H_0(-z)$$
$$\Rightarrow\; H_0(-z)\cdot z^{k}\cdot H_1(-z) - H_1(-z)\cdot z^{k}\cdot H_0(-z) = 0$$
$$Y(z) = \frac{1}{2}\underbrace{\left[H_0(z)\,H_1(-z) - H_1(z)\,H_0(-z)\right]}_{=\,2\cdot z^{-k}}\cdot X(z)\cdot z^{k}$$
$$\Rightarrow\; P(z) - P(-z) = 2\cdot z^{-k} \quad\text{with}\quad P(z) = H_0(z)\cdot H_1(-z)$$
Biorthogonality Principle
H0(z)/G1(-z) and H1(z)/G0(-z) constitute orthogonal pairs
Low-/Highpass transfer functions not symmetric
Linear phase or non-linear phase filters possible
Low-/Highpass impulse responses may have different length
Biorthogonality Principle
A simple biorthogonal filter pair (5/3 integer)
$$H_0^{(5/3)}(z) = \frac{1}{8}\left(-z^{2} + 2\cdot z + 6 + 2\cdot z^{-1} - z^{-2}\right)$$
$$H_1^{(5/3)}(z) = \frac{1}{2}\left(-1 + 2\cdot z^{-1} - z^{-2}\right)$$
$$G_0^{(5/3)}(z) = \frac{1}{2}\left(z + 2 + z^{-1}\right)$$
$$G_1^{(5/3)}(z) = \frac{1}{8}\left(-z + 2 + 6\cdot z^{-1} + 2\cdot z^{-2} - z^{-3}\right)$$
Lifting Filters
Biorthogonal filter pairs can be factorized to be implementable in a "ladder structure"
"Prediction" and "Update" steps using very short filter kernels are then iteratively performed
"Lifting scheme" is most efficient implementation of wavelet filters available so far
[Figure: analysis lifting structure. A lazy transform (even/odd sample separation) splits the input ..ABAB.. into the sequences A and B; prediction steps P1(z), ..., PK(z) and update steps U1(z), ..., UK(z) alternate between the two branches, and scaling by KL and KH gives the L and H outputs.]
Lifting Filters
Synthesis filter pair is implemented by inverse signal flow
Perfect reconstruction is obvious
Quantization of signals in the ladder branches gives integer realization of analysis and synthesis
[Figure: synthesis lifting structure. The L and H inputs are scaled by 1/KL and 1/KH, the update steps UK(z), ..., U1(z) and prediction steps PK(z), ..., P1(z) are undone in reverse order, and a lazy transform (even/odd sample grouping) interleaves A and B into the output ..ABAB..]
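The following Python sketch illustrates one lifting level of the (5/3) filter with integer arithmetic. The prediction weight −1/2 and update weight 1/4 are the ones used for the (5/3) filter; the rounding (floor via right shift, with an offset of 2 in the update step) and the border handling (repeating the nearest available neighbour) are assumptions made for this example. Because the inverse subtracts exactly what the forward transform added, reconstruction is exact no matter how the branch signals are rounded.

```python
def lift53_forward(x):
    """One level of integer 5/3 lifting on an even-length list of ints."""
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Prediction step: highpass = odd sample minus rounded mean of even neighbours.
    high = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
            for i in range(half)]
    # Update step: lowpass = even sample plus rounded quarter of neighbouring highpass.
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
           for i in range(half)]
    return low, high


def lift53_inverse(low, high):
    """Inverse signal flow: undo the update, then undo the prediction."""
    half = len(low)
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
            for i in range(half)]
    odd = [high[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x


if __name__ == "__main__":
    signal = [10, 12, 14, 200, 13, 11, 9, 8]
    low, high = lift53_forward(signal)
    print(low, high)
    print(lift53_inverse(low, high) == signal)
```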
Lifting Filters
Signal flow diagrams of lifting implementations for (5/3) filters and Haar (2/2) filters
[Figure: panels a) and b), signal flow graphs from the input samples x(2m'), x(2m'+1), ..., x(2m'+6) to the lowpass outputs c0(m'), ..., c0(m'+3) and the highpass outputs c1(m'), ..., c1(m'+2). One graph uses the branch weights −1 and 1/2, the other the branch weights −1/2 and 1/4; straight branches have weight 1.]
Motion Compensated Wavelet Coding – basic principles and classification
Wavelet Video Coding - Classification
Intraframe coding (e.g. MJPEG)
3D wavelet coding without MC
Hybrid video coding using wavelet-based texture coding
In-Band Motion Compensation Prediction
Motion Compensated Temporal Filtering
In-Band Motion Compensated Temporal Filtering
History
Using transforms for interframe coding goes back to the 1970s/1980s (e.g. Karlsson/Vetterli)
Drawback was lack of motion compensation – first approach to filter over motion trajectories proposed by Kronander (1990)
Solution avoiding an overcomplete transform developed by Ohm (1991, 1994)
Solution for perfect reconstruction in case of half-pel motion by Ohm/Rümmler (1997), Hsiang and Woods (1999)
Different researchers proposed combination with temporal axis lifting scheme, which makes virtually any MC possible: Pesquet/Bottreau, Luo/Li/Zhang, Secker/Taubman (2001)
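Three-dimensional Wavelet
Temporal decomposition of a group of 8 frames (3 levels of wavelet transform)
[Figure: the video sequence is decomposed over three temporal levels: H frames at the 1st temporal level, LH frames at the 2nd temporal level, LLL and LLH frames at the 3rd temporal level.]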
Three-dimensional Wavelet Coding
Extension of zero tree approach to temporal dimension
Non-recursive coding structure
Examples:
- "3D SPIHT" by Pearlman et al.
- Layered Zero Coding (LZC) by Taubman and Zakhor (only constant displacement motion compensation)
Wavelets and Motion Compensation
Motion compensation is key
- To achieve good compression performance
- To guarantee visual quality – non MC/interframe coding with same SNR usually looks worse
Motion-compensated Wavelet video coding
- Temporal MC prediction followed by Wavelet Transform
- Wavelet Transform followed by temporal MC prediction in wavelet domain
- 3D Wavelet with MC
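Hybrid Video Coding using Wavelets
Replacement of DCT by Wavelet for 2D encoding in MC prediction loop
[Figure: hybrid coder and decoder. Encoder: the MC prediction is subtracted from the input, the residual passes through DWT and quantizer Q to the coder C, and IDWT plus MC with motion estimation (ME) close the prediction loop. Decoder: decoder D, IDWT and addition of the MC prediction.]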
Hybrid Video Coding using Wavelets
Problems and possible solutions:
Wavelet analysis is block-overlapping, discontinuities in motion vector field cause problems
- Overlapping-block MC
Local switching between Intra/inter modes not block-wise
- Symmetric extension at block discontinuities
Drift problem in MC loop is not solved
This is not a real scalable solution
Motion compensation in the wavelet domain
Multi-resolution nature of wavelet decomposition is ideal for providing spatially scalable video (QCIF, CIF, SD, and HD)
Subbands are highly correlated in the temporal direction
- Motion estimation and compensation can significantly reduce the temporal correlation
Classical approach
- MRMC (multi-resolution motion compensation scheme)
- Ref: Y. Zhang and S. Zafar, IEEE CSVT, Sept. 1992.
Multi-Resolution Motion Compensation
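MC in Wavelet Domain - Encoder
[Figure: the input is decomposed by a 2D DWT. Each resolution band (low, medium, high) has its own prediction loop with quantizer Q, motion compensation (MC) and frame store FS(low), FS(med.), FS(high). Motion estimation supplies vectors, scaled (:2) between resolution levels, to motion coding; the coefficient coding and motion coding outputs are multiplexed (MUX).]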
MC in Wavelet Domain
Variable block size of the m-th layer subbands for M-level decomposition
$$p\,2^{M-m} \times p\,2^{M-m}$$
Motion vector for each subband (j=1,..,3)
$$V^{(m)}_{i,j}(x,y) = 2^{M-m}\,V^{(M)}_{i,0}(x,y) + \Delta^{(m)}_{i,j}(x,y)$$
Adaptive search range for each subband
i: frame number, j: subband index (j=0,…,3), m: layer number
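As a small worked example of the multi-resolution scheme, the Python sketch below computes the block size for layer m and derives a subband motion vector from a vector estimated in the lowest-resolution band plus a local refinement. It assumes that layer M is the coarsest layer, that the scale factor between layers is 2, and that the refinement is found by a separate small search; the function names are hypothetical.

```python
def block_size(p: int, M: int, m: int) -> tuple[int, int]:
    """Block of p * 2^(M-m) by p * 2^(M-m) samples for layer m (assumed: M is coarsest)."""
    side = p * 2 ** (M - m)
    return side, side


def subband_vector(v_coarse, delta, M: int, m: int):
    """Scale the coarse-layer vector by 2^(M-m) and add the refinement delta."""
    scale = 2 ** (M - m)
    return v_coarse[0] * scale + delta[0], v_coarse[1] * scale + delta[1]


if __name__ == "__main__":
    # Three-level decomposition with 2x2 blocks in the coarsest layer.
    for layer in (3, 2, 1):
        print(layer, block_size(2, 3, layer))
    print(subband_vector((1, -2), (1, 0), M=3, m=2))
```

Because the scaled coarse vector already gives a good starting point, the refinement search in each subband can use a small range.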
MC in Wavelet Domain – Advantages and Drawbacks
Multiple (separate) MC loops for wavelet bands
- one set of motion parameters may be used for all
No drift problem in spatial scalability
- Possible to skip higher frequency bands
Switching to "intra" coding mode without penalty
- Inverse DWT is applied to images (not differences)
Inefficiency of MC prediction in high bands
- Significant performance loss compared to ME/MC in spatial domain (e.g. 1-2 dB)
- The shift variant property of wavelet decomposition
Motion-Compensated Temporal Filtering
Motion Compensated Haar Filters
Non-orthonormal Haar filter basis
$$H_0 = \frac{1}{2}\left(1 + z_1^{k}\cdot z_2^{l}\cdot z_3^{-1}\right)$$
$$H_1 = -z_1^{\tilde{k}}\cdot z_2^{\tilde{l}} + z_3^{-1}$$
The factors in z1 and z2 are the MC shift; z3^{-1} is a delay by one frame.
Motion Compensated Haar Filters
This motion-compensated filtering is no problem whenever unique sample-wise correspondences exist between two frames
$$\tilde{k}(m_B,n_B) = -k(m_A,n_A), \quad \tilde{l}(m_B,n_B) = -l(m_A,n_A) \quad\text{with}\quad \begin{cases} m_B = m_A + k(m_A,n_A) \\ n_B = n_A + l(m_A,n_A) \end{cases}$$
Real motion vector fields are discontinuous, such that correspondences may not be unique
[Figure: motion trajectories between two frames with the origin of a motion trajectory marked; pixels with ambiguous correspondence ("?") are labelled covered/multiple connected and uncovered/unconnected.]
Motion Compensated Haar Filters
Substitution technique for covered/uncovered areas allows perfect reconstruction at motion discontinuities
$$L(m,n) = 0.5\cdot B(m,n) + 0.5\cdot A\big(m + k(m,n),\; n + l(m,n)\big)$$
$$H(m,n) = A(m,n) - B\big(m + \tilde{k}(m,n),\; n + \tilde{l}(m,n)\big)$$
$$L(m,n) = B(m,n) \quad\text{if 'unconnected'}$$
$$H(m,n) = A(m,n) - \hat{A}(m,n) \quad\text{if 'multiple connected'}$$
$$\hat{A}(m,n) = B\big(m + k(m,n),\; n + l(m,n)\big) \quad\text{'backward mode'}$$
$$\hat{A}(m,n) = B_{-1}\big(m - k(m,n),\; n - l(m,n)\big) \quad\text{'forward mode'}$$
[Figure: frames B-1, A and B with the resulting H and L frames. "Multiple connected" (covered) pixels use MC prediction from the previous frame; "unconnected" (uncovered) pixels insert original values from B.]
Motion Compensated Haar Filters
Synthesis is straightforward in case of full-pixel correspondences
$$\tilde{A}(m,n) = L\big(m + \tilde{k}(m,n),\; n + \tilde{l}(m,n)\big) + 0.5\cdot H(m,n)$$
$$\tilde{B}(m,n) = L(m,n) - 0.5\cdot H\big(m + k(m,n),\; n + l(m,n)\big)$$
$$\tilde{B}(m,n) = L(m,n) \quad\text{if 'unconnected'}$$
$$\tilde{A}(m,n) = \hat{A}(m,n) + H(m,n) \quad\text{if 'multiple connected'}$$
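The analysis and synthesis equations for connected pixels can be checked with a short Python sketch. To keep every pixel uniquely connected it makes strong simplifying assumptions: frames are 1-D rows, the motion is one integer shift k for the whole frame, positions wrap around at the borders, and the inverse shift is taken as −k. The function names are hypothetical.

```python
def mc_haar_analysis(frame_a, frame_b, k):
    """L = 0.5*B(m) + 0.5*A(m+k);  H = A(m) - B(m-k)  (uniform shift, wrap-around)."""
    n = len(frame_a)
    low = [0.5 * frame_b[m] + 0.5 * frame_a[(m + k) % n] for m in range(n)]
    high = [frame_a[m] - frame_b[(m - k) % n] for m in range(n)]
    return low, high


def mc_haar_synthesis(low, high, k):
    """A(m) = L(m-k) + 0.5*H(m);  B(m) = L(m) - 0.5*H(m+k)."""
    n = len(low)
    frame_a = [low[(m - k) % n] + 0.5 * high[m] for m in range(n)]
    frame_b = [low[m] - 0.5 * high[(m + k) % n] for m in range(n)]
    return frame_a, frame_b


if __name__ == "__main__":
    frame_b = [3, 7, 1, 9, 4, 6]
    k = 2
    # Frame A is frame B translated by k positions: a pure translation.
    frame_a = [frame_b[(m - k) % len(frame_b)] for m in range(len(frame_b))]
    low, high = mc_haar_analysis(frame_a, frame_b, k)
    print(high)                              # all zero for a pure translation
    print(mc_haar_synthesis(low, high, k))   # returns both frames exactly
```

With a correct motion vector the highpass frame carries no energy and the lowpass frame equals frame B; with an arbitrary second frame the highpass frame is nonzero but reconstruction stays exact.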
Motion Compensated Haar Filters
2-band temporal Haar analysis filter
[Figure: analysis filter with branches H0 and H1. Frame A, frame B and the previous frame B-1 (delay z-1) feed motion estimation and a connection & mode switch analysis; motion compensation blocks, adders and 2:1 temporal subsampling produce the outputs L and H, and the motion information M is output alongside. Switch positions: U - unconnected, I - intraframe, F/B - forward/backward.]
Motion Compensated Haar Filters
2-band temporal Haar synthesis filter
[Figure: synthesis filter with branches G0 and G1. The inputs L, H and the motion information M pass through 1:2 upsampling, motion compensation blocks and adders (weights 0.5 and 1) under a connection & mode switch control to rebuild frame A and frame B, using the previous frame B-1 (delay z-1). Switch positions: U - unconnected, M - multiple connected, I - intraframe, F/B - forward/backward; the M/I switch marked *) is open for I.]
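Motion Compensated Temporal Wavelet Tree
[Figure: temporal wavelet tree. Frames are decomposed pairwise into L and H frames with motion information M over successive levels. The scaling and wavelet coefficients from temporal analysis (arranged as 2D images) pass to 2-D wavelet decomposition, quantization and encoding; the motion information is handled by coding of motion information at the encoder and decoding of motion information at the decoder.]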
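Motion-compensated Lifting Filters
Signal flow diagram
[Figure: lifting signal flow between the pixels B(m,n-1), B(m,n), B(m,n+1) of frame "B" and A(m-k,n-l), A(m-k,n-l+1) of frame "A", producing "H" and "L" (intermediate signals A*, B*). Prediction branches carry the weights −β and β−1, update branches the weights β/2 and (1−β)/2, straight branches the weight 1. Vertical shift by l+β pixels.]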
Motion-compensated Lifting Filters
Extensible to longer interpolation filters, e.g. (9/7)
[Figure: lifting signal flow from "A" and "B" through the intermediate signals "I1" and "I2" (A*, B*) to "H" and "L". The two prediction stages use the branch weights 2(1−β)p1, 2βp1 and 2(1−β)p2, 2βp2; the two update stages use 2(1−β)u1, 2βu1 and 2(1−β)u2, 2βu2; straight branches have weight 1.]
With β=0.5: Equivalent to the half-pel P.R. method proposed in [Ohm, Rümmler 97] and used in [Hsiang, Woods 99]
Motion-compensated Lifting filters
The principle is straightforwardly extensible to
- longer wavelet filters
- separable (or non-separable) 2D filters
- change of a with any position (e.g. MC based on affine model, dense motion vector fields)
Coincidence of motion correspondences in adjacent prediction and update steps must be observed
Lifting implementation of temporal wavelet filtering also leads to an elegant interpretation of previous covered/uncovered pixel substitution
Very efficient implementation
Motion-compensated Lifting filters
Adaptation at motion boundaries: "uncovered/unconnected" case
Additional "lazy" pixel(s) in frame B
[Figure: lifting signal flow from "A" and "B" (intermediate A*, B*) to "H" and "L" across a motion boundary (#), with sub-pixel shifts β1 and β2 on the two sides: prediction weights −β1, β1−1 and −β2, β2−1, update weights β1/2, (1−β1)/2 and β2/2, (1−β2)/2, straight branches with weight 1.]
Motion-compensated Lifting filters
Adaptation at motion boundaries: "Covered/multiple connected" case
Additional prediction pixel(s) in frame A/H
[Figure: lifting signal flow from "A" and "B" (intermediate A*, B*) to "H" and "L" across a motion boundary (#), with sub-pixel shifts β1 and β2 on the two sides: prediction weights −β1, β1−1 and −β2, β2−1, update weights β1/2, β2/2, (1−β1)/2 and (1−β2)/2, straight branches with weight 1. One pixel is annotated: "This pixel might also take the 'unconnected' role!"]
Motion-compensated Lifting filters
Frame-wise or localized implementation of intra coding is a key concept in MC prediction coders
- Switching to intra mode is applied whenever no motion correspondence can be found, e.g. scene changes or uncovered areas
In MC temporal filtering
- the equivalent is an adaptation of wavelet tree depth
- but intra coding could also be applied individually in the prediction and update steps
In general, localized mode switching can be included in a simple way in the lifting structure
More Flexibility in MC Lifting Filters
Different view of one transform level: Temporal-axis lifting filters, including 2D MC in cross paths
MC and IMC should be related such that pixels from A correspond to L
[Figure: Haar MC lifting over the video sequence B A B A B A. Prediction branches pass through MC with weight −1, update branches pass through inverse MC (IMC) with weight 1/2, straight branches have weight 1. Outputs: highpass sequence H H H and lowpass sequence L L L.]
More Flexibility in MC Lifting Filters
Extension to longer temporal filters (5/3)
- H frames equivalent to bidirectional prediction
- Forward/backward switching possible
- Better coding efficiency
- No temporal blocking
- More memory
- Higher delay
- More motion vectors
[Figure: 5/3 MC lifting over the video sequence B A B A B A B. Prediction branches (MC) have weight −1/2 from each neighbouring frame and update branches (IMC) have weight 1/4; where the filter switches to uni-directional mode, the prediction weights become 0 and −1 and the update weights change accordingly. Outputs: highpass sequence H H H and lowpass sequence L L L L.]
More Flexibility in MC Lifting Filters
Non-dyadic decomposition
- Temporal block units of 3 frames
- E.g. 30/10 Hz temporal scalability
- Can be extended to bidirectional MC in prediction step
[Figure: lifting over blocks of three frames (B A B) of the video sequence. Each block gives two highpass frames H and one lowpass frame L, with prediction weight −1 (MC) and update weights 1/4 and 1/2 (IMC).]
Low-Delay modes in MC Temporal Filtering
Temporal pyramid decomposition with omission of update step ("A" frames left as originals)
[Figure: frames 1 to 4 at pyramid levels 1 and 2, each labelled A (leave as original) or H. Level 2 filters the A frames from the previous level.]
Low-Delay modes in MC Temporal Filtering
Modified MCTF Scheme
"A" frames can be inserted at arbitrary locations -> the sequence can be decoded at non-dyadic lower frame rates
[Figure: frames 1 to 4 labelled A H H A; the A frames are left as original.]
Low-Delay modes in MC Temporal Filtering
A frames allow implementation of a low-delay mode
- A frames can be encoded and transmitted immediately, but must be stored for future reference
- H frames can be encoded and transmitted immediately in any of the schemes
Disadvantage: lower coding efficiency
- May be compensated by improved prediction
Low-Delay modes in MC Temporal Filtering
Inclusion of bi-directional prediction
Choice between 3 modes:
- Use backward prediction
- Use forward prediction
- Use the average block of the backward and forward predictions while filtering
[Figure: frames 1 to 4 labelled A H H A; the A frames are left as original.]
Low-Delay modes in MC Temporal Filtering
Prediction step can be enriched by selecting multiple reference frames
[Figure: Reference Frame 1 and Reference Frame 2 are both available for predicting the Current Frame.]
Advantages
- Improved coding efficiency
- Easy to incorporate the advanced MC & ME options used by predictive coders (H.264/AVC, MPEG-4 etc.)
- Reduced no. of unconnected pixels
Disadvantages
- Sacrifice Temporal Scalability
- Prediction drift can become a problem when decoding at lower bit-rates
Low-Delay modes in MC Temporal Filtering
Different configurations at any level of pyramid
[Figure: four configurations over frames 1 to 4: AHAH scheme, bi-directional AHAH scheme, AHHA scheme and bi-directional AHHA scheme.]
Overcomplete Motion Compensated Wavelet Coding
• Shift-variant property of wavelets
• Frame theory - overcomplete wavelets
• Low band shifting method
• Inband motion compensated temporal filtering
• Simulation results
The Shift Variance Property of Wavelets
Haar filter output of step edges
[Figure: Haar DWT bands L1 and H1 of a step-edge signal and of the same signal shifted by one pixel. Low pass channel: prediction by linear interpolation. High pass channel: no prediction possible!]
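The step-edge example can be reproduced numerically. The sketch below uses an unnormalized Haar pair (mean and half-difference of each sample pair), which is an assumption made to keep the numbers exact; the effect does not depend on the normalization.

```python
def haar_dwt(x):
    """Critically sampled Haar step: pairwise mean (low) and half-difference (high)."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low, high


if __name__ == "__main__":
    edge = [0, 0, 0, 0, 1, 1, 1, 1]          # step edge aligned with the sampling grid
    shifted = [0, 0, 0, 0, 0, 1, 1, 1]       # same edge moved by one pixel
    low0, high0 = haar_dwt(edge)
    low1, high1 = haar_dwt(shifted)
    print(low0, high0)
    print(low1, high1)
    # The new lowpass value is the mean of two neighbouring reference values ...
    print(low1[2] == (low0[1] + low0[2]) / 2)
    # ... but the reference highpass band is all zero and predicts nothing.
    print(any(high0), any(high1))
```

The shifted signal produces a nonzero highpass coefficient that no displacement of the all-zero reference highpass band can predict, while the lowpass band is recovered by linear interpolation.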
The Shift Variance Property of Wavelets
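[Equations: the critically sampled band of a signal shifted by v samples is compared with the correspondingly shifted band of the original signal. The remaining terms involve H1(ω/2 + π)·X(ω/2 + π) with the phase factors e^{−jv(ω/2+π)} and e^{−jvω/2}.]
Aliasing components (zero only when v = even)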
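A short derivation shows why the aliasing terms vanish only for even shifts. The notation is an assumption made for this sketch: $x_v(n) = x(n-v)$ is the input shifted by $v$ samples, $H_1$ is the analysis highpass filter, and $Y_v$ is the spectrum of the 2:1 subsampled highpass band of $x_v$.

$$X_v(\omega) = X(\omega)\,e^{-jv\omega}$$

$$Y_v(\omega) = \frac{1}{2}\left[H_1\!\left(\tfrac{\omega}{2}\right)X\!\left(\tfrac{\omega}{2}\right)e^{-jv\omega/2} + H_1\!\left(\tfrac{\omega}{2}+\pi\right)X\!\left(\tfrac{\omega}{2}+\pi\right)e^{-jv(\omega/2+\pi)}\right]$$

Comparing with the band of the unshifted signal, displaced by $v/2$ samples in the subsampled domain:

$$Y_v(\omega) - e^{-jv\omega/2}\,Y_0(\omega) = \frac{1}{2}\,H_1\!\left(\tfrac{\omega}{2}+\pi\right)X\!\left(\tfrac{\omega}{2}+\pi\right)e^{-jv\omega/2}\left(e^{-jv\pi}-1\right)$$

Since $e^{-jv\pi} = (-1)^v$, the right-hand side is zero for even $v$ and equals $-H_1(\tfrac{\omega}{2}+\pi)\,X(\tfrac{\omega}{2}+\pi)\,e^{-jv\omega/2}$ for odd $v$. For odd shifts the band of the shifted signal therefore cannot be obtained by displacing the band of the reference.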
Optimal Aliasing Reduction Filter Approach
In order to minimize the aliasing in wavelet domain ME/MC (X.Yang, K. Ramchandran, IEEE-TIP, May, 2000)
L : aliasing reduction filter
Optimal Aliasing Reduction Filter Approach
Still not as efficient as motion estimation in spatial domain
Any ultimate solution? Shift-invariant overcomplete wavelets
Frame Theory – Overcomplete Wavelets
Properties of redundant frame
- Noise reduction
- More sparse representation -> matching pursuit
- Redundant representation -> multiple description coding
- Shift invariant property
Improved motion estimation/compensation in wavelet domain
- Only motion references need to be overcomplete
- Texture coding is still in complete wavelet domain
Shift Invariance of Overcomplete Wavelets
Overcomplete representation without downsampling
[Figure: bands L1 and H1 of the Haar DWT and of the Haar ODWT for a step-edge signal and for the signal shifted by one pixel. Haar DWT: the low pass channel allows prediction by linear interpolation, the high pass channel allows no prediction. Haar ODWT: prediction possible in any case!]
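The shift invariance of the overcomplete transform can be checked with a small Python sketch. It assumes an unnormalized Haar pair, circular signal extension and a shift that delays the signal by one sample; all function names are hypothetical. The critically sampled highpass and lowpass bands of the shifted signal turn out to be the odd-phase samples of the reference's overcomplete bands, so a reference kept in overcomplete form contains the needed prediction.

```python
def haar_odwt(x):
    """Overcomplete Haar step: same filters as the DWT, but no downsampling."""
    n = len(x)
    low = [(x[i] + x[(i + 1) % n]) / 2 for i in range(n)]
    high = [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]
    return low, high


def haar_dwt(x):
    """Critically sampled Haar step: even phase of the overcomplete bands."""
    low, high = haar_odwt(x)
    return low[0::2], high[0::2]


def shift(x, d):
    """Delay the signal by d samples with wrap-around."""
    n = len(x)
    return [x[(i - d) % n] for i in range(n)]


if __name__ == "__main__":
    edge = [0, 0, 0, 0, 1, 1, 1, 1]
    low_u, high_u = haar_odwt(edge)            # overcomplete reference
    low_s, high_s = haar_dwt(shift(edge, 1))   # DWT of the shifted signal
    n = len(edge)
    print(high_s == [high_u[(2 * i - 1) % n] for i in range(n // 2)])
    print(low_s == [low_u[(2 * i - 1) % n] for i in range(n // 2)])
```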
Low-Band-Shift Method
• Optimal way of generating overcomplete wavelet coefficients for every shift
Low Band Shift Method for 2-D
[Figure: the original image is decomposed into LL, HL, LH and HH for the shifts (0,0), (1,0), (0,1) and (1,1); the LL bands are decomposed again into LLLL, LLHL, LLLH and LLHH for the shifts (0,0) to (3,3). The bands used for the complete wavelet expansion are highlighted.]
(x,y): shift in (x,y) pixels in original image
# of reference frames = 3n+1
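A 1-D version of the low-band-shift idea can be sketched in Python. The assumptions are an unnormalized Haar filter pair, circular extension, and a shift defined as advancing the signal; the 2-D method of the slide applies the same principle along both axes. At every level each lowpass band is decomposed twice, unshifted and shifted by one sample, so that after two levels the bands for the input shifts 0 to 3 are all available.

```python
def haar_step(x):
    """Critically sampled Haar step: pairwise mean and half-difference."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def advance(x, s):
    """Circularly advance the signal: result[n] = x[n + s]."""
    s %= len(x)
    return x[s:] + x[:s]


def low_band_shift(x, levels):
    """bands[level][shift] = (low, high), shift counted in input samples."""
    bands, lows = [], {0: list(x)}
    for level in range(levels):
        current = {}
        for shift_in, low in lows.items():
            for t in (0, 1):  # decompose the low band unshifted and shifted by one
                current[shift_in + t * 2 ** level] = haar_step(advance(low, t))
        bands.append(current)
        lows = {s: pair[0] for s, pair in current.items()}
    return bands


def dwt(x, levels):
    """Ordinary multi-level DWT, used here only as a reference."""
    low, highs = list(x), []
    for _ in range(levels):
        low, high = haar_step(low)
        highs.append(high)
    return low, highs


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
    bands = low_band_shift(x, levels=2)
    for d in range(4):
        low, highs = dwt(advance(x, d), 2)
        print(d, low == bands[1][d][0], highs[1] == bands[1][d][1])
```

Every two-level DWT of a shifted input is thus found among the stored bands, which is what makes them usable as a motion-compensation reference.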
Conventional Wavelet Transform
[Figure: the original image is decomposed into LL, HL, LH and HH and, at the second level, into LLLL, LLHL, LLLH and LLHH.]
Overcomplete Wavelet Transform by Low-Band Shift Method
[Figure: the original frame is decomposed into overcomplete bands LL, HL, LH and HH; the LL band is decomposed again into LLLL, LLHL, LLLH and LLHH.]
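Overcomplete Wavelet MC Coding - Coder
[Figure: the input is decomposed by a 2D DWT. Each resolution band (low, medium, high) has its own prediction loop with quantizer Q, motion compensation (MC) and frame store FS; in the medium and high loops the reference passes through IDWT and ODWT. Motion estimation supplies vectors, scaled (:2) between resolution levels, to motion coding; the coefficient coding and motion coding outputs are multiplexed (MUX).]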
Overcomplete Wavelet MC Coding - Decoder
[Figure: the demultiplexer (DMUX) feeds the coefficient decoder and the motion decoder. Three MC loops with frame stores FS(low), FS(med.) and FS(high) give the low, medium and high resolution reconstructions; the medium and high loops use IDWT and ODWT, and motion vectors are scaled (:2) between resolution levels.]
Overcomplete Wavelet MC Coding
ODWT is Wavelet without subsampling
- More samples than original, like Pyramid representation
Allows Wavelet domain MC for high frequency components
- signal does not bear frequency-inversion alias component
Still only necessary to encode critically sampled coefficients
- Overcomplete transform coefficients can be generated locally within the decoder
Still does not resolve the drift in SNR scalability
- may be solved by multiple loops in each wavelet band
Solution: In-Band MCTF (IBMCTF)
Spatial-Domain MCTF (SDMCTF)
[Figure: the video enters MCTF (motion estimation and temporal filtering of the current frame). The filtered frames pass through DWT, SBC and EC to transmission; the MV and reference frame numbers pass through the MVC.]
DWT: Discrete Wavelet Transform
SBC: Sub-Band Coder
EC: Entropy Coder
ME: Motion Estimation
MVC: Motion Vector Coder
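In-band MCTF (IBMCTF)
[Figure: the video is first transformed by the DWT. MCTF (motion estimation and temporal filtering of the current frame) then operates in the wavelet domain with the help of the CODWT, followed by SBC and EC for transmission; the MV and reference frame numbers pass through the MVC.]
DWT: Discrete Wavelet Transform
SBC: Sub-Band Coder
EC: Entropy Coder
CODWT: Complete to Overcomplete DWT
ME: Motion Estimation
MVC: Motion Vector Coder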
IBMCTF: concept
[Figure: 3-D decomposition along the axes temp, hor and ver.]
IBMCTF Wavelet Video
[Figure: 3-D decomposition along the axes temp, hor and ver.]
For efficient IBMCTF, ME should be performed in overcomplete wavelet domain
3-D decomposition structure
[Figure: 3-D decomposition structures along the axes temp, hor and ver for SD-MCTF and for in-band MCTF.]
Block diagram of IBMCTF coder
[Figure: the input video passes through the wavelet transform into bands 1 to N. Each band is broken into GOFs and processed by its own unit IBMCTF 1, IBMCTF 2, ..., IBMCTF N, consisting of motion estimation, temporal filtering and texture coding, with MV and reference frame numbers as side information; the outputs form the bitstream.]
IBMCTF coding
Allows Wavelet domain MC using shift-invariant overcomplete wavelets by Low-Band Shift method
Still only necessary to encode critically sampled coefficients
Advantages in spatial scalability
Resolve the drift in SNR scalability
Adaptive processing for each subband
- Different ME accuracy, interpolation filter, temporal filter taps, etc.
Very general framework which can be combined with other existing techniques (intra mode, UMCTF, etc)
Results
[Figure: PSNR Y (dB, axis 30 to 36) versus bitrate (Kbps, axis 500 to 1500) for Foreman, 300 fs, full-pel ME/MC, 30 fps, CIF. Curves: IBMCTF with level-by-level CODWT, SDMCTF, IBMCTF with full CODWT.]
Results
[Figure, panel 1: PSNR Y (dB, axis 30 to 36) versus bitrate (Kbps, axis 500 to 1500) for Foreman, 300 fs, full-pel ME/MC, 30 fps, CIF.]
[Figure, panel 2: PSNR Y (dB, axis 31 to 39) versus bitrate (Kbps, axis 150 to 500) for Foreman, 300 fs, full-pel ME/MC, 15 fps, QCIF.]
Curves in both panels: IBMCTF with level-by-level CODWT, SDMCTF, IBMCTF with full CODWT.
Results
[Figure: PSNR Y (dB, axis 36 to 48) versus bitrate (Kbps, axis 100 to 170) for Foreman, 300 fs, full-pel ME/MC, 7.5 fps, Q-QCIF. Curves: IBMCTF with level-by-level CODWT, SDMCTF, IBMCTF with full CODWT.]
Generation of Wavelet Blocks
Wavelet block provides a direct association between the wavelet coefficients and what they represent spatially
ME is done based on wavelet block
- No motion vector overhead because the number of motion vectors to be coded is the same as that of SDMCTF
- Perfectly aligned with tree structure entropy coder
Entropy based motion estimation criterion!
Proposed interleaving of overcomplete wavelet coefficients
[Figure: the coefficients for shift=0 and the coefficients for shift=1 are merged into one sequence of interleaved coefficients.]
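Overcomplete Wavelet Transform with Interleaving
[Figure: the original frame is decomposed into interleaved overcomplete bands LL, HL, LH and HH and, at the second level, LLLL, LLHL, LLLH and LLHH.]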
Advantages of Interleaving
Interleaving algorithm enables optimal sub-pixel accuracy motion estimation and compensation in IBMCTF
By interleaving, any existing ME module (HVSBM, FSBM, Intra Mode, etc) with any fractional pel accuracy can be used
Can be easily used for MCTF framework with any fractional pel accuracy using lifting structure
3-D Lifting Structure for IBMCTF
Direct extension of SD-MCTF lifting to IBMCTF:
$$H_j^i[m,n] = B_j^i[m,n] - \tilde{A}_j^i\big[m - d_m,\; n - d_n\big], \quad i = 0,\ldots,3$$
Interpolation operation for frame $A_j^i$ is not optimal (no cross-phase dependencies incorporated)
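3-D Lifting Structure for IBMCTF
$$H_j^i[m,n] = B_j^i[m,n] - \mathrm{LBS\_}\tilde{A}_j^i\big[2^j m - d_m,\; 2^j n - d_n\big], \quad i = 0,\ldots,3$$
[Equation: the corresponding update step forms $L_j^i$ at the motion-compensated positions from $A_j^i$ and the low-band-shift interpolated highpass frame $\mathrm{LBS\_}\tilde{H}_j^i$, for i = 0,..,3.]
[Figure: subbands $B_1^1$, $B_1^2$, $B_1^3$ and $B_2^0$, $B_2^1$, $B_2^2$, $B_2^3$ of the two-level decomposition.]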
Results
[Figure: PSNR (dB, axis 26 to 42) versus bitrate (bps, axis 0 to 2500000) for "Foreman", 300 frames, 30 fps, CIF. Curves: SDMCTF (1/8), IBMCTF (1/8), SDMCTF, IBMCTF.]
Overcomplete wavelet coding using standard-compliant DCT base-layers
[Figure: the video is decomposed by the DWT. The low-frequency band is handled by MPEG-compliant coding/decoding, yielding decoded pictures and residual information; the high-frequency bands enter the MCTF (CODWT, ME, temporal filtering), followed by SBC and EC for transmission; the MV and reference frame numbers pass through the MVC.]
DWT: Discrete Wavelet Transform
SBC: Sub-Band Coder
EC: Entropy Coder
CODWT: Complete to Overcomplete DWT
ME: Motion Estimation
MVC: Motion Vector Coder
Proposed by Andreopoulos, van der Schaar, et al – ICIP 2003
Results
Current Status in MPEG Standardization
MPEG's Scalable Coding History
Development of scalable video coding solutions has a long history in MPEG, starting from MPEG-2
- Spatial, temporal and SNR scalability with at most 3 levels
- MPEG-4 Fine granularity scalability
So far, all standardized solutions have shown deficiency in coding performance which is mainly due to recursive MC structure
- Drift occurs when not all information is available
- Drift-free structures are less coding efficient
MPEG's Interframe Wavelet Coding Exploration
New embedded wavelet solutions were proposed in the Digital Cinema Call for Proposals and in the Call for Proposals on improved coding efficiency (both due July 2001)
At Pattaya meeting (Dec. 2001), MPEG started an Ad hoc Group to explore Interframe Wavelet Coding
Different methods were investigated
- MC prediction with intraframe (2D) wavelet
- In-band MC prediction based on overcomplete 2D wavelet decomposition
- 3D (spatio/temporal) wavelet coding based on MCTF
3D (t+2D, 2D+t) proved most promising, providing excellent coding efficiency while being fully scalable in temporal, spatial and quality resolution
Experimental software was used
MPEG's Interframe Wavelet Coding Exploration
The Interframe Wavelet exploration was successfully completed in October 2002
9 responses to the Call for Evidence on Scalable Coding Advances – July 2003
24 responses to the Call for Proposals – March 2004
SNR Results from MPEG's Interframe Wavelet Coding Exploration
Some less good results (out of 10 sequences)
[Figure: two rate-distortion plots comparing AVC 1, AVC 2 and MCTF, annotated with 3.5 dB, 1.45 dB, 1.75 dB and 0.75 dB.]
SNR Results from MPEG's Interframe Wavelet Coding Exploration
Some more good results (out of 10 sequences)
[Figure: two rate-distortion plots comparing AVC 1, AVC 2 and MCTF, annotated with 3.6 dB, 0.01 dB, 1.42 dB and 0.4 dB.]
Acknowledgements
Jens Ohm
Yiannis Andreopoulos
Jong Ye
Konstantin Hanke
Claudia Mayer
Thomas Rusert