July 2008 ENSC 820 - Simon Fraser University 1 Scalable Video Coding with Wavelet-Based Approaches Presenter: Mahin Torki
Feb 04, 2016
Paper Title: “State-of-the-Art and Trends in Scalable Video Compression With Wavelet-Based Approaches”
Authors: Nicola Adami, Alberto Signoroni, Riccardo Leonardi
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 9, September 2007
Outline
  Motivation
  Wavelet SVC (WSVC) Fundamentals
  Coding Architectures for WSVC Systems
  WSVC Reference Platform in MPEG
  Comparison between WSVC and SVC
  Conclusion
Motivation
  Several working points, corresponding to different quality, picture size and frame rate, in a single bitstream
  Two types of SVC systems:
    Hybrid schemes (used in all MPEG-x and H.26x standards)
    Spatio-temporal wavelet technologies
  Main differences between SVC and transcoding systems:
    Low complexity
    No coding/decoding operations required
    Simple parsing operation on the coded bitstream
Motivation
  Encode once
  Decode according to the required QoS or the available hardware resources
A Typical SVC System
A possible structure of an SVC bitstream
Extracting a scaled bitstream
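A minimal sketch of how a scaled bitstream can be extracted by parsing alone, assuming each packet is tagged with its temporal, spatial and quality level. The `Packet` type, its fields and the `extract` function are hypothetical names chosen for illustration; they are not part of any standard or of the reference software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    t: int          # temporal level (0 = lowest frame rate)
    s: int          # spatial level (0 = smallest picture size)
    q: int          # quality layer (0 = base quality)
    payload: bytes  # coded data, never decoded by the extractor


def extract(bitstream, max_t, max_s, max_q):
    """Keep only the packets needed for the requested working point.

    The extractor only reads the packet tags: no decoding or re-encoding.
    """
    return [p for p in bitstream
            if p.t <= max_t and p.s <= max_s and p.q <= max_q]


# Hypothetical stream with 3 temporal, 2 spatial and 2 quality levels.
stream = [Packet(t, s, q, bytes([t, s, q]))
          for t in range(3) for s in range(2) for q in range(2)]

# Half frame rate, lowest picture size, full quality.
sub = extract(stream, max_t=1, max_s=0, max_q=1)
```

The extracted list is itself a valid input to `extract`, so a network node could scale the stream down further without touching the payloads.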
Tools Enabling Scalability
  A multi-resolution signal decomposition inherently enables low-to-high resolution scalability by representing the signal in a transformed domain
Tools Enabling Scalability
  Inter-Scale Prediction (ISP)
    The simplest way to represent a signal at two resolutions
    The signal x can be seen as a coarse-resolution signal c and a detail signal $\tilde{d}$
    Not critically sampled
  Laplacian Pyramid
    An iterated version of ISP
    Results in a coarsest-resolution signal c and a set of details $\tilde{d}^{(l)}$, $l = 1, \ldots, n$
Laplacian Pyramid
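A small 1-D sketch of ISP and its iterated form, the Laplacian pyramid. The pair-averaging decimator and the sample-repetition interpolator are assumptions made to keep the example short (real systems use better filters), and the signal length is assumed to be divisible by 2 at every level.

```python
def downsample(x):
    """Coarse signal: average each pair of samples (simple low-pass + decimation)."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def upsample(c):
    """Interpolate back to double length by repeating every sample."""
    return [v for v in c for _ in range(2)]


def isp_analysis(x):
    """Inter-scale prediction: coarse signal c plus full-length detail d."""
    c = downsample(x)
    d = [a - b for a, b in zip(x, upsample(c))]
    return c, d


def isp_synthesis(c, d):
    return [a + b for a, b in zip(upsample(c), d)]


def laplacian_pyramid(x, levels):
    """Iterate ISP on the coarse signal; details[0] is the finest level."""
    details, c = [], x
    for _ in range(levels):
        c, d = isp_analysis(c)
        details.append(d)
    return c, details


def reconstruct(c, details):
    for d in reversed(details):
        c = isp_synthesis(c, d)
    return c


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 2.0, 0.0]
c, details = laplacian_pyramid(x, 2)
num_samples = len(c) + sum(len(d) for d in details)
```

Here 8 input samples become 14 stored samples, which is what "not critically sampled" means in practice.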
Spatial Scalability
  Discrete Wavelet Transform (DWT)
    Projects the signal onto a set of multi-resolution (MR) subspaces
    Critically sampled
    Generates a coarse signal and a set of details
  For multi-dimensional signals such as images
    Separable pyramidal and DWT decompositions
    Separate filtering on rows and columns
DWT Filter Bank
Implementing DWT by a two-channel filter bank iterated on a dyadic tree path
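The iterated two-channel filter bank can be sketched as follows. The average/difference pair used here is the Haar filter pair up to scaling, chosen as an assumption because it keeps the arithmetic exact; the signal length is assumed to be divisible by 2 at every level.

```python
def haar_analysis(x):
    """Two-channel analysis: low-pass and high-pass outputs, each decimated by 2."""
    n = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(n)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(n)]
    return low, high


def haar_synthesis(low, high):
    """Two-channel synthesis: interleave the reconstructed even and odd samples."""
    x = []
    for l, h in zip(low, high):
        x += [l + h, l - h]
    return x


def dwt(x, levels):
    """Iterate the filter bank on the low-pass branch only (dyadic tree)."""
    details, c = [], x
    for _ in range(levels):
        c, d = haar_analysis(c)
        details.append(d)  # details[0] is the finest level
    return c, details


def idwt(c, details):
    for d in reversed(details):
        c = haar_synthesis(c, d)
    return c


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 2.0, 0.0]
c, details = dwt(x, 2)
num_coeffs = len(c) + sum(len(d) for d in details)
```

Unlike the Laplacian pyramid, the number of coefficients equals the number of input samples, so the representation is critically sampled.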
2D-DWT Transform
  2D wavelet decomposition inherently provides spatial scalability
  [Figure label: Bit-plane coder]
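A sketch of one level of separable 2D decomposition, filtering rows first and then columns. The average/difference (Haar-like) filters and the helper names are assumptions for illustration; image dimensions are assumed to be even.

```python
def split(v):
    """One-level 1-D split into low and high halves (Haar up to scaling)."""
    n = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(n)]
    high = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(n)]
    return low, high


def transpose(m):
    return [list(r) for r in zip(*m)]


def filter_rows(img):
    """Apply the 1-D split to every row; return the low and high half-images."""
    lows, highs = [], []
    for row in img:
        l, h = split(row)
        lows.append(l)
        highs.append(h)
    return lows, highs


def dwt2(img):
    """Separable 2D split: horizontal filtering, then vertical filtering."""
    L, H = filter_rows(img)
    LL, LH = (transpose(b) for b in filter_rows(transpose(L)))
    HL, HH = (transpose(b) for b in filter_rows(transpose(H)))
    return LL, LH, HL, HH


img = [[1, 1, 3, 3],
       [1, 1, 3, 3],
       [5, 5, 7, 7],
       [5, 5, 7, 7]]
LL, LH, HL, HH = dwt2(img)
```

The `LL` subband is a half-resolution version of the image, so a decoder that stops there obtains the lower spatial resolution without any extra processing.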
Spatial Scalability
  Lifting scheme
    Alternative spatial-domain processing introduced by Sweldens
    Generates a critically sampled (c, d) representation of the signal x
Lifting Scheme
  Signal x is split into two polyphase components, the even and odd samples (each at half the original resolution)
  The two components are correlated, so one can be predicted from the other
  The subsampled signal $x_{2i}$ could contain many aliased components, so it should be updated
  Perfect reconstruction is guaranteed
  Every DWT can be factorized into a chain of lifting steps
  Plays a fundamental role in MC Temporal Filtering (MCTF)
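The split, predict and update steps can be sketched as below. The particular P and U operators are an assumption, written in the style of the integer 5/3 lifting filters with mirrored borders; any other choice of P and U would still be exactly invertible, because the inverse simply undoes each step in reverse order. An even-length input is assumed.

```python
def predict(even, i):
    """P: predict odd sample i from its two even neighbours."""
    right = even[i + 1] if i + 1 < len(even) else even[i]  # mirror at the border
    return (even[i] + right) // 2


def update(detail, i):
    """U: correction for even sample i computed from neighbouring details."""
    left = detail[i - 1] if i > 0 else detail[i]  # mirror at the border
    return (left + detail[i] + 2) // 4


def lifting_forward(x):
    even, odd = x[0::2], x[1::2]                             # polyphase split
    d = [odd[i] - predict(even, i) for i in range(len(odd))]  # predict step
    c = [even[i] + update(d, i) for i in range(len(even))]    # update step
    return c, d


def lifting_inverse(c, d):
    even = [c[i] - update(d, i) for i in range(len(c))]      # undo update
    odd = [d[i] + predict(even, i) for i in range(len(d))]   # undo predict
    x = [0] * (len(c) + len(d))
    x[0::2], x[1::2] = even, odd                             # merge polyphase parts
    return x


x = [3, 7, 1, 8, 2, 9, 4, 4]
c, d = lifting_forward(x)
x_rec = lifting_inverse(c, d)
```

Note that reconstruction stays exact even though the operators round to integers, which is one reason lifting is attractive for lossless-capable coders.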
Temporal Scalability
  Motion-Compensated Temporal Filtering (MCTF)
    A key tool that enables temporal scalability while exploiting temporal correlation
MCTF implementation by lifting steps
  The index i now has a temporal meaning
  P and U can be guided by motion information
MCTF implementation by lifting steps
  ME/MC is implemented according to a certain motion model
  ME/MC usually generates a set of motion vector fields mv(l,k)
  mv(l,k) is an estimate of the trajectory of the blocks between the temporal frames, at spatial level l, involved in the k-th MCTF temporal decomposition level
  With the lifting structure, non-dyadic temporal decomposition is possible
    Temporal scalability factors other than a power of two
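A toy sketch of one MCTF level built from a predict and an update lifting step. Frames are 1-D lists and the motion model is a single circular shift per frame pair; both are simplifying assumptions (the slides describe block-based motion fields), and the function names are hypothetical.

```python
def warp(frame, mv):
    """Motion-compensate a 1-D frame by a circular shift of mv samples (toy model)."""
    n = len(frame)
    return [frame[(i - mv) % n] for i in range(n)]


def mctf_forward(frames, mvs):
    """One temporal level: each frame pair (a, b) gives one low and one high frame."""
    lows, highs = [], []
    for k in range(len(frames) // 2):
        a, b, mv = frames[2 * k], frames[2 * k + 1], mvs[k]
        h = [bi - pi for bi, pi in zip(b, warp(a, mv))]        # predict along motion
        l = [ai + ui / 2 for ai, ui in zip(a, warp(h, -mv))]   # update along inverse motion
        lows.append(l)
        highs.append(h)
    return lows, highs


def mctf_inverse(lows, highs, mvs):
    frames = []
    for l, h, mv in zip(lows, highs, mvs):
        a = [li - ui / 2 for li, ui in zip(l, warp(h, -mv))]   # undo update
        b = [hi + pi for hi, pi in zip(h, warp(a, mv))]        # undo predict
        frames += [a, b]
    return frames


f0 = [0, 0, 9, 5, 0, 0, 0, 0]
f2 = [1, 2, 3, 4, 0, 0, 0, 0]
frames = [f0, warp(f0, 2), f2, warp(f2, 1)]  # content moves by 2, then by 1
lows, highs = mctf_forward(frames, mvs=[2, 1])
```

With the correct motion vectors the high-pass frames are all zero, which is where the coding gain comes from. With wrong vectors the high-pass frames carry more energy, but the inverse still reconstructs the input exactly.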
Some benefits of MCTF
  By exploiting the local adaptability of the P and U operators and using the mv(l,k) information, MCTF can:
    Handle occlusion and uncovered-area problems
    Reduce blocking effects by considering adjacent blocks
    Implement the necessary pixel interpolation, by modifying the lifting structure, when fractional-pixel MVs are provided
MCTF
  [Figure: dyadic MCTF decomposition of a group of L0 frames into L1/H1, L2/H2 and L3/H3 temporal subbands]
Hybrid temporal and spatial scalability
  [Figure: a video sequence decomposed over three temporal levels, giving subband H at the 1st level, LH at the 2nd level, and LLL and LLH at the 3rd level]
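The subband labels H, LH, LLH and LLL can be reproduced with a short sketch of a dyadic temporal pyramid. To keep it small, each "frame" is a single number and no motion compensation is applied; both are assumptions for illustration only.

```python
def temporal_split(frames):
    """One temporal level: pairwise average (low) and difference (high)."""
    n = len(frames) // 2
    low = [(frames[2 * i] + frames[2 * i + 1]) / 2 for i in range(n)]
    high = [frames[2 * i + 1] - frames[2 * i] for i in range(n)]
    return low, high


def temporal_pyramid(frames, levels):
    """Split the low band repeatedly; name each subband by its filtering path."""
    subbands, low, name = {}, frames, ""
    for _ in range(levels):
        low, high = temporal_split(low)
        subbands[name + "H"] = high
        name += "L"
    subbands[name] = low
    return subbands


frames = [10, 12, 14, 16, 18, 20, 22, 24]  # one value per frame, 8 frames
subbands = temporal_pyramid(frames, 3)
sizes = {name: len(band) for name, band in subbands.items()}
```

A decoder that uses only `LLL` gets one frame per group of eight; adding `LLH`, then `LH`, then `H` doubles the frame rate each time.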
Quality Scalability
  Wavelet-based image compression schemes provide high R-D performance with limited computational complexity
  They do not interfere with spatial scalability requirements
  High degree of quality scalability
    Truncating the coded bitstream at arbitrary points
  Most techniques are inspired by the zerotree idea
    Embedded Zerotree Wavelet (EZW) by Shapiro
    SPIHT, a reformulation of EZW by Said and Pearlman
    Embedded Zero Block Coding (EZBC), with higher performance
  Embedded Block Coding with Optimized Truncation (EBCOT)
    Does not use the zerotree idea
    Adopted in JPEG2000
    Combines layered block coding, block-based R-D optimization and context-based arithmetic coding
    Good scalability and high coding efficiency
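The embedded property behind quality scalability can be illustrated with plain bit-plane coding of integer coefficients. This sketch is not EZW, SPIHT, EZBC or EBCOT: those add significance modelling and entropy coding on top, which is omitted here. The function names are hypothetical.

```python
def encode_bitplanes(coeffs, num_planes):
    """Return the signs and the magnitude bit-planes, most significant first."""
    signs = [c < 0 for c in coeffs]
    planes = [[(abs(c) >> p) & 1 for c in coeffs]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def decode_bitplanes(signs, planes, num_planes):
    """Rebuild coefficients from however many leading planes were received."""
    mags = [0] * len(signs)
    for k, plane in enumerate(planes):
        p = num_planes - 1 - k  # bit position carried by this plane
        mags = [m | (b << p) for m, b in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]


coeffs = [37, -12, 5, 0, -3, 1]
signs, planes = encode_bitplanes(coeffs, num_planes=6)

full = decode_bitplanes(signs, planes, 6)        # all planes received
coarse = decode_bitplanes(signs, planes[:3], 6)  # stream truncated after 3 planes
```

Truncating after three planes yields `[32, -8, 0, 0, 0, 0]`, a coarser version of the same coefficients, and every additional plane refines it.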
WSVC Notation
  $x_{S(n)}$ ($x_{T(m)}$): the original signal after an n-level (m-level) multi-resolution spatial (temporal) transform S(n) (T(m))
  The spatially transformed signal consists of the subband set:
    $x_{S(n)} = \{ x_{S(n)}^{c}, x_{S(n)}^{d(n)}, \ldots, x_{S(n)}^{d(1)} \}$
  $\hat{x}_{k}^{l}$ is the decoded version of the original signal x at a given temporal resolution k and spatial resolution l, at a reduced quality rate
Basic WSVC Architectures
  T+2D
  2D+T
  Adaptive Architectures
  Multiscale Pyramids
Basic WSVC Architectures
  T+2D
    Temporal transform is applied before the spatial one
    Guarantees critically sampled subbands
    Low spatial scalability performance
    Full-resolution motion vectors
Basic WSVC Architectures
  2D+T
    Spatial transform is applied before the temporal one
    Often called In-band MCTF (IBMCTF)
    Estimation of mv(l,k) is made independently on each spatial level
    Leads to a structurally scalable motion representation
    Spatial and temporal scalability are more decoupled
    Lower coding efficiency, especially at higher temporal resolutions
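The two transform orders can be compared in a small sketch. It assumes 1-D frames, Haar-like average/difference filters and no motion compensation; under those assumptions the two orders produce identical subbands, because the transforms are linear and separable. The architectures differ in practice once motion compensation enters the temporal step, since it then acts on full-resolution frames in T+2D and on spatial subbands in 2D+T.

```python
def spatial(frame):
    """One-level spatial split of a 1-D frame into (low, high) halves."""
    n = len(frame) // 2
    return ([(frame[2 * i] + frame[2 * i + 1]) / 2 for i in range(n)],
            [(frame[2 * i] - frame[2 * i + 1]) / 2 for i in range(n)])


def temporal(a, b):
    """One-level temporal split of two frames, without motion compensation."""
    return ([(p + q) / 2 for p, q in zip(a, b)],
            [(p - q) / 2 for p, q in zip(a, b)])


def t_plus_2d(a, b):
    """Temporal transform first, then spatial transform of each temporal band."""
    t_low, t_high = temporal(a, b)
    return spatial(t_low) + spatial(t_high)


def two_d_plus_t(a, b):
    """Spatial transform first, then temporal transform of each spatial band."""
    a_low, a_high = spatial(a)
    b_low, b_high = spatial(b)
    low_t = temporal(a_low, b_low)      # temporal bands of the spatial low band
    high_t = temporal(a_high, b_high)   # temporal bands of the spatial high band
    # Reorder to match t_plus_2d: (t-low s-low, t-low s-high, t-high s-low, t-high s-high)
    return low_t[0], high_t[0], low_t[1], high_t[1]


a = [4, 6, 10, 12]
b = [8, 2, 0, 6]
same = t_plus_2d(a, b) == two_d_plus_t(a, b)
```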
Basic WSVC Architectures
  Adaptive Architectures
    Combine the positive aspects of the T+2D and 2D+T structures
    Adaptive spatio-temporal decompositions optimized with respect to suitable criteria
    Content-adaptive choice of 2D+T versus T+2D improves coding performance
  Multiscale Pyramids
    Also called 2D+T+2D
    Compensate for the drawbacks of T+2D and 2D+T
    Use ISP to exploit the redundancy of the multiscale representation
    Disadvantage: over-complete transforms, which result in a full-size residual image
Pyramidal WSVC with pyramidal decomposition before MCTF
Pyramidal WSVC with pyramidal decomposition after MCTF
Spatio-Temporal Prediction (STP)-Tool Scheme
  Promising WSVC architecture that has some similarities to the SVC standard
  Adopted as a possible configuration of the MPEG VidWav (Video Wavelet) reference software
  Based on a multiscale pyramid, but differs in the ISP mechanism
STP-Tool Scheme
Advantages of STP-Tool Scheme
  Prediction is performed between two signals that are likely to bear similar patterns in the spatio-temporal domain
  No need to perform any interpolation
  Instead of full-resolution residuals, the spatio-temporal subbands and residues are produced for different resolutions
WSVC Reference Platform in MPEG
  In 2004, ISO/MPEG set up a formal evaluation of SVC
  The performance of the H.264/AVC pyramid appeared the most competitive
  Later, MPEG and IEC/ITU-T jointly adopted the JSVM (Joint Scalable Video Model) as the scalable reference model and software platform
  Microsoft Research Asia (MRA) was selected as the reference for wavelet technologies
  The MPEG WSVC reference model and software (RM/RS) is referred to as VidWav (Video Wavelet)
VidWav: General framework
VidWav: Main modules
  Spatial Transform
    With pre- and post-spatial decomposition, different SVC configurations (T+2D, 2D+T, STP-Tool) can be implemented
  Temporal Transform
    Framewise MC wavelet transform on a lifting structure
  ME and Coding
    MB-based motion model with H.264/AVC-like partition patterns
    Forward, backward or bidirectional motion model for each block
  Entropy coding
    A 3D extension of the EBCOT algorithm is used for entropy coding of the resulting coefficients
VidWav STP-Tool Configuration
Comparison between WSVC and SVC
Single layer coding tools
Scalable coding tools
Comparison between WSVC and SVC
  Single layer coding tools
    VidWav uses a block-based motion model
      Block mode types are similar to JSVM, but no Intra mode is supported by VidWav
    JSVM operates in a local manner
      Divides frames into MBs and treats each MB separately in all coding phases
    VidWav operates with a global approach
      Spatio-temporal transform applied to a group of frames
    Unlike JSVM, single-layer VidWav only supports open-loop encoding/decoding
    In-loop deblocking filter in JSVM, due to closed-loop encoding
Comparison between WSVC and SVC
  Scalable coding tools
    Spatial scalability in JSVM compared to VidWav in the STP-Tool configuration
      Block-based versus frame-based
      Similar to JSVM, STP-Tool can use both closed- and open-loop inter-layer encoding
Objective and Visual Result Comparisons
  Fair objective comparison is impaired by the different down-sampling filters
    Visually, the reference sequences generated by wavelet filters are more detailed, but sometimes show spatial aliasing effects
  Depending on the spatial down-sampling filter used, the decoded sequences at reduced spatial resolution differ even at full quality
  PSNR is used as the performance criterion at intermediate spatio-temporal resolution levels
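For reference, PSNR between a reference and a decoded signal can be computed as in the sketch below. The flat list representation and the 8-bit peak value of 255 are assumptions; at reduced resolutions the reference is itself a down-sampled sequence, which is why the choice of down-sampling filter affects the comparison.

```python
import math


def psnr(ref, dec, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dec)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


ref = [100, 100, 100, 100]
dec = [101, 99, 101, 99]  # every sample is off by one, so MSE = 1
value = psnr(ref, dec)    # about 48.13 dB
```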
Objective Comparison Results
Subjective Comparison Results
  Visual tests conducted by ISO/MPEG included 12 expert viewers
  On average, JSVM 4.0 is superior
    Marginal gains in SNR conditions
    Superior gains in combined scalability settings
Applications of WSVC
  Based on a series of experiments:
    DCT-based technologies outperform wavelet-based ones for relatively smooth signals, and vice versa
  Eligible applications for WSVC are those that produce or use High Definition/High Resolution content
Home distribution of HD video using WSVC
New Application Potentials for WSVC
  HD material storage and distribution
    Use non-dyadic wavelet decomposition to support multiple HD formats
  To be used in video surveillance and mobile video
  Efficient similarity search in large video databases
  Multiple description coding
  Space-variant resolution adaptive decoding
    Only a certain region of the image is decoded at high resolution
Conclusion
  Brief review of the different tools used in WSVC
  WSVC architectures are introduced
  Comparison of WSVC with SVC
  Potential applications for WSVC
Any questions?
Thank you!