Research and activity report

Activity & research report

Marco Cagnazzo

Paris, September 2013

Overview

• Activity report
  – Teaching and PhD supervision
  – Projects and other activities
  – Bibliometrics

• Research themes
  – Video coding optimization
    • Motion representation
    • 3D video coding
  – Adaptive image compression
  – Distributed video coding
    • Multiview DVC
    • SI effectiveness evaluation
  – Robust video streaming
    • Streaming protocols
    • Network coding

• Conclusions


Timeline

• 2002-2005 : PhD @ University of Naples & University of Nice-Sophia Antipolis (cotutelle)

• 2005-2006 : Post-doc @ National Multimedia Lab (Naples) & Assistant professor @ University of Naples

• 2006-2008 : Post-doc @ I3S Lab, Sophia Antipolis

• Since February 2008: Maître de conférences in Digital Video @ Telecom-ParisTech


Teaching

Name | Institution | Years | Role
Information Theory | “Parthenope” University of Naples | 2004-2005 | Responsible
Multimedia signal processing | “Federico II” University of Naples | 2005-2007 | Responsible
Compression techniques | Telecom-ParisTech | 2008- … | Co-responsible
Digital video and multimedia | Telecom-ParisTech | 2009- … | Responsible
Digital television | Telecom-ParisTech | 2009- … | Co-responsible, CE
Video over mobile | Telecom-ParisTech | 2009- … | Co-responsible, CE
3D Video | Telecom-ParisTech | 2010- … | Co-responsible, CE

Teaching

• Collaborative Learning Thematic Project
• Tools and applications for signals, images and sound
• Image processing and analysis
• Advanced methods for image processing
• Computer vision
• Web Mining
• Introduction to image processing (ATHENS)
• Multimedia Indexing and Retrieval (ATHENS)
• Short and long student projects (free projects and internships)
• Image and video compression
• Video over IP
• Signal and image processing
• Wavelet and signal processing

Total: ≈1100 hours (TD-equivalent hours)

PhD students

Name | Years | Subject
Marwa Meddeb | 2013 - | Video-conference with HEVC
Marco Calemme | 2012 - | 3D Video and Depth coding
Aniello Fiengo | 2012 - | Rate allocation for video
Giovanni Chierchia | 2011 - | Convex optimization
Elie Gabriel Mora | 2011 - | 3D Video Compression
Giovanni Petrazzuoli | 2009 - 2013 | DVC and IMVS
Abdel-Bassir Abou El Ailah | 2009 - 2012 | DVC and FRI signals
Claudio Greco | 2008 - 2012 | Robust video streaming
Thomas Maugey | 2007 - 2010 | Multiview DVC

In addition: supervision of about a dozen MSc students

Research projects

Name | Period | Subject
LABNET | 2001-2002 | Low-complexity video coding
CNRAED | 2004-2005 | Hyper-spectral image coding
CPRE 46 04 06 11 | 2006-2007 | Region based motion vector coding
Secure Media SIM | 2007-2008 | Secure video coding over SIM card
AIBER | 2008 | Wavelet-based scalable video coding
DIVINE | 2007-2009 | Robust video coding
DITEMOI | 2007-2010 | Video streaming over wireless networks (*)
PERSEE | 2009-2013 | Perceptual 2D and 3D video coding (*)
SWAN | 2011-2013 | Network coding
SURICATE | Approved | Video protection
WOW | Submitted | Interactive 3D streaming (**)

(*) Responsible for Telecom-ParisTech
(**) Project coordinator
Moreover: smaller contributions to ACDC, Pingo, Sebastian 2, NeVEx

Other responsibilities

• 8 PhD thesis committees (4 as examiner, 4 as co-supervisor)
• Area editor for 2 Elsevier journals (SPIC, SIGPRO)
• Reviewer for the main journals and conferences in the field
• Participation in conference organization (organizing committees of MMSP’10, EUVIP’11, EUSIPCO’12, ICIP’14)
• Special session co-organization (EUSIPCO’10, DSP’11, WIAMIS’13, ASILOMAR’13)
• Academic liaison between Telecom-ParisTech and the University of Naples
• Yearly Erasmus lessons at the University of Naples
• Invited lesson at the Winter Doctoral School, University of Naples (2010)
• IEEE Senior Member (‘11), IEEE SPS Member, EURASIP Member

Bibliometrics

• 15 journal papers: 13 published, 2 to appear
  – One paper selected as “High quality paper” by the IEEE MMTC-R Letter board, and included in the January 2013 issue

• 4 submitted journal papers: 2 in first round; 2 in preparation for the second round

• 3 journal papers in preparation

• 59 conference papers: 56 published and 3 to appear
  – Two MMSP Top 10% awards

• One standardization contribution

• One co-edited book
  – F. Dufaux, B. Pesquet-Popescu, M. Cagnazzo (eds.): Emerging Technologies for 3D Video. Wiley, 2013

• 9 book chapters: 3 published and 6 to appear

• According to the Google Scholar web site, my H-index is equal to 13 (update: August 31, 2013)

VIDEO CODING OPTIMIZATION

1 Standardization contribution

8 Conference papers

1 Submitted journal paper

4 Journal papers

Motion vector representation

• Quantization of motion vectors to reduce their coding cost

• Motion vector refinement and dense motion vector representation generated at the decoder

• Lossless coding of segmented motion fields

• Motion estimation for wavelet-based video coding

Motion vector quantization

[Figure: hybrid encoder and decoder block diagram (ME, MC, DCT, Q, IDCT, frame buffers), with labels 𝜆, 𝒗∗, 𝜃, 𝑄𝑝 and 𝐵(𝑄𝑝)]

• M. Cagnazzo, M. Agostini, M. Antonini, G. Laroche, and J. Jung, “Motion vector quantization for efficient low-bitrate video coding,” in SPIE Visual Communications and Image Processing Conference, vol. 7257, (San Jose, California), 2009.

• S. Corrado, M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “Improving H.264 performances by quantization of motion vectors,” in Picture Coding Symposium, (Chicago, IL), 2009.

• M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “A new coding mode for hybrid video coders based on quantized motion vectors,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, pp. 946–956, July 2011.

[Figure: hybrid encoder and decoder block diagram with an additional quantizer for the motion vectors (ME, MC, DCT, Q, IDCT, frame buffers), with labels 𝜆, 𝒗∗, 𝜃, 𝑄𝑝, 𝑄𝑣, 𝒗(𝑄𝑣) and 𝐵(𝑄𝑝, 𝑄𝑣)]

Quantization step for motion vectors

• Double-pass approach
  – Estimation of the best step over a frame
  – Actual encoding with the selected step

• Estimation:
  – Sum of distortions
  – Oracle (used as reference)

• Results: average rate reduction ≈ 4% with respect to H.264 and ≈ 8% with respect to H.264 1/8-pel

NB: All rate reductions for video are measured using the Bjontegaard metric (approximated average rate reduction for the same PSNR over a given interval)
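As a rough illustration of the Bjontegaard rate metric mentioned in the note above, the sketch below fits a cubic polynomial to the log-rate as a function of PSNR for each codec and averages the gap over the PSNR interval covered by both curves. The function name `bd_rate` and the sample rate-distortion points are illustrative assumptions, not data from the experiments reported here.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Approximate average rate difference (%) of test vs. reference at equal PSNR."""
    log_ref = np.log10(np.asarray(rate_ref, dtype=float))
    log_test = np.log10(np.asarray(rate_test, dtype=float))
    psnr_ref = np.asarray(psnr_ref, dtype=float)
    psnr_test = np.asarray(psnr_test, dtype=float)

    # Cubic fit of log-rate as a function of PSNR (typically four RD points per curve).
    poly_ref = np.polyfit(psnr_ref, log_ref, 3)
    poly_test = np.polyfit(psnr_test, log_test, 3)

    # Integrate both fits over the PSNR interval shared by the two curves.
    lo = max(psnr_ref.min(), psnr_test.min())
    hi = min(psnr_ref.max(), psnr_test.max())
    prim_ref = np.polyint(poly_ref)
    prim_test = np.polyint(poly_test)
    int_ref = np.polyval(prim_ref, hi) - np.polyval(prim_ref, lo)
    int_test = np.polyval(prim_test, hi) - np.polyval(prim_test, lo)

    # Average log-rate gap, converted back to a percentage.
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0

# Hypothetical RD points (kbps, dB): the test codec needs 10% less rate at each PSNR.
ref_rates = [100.0, 200.0, 400.0, 800.0]
psnrs = [30.0, 33.0, 36.0, 39.0]
test_rates = [0.9 * r for r in ref_rates]
print(bd_rate(ref_rates, psnrs, test_rates, psnrs))
```

A negative value means that the test codec needs less rate than the reference for the same PSNR.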


Differential techniques for ME

[Figure: block diagram of the layered coder: input video, BMA ME, H.264 hybrid coder (MVs, residual, stream), differential MV refinement, side info, enhancement-layer hybrid coder (MVs, residual)]

M. Cagnazzo and B. Pesquet-Popescu, “Introducing differential motion estimation into hybrid video coders,” in SPIE Visual Communications and Image Processing Conference, vol. 1, (Huang Shan, An Hui, China), pp. 1–4, 2010.

Differential ME in hybrid video coding

• Layered representation of video

• Base layer compatible with any hybrid technique

• Enhancement layer uses costless refined vectors

\[ \delta\mathbf{v}(n,m) = -\frac{e_{n,m}}{\lambda + \|\boldsymbol{\phi}_{n,m}\|^2}\, \boldsymbol{\phi}_{n,m} \]

• The refinement depends on the motion compensated error image 𝑒 and on the motion compensated reference image gradient 𝝓

• Proof of principle, small improvements (up to almost 1% rate reduction)
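The refinement step described above can be sketched per pixel as follows. This is a minimal illustration, assuming the error image and the gradient field are already available as arrays; the function name `refine_motion` and the value of the regularization constant are hypothetical.

```python
import numpy as np

def refine_motion(error, gradient, lam):
    """Per-pixel refinement dv = -e * phi / (lam + ||phi||^2).

    error:    (H, W) motion compensated error image e
    gradient: (H, W, 2) gradient phi of the motion compensated reference image
    lam:      positive regularization constant (its value here is an assumption)
    """
    error = np.asarray(error, dtype=float)
    gradient = np.asarray(gradient, dtype=float)
    norm2 = np.sum(gradient ** 2, axis=-1)           # squared gradient norm per pixel
    return -(error / (lam + norm2))[..., np.newaxis] * gradient

# One-pixel toy case: e = 2, phi = (3, 4), lambda = 5, so dv = -(2 / 30) * (3, 4).
e = np.array([[2.0]])
phi = np.array([[[3.0, 4.0]]])
dv = refine_motion(e, phi, lam=5.0)
print(dv)
```

Since the refinement only uses quantities available after motion compensation, the decoder can compute it without receiving extra bits, which is what makes the refined vectors costless.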


Context quantization

• Target: exploit high-order statistical dependencies in segmented motion fields to reduce the coding rate (lossless coding)

• Tool: context-based lossless encoder – Implemented with an arithmetic coder

• Problem: high-order dependencies → large contexts → context dilution
  – I.e. too many contexts, which makes the conditional probabilities difficult to estimate

• Solution: context quantization

• M. Cagnazzo, M. Antonini, and M. Barlaud, “Mutual information-based context quantization,” Signal Proc.: Image Comm. (Elsevier Science), pp. 64–74, Jan. 2010.


• Contexts (i.e. sequences of already encoded symbols) are grouped into classes

• Rate increase: the average information loss of including a context into a class

\[ \mathcal{L}(f) = \sum_{x \in \mathcal{X}} p(x)\, D\big( p(Y \mid x) \,\|\, p(Y \mid f(x)) \big) \]

𝑥: generic context

𝑌: symbol to encode

𝑓: context quantization function, i.e. context label
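To make the rate increase concrete, the following sketch evaluates the loss for a given labelling of the contexts. It assumes that every context has non-zero probability and that the distribution attached to a class is the p(x)-weighted mixture of its member contexts; the function names and the toy distributions are illustrative only.

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits, with the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def quantization_loss(p_x, p_y_given_x, labels):
    """Average information loss of replacing each context x by its class label f(x)."""
    p_x = np.asarray(p_x, dtype=float)
    p_y_given_x = np.asarray(p_y_given_x, dtype=float)
    labels = np.asarray(labels)
    loss = 0.0
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        weights = p_x[members]
        # Class distribution: mixture of the member contexts (assumption, see lead-in).
        p_y_class = weights @ p_y_given_x[members] / weights.sum()
        for x in members:
            loss += p_x[x] * kl_divergence(p_y_given_x[x], p_y_class)
    return loss

# Two equiprobable contexts with opposite deterministic symbol distributions.
p_x = [0.5, 0.5]
p_y_given_x = [[1.0, 0.0], [0.0, 1.0]]
print(quantization_loss(p_x, p_y_given_x, labels=[0, 0]))  # contexts merged
print(quantization_loss(p_x, p_y_given_x, labels=[0, 1]))  # contexts kept apart
```

Merging the two contexts loses all the information they carry about the symbol, while keeping them apart costs nothing.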



• Problem: finding optimum 𝑓

• Classical approach

– Start with a set of classes

– Move a context 𝑥 from a class 𝑐𝑖 to a class 𝑐𝑗 as long as the relative entropy $D(p(Y \mid x) \,\|\, p(Y \mid c_i))$ is larger than $D(p(Y \mid x) \,\|\, p(Y \mid c_j))$

– Stopping criterion on the relative improvement of the objective function ℒ(𝑓) or on the number of iterations



• Classical approach

– Intuitive, very popular, good results

– Some open questions:
  • Does the basic step actually reduce the cost function at each iteration?
  • Is it the largest possible reduction?
  • If not, what is the largest possible reduction, and can we achieve it?

• Contribution: answers to these questions



• We found the expression of the cost function variation Δℒ associated with moving a context from one class to another

• We proved that with the classical approach, each iteration actually reduces the cost function…

• … but not as much as actually possible

• We found the best step

• Rate reductions: up to 3.6% on motion data and to a further 5% on synthetic data (global minimization based on dynamic programming)


ME criterion for WT-based video coding

• WT video coding is based on temporal transform rather than classical temporal prediction

• Therefore MSE-based ME is not guaranteed to be optimal

• The optimal criterion is the maximization of the coding gain:

\[ \mathrm{CG} = \frac{\sum_{i=1}^{M} a_i w_i \sigma_i^2}{\prod_{i=1}^{M} \left( w_i \sigma_i^2 \right)^{a_i}} \]

• where 𝑖 is the subband index, $\sigma_i^2$ the variance, 𝑎𝑖 the relative number of coefficients, and 𝑤𝑖 the normalization factor of the 𝑖-th subband
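The coding gain can be evaluated directly from the subband statistics, as a weighted arithmetic mean divided by a weighted geometric mean. The sketch below assumes that the relative numbers of coefficients sum to one; the function name and the two-subband numbers are illustrative.

```python
import numpy as np

def coding_gain(variances, fractions, weights):
    """Ratio of weighted arithmetic mean to weighted geometric mean of w_i * sigma_i^2."""
    v = np.asarray(weights, dtype=float) * np.asarray(variances, dtype=float)
    a = np.asarray(fractions, dtype=float)   # relative number of coefficients per subband
    return float(np.sum(a * v) / np.prod(v ** a))

# Toy example: two subbands of equal size, unit normalization factors.
print(coding_gain(variances=[4.0, 1.0], fractions=[0.5, 0.5], weights=[1.0, 1.0]))
```

With equal variances in all subbands the gain is 1; the more the energy is concentrated in a few subbands, the larger the gain.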

• M. Cagnazzo, F. Castaldo, T. André, M. Antonini, and M. Barlaud, “Optimal motion estimation for wavelet video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, pp. 907–911, July 2007.

ME criterion for WT-based video coding

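• We showed that we only have to minimize $\rho^2 = \prod_{i=1}^{M} \left( \sigma_i^2 \right)^{a_i}$

• In general each MV influences all the subbands, so the problem is still complex

• However, the CG can be analytically maximized for a particular class of MC-ed lifting schemes, the (𝑁, 0) LS

\[ (\mathbf{v}_B^*, \mathbf{v}_F^*) = \arg\min_{\mathbf{v}_B, \mathbf{v}_F} \left[ \mathcal{E}(\epsilon_B) + \mathcal{E}(\epsilon_F) + 2 \langle \epsilon_B, \epsilon_F \rangle \right] \]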

• Average rate reduction: 8%

[Figure: (2,0) lifting scheme: input frames x1 … x9, high-frequency subband h1 … h4, low-frequency subband l1 … l4]

3D video coding

• MVD format: multiple views plus depth

• Inter-view and inter-component redundancy

• Three contributions for the upcoming standard 3D-HEVC


Modification of the Merge candidate list for 3D-VC

• In the Merge mode, a block is predicted using a vector from a short list (Merge list)

• Coding the list index is much less costly than coding the vector

• The vector can be a motion vector (MV) or a disparity vector (DV)

• In 3D-HEVC, MVs are much more frequently selected than DVs

• We have proposed to insert a further DV in the Merge list

• Several positions in the primary and secondary list have been tested

• Best results obtained with the first position of the secondary list

• We obtained both a rate reduction (0.6%) and a complexity reduction (4%)

• Contribution accepted into the standard

• E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. "Modification of the merge candidate list for dependent views in 3D-HEVC". In IEEE International Conference on Image Processing, September 2013. Melbourne, Australia.

• E. Mora, B. Pesquet, M. Cagnazzo and J. Jung. Modification of the Merge Candidate List for Dependent Views in 3DV-HTM. Document JCT3V-B0069 for Shanghai meeting (MPEG number m26793). Shanghai (PRC), October 2012.

Intra mode inheritance for 3D-HEVC

• Observation: blocks with strong contours and one dominant direction tend to be encoded with the same Intra directional mode in Texture and Depth

• Idea: when coding Depth, add the co-located Intra mode to the Most Probable Mode list when a dominant direction is detected

• Dominant direction is revealed by the presence of a single peak in the histogram of the gradient angle for the current block

• E. Mora, J. Jung, M. Cagnazzo, and B. Pesquet-Popescu, “Codage de vidéos de profondeur basé sur l’héritage des modes intra de texture,” in Compression et Représentation des Signaux Audiovisuels, vol. 1, (Lille, France), pp. 1–4, 2012.

• E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. “Depth Video Coding Based on Intra Mode Inheritance From Texture”. Submitted to APSIPA Transactions on Signal and Information Processing (2013)

1. Find the reference texture block

2. Compute the gradient statistics

3. If a dominant direction is detected, add the co-located Intra mode to the MPM list
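The detection in steps 2 and 3 can be sketched as a magnitude-weighted histogram of gradient angles. This is only an illustration: the number of bins, the `peak_ratio` threshold used to decide that a single peak dominates, and the function name are assumptions and not parameters of the proposed tool.

```python
import numpy as np

def dominant_direction(block, n_bins=16, peak_ratio=0.5):
    """Return the dominant gradient angle in [0, pi), or None if there is none.

    A direction is declared dominant when one histogram bin holds at least
    `peak_ratio` of the total gradient magnitude (hypothetical criterion).
    """
    gy, gx = np.gradient(np.asarray(block, dtype=float))
    magnitude = np.hypot(gx, gy)
    if magnitude.sum() == 0:
        return None                                  # flat block: no direction at all
    angle = np.mod(np.arctan2(gy, gx), np.pi)        # orientation, sign ignored
    hist, edges = np.histogram(angle, bins=n_bins, range=(0.0, np.pi),
                               weights=magnitude)
    k = int(np.argmax(hist))
    if hist[k] >= peak_ratio * hist.sum():
        return 0.5 * (edges[k] + edges[k + 1])       # centre of the peak bin
    return None

flat = np.full((8, 8), 128.0)
ramp = np.tile(np.arange(8.0), (8, 1))               # intensity varies along columns only
print(dominant_direction(flat), dominant_direction(ramp))
```

When a direction is returned, the texture Intra mode of the co-located block would be the candidate to add to the MPM list of the depth block.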

Intra mode inheritance for 3D-HEVC

• The dominant angle proved to be an effective feature for detecting blocks where inheritance is effective

• Inserting the inherited mode in the MPM list allows an average coding rate reduction of ≈1%

• Tests performed over MPEG sequences under Common Test Conditions


Enhanced quad-tree coding for 3D-HEVC

• The 3D-HEVC codec uses quad-trees for encoding texture and depth

• These trees are quite correlated

• We propose an inter-component coding tool for both reducing complexity and rate by exploiting the quad-tree redundancy

• Two variants, according to the component that is encoded first (texture or depth)

• Contribution to 3D-HEVC working draft and reference software

• E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. “Initialization, limitation and predictive coding of the depth and texture quad-tree in 3D-HEVC Video Coding”. Accepted into IEEE Transactions on Circuits and Systems for Video Technology

Enhanced quad-tree coding for 3D-HEVC

• Observation: texture coding units are very often at least as finely partitioned as the corresponding depth coding units

• Therefore we can limit the depth map partitioning level if we know texture…

• … or we can initialize the texture partitioning if we know depth

• Complexity reduction (fewer configurations to examine): up to 31% encoder time saving

• Rate reduction (easier prediction of coding modes): up to 1.8%

Don’t Care Regions

A depth pixel only needs to be reconstructed such that the resulting geometric error leads to an acceptable distortion in the synthesized view

[Figure: error in the synthesized pixel value as a function of the disparity value, with the DCR marked]

G. Valenzise, G. Cheung, R. Galvao, M. Cagnazzo, B. Pesquet-Popescu, and A. Ortega, “Motion prediction of depth video for depth-image-based rendering using Don’t Care Regions,” in Picture Coding Symposium, vol. 1, (Krakow, Poland), pp. 1–4, 2012.

DCR Example (Kendo, frame 10, t = 5)



We embedded DCR into an H.264/AVC encoder, changing three basic aspects:

1. Motion estimation

2. Residual coding

3. Skip mode



• We compute and encode prediction residuals with respect to the DCRs

• For SKIP mode, no prediction residuals are coded

– The reconstructed values could be far outside the DCR, leading to an arbitrarily high distortion in the synthesized view

– We adopt a conservative policy: prevent SKIP selection when any reconstructed pixel is outside its DCR

• Results: average rate saving of 7%

• High preprocessing complexity
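The two DCR-aware decisions above can be sketched as follows, treating each DCR as a per-pixel interval of acceptable depth values. Interpreting "residuals with respect to the DCRs" as the smallest correction that brings the prediction back inside the interval is an assumption made for this example, as are the function names and sample values.

```python
import numpy as np

def dcr_residual(prediction, dcr_low, dcr_high):
    """Smallest per-pixel correction bringing the prediction inside its DCR.

    Pixels already inside [dcr_low, dcr_high] get a zero residual
    (assumed interpretation, see the lead-in).
    """
    prediction = np.asarray(prediction, dtype=float)
    return np.clip(prediction, dcr_low, dcr_high) - prediction

def skip_allowed(reconstruction, dcr_low, dcr_high):
    """Conservative policy: SKIP is allowed only if every pixel lies in its DCR."""
    r = np.asarray(reconstruction, dtype=float)
    low = np.asarray(dcr_low, dtype=float)
    high = np.asarray(dcr_high, dtype=float)
    return bool(np.all((r >= low) & (r <= high)))

low = [6.0, 8.0, 10.0]
high = [8.0, 12.0, 12.0]
pred = [5.0, 10.0, 15.0]
print(dcr_residual(pred, low, high))          # only out-of-range pixels need a correction
print(skip_allowed(pred, low, high))          # some pixels are outside their DCR
print(skip_allowed([7.0, 10.0, 11.0], low, high))
```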


Other work in 3D video coding

• Dense disparity field for MVV and MVD coding

• Depth coding using elastic curve model

• I. Daribo, M. Kaaniche, W. Miled, M. Cagnazzo, and B. Pesquet-Popescu, “Dense disparity estimation in multiview video coding,” in IEEE Workshop on Multimedia Signal Processing, (Rio de Janeiro, Brazil), 2009.

• M. Cagnazzo and B. Pesquet-Popescu, “Depth map coding by dense disparity estimation for MVD compression,” in IEEE Digital Signal Processing, (Corfu, Greece), 2011.

• E. Mora, J. Jung, B. Pesquet-Popescu, M. Cagnazzo. "Modification of the disparity vector derivation process in 3D-HEVC". In IEEE Workshop on Multimedia Signal Processing, vol. 1, September 2013. Cagliari, Italy.

OBJECT-BASED IMAGE CODING

6 Conference papers

2 Journal papers

Region-based hyperspectral image coding

[Figure: block diagram of the coder: multispectral / hyperspectral image, segmentation (TS-VQ), map, map coding, region coding]

• M. Cagnazzo, R. Gaetano, S. Parrilli, and L. Verdoliva, “Region based compression of multispectral images by classified KLT,” in EUSIPCO, 2006.

• M. Cagnazzo, R. Gaetano, S. Parrilli, and L. Verdoliva, “Adaptive region-based compression of multispectral images,” in Proceed. of IEEE Intern. Conf. Image Proc., (Atlanta, GA), pp. 3249–3252, Oct. 2006.

• M. Cagnazzo, S. Parrilli, G. Poggi, and L. Verdoliva, “Costs and advantages of object-based image coding with shape-adaptive wavelet transform,” EURASIP J. Image Video Proc., 2007.


• Spectral transform: WT, global KLT, class-based KLT, region-based KLT

• Spatial transform: WT, SA-WT

• Encoder: SA-SPIHT with optimal rate allocation among objects

• Results:

– 0.5 dB better than JP2K-Multicomponent

– Better post-processing (i.e. classification) results

• M. Cagnazzo, G. Poggi, and L. Verdoliva, “Region-based transform coding of multispectral images,” IEEE Trans. on Image Processing, vol. 16, pp. 2916–2926, Dec. 2007.


AVIRIS image 32 bands, 0.3 bps (original @16bps)

Landsat TM image 6 bands, 0.6 bps (original @8bps)


Adaptive wavelet and rate allocation

• Adaptive wavelets (implemented via lifting schemes) allow the filters to change according to the signal characteristics

• Further constraint: reconstruction without sending side information

[Figure: adaptive lifting scheme with blocks Split, D, P and U: input x(k), decision d(k), outputs xa(k) = y00(k) and xd(k) = y01(k)]

• S. Parrilli, M. Cagnazzo, and B. Pesquet-Popescu, “Distortion evaluation in transform domain for adaptive lifting schemes,” in IEEE Workshop on Multimedia Signal Processing, (Cairns, Australia), pp. 200–205, 2008.

• S. Parrilli, M. Cagnazzo, and B. Pesquet-Popescu, “Estimation of quantization noise for adaptive-prediction lifting schemes,” in IEEE Workshop on Multimedia Signal Processing, (Rio de Janeiro, Brazil), 2009.

Adaptive wavelet and rate allocation

• The resulting transform is highly non-orthogonal

• Problem: distortion evaluation in the transform domain in order to perform rate allocation

• Solutions for uncorrelated noise

– Good error energy evaluation

– Performance improvement for ALS up to 3dB

– Improved SSIM (+3%)


• M. Cagnazzo and B. Pesquet-Popescu, “Perceptual impact of transform coefficients quantization for adaptive lifting schemes,” in International Workshop on Video Processing and Quality Metrics for Consumer Electronics, (Scottsdale, AZ), 2010.

• M. Abid, M. Cagnazzo, and B. Pesquet-Popescu, “Image denoising by adaptive lifting schemes,” in European Workshop on Visual Information Processing, vol. 1, (Paris, France), 2010

DISTRIBUTED VIDEO CODING

17 Conference papers

2 Submitted journal papers

3 Journal papers

Distributed video coding

• Coding of many correlated sources

• Encoders do not communicate with one another

• Same RD performance as centralized coding (in theory only!)

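[Figure: DVC architecture. Encoder: key frames (KF) go to the Intra coder; Wyner-Ziv (WZ) frames go to the quantizer and the Slepian-Wolf coder (turbo encoder, buffer). Decoder: Intra decoder, image interpolation producing the side information (SI), turbo decoder, minimum-distortion reconstruction; outputs: decoded KFs and decoded WZFs]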

Image interpolation: High-order trajectories for ME in DVC

• G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. "High order motion interpolation for side information improvement in DVC". In International Conference on Acoustics, Speech and Signal Processing, March 2010. Dallas, TX

• G. Petrazzuoli, M. Cagnazzo, and B. Pesquet-Popescu, “Fast and efficient side information generation in distributed video coding by using dense motion representation,” in European Signal Processing Conference, (Aalborg, Denmark), 2010.

• G. Petrazzuoli, T. Maugey, M. Cagnazzo, and B. Pesquet-Popescu, “Side information refinement for long duration GOPs in DVC,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010.

Rate −3.3%

Image interpolation: Pel-based motion estimation

• Block-based object trajectory used as initialization

• Within each block, pixel-by-pixel vectors are obtained by refining the initialization (Cafforio-Rocca algorithm)

• Refinement equations have been rewritten and solved, since in this case the reference image does not exist

• Rate reductions: 3.5% to 6%

• M. Cagnazzo, T. Maugey, and B. Pesquet-Popescu, “A differential motion estimation method for image interpolation in distributed video coding,” in International Conference on Acoustics, Speech and Signal Processing, vol. 1, (Taiwan), pp. 1861–1864, 2009.

• W. Miled, T. Maugey, M. Cagnazzo, and B. Pesquet-Popescu, “Image interpolation with dense disparity estimation in multiview distributed video coding,” in International Conference on Distributed Smart Cameras, (Como, Italy), 2009.

• T. Maugey, W. Miled, M. Cagnazzo, and B. Pesquet-Popescu, “Méthodes denses d’interpolation de mouvement pour le codage vidéo distribué monovue et multivue,” in Colloque GRETSI - Traitement du Signal et des Images, (Dijon, France), 2009.

• M. Cagnazzo, W. Miled, T. Maugey, and B. Pesquet-Popescu, “Image interpolation with edge-preserving differential motion refinement,” in IEEE International Conference on Image Processing, vol. 1, (Cairo, Egypt), pp. 361–364, 2009.

The Cafforio-Rocca algorithm: Sample results


Local and global SI fusion

• Given the WZF, feature points on the reference frames are extracted by SIFT

• Matching features allow a global motion compensation to be performed (first SI)

• Local motion compensation (traditional method) is also performed (second SI)

• The two SI are merged using partial channel decoding and re-estimating motion

• Experiments show average rate reduction of ≈ 25% with respect to literature references

• A. Abou-El Ailah, F. Dufaux, M. Cagnazzo, B. Pesquet-Popescu, and J. Farah, “Successive refinement of side information using adaptive search area for long duration GOPs in distributed video coding,” in International Conference on Telecommunications, (Beirut), 2012.

• A. Abou-El Ailah, F. Dufaux, M. Cagnazzo, and J. Farah, “Fusion of global and local side information using support vector machine in transform-domain DVC,” in EUSIPCO, vol. 1, (Bucharest, Romania), pp. 1–5, Aug. 2012.

• A. Abou-El Ailah, G. Petrazzuoli, J. Farah, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Side Information Improvement in Transform-Domain Distributed Video Coding". In SPIE - Applications of Digital Image Processing. San Diego, CA (USA), Aug. 2012

• A. Abou-El Ailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Fusion of global and local motion estimation for distributed video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, n. 1, pp. 158-172, Jan. 2013.

Multiview DVC

• Motion models for temporal image interpolation
  – High order motion interpolation
  – Pixel-based motion vector refinement

• Multi-hypothesis SI fusion based on observed parity bits and Bayesian classification

[Figure: frame arrangement over views and time, with key frames (KF) and Wyner-Ziv frames (WZ)]

• G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. “Novel solutions for side information generation and fusion in multiview distributed video coding”. Submitted to EURASIP Journal on Advances in Signal Processing

Multiview DVC

• Step 1: produce a temporal estimation with HOMI

• Step 2: produce an inter-view estimation with occlusion reduction (use disparity to estimate foreground objects)

• Step 3: produce a fusion of the two estimations using Left-Right Consistency Check to remove residual occlusions

• Step 4: Select one out of these three images as side information


Multiview DVC

• For one image out of 𝑁 we ask for parity bits for temporal and inter-view estimation

• We compare the number of bits needed for correcting the two estimations:
  – If they are close, we choose the fusion image
  – If not, we select the image with the least rate

• Equivalent to Bayesian decision

\[ \hat{D} = \arg\max_d P(D = d \mid \delta R) = \arg\max_d p(\delta R \mid d)\, P(d) = \arg\max_d f_d(\delta R) \]
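The selection rule can be sketched in two equivalent-looking forms: a threshold on the rate difference, and a generic maximum over per-decision score functions. The threshold value, the decision labels and the toy score functions below are hypothetical; the actual densities are not given here.

```python
def select_side_information(rate_temporal, rate_interview, closeness_threshold):
    """Pick the SI from the parity bits needed by the two estimations.

    closeness_threshold is a hypothetical tuning parameter deciding when
    the two rates count as "close".
    """
    delta_r = rate_temporal - rate_interview
    if abs(delta_r) <= closeness_threshold:
        return "fusion"
    return "temporal" if delta_r < 0 else "interview"

def map_decision(delta_r, scores):
    """Generic argmax over d of f_d(delta_r), given one callable f_d per decision d."""
    return max(scores, key=lambda d: scores[d](delta_r))

print(select_side_information(1000, 1020, closeness_threshold=50))
print(select_side_information(1000, 1500, closeness_threshold=50))

# Toy score functions, for illustration only.
toy_scores = {
    "fusion": lambda r: 1.0 if abs(r) <= 50 else 0.0,
    "temporal": lambda r: 0.5 if r < 0 else 0.0,
    "interview": lambda r: 0.5 if r > 0 else 0.0,
}
print(map_decision(-500, toy_scores))
```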


Multiview DVC

• Experiments show that the Bayesian classifier very often selects the best SI

• It may be wrong only when the decoding rates are very close to each other; in that case, selecting a suboptimal SI does not degrade performance

• Cumulated gain with respect to the state of the art: ≈ 9.1% rate reduction


Side information effectiveness

• Side information is corrected with parity bits to produce the decoded WZ frame

• Intuitively, the more similar the SI is to the original image, the fewer parity bits are needed

• Traditionally, PSNR between SI and WZF has been used to evaluate the SI quality

• However, it is easy to build toy examples where two iso-PSNR images require very different numbers of correction bits

[Figure: two side information images with the same PSNR]
– SI PSNR: 29.1 dB, parity bits: 137 kb, decoded quality: 39.3 dB
– SI PSNR: 29.1 dB, parity bits: 192 kb, decoded quality: 35.4 dB

• T. Maugey, J. Gauthier, M. Cagnazzo, B. Pesquet. “Evaluation of side information effectiveness in distributed video coding”. IEEE TCSVT, accepted

Side information effectiveness

• Questions: why is PSNR not always reliable? Can we find better metrics?

• Applications: Hash-based DVC systems, Witsenhausen-Wyner video coding systems, …

• New framework for metric comparison based on end-to-end RD performance

• Proposed metrics:

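\[ \mathrm{SIQ}_a(I_0, I_1) = 10 \log_{10} \frac{255^2}{\sum_{\mathbf{p}} \left| I_0(\mathbf{p}) - I_1(\mathbf{p}) \right|^a} \]

\[ \mathrm{HSIQ}(I_0, I_1) = 10 \log_{10} \frac{N_\mathrm{bits}}{d_\mathrm{H}(I_0, I_1)} \]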

• SIQ1 and HSIQ improve on PSNR in both theoretical and practical effectiveness measures (Hash-based system: 20% rate reduction)

• PSNR works well for homogeneous errors and starts failing for large but spatially concentrated errors
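The two proposed metrics can be sketched as below. The sketch takes the formulas at face value: SIQ uses a plain sum of absolute differences raised to the power a (any normalization used in the original work is not reproduced), and HSIQ assumes that the distance in the denominator is the Hamming distance between the 8-bit binary representations of the two images. Function names and test images are illustrative.

```python
import math
import numpy as np

def siq(i0, i1, a=1.0):
    """SIQ_a as 10*log10(255^2 / sum |I0 - I1|^a); infinite for identical images."""
    diff = np.abs(np.asarray(i0, dtype=float) - np.asarray(i1, dtype=float))
    total = float(np.sum(diff ** a))
    return math.inf if total == 0 else 10.0 * math.log10(255.0 ** 2 / total)

def hsiq(i0, i1):
    """HSIQ as 10*log10(N_bits / Hamming distance), for 8-bit images (assumption)."""
    x = np.bitwise_xor(np.asarray(i0, dtype=np.uint8), np.asarray(i1, dtype=np.uint8))
    hamming = int(np.unpackbits(x).sum())    # number of differing bits
    n_bits = x.size * 8
    return math.inf if hamming == 0 else 10.0 * math.log10(n_bits / hamming)

original = np.zeros((2, 2), dtype=np.uint8)
side_info = original.copy()
side_info[0, 0] = 1                          # a single pixel off by one grey level
print(siq(original, side_info, a=1.0), hsiq(original, side_info))
```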

• T. Maugey, C. Yaacoub, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Side information enhancement using an adaptive hash-based genetic algorithm in a Wyner-Ziv context,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010

IMVS using DVC

[Figure: frames arranged over views and time]

• All frames are Intra coded
• Each image is coded and stored only once
• Large bandwidth required
• Relatively low server space required

IMVS using DVC

[Figure: frames arranged over views and time]

• P-frames are used: all possible frame dependencies are coded
• Each image is coded many times
• Smallest bandwidth required
• Very large server space required

IMVS using DVC

[Figure: frames arranged over views and time]

• WZ-frames are used: only parity bits are coded
• Each image is coded and stored only once
• Trade-off between server space and bandwidth

IMVS using DVC

[Figure: bandwidth versus server space, showing Intra-only coding, predictive coding (each image coded many times), the ideal case (path known at encoding time), WZ coding, and the operation region]

IMVS for MVD using DVC

• We proposed several strategies for view-switching

• The best (adaptive) achieves a rate reduction of more than 15% with respect to reference methods

• G. Petrazzuoli, M. Cagnazzo, F. Dufaux, and B. Pesquet-Popescu, “Using distributed source coding and depth image based rendering to improve interactive multiview video access,” in IEEE International Conference on Image Processing, vol. 1, (Bruxelles, Belgium), pp. 605–608, 2011.

• G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Enabling Immersive Visual Communications through Distributed Video Coding". IEEE MMTC E-Letter (May 2013).

Other work on DVC

• Fusion schemes for multiview DVC

• Iterative methods for SI refinement

• DVC for multiple-view-plus-depth video

• DVC and interactive multiview streaming

• Local and global SI fusion

• Nine further conference papers

• A. Abou-El Ailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Fusion of global and local motion estimation for distributed video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, n. 1, pp. 158-172, Jan. 2013.

• G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Enabling Immersive Visual Communications through Distributed Video Coding". IEEE MMTC E-Letter (May 2013).

ROBUST VIDEO DISTRIBUTION

7 Conference papers

1 Submitted journal paper + 2 in preparation

2 Journal papers

ABCD protocol

• Problem: reliable distribution of video over wireless networks

• Construction of overlays to carry MDC video

• Minimization of the number of sent packets (both video and management packets)

• First contribution: a reliable extension of the IEEE 802.11 broadcast communication, using a control peer

• Once a reliable broadcast channel is provided, the nodes attach to the stream as soon as they hear about it

• C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “H.264-based multiple description coding using motion compensated temporal interpolation,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010

• C. Greco, G. Petrazzuoli, M. Cagnazzo, and B. Pesquet-Popescu, “An MDC-based video streaming architecture for mobile networks,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Hangzhou, China), pp. 1–4, 2011.

• C. Greco and M. Cagnazzo, “A cross-layer protocol for cooperative content discovery over mobile ad-hoc networks,” International Journal of Communication Networks and Distributed Systems, vol. 6, July 2011.

• C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “ABCD : Un protocole cross-layer pour la diffusion vidéo dans des réseaux sans fil ad-hoc,” in Colloque GRETSI - Traitement du Signal et des Images, (Bordeaux, France), 2011.

ABCD protocol


[Figure: message exchange among the source 𝑠 and the peers 𝑝1 and 𝑝2: advertisement, attachment, video data]

ABCD protocol

[Figure: overlay formed by the source 𝑠 and the peers 𝑝1 to 𝑝4]

ABCD protocol: parent switch

\[ p^* = \arg\min_p \left[ w_h h(p) + w_a a(p) + w_d d(p) - w_g g(p) \right] \]
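The parent switch rule amounts to picking the candidate with the smallest weighted cost. In the sketch below, the four per-candidate metrics h, a, d and g are treated as opaque numbers, since their definitions are not given here; the candidate values, the weights and the function name are hypothetical.

```python
def select_parent(candidates, w_h, w_a, w_d, w_g):
    """Return the candidate minimizing w_h*h + w_a*a + w_d*d - w_g*g.

    Each candidate is a dict with the keys "name", "h", "a", "d" and "g";
    the meaning of the four metrics is left unspecified in this sketch.
    """
    def cost(p):
        return w_h * p["h"] + w_a * p["a"] + w_d * p["d"] - w_g * p["g"]
    return min(candidates, key=cost)

# Hypothetical candidates and weights.
candidates = [
    {"name": "p1", "h": 2, "a": 1, "d": 3, "g": 0},   # cost 6
    {"name": "p2", "h": 1, "a": 1, "d": 1, "g": 1},   # cost 2
    {"name": "p3", "h": 3, "a": 0, "d": 0, "g": 0},   # cost 3
]
best = select_parent(candidates, w_h=1.0, w_a=1.0, w_d=1.0, w_g=1.0)
print(best["name"])
```

The last term enters with a negative sign, so a larger g makes a candidate more attractive.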


ABCD: simulation results (ns2)


ABCD/CoDiO

• ABCD may suffer from high delay in large, crowded networks

• To reduce the delay, we introduced a Congestion-Distortion Optimization (CoDiO) in the per-hop wireless broadcast transmission

• We adjust the RTS/CTS retry limit k of each packet in a Co-Di optimized fashion

• Small values of k reduce the congestion but the distortion increases, as the probability of obtaining the channel is lower

• High values of k lower the distortion, but congestion increases due to the channel occupation

Cost function: $J(k) = D(k) + \lambda C(k)$
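The trade-off can be illustrated by an exhaustive search over the retry limit. The distortion and congestion models below are toy assumptions chosen only to show the opposite trends described above (distortion falling and congestion growing with k); they are not the models used in the protocol.

```python
def best_retry_limit(distortion, congestion, lam, k_max):
    """Return the retry limit k in [0, k_max] minimizing J(k) = D(k) + lam * C(k)."""
    return min(range(k_max + 1), key=lambda k: distortion(k) + lam * congestion(k))

# Toy models (assumptions): distortion halves with each extra retry,
# congestion grows linearly with the number of retries.
toy_distortion = lambda k: 100.0 * 0.5 ** k
toy_congestion = lambda k: float(k)

k_star = best_retry_limit(toy_distortion, toy_congestion, lam=10.0, k_max=7)
print(k_star)
```

With these toy models, a larger weight on congestion pushes the optimum towards fewer retries, while a weight of zero selects the largest allowed retry limit.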

• C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “Low-latency video streaming with congestion control in mobile ad-hoc networks,” IEEE Transactions on Multimedia, vol. 14, n. 4, pp. 1337-1350, Aug. 2012. Paper selected as “High quality paper” by the IEEE MMTC-R Letter board

ABCD/CoDiO

Challenges:

• Model the effects of a single-node decision on the entire network

• Even if a node switches off, alternative paths may be formed

• Information about alternative paths is gathered at leaves and conveyed upstream

• The information is refined where it actually matters, i.e. near the root – where a single decision affects a lot of nodes


ABCD/CoDiO: simulation results (ns2)


Network coding for video delivery

• Network coding increases network throughput by letting intermediate nodes process packets instead of simply relaying them

• NC can easily be extended to wireless networks


Network coding

• Using ABCD as overlay to implement NC in wireless networks

• Optimized scheduling for MDC in Expanded Window NC

• Optimized scheduling for multiview video over NC

• Blind source separation for reducing the NC overhead



• RDO-scheduling in NC-based delivery

• A generation is composed of the frames of a multi-view GOP or an MDC GOP

• Each node must decide the schedule of frames

• I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “A framework for joint multiple description coding and network coding over wireless ad-hoc networks,” in International Conference on Acoustics, Speech and Signal Processing, (Kyoto, Japan), 2012

• I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “A network coding scheduling for multiple description video streaming over wireless networks,” in EUSIPCO, vol. 1, (Bucharest, Romania), pp. 1–5, Aug. 2012.

• I. Nemoianu, C. Greco, M. Cagnazzo, B. Pesquet-Popescu. "Multi-View Video Streaming over Wireless Networks with RD-Optimized Scheduling of Network Coded Packets". In SPIE Visual Communications and Image Processing Conference, San Diego, CA (USA), Nov. 2012.


• RDO calls for a unique scheduling (send first the frame that maximally reduces the RD cost function)

• NC calls for different scheduling at each node (pseudo-random selection) in order to maximize the throughput

• Solution: collect frames into groups with “similar” RD characteristics, and select randomly within each group
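The compromise between the two requirements can be sketched as follows: frames are sorted by their RD gain, consecutive frames with similar gains are grouped, and the order inside each group is randomized so that different nodes send different frames first. The gain values, the `tolerance` used to decide similarity and the function name are illustrative assumptions.

```python
import random

def clustered_schedule(gains, tolerance, rng):
    """Order frames by decreasing RD gain, shuffling frames with similar gains.

    gains:     dict mapping frame id to its RD gain (hypothetical values)
    tolerance: maximum gain gap to the first frame of a group (assumption)
    rng:       random.Random instance, so each node can use its own seed
    """
    ordered = sorted(gains, key=lambda f: gains[f], reverse=True)
    groups = []
    for frame in ordered:
        if groups and gains[groups[-1][0]] - gains[frame] <= tolerance:
            groups[-1].append(frame)      # similar RD characteristics: same group
        else:
            groups.append([frame])        # start a new group
    schedule = []
    for group in groups:
        shuffled = group[:]
        rng.shuffle(shuffled)             # random selection within the group
        schedule.extend(shuffled)
    return schedule

toy_gains = {"a": 10.0, "b": 9.5, "c": 5.0, "d": 4.8}
schedule = clustered_schedule(toy_gains, tolerance=1.0, rng=random.Random(0))
print(schedule)
```

Frames of the high-gain group always precede those of the low-gain group, while the order within a group varies from node to node.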


BSS for NC

• In NC the intermediate nodes of a network send linear combinations of the packets they have previously received, with random coefficients taken from a finite field

• The random coefficients must be added to the packet as headers, incurring an overhead

• In a blind source separation (BSS) based approach, it could be possible to relieve the nodes from the need to include the coefficients in the packets

• BSS consists in recovering a set of source signals 𝑆 from a set of mixed signals 𝑋 = 𝑓(𝑆), also referred to as observations, without knowing the sources themselves nor the mixing process parameters; in NC we have linear mixing, 𝑋 = 𝐴𝑆
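The linear mixing model can be illustrated with a small example over the binary field GF(2), where a linear combination reduces to a bitwise XOR of the selected packets. The choice of GF(2), the function name and the toy packets are assumptions made for this sketch; larger finite fields need dedicated arithmetic.

```python
import numpy as np

def combine_packets(coefficients, sources):
    """Compute X = A S over GF(2): each output row is the XOR of the selected sources.

    coefficients: (n_out, n_src) binary mixing matrix A
    sources:      (n_src, n_bits) binary source packets S
    """
    a = np.asarray(coefficients, dtype=np.int64)
    s = np.asarray(sources, dtype=np.int64)
    return (a @ s) % 2

S = [[1, 0, 1],
     [1, 1, 0]]
A = [[1, 1],      # first mixed packet: XOR of both sources
     [0, 1]]      # second mixed packet: copy of the second source
X = combine_packets(A, S)
print(X)
```

In conventional NC the rows of A travel in the packet headers; the BSS approach aims at recovering S from X without them.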

• I. Nemoianu, C. Greco, M. Castella, B. Pesquet-Popescu, M. Cagnazzo. "On a practical approach to source separation over finite fields for network coding applications". In International Conference on Acoustics, Speech and Signal Processing, May 2013. Vancouver, Canada.

BSS for NC

• Literature BSS approach in finite fields:
  – Iterative scan of packet combinations
  – Minimization of a contrast function

• Our idea: add to packets a signature that is degraded by linear combination

• Then, the contrast function can be computed only on candidates having a valid signature

• Problem: how to choose the signature to reduce the probability that a linear combination of packets still carries a valid signature

• Simple solution: odd-parity bit

• Drastic reduction of the search space
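The odd-parity signature can be illustrated over GF(2), where combining packets is a bitwise XOR. Each source packet gets one extra bit so that its total number of ones is odd; the XOR of an even number of such packets then has even parity and can be discarded without evaluating the contrast function. The GF(2) setting and the function names are assumptions of this sketch.

```python
import numpy as np

def add_odd_parity(payload):
    """Append one bit so that the packet contains an odd number of ones."""
    payload = np.asarray(payload, dtype=np.uint8)
    parity = (int(payload.sum()) + 1) % 2
    return np.append(payload, np.uint8(parity))

def has_valid_signature(packet):
    """A packet is a valid candidate only if its number of ones is odd."""
    return int(np.sum(packet)) % 2 == 1

p1 = add_odd_parity([1, 0, 1, 1])
p2 = add_odd_parity([0, 1, 1, 0])
p3 = add_odd_parity([0, 0, 0, 0])

pair = np.bitwise_xor(p1, p2)                       # mix of two packets
triple = np.bitwise_xor(pair, p3)                   # mix of three packets
print(has_valid_signature(p1), has_valid_signature(pair), has_valid_signature(triple))
```

Mixes of an odd number of packets still pass the check, which is the residual ambiguity raised in the problem statement above.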


CONCLUSION

Perspectives

• “Classical” video coding: advanced models for rate control

• 3D VC:
  – combined use of motion and disparity compensation to produce improved reference frames;
  – elastic deformation model for lossless coding of depth contours

• DVC:
  – Improved SI generation using an elastic deformation model for estimating object shapes;
  – Geometry-based DVC system for MVD (no backward channel, no channel coding)

• NC and streaming: use of “social” information to optimize interactive multiview streaming with a NC approach


New themes

• Forensics, forgery detection

• Feature representation and compression

• Video protection

• Immersive communications: holoscopy / holography, high dynamic range
