Efficient Prediction Structure for Multi-view Video Coding. Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand. CSVT 2007.

Jan 11, 2016


Donald McDowell
Transcript
Page 1: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Efficient Prediction Structure for Multi-view Video Coding

Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand

CSVT 2007

Page 2: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Outline

Multi-view video coding (MVC) introduction
Requirements and test conditions for MVC
Prediction structures
Experimental results
Conclusion

Page 3: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

MVC Introduction

MVC: Multi-view Video Coding
Multi-view video (MVV): video from a system that uses multiple camera views of the same scene.
Usage: 3DTV, free viewpoint video (FVV), etc.

Page 4: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Requirements for MVC

Temporal random access
View random access
Scalability
Backward compatibility
Quality consistency
Parallel processing

Page 5: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Temporal and inter-view correlation

[Figure: temporal, inter-view, and temporal/inter-view mixed prediction modes]

Page 6: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Temporal and inter-view correlation analysis

An H.264/AVC encoder was used with the following settings:
Motion compensation block size of 16×16
Search range of ±32 pixels
Lagrange parameter (λ) of 29.5

$\Delta J$ denotes the decrease of the average Lagrangian cost $J$ in comparison to temporal prediction only.

Page 7: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Temporal and inter-view correlation analysis (cont’d)

Simply including temporal and inter-view prediction modes

Page 8: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Lagrangian cost function

$$J = D + \lambda R \tag{1}$$

$D$ denotes the distortion.
$R$ denotes the number of bits needed to transmit all components of the motion vector.

For each block $S_i$ in a picture, the algorithm chooses the motion vector (MV) $m_i$ within a search range $M$ that minimizes $J$:

$$m_i = \arg\min_{m \in M} \{ D(S_i, m) + \lambda R(S_i, m) \} \tag{2}$$

The distortion in the subject macroblock $B$ is calculated by:

$$D(S_i, m) = \sum_{(x, y) \in B} \left| s(x, y, t) - s'(x - m_x, y - m_y, t - m_t) \right|^2 \tag{3}$$
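
The Lagrangian motion search of Eqs. (1) to (3) can be illustrated with a small full-search sketch. This is only a minimal illustration under stated assumptions, not the encoder used in the experiments: it searches a single reference picture (so the temporal component m_t is fixed), the function names `ssd`, `mv_bits`, and `best_mv` are hypothetical, and the rate term is approximated by signed Exp-Golomb code lengths of the two vector components.

```python
def ssd(cur, ref, bx, by, mx, my, size):
    """Eq. (3): sum of squared differences between the block at (bx, by)
    in `cur` and the block in `ref` displaced by the vector (mx, my)."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            diff = cur[y][x] - ref[y - my][x - mx]
            total += diff * diff
    return total


def mv_bits(mx, my):
    """Assumed rate model: signed Exp-Golomb code length per component."""
    def se_len(v):
        k = 2 * abs(v) - (1 if v > 0 else 0)   # signed -> unsigned index
        return 2 * (k + 1).bit_length() - 1    # Exp-Golomb code length
    return se_len(mx) + se_len(my)


def best_mv(cur, ref, bx, by, size, search, lam):
    """Eq. (2): full search over [-search, search]^2 minimizing D + lam * R.
    Returns (cost, (mx, my)), or None if no candidate fits in the picture."""
    h, w = len(ref), len(ref[0])
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the picture.
            if bx - mx < 0 or bx - mx + size > w:
                continue
            if by - my < 0 or by - my + size > h:
                continue
            cost = ssd(cur, ref, bx, by, mx, my, size) + lam * mv_bits(mx, my)
            if best is None or cost < best[0]:
                best = (cost, (mx, my))
    return best


# Toy data: `cur` is `ref` shifted by 2 pixels right and 1 pixel down.
ref = [[10 * x + 100 * y for x in range(12)] for y in range(12)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)]
       for y in range(12)]
print(best_mv(cur, ref, bx=4, by=4, size=4, search=2, lam=29.5))
# -> (236.0, (2, 1))
```

In the toy example the distortion of the true displacement is zero, so the reported cost consists only of the rate term (8 bits times λ = 29.5).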

Page 9: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Test data and test conditions

1D camera arrangements: Ballroom, Exit, Rena, Race1, Uli (line); Breakdancers (arched)
2D camera arrangements: Flamenco2 (cross), AkkoKayo (array)
5 to 16 camera views are used.
Target: high-quality TV-type video (640×480 or 1024×768) rather than limited-channel communication-type video.

Page 10: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Knowledge – hierarchical B picture, QP cascading

Hierarchical B picture, key picture, non-key picture:
[Figure: hierarchical B pictures between two key pictures]

QP cascading [1]:

$$QP_k = QP_{k-1} + (k = 1 \;?\; 4 : 1)$$

[1] “Analysis of hierarchical B pictures and MCTF”, ICME 2006, IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 2006
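
The QP cascading rule can be traced with a short sketch. It assumes that k is the temporal level (k = 0 for key pictures) and that the GOP is dyadic with a power-of-two length; the function names `cascaded_qps` and `temporal_level` are hypothetical and only serve the illustration.

```python
def cascaded_qps(qp_key, levels):
    """QP per temporal level: QP_k = QP_{k-1} + (4 if k == 1 else 1),
    with QP_0 = qp_key for the key pictures."""
    qps = [qp_key]
    for k in range(1, levels + 1):
        qps.append(qps[-1] + (4 if k == 1 else 1))
    return qps


def temporal_level(poc, gop_len):
    """Temporal level of a picture in a dyadic hierarchical-B GOP
    (assumes gop_len is a power of two; key pictures are level 0)."""
    if poc % gop_len == 0:
        return 0
    level = gop_len.bit_length() - 1   # log2(gop_len) for powers of two
    while poc % 2 == 0:
        poc //= 2
        level -= 1
    return level


qps = cascaded_qps(qp_key=26, levels=3)
print(qps)                                   # -> [26, 30, 31, 32]
print([qps[temporal_level(p, 8)] for p in range(9)])
# -> [26, 32, 31, 32, 30, 32, 31, 32, 26]
```

Key pictures get the lowest QP, and each deeper hierarchy level is quantized slightly more coarsely.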

Page 11: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Knowledge – DPB size

Decoded Picture Buffer (DPB) size is increased to [2]:

$$2 \cdot \mathit{GOP\_length} + \mathit{number\_of\_views}$$

[2] “Efficient Compression of Multi-view Video Exploiting Inter-view Dependencies Based on H.264/AVC”, ICME 2006, IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 2006

Memory-efficient reordering of multi-view input for compression
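
As a quick worked example, the DPB requirement can be evaluated for a given configuration. The sketch reads the slide's formula as 2 × GOP_length + number_of_views; the function name `dpb_size` and the example values are assumptions chosen only for illustration.

```python
def dpb_size(gop_length, number_of_views):
    """DPB size in pictures, reading the slide's formula as
    2 * GOP_length + number_of_views."""
    return 2 * gop_length + number_of_views


# Illustrative values only: a GOP of 12 pictures and 8 camera views.
print(dpb_size(gop_length=12, number_of_views=8))   # -> 32
```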

Page 12: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Two tasks

1. To adapt the multi-view prediction schemes to the specific camera arrangements of the test data sets.
2. To adapt the prediction structures to the random access specification.

Page 13: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Prediction structure

Simulcast coding structure
To allow synchronization and random access, all key pictures are coded in intra mode.

Page 14: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Prediction structure (cont’d)

The first view, $S_0$, is called the base view (it remains the I frame).

Page 15: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Prediction structure (cont’d)

Alternative structures of inter-view prediction for key pictures: KS_IPP, KS_PIP, KS_IBP

[Figure: KS_IPP, KS_PIP, and KS_IBP for a linear camera arrangement and for a 2D camera array]
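
The three key-picture modes can be sketched for a linear camera arrangement as a mapping from view index to picture type and reference views. The reference patterns below are an assumption read off the mode names (I picture first, I picture in the middle, alternating B and P pictures), and `key_picture_structure` is a hypothetical helper, not part of any codec API.

```python
def key_picture_structure(mode, n_views):
    """Return {view: (picture_type, [reference views])} for the key
    pictures of a linear camera arrangement (assumed patterns)."""
    if mode == "KS_IPP":
        # First view intra, each further view predicted from its neighbor.
        return {v: ("I", []) if v == 0 else ("P", [v - 1])
                for v in range(n_views)}
    if mode == "KS_PIP":
        # Intra picture in the middle view, P pictures toward both ends.
        mid = n_views // 2
        out = {mid: ("I", [])}
        for v in range(mid - 1, -1, -1):
            out[v] = ("P", [v + 1])
        for v in range(mid + 1, n_views):
            out[v] = ("P", [v - 1])
        return out
    if mode == "KS_IBP":
        # Even views form an I-P-P chain, odd views are bi-predicted.
        out = {0: ("I", [])}
        for v in range(2, n_views, 2):
            out[v] = ("P", [v - 2])
        for v in range(1, n_views, 2):
            if v + 1 < n_views:
                out[v] = ("B", [v - 1, v + 1])
            else:
                out[v] = ("P", [v - 1])   # last view has one neighbor only
        return out
    raise ValueError("unknown mode: " + mode)


for mode in ("KS_IPP", "KS_PIP", "KS_IBP"):
    structure = key_picture_structure(mode, 5)
    print(mode, [structure[v][0] for v in range(5)])
```

With five views this prints the type sequences I P P P P, P P I P P, and I B P B P, which matches the naming of the modes.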

Page 16: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Prediction structure (cont’d)

Inter-view prediction for key and non-key pictures

AS_IPP mode

Page 17: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Experimental results – objective evaluation

Ballroom test result

Average coding gains compared with anchor coding

Page 18: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Experimental results – subjective evaluation

Different bit-rates were selected for the different data sets.

Ballroom test result

Race1 test result

Page 19: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Experimental results – subjective evaluation

AS_IBP outperforms the anchors significantly.
The gain decreases slightly with higher bit-rates.

Average results over all test sequences

Page 20: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Influence of camera density

The Rena sequence is used; it consists of 16 linearly arranged cameras with a 5 cm distance between adjacent cameras.
The experiment is repeated for each shifted set of 9 adjacent cameras.
The structures are applied to every time instance of the MVV sequence without temporal prediction.
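
The shifted camera sets used in this experiment can be enumerated with a sliding window. This is a minimal sketch: cameras are numbered from 0 for convenience, and `camera_subsets` is a hypothetical helper name.

```python
def camera_subsets(n_cameras=16, subset_size=9):
    """All shifted sets of `subset_size` adjacent cameras out of a
    linear array of `n_cameras` cameras (indices start at 0)."""
    return [list(range(start, start + subset_size))
            for start in range(n_cameras - subset_size + 1)]


subsets = camera_subsets()
print(len(subsets))     # -> 8
print(subsets[0])       # -> [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(subsets[-1])      # -> [7, 8, 9, 10, 11, 12, 13, 14, 15]
```

With 16 cameras and sets of 9 there are 8 shifted positions, so each measurement can be averaged over 8 camera sets.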

Page 21: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Results of experiments on camera density

Coding gain increases with decreasing camera distance and decreasing reconstruction quality.

Page 22: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Results of experiments on camera density (cont’d)

[Figure: average per-camera rate relative to the one-camera case]

A larger QP value leads to a larger coding gain.

Page 23: Philipp Merkle, Aljoscha Smolic Karsten Müller, Thomas Wiegand CSVT 2007.

Conclusion

Resulting multi-view prediction: achieves significant coding gains and is highly flexible.
Parallel processing is supported by the presented sequential processing approach.

Problems:
Large disparities between the different views of multi-view video sequences
Illumination and color inconsistencies across views