RD Optimized Coding for Motion Vector Predictor Selection

Guillaume Laroche, Joel Jung, Beatrice Pesquet-Popescu

CSVT 2008


For the purpose of reducing the bitrate, the paper proposes two schemes:

A competition-based spatial-temporal scheme for the prediction of motion vectors

A competition-based SKIP mode that increases the number of skipped macroblocks

Introduction
    MV prediction and selection

MV and SKIP mode competition
    Competition-based MV coding
    Competition-based Skip mode
    Multiple reference frames
    MV competition for B-slice

Experimental Results

Conclusion

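[Figure: the current block mv in Frame N with its neighboring vectors mva, mvb, mvc, and mvd, and the collocated block mvcol in Frame N-1 with its neighboring vectors mv0 to mv7]

mvcol is the motion vector of the block in Frame N-1 that is collocated with the current block "mv" in Frame N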

The predictor is chosen as:

$p = \mathrm{median}\{mv_a, mv_b, mv_c\}$

The motion vector residual is given by:

$\varepsilon_{mv} = mv - p$

εmv: motion vector residual
mv: motion vector
p: motion vector predictor (MVp)

[Figure: the current block E and its neighboring blocks A, B, and C]

For a skipped MB, only the mode itself needs to be transmitted

Mostly used in static background regions

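Direct mode has two types: spatial and temporal

Spatial direct mode uses neighboring MVs to predict the MV

In temporal direct mode, the list0 and list1 predicted vectors are scaled:

$mv_{1}^{L0} = \frac{mv_{col}^{L1}}{d_{L0L1}} \times d_{L0}$

$mv_{1}^{L1} = \frac{mv_{col}^{L1}}{d_{L0L1}} \times (d_{L0} - d_{L0L1})$

[Figure: the current B frame and the reference frames Ref0, Ref1, and Ref2, showing the collocated vectors mvcolL0 and mvcolL1, the predicted vectors, and the temporal distances dL0, dL0L1, and dL0L2]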
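A minimal sketch of temporal direct scaling, assuming the list0 vector is mvcolL1 × dL0 / dL0L1 and the list1 vector is mvcolL1 × (dL0 - dL0L1) / dL0L1. The function name, the tuple representation of vectors, and the example distances are hypothetical, and exact fractions stand in for the fixed-point arithmetic a real codec would use.

```python
from fractions import Fraction

def temporal_direct(mv_col_l1, d_l0, d_l0l1):
    """Scale the collocated vector into a (list0, list1) predictor pair."""
    s0 = Fraction(d_l0, d_l0l1)            # scale towards the past reference
    s1 = Fraction(d_l0 - d_l0l1, d_l0l1)   # scale towards the future reference
    mv_l0 = tuple(c * s0 for c in mv_col_l1)
    mv_l1 = tuple(c * s1 for c in mv_col_l1)
    return mv_l0, mv_l1

# Hypothetical example: the B frame is 1 frame after Ref0, and Ref1 is 3 frames after Ref0.
pair = temporal_direct((6, -3), d_l0=1, d_l0l1=3)
print(pair)
```

The list1 scale factor is negative here because the list1 vector points in the opposite temporal direction to the collocated vector.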

The RD-criterion to be minimized is:

$J = D + L_R$

D: distortion
LR: weighted rate, the sum of the weighted bitrate components:

$L_R = \lambda_r R_r + \lambda_m R_m + \lambda_{mv} R_{mv} + \lambda_o R_o$

Rr: the rate for the block residue (luma + chroma)

Rm: the rate of the macroblock mode (SKIP or intra/inter prediction and macroblock partition type)

Rmv: the rate of the motion vector residue

Ro: the rate of the other components (header, CBP, …)

For SKIP mode, the RD-criterion becomes:

$J_{SKIP} = D_{SKIP} + \lambda_m R_m$

because none of Ro, Rr, or Rmv needs to be transmitted in SKIP mode.

In practice, the cost λmRm is negligible compared with the distortion.

Predictor set:

Spatial predictors: mva, mvb, mvc, mvd, the H.264 median predictor mvH.264, and the extended spatial predictor mvspaEXT, where

$mv_{spaEXT} = \mathrm{median}\{mv_a, mv_b, mv_c\}$

if all 3 vectors are available. Otherwise it is equal to mva if available, otherwise to mvb, otherwise to mvc, or to 0 if none is available.

[Figure: the current block mv in Frame N with its neighboring vectors mva, mvb, mvc, and mvd]

Ref: J. Jung and G. Laroche, “Competition-based scheme for motion vector selection and coding” ITU-T VCEG, Klagenfurt, Austria, 2006, Information VCEG-AC06
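The fallback rule for mvspaEXT can be sketched as follows. This is an illustration only: the function names are hypothetical, `None` is an assumed marker for an unavailable vector, and the median is assumed to be taken component by component.

```python
def median3(a, b, c):
    """Median of three scalars."""
    return sorted((a, b, c))[1]

def mv_spa_ext(mv_a, mv_b, mv_c):
    """Extended spatial predictor; None marks an unavailable vector (assumption)."""
    if None not in (mv_a, mv_b, mv_c):
        # All three neighbors available: component-wise median.
        return tuple(median3(x, y, z) for x, y, z in zip(mv_a, mv_b, mv_c))
    # Otherwise fall back to the first available vector in the order a, b, c.
    for mv in (mv_a, mv_b, mv_c):
        if mv is not None:
            return mv
    return (0, 0)

print(mv_spa_ext((1, 5), (3, 2), (2, 9)))
print(mv_spa_ext(None, (3, 2), (2, 9)))
```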

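Predictor set:

Temporal predictors: mvcol, mvtf, mvtm5, and mvtm9, where

$mv_{tm5} = \mathrm{median}\{mv_{col}, \{mv_i, 0 \le i < 4\}\}$

$mv_{tm9} = \mathrm{median}\{mv_{col}, \{mv_i, 0 \le i < 8\}\}$

[Figure: the current block mv in Frame N with its neighboring vectors mva, mvb, mvc, and mvd, and the collocated block mvcol in Frame N-1 with its neighboring vectors mv0 to mv7]

[Figure: the current block, the collocated block, and the vectors mvH.264, mvtf, and mvcol in the current frame, Ref0, and Ref1]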

Predictor set:

Spatial-temporal predictor:

$mv_{spt} = \mathrm{median}\{mv_{col}, mv_{col}, mv_a, mv_b, mv_c\}$

Counting mvcol twice gives a higher importance to the mvcol value
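The median-based predictors mvtm5, mvtm9, and mvspt can be sketched as below. The sketch assumes a component-wise median, as in the H.264 median predictor, and assumes the neighbors mv0 to mv7 of the collocated block are given as a list; the function names and example values are hypothetical.

```python
from statistics import median

def vector_median(vectors):
    """Component-wise median of a list of (x, y) vectors."""
    xs, ys = zip(*vectors)
    return (median(xs), median(ys))

def mv_tm(mv_col, neighbours, count):
    """Median of mv_col and its first `count` neighbours (4 for tm5, 8 for tm9)."""
    return vector_median([mv_col] + list(neighbours[:count]))

def mv_spt(mv_col, mv_a, mv_b, mv_c):
    """Spatial-temporal median; mv_col appears twice to give it more weight."""
    return vector_median([mv_col, mv_col, mv_a, mv_b, mv_c])

neighbours = [(0, 0), (2, 2), (6, 1), (8, 3), (1, 1), (1, 1), (1, 1), (1, 1)]
print(mv_tm((4, 0), neighbours, 4))
print(mv_tm((4, 0), neighbours, 8))
print(mv_spt((4, 0), (0, 0), (1, 1), (9, 2)))
```

Each candidate list has an odd length (5 or 9 vectors), so the median is always one of the input component values.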

Choices of MV:

Adaptive choices
    Based on content or statistical criteria
    No need to transmit the index of the mode if the decoder is able to determine the mode

Exhaustive choices
    All possible predictions are tested
    A mode needs to be transmitted in the bit stream
    An index i and a residual εmvi are associated with each predictor $p_i \in P$:

$\varepsilon_{mv_i} = mv - p_i, \quad i = 1, \dots, n$

where n is the number of predictors in the defined predictor set P

For the selection of the MV, the bitrate of the motion vector residue Rmv is replaced by Rmv/mm to yield:

$L_R = \lambda_r R_r + \lambda_m R_m + \lambda_{mv} R_{mv/mm} + \lambda_o R_o$

where Rmv/mm contains the cost of the residual εmvi and the cost of the index information i:

$R_{mv/mm} = \min_{i = 1, \dots, n} \left( R(\varepsilon_{mv_i}) + R(i) \right)$
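The exhaustive competition can be sketched as a search for the predictor whose residual plus index is cheapest to code. The cost model here is an assumption made only for illustration: residual components are counted with signed Exp-Golomb code lengths and the index with a fixed-length code, which is not necessarily what the paper uses. Indices are 0-based in the code.

```python
from math import ceil, log2

def se_bits(v):
    """Bit length of a signed Exp-Golomb code (assumed residual cost model)."""
    k = 2 * abs(v) - (1 if v > 0 else 0)   # map signed value to code number
    return 2 * (k + 1).bit_length() - 1

def select_predictor(mv, predictors):
    """Return (index, residual, bits) minimising residual bits plus index bits."""
    # Assumed fixed-length index cost; no index is needed for a single predictor.
    index_bits = ceil(log2(len(predictors))) if len(predictors) > 1 else 0
    best = None
    for i, p in enumerate(predictors):
        res = (mv[0] - p[0], mv[1] - p[1])
        bits = se_bits(res[0]) + se_bits(res[1]) + index_bits
        if best is None or bits < best[2]:
            best = (i, res, bits)
    return best

print(select_predictor((5, -2), [(0, 0), (5, -1), (4, -2), (8, 8)]))
```

Ties are resolved in favor of the lowest index, which is a choice of this sketch rather than something stated on the slides.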

For the SKIP mode, the criterion

$J_{SKIP} = D_{SKIP} + \lambda_m R_m$

is changed to

$J_{SKIP}^{i} = D_{SKIP}^{i} + \lambda_m R_m(i), \quad i = 1, \dots, n_s$

$J_{SKIP}^{i}$: RD cost for the predictor pi

$D_{SKIP}^{i}$: distortion related to pi

$p_i \in P_s$, where Ps is the set of motion vectors for the SKIP mode

If the SKIP mode is chosen, the index of the predictor is sent.
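A minimal sketch of the SKIP competition: each candidate predictor has its own distortion, and the cheapest RD cost wins. It assumes that the mode rate of candidate i is the SKIP signalling cost plus the cost of the index i; the function name, parameters, and numbers are hypothetical.

```python
def select_skip_predictor(distortions, index_bits, lam_m, mode_bits=1):
    """Return (index, cost) of the SKIP predictor with the lowest RD cost."""
    costs = [d + lam_m * (mode_bits + b)   # J_i = D_i + lambda_m * rate_i
             for d, b in zip(distortions, index_bits)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]

# Two hypothetical SKIP predictors with one index bit each.
choice = select_skip_predictor([120.0, 95.0], [1, 1], lam_m=4.0)
print(choice)
```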

Assuming an object moves at constant speed, the predictor mvcolR0 is scaled according to di, the temporal distance between the current frame and the reference picture Refi used by the current block, and dj, the temporal distance between Ref0 and Refj:

[Figure: the current frame with the reference frames Ref0, Refi, and Refj, showing mv, mvcolR0, and the temporal distances di and dj]

$mv_{Scol}^{R_0} = \frac{mv_{col}^{R_0}}{d_j} \times d_i$

mvScolR0: scaled predictor

Ref0: previous reference frame

Another predictor: the sum of temporally successive collocated vectors

Consider the case where all MVs in each reference frame point only to the immediately preceding frame. In this configuration, mvScoli is the scaled MV collocated in Refi and pointing to Refi+1

The sum of these successive temporal predictors, mvTsumj, is defined by:

$mv_{Tsum}^{j} = \sum_{i=0}^{j} mv_{Scol}^{i}, \quad j \in \mathbb{N}$

j: the reference frame number of the current predictor block
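The scaling of a collocated vector and the sum of successive collocated vectors can be sketched together. The sketch assumes the scaled predictor is mvcolR0 × di / dj and that the list `chain` already holds the vectors mvScol0, mvScol1, and so on; the function names and values are hypothetical.

```python
from fractions import Fraction

def scale(mv_col, d_i, d_j):
    """Scale a collocated vector by d_i / d_j (constant-speed assumption)."""
    s = Fraction(d_i, d_j)
    return tuple(c * s for c in mv_col)

def mv_tsum(scol, j):
    """Component-wise sum of scol[0] .. scol[j]."""
    xs = sum(v[0] for v in scol[: j + 1])
    ys = sum(v[1] for v in scol[: j + 1])
    return (xs, ys)

scaled = scale((4, -2), d_i=2, d_j=1)   # reference twice as far away
chain = [(2, 0), (3, -1), (1, 1)]       # hypothetical mvScol0..mvScol2
total = mv_tsum(chain, j=1)
print(scaled, total)
```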

We also consider mvtfsumj, a sum of predictors derived from the predictor mvtf:

$mv_{tfsum}^{j} = \sum_{i=0}^{j} mv_{Stf}^{R_i}, \quad j \in \mathbb{N}$

mvStfRi is the MV at the position given by mvStfRi-1 in Refi-1, pointing to Refi, except mvStfR0, which is equal to mvScol0

[Figure: the current B frame and the reference frames Ref0 to Ref3, showing the collocated vectors mvScol0 to mvScol3 and the chained vectors mvStfR0 (= mvScol0) to mvStfR3]

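No modification of the Direct mode is proposed

The MV resulting from the spatial Direct mode is not considered in the set of predictors

We consider the case of N successively coded B-frames

[Figure: the current B frame and the reference frames Ref0, Ref1, and Ref2, showing the collocated vectors mvcolL0 and mvcolL1, the predictor pairs, and the temporal distances dL0, dL0L1, and dL0L2]

$mv_{1}^{L0} = \frac{mv_{col}^{L1}}{d_{L0L1}} \times d_{L0}$

$mv_{1}^{L1} = \frac{mv_{col}^{L1}}{d_{L0L1}} \times (d_{L0} - d_{L0L1})$

$mv_{2}^{L0} = \frac{mv_{col}^{L0}}{d_{L0L2}} \times d_{L0}$

$mv_{2}^{L1} = \frac{mv_{col}^{L0}}{d_{L0L2}} \times (d_{L0} - d_{L0L1})$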

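The vectors mvcolB-1L0 and mvcolB-1L1 are used for the scaling of the predictor pairs ($mv_{3}^{L0}$, $mv_{3}^{L1}$) and ($mv_{4}^{L0}$, $mv_{4}^{L1}$), respectively.

[Figure: the current B frame, the B-frame B-1, and the reference frames Ref0 and Ref1, showing the collocated vectors mvcolB-1L0 and mvcolB-1L1 and the temporal distances dL0, dL0L1, and dL0B-1]

$mv_{3}^{L0} = \frac{mv_{col}^{B-1,L0}}{d_{L0B-1}} \times d_{L0}$

$mv_{3}^{L1} = \frac{mv_{col}^{B-1,L0}}{d_{L0B-1}} \times (d_{L0} - d_{L0L1})$

$mv_{4}^{L0} = \frac{mv_{col}^{B-1,L1}}{d_{L0B-1} - d_{L0L1}} \times d_{L0}$

$mv_{4}^{L1} = \frac{mv_{col}^{B-1,L1}}{d_{L0L1} - d_{L0B-1}} \times (d_{L0L1} - d_{L0})$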

Bitrate saving on the first and second B-frame for CIF sequences

First predictor: mvH.264

mvcolL1: MV collocated in the future frame, without scaling

mvBcol = (collocated block is intra coded) ? mva : mvScolL1

The results for mvScolL0 and mvScolL1 show that the MV field of a B-frame is more correlated with the future reference frame

Two profiles: Baseline profile and High profile

32×32 search range

8×8 transform

4 reference frames

Test set: 9 CIF, 4 SD (640×480), and 2 720p (1280×720) sequences

QP = 28, 32, 36, 40

Predictor sets: 11 predictors in the set P:

$P = \{mv_{H.264}, mv_a, mv_b, mv_c, mv_{Scol}^{R_0}, mv_{tm5}, mv_{tm9}, mv_{tf}, mv_{spt}, mv_{Tsum}^{j}, mv_{tfsum}^{j}\}$

[Figure: percentage of the selection of each proposed predictor for MV competition for the CIF test set in the Baseline profile]

Comparing P sets containing two predictors

For all CIF sequences, mvH.264 is combined one by one with each of the other predictors.

[Table: bitrate savings for the different pairs of predictors]

Selecting the optimal number of predictors in the sets

The P sets of MV predictors are:

$P_1 = \{mv_{H.264}\}$

$P_2 = \{mv_{H.264}, mv_{Scol}^{R_0}\}$

$P_4 = \{mv_{H.264}, mv_{Scol}^{R_0}, mv_a, mv_{tm9}\}$

The Ps sets of MV predictors for the SKIP mode are:

$P_{s1} = \{mv_{H.264}\}$

$P_{s2} = \{mv_{spaEXT}, mv_a\}$

$P_{s4} = \{mv_{spaEXT}, mv_{H.264}, mv_{Scol}^{R_0}, mv_a\}$

Spatial and temporal predictor competition

Temporal predictors are useful

The temporal selection is correlated with the reference frame

Percentage increase in the number of macroblocks encoded with the SKIP mode

For sequences with large objects and fluid motion

A spatial predictor as the second predictor is less efficient for sequences with static background

A compression gain is achieved for all test sequences

For sequences with simple or no motion, the SKIP mode is already widely used, so the gains are lower.

Sequences with fast or complex motion take full advantage of the temporal prediction

RD curves for 4 sequences of the test set

At low bitrates, motion information tends to become a significant part of the total bitstream

The bitrate reduction is not related to the resolution, but is related to the frame rate

The problem changes due to the presence of B pictures and multiple reference frames

Is the P set used for the P-frames in the Baseline profile still suited to the High profile, where the temporal distance between P-frames is increased?

Which set is best suited to the B-frames, and is it the same for all the B-frames between two P-frames?

The same sets as the ones proposed for the Baseline profile give the best results

The temporal distance between two P-frames is larger, so the temporal correlation between motion vector fields is smaller

Distribution of the predictor selection in the High IBBP profile for the P- and B-frames

Bitrate saving in the high IBBP profile (only computed for CIF sequences)


Bitrate reduction for each sequence

The gain is lower than in the Baseline profile, which is explained by the results obtained on P-frames

The average bitrate reductions for the Baseline and High profiles are 7.7% and 4.3%, respectively.

The MV predictors are selected via an RD-criterion that considers the cost of the residual and the cost of the predictor index.

Adapting the predictor set to the statistical characteristics of the sequence should increase the bitrate savings further.
