JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 1095-1109 (2008)

New Prediction- and Affine Transformation-Based Three-Step Search Scheme for Motion Estimation With Application*

KUO-LIANG CHUNG, TA-JEN YAO AND YONG-HUAI HUANG

Department of Computer Science and Information Engineering
National Taiwan University of Science and Technology
Taipei, 106 Taiwan
E-mail: [email protected]

The three-step search (TSS) scheme has been widely used in block-based motion estimation and has also been successfully incorporated into several motion estimation methods to improve their performance. Instead of adopting the conventional square search pattern used in the TSS scheme, this paper presents a new prediction- and affine transformation-based TSS (PATSS) scheme that leads to a more efficient search pattern. When the proposed PATSS scheme is incorporated into existing well-known motion estimation algorithms, such as the recently published E3SS algorithm by Jing and Chau, experimental results show that not only can the image quality be improved, but the number of search points can also be reduced significantly.

Keywords: affine transformation, block motion estimation, MPEG, prediction, three-step search

1. INTRODUCTION

Motion estimation plays an important role in video encoding, such as in the MPEG standards. Successive frames in a video are usually so similar that it is enough to code only their difference. Motion estimation is a standard technique for removing this temporal redundancy.

The full search (FS) algorithm is the simplest motion estimation algorithm and provides a globally optimal solution since it searches all candidates in the search window. However, the FS algorithm is rather time-consuming. Many fast block matching algorithms that achieve locally optimal solutions have been presented to reduce the execution time required in searching candidates. These algorithms include the 2D-logarithmic search (LOGS) [11], the three-step search (TSS) [13], the new three-step search (NTSS) [15], the four-step search (4SS) [19], the block-based gradient descent search (BBGDS) [16], the horizontal- or vertical projection-based search [14], the diamond search (DS) [20, 22], the adaptive search order [2], the adaptive rood pattern search [18], the cross-diamond search [6], the set-based predictive search [7], the efficient three-step search (E3SS) [12], the block sum- block variance-based search [17], and so on. The algorithms surveyed above focus on predicting the initial search point, the search pattern, and the size of the search window. Besides such research issues, Huang et al. [10] consider the motion status of the current block and its neighboring blocks to design an adaptive fast block-matching algorithm that switches search patterns; Ahmad et al. [1] presented an efficient FAME algorithm which utilizes both the spatial and temporal domains to control the search and to select the shape of the search pattern in order to accelerate motion search. The FAME algorithm outperforms the earlier PMVFAST [21]. Some hardware, system, and error concealment considerations have been discussed in [3-5, 10].

Received August 21, 2006; revised November 10, 2006; accepted February 6, 2007. Communicated by Liang-Gee Chen.
* This research was supported by the National Science Council of Taiwan, R.O.C. under contracts No. NSC 94-2213-E-011-041 and NSC 95-2221-E-011-152.

Among the search algorithms mentioned above, except for the FS algorithm, the TSS scheme is the best known due to its simplicity and effectiveness. For each current block in the current frame, the TSS scheme uses a square search pattern to check 25 candidates in a 15 × 15 search window instead of checking all 225 candidates, which leads to an execution-time improvement ratio of about 90% at the cost of a small quality degradation. The TSS scheme performs well in large motion videos because of its evenly distributed search points, but its time performance in slow motion videos is not as good as in large motion videos when compared with other fast block matching algorithms, such as the NTSS, 4SS, DS, and E3SS. The NTSS algorithm employs a center-biased search pattern to match the natural behavior of video and a halfway-stop technique to speed up the search of stationary blocks. The NTSS algorithm performs better than the TSS scheme in terms of both time and quality. The DS algorithm by Zhu and Ma [22] achieves similar image quality but requires up to 22% less computation effort when compared with the NTSS algorithm. The E3SS algorithm by Jing and Chau [12] has similar performance to the DS algorithm, but has better quality in large motion videos because it utilizes the TSS-based search pattern. Recently, Huang, Cho, and Wang [10] presented a new finite-state model to efficiently utilize the speed and quality advantages of different motion estimation algorithms based on the motion behavior of blocks.

In this paper, we present a new efficient prediction- and affine transformation-based three-step search (PATSS) scheme for block motion estimation. Our proposed PATSS scheme can find a more efficient compact search pattern such that, for slow (large) motion blocks, the proposed PATSS scheme has better execution-time (quality) performance while maintaining similar quality (execution-time) when compared to the TSS scheme. When the proposed PATSS scheme is incorporated into existing well-known motion estimation algorithms, such as the recently published E3SS algorithm by Jing and Chau [12], which is taken as the representative, experimental results show that not only can the image quality be improved, but the number of search points can also be reduced significantly.

The rest of this paper is organized as follows. Section 2 reviews two relevant past works on block motion estimation, the TSS scheme and the E3SS algorithm. Section 3 presents our proposed PATSS scheme and the PATSS-based E3SS (PAE3SS) algorithm. Section 4 compares the performance of the FS algorithm, the TSS algorithm, the E3SS algorithm, our proposed PATSS algorithm, and the proposed PAE3SS algorithm. Section 5 concludes the paper.

2. PAST WORKS

In this section, two past works on block motion estimation are surveyed: the TSS scheme by Koga et al. [13] and the recently published E3SS algorithm by Jing and Chau [12].

2.1 The TSS Scheme

In the TSS scheme, the current frame is first divided into a set of fixed-size blocks, each of size b × b, where b = 2^K. For each current block B_c, the scheme seeks the best matching reference block within the search window in the reference frame according to a predefined matching criterion. The motion vector is defined as the displacement between the current block B_c and the best matching reference block. Throughout this research, the matching criterion used is the mean square error (MSE), which is defined by

\[
\mathrm{MSE}(v_x, v_y) = \frac{1}{b^2} \sum_{x=1}^{b} \sum_{y=1}^{b} \left[ B_c(x, y) - B_r(x + v_x, y + v_y) \right]^2 \tag{1}
\]

for −(b − 1) ≤ v_x, v_y ≤ (b − 1), where B_c(x, y) denotes the gray level of the pixel at position (x, y) in the current block B_c and B_r(x + v_x, y + v_y) denotes the gray level of the pixel at position (x + v_x, y + v_y) in the reference block B_r in the reference frame. By Eq. (1), the best matching reference block within the search window is the block with the minimal MSE(v_x, v_y), and the corresponding vector (v_x, v_y) is the found motion vector. For convenience, in the TSS scheme, the term "winning point" denotes the search point with the minimal MSE among all search candidates after performing one computation of MSEs. The search points considered in the first computation of MSEs are called the coarse search pattern, and computing the MSEs of the coarse search pattern is called the coarse search.
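As a minimal illustration of Eq. (1), the following Python sketch computes the MSE between a current block and a displaced reference block. The function name `block_mse` and the parameters `top` and `left` (the position of the current block inside the frame) are assumptions introduced here for illustration; indices are 0-based, whereas Eq. (1) counts from 1.

```python
def block_mse(cur_block, ref_frame, top, left, vx, vy):
    """MSE of Eq. (1) between a b x b current block and the reference block
    displaced by (vx, vy).

    cur_block : b x b list of gray levels (rows indexed by y, columns by x)
    ref_frame : 2-D list of gray levels of the reference frame
    top, left : position of the current block in the frame (hypothetical names)
    """
    b = len(cur_block)
    total = 0
    for y in range(b):
        for x in range(b):
            diff = cur_block[y][x] - ref_frame[top + y + vy][left + x + vx]
            total += diff * diff
    return total / (b * b)


# Tiny illustration: a 2 x 2 block located at (top, left) = (1, 1) of a 4 x 4 frame.
frame = [[0, 0, 0, 0],
         [0, 10, 20, 0],
         [0, 30, 40, 0],
         [0, 0, 0, 0]]
block = [[10, 20], [30, 40]]
zero_shift = block_mse(block, frame, 1, 1, 0, 0)   # identical blocks, MSE is 0
right_shift = block_mse(block, frame, 1, 1, 1, 0)  # reference block one pixel to the right
```

The winning point of a computation of MSEs is then simply the candidate (vx, vy) for which this value is smallest.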

As shown in Fig. 1, a 15 × 15 search window with search distance ±7 is adopted as the example. The coarse search pattern in the TSS scheme is a 9 × 9 square grid consisting of nine search points: the four corners, the four midpoints of the sides, and the center point. In the first computation of MSEs, we compute the MSEs of the nine search points in the coarse search pattern and obtain the winning point. In the second computation of MSEs, the center of the square grid is moved to the winning point obtained by the first computation, and the winning point is taken as the new origin. At the same time, each side of the moved square grid is shrunk by half, which results in a new 5 × 5 square grid, and the MSEs of the nine search points in the new square grid are computed. In the third computation of MSEs, according to the winning point obtained by the second computation, the 5 × 5 grid is moved and shrunk again to create a new 3 × 3 square grid. Finally, the winning point of the 3 × 3 square grid is output and its position is treated as the motion vector.

Fig. 1. The coarse search pattern of the TSS scheme.

Fig. 2. An example of the search points in the first two computations of MSEs in E3SS.

The TSS scheme consists of the following three search steps.

Step 1: Compute the MSEs of nine evenly distributed search points, namely the four corner points, four midpoints, and one center point of the 9 × 9 square grid. The winning point is thus obtained.

Step 2: Move the square grid such that its center is located at the winning point obtained by step 1 and shrink each side of the grid by half. Compute the MSEs of the eight new search points in the new square grid; the center of the new grid is the winning point obtained by step 1, whose MSE is already known. The winning point among the nine search points of the new square grid is then obtained.

Step 3: According to the winning point obtained by step 2, shrink the square grid and move the shrunk grid again. Then compute the MSEs of the eight new search points. Output the winning point and its position as the motion vector.

According to the above three search steps, it is easy to verify that the number of search points in a 15 × 15 window is 25 (= 9 + 8 + 8), regardless of block contents.
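The three steps above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the helper `make_mse`, the function name `three_step_search`, and the synthetic test frame are assumptions introduced here, and candidates falling outside the search window are simply skipped.

```python
def make_mse(cur_block, ref_frame, top, left):
    """Return a function mse(vx, vy) implementing Eq. (1) for one current block."""
    b = len(cur_block)

    def mse(vx, vy):
        total = 0
        for y in range(b):
            for x in range(b):
                diff = cur_block[y][x] - ref_frame[top + y + vy][left + x + vx]
                total += diff * diff
        return total / (b * b)

    return mse


def three_step_search(mse, d=7):
    """TSS in a (2d+1) x (2d+1) window; returns (motion_vector, points_checked)."""
    center = (0, 0)
    best = mse(0, 0)
    checked = 1
    step = (d + 1) // 2              # 4 for d = 7, i.e. the 9 x 9 coarse grid
    while step >= 1:
        winner = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue         # the grid center has already been evaluated
                vx, vy = center[0] + dx, center[1] + dy
                if abs(vx) > d or abs(vy) > d:
                    continue         # stay inside the search window
                cost = mse(vx, vy)
                checked += 1
                if cost < best:
                    best, winner = cost, (vx, vy)
        center = winner              # move the grid to the winning point
        step //= 2                   # shrink each side of the grid by half
    return center, checked


# Synthetic reference frame (an assumption made for this demo only).
ref = [[(7 * x + 13 * y) % 256 for x in range(48)] for y in range(48)]
top, left, b = 20, 20, 8
true_vx, true_vy = 4, -4
cur_block = [[ref[top + y + true_vy][left + x + true_vx] for x in range(b)]
             for y in range(b)]
mv, checked = three_step_search(make_mse(cur_block, ref, top, left))
# For this frame the search recovers (4, -4) after 9 + 8 + 8 = 25 search points.
```

Because the grid center of steps 2 and 3 is the previous winning point, only eight new MSEs are needed in each of those steps, which is where the count 9 + 8 + 8 comes from.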

2.2 The E3SS Algorithm

As shown in Fig. 2, a 15 × 15 search window is again adopted as the example. Fig. 2 shows an example of the search patterns used in the first two computations of MSEs in the E3SS algorithm [12]. The coarse search pattern in the E3SS algorithm uses the search pattern of the TSS scheme and the small diamond search pattern (SDSP) [22] simultaneously, where the five black circles in Fig. 2 denote the SDSP. That is, the MSEs of at least 13 search points are computed: nine points evenly distributed in the search window and spaced four pixels apart, together with five points, (0, 0), (0, 1), (1, 0), (0, −1), and (−1, 0), clustered around the center of the search window and forming an SDSP, where the origin (0, 0) is calculated only once. If the winning point is located at the origin of the search window, the origin is output and its position is taken as the motion vector. If the winning point is one of the other four black circles around the origin shown in Fig. 2, we move the SDSP such that its new center is located at the winning point, and we repeat this pattern movement until the winning point is located at the center of the SDSP. Otherwise, when the winning point is one of the eight black square points of the 9 × 9 outer grid shown in Fig. 2, the subsequent process is the same as in the TSS scheme. The E3SS algorithm is described as follows.

Step 1: Compute the MSEs of 13 (= 5 + 9 − 1) search points, namely the 5 points of the SDSP and the 9 points evenly distributed over the search window, where the point at the origin is shared by both search patterns. If the obtained winning point is at the origin of the search window, output the origin and its position as the motion vector. Otherwise, if the winning point is on the SDSP (except for the origin), go to step 2; if the winning point is located on the outer grid (see the eight black square points in Fig. 2), go to step 3.

Step 2: Set the winning point obtained by the last computation of MSEs, e.g. the winning point located at (1, 0), as the center of the new SDSP and compute the MSEs of the new search points, e.g. the points located at (1, −1), (1, 1), and (2, 0). According to the winning point obtained from the new search points, repeat the same SDSP-based movement and computation of MSEs until the winning point is located at the center of the SDSP. Output the winning point at the SDSP's center and its position as the motion vector. Stop the E3SS algorithm.

Step 3: Move the center of the 9 × 9 square grid to the winning point obtained by step 1 and shrink each side of the outer grid by half. The new shrunk grid is of size 5 × 5 and is marked by the black triangle points shown in Fig. 2. Compute the MSEs of the eight new search points. According to the winning point obtained from the new search points, repeat the moving and shrinking process until the 3 × 3 grid has been searched and cannot be shrunk any further. Output the current winning point and its position as the motion vector.

For a stationary block, whose motion vector is (0, 0), the E3SS algorithm reaches its lower bound of 13 search points. For a block with small motion within the central 5 × 5 area, it may cost 16 to 21 search points. In the worst case, it takes 29 search points.
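The branching among the three steps, and the search-point counts quoted above, can be checked with a small Python sketch. It is written under stated assumptions and is not the code of Jing and Chau: the names `make_mse` and `e3ss` are hypothetical, already evaluated points are cached so that each point is counted once, and the synthetic frame serves the demo only.

```python
def make_mse(cur_block, ref_frame, top, left):
    """Return mse(vx, vy) implementing Eq. (1) for one current block."""
    b = len(cur_block)

    def mse(vx, vy):
        return sum((cur_block[y][x] - ref_frame[top + y + vy][left + x + vx]) ** 2
                   for y in range(b) for x in range(b)) / (b * b)

    return mse


def e3ss(mse, d=7):
    """E3SS sketch; returns (motion_vector, number_of_distinct_points_checked)."""
    cache = {}

    def cost(p):
        if p not in cache:
            cache[p] = mse(p[0], p[1])
        return cache[p]

    def best_of(center, offsets):
        # The center keeps its place on ties; points outside the window are skipped.
        winner = center
        for dx, dy in offsets:
            p = (center[0] + dx, center[1] + dy)
            if abs(p[0]) <= d and abs(p[1]) <= d and cost(p) < cost(winner):
                winner = p
        return winner

    def grid(s):
        return [(dx, dy) for dy in (-s, 0, s) for dx in (-s, 0, s) if (dx, dy) != (0, 0)]

    sdsp = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    step = (d + 1) // 2
    # Step 1: 13 points = 5 SDSP points + 9 coarse-grid points - shared origin.
    winner = best_of((0, 0), sdsp + grid(step))
    if winner == (0, 0):
        return winner, len(cache)
    if winner in sdsp:
        # Step 2: move the SDSP until its center is the winning point.
        center = winner
        while True:
            winner = best_of(center, sdsp)
            if winner == center:
                return center, len(cache)
            center = winner
    # Step 3: continue as in the TSS scheme (5 x 5 grid, then 3 x 3 grid).
    center = winner
    step //= 2
    while step >= 1:
        center = best_of(center, grid(step))
        step //= 2
    return center, len(cache)


# Synthetic reference frame (demo assumption) and three test motions.
ref = [[(7 * x + 13 * y) % 256 for x in range(48)] for y in range(48)]
top, left, b = 20, 20, 8

def run(true_vx, true_vy):
    cur = [[ref[top + y + true_vy][left + x + true_vx] for x in range(b)]
           for y in range(b)]
    return e3ss(make_mse(cur, ref, top, left))

stationary = run(0, 0)   # stops after step 1 with 13 points
small = run(1, 0)        # one SDSP move: 13 + 3 = 16 points
large = run(4, -4)       # TSS branch: 13 + 8 + 8 = 29 points
```

The three demo cases reproduce the lower bound of 13, the smallest small-motion cost of 16, and the worst case of 29 search points stated in the text.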

3. OUR PROPOSED PATSS SEARCH SCHEME AND INCORPORATING IT WITH E3SS ALGORITHM

In this section, we first present the proposed PATSS search scheme. Second, we describe how to incorporate the proposed PATSS search scheme with the recently published E3SS motion estimation algorithm by Jing and Chau [12] in order to reduce the number of search points and improve the image quality.

3.1 The Proposed PATSS Search Scheme

Exploiting the spatial locality property in the current frame and the temporal locality property between the reference frame and the current frame, this subsection presents a novel prediction- and affine transformation-based (PA-based) search scheme. The first stage in speeding up the motion estimation for the current block B(i, j, k) in the kth frame is to find a suitable coarse search pattern such that the search for the motion vector of B(i, j, k) can proceed more efficiently. As shown in Fig. 3, according to these two locality properties, the motion vectors of the five blocks B(i − 1, j − 1, k), B(i, j − 1, k), B(i + 1, j − 1, k), B(i − 1, j, k), and B(i, j, k − 1) are used to determine our proposed PA-based coarse search pattern P(i, j, k); the corresponding five motion vectors are denoted by MV(i − 1, j − 1, k) = (X_1, Y_1), MV(i, j − 1, k) = (X_2, Y_2), MV(i + 1, j − 1, k) = (X_3, Y_3), MV(i − 1, j, k) = (X_4, Y_4), and MV(i, j, k − 1) = (X_5, Y_5), respectively. For the following exposition, let M = (M_x, M_y) = MV(i − 1, j − 1, k) ∪ MV(i, j − 1, k) ∪ MV(i + 1, j − 1, k) ∪ MV(i − 1, j, k) ∪ MV(i, j, k − 1) denote the set of these five motion vectors.

The PA-based coarse search pattern P(i, j, k) (see the window pointed to by arrows in Fig. 3) is mainly determined by three affine transformation-based parameters, namely the translation factor, the scaling factor, and the rotation factor. First, the translation factor is used to determine an approximate motion vector of the current block, and the center of the coarse search pattern is translated by this motion vector. Next, the scaling factor controls the size of the coarse search pattern, so that the pattern can be made as small as possible. Further, rotating the coarse search pattern by the rotation factor allows it to cover more feasible search points, which may lead to a smaller MSE. The PA-based coarse search pattern determined by the three parameters is quite different from the one used in the TSS scheme, but it can reduce the number of search points in motion estimation and improve the quality of the resulting decompressed image.

Fig. 3. Predicting the coarse search pattern by utilizing four spatial neighboring motion vectors and one temporal neighboring motion vector.

The translation factor (TF) is used as the new center of the PA-based coarse search pattern and is naturally defined by

\[
\mathrm{TF} = \left( \frac{X_1 + X_2 + \cdots + X_5}{5}, \frac{Y_1 + Y_2 + \cdots + Y_5}{5} \right) \tag{2}
\]

which moves the center of the coarse search pattern to the mean of the four spatial motion vectors and one temporal motion vector shown in Fig. 3. From Fig. 2, it is known that a 15 × 15 (= (2d + 1) × (2d + 1)) search window is used, where the search distance d = 7. The coarse search pattern in the TSS scheme is a (d + 2) × (d + 2) (= 9 × 9) square grid which consists of nine search points. However, this coarse search pattern may not be the best one for slow motion blocks, since the search pattern can be shrunk to reduce the number of required search points. Therefore, the size of the PA-based coarse search pattern is bounded by the following maximum and minimum values along the x axis and y axis in the set of motion vectors M:

\[
\begin{aligned}
X_{\min}^{(i,j)} &= \min\{X_1, X_2, X_3, X_4, X_5\}, & X_{\max}^{(i,j)} &= \max\{X_1, X_2, X_3, X_4, X_5\}, \\
Y_{\min}^{(i,j)} &= \min\{Y_1, Y_2, Y_3, Y_4, Y_5\}, & Y_{\max}^{(i,j)} &= \max\{Y_1, Y_2, Y_3, Y_4, Y_5\},
\end{aligned}
\]

where, as mentioned before, (X_1, Y_1), (X_2, Y_2), (X_3, Y_3), and (X_4, Y_4) denote the four spatial motion vectors of the neighboring blocks of the current block, and (X_5, Y_5) denotes the temporal motion vector of the reference block in the reference frame. According to these minimum and maximum values along the x axis and y axis, we can scale the size of the PA-based coarse search pattern from (d + 2) × (d + 2) to (X_max^{(i,j)} − X_min^{(i,j)} + 1) × (Y_max^{(i,j)} − Y_min^{(i,j)} + 1), and the scaling factor (SF) of the PA-based coarse search pattern over the original search pattern in the TSS scheme is defined by

\[
\mathrm{SF} = \left( \frac{X_{\max}^{(i,j)} - X_{\min}^{(i,j)} + 1}{d + 2}, \frac{Y_{\max}^{(i,j)} - Y_{\min}^{(i,j)} + 1}{d + 2} \right). \tag{3}
\]

After TF and SF have been determined, the rotation factor (RF) is calculated as follows. For the set M of the five known motion vectors, the (p + q)th order moment m_pq [8] is defined by

\[
m_{pq} = \sum_{(x, y) \in M} x^p y^q f(x, y)
\]

where f(x, y) = 1 and p and q are nonnegative integers. In fact, the position (m_10/m_00, m_01/m_00) is the centroid of the four known spatial and one known temporal neighboring motion vectors. The (p + q)th order central moment u_pq is defined by

\[
u_{pq} = \sum_{(x, y) \in M} \left( x - \frac{m_{10}}{m_{00}} \right)^p \left( y - \frac{m_{01}}{m_{00}} \right)^q f(x, y).
\]

The central moments are a good way to determine the angle of the major axis of an object. Therefore, we use them to calculate the orientation of the PA-based coarse search pattern, i.e. RF, and the value of RF is determined by

\[
\mathrm{RF} = \frac{1}{2} \arctan \frac{2 u_{11}}{u_{20} - u_{02}}. \tag{4}
\]

After describing how to determine the three parameters for the PA-based coarse search pattern, we now take an example to explain how to determine the PA-based search pattern via Eqs. (2), (3), and (4). Returning to Fig. 2, we have d = 7. Suppose the five known spatial and temporal neighboring motion vectors are (7, −1), (5, −5), (2, −3), (−1, −1), and (−3, −5). From the above definition, the maximum and minimum values along the x axis and y axis are X_min^{(i,j)} = −3, X_max^{(i,j)} = 7, Y_min^{(i,j)} = −5, and Y_max^{(i,j)} = −1. By Eqs. (2), (3), and (4), the three parameters, TF, SF, and RF, can be calculated by

\[
\begin{aligned}
\mathrm{TF} &= \left( \frac{X_1 + X_2 + \cdots + X_5}{5}, \frac{Y_1 + Y_2 + \cdots + Y_5}{5} \right) \\
&= \left( \frac{7 + 5 + 2 + (-1) + (-3)}{5}, \frac{(-1) + (-5) + (-3) + (-1) + (-5)}{5} \right) = (2, -3), \\
\mathrm{SF} &= \left( \frac{X_{\max}^{(i,j)} - X_{\min}^{(i,j)} + 1}{d + 2}, \frac{Y_{\max}^{(i,j)} - Y_{\min}^{(i,j)} + 1}{d + 2} \right) \\
&= \left( \frac{7 - (-3) + 1}{7 + 2}, \frac{-1 - (-5) + 1}{7 + 2} \right) = \left( \frac{11}{9}, \frac{5}{9} \right), \\
\mathrm{RF} &= \frac{1}{2} \arctan \frac{2 u_{11}}{u_{20} - u_{02}} \\
&= \frac{1}{2} \arctan \frac{2 \left( (5)(2) + (3)(-2) + (0)(0) + (-3)(2) + (-5)(-2) \right)}{\left( (5)^2 + (3)^2 + (0)^2 + (-3)^2 + (-5)^2 \right) - \left( (2)^2 + (-2)^2 + (0)^2 + (2)^2 + (-2)^2 \right)} \\
&= 0.149249 = 8.55^\circ.
\end{aligned}
\]

According to the calculated values of the above three parameters, the center of the PA-based search pattern is moved from (0, 0) to (2, −3) due to TF = (2, −3); the width and the height of the PA-based search pattern are 11 (= 9 × 11/9) and 5 (= 9 × 5/9), respectively, due to SF = (11/9, 5/9); and the positions of the nine search points are determined by rotating the search pattern by 8.55°. Consequently, the coarse search pattern found by our proposed PATSS scheme is shown in Fig. 4. In the next section, experimental results will show that the execution time required to determine the PA-based coarse search pattern is negligible when compared to the search time required in block motion estimation.
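The worked example above can be reproduced with the following Python sketch of Eqs. (2), (3), and (4). Several points are assumptions made here and are not stated in the paper: the function names, the use of `math.atan2` in place of the arctangent of a quotient (it avoids a division by zero and agrees with Eq. (4) whenever u_20 > u_02, as in this example), and the rounding of the rotated search points to integer positions.

```python
import math


def pa_parameters(mvs, d=7):
    """Return (TF, SF, RF, (width, height)) for a list of neighboring motion vectors."""
    n = len(mvs)
    xs = [x for x, _ in mvs]
    ys = [y for _, y in mvs]
    tf = (sum(xs) / n, sum(ys) / n)                    # Eq. (2): mean motion vector
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    sf = (width / (d + 2), height / (d + 2))           # Eq. (3)
    # Central moments about the centroid, which coincides with TF since f(x, y) = 1.
    u11 = sum((x - tf[0]) * (y - tf[1]) for x, y in mvs)
    u20 = sum((x - tf[0]) ** 2 for x in xs)
    u02 = sum((y - tf[1]) ** 2 for y in ys)
    rf = 0.5 * math.atan2(2 * u11, u20 - u02)          # Eq. (4), atan2 is an assumption
    return tf, sf, rf, (width, height)


def pa_coarse_pattern(tf, size, rf):
    """Nine points of a width x height rectangle centered at TF and rotated by RF.

    Rounding to the nearest integer position is an assumption of this sketch.
    """
    half_w, half_h = (size[0] - 1) / 2, (size[1] - 1) / 2
    c, s = math.cos(rf), math.sin(rf)
    points = []
    for j in (-1, 0, 1):
        for i in (-1, 0, 1):
            ox, oy = i * half_w, j * half_h
            p = (round(tf[0] + ox * c - oy * s), round(tf[1] + ox * s + oy * c))
            if p not in points:                        # overlapping points count once
                points.append(p)
    return points


example = [(7, -1), (5, -5), (2, -3), (-1, -1), (-3, -5)]
tf, sf, rf, size = pa_parameters(example)
points = pa_coarse_pattern(tf, size, rf)
# tf is (2.0, -3.0), sf is (11/9, 5/9), and rf is about 0.149249 rad (8.55 degrees).
```

When the neighboring motion vectors are nearly identical, the rectangle collapses and several of the nine points coincide, which is why duplicates are dropped in `pa_coarse_pattern`.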

To avoid prediction failure for a stationary block whose neighboring blocks have large motions, the MSE between the current frame and the reference frame at the origin of the search window (see the black rhombus point in Fig. 4) is computed first. If the calculated MSE is less than the threshold T, the origin, position (0, 0), is output as the motion vector; otherwise we compute the MSEs of the nine search points on the PATSS-based rectangular search pattern and then move the center of the PATSS-based rectangle to the winning point, shrinking each side of the rectangle by half in each further computation of MSEs. If the shrunk rectangle is still large enough that no overlap happens, eight new search points are taken into consideration; otherwise fewer search points, (3 × 2 − 1), (3 × 1 − 1), (2 × 2 − 1), or (2 × 1 − 1), may need to be considered. The moving and shrinking process is repeated until no new search points appear.

In order to improve the image quality, a value δ is used to enlarge the PATSS-based rectangular search pattern so that it can cover true motion vectors missed by the predicted rectangular search pattern. In the coarse search, δ is not used. After the rectangular search pattern is shrunk by half for the first time, we add δ to each side of the rectangle and then compute the MSEs of the new rectangular search pattern. In the later shrinking processes, δ is not added again. In other words, we add δ only once, before the second computation of MSEs. For example, if the coarse search pattern of the PATSS-based rectangular search pattern is a 9 × 5 rectangle, the second search pattern computes the MSEs of a (5 + δ) × (3 + δ) rectangular search pattern and the third search pattern computes the MSEs of a (3 + δ/2) × (1 + δ/2) rectangular search pattern. Empirically, the value of δ is set to 3 for PATSS and 4 for PAE3SS.

Fig. 4. An example of the coarse search pattern of our proposed PATSS scheme.

Fig. 5. An example of the coarse search pattern of the proposed PAE3SS scheme.

Our proposed PATSS search scheme is described below.

Step 1: Compute the MSE between the current frame and the reference frame at the origin of the search window. If the calculated MSE is less than the threshold T, stop the PATSS search scheme and output the origin as the motion vector of the current block.

Step 2: According to the four spatial and one temporal neighboring motion vectors of the current block, calculate the three prediction- and affine transformation-based parameters, TF, SF, and RF, by Eqs. (2), (3), and (4), and then determine the PATSS-based rectangular search pattern.

Step 3: Perform the coarse search on the PATSS-based rectangular search pattern. That is, we first compute the MSEs of the nine search points on the rectangular search pattern, or of fewer search points when overlap happens, and then obtain the winning point.

Step 4: Move the center of the rectangular search pattern to the winning point obtained by the last computation of MSEs and shrink each side of the rectangle by half. If this is the first shrinking, δ is added to each side after shrinking. Compute the MSEs of the eight or fewer new search points. According to the winning point obtained from these new search points, repeat the moving and shrinking process until the rectangular search pattern cannot be shrunk any further. Output the position of the final winning point as the motion vector.
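The way δ enters the shrinking process of step 4 can be illustrated by listing the rectangle sizes used in successive computations of MSEs. The Python sketch below is an interpretation inferred from the 9 × 5 example in the text and is not taken from the authors' code: it assumes odd side lengths, halves the half-extent of each side (so that 9 → 5 → 3 and 5 → 3 → 1), adds δ at the first shrinking, and halves the added amount afterwards. The function name `patss_size_schedule` is hypothetical.

```python
def patss_size_schedule(width, height, delta):
    """List the (width, height) of the search rectangle for each computation of MSEs.

    Assumes odd side lengths; the stopping rule (no shrinking after both
    half-extents have dropped to 1 or less) is an assumption of this sketch.
    """
    half_w, half_h = (width - 1) // 2, (height - 1) // 2
    sizes = [(width, height)]          # coarse search: delta is not used yet
    extra = delta                      # amount added to each side
    while max(half_w, half_h) > 1:
        half_w //= 2                   # shrink each side by half
        half_h //= 2
        sizes.append((2 * half_w + 1 + extra, 2 * half_h + 1 + extra))
        extra /= 2                     # the added delta shrinks with the rectangle
    return sizes


# The example from the text with delta = 3 (the empirical value for PATSS):
schedule = patss_size_schedule(9, 5, 3)
# -> [(9, 5), (8, 6), (4.5, 2.5)], i.e. 9 x 5, (5 + 3) x (3 + 3), (3 + 1.5) x (1 + 1.5)
```

With δ = 0 and a 9 × 9 coarse pattern, the same schedule reduces to the 9 × 9, 5 × 5, and 3 × 3 grids of the original TSS scheme.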

3.2 Incorporating PATSS with E3SS Algorithm

In this subsection, we present how to incorporate the proposed PATSS search scheme into existing well-known motion estimation algorithms, such as the recently published E3SS algorithm [12], which is taken as the representative, to improve the image quality and reduce the number of search points.

The search scheme used in the recently published E3SS algorithm [12] is a hybrid search scheme consisting of the SDSP search pattern and the TSS search scheme. The proposed PATSS-based E3SS (PAE3SS) algorithm mainly replaces the TSS search scheme in the E3SS algorithm with the PATSS search scheme. The SDSP search pattern used in the E3SS algorithm is translated to the center of our proposed PATSS-based rectangular search pattern. However, the translated SDSP pattern is not scaled, in order to preserve its efficiency when the center of the PATSS-based rectangular search pattern is near the true motion vector. Since the search pattern of the translated SDSP is compact, rotating it is unnecessary. The predicted PATSS search pattern associated with the translated SDSP is shown in Fig. 5, where the translated SDSP is denoted by the set of five circles and the square points represent the PATSS-based coarse search pattern.

Based on the proposed PAE3SS search scheme shown in Fig. 5, our proposed PAE3SS algorithm for motion estimation is described below. Since step 1 is quite similar to the one in the proposed PATSS search scheme described in section 3.1, we omit step 1 in the proposed PAE3SS algorithm.

Step 2: According to the four spatial and one temporal neighboring motion vectors of the current block, calculate the three prediction- and affine transformation-based parameters, TF, SF, and RF, by Eqs. (2), (3), and (4). Then determine the PAE3SS-based rectangular search pattern and construct an SDSP search pattern with five search points around the center of the PAE3SS-based rectangular search pattern.

Step 3: Perform the coarse search on the nine search points of the PAE3SS-based rectangular search pattern and on the four non-center search points of the SDSP. The winning point is thus obtained. If the winning point is at the center of the SDSP pattern, output the position of the SDSP center as the motion vector. Otherwise, if the winning point is on the SDSP (excluding the center of the SDSP), go to step 4; if the winning point is located on the outer rectangular search pattern, go to step 5.

Step 4: Perform the same search operation as in step 2 of the E3SS algorithm (see subsection 2.2), except that the location of the initial SDSP search pattern is translated according to the translation factor TF given by Eq. (2).

Step 5: Perform the same search operation in step 4 of the PATSS-based search scheme described in subsection 3.1.
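The three-way branch in step 3 can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the SDSP consists of the center and its four unit-distance horizontal and vertical neighbors, that `cost` is a hypothetical callable returning the MSE of a candidate, and that ties are resolved in favor of the center. The function name and the returned labels are invented for the example.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[int, int]


def coarse_search(center: Point, outer_points: Iterable[Point],
                  cost: Callable[[Point], float]) -> Tuple[Point, str]:
    """Evaluate the rectangle and SDSP points and decide which step follows."""
    cx, cy = center
    # Assumed SDSP shape: the four unit-distance neighbors of the center.
    sdsp: List[Point] = [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
    outer = [p for p in outer_points if p != center and p not in sdsp]
    # The center is listed first so that min() prefers it when costs tie.
    candidates = [center] + sdsp + outer
    winner = min(candidates, key=cost)
    if winner == center:
        return winner, "output"   # center wins: it is the motion vector
    if winner in sdsp:
        return winner, "step4"    # continue with the SDSP-based search
    return winner, "step5"        # continue with the PATSS refinement
```

For example, with `outer_points` set to the eight rectangle points around `(0, 0)` at distance 4 and a cost that is smallest at `(1, 0)`, the sketch returns `((1, 0), "step4")`.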

4. EXPERIMENTAL RESULTS

The experiments use the “tennis” and “Susie” video sequences in SIF 352 × 240 format with 61 and 75 frames, respectively, the “Akiyo” video sequence in CIF 352 × 288 format with 60 frames, and the “football,” “tennis,” and “Susie” video sequences in ITU601 720 × 480 format with 59, 39, and 99 frames, respectively, to compare the computational performance and the image quality. The “football” sequence mainly contains large motion; the “tennis” sequence contains large motion of the ball and small motion of the background before camera zooming. The “Akiyo” and “Susie” sequences mainly contain small motion. The programs are implemented in Borland C++ Builder 6.0 and run on a Pentium 4 3.2 GHz PC with 1 GB RAM.

Table 1. Performance evaluation for SIF and CIF sequences.

| Algorithm | Tennis (SIF) MSE | Tennis (SIF) search points | Susie (SIF) MSE | Susie (SIF) search points | Akiyo (CIF) MSE | Akiyo (CIF) search points |
|---|---|---|---|---|---|---|
| FS | 91.613 | 859.455 | 20.702 | 859.455 | 3.646 | 869.333 |
| TSS | 157.821 | 30.633 | 27.352 | 30.484 | 3.847 | 30.616 |
| PATSS (δ = 3) | 113.318 | 19.289 | 21.888 | 18.566 | 3.652 | 6.364 |
| E3SS | 127.490 | 19.543 | 23.961 | 16.905 | 3.679 | 10.243 |
| PAE3SS (δ = 3) | 109.736 | 11.427 | 21.808 | 9.613 | 3.666 | 2.447 |


Table 1 shows the results for the SIF and CIF sequences, where the block size is 16 × 16 and the search window size is 31 × 31. The threshold T and δ are set to 1 and 3, respectively. Compared with the TSS search scheme, our proposed PATSS search scheme saves 37% to 79% of the search points while reducing the MSE by 5% to 28%; it thus significantly outperforms the TSS search scheme in terms of both image quality and search speed. The same table shows that the PAE3SS search algorithm saves 42% to 76% of the search points while reducing the MSE by 0.3% to 14% when compared with the recently published E3SS search algorithm, whose performance is similar to that of the diamond search, as shown in [12]. The full search is also included because it gives the least distortion.

In addition, Table 2 shows the percentages of the search-point and MSE reductions obtained by using the threshold T to stop the search at the origin of the search window, by using TF to predict the initial search point, and by using SF and RF (i.e., SF + RF) to determine the new search pattern. From Table 2, it is observed that in the tennis and Susie sequences, using TF to predict the initial search point reduces the MSE effectively, and using SF and RF to determine the new search pattern significantly reduces the number of required search points. Since the motion vectors in the Akiyo sequence are very small, a large share of the search points is saved by using the threshold T and TF.

To save space, Fig. 6 depicts the detailed frame-by-frame comparisons only for the tennis sequence in SIF format. From Fig. 6, it is observed that the frame-by-frame performance is similar to the average performance. The influence of δ on the number of search points and the MSEs is shown in Table 3. When δ ≤ 3, the larger the value of δ, the larger the number of search points and the higher the image quality. In this paper, the value of δ is set to 3 since the MSE no longer improves consistently when δ > 3; for the tennis sequence, for example, it increases.
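The quoted ranges can be re-derived from the averages in Table 1. The short sketch below does this, assuming that each percentage is the relative reduction (baseline − proposed) / baseline; the names `TABLE1`, `reduction`, and `compare` are introduced only for this illustration.

```python
# (MSE, search points) per algorithm, copied from Table 1.
TABLE1 = {
    "Tennis (SIF)": {"TSS": (157.821, 30.633), "PATSS": (113.318, 19.289),
                     "E3SS": (127.490, 19.543), "PAE3SS": (109.736, 11.427)},
    "Susie (SIF)": {"TSS": (27.352, 30.484), "PATSS": (21.888, 18.566),
                    "E3SS": (23.961, 16.905), "PAE3SS": (21.808, 9.613)},
    "Akiyo (CIF)": {"TSS": (3.847, 30.616), "PATSS": (3.652, 6.364),
                    "E3SS": (3.679, 10.243), "PAE3SS": (3.666, 2.447)},
}


def reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


def compare(table: dict, baseline: str, proposed: str) -> dict:
    """Map each sequence to (MSE reduction %, search-point reduction %)."""
    result = {}
    for sequence, rows in table.items():
        base_mse, base_points = rows[baseline]
        prop_mse, prop_points = rows[proposed]
        result[sequence] = (reduction(base_mse, prop_mse),
                            reduction(base_points, prop_points))
    return result


if __name__ == "__main__":
    for base, prop in (("TSS", "PATSS"), ("E3SS", "PAE3SS")):
        for sequence, (mse, points) in compare(TABLE1, base, prop).items():
            print(f"{prop} vs {base}, {sequence}: "
                  f"MSE reduced by {mse:.1f}%, search points by {points:.1f}%")
```

Under this reading, the smallest and largest search-point savings of PATSS over TSS come from the tennis and Akiyo sequences, respectively, which matches the 37% to 79% range stated above.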

Fig. 6 panels (frame number on the horizontal axis): (a) MSE comparison for TSS and PATSS. (b) Search points comparison for TSS and PATSS. (c) MSE comparison for E3SS and PAE3SS. (d) Search points comparison for E3SS and PAE3SS.

Fig. 6. Frame-by-frame comparisons for the tennis sequence in SIF format under a 31 × 31 search window.


Table 2. The percentages of the search-point and MSE reductions obtained by using the threshold T, TF, and SF + RF for SIF and CIF sequences.

| Component | Tennis (SIF) MSE | Tennis (SIF) search points | Susie (SIF) MSE | Susie (SIF) search points | Akiyo (CIF) MSE | Akiyo (CIF) search points |
|---|---|---|---|---|---|---|
| PATSS (T) | 0% | 0% | 0% | 0% | 0% | 19.57% |
| PATSS (TF) | 68.55% | 17.88% | 78.67% | 8.47% | 88.21% | 59.02% |
| PATSS (SF+RF) | 31.45% | 82.12% | 21.33% | 91.53% | 11.79% | 21.41% |
| PAE3SS (T) | 0% | 0% | 0% | 0% | 0% | 37.84% |
| PAE3SS (TF) | 85.81% | 17.39% | 87.09% | 21.39% | 92.31% | 41.28% |
| PAE3SS (SF+RF) | 14.19% | 82.61% | 12.91% | 78.61% | 7.69% | 20.88% |

Table 3. Number of search points and the MSEs for different values of δ for SIF and CIF sequences.

| Algorithm | Tennis (SIF) MSE | Tennis (SIF) search points | Susie (SIF) MSE | Susie (SIF) search points | Akiyo (CIF) MSE | Akiyo (CIF) search points |
|---|---|---|---|---|---|---|
| PATSS (δ = 1) | 136.386 | 9.907 | 25.011 | 9.553 | 3.681 | 3.647 |
| PATSS (δ = 2) | 120.815 | 16.910 | 22.241 | 16.349 | 3.659 | 6.270 |
| PATSS (δ = 3) | 113.318 | 19.289 | 21.888 | 18.566 | 3.652 | 6.364 |
| PATSS (δ = 4) | 118.577 | 24.248 | 21.738 | 23.466 | 3.659 | 8.904 |
| PAE3SS (δ = 1) | 127.474 | 8.827 | 23.748 | 7.989 | 3.669 | 2.427 |
| PAE3SS (δ = 2) | 114.869 | 10.479 | 22.028 | 8.982 | 3.667 | 2.440 |
| PAE3SS (δ = 3) | 109.736 | 11.427 | 21.808 | 9.613 | 3.666 | 2.447 |
| PAE3SS (δ = 4) | 111.719 | 12.398 | 21.718 | 10.114 | 3.666 | 2.453 |

Table 4. Performance evaluation for ITU601 sequences.

| Algorithm | Tennis MSE | Tennis search points | Susie MSE | Susie search points | Football MSE | Football search points |
|---|---|---|---|---|---|---|
| FS | 149.412 | 910 | 16.285 | 910 | 193.301 | 910 |
| TSS | 204.646 | 31.759 | 22.617 | 31.774 | 245.210 | 31.763 |
| PATSS (δ = 3) | 167.776 | 19.785 | 18.686 | 21.153 | 242.592 | 20.798 |
| E3SS | 195.285 | 19.730 | 20.991 | 20.192 | 244.088 | 21.845 |
| PAE3SS (δ = 3) | 168.519 | 10.162 | 18.945 | 12.827 | 242.920 | 15.677 |

Table 4 shows the results for the ITU601 sequences. Compared with the TSS search scheme, our proposed PATSS algorithm saves 33% to 38% of the search points while reducing the MSE by 1% to 18%. The results also show that the PAE3SS search algorithm saves 28% to 48% of the search points while reducing the MSE by 1% to 14% when compared with the recently published E3SS search algorithm. To save space, we do not show the percentages of the search-point and MSE reductions obtained by using the threshold T, TF, SF, and RF for the ITU601 sequences. In fact, these percentages are very similar to those for the SIF and CIF sequences. For the same reason, the influence of δ on the number of search points and the MSEs in the ITU601 sequences is not provided here.
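The FS rows of Tables 1 and 4 report fewer than 31 × 31 = 961 search points per block. One plausible explanation, sketched below, is that candidates falling outside the frame are skipped for blocks near the border. The sketch assumes 16 × 16 blocks on a regular grid, displacements of −15 to +15 in each direction, and clipping at the frame boundary; the function names are hypothetical.

```python
def axis_candidates(length: int, block: int, search_range: int):
    """Return (total valid displacements over all blocks, number of blocks) along one axis."""
    total = 0
    blocks = 0
    for start in range(0, length - block + 1, block):
        lowest = max(-search_range, -start)                    # stay inside the left/top edge
        highest = min(search_range, length - block - start)    # stay inside the right/bottom edge
        total += highest - lowest + 1
        blocks += 1
    return total, blocks


def average_fs_points(width: int, height: int,
                      block: int = 16, search_range: int = 15) -> float:
    """Average number of full-search candidates per block with border clipping."""
    total_x, blocks_x = axis_candidates(width, block, search_range)
    total_y, blocks_y = axis_candidates(height, block, search_range)
    # Candidates of a block form a Cartesian product, so the double sum factorizes.
    return (total_x * total_y) / (blocks_x * blocks_y)


if __name__ == "__main__":
    for name, (w, h) in {"SIF": (352, 240), "CIF": (352, 288), "ITU601": (720, 480)}.items():
        print(f"{name}: {average_fs_points(w, h):.3f}")
```

Under these assumptions the sketch reproduces the tabulated FS values of 859.455 for SIF, 869.333 for CIF, and 910 for ITU601, which suggests that the search-point counts in the tables are per-block averages over clipped search windows.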

Besides the TSS algorithm and the E3SS algorithm, the FAME algorithm [1] has been considered in the experiments to evaluate the performance of our proposed PATSS algorithm and PAE3SS algorithm. The experimental results show that, at the same MSE level, the number of search points required by the FAME algorithm lies between those of the PATSS algorithm and the PAE3SS algorithm when the video sequences contain small and moderate motion. For example, the number of search points in the FAME algorithm is 12.321 (17.848) for the SIF (ITU601) Susie sequence and 15.839 (15.248) for the SIF (ITU601) tennis sequence. If most motion vectors in the video sequence are large, the number of search points required by the FAME algorithm is slightly larger than those of the PATSS algorithm and the PAE3SS algorithm.

Table 5 indicates that the preprocessing time accounts for only a small percentage of the total time and is therefore negligible. In Table 5, the preprocessing time denotes the execution time required to determine the predicted PATSS-based rectangular search pattern by our proposed method; the total execution time denotes the execution time required by the proposed PATSS algorithm.

Table 5. Total execution time and preprocessing time requirement (seconds).

| Sequence | Total execution time | Preprocessing time | Percentage of preprocessing time over the total time |
|---|---|---|---|
| Tennis (SIF) | 1.560 | 0.047 | 3.01% |
| Susie (SIF) | 1.859 | 0.062 | 3.33% |
| Akiyo (CIF) | 0.794 | 0.032 | 4.03% |
| Football (ITU601) | 8.656 | 0.219 | 2.53% |
| Tennis (ITU601) | 5.271 | 0.199 | 3.78% |
| Susie (ITU601) | 12.691 | 0.375 | 2.95% |
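As a quick check of the last column of Table 5, the share of preprocessing can be recomputed from the two timing columns. The snippet is a plain arithmetic sketch; the names `TABLE5` and `preprocessing_share` are introduced here, and small differences in the last digit are expected because the published timings are themselves rounded.

```python
# (total execution time, preprocessing time, published percentage), from Table 5.
TABLE5 = {
    ("Tennis", "SIF"): (1.560, 0.047, 3.01),
    ("Susie", "SIF"): (1.859, 0.062, 3.33),
    ("Akiyo", "first group"): (0.794, 0.032, 4.03),
    ("Football", "ITU601"): (8.656, 0.219, 2.53),
    ("Tennis", "ITU601"): (5.271, 0.199, 3.78),
    ("Susie", "ITU601"): (12.691, 0.375, 2.95),
}


def preprocessing_share(total_seconds: float, preprocessing_seconds: float) -> float:
    """Preprocessing time as a percentage of the total execution time."""
    return 100.0 * preprocessing_seconds / total_seconds


if __name__ == "__main__":
    for (sequence, group), (total, pre, published) in TABLE5.items():
        share = preprocessing_share(total, pre)
        print(f"{sequence} ({group}): {share:.2f}% (published {published:.2f}%)")
```

In every row the recomputed share stays at roughly 4% or less, which is consistent with treating the preprocessing cost as negligible.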

5. CONCLUSION

In this paper, the prediction- and affine transformation-based TSS (PATSS) search scheme has been presented. Owing to the prediction accuracy of the PA-based coarse search pattern, experimental results for SIF and CIF (ITU601) sequences demonstrate that, compared with the TSS search scheme, the proposed PATSS search scheme saves 37% to 79% (33% to 38%) of the search points while reducing the MSE by 5% to 28% (1% to 18%). After incorporating the proposed PATSS search scheme into the existing E3SS search algorithm, experimental results demonstrate that our proposed PAE3SS search algorithm outperforms the existing E3SS search algorithm in both time and image quality, and it inherits the advantages of the proposed PATSS search scheme.

In addition, the proposed PATSS search scheme could be considered as one of the candidate search patterns in adaptive fast block-matching algorithms that switch search patterns [1, 10].

REFERENCES

1. I. Ahmad, W. G. Zheng, J. C. Luo, and M. Liou, “A fast adaptive motion estimation algorithm,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, 2006, pp. 420-438.

2. L. C. Chang, K. L. Chung, and T. C. Yang, “An improved search algorithm for motion estimation using adaptive search order,” IEEE Signal Processing Letters, Vol. 8, 2001, pp. 129-130.

3. C. Y. Chen, C. T. Huang, Y. H. Chen, and L. G. Chen, “Level C+ data reuse scheme for motion estimation with corresponding coding orders,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, 2006, pp. 553-558.

4. L. G. Chen, W. T. Chen, Y. S. Jehng, and T. D. Chiueh, “An efficient parallel motion estimation algorithm for digital image processing,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 1, 1991, pp. 378-384.

5. M. J. Chen, C. S. Chen, and M. C. Chi, “Temporal error concealment algorithm by recursive block-matching principle,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, 2005, pp. 1385-1393.

6. C. H. Cheung and L. M. Po, “A novel cross-diamond search algorithm for fast block motion estimation,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, 2002, pp. 1168-1177.

7. K. L. Chung and L. C. Chang, “A new predictive search area approach for block motion estimation,” IEEE Transactions on Image Processing, Vol. 12, 2003, pp. 648-652.

8. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Section 11.1.2: Polygonal Approximations, 2nd ed., Prentice Hall, New York, 2002.

9. C. T. Huang, C. Y. Chen, Y. H. Chen, and L. G. Chen, “Memory analysis of VLSI architecture for 5/3 and 1/3 motion-compensated temporal filtering,” in Proceedings of IEEE International Conference on Acoustics Speech and Signal Processing, 2005, p. 93.

10. S. Y. Huang, C. Y. Cho, and J. S. Wang, “Adaptive fast block-matching algorithm by switching search patterns,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, 2005, pp. 1373-1384.

11. J. R. Jain and A. K. Jain, “Displacement measurement and its application in inter-frame image coding,” IEEE Transactions on Communications, Vol. COM-29, 1981, pp. 1799-1808.

12. X. Jing and L. P. Chau, “An efficient three-step search algorithm for block motion estimation,” IEEE Transactions on Multimedia, Vol. 6, 2004, pp. 435-438.

13. T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion compensated interframe coding for video conferencing,” in Proceedings of National Telecommunication Conference, 1981, pp. G5.3.1-G5.3.5.

14. Y. C. Lin and S. C. Tai, “Fast full-search block-matching algorithm for motion-compensated video compression,” IEEE Transactions on Communications, Vol. 45, 1997, pp. 527-531.

15. R. Li, B. Zeng, and M. L. Liou, “A new three-step search algorithm for block motion estimation,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 4, 1994, pp. 438-442.

16. L. K. Liu and E. Feig, “A block-based gradient descent search algorithm for block motion estimation in video coding,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, 1996, pp. 419-423.

17. V. A. Nguyen and Y. P. Tan, “Efficient block-matching motion estimation based on integral frame attributes,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, 2006, pp. 375-385.


18. Y. Nie and K. K. Ma, “Adaptive rood pattern search for fast block-matching motion estimation,” IEEE Transactions on Image Processing, Vol. 11, 2002, pp. 1442-1449.

19. L. M. Po and W. C. Ma, “A novel four-step search algorithm for fast block motion estimation,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, 1996, pp. 313-317.

20. J. Y. Tham, S. Ranganath, M. Ranganath, and A. A. Kassim, “A novel unrestricted center-biased diamond search algorithm for block motion estimation,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, 1998, pp. 369-377.

21. A. M. Tourapis, O. C. Au, and M. L. Liou, “Fast block-matching motion estimation using predictive motion vector field adaptive search technique (PMV-FAST),” ISO/IEC JTC1/SC29/WG11 MPEG2000/M5866, Noordwijkerhout, The Netherlands, 2000.

22. S. Zhu and K. K. Ma, “A new diamond search algorithm for fast block-matching motion estimation,” IEEE Transactions on Image Processing, Vol. 9, 2000, pp. 287-290.

Kuo-Liang Chung (鍾國亮) received the Ph.D. degree from National Taiwan Univ. Prof. Chung received the Distinguished Research Award (2004-2007) from the National Science Council, Taiwan. His research interests include image/video compression, image/video processing, and multimedia applications.

Ta-Jen Yao (姚達人) received the M.S. degree in Computer Science and Information Engineering from National Taiwan University of Science and Technology. His research interests include image processing and image compression.

Yong-Huai Huang (黃詠淮) received the M.S. degree in Computer Science and Information Engineering from National Taiwan University of Science and Technology. He is now pursuing the Ph.D. degree in the same department. His research interests include image processing, image/video compression, and multimedia applications.