Video Stabilization based on Point Feature Matching Technique
Labeeb Mohsin Abdullah, Nooritawati Md Tahir* & Mustaffa Samad
Faculty of Electrical Engineering, Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Selangor
*Corresponding author: norita_tahir@yahoo.com
Abstract This study proposes an algorithm to stabilize jittery videos directly, without the need to estimate camera motion. A stable output video is attained, free of the jitter caused by shaking the handheld camera during video recording. First, salient points are identified and processed in each frame of the input video, followed by optimizing and stabilizing the video. Optimization covers both the quality of the video stabilization and a smaller unfilled area after the stabilization process. The method showed good results in terms of stabilization and removed distortion from output videos recorded in different circumstances. Initial results show that the proposed technique is suitable for use and provides a great deal of stabilization.
Keywords: image processing, video stabilization, point feature
matching, salient points, image quality measurement
I. INTRODUCTION
Recently, the market for handheld cameras has grown rapidly. However, video captured by non-professional users often suffers from unanticipated effects, so many researchers have studied these drawbacks to enhance the quality of casual videos. Hardware stabilizers attached to the camera are currently an effective solution. On one hand, pre-processing techniques such as nonlinear filters are applied to discard unwanted noise. On the other hand, using multiple stages of pre- and post-processing can aggravate the existing problems because errors accumulate. Moreover, shortcomings remain when processing videos with complicated motion, such as multiple moving foreground objects [1].
Generally, the stabilization process goes through three phases, namely motion estimation, motion smoothing, and image composition [2]. The purpose of the first phase is to estimate the motion between frames. The estimated motion parameters obtained in the first phase are then passed to motion compensation, which removes the high-frequency distortion and calculates the global transformation needed to stabilize the current frame. Finally, image composition warps the frame under processing [9]. This three-step framework underlies most video stabilization algorithms.
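As a rough illustration of the three-phase pipeline described above, here is a minimal NumPy sketch (this is not the paper's implementation: the translation-only motion model, the search radius, and the moving-average smoothing window are all simplifying assumptions):

```python
import numpy as np

def estimate_translation(prev, curr, search=3):
    """Phase 1 (motion estimation): brute-force the integer shift (dy, dx)
    that minimizes the SSD between the shifted current frame and the
    previous frame."""
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(curr, dy, axis=0), dx, axis=1)
            cost = float(np.sum((shifted - prev) ** 2))
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

def smooth_path(path, window=3):
    """Phase 2 (motion smoothing): moving average over the cumulative
    motion path, keeping intentional motion and dropping high-frequency
    jitter."""
    kernel = np.ones(window) / window
    return np.stack([np.convolve(path[:, i], kernel, mode="same")
                     for i in range(path.shape[1])], axis=1)

def compose(frame, dy, dx):
    """Phase 3 (image composition): warp the frame (here a simple
    integer shift) to cancel the estimated jitter."""
    return np.roll(np.roll(frame, dy, axis=0), dx, axis=1)
```

In a full stabilizer, the per-frame shifts would be accumulated into a motion path, smoothed with `smooth_path`, and the difference between the raw and smoothed paths cancelled with `compose`.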
II. MATERIALS AND METHOD
This section presents an overview of the proposed methodology and its implementation, as depicted in Figure 1.
Figure 1: Overview of the proposed method of video stabilization. The flowchart proceeds from Start to End through the following steps: read frames from the recorded video sequence; identify salient points in each frame (Harris corner detection); select correspondences between points (sum of squared differences, SSD); find the initial points of two frames and estimate a transform from the noisy correspondences; apply a Gaussian filter, approximate and smooth the transform, and form the colour composite of the affine and S-R-T transform outputs; output the corrected frame sequence.
2012 IEEE Control and System Graduate Research Colloquium
(ICSGRC 2012)
978-1-4673-2036-8/12/$31.00 2012 IEEE 303
A. Identification of salient points from each frame & Harris Corner Detection
The main goal of this step is to correct the distortion between the two frames by finding a transformation, which is done by applying an object system that returns an affine transform [3]. The input to this stage must supply the object with a set of point correspondences between the two frames [4]. First, the desired points in the two chosen frames are identified, followed by selecting the common correspondences between the frames. At this point, the candidate points for each frame are identified, but to ensure that these points have corresponding points in the second frame, it is necessary to find points around salient image features, such as corners. Thus, a Corner Detector System Object is used to find corner values using Harris Corner Detection, one of the fastest algorithms for finding corner values.
B. Select correspondences between points & SSD
After the salient points in each frame are obtained, the correspondences between the previously identified points need to be picked [4]. For each point, the lowest-cost match between the points in frame A and frame B must be found. To do so, a 9 × 9 block of the frame image is taken around each point. The matching cost is the distance between frames A and B measured in pixels, and it is computed using the Sum of Squared Differences (SSD) between the consecutive frame images. Each point in frame A is compared with the points in frame B to find the lowest matching cost, in other words the shortest distance between them measured in pixels.
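A minimal sketch of this SSD matching step (the 9 × 9 block size follows the text; the explicit candidate list passed to `match_point` is an assumption made for illustration):

```python
import numpy as np

def ssd(block_a, block_b):
    """Sum of Squared Differences between two equal-sized blocks."""
    d = block_a.astype(float) - block_b.astype(float)
    return float(np.sum(d * d))

def match_point(frame_a, frame_b, pt, candidates, half=4):
    """For a point in frame A, pick the candidate point in frame B whose
    9x9 block (half=4 on each side) has the lowest SSD against the
    block centred on pt in frame A."""
    def block(frame, p):
        y, x = p
        return frame[y - half:y + half + 1, x - half:x + half + 1]

    ref = block(frame_a, pt)
    costs = [ssd(ref, block(frame_b, c)) for c in candidates]
    return candidates[int(np.argmin(costs))]
```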
III. RESULTS AND DISCUSSION
In this section, the results attained with the proposed methodology are discussed. Table I shows the basic characteristics of each video used in this study, together with the size and number of bytes of the salient points found in each video.
A. Strong corners detection
First, an algorithm based on the Harris and Stephens corner detection algorithm [2] was developed to identify all salient points, or strong corners, in each frame. These points serve as anchor points, a benchmark for which points are considered. A sample of the points detected in two frames is shown in Figure 2. Furthermore, it is observed that the detected points cover the same frame features, for instance the salient points along the trees, the corners of the sidewalk, and the moving object.
Figure 2: The detected strong corners in both frames, marked with green dots.
B. Corresponding points
Next, the initial correspondences between the points identified in the previous step are invoked. Correspondences must be picked for each point; for that purpose, a 9 × 9 block is extracted around each point from the consecutive image frames. The key step is computing the matching cost between points as the Sum of Squared Differences (SSD) between the corresponding image regions of consecutive frames; the lowest costs are kept in the solution [8]. Figure 3 shows, in green, the same positions of the initial corresponding points in both frames.
Figure 3: Corresponding points between frames.
However, not all of these correspondence points are correct: many of them are redundant, and a significant number are outliers as well. This shortcoming is addressed in the next step. SSD ensures that the minimum-cost matching point in point set B is found with the aid of the features, by looping over the points of set A and searching for the best match in set B.
C. Accurate correspondence
As mentioned above, several point correspondences are incorrect, but a robust estimate of the geometric transform between the two image frames can still be determined using the Random Sample Consensus (RANSAC) algorithm [5][6]. The algorithm searches the given set of point correspondences for the valid, consistent correspondences, as shown in Figure 4.
Figure 4: Correct correspondences according to RANSAC.
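The RANSAC idea can be illustrated with a deliberately simplified sketch (a pure-translation model instead of the paper's affine transform; the iteration count and inlier tolerance are assumptions):

```python
import numpy as np

def ransac_translation(pts_a, pts_b, iters=200, tol=2.0, seed=0):
    """Simplified RANSAC: the model is a pure translation, so a single
    correspondence defines a hypothesis; keep the hypothesis with the
    most inliers (pairs whose residual is below tol pixels)."""
    rng = np.random.default_rng(seed)
    best_t, best_inliers = None, -1
    for _ in range(iters):
        i = rng.integers(len(pts_a))
        t = pts_b[i] - pts_a[i]                  # hypothesis from one pair
        err = np.linalg.norm(pts_a + t - pts_b, axis=1)
        inliers = int(np.sum(err < tol))
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

An affine version would sample three correspondences per hypothesis and solve a small linear system, but the sample-score-keep loop is identical.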
From Figure 4, it is observed that the inlier correspondences are concentrated in the image background, not in the foreground, which itself is not aligned. The reason is that the background features are far enough away to act as if they were on an infinitely distant plane. We can assume that the background plane is static and will not change dramatically between the first and second frames; instead, this transform captures the motion of the camera, so correcting for it stabilizes the video. Furthermore, this condition is maintained as long as the motion of the camera between frame A and frame B is small, or the video sampling rate is high enough. The RANSAC algorithm is repeated multiple times, and at each run the cost of the result is calculated by projecting frame B onto frame A and computing the Sum of Absolute Differences (SAD) between the two image frames; the results attained are given in Table I.
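The per-run SAD scoring described above can be sketched as follows (candidate transforms are reduced to vertical shifts purely for illustration):

```python
import numpy as np

def sad_cost(frame_a, frame_b_warped):
    """Sum of Absolute Differences: lower means frame B, projected onto
    frame A, aligns better with it."""
    return float(np.sum(np.abs(frame_a - frame_b_warped)))

def best_shift(frame_a, frame_b, shifts):
    """Score each candidate shift of frame B against frame A and keep
    the one with the lowest SAD, as done per RANSAC run in the text."""
    costs = {s: sad_cost(frame_a, np.roll(frame_b, s, axis=0))
             for s in shifts}
    return min(costs, key=costs.get)
```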
First, the number of corners influences the output: Vid1 produced the highest matching-point values but the lowest SAD cost, followed by Vid3 and Vid2 respectively. This indicates that Vid2 comprised the maximum number of salient points to be handled, since its SAD value was the highest.
D. Frames correction
Further, the mean of the raw video frames and the mean of the corrected frames are computed, as shown in Figure 5.
Figure 5: Corrected frames.
The left image shows the mean of the raw input frames, which resembles the distorted original video frames due to the extreme jitter. On the right is the mean of the corrected frames, with less distortion. This shows that the stabilization algorithm worked well. Several more samples of corrected video frames are depicted in Figure 6.
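The mean-frame comparison can be reproduced on synthetic data (the base frame and jitter shifts below are stand-ins, not the paper's videos):

```python
import numpy as np

def mean_frame(frames):
    """Average a stack of frames: jitter blurs the mean of the raw
    frames, while aligned (corrected) frames average back to a sharp
    image."""
    return np.mean(np.stack(frames), axis=0)

# Synthetic demo: jitter a base frame, then undo the known shifts.
rng = np.random.default_rng(4)
base = rng.random((16, 16))
shifts = [0, 1, -1, 2]
jittered = [np.roll(base, s, axis=0) for s in shifts]
corrected = [np.roll(f, -s, axis=0) for f, s in zip(jittered, shifts)]
```

The mean of `corrected` recovers the base frame exactly, while the mean of `jittered` does not, which is the visual effect Figure 5 illustrates.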
E. Quality
The quality of the output video produced by the proposed method is also measured. It is evaluated using an SVD-based grayscale image quality value and a graphical measurement.
i. SVD-based grayscale image quality
Singular value decomposition (SVD) has been developed into a measure that expresses the quality of distorted images either graphically, as a 2D measurement, or numerically, as a scalar measurement, both near and above the visual threshold. The experiments here used this SVD-based measure, which outperforms the commonly used PSNR [10]. Equation 1 gives the computed value:
M-SVD = ( Σ_{i=1}^{(k/n)²} | D_i − D_mid | ) / (k/n)²        (1)

where:
D_i is the singular-value distance computed for block i,
D_mid represents the midpoint of the sorted D_i,
k is the image size,
n is the block size, and
M-SVD is the singular value decomposition based measurement.
As an example, the output quality for Vid1 based on Equation 1, with k = 8, n = 1, and Di and Dmid represented by a 256 × 256 matrix, attained an M-SVD of 22.20. The numerical quality values obtained for the three sample videos are tabulated in Table I. As visualized in Figure 6, Vid3 obtained the best quality, with a calculated value of 40.50%, followed by Vid2 with 39.21% and Vid1 with 22.20%. This indicates that Vid1 has great distortion, while Vid3 is the least distorted.
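A hedged NumPy sketch of the block-wise M-SVD computation follows [10]; where the text is silent (block traversal order, the midpoint convention), the choices below are assumptions:

```python
import numpy as np

def msvd(orig, dist, n=8):
    """M-SVD sketch after Shnayderman et al. [10]: for each n x n block,
    D_i is the Euclidean distance between the singular value spectra of
    the original and distorted blocks; the scalar measure pools the
    deviations of D_i around the midpoint of the sorted D_i."""
    k = orig.shape[0]                      # assume square k x k images
    ds = []
    for y in range(0, k, n):
        for x in range(0, k, n):
            s_o = np.linalg.svd(orig[y:y + n, x:x + n].astype(float),
                                compute_uv=False)
            s_d = np.linalg.svd(dist[y:y + n, x:x + n].astype(float),
                                compute_uv=False)
            ds.append(np.sqrt(np.sum((s_o - s_d) ** 2)))
    ds = np.array(ds)
    d_mid = np.sort(ds)[len(ds) // 2]      # midpoint of the sorted D_i
    return float(np.sum(np.abs(ds - d_mid)) / len(ds))
```

An undistorted pair scores 0; the score grows with the structural distortion, matching the ranking of Vid1 through Vid3 in Table I.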
ii. Graphical measurement
The graphical quality of any image or frame can be measured as shown in Figure 7. The graphical measurement indicates the condition of the video with respect to distortion. As seen in the graphical results, Vid1 has the largest block sizes among all frame blocks, indicating that it is the video that experienced the worst distortion, followed by Vid2 and Vid3. This result agrees with the SVD values calculated earlier.
IV. CONCLUSION
In conclusion, the proposed video stabilization technique showed remarkable results in stabilizing highly jittery videos suffering from distortion. Initial results also show that, owing to the fusion of the RANSAC algorithm, Gaussian filtering, Harris and Stephens corner detection, and SAD, the stabilization process succeeded, based on the output quality attained. Future work includes finding a better feature detector and overcoming the consequences of extreme handheld camera shake in a feasible real-time implementation of video stabilization.
Acknowledgment
Funding for presenting this study was supported by the Faculty of Electrical Engineering, UiTM Shah Alam, Selangor.
Figure 6: Three input videos (Vid1, Vid2, Vid3) requiring different degrees of stabilization, since they were recorded in different circumstances.
Figure 7: Graphical measurement for each sample video (Vid1, Vid2 & Vid3) as an indication of stabilization quality.
REFERENCES
[1] M. Gleicher and F. Liu, "Re-cinematography: Improving the camerawork of casual video," ACM Transactions on Multimedia Computing, Communications, and Applications, 5(1), pp. 1-28, 2008.
[2] C. Harris and M. J. Stephens, "A combined corner and edge detector," Proc. of Alvey Vision Conference, pp. 147-152, 1988.
[3] A. Suneja and G. Kumar, "An experimental study of edge detection methods in digital image," Global Journal of Computer Science and Technology, 10(2), 2010.
[4] http://www.mathworks.com/products/computer-vision/demos.html?file=/products/demos/shipping/vision/videostabilize_pm.html
[5] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Comm. of the ACM, 24, 1981.
[6] B. Tordoff and D. W. Murray, "Guided sampling and consensus for motion estimation," 7th European Conference on Computer Vision, 2002.
[7] J. Jin, Z. Zhu, and G. Xu, "Digital video sequence stabilization based on 2.5D motion estimation and inertial motion filtering," Real-Time Imaging, 7(4), pp. 357-365, 2001.
[8] http://siddhantahuja.wordpress.com/tag/sum-of-squared-differences/
[9] M. Pilu, "Video stabilization as a variation problem and numerical solution with the Viterbi method," in Proceedings of Computer Vision and Pattern Recognition, pp. 625-630, 2004.
[10] A. Shnayderman, A. Gusev, and A. M. Eskicioglu, "An SVD-based grayscale image quality measure for local and global assessment," IEEE Transactions on Image Processing, 15(2), 2006.
Table I: Criteria of sample videos & results (Type: RGB, Extension: AVI)

| Sample | Size (salient points) | Bytes | No. of Frames & Length | SAD Value | Computational Time (s) | Quality Value (M-SVD) |
|--------|-----------------------|-------|------------------------|-----------|------------------------|-----------------------|
| Vid1   | 2 x 139               | 1112  | 34 & 2                 | 8.85e5    | 8.81                   | 22.20%                |
| Vid2   | 2 x 66                | 528   | 73 & 3                 | 0.0824    | 9.18                   | 39.21%                |
| Vid3   | 2 x 128               | 1024  | 132 & 4                | 3.553e3   | 9.28                   | 40.50%                |