Jun 24, 2018
Introducing In-Frame Shear Constraints for Monocular Motion Segmentation
Thesis submitted in partial fulfillment of the requirements for the degree of
MS by Research
in
Computer Science and Engineering
by
Siddharth TouraniID
International Institute of Information Technology
Hyderabad - 500 032, INDIA
July 2015
Copyright © Siddharth Tourani, 2015
All Rights Reserved
International Institute of Information Technology
Hyderabad, India
CERTIFICATE
It is certified that the work contained in this thesis, titled "Introducing In-Frame Shear Constraints for Monocular Motion Segmentation" by Siddharth Tourani, has been carried out under my supervision and is not submitted elsewhere for a degree.
Date Adviser: Prof. K Madhava Krishna
Acknowledgments
In chronological order:
1) My Parents
2) Mr. K K Kumar
3) Dr. Gaurav Dar
4) Dr. Jayanti Sivaswamy
5) Dr. K Madhava Krishna
6) Labmates
7) RSP, Arp-Ray and Tejas.
Abstract
In this thesis, the problem of motion segmentation is discussed. The aim of motion segmentation is to decompose a video into the different objects that move through the sequence. In many computer vision pipelines, this is an important intermediate step. It is essential in several applications such as robotics, visual surveillance and traffic monitoring. While there is already a vast literature on the topic, the performance of all algorithms proposed thus far remains far behind human perception.
This thesis starts off with a formal introduction to the problem. Then, it proceeds to explain the main approaches proposed for the problem, along with their advantages and shortcomings. Finally, the proposed algorithm, which forms the keystone of this thesis, is introduced and fully fleshed out, with motivation given for its structure and its various parts. In addition, the customary comparison with other state-of-the-art algorithms is given. We do so on the standard Hopkins-155 benchmark dataset, as well as on a new dataset compiled from video sequences from the publicly available KITTI dataset, the Versailles-Rond sequence taken from [], and several sequences taken around the IIIT Hyderabad campus. The sequences in the dataset consist of video footage taken from a single camera mounted on the front of a car. The dataset is far more realistic and challenging than the Hopkins dataset, and provides a more rigorous assessment of both the proposed algorithm and other state-of-the-art algorithms in motion segmentation. This dataset is hereafter referred to as the On-Road dataset.
On the Hopkins-155 dataset, our algorithm achieves near state-of-the-art performance, while performing substantially better on the On-Road dataset, showing that the proposed algorithm has superior performance in realistic scenarios.
Contents
1 Introduction
  1.1 Two Main Approaches To Motion Segmentation
    1.1.1 Matrix Factorization
    1.1.2 Multibody Extension
    1.1.3 Gestalt Based Segmentation
    1.1.4 Discussion
  1.2 Datasets Used
    1.2.1 Hopkins-155 Dataset
    1.2.2 On-Road Dataset
  1.3 Objective of thesis and Design Principles underlying the proposed motion segmentation algorithm
    1.3.1 Objectives
    1.3.2 Design Principles
  1.4 Thesis Overview
2 Related Works
  2.1 Frame Differencing
  2.2 Epipolar Geometry Based
  2.3 Optical Flow / Gestalt Based
  2.4 Layers
  2.5 Subspace Clustering Based
3 Proposed Algorithm
    3.0.1 Initial Foreground-Background Segmentation
    3.0.2 Biased Affine Motion Model Sampling
    3.0.3 Initial Assignment of the Unsampled Points
    3.0.4 Segmentation Refinement by Energy-Minimization
  3.1 Model Selection
    3.1.1 Model Merging Predicate
4 Results
  4.1 Hopkins-155 Results
  4.2 On-Road Dataset
5 Conclusions
Bibliography
List of Figures
1.1 The two main approaches to point tracking.
1.2 Interpretation of the Trajectory Matrix
1.3 Factorization of trajectory matrix W into motion matrix M and structure matrix S.
1.4 Multibody Factorization of the trajectory matrix
1.5 Patterns classified by Subspace Clustering Algorithms. (a) is taken from 77. (b) is taken from 26. (c) is taken from 26.
2.1 Illustration of how the epipolar constraint functions and fails. In (a) the 3D point P on moving to P off the epipolar plane is projected into the primed camera frame C above the epipolar line l. In (b) P still lies on the epipolar plane and is projected right onto l. In (b) the epipolar constraint cannot be used to detect P as moving.
3.1 An Overview of the Proposed Approach
3.2 Figure shows a result of the initial foreground-background segmentation. The foreground points (epipolar outliers) are shown in blue, and the background points (epipolar inliers) are shown in red. (a) In the non-degenerate case, most of the points on the moving vehicle have been categorized as not belonging to the background. (b) In the degenerate case, most of the points on the vehicle belong to the background.
3.3 Difference between minimum residual and Top-k residual sampling.
3.4 Results from the Hopkins-155 dataset. The various stages of our proposed approach are shown. The tracked points shown in the figure were not the ones used to verify the accuracy of our approach on the Hopkins-155 dataset. These are shown here primarily for illustrative purposes.
3.5 Illustration of how shear and stretch work. Top row: The case where models are detected as separate. In (a) and (b) the initial and final frames are shown along with two motion models, the notion of shear being clearly visible. In (c) is shown the cumulative shear vs number of frames plot. Likewise in (d), but for stretch. Bottom row: The case where models are merged is shown, with symmetric plots.
4.1 Error Distribution Histogram for the Hopkins-155 Dataset
4.2 Percentage Accuracy With Gaussian Noise Added To The Trajectory Data.
4.3 Box-plot assessing the performance of various algorithms at estimating the number of motions.
4.4 Comparison of various state-of-the-art motion segmentation algorithms for the On-Road Dataset.
List of Tables
4.1 Total Error Rates in Noise-Free Case. The first row represents error rates for all (155 sequences). The second and third rows are two and three motions, respectively. The values of the algorithms were taken from their corresponding papers.
4.2 Total Computation Time on the Hopkins-155 Dataset in seconds.
4.3 Error Rates on the On-Road Dataset For the various sequences. Error Rates for the various algorithms under consideration
Chapter 1
Introduction
This chapter starts off by introducing the motion segmentation problem. The two main approaches to the motion segmentation problem are explained in Section 1.1. The datasets on which our experiments were con