Page 1
http://www.iaeme.com/IJCIET/index.asp 627 [email protected]
International Journal of Civil Engineering and Technology (IJCIET) Volume 9, Issue 12, December 2018, pp. 627-645, Article ID: IJCIET_09_12_069
Available online at http://www.iaeme.com/ijciet/issues.asp?JType=IJCIET&VType=9&IType=12
ISSN Print: 0976-6308 and ISSN Online: 0976-6316
© IAEME Publication Scopus Indexed
A FAST LINEARLY BACK-END SLAM FOR
NAVIGATION BASED ON MONOCULAR CAMERA
Mohamed H. Mahmoud
Communications Engineering Dep., Fayoum University, Egypt
Nashaat M. Hussein
Communications Engineering Dep., Fayoum University, Egypt
Elsayed Hemayed
Computer Science Engineering Dept., Cairo University, Egypt
ABSTRACT
Recently Graph based SLAM has become a popular representation for solving
SLAM problem, but it is affected by its inherent nonlinearity due to noisy measurements
which lead to non-consistent framework. Nonlinear optimization algorithms, that are
used for large-scale keyframe based SLAM, suffer from issues associated with
initialization, iterations, local minima, and scale erroneous.The proposed method
avoids issues that are involved in the nonlinear optimization algorithms. The proposed
system provides the entire robot trajectory, at lower cost using monocular camera,
generates a globally consistent 3D model of the environment, and detects loop closure
with an optimized runtime. Our system utilizes a linear optimization method which
provides only one accurate solution, in contrast to the well-known nonlinear global
optimization like BA of ORB-SLAM, which provides multiple non-accurate solutions
for optimization. Furthermore, our system runs in real time and is 3X faster than the
current state-of-the-art SLAM systems. Our proposed system is used for autonomous
navigation of Micro Aerial Vehicles (MAVs). We tested our system using EuRoC, TUM,
and KITTI datasets. Simulation results and implemented SLAM experiments in large-
scale environments show that our system can be an alternative to filtering approaches
with a better accuracy, robustness, and more efficient than some well-known systems
Keywords: Robotic Systems; Monocular Camera; Tracking; Loop Closure.
Cite this Article: Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed,
a Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera,
International Journal of Civil Engineering and Technology, 9(12), 2018, pp. 627–645
http://www.iaeme.com/IJCIET/issues.asp?JType=IJCIET&VType=9&IType=12
Page 2
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 628 [email protected]
1. INTRODUCTION
SLAM is an active research topic for more than 30 years with overwhelming literature in the
robotics community (e.g. [1]). Mathematically, SLAM is modeled as a high-dimensional
nonlinear estimation problem whose aim is to build a map for unknown environments in
obscurity of referencing systems like GPS and simultaneously utilize the built map to find the
“optimal” estimate for robot poses (location and orientation) and correct errors of previous
estimations using noisy measurements and uncertain priors. The measurement function is
generally a nonlinear objective function in its arguments especially in case of camera sensor.
Generally, a graph- based SLAM system can be divided into three sequential modules [2];
frontend, backend and map representation. the frontend is used for measurement of motion
which processes sensor data to extract geometric motion and spatial constraints (data
association), e.g., between the key frame and map points at different points in time. The
backend (optimizer) is used to estimate and correct the poses of the keyframes (maximum a
posterior) to obtain a consistent map of the environment given the constraints from front-end.
For this target, back-end solutions have developed gradually from filter based methods (e.g. [3,
4]) to graph optimization methods [5].
Graph based optimization methods have the property of exploitation of the natural sparsity
structure of SLAM. It also enables better calculations by using nonlinear least square
optimizers like (e.g., g2o [6], Ceres [7]) which can detect and optimize loops to build a globally
consistent map free from drifts. Moreover, efficient visual SLAM solutions have been obtained
using linear optimization methods like [8, 9] exploiting the sparseness of the information
matrix and solving sparse linear equations, but no
Guarantee to get consistent map particularly for large-scale SLAM problems with
erroneous detected loops. In literature, to detect loops, there are reliable optimization
techniques [10] that are based on switching variables and weighting methods. These techniques
are sensitive to initialization, as they iteratively optimize a highly non-linear objective function.
The main contribution of this work is computing optimized paths for navigation, developing
and implementing visual SLAM using a linear method for a fast and accurate approximation
of pose graph optimization. The proposed algorithm requires no initial guess (assumption) and
detects loop closure. This is done by creating a system of linear equations (or linear system)
which is a collection of linear equations involving the same set of variables independent of rotations.
The method is used in [12] to compute keyframe rotations independent of computation of
keyframe poses. Most importantly we gain the advantages of linearity.
2 .RELATED WORK
Extensive research in the field of keyframe-graph based SLAM techniques have been
introduced in recent years. A comprehensive review of these techniques is beyond the scope of
this work. We present some of the most relevant and recent works related to visual SLAM
systems in the literature.
Map initialization. Monocular SLAM that depends on low-cost sensor of embedded
hardware is challenging. It requires a procedure to create an initial map knowing that depth
cannot be recovered from monocular camera. There are recent successful solutions for SLAM
as a filtering problem like in [14] where Kalman filters are used, and 3D keypoints are
initialized with a higher uncertainty in the depth. In these approaches every frame is processed
by the filter to jointly estimate the map point locations and the robot pose. These approaches
suffer from the uncertainty of initialization, the wasting of run time in processing consecutive
frames with little new information, and the accumulation of linearization errors. But methods
like [15] initializes SLAM system using an essential matrix assuming that the observation of
Page 3
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 629 [email protected]
camera is a general scene. The drawback of the computation of essential matrix is the need to
select optimum solution from multiple solutions. In addition to the process of determining
which matrix (essential or homography) is better than other, is very complicated, as discussed
in [5]. Therefore, ORB-SLAM presents a new automatic approach based on model selection to
decide whether the transformation can best be modeled with an essential matrix E for non-
planar scenes, or with a homography H for planar scenes, and each model is penalized
according to its symmetric transfer error. In the planar case (i.e. planar, nearly planar, and low
parallax), the selection of the homography may generate corrupted solutions. Therefore, ORB-
SLAM may fail the initialization in that case, and the system restarts with a different pair of
frames to get initial pose and initial map point.
The recent LSD-SLAM [16] randomly initializes the 3D map by initializing the depth of
the pixels to a random value with high variance. Thus, the LSD-SLAM does not have selection
problem of the initialization but may be unstable, because this system minimizes the
photometric error function that includes much more local minima than the geometric error. We
will show in the simulation how todeal with environments of arbitrary geometry in challenging
situations and any camera motion that induces parallax (general camera motions) in the
initialization stage.
Loop closure detection and correction. The algorithms that detect loop closures in pose
graph SLAM system, are either PGO (pose graph optimization) or BA. The implementations
of such optimizations are provided using libraries such as G2o [6] and Ceres [7]. These
implementations optimize the loops to detect and correct the error (e.g. for positive depth, re-
projection error, or scale drift) in the keyframe pose and in all relevant map data that were
generated during the trajectory of loop. The loop closure thread tries to resolve loops upon the
addition of a new keyframe, to detect and correct any accumulated error by the system over
time. Some methods [17] is using bag of words (BoW) to detect loops in trajectory of camera.
However, they are still creating false loops that lead to erroneous in relative motion
computations and failure of the optimization of the pose-graph.
SLAM Optimization and Linearization. The graph-based SLAM models the SLAM
system as graphical structure. In this system each keyframes poses or map points positions are
represented as nodes (metric map), with edges are the measurements (scans or odometry)
connecting them as constraints. The goal of optimization approaches is to optimize the poses
of the nodes to minimize the error introduced by the constraint, thus called an off-line or batch
processing technique. Bundle Adjustment (BA) [18] is solved with the Levenberg-Marquardt
(LM) nonlinear optimizer to minimize the reprojection error. Estimator consistency is vital for
SLAM because an inconsistent estimator provides no guarantee for the accuracy of the
generated state estimates, leading to unreliable robot/landmark estimates. In general, for small-
scale problems BA easily converges to the globally optimal solution. However, when the
number of keyframes is large, BA can be very time-consuming and can converge to a local
minimum, unless a good initialization tothe map and keyframe pose is used. There are SFM
techniques such GSLAM [26] which can’t be used for large scale environment, general scene,
and feature reprojection errors minimizations.
The recent linear SLAM [27] provides a linear approach based on sequential Bayesian
filtering of orthogonal planar features using Kalman filter, to estimate the camera poses and
multiple point’s planes in a pose graph optimization. The drawback of this system is used only
for planar environments, although it depends on RGBD camera.
In the literature, the nonlinear least squares problems (NLS) with partially linear residuals
are called separable, and in statistical term such issues are called conditionally Linear-Gaussian
state space models. Optimizing large graphs has been a serious impedimentfor SLAM system
Page 4
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 630 [email protected]
[23-25]. The problem is to reduce the residual errors by an optimum selection of the
optimization variables that can greatly simplify the problem structure and get consistent
graph(map). Inspiring by this work we develop, design, and implement the idea of linear
approximation of pose graph optimization to minimize residual error, detect loop closure, and
enhance Monocular SLAM initialization. To achieve this, we have implemented a SLAM
system with some new ideas and algorithms, but also incorporating excellent works developed
in the literature. Some of these works are the graph optimization of P. Barooah [23], ORB
features extractor and matcher developed by R. Mur-Arta [5], and the loop detection by bag-
of-words (BoW) of D. Galvez-Lpez [17]. A sequence of experiments, on-line as well as off-
line with different datasets of environments, are performed to validate the accuracy, speed, and
real-time performance of the developed linear system.
In summary, we propose a real-time visual SLAM system with the following properties:
1. Our system keeps the separable structure of SLAM, since we estimate the linear variables
using nonlinear measurements corrupted by Gaussian noise. Thus, the investigated system has
a linearity property which is a significant improvement compared to other systems.
2. Our system does not only maximize the accuracy of the robot trajectory, but also impedes
drift-buildup that minimizes the number of failures of the tracking. Thus, achieving accurate
and globally consistent framework compared to that of the state-of-the-art techniques.
3. Our implementation provides a faster Linear SLAM impediment to false loops.
4. Our system is capable of accurate initialization of vSLAM.
3. SYSTEM PRELIMINARIES AND PROBLEM FORMULATION
3.1. Graph based SLAM preliminaries
Let )3(SE}p,.....,p{=P N1 ∈∈∈∈ be a set of N nodes representing position of a mobile robot at
consecutive time instants (frame poses) and }e,.....,e{= M1ξξξξ be a set of M edges represent relative
translation vector between every two nodes ije . The objective of pose graph optimization is to
compute an estimate of the nodes configuration that maximizes the likelihood of the
measurements by using mathematical model that is solved by linear method. Since relative
pose measurements are affected by noise, the measured quantities are in the form ijijij s= ++++εεεεεεεε
where ijs ∈ R3 is a zero-mean Gaussian noise, i.e. )c,0(Ns ijij ≈≈≈≈ , being ijc a 3 × 3 covariance
matrix. The uncertainty of each edge measurement (covariance matrix) ijeC is obtained easily
from the uncertainties of the two frames locations as given by equation (2) as follows.
ijij pp=e −−−− (1)
pjpie CC=Cij
++++ (2)
3.2. Problem definition
The role of localization thread of SLAM system is to compute positions of the robot (frames)
in the world coordinate frame }z,y,x{pt ==== , where z,y,x translation are coordinates. For each frame
we need to find a set of 2D image points u such that t)proj(X, =u , where X ∈R3 and proj is the
projection function, that projects 3D triangulated points X from current observation t)(X, in the
map of SLAM to the 2D image plane.
Figure 1 shows the reprojection errors term. To keep the model as close to the ground-truth
as possible, we maximize the likelihood of the measurements by using mathematical model
Page 5
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 631 [email protected]
that is solved by linear method and minimize reprojection error. The variables to be optimized
are tp . It should be mentioned that the projection function proj is strongly nonlinear, hence the
optimization problem has many minima, hence we should have good initial solutions for tp .
Fortunately, the previous stage of RANSAC and triangulation provide such good initial
solutions.
In a summary our target in this research is the detection of loops and graph optimization.
We can distinguish the two different threads as described in Figure 3, although they are closely
linked. The first thread consists of the identification of a previously visited place (loop). When
a loop is detected, a new relation is added to the graph that relates the current pose with the
pose in the past where we visited the same place. The second thread tries to reduce the
accumulated errors from the pose estimation based on pairwise alignment and reprojection
errors. This will get more consistent maps, especially when a place is revisited after a long
period of time.In a summary our target in this research is the detection of loops and graph
optimization.We can distinguish the two different threads as described in Figure 3, although
they are closely linked. The first thread consists of the identification of a previously visited
place (loop). When a loop is detected, a new relation is added to the graph that relates the
current pose with the loop pose. The second thread tries to reduce the accumulated errors from
the pose estimation based on reprojection errors using proposed linear algorithm. This will get
more consistent maps, especially when a place is revisited after a long period of time.
4. THE GENERAL STRUCTURE OF THE PROPOSED VSLAM SYSTEM
The general structure of the proposed vSLAM system is depicted in Figure 2, and its main
modules are described in the following sections. Our system generates a global, scalable, and
reliable 3D map based on pose-graph of keyframes and computing relative poses of frames.
Our proposal SLAM system has three main modules; tracking, loop closure, and optim
ization modules. Our SLAM system can be divided into two parts front-end and back-end.
The front end is used to determine camera pose (translation and rotation) and detect loop
closure. The back-end part is used to optimize the trajectory of robot by reducing accumulated
errors using our new investigated method. This efficient distribution allows for a continuous
tracking of the tracking module while the global optimization and the loop closure ones are
processed in the background only when a new keyframe is added. System takes a set of
pairwise relative poses between cameras and rotations as input, and outputs the position of all
keyframes in global coordinate. The camera poses are computed through motion averaging
linear algorithm.
4.1. Tracking module
We use a pinhole camera model, in which the camera is modeled as a dark box with a small
hole that allows light rays to fall on the back plane of the box. The camera mapping function
is termed perspective projection and the parameters for each frame rotation matrix R and a
translation vector t . The relations for projecting a 3D point X into image plan is:
tRX=U ++++ (3)
(Project 3D point X to image plane that convert from world to camera coordinates)
Page 6
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 632 [email protected]
Figure. 1: Reprojection errors for poses optimization.
Our SLAM system initially loads the frames, then detects and extracts the features, then
processes them by the tracking subsystem. Once the keyframe selection is determined, an initial
map is generated by triangulating feature points in the initialization thread. Our global tracking
module pipeline is summarized in the following steps:
1. Feature Extraction and matching: We employ existing methods for feature tracking.
Firstly, ORB [5] keypoints are detected in the frames. ORB has several advantages;
mainly it is extremely fast to compute and match, better for matching keypoints
between frames from various viewpoints, and invariant for rotation. We implement
an enhanced robust algorithm to extract ORB feature descriptors (which are
oriented multi-scale FAST corners with a 256 bits’ descriptor associated) in parallel
at salient points within frames. We use epipolar geometry by selecting 8 matches at
random and estimate the Fundamental matrix, Essential matrix E for non-planar
scenes, and a Homography matrix H for planar scenes. Hence, we can find
translation vector.
2. Estimate Camera Poses: We use the geometrically two-view matches from the
previous step to estimate camera poses with global motion averaging schemes.
Camera orientations are estimated with a robust orientation estimation algorithm
[19] after camera orientations and their covariance are estimated, the camera
positions are robustly estimated with our linear optimization method in the pose
graph optimization thread.
Page 7
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 633 [email protected]
Figure. 2: Overview of the global vSLAM pipeline that is implemented in our designed system.
3. Triangulate 3D Points: After camera poses are estimated, 3D points are triangulated in
parallel and refined with our linear optimization method.
4. Bundle Adjustment: As a final step, camera poses, and 3D points are refined with a
nonlinear optimization to minimize reprojection error which is used to compute covariance of
translation and rotations.
Figure 3: Keyframe poses optimization and loop detection.
Page 8
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 634 [email protected]
Figure 4: Keyframes poses (p1, p2, p3) and edges (e12, e23) to get optimized trajectory.
4.2. Loop Closure
This section illustrates the work of detecting loops and correcting the erroneous of loops in the
pose graph of optimization. Our implementation is based on visual feature matching to
determine if
An area has been visited or not. For each new keyframe created by the tracking subsystem,
feature matching is performed. We use a RANSAC algorithm with 30 iterations to check for
3D-correspondences. If the number of resulting inliers is above a certain threshold (40
matches), we consider that the current frame and the keyframe to which pairwise alignment is
performed, a good matching. And hence, an optimization with all matches are performed, and
a similarity transformation SIM3 matrix is added from the current keyframe to each loop
keyframe (4x4 transformation matrix is created). Only these keyframes which have a higher
similarity score than a threshold and are not directly attached to the current keyframe are
considered as loop keyframes. This thread is illustrated in figure 3. Once a list of loop
keyframes is found, this gives a measure on the accumulated error. Our SLAM system provides
a linear system formulation that can integrate with L1 optimization algorithm [11,13], to detect
and correct false loops.
4.3. Graph Optimization (Error Reduction Subsystem)
This module represents back-end subsystem and has the following tasks:
4.3.1. Local Maps
Is modelled as a graph of a set of keyframes, and the sensed 3D keypoints. Local mapping
passes new keyframes and generates new map points. It also deletes redundant keyframes and
bad map points to provide a consistent map. Furthermore, it upgrades the covisibility graph
among keyframes and executes local BA for optimization, which optimizes the last processed
keyframe and all its adjacent keyframes in the covisibility graph.
4.3.2. The covisibility graph
Is modelled by an undirected weighted graph where each vertex represents a keyframe and the
edges represent vectors connect keyframes that share a minimum of 40 map points in the local
map. The weight of the edge is a score for the number of the common matchedmap keypoints
Page 9
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 635 [email protected]
between the two keyframes. To provide an optimum loop closure optimization, we also create
the so-called essential graph, which is less dense than the covisibility graph because an edge
between two keyframes is created when they have matched keypoints more
han 120 map keypoints observations. Finally, the map also represents a spanning tree,
which is the minimum connected edges representation of a graph that includes all the
keyframes.
4.3.3. The Optimization Algorithm
For each new frame, we compute camera poses and their related uncertainty represented by
information matrices. The 3D keypoints are then projected to the new keyframe pose, and the
reprojection errors are minimized to obtain both the camera pose and the information associated
to such estimation. Graph Optimization Algorithm (covisbilty graph optimization) is the thread
of computing keyframes (or points) positions and information matrix with known edge vectors.
There is an offset ambiguity in the edge map. To resolve this ambiguity, we add a special
reference keyframe to the surrounding environment. The robot should be able to identify this
reference keyframe and evaluate the map offset such that the position of the reference keyframe
is always fixed. There is an inconsistency in the map because of noisy measurements. To reach
the optimized trajectory of robot, we provide an objective function that minimizes the
inconsistency as possible (reduce errors part) by minimizing re-projection geometric error
(difference measure between the predicted and the measured projections), corresponding to
edge position observations model which is described in Figure 4. We define the residual error
ijerr that connects ip to jp as follows:
ijijtrans
ij epp=err −−−−−−−− (4)
The orientation is computed with sufficient accuracy using the method proposed in [12].
We exclude orientation computation from our estimation model to make the estimation
problem a linear estimation model. The edge variable has an additive white Gaussian
measurement noise with covariance ijΣΣΣΣ , and is considered 3-D random vector with Gaussian
distribution with mean ije and covariance matrix ijΣΣΣΣ . Uncertainty in edge vector represented
by its covariance matrix ijΣΣΣΣ varies from edge to another. By using mathematical operations in
[23-25] for graph optimization, these residual errors can be formulated as a non-linear least
squares problem (edge vector normalized error ijnerr ) as follow:
ijij
T
ijtrans
ij errerr=nerr1−−−−
ΣΣΣΣ
)epp()epp(=1
ijijij
T
ijij −−−−−−−−ΣΣΣΣ−−−−−−−−−−−−
(5)
We can write ijnerr as:
)ep(S)epp(=nerr ijijij
T
ijijtransij p −−−−−−−−−−−−−−−− (6)
`)epp(S)epp(2
1=F ijijij
T
ijijl
N
ij1,=jl
N
1=i −−−−−−−−−−−−−−−−ΣΣΣΣΣΣΣΣ ≠≠≠≠
(7)
We differentiate the total F and equalize the result by zero.
Page 10
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 636 [email protected]
[[[[ ]]]] ikkil
Ni
kii
lN
kl
kNkil
Ni
kiikk eS=
p
...
p
...
p
p
S...)S(...SS=
1,=
2
1
=
1,=21 ≠≠≠≠≠≠≠≠ ΣΣΣΣ
−−−−ΣΣΣΣ−−−−−−−−
(8)
B=AP (9)
Where A (System matrix), P (State vector), B (Coefficient vector) are defined by:
`
p
...
p
p
=P2
1
lN
(10)
ΣΣΣΣ−−−−−−−−
−−−−ΣΣΣΣ−−−−
−−−−−−−−ΣΣΣΣ
≠≠≠≠
≠≠≠≠
≠≠≠≠
)S(...SS
...
S...)S(S
S...S)S(
=A
il
Nl
N=i
lNi1,=i2
lN1
lN
lN2i2
lN=i
2i1,=i21
lN112i1
lN=i
1i1,=i
(11)
`
eS
...
eS
eS
=B=
1,=
22
=
21,=
11
=
11,=
ΣΣΣΣ
ΣΣΣΣ
ΣΣΣΣ
≠≠≠≠
≠≠≠≠
≠≠≠≠
liNi
lN
lNi
lNii
iil
Ni
ii
iil
Ni
ii
(12)
Equations (9) to (12) are the solution to the optimization problem expressed in Equation
(7). The dimensions of every vector and matrix is given in Table 1. These results show that
graph optimization is equivalent to solving a linear system of equations, So the proposed
system is truly linear without approximation.
Table 1: Dimensions of Terms of Graph-Based SLAM
Terms in Graph-
Optimization Alg. of SLAM Symbol Dimension
Relative Displacement
Vector between two
keyframesi and j. ije
3*1
Covariance matrix of edge ijΣΣΣΣ
3*3
State Vector (Linearized
Solution) P 3N*1
System Matrix A 3N*3N
Coefficient vector B 3N*1
Where N: Number of Keyframes (vertices)
Page 11
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 637 [email protected]
Note: In our system we assume that the reference pose of the robot to be the last global
frame, i.e. 3N 0=pl
.
We can use graph optimization for point map optimization. The edges are independent from
nodes and from each other. This process enables us to estimate each edge vector alone, which
extremely simplifies the map update. Instead of updating the whole map every time we add
measurements to the map, we only need to update the optimization problem and then solve it
to yield keyframes positions which represent full optimized trajectory.
4.3.3.1 Computing the covariance of Translation Vector
To compute the unknown 3D keypoints locations and keyframes poses from the observations,
we minimize their total prediction error. Our optimization is the model refinement. Hence, it is
essentially a matter of optimizing a nonlinear objective function represents the total
reprojection error over nonlinear parameters (the features and camera parameters). The inverse
Hessian H (second order derivative) at the minimum is a good estimate of the covariance matrix
(uncertainty amount) for these parameters. The objective function defined as the mean square
error of reprojection errors as illustrated (defined according to the Method of Least Squares):
(((( )))) (((( )))) (((( )))) (((( ))))j 2*
2jj 2*
j 2
1=
j 1*
j 1j 1*
j 1
1=
uuuuuuuu= −−−−−−−−++++−−−−−−−− ∑∑∑∑∑∑∑∑T
N
j
TN
j
E
(13)
Where: 1) imageproj(X`, =u j 1*
2) imageproj(X, =u j 2*
Where j 1u , j 2u are the projection of keypoint j onto frames 1, 2 respectively For each frame
we need to find a set of image points j 1*u , j 2
*u , where X ∈ R3 is the 3D environment points
and proj is the projection function, that projects 3D triangulated points from current
observation t)(X, in the map of SLAM to the 2D frame plane 1 and 2 as illustrated in Figure 1.
The pose ip is the pose of frame i . It includes both orientation and translation of frame i . In
this optimization problem, j 1u , j 2u are considered to be the measurements that have uncertainty
given by covariance matrix uijΣΣΣΣ . Both j 1u and j 2u are 2-D vector that represents the location
of a detected keypoint (image feature) inside the image plane and it has the units of pixels. Its
22 ×××× uncertainty matrix uijΣΣΣΣ is taken as unity matrix which means that the uncertainty is taken
as 1 pixel as given in the literature. We differentiate equation (13) twice with respect to camera
pose (translation vector and rotation matrix) we get:
∂∂∂∂
∂∂∂∂
∂∂∂∂
∂∂∂∂++++
∂∂∂∂
∂∂∂∂
∂∂∂∂
∂∂∂∂≅≅≅≅
∂∂∂∂∂∂∂∂
∂∂∂∂∑∑∑∑∑∑∑∑
TTTTTT
ET
N
j
TN
j
j 2*
j 2*
1=
j 1*
j 1*
1=
2 uu2
uu2
(14)
We can use: 2211
1 22 JJJJHTT
T ++++≅≅≅≅====∑∑∑∑ −−−−
Where 1J , 2J are the Jacobins of j 1*u , j 2
*u w.r.t camera pose Thus, computingthe error
vector of reprojection in camera of keypoint and its Jacobians with respect to the camera
poseresults in a matrix of dimension 62 ××××
4.3.3.2. Determination of Global scale of Monocular SLAM
Based on our proposed linearization algorithm, we use the method proposed by [5] and extend
it to compute the relative scale between two directly connected keyframes from the
Page 12
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 638 [email protected]
corresponding 3D keypoints (ORB features) based on bag of words. Furthermore, we optimize
the scale erroneous using essential graph.
5. EXPERIMENTAL SIMULATIONS & RESULTS EVALUATION
In this section we evaluate the accuracy and speed of our vSLAM for different datasets. We
also compare our designed system with the monocular version of ORB-SLAM [5] and LSD
SLAM [11] system. It was assumed that the camera provides 30 frames per second and delivers
consecutive stream of frames. To evaluate the graph-based SLAM algorithm, we compute the
translational errors for different datasets and tabulate these results. One of the most important
types of errors was mentioned in the literature for evaluating the SLAM problem, is relative
pose error (RPE) that is helpful for graph-based SLAM. We evaluate it in the TUM-RGBD
dataset. A more intuitive direction isto estimate the absolute trajectory error (ATE) [20] after
mapping the two trajectories: the ground truth and the estimated path. We evaluate it in the
KITTI dataset. All the experiments are executed on an Intel Core i5 computer with 8Gb RAM
without GPU parallelization.
5.1. KITTI dataset
First, we have simulated our vSLAM system on the KITTI dataset [21], which is recorded by
a stereo camera at 20 fps and a resolution 512x382. Our results are shown in Table 2. The
ground truth/estimated trajectory of the robot in case our system , ORB SALM system , and
LSD SLAM is shown in Figures 5, 6 for a robot moving in sequence 00 and sequence 08 of
KITTI database (which show the loop closure). We show that our system can detect loops in
the previous figures. The RMSE was estimated for KITTI dataset for all sequences. As shown
in these figures, the proposed vSLAM produced an accurate camera trajectory which is roughly
aligned with the trajectory of ground truth.
5.2. TUM-RGBD Dataset
In this part, we compare the quantitative results of our method using TUM-RGBD database
[20] which contains sequences from RGB-D camera, that was recorded at frame rate 30 Hz and
camera resolution 640×480. We compare our accuracy to that of the widely used state-of-the-
art systems ORB SLAM and LSD-SLAM in Table 3. Our method is based on the proposed
linear optimization method and outperforms the other approaches in most sequences. We also
compared these works with GSLAM [26] which show that our work outperforms it in some
sequences.
5.3. EuRoC Dataset
In Table V, weprovide the accuracy of our system in the 11 sequences of the EuRoC dataset
[22]. The datasetconsists of 11 stereo sequences. The sequences are specified as easy, medium
and difficult depending on the speed of MAV, illumination and view texture. Table 4 shows
absolute translation RMSE (translation Root Mean Square Error) of the robot trajectory for
each sequence.Our system successfully processes all the dataset sequences in real-time, except
V1-03 difficult, where the movement is too extreme to continue which lead to an ambiguity.
Our system gets lost also in sequence V2-03 difficult because of intense motion blur. Our
system is able to build its map, detect, and correct false loops. Our system achieves a tracking
precision of a few centimeters and gives accurate and similar results compared to both ORB-
SLAM and LSD-SLAM.
Page 13
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 639 [email protected]
Table 2: The Translation Errors Recorded for the KITTI Dataset
Ours ORB [5] LSD [16]
Sequence relt
(%)
abst
(m) relt abst relt abst
KITTI_00 0.82 7.3 0.70 6.68 0.63 6.2
KITTI_01 1.2 13.0 X X 2.36 15.2
KITTI_02 0.9 22.8 0.76 21.6 0.79 22.3
KITTI_03 0.92 0.6 0.71 1.6 1.01 1.2
KITTI_04 0.56 1.9 0.48 1.7 0.38 1.1
KITTI_05 0.65 9.6 0.40 8.25 0.64 2.1
KITTI_06 0.2 6.61 0.91 14.68 0.71 18.3
KITTI_07 0.68 4.4 0.50 3.26 0.56 5.2
KITTI_08 3.2 20.0 2.05 46.6 1.11 53.2
KITTI_09 0.92 3.0 0.87 7.62 1.14 9.4
KITTI_10 0.7 4.9 0.60 8.68 0.72 11.3
Table 3: Quantitative Results TUM-RGBD Dataset RMSE in cm.
Sequences Ours ORB [5] LSD
[16]
GSL
[26]
fr1xy 1.2 0.9 9.0 0.6
fr1desk 3.8 1.7 10.7 3.8
fr2xyz 0.6 0.3 2.2 0.2
fr2desk 2.01 0.9 4.6 2.1
fr3household 3.30 2.2 38.0 3.5
fr3sittingxyz 0.9 0.8 7.7 0.6
fr3nostructure
near 2.9 2.7 7.53 2.4
fr3nostructure
far 11.5 X 18.5 3.4
fr3structure far 0.98 0.8 8 0.8
fr3structur near 2.5 1.6 X 2.5
Table 4: Quantitative Results for EURoC Sequences, Showing Keyframe Trajectory RMSE Error, in
m
Sequences Ours ORB [5] LSD [16]
V1 _01 easy 0.23 0.015 0.081
V1_02 medium 0.54 0.020 0.03
V1 _03 difficult X X X
V2 _01 easy 0.35 0.015 X
V2 _02 medium 0.227 0.017 X
V2 _03 difficult X X X
MH_ 01 easy 0.131 0.070 X
MH _03 medium 0.233 0.066 X
MH _04 difficult X 0.071 X
MH _05 difficult 0.221 0.081 X
Page 14
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 640 [email protected]
Figure.5: Robot Trajectory for our system, ORB SLAM [6], and LSD SLAM [16] in case of dataset
KITTI_00.
Figure.6: Robot Trajectory for our system, ORB SLAM [6], and LSD SLAM [16] in case of dataset
KITTI_8
Page 15
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 641 [email protected]
Figure 7: Synthetic evaluation of speed performance of our system (blue lines) compared to the well-
known global BA [18] algorithm (orange lines) used in ORB SLAM [5].
Figure. 8: Real aerial video shows accurate convergent initialization of our SLAM. Keyframes
trajectories of our method (green line), ORB-SLAM (blue line) [5], and ground truth (red line)
respectively are shown.
5.4. Real Time performance for All Previous Datasets
In Table 5, we estimate the performance of our proposed graph optimization algorithm of
SLAM system w.r.t well-known nonlinear optimization algorithm which is called global
Bundle Adjustment (BA) [18] used in ORB SLAM system [5]. We note that the speed and
efficiency of our pose graph optimization is higher than that of global BA of ORB SLAM
system because it runs in real time faster (3 X) than global BA as shown in figure 7. We also
use real video to test our system to show the robustness of our system for initialization of map
in frames as shown in Figure 8.
Page 16
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 642 [email protected]
Table 5: Mean times of global optimizations of our algorithm.
Dataset Type,
Sequence Number,
and Number of
Frames
Our
Propose
d
lineariza
tion
Algorith
m
Global BA of
Nonlinear
Optimization
of ORB SLAM
[5]
KITTI_00_100 0.53452 1.5435
KITTI_01_100 0.46532 1.9648
KITTI_02_100 0.6542 1.8652
KITTI_03_100 0.54682 1.5968
TUM-
RGBD_fr1xy_150 0.6785 1.7568
TUM-
RGBD_fr1desk_150 0.75986 2.1564
TUM-
RGBD_fr2xyz_150 0.84563 2.5648
TUM-
RGBD_fr2desk_150 0.7896 2.9648
EuRoC _V1 _01
easy_200 0.9856 2.8652
EuRoC _V1 _01
easy_200 0.9856 2.8652
6. CONCLUSION& FUTURE WORK
We present a low cost visual SLAM allows the employment of a proposed linear method for
graph optimization. The proposed algorithm doesn't require initial conditions which makes map
initialization more robust. Our method can be used to initialize the bundle adjustment algorithm
in applications such as SLAM without needing to resort to heuristic methods to detect the
coplanarity of the points. We show that initializing SLAM produces accurate initialization
compared with state-of-the-art visual SLAMs. We provide a robust localization optimization
method for camera pose computation in the situations where camera localization becomes
unreliable in a very low-textured challenging environment. The pipeline of our system shows
an important advance in map generation in terms of semantic understanding and in loop closure
detection and correction using structural properties of edges in keyframes. We demonstrated
the estimation accuracy and robustness of our optimization method.
We presented the mathematical linear framework to compute keyframe poses and
uncertainties values for SLAM. The proposed method directly computes the 3DoF motion
parameters from a pair of frames and estimates the accuracy for the computed values. There
are also four degrees of freedom scale and three angles of rotations, that are computed
separately producing a consistent map free of drifts. The proposed system has an ability for
relocalization in a map optimization stage that has been previously visited and permits for
continued SLAM process for any general scene planar or non-planar in contrast to GSLAM
[26] which can’t be used for large scale environment, general scene, and feature reprojection
errors minimizations.
Our graph-based SLAM can compute the translation vector, scale, and rotation matrix
individually. We show that our approach optimization method can provide reliable and
Page 17
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 643 [email protected]
promoted results compared to the newly state-of-the-art SLAM systems specially inaccuracy
initialization and in dealing with incorrect loops. Finally, we conclude that our system utilizes
a linear optimization method which provides only one accurate solution, in contrast to the well-
known nonlinear global optimization BA of ORB-SLAM [5] which provides multiple non-
accurate solutions for optimization. Furthermore, our system runs in real time faster (3 X) than
the well-known systems in the literature.
In Future work we can enhance tracking performance by using minimization of the
photometric residual corresponding to photometric intensity observation model and including
uncertainty of this error in our optimization.
REFERENCES
[1] M. Ozaki Hugh Durrant-Whyte and Tim Bailey, “Simultaneous localisation and mapping
(SLAM): Part I the essential algorithms,” Robotics and Automation Magazine, 13(2):99–
110, 2006.
[2] T. Bailey and H. Durrant-Whyte, “Simultaneous localization and mapping (SLAM): Part
II,” Robotics& Automation Magazine, IEEE, 13(3):108–117, 2006.
[3] S. Lynen, M. W. Achtelik, S. Weiss, M. Chli and R. Siegwart, "A robust and modular multi-
sensor fusion approach applied to MAV navigation," 2013 IEEE/RSJ International
Conference on Intelligent Robots and Systems, Tokyo, 2013, pp. 3923-3929.
[4] E.S. Jones, S. Soatto, “Visual-inertial navigation, mapping and localization: A scalable
real-time causal approach,” Int. J. Robot. Res., 2011, 30, 407–430.
[5] R. Mur-Artal, J.M. Montiel, J.D. Tardós, “ORB-SLAM: A Versatile and Accurate
Monocular SLAM System,” IEEE Trans. Robot. , 2015, 31, 1147–1163.
[6] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, W. Burgard, “g2o: A general
framework for graph optimization,” In Proceedings of the IEEE International Conference
on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3607–3613.
[7] S. Agarwal, K. Mierle, and Others. Ceres solver, site: http://ceres-solver.org, (accessed on
14 November 2017).
[8] Jiantong Cheng, Jonghyuk Kim, Jinliang Shao, “Robust linear pose graph-based SLAM,”
In Robotics and Autonomous Systems, May 2015.
[9] L Zhao, S Huang, G Dissanayake, “Linear MonoSLAM: A linear approach to large-scale monocular SLAM
problems,” In the IEEE International Conference on Robotics and Automation (ICRA), 2015,
1517-1523.
[10] N. S. underhauf and P. Protze, “Switchable constraints for robust pose graph SLAM,” In
Proc. of IROS, 2012.
[11] L. Carlone, A. Censi, and F. Dellaert, “Selecting good measurements via l1 relaxation: a
convex approach for robust estimation over graphs,” In Proc. of IROS, 2014.
[12] L. Kneip, R. Siegwart, and M. Pollefeys, “Finding the exact rotation between two images
independently of the translation,” In Proc. of ECCV, 2012.
[13] E. J. Candes and T. Tao, “Decoding by linear programming. Information Theory,” IEEE
Transactions on Robotics, 51(12):4203–4215, 2005.
[14] J. Civera, A. J. Davison, and J. M. M. Montiel, “Inverse depth parametrization for
monocular SLAM,” IEEE Trans. on Rob, 29(6):1052–1067, 2008.
[15] W. Tan, H. Liu, Z. Dong, G. Zhang, and H. Bao, “Robust monocular SLAM in dynamic
environments,” In Proc. of IEEE Symposium on Mixed and Augmented Reality (IS-MAR),
2013.
[16] J. Engel, T. Schops, and D. Cremers, “Lsd-SLAM: Large-scale direct monocular SLAM,”
In Proc. of ECCV, 2014.
Page 18
Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed
http://www.iaeme.com/IJCIET/index.asp 644 [email protected]
[17] D. Galvez-Lpez and J. D. Tardos, “Bags of binary words for fast place recognition in image
sequences,” IEEE Transactions on Robotics, 28(5):1188–1197, Oct 2012.
[18] P. F. Triggs, R. I. McLauchlan, Hartley, and A. W. Fitzibbon, “Bundle adjustment - a
modern synthesis,” In Vision Algorithms: Theory andPractice, LNCS, pages 298–375.
Springer Verlag, 2000.
[19] A. Chatterjee and V. M. Govindu, “Efficient and robust large-scale rotation averaging,” In
ICCV. IEEE, 2013.
[20] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the
evaluation of RGB-D SLAM systems,” in IEEE/RSJInt. Conf. Intell. Robots and Syst.
(IROS), 2012, pp. 573–580.
[21] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI
vision benchmark suite,” in Computer Vision and Pattern Recognition (CVPR), 2012.
[22] M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R.
Siegwart, “The EuRoC micro aerial vehicle datasets,” The International Journal of
Robotics Research ,2016.
[23] P. Barooah and J.P. Hespanha, “Estimation on graphs from relative measurements,” IEEE
Control Systems Magazine, 27(4):57–74, 2007.
[24] Mohamed H. Merzban; M. Abdellatif, H. Abbas, S. Sessa, “Toward multi-stage decoupled
visual SLAM system,” Robotic and Sensors Environments (ROSE), 2013 IEEE
International Symposium on, vol., no., pp.172,177, 21-23 Oct. 2013.
[25] Jingchun Yin, Luca Carlone, Stefano Rosa, and Basilio Bona, “Graph-based robust
localization and mapping for autonomous mobile robotic navigation,” Conference: 2014
IEEE International Conference on Mechatronics and Automation (ICMA), vol., no.,
pp.123-177, August 2014.
[26] Chengzhou Tang ,Oliver Wang ,Ping Tan , “GSLAM: Initialization-robust Monocular Visual SLAM
via Global Structure-from-Motion,” International Conference on 3D Vision , Aug. 2017.
[27] Pyojin Kim, Brian Coltin, H. Jin Kim, “Linear RGB-D SLAM for Planar Environments,”
EECV, 2018.
M. H. Mahmoud received the B.S. degree in Communications
engineering from Faculty of Engineering - Fayoum University - Egypt in
2008 and the M.S. degree in signal processing engineering from Fayoum
University-Egypt in 2012. He is currently pursuing the Ph.D. degree in
computer vision engineering at Fayoum University-Egypt. From 2012 till
now, he is working as Lecturer Assistant at MSA University. His research
interest includes computer vision, signal processing, robotics, and SLAM
systems.
Nashaat M. Hussein received his B.Sc. in communication and
electronics engineering from Al-Azhar University—Egypt in 2002. In
2005, he received his M.Sc. degree in communication and electronics
engineering from (C.N.M.) National Center of Microelectronics, Sevilla
University—Spain. In 2009, he received his Ph.D. in Digital Integrated
Circuit Design for the Applications of Image processing from (C.N.M.)
National Center of Microelectronics, Sevilla University—Spain.
Currently, he is working as a Lecturer at Fayoum University—Egypt.
His research interest includes optimization of digital Image processing
techniques (image compression, enhancement, pattern recognition analysis, edges detections,
and image hiding data). Applications of these techniques (artificial vision, smart vision, SLAM
system).
Page 19
A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera
http://www.iaeme.com/IJCIET/index.asp 645 [email protected]
Elsayed Hemayed received is currently working as a Professor of
computer engineering at the University of Science and Technology, Zewail
City, and the director of the Communication and Information Engineering
(CIE) program. He is on leave from Cairo University. He got his PhD from
University of Louisville, KY in 1999. He got the Graduate Dean’s Citation
Award and the John M. Houchens Prize in recognition of excellent doctoral
dissertation. He got his MSc and his BSc from Cairo University in 1992 and
1989. His research interest includes computer vision, robotics, and SLAM systems. Sites link
is https://sites.google.com/site/elsayedhemayed/home .