A FAST LINEARLY BACK-END SLAM FOR NAVIGATION BASED … › MasterAdmin › uploadfolder › IJCIET_09...association), e.g., between the key frame and map points at different points

http://www.iaeme.com/IJCIET/index.asp 627 [email protected]

International Journal of Civil Engineering and Technology (IJCIET) Volume 9, Issue 12, December 2018, pp. 627-645, Article ID: IJCIET_09_12_069

Available online at http://www.iaeme.com/ijciet/issues.asp?JType=IJCIET&VType=9&IType=12

ISSN Print: 0976-6308 and ISSN Online: 0976-6316

© IAEME Publication Scopus Indexed

A FAST LINEARLY BACK-END SLAM FOR

NAVIGATION BASED ON MONOCULAR CAMERA

Mohamed H. Mahmoud

Communications Engineering Dep., Fayoum University, Egypt

Nashaat M. Hussein

Communications Engineering Dep., Fayoum University, Egypt

Elsayed Hemayed

Computer Science Engineering Dept., Cairo University, Egypt

ABSTRACT

Recently Graph based SLAM has become a popular representation for solving

SLAM problem, but it is affected by its inherent nonlinearity due to noisy measurements

which lead to non-consistent framework. Nonlinear optimization algorithms, that are

used for large-scale keyframe based SLAM, suffer from issues associated with

initialization, iterations, local minima, and scale erroneous.The proposed method

avoids issues that are involved in the nonlinear optimization algorithms. The proposed

system provides the entire robot trajectory, at lower cost using monocular camera,

generates a globally consistent 3D model of the environment, and detects loop closure

with an optimized runtime. Our system utilizes a linear optimization method which

provides only one accurate solution, in contrast to the well-known nonlinear global

optimization like BA of ORB-SLAM, which provides multiple non-accurate solutions

for optimization. Furthermore, our system runs in real time and is 3X faster than the

current state-of-the-art SLAM systems. Our proposed system is used for autonomous

navigation of Micro Aerial Vehicles (MAVs). We tested our system using EuRoC, TUM,

and KITTI datasets. Simulation results and implemented SLAM experiments in large-

scale environments show that our system can be an alternative to filtering approaches

with a better accuracy, robustness, and more efficient than some well-known systems

Keywords: Robotic Systems; Monocular Camera; Tracking; Loop Closure.

Cite this Article: Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed,

a Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera,

International Journal of Civil Engineering and Technology, 9(12), 2018, pp. 627–645

http://www.iaeme.com/IJCIET/issues.asp?JType=IJCIET&VType=9&IType=12

Mohamed H. Mahmoud, Nashaat M. Hussein and Elsayed Hemayed


1. INTRODUCTION

SLAM is an active research topic for more than 30 years with overwhelming literature in the

robotics community (e.g. [1]). Mathematically, SLAM is modeled as a high-dimensional

nonlinear estimation problem whose aim is to build a map for unknown environments in

obscurity of referencing systems like GPS and simultaneously utilize the built map to find the

“optimal” estimate for robot poses (location and orientation) and correct errors of previous

estimations using noisy measurements and uncertain priors. The measurement function is

generally a nonlinear objective function in its arguments especially in case of camera sensor.

Generally, a graph- based SLAM system can be divided into three sequential modules [2];

frontend, backend and map representation. the frontend is used for measurement of motion

which processes sensor data to extract geometric motion and spatial constraints (data

association), e.g., between the key frame and map points at different points in time. The

backend (optimizer) is used to estimate and correct the poses of the keyframes (maximum a

posterior) to obtain a consistent map of the environment given the constraints from front-end.

For this target, back-end solutions have developed gradually from filter based methods (e.g. [3,

4]) to graph optimization methods [5].

Graph based optimization methods have the property of exploitation of the natural sparsity

structure of SLAM. It also enables better calculations by using nonlinear least square

optimizers like (e.g., g2o [6], Ceres [7]) which can detect and optimize loops to build a globally

consistent map free from drifts. Moreover, efficient visual SLAM solutions have been obtained

using linear optimization methods like [8, 9] exploiting the sparseness of the information

matrix and solving sparse linear equations, but no

Guarantee to get consistent map particularly for large-scale SLAM problems with

erroneous detected loops. In literature, to detect loops, there are reliable optimization

techniques [10] that are based on switching variables and weighting methods. These techniques

are sensitive to initialization, as they iteratively optimize a highly non-linear objective function.

The main contribution of this work is computing optimized paths for navigation, developing

and implementing visual SLAM using a linear method for a fast and accurate approximation

of pose graph optimization. The proposed algorithm requires no initial guess (assumption) and

detects loop closure. This is done by creating a system of linear equations (or linear system)

which is a collection of linear equations involving the same set of variables independent of rotations.

The method is used in [12] to compute keyframe rotations independent of computation of

keyframe poses. Most importantly we gain the advantages of linearity.

2 .RELATED WORK

Extensive research in the field of keyframe-graph based SLAM techniques have been

introduced in recent years. A comprehensive review of these techniques is beyond the scope of

this work. We present some of the most relevant and recent works related to visual SLAM

systems in the literature.

Map initialization. Monocular SLAM that depends on low-cost sensor of embedded

hardware is challenging. It requires a procedure to create an initial map knowing that depth

cannot be recovered from monocular camera. There are recent successful solutions for SLAM

as a filtering problem like in [14] where Kalman filters are used, and 3D keypoints are

initialized with a higher uncertainty in the depth. In these approaches every frame is processed

by the filter to jointly estimate the map point locations and the robot pose. These approaches

suffer from the uncertainty of initialization, the wasting of run time in processing consecutive

frames with little new information, and the accumulation of linearization errors. But methods

like [15] initializes SLAM system using an essential matrix assuming that the observation of

A Fast Linearly Back-End SLAM for Navigation Based on Monocular Camera


camera is a general scene. The drawback of the computation of essential matrix is the need to

select optimum solution from multiple solutions. In addition to the process of determining

which matrix (essential or homography) is better than other, is very complicated, as discussed

in [5]. Therefore, ORB-SLAM presents a new automatic approach based on model selection to

decide whether the transformation can best be modeled with an essential matrix E for non-

planar scenes, or with a homography H for planar scenes, and each model is penalized

according to its symmetric transfer error. In the planar case (i.e. planar, nearly planar, and low

parallax), the selection of the homography may generate corrupted solutions. Therefore, ORB-

SLAM may fail the initialization in that case, and the system restarts with a different pair of

frames to get initial pose and initial map point.

The recent LSD-SLAM [16] randomly initializes the 3D map by initializing the depth of

the pixels to a random value with high variance. Thus, the LSD-SLAM does not have selection

problem of the initialization but may be unstable, because this system minimizes the

photometric error function that includes much more local minima than the geometric error. We

will show in the simulation how todeal with environments of arbitrary geometry in challenging

situations and any camera motion that induces parallax (general camera motions) in the

initialization stage.

Loop closure detection and correction. The algorithms that detect loop closures in pose

graph SLAM system, are either PGO (pose graph optimization) or BA. The implementations

of such optimizations are provided using libraries such as G2o [6] and Ceres [7]. These

implementations optimize the loops to detect and correct the error (e.g. for positive depth, re-

projection error, or scale drift) in the keyframe pose and in all relevant map data that were

generated during the trajectory of loop. The loop closure thread tries to resolve loops upon the

addition of a new keyframe, to detect and correct any accumulated error by the system over

time. Some methods [17] is using bag of words (BoW) to detect loops in trajectory of camera.

However, they are still creating false loops that lead to erroneous in relative motion

computations and failure of the optimization of the pose-graph.

SLAM Optimization and Linearization. The graph-based SLAM models the SLAM

system as graphical structure. In this system each keyframes poses or map points positions are

represented as nodes (metric map), with edges are the measurements (scans or odometry)

connecting them as constraints. The goal of optimization approaches is to optimize the poses

of the nodes to minimize the error introduced by the constraint, thus called an off-line or batch

processing technique. Bundle Adjustment (BA) [18] is solved with the Levenberg-Marquardt

(LM) nonlinear optimizer to minimize the reprojection error. Estimator consistency is vital for

SLAM because an inconsistent estimator provides no guarantee for the accuracy of the

generated state estimates, leading to unreliable robot/landmark estimates. In general, for small-

scale problems BA easily converges to the globally optimal solution. However, when the

number of keyframes is large, BA can be very time-consuming and can converge to a local

minimum, unless a good initialization tothe map and keyframe pose is used. There are SFM

techniques such GSLAM [26] which can’t be used for large scale environment, general scene,

and feature reprojection errors minimizations.

The recent linear SLAM [27] provides a linear approach based on sequential Bayesian

filtering of orthogonal planar features using Kalman filter, to estimate the camera poses and

multiple point’s planes in a pose graph optimization. The drawback of this system is used only

for planar environments, although it depends on RGBD camera.

In the literature, the nonlinear least squares problems (NLS) with partially linear residuals

are called separable, and in statistical term such issues are called conditionally Linear-Gaussian

state space models. Optimizing large graphs has been a serious impedimentfor SLAM system



[23-25]. The problem is to reduce the residual errors by an optimum selection of the

optimization variables that can greatly simplify the problem structure and get consistent

graph(map). Inspiring by this work we develop, design, and implement the idea of linear

approximation of pose graph optimization to minimize residual error, detect loop closure, and

enhance Monocular SLAM initialization. To achieve this, we have implemented a SLAM

system with some new ideas and algorithms, but also incorporating excellent works developed

in the literature. Some of these works are the graph optimization of P. Barooah [23], ORB

features extractor and matcher developed by R. Mur-Arta [5], and the loop detection by bag-

of-words (BoW) of D. Galvez-Lpez [17]. A sequence of experiments, on-line as well as off-

line with different datasets of environments, are performed to validate the accuracy, speed, and

real-time performance of the developed linear system.

In summary, we propose a real-time visual SLAM system with the following properties:

1. Our system keeps the separable structure of SLAM, since we estimate the linear variables

using nonlinear measurements corrupted by Gaussian noise. Thus, the investigated system has

a linearity property which is a significant improvement compared to other systems.

2. Our system does not only maximize the accuracy of the robot trajectory, but also impedes

drift-buildup that minimizes the number of failures of the tracking. Thus, achieving accurate

and globally consistent framework compared to that of the state-of-the-art techniques.

3. Our implementation provides a faster Linear SLAM impediment to false loops.

4. Our system is capable of accurate initialization of vSLAM.

3. SYSTEM PRELIMINARIES AND PROBLEM FORMULATION

3.1. Graph based SLAM preliminaries

Let )3(SE}p,.....,p{=P N1 ∈∈∈∈ be a set of N nodes representing position of a mobile robot at

consecutive time instants (frame poses) and }e,.....,e{= M1ξξξξ be a set of M edges represent relative

translation vector between every two nodes ije . The objective of pose graph optimization is to

compute an estimate of the nodes configuration that maximizes the likelihood of the

measurements by using mathematical model that is solved by linear method. Since relative

pose measurements are affected by noise, the measured quantities are in the form ijijij s= ++++εεεεεεεε

where ijs ∈ R3 is a zero-mean Gaussian noise, i.e. )c,0(Ns ijij ≈≈≈≈ , being ijc a 3 × 3 covariance

matrix. The uncertainty of each edge measurement (covariance matrix) ijeC is obtained easily

from the uncertainties of the two frames locations as given by equation (2) as follows.

ijij pp=e −−−− (1)

pjpie CC=Cij

++++ (2)

3.2. Problem definition

The role of localization thread of SLAM system is to compute positions of the robot (frames)

in the world coordinate frame }z,y,x{pt ==== , where z,y,x translation are coordinates. For each frame

we need to find a set of 2D image points u such that t)proj(X, =u , where X ∈R3 and proj is the

projection function, that projects 3D triangulated points X from current observation t)(X, in the

map of SLAM to the 2D image plane.

Figure 1 shows the reprojection errors term. To keep the model as close to the ground-truth

as possible, we maximize the likelihood of the measurements by using mathematical model



that is solved by linear method and minimize reprojection error. The variables to be optimized

are tp . It should be mentioned that the projection function proj is strongly nonlinear, hence the

optimization problem has many minima, hence we should have good initial solutions for tp .

Fortunately, the previous stage of RANSAC and triangulation provide such good initial

solutions.

In a summary our target in this research is the detection of loops and graph optimization.

We can distinguish the two different threads as described in Figure 3, although they are closely

linked. The first thread consists of the identification of a previously visited place (loop). When

a loop is detected, a new relation is added to the graph that relates the current pose with the

pose in the past where we visited the same place. The second thread tries to reduce the

accumulated errors from the pose estimation based on pairwise alignment and reprojection

errors. This will get more consistent maps, especially when a place is revisited after a long

period of time.In a summary our target in this research is the detection of loops and graph

optimization.We can distinguish the two different threads as described in Figure 3, although

they are closely linked. The first thread consists of the identification of a previously visited

place (loop). When a loop is detected, a new relation is added to the graph that relates the

current pose with the loop pose. The second thread tries to reduce the accumulated errors from

the pose estimation based on reprojection errors using proposed linear algorithm. This will get

more consistent maps, especially when a place is revisited after a long period of time.

4. THE GENERAL STRUCTURE OF THE PROPOSED VSLAM SYSTEM

The general structure of the proposed vSLAM system is depicted in Figure 2, and its main

modules are described in the following sections. Our system generates a global, scalable, and

reliable 3D map based on pose-graph of keyframes and computing relative poses of frames.

Our proposal SLAM system has three main modules; tracking, loop closure, and optim

ization modules. Our SLAM system can be divided into two parts front-end and back-end.

The front end is used to determine camera pose (translation and rotation) and detect loop

closure. The back-end part is used to optimize the trajectory of robot by reducing accumulated

errors using our new investigated method. This efficient distribution allows for a continuous

tracking of the tracking module while the global optimization and the loop closure ones are

processed in the background only when a new keyframe is added. System takes a set of

pairwise relative poses between cameras and rotations as input, and outputs the position of all

keyframes in global coordinate. The camera poses are computed through motion averaging

linear algorithm.

4.1. Tracking module

We use a pinhole camera model, in which the camera is modeled as a dark box with a small

hole that allows light rays to fall on the back plane of the box. The camera mapping function

is termed perspective projection and the parameters for each frame rotation matrix R and a

translation vector t . The relations for projecting a 3D point X into image plan is:

tRX=U ++++ (3)

(Project 3D point X to image plane that convert from world to camera coordinates)



Figure. 1: Reprojection errors for poses optimization.

Our SLAM system initially loads the frames, then detects and extracts the features, then

processes them by the tracking subsystem. Once the keyframe selection is determined, an initial

map is generated by triangulating feature points in the initialization thread. Our global tracking

module pipeline is summarized in the following steps:

1. Feature Extraction and matching: We employ existing methods for feature tracking.

Firstly, ORB [5] keypoints are detected in the frames. ORB has several advantages;

mainly it is extremely fast to compute and match, better for matching keypoints

between frames from various viewpoints, and invariant for rotation. We implement

an enhanced robust algorithm to extract ORB feature descriptors (which are

oriented multi-scale FAST corners with a 256 bits’ descriptor associated) in parallel

at salient points within frames. We use epipolar geometry by selecting 8 matches at

random and estimate the Fundamental matrix, Essential matrix E for non-planar

scenes, and a Homography matrix H for planar scenes. Hence, we can find

translation vector.

2. Estimate Camera Poses: We use the geometrically two-view matches from the

previous step to estimate camera poses with global motion averaging schemes.

Camera orientations are estimated with a robust orientation estimation algorithm

[19] after camera orientations and their covariance are estimated, the camera

positions are robustly estimated with our linear optimization method in the pose

graph optimization thread.



Figure. 2: Overview of the global vSLAM pipeline that is implemented in our designed system.

3. Triangulate 3D Points: After camera poses are estimated, 3D points are triangulated in

parallel and refined with our linear optimization method.

4. Bundle Adjustment: As a final step, camera poses, and 3D points are refined with a

nonlinear optimization to minimize reprojection error which is used to compute covariance of

translation and rotations.

Figure 3: Keyframe poses optimization and loop detection.



Figure 4: Keyframes poses (p1, p2, p3) and edges (e12, e23) to get optimized trajectory.

4.2. Loop Closure

This section illustrates the work of detecting loops and correcting the erroneous of loops in the

pose graph of optimization. Our implementation is based on visual feature matching to

determine if

An area has been visited or not. For each new keyframe created by the tracking subsystem,

feature matching is performed. We use a RANSAC algorithm with 30 iterations to check for

3D-correspondences. If the number of resulting inliers is above a certain threshold (40

matches), we consider that the current frame and the keyframe to which pairwise alignment is

performed, a good matching. And hence, an optimization with all matches are performed, and

a similarity transformation SIM3 matrix is added from the current keyframe to each loop

keyframe (4x4 transformation matrix is created). Only these keyframes which have a higher

similarity score than a threshold and are not directly attached to the current keyframe are

considered as loop keyframes. This thread is illustrated in figure 3. Once a list of loop

keyframes is found, this gives a measure on the accumulated error. Our SLAM system provides

a linear system formulation that can integrate with L1 optimization algorithm [11,13], to detect

and correct false loops.

4.3. Graph Optimization (Error Reduction Subsystem)

This module represents back-end subsystem and has the following tasks:

4.3.1. Local Maps

Is modelled as a graph of a set of keyframes, and the sensed 3D keypoints. Local mapping

passes new keyframes and generates new map points. It also deletes redundant keyframes and

bad map points to provide a consistent map. Furthermore, it upgrades the covisibility graph

among keyframes and executes local BA for optimization, which optimizes the last processed

keyframe and all its adjacent keyframes in the covisibility graph.

4.3.2. The covisibility graph

Is modelled by an undirected weighted graph where each vertex represents a keyframe and the

edges represent vectors connect keyframes that share a minimum of 40 map points in the local

map. The weight of the edge is a score for the number of the common matchedmap keypoints



between the two keyframes. To provide an optimum loop closure optimization, we also create

the so-called essential graph, which is less dense than the covisibility graph because an edge

between two keyframes is created when they have matched keypoints more

han 120 map keypoints observations. Finally, the map also represents a spanning tree,

which is the minimum connected edges representation of a graph that includes all the

keyframes.

4.3.3. The Optimization Algorithm

For each new frame, we compute camera poses and their related uncertainty represented by

information matrices. The 3D keypoints are then projected to the new keyframe pose, and the

reprojection errors are minimized to obtain both the camera pose and the information associated

to such estimation. Graph Optimization Algorithm (covisbilty graph optimization) is the thread

of computing keyframes (or points) positions and information matrix with known edge vectors.

There is an offset ambiguity in the edge map. To resolve this ambiguity, we add a special

reference keyframe to the surrounding environment. The robot should be able to identify this

reference keyframe and evaluate the map offset such that the position of the reference keyframe

is always fixed. There is an inconsistency in the map because of noisy measurements. To reach

the optimized trajectory of robot, we provide an objective function that minimizes the

inconsistency as possible (reduce errors part) by minimizing re-projection geometric error

(difference measure between the predicted and the measured projections), corresponding to

edge position observations model which is described in Figure 4. We define the residual error

ijerr that connects ip to jp as follows:

ijijtrans

ij epp=err −−−−−−−− (4)

The orientation is computed with sufficient accuracy using the method proposed in [12].

We exclude orientation computation from our estimation model to make the estimation

problem a linear estimation model. The edge variable has an additive white Gaussian

measurement noise with covariance ijΣΣΣΣ , and is considered 3-D random vector with Gaussian

distribution with mean ije and covariance matrix ijΣΣΣΣ . Uncertainty in edge vector represented

by its covariance matrix ijΣΣΣΣ varies from edge to another. By using mathematical operations in

[23-25] for graph optimization, these residual errors can be formulated as a non-linear least

squares problem (edge vector normalized error ijnerr ) as follow:

ijij

T

ijtrans

ij errerr=nerr1−−−−

ΣΣΣΣ

)epp()epp(=1

ijijij

T

ijij −−−−−−−−ΣΣΣΣ−−−−−−−−−−−−

(5)

We can write ijnerr as:

)ep(S)epp(=nerr ijijij

T

ijijtransij p −−−−−−−−−−−−−−−− (6)

`)epp(S)epp(2

1=F ijijij

T

ijijl

N

ij1,=jl

N

1=i −−−−−−−−−−−−−−−−ΣΣΣΣΣΣΣΣ ≠≠≠≠

(7)

We differentiate the total F and equalize the result by zero.



[[[[ ]]]] ikkil

Ni

kii

lN

kl

kNkil

Ni

kiikk eS=

p

...

p

...

p

p

S...)S(...SS=

1,=

2

1

=

1,=21 ≠≠≠≠≠≠≠≠ ΣΣΣΣ

−−−−ΣΣΣΣ−−−−−−−−

(8)

B=AP (9)

Where A (System matrix), P (State vector), B (Coefficient vector) are defined by:

`

p

...

p

p

=P2

1

lN

(10)

ΣΣΣΣ−−−−−−−−

−−−−ΣΣΣΣ−−−−

−−−−−−−−ΣΣΣΣ

≠≠≠≠

≠≠≠≠

≠≠≠≠

)S(...SS

...

S...)S(S

S...S)S(

=A

il

Nl

N=i

lNi1,=i2

lN1

lN

lN2i2

lN=i

2i1,=i21

lN112i1

lN=i

1i1,=i

(11)

`

eS

...

eS

eS

=B=

1,=

22

=

21,=

11

=

11,=

ΣΣΣΣ

ΣΣΣΣ

ΣΣΣΣ

≠≠≠≠

≠≠≠≠

≠≠≠≠

liNi

lN

lNi

lNii

iil

Ni

ii

iil

Ni

ii

(12)

Equations (9) to (12) are the solution to the optimization problem expressed in Equation

(7). The dimensions of every vector and matrix is given in Table 1. These results show that

graph optimization is equivalent to solving a linear system of equations, So the proposed

system is truly linear without approximation.

Table 1: Dimensions of Terms of Graph-Based SLAM

Terms in Graph-

Optimization Alg. of SLAM Symbol Dimension

Relative Displacement

Vector between two

keyframesi and j. ije

3*1

Covariance matrix of edge ijΣΣΣΣ

3*3

State Vector (Linearized

Solution) P 3N*1

System Matrix A 3N*3N

Coefficient vector B 3N*1

Where N: Number of Keyframes (vertices)



Note: In our system we assume that the reference pose of the robot to be the last global

frame, i.e. 3N 0=pl

.

We can use graph optimization for point map optimization. The edges are independent from

nodes and from each other. This process enables us to estimate each edge vector alone, which

extremely simplifies the map update. Instead of updating the whole map every time we add

measurements to the map, we only need to update the optimization problem and then solve it

to yield keyframes positions which represent full optimized trajectory.

4.3.3.1 Computing the covariance of Translation Vector

To compute the unknown 3D keypoints locations and keyframes poses from the observations,

we minimize their total prediction error. Our optimization is the model refinement. Hence, it is

essentially a matter of optimizing a nonlinear objective function represents the total

reprojection error over nonlinear parameters (the features and camera parameters). The inverse

Hessian H (second order derivative) at the minimum is a good estimate of the covariance matrix

(uncertainty amount) for these parameters. The objective function defined as the mean square

error of reprojection errors as illustrated (defined according to the Method of Least Squares):

(((( )))) (((( )))) (((( )))) (((( ))))j 2*

2jj 2*

j 2

1=

j 1*

j 1j 1*

j 1

1=

uuuuuuuu= −−−−−−−−++++−−−−−−−− ∑∑∑∑∑∑∑∑T

N

j

TN

j

E

(13)

Where: 1) imageproj(X`, =u j 1*

2) imageproj(X, =u j 2*

Where j 1u , j 2u are the projection of keypoint j onto frames 1, 2 respectively For each frame

we need to find a set of image points j 1*u , j 2

*u , where X ∈ R3 is the 3D environment points

and proj is the projection function, that projects 3D triangulated points from current

observation t)(X, in the map of SLAM to the 2D frame plane 1 and 2 as illustrated in Figure 1.

The pose ip is the pose of frame i . It includes both orientation and translation of frame i . In

this optimization problem, j 1u , j 2u are considered to be the measurements that have uncertainty

given by covariance matrix uijΣΣΣΣ . Both j 1u and j 2u are 2-D vector that represents the location

of a detected keypoint (image feature) inside the image plane and it has the units of pixels. Its

22 ×××× uncertainty matrix uijΣΣΣΣ is taken as unity matrix which means that the uncertainty is taken

as 1 pixel as given in the literature. We differentiate equation (13) twice with respect to camera

pose (translation vector and rotation matrix) we get:

∂∂∂∂

∂∂∂∂

∂∂∂∂

∂∂∂∂++++

∂∂∂∂

∂∂∂∂

∂∂∂∂

∂∂∂∂≅≅≅≅

∂∂∂∂∂∂∂∂

∂∂∂∂∑∑∑∑∑∑∑∑

TTTTTT

ET

N

j

TN

j

j 2*

j 2*

1=

j 1*

j 1*

1=

2 uu2

uu2

(14)

We can use: 2211

1 22 JJJJHTT

T ++++≅≅≅≅====∑∑∑∑ −−−−

Where 1J , 2J are the Jacobins of j 1*u , j 2

*u w.r.t camera pose Thus, computingthe error

vector of reprojection in camera of keypoint and its Jacobians with respect to the camera

poseresults in a matrix of dimension 62 ××××

4.3.3.2. Determination of Global scale of Monocular SLAM

Based on our proposed linearization algorithm, we use the method proposed by [5] and extend

it to compute the relative scale between two directly connected keyframes from the



corresponding 3D keypoints (ORB features) based on bag of words. Furthermore, we optimize

the scale erroneous using essential graph.

5. EXPERIMENTAL SIMULATIONS & RESULTS EVALUATION

In this section we evaluate the accuracy and speed of our vSLAM for different datasets. We

also compare our designed system with the monocular version of ORB-SLAM [5] and LSD

SLAM [11] system. It was assumed that the camera provides 30 frames per second and delivers

consecutive stream of frames. To evaluate the graph-based SLAM algorithm, we compute the

translational errors for different datasets and tabulate these results. One of the most important

types of errors was mentioned in the literature for evaluating the SLAM problem, is relative

pose error (RPE) that is helpful for graph-based SLAM. We evaluate it in the TUM-RGBD

dataset. A more intuitive direction isto estimate the absolute trajectory error (ATE) [20] after

mapping the two trajectories: the ground truth and the estimated path. We evaluate it in the

KITTI dataset. All the experiments are executed on an Intel Core i5 computer with 8Gb RAM

without GPU parallelization.

5.1. KITTI dataset

First, we have simulated our vSLAM system on the KITTI dataset [21], which is recorded by

a stereo camera at 20 fps and a resolution 512x382. Our results are shown in Table 2. The

ground truth/estimated trajectory of the robot in case our system , ORB SALM system , and

LSD SLAM is shown in Figures 5, 6 for a robot moving in sequence 00 and sequence 08 of

KITTI database (which show the loop closure). We show that our system can detect loops in

the previous figures. The RMSE was estimated for KITTI dataset for all sequences. As shown

in these figures, the proposed vSLAM produced an accurate camera trajectory which is roughly

aligned with the trajectory of ground truth.

5.2. TUM-RGBD Dataset

In this part, we compare the quantitative results of our method using TUM-RGBD database

[20] which contains sequences from RGB-D camera, that was recorded at frame rate 30 Hz and

camera resolution 640×480. We compare our accuracy to that of the widely used state-of-the-

art systems ORB SLAM and LSD-SLAM in Table 3. Our method is based on the proposed

linear optimization method and outperforms the other approaches in most sequences. We also

compared these works with GSLAM [26] which show that our work outperforms it in some

sequences.

5.3. EuRoC Dataset

In Table V, weprovide the accuracy of our system in the 11 sequences of the EuRoC dataset

[22]. The datasetconsists of 11 stereo sequences. The sequences are specified as easy, medium

and difficult depending on the speed of MAV, illumination and view texture. Table 4 shows

absolute translation RMSE (translation Root Mean Square Error) of the robot trajectory for

each sequence.Our system successfully processes all the dataset sequences in real-time, except

V1-03 difficult, where the movement is too extreme to continue which lead to an ambiguity.

Our system gets lost also in sequence V2-03 difficult because of intense motion blur. Our

system is able to build its map, detect, and correct false loops. Our system achieves a tracking

precision of a few centimeters and gives accurate and similar results compared to both ORB-

SLAM and LSD-SLAM.



Table 2: The Translation Errors Recorded for the KITTI Dataset

Ours ORB [5] LSD [16]

Sequence relt

(%)

abst

(m) relt abst relt abst

KITTI_00 0.82 7.3 0.70 6.68 0.63 6.2

KITTI_01 1.2 13.0 X X 2.36 15.2

KITTI_02 0.9 22.8 0.76 21.6 0.79 22.3

KITTI_03 0.92 0.6 0.71 1.6 1.01 1.2

KITTI_04 0.56 1.9 0.48 1.7 0.38 1.1

KITTI_05 0.65 9.6 0.40 8.25 0.64 2.1

KITTI_06 0.2 6.61 0.91 14.68 0.71 18.3

KITTI_07 0.68 4.4 0.50 3.26 0.56 5.2

KITTI_08 3.2 20.0 2.05 46.6 1.11 53.2

KITTI_09 0.92 3.0 0.87 7.62 1.14 9.4

KITTI_10 0.7 4.9 0.60 8.68 0.72 11.3

Table 3: Quantitative Results TUM-RGBD Dataset RMSE in cm.

Sequences Ours ORB [5] LSD

[16]

GSL

[26]

fr1xy 1.2 0.9 9.0 0.6

fr1desk 3.8 1.7 10.7 3.8

fr2xyz 0.6 0.3 2.2 0.2

fr2desk 2.01 0.9 4.6 2.1

fr3household 3.30 2.2 38.0 3.5

fr3sittingxyz 0.9 0.8 7.7 0.6

fr3nostructure

near 2.9 2.7 7.53 2.4

fr3nostructure

far 11.5 X 18.5 3.4

fr3structure far 0.98 0.8 8 0.8

fr3structur near 2.5 1.6 X 2.5

Table 4: Quantitative Results for EURoC Sequences, Showing Keyframe Trajectory RMSE Error, in

m

Sequences Ours ORB [5] LSD [16]

V1 _01 easy 0.23 0.015 0.081

V1_02 medium 0.54 0.020 0.03

V1 _03 difficult X X X

V2 _01 easy 0.35 0.015 X

V2 _02 medium 0.227 0.017 X

V2 _03 difficult X X X

MH_ 01 easy 0.131 0.070 X

MH _03 medium 0.233 0.066 X

MH _04 difficult X 0.071 X

MH _05 difficult 0.221 0.081 X



Figure.5: Robot Trajectory for our system, ORB SLAM [6], and LSD SLAM [16] in case of dataset

KITTI_00.

Figure.6: Robot Trajectory for our system, ORB SLAM [6], and LSD SLAM [16] in case of dataset

KITTI_8



Figure 7: Synthetic evaluation of speed performance of our system (blue lines) compared to the well-

known global BA [18] algorithm (orange lines) used in ORB SLAM [5].

Figure. 8: Real aerial video shows accurate convergent initialization of our SLAM. Keyframes

trajectories of our method (green line), ORB-SLAM (blue line) [5], and ground truth (red line)

respectively are shown.

5.4. Real Time performance for All Previous Datasets

In Table 5, we estimate the performance of our proposed graph optimization algorithm of

SLAM system w.r.t well-known nonlinear optimization algorithm which is called global

Bundle Adjustment (BA) [18] used in ORB SLAM system [5]. We note that the speed and

efficiency of our pose graph optimization is higher than that of global BA of ORB SLAM

system because it runs in real time faster (3 X) than global BA as shown in figure 7. We also

use real video to test our system to show the robustness of our system for initialization of map

in frames as shown in Figure 8.



Table 5: Mean times of global optimizations of our algorithm.

Dataset Type,

Sequence Number,

and Number of

Frames

Our

Propose

d

lineariza

tion

Algorith

m

Global BA of

Nonlinear

Optimization

of ORB SLAM

[5]

KITTI_00_100 0.53452 1.5435

KITTI_01_100 0.46532 1.9648

KITTI_02_100 0.6542 1.8652

KITTI_03_100 0.54682 1.5968

TUM-

RGBD_fr1xy_150 0.6785 1.7568

TUM-

RGBD_fr1desk_150 0.75986 2.1564

TUM-

RGBD_fr2xyz_150 0.84563 2.5648

TUM-

RGBD_fr2desk_150 0.7896 2.9648

EuRoC _V1 _01

easy_200 0.9856 2.8652

EuRoC _V1 _01

easy_200 0.9856 2.8652

6. CONCLUSION& FUTURE WORK

We present a low cost visual SLAM allows the employment of a proposed linear method for

graph optimization. The proposed algorithm doesn't require initial conditions which makes map

initialization more robust. Our method can be used to initialize the bundle adjustment algorithm

in applications such as SLAM without needing to resort to heuristic methods to detect the

coplanarity of the points. We show that initializing SLAM produces accurate initialization

compared with state-of-the-art visual SLAMs. We provide a robust localization optimization

method for camera pose computation in the situations where camera localization becomes

unreliable in a very low-textured challenging environment. The pipeline of our system shows

an important advance in map generation in terms of semantic understanding and in loop closure

detection and correction using structural properties of edges in keyframes. We demonstrated

the estimation accuracy and robustness of our optimization method.

We presented the mathematical linear framework to compute keyframe poses and

uncertainties values for SLAM. The proposed method directly computes the 3DoF motion

parameters from a pair of frames and estimates the accuracy for the computed values. There

are also four degrees of freedom scale and three angles of rotations, that are computed

separately producing a consistent map free of drifts. The proposed system has an ability for

relocalization in a map optimization stage that has been previously visited and permits for

continued SLAM process for any general scene planar or non-planar in contrast to GSLAM

[26] which can’t be used for large scale environment, general scene, and feature reprojection

errors minimizations.

Our graph-based SLAM can compute the translation vector, scale, and rotation matrix

individually. We show that our approach optimization method can provide reliable and



promoted results compared to the newly state-of-the-art SLAM systems specially inaccuracy

initialization and in dealing with incorrect loops. Finally, we conclude that our system utilizes

a linear optimization method which provides only one accurate solution, in contrast to the well-

known nonlinear global optimization BA of ORB-SLAM [5] which provides multiple non-

accurate solutions for optimization. Furthermore, our system runs in real time faster (3 X) than

the well-known systems in the literature.

In Future work we can enhance tracking performance by using minimization of the

photometric residual corresponding to photometric intensity observation model and including

uncertainty of this error in our optimization.

REFERENCES

[1] M. Ozaki Hugh Durrant-Whyte and Tim Bailey, “Simultaneous localisation and mapping

(SLAM): Part I the essential algorithms,” Robotics and Automation Magazine, 13(2):99–

110, 2006.

[2] T. Bailey and H. Durrant-Whyte, “Simultaneous localization and mapping (SLAM): Part

II,” Robotics& Automation Magazine, IEEE, 13(3):108–117, 2006.

[3] S. Lynen, M. W. Achtelik, S. Weiss, M. Chli and R. Siegwart, "A robust and modular multi-

sensor fusion approach applied to MAV navigation," 2013 IEEE/RSJ International

Conference on Intelligent Robots and Systems, Tokyo, 2013, pp. 3923-3929.

[4] E.S. Jones, S. Soatto, “Visual-inertial navigation, mapping and localization: A scalable

real-time causal approach,” Int. J. Robot. Res., 2011, 30, 407–430.

[5] R. Mur-Artal, J.M. Montiel, J.D. Tardós, “ORB-SLAM: A Versatile and Accurate

Monocular SLAM System,” IEEE Trans. Robot. , 2015, 31, 1147–1163.

[6] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, W. Burgard, “g2o: A general

framework for graph optimization,” In Proceedings of the IEEE International Conference

on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3607–3613.

[7] S. Agarwal, K. Mierle, and Others. Ceres solver, site: http://ceres-solver.org, (accessed on

14 November 2017).

[8] Jiantong Cheng, Jonghyuk Kim, Jinliang Shao, “Robust linear pose graph-based SLAM,”

In Robotics and Autonomous Systems, May 2015.

[9] L Zhao, S Huang, G Dissanayake, “Linear MonoSLAM: A linear approach to large-scale monocular SLAM

problems,” In the IEEE International Conference on Robotics and Automation (ICRA), 2015,

1517-1523.

[10] N. S. underhauf and P. Protze, “Switchable constraints for robust pose graph SLAM,” In

Proc. of IROS, 2012.

[11] L. Carlone, A. Censi, and F. Dellaert, “Selecting good measurements via l1 relaxation: a

convex approach for robust estimation over graphs,” In Proc. of IROS, 2014.

[12] L. Kneip, R. Siegwart, and M. Pollefeys, “Finding the exact rotation between two images

independently of the translation,” In Proc. of ECCV, 2012.

[13] E. J. Candes and T. Tao, “Decoding by linear programming. Information Theory,” IEEE

Transactions on Robotics, 51(12):4203–4215, 2005.

[14] J. Civera, A. J. Davison, and J. M. M. Montiel, “Inverse depth parametrization for

monocular SLAM,” IEEE Trans. on Rob, 29(6):1052–1067, 2008.

[15] W. Tan, H. Liu, Z. Dong, G. Zhang, and H. Bao, “Robust monocular SLAM in dynamic

environments,” In Proc. of IEEE Symposium on Mixed and Augmented Reality (IS-MAR),

2013.

[16] J. Engel, T. Schops, and D. Cremers, “Lsd-SLAM: Large-scale direct monocular SLAM,”

In Proc. of ECCV, 2014.



[17] D. Galvez-Lpez and J. D. Tardos, “Bags of binary words for fast place recognition in image

sequences,” IEEE Transactions on Robotics, 28(5):1188–1197, Oct 2012.

[18] P. F. Triggs, R. I. McLauchlan, Hartley, and A. W. Fitzibbon, “Bundle adjustment - a

modern synthesis,” In Vision Algorithms: Theory andPractice, LNCS, pages 298–375.

Springer Verlag, 2000.

[19] A. Chatterjee and V. M. Govindu, “Efficient and robust large-scale rotation averaging,” In

ICCV. IEEE, 2013.

[20] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the

evaluation of RGB-D SLAM systems,” in IEEE/RSJInt. Conf. Intell. Robots and Syst.

(IROS), 2012, pp. 573–580.

[21] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI

vision benchmark suite,” in Computer Vision and Pattern Recognition (CVPR), 2012.

[22] M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R.

Siegwart, “The EuRoC micro aerial vehicle datasets,” The International Journal of

Robotics Research ,2016.

[23] P. Barooah and J.P. Hespanha, “Estimation on graphs from relative measurements,” IEEE

Control Systems Magazine, 27(4):57–74, 2007.

[24] Mohamed H. Merzban; M. Abdellatif, H. Abbas, S. Sessa, “Toward multi-stage decoupled

visual SLAM system,” Robotic and Sensors Environments (ROSE), 2013 IEEE

International Symposium on, vol., no., pp.172,177, 21-23 Oct. 2013.

[25] Jingchun Yin, Luca Carlone, Stefano Rosa, and Basilio Bona, “Graph-based robust

localization and mapping for autonomous mobile robotic navigation,” Conference: 2014

IEEE International Conference on Mechatronics and Automation (ICMA), vol., no.,

pp.123-177, August 2014.

[26] Chengzhou Tang ,Oliver Wang ,Ping Tan , “GSLAM: Initialization-robust Monocular Visual SLAM

via Global Structure-from-Motion,” International Conference on 3D Vision , Aug. 2017.

[27] Pyojin Kim, Brian Coltin, H. Jin Kim, “Linear RGB-D SLAM for Planar Environments,”

EECV, 2018.

M. H. Mahmoud received the B.S. degree in Communications

engineering from Faculty of Engineering - Fayoum University - Egypt in

2008 and the M.S. degree in signal processing engineering from Fayoum

University-Egypt in 2012. He is currently pursuing the Ph.D. degree in

computer vision engineering at Fayoum University-Egypt. From 2012 till

now, he is working as Lecturer Assistant at MSA University. His research

interest includes computer vision, signal processing, robotics, and SLAM

systems.

Nashaat M. Hussein received his B.Sc. in communication and

electronics engineering from Al-Azhar University—Egypt in 2002. In

2005, he received his M.Sc. degree in communication and electronics

engineering from (C.N.M.) National Center of Microelectronics, Sevilla

University—Spain. In 2009, he received his Ph.D. in Digital Integrated

Circuit Design for the Applications of Image processing from (C.N.M.)

National Center of Microelectronics, Sevilla University—Spain.

Currently, he is working as a Lecturer at Fayoum University—Egypt.

His research interest includes optimization of digital Image processing

techniques (image compression, enhancement, pattern recognition analysis, edges detections,

and image hiding data). Applications of these techniques (artificial vision, smart vision, SLAM

system).



Elsayed Hemayed received is currently working as a Professor of

computer engineering at the University of Science and Technology, Zewail

City, and the director of the Communication and Information Engineering

(CIE) program. He is on leave from Cairo University. He got his PhD from

University of Louisville, KY in 1999. He got the Graduate Dean’s Citation

Award and the John M. Houchens Prize in recognition of excellent doctoral

dissertation. He got his MSc and his BSc from Cairo University in 1992 and

1989. His research interest includes computer vision, robotics, and SLAM systems. Sites link

is https://sites.google.com/site/elsayedhemayed/home .

A FAST LINEARLY BACK-END SLAM FOR NAVIGATION BASED … › MasterAdmin › uploadfolder › IJCIET_09...association), e.g., between the key frame and map points at different points

Documents