
J Intell Robot Syst
DOI 10.1007/s10846-012-9730-5

An Orientation Invariant Visual Homing Algorithm

David Churchill · Andrew Vardy

Received: 3 January 2012 / Accepted: 13 July 2012
© Springer Science+Business Media B.V. 2012

Abstract Visual homing is the ability of an agent to return to a goal position by comparing the currently viewed image with an image captured at the goal, known as the snapshot image. In this paper we present additional mathematical justification and experimental results for the visual homing algorithm first presented in Churchill and Vardy (2008). This algorithm, known as Homing in Scale Space, is far less constrained than existing methods in that it can infer the direction of translation without any estimation of the direction of rotation. Thus, it does not require the current and snapshot images to be captured from the same orientation (a limitation of some existing methods). The algorithm is novel in its use of the scale change of SIFT features as an indication of the change in the feature's distance from the robot.

D. Churchill
Department of Computer Science, University of Alberta, Edmonton, Canada
e-mail: [email protected]

A. Vardy (B)
Department of Computer Science, Memorial University of Newfoundland, St. John's, Canada
e-mail: [email protected]
URL: http://www.cs.mun.ca/∼av

We present results on a variety of image databases and on live robot trials.

Keywords Visual homing · Robot navigation

1 Introduction

Visual homing (VH) provides the ability for an agent to return to a previously visited position by comparing the currently viewed image with a remembered image captured at the reference position. This allows the agent to return to the reference position from any nearby point with a sufficient degree of visual similarity. In this paper we provide formal justification for the Homing in Scale Space algorithm first proposed in [1]. We also present additional experimental results that demonstrate the algorithm's effectiveness for a variety of different environments and on live robot trials.

VH has been studied both as a model for local animal navigation and as a tool for local robot navigation. A particular model of the homing behaviour of honeybees known as the snapshot model was proposed by Cartwright and Collett [2, 3]. This model proposes that honeybees can return to important locations in their environment by pairing visual features between the current image and an image stored at the goal position, known as the snapshot image.


Disparities in both the position and size of features between the current image and the snapshot image are used to compute correcting vectors. These vectors are then summed to produce an overall home vector. This general strategy pervades much of the work on VH from both the biological and robotics communities, including the work described in this paper. However, as far as we are aware the algorithm presented here is the first since Cartwright and Collett's snapshot model to make explicit use of the disparity in apparent size of visual features. The biological community have proposed variants of the snapshot model, as well as alternative homing strategies for a variety of species including honeybees [2–4], ants [5, 6], rats [7] and humans [8]. For social insects such as bees and ants it has been argued that visual homing (sometimes referred to as 'image matching') is a crucial component in their overall navigational strategy [9].

In robotics, a visual homing algorithm serves the purpose of a 'local control strategy', which Kuipers and Byun described as "how a robot can follow the link connecting two distinctive places" [10]. The chief limitation is that it can only be applied in the immediate neighbourhood of the goal location. There must be sufficient similarity between the current image and the goal image for an accurate home vector to be computed. If the goal locations are spaced closely together and in sequence then VH can be used as a means of executing learned routes through an environment [11–13]. If the goal locations are distributed throughout the environment, they can be treated as nodes in a graph. This representation is known as a topological map. Navigating with such a map requires a localization system that combines sensory information with a model of the robot's motion. VH then fills the role of moving the robot between connected nodes. It can also be used in the discovery of new edges between nodes [14]. This approach falls under the category of topological simultaneous localization and mapping (SLAM). VH has been employed by a variety of researchers on topological SLAM [14–17].

Visual homing can be considered a form of qualitative navigation, in the sense of Dai and Lawton, where spatial learning and path planning proceed "in the absence of a single global coordinate system" [18]. This is in contrast with most work on grid-based or metric SLAM, where the production of a single coordinate frame map is the ultimate goal. The difference lies in the degree of accuracy required to achieve the task at hand. It is possible to visually home to a previously visited position even with inaccurate information about its direction. As long as the difference between the robot's direction of movement and the ideal direction is less than 90° the robot will eventually reach home [19] (although naturally we strive for higher accuracy). In the SLAM framework, reaching a desired pose requires an accurate map, accurate localization of the robot within the map, and a further path planning stage. Methods of qualitative navigation such as visual homing are pursued because they offer the possibility of robust navigation with low computational cost.

The next section considers related work on the visual homing problem. We then present the mathematical formulation for the Homing in Scale Space algorithm. This is followed by a discussion of our experimental methods and results. We conclude with a discussion of these results and suggestions for future work.

2 Related Work

Existing methods for visual homing can be classified as either holistic or correspondence-based [20]. In the next two sections we will discuss these two classes of homing algorithms.

2.1 Holistic Methods

Holistic methods rely on comparisons between images as a whole. An example of a holistic method is that of Zeil et al., who posit a simple distance metric between images and implement homing as gradient descent in the space of this distance metric [21]. This method, while elegant in its simplicity, relies on the existence of a monotonic relationship between image distance and spatial distance. It also requires small exploratory movements of the robot in order to determine the gradient of the image distance function.


Möller and Vardy described an alternative method based on gradient descent that removes the need for exploratory movements prior to computing a home vector [20].

Another holistic method is the so-called warping method of Franz et al. [19]. We present this method in some detail as it is used as a benchmark for comparison with our method. The warping method searches for the parameters of motion which make the warped snapshot image most similar to the current image. A warped snapshot image is generated by transforming the snapshot image as if the robot had actually moved according to the given motion parameters. To make this transformation possible the assumption is made that all objects are equidistant from the goal. This assumption is rarely satisfied in practice. However, in environments where the objects are all relatively distant from the goal it provides a reasonable method of predicting the image that would result from small movements of the robot. A precise prediction would require a priori information on the structure of the environment, which is presumed not to be available in this context. The robot's movement is described by three parameters: α is the direction the robot has moved away from the goal, ψ is the change in orientation, and ν characterizes the distance to the goal relative to an assumed average landmark distance (see [19] for details). The snapshot image is warped by iterating over a discretized set of possible values for the movement parameters (α, ψ, ν). This search is tractable because it operates on one-dimensional images, which are sampled from the centre rows of two-dimensional images captured from the omnidirectional camera system. Despite the clearly unrealistic nature of the assumption that all landmarks are of equal distance from the snapshot, the warping method has been found to perform robustly in various indoor environments and has emerged as a standard for comparison for various visual homing methods [22, 23]. For this reason we utilize the warping method to benchmark the performance of our algorithm.

There has been notable recent progress by Möller in extending the warping algorithm to operate directly on two-dimensional images [24] and in relaxing the assumption that all landmarks lie at an equal distance from the snapshot location [25]. Comparison of our method with these newer variants of the warping framework is planned for future work.

2.2 Correspondence Methods

Correspondence-based homing methods utilize feature detection and matching algorithms to form a set of correspondence vectors between the snapshot and current images. These vectors give the shift of the features in image space, known as the image flow field (c.f. Fig. 1). The flow field formed by these correspondence vectors is then interpreted to yield the direction of motion. These flow fields comprise both robot translation and rotation. The separation of these two components of motion can be difficult; therefore most correspondence methods posit the additional assumption that all images are counter-rotated to the same compass orientation prior to calculating the homing direction. This process requires some form of compass, or a search for the change in orientation which would minimize the difference between the two images [21, 26, 27].

Vardy and Möller investigated the use of both matching and differential methods of optic flow for visual homing [22]. They determined that if both snapshot and current images were captured from the same orientation, the direction of translation could be computed analytically from a single correspondence. The optic flow techniques they used could produce dense flow fields, with home direction estimates produced for each vector in the flow field.

Fig. 1 Ideal flow field for pure translation in a panoramic image [20]


The resulting home vector estimates were summed, which induced a cancellation of errors and resulted in very accurate and robust visual homing.

Various types of features have been utilized for determining correspondences, ranging in sophistication from raw image windows [22] to descriptors based on the Fourier-Mellin transform [28]. Other feature types which have been used include Harris corners [29], distinctive landmarks [11], and high contrast features [2, 30, 31]. Recently, Scale Invariant Feature Transform (SIFT) features have gained great popularity in many areas of computer vision and robotics due to the stability of their descriptor vectors with respect to changes in scaling, rotation, and illumination [32]. SIFT features have also been used to perform localization and visual homing [16, 33–36].

Pons et al. [35] use SIFT features in order to recover image orientation before implementing the strategy of Vardy and Möller [22]. They search for the mode of the horizontal component of correspondence vectors as an indicator of the rotational component of motion. This technique is similar to one proposed by Röfer which sorts the horizontal shifts of all features and determines the value that would make the sign of half of the shifts positive and the other half negative [37].

Briggs et al. [34] deviate from the standard two-dimensional application of SIFT feature detection by utilizing one-dimensional images in order to reduce processing time and memory. Using the snapshot and current view images as the axes of a graph, images are matched using SIFT features and the resulting correspondence curve is plotted. The direction of motion required to return to the goal is then extracted from this matching curve. This technique has much in common with the original warping method [19] and its more recent two-dimensional variants [24, 25].

The method we present is similar to the correspondence methods described above in that it relies upon finding correspondences between features. However, our interpretation of the resulting correspondences is markedly different. Consider the flow field for pure translation of an agent equipped with an omnidirectional camera. The field has a characteristic structure with foci of expansion and contraction separated by 180° (see Fig. 1). If objects are distributed uniformly in the environment, roughly half of them will appear to have expanded, while the remaining half will appear to contract. Typical correspondence methods consider how the features have shifted but not whether they have expanded or contracted. The problem is that in the presence of rotation it becomes much more difficult to determine the home direction from feature shifts; hence the two-stage process referred to above. However, whether a feature has changed in scale is independent of any change in orientation between the two views. We utilize the change in scale of corresponding SIFT features to move towards contracted features and away from expanded features.

3 Homing in Scale Space

3.1 Notation

The robot's current position and the snapshot (i.e. goal) position will be represented as position vectors c and s respectively. Let C and S represent the images captured from these positions. Features extracted from an image will be denoted with the same symbol, with a superscript giving the index of the feature. For example, S^j indicates the jth feature extracted from the snapshot image.

One requirement of our method is that the direction of translation be visible within the robot's field of view. Therefore we utilize panoramic images that provide an omnidirectional field of view in the horizontal direction (c.f. Fig. 3a).

We will refer to our method as Homing in Scale Space or HiSS.

3.2 Visual Homing

If c and s lie within the same plane, then the ideal movement from c to s can be described by the home direction α and distance r (see Fig. 2). Some visual homing algorithms (e.g. [22, 35]) require the change in robot orientation ψ to be known prior to computing either α or r. The algorithm presented here has no such requirement. The method for estimating α is presented below.


Fig. 2 The unknown quantities in the visual homing problem. Thick arrows indicate the forwards orientation of the robot at c and s. The dotted line through c is parallel to the robot's orientation at s

In the experimental section we consider a variety of techniques for estimating r.

If both α and r are known then the robot can move to its goal in a single step. However, due to the unknown scale of the environment it is often more difficult to obtain an estimate of r than of α. If only α is known then homing can still be achieved by making small steps in the direction of α. This requires some sort of similarity measure to determine when the robot has arrived at s. In our experiments on live robot homing, we consistently underestimate r so that the robot moves towards the goal in smaller and smaller steps, a technique that prevents excessive oscillation around the goal.

3.3 Feature Scale Change

In the description of our method below, we make geometric arguments on the basis of whether a perceived feature has expanded or contracted; that is, whether the object that generated the feature is closer to or further from the robot at the current position than at some reference position. Rather than estimating the distance to the feature, we use the change in the scale parameter of SIFT features to indicate whether the feature has expanded or contracted. Consider C^j, the jth feature extracted from the current image:

C^j = {C^{j,x}, C^{j,y}, C^{j,θ}, C^{j,σ}, C^{j,d}}    (1)

The feature's location within the image is (C^{j,x}, C^{j,y}), its orientation is C^{j,θ}, its scale is C^{j,σ}, and its descriptor vector is C^{j,d}.

As far as we are aware, Homing in Scale Space [1] was the first visual navigation method to make explicit use of C^{j,σ} (henceforth referred to as σ if the context is clear). We have also recently employed σ to localize a robot along a trained route [36]. Informally, σ is the effective amount of Gaussian blurring required for a feature's distinctive characteristic to emerge (the distinctive characteristic being that the point is a local extremum with respect to both scale and space). Consider a landmark which yields one or more SIFT features. If the landmark is approached, it will take more blurring for the corresponding features to be detected. Thus, σ increases as the distance between the landmark and viewer decreases.

For our purposes we need only determine whether the distance to a landmark has increased or decreased with respect to a reference location. We utilize σ for this purpose. This substitution is valid as long as σ decreases monotonically as distance increases. Figure 3a shows a selection of panoramic images captured in the lobby of the S.J. Carew building at Memorial University. A total of ten images were captured at increasing distances from a plaque on the wall. The top image shows the positions of SIFT features extracted from the vicinity of this plaque (features lying outside the large rectangular region surrounding the plaque were discarded). Subsequent images show the matched features for images at distances of 2.4, 4.8, and 7.2 m from the top image. Figure 3b shows the scale σ of matched features versus distance from the reference location. A clear trend of decreasing scale with increasing distance is observable. There are, however, a few exceptions, such as the feature indicated by the heavy trace.

Let S^j be the jth SIFT feature extracted from the snapshot image and C^k be the kth feature from the current image. If these features are matched, we can compute a quantity Δσ which indicates whether the feature has expanded or contracted.

Δσ = S^{j,σ} − C^{k,σ}    (2)

If Δσ > 0 then the feature has contracted. If Δσ < 0 then the feature has expanded. A value of zero indicates no detectable change in apparent size.

Consider a matched feature which is generated by an object in the environment at position f.


Fig. 3 (a) Images taken from the lobby of the S.J. Carew building of Memorial University. Overlaid are the locations of features extracted from the vicinity of a plaque on the wall. (b) Plot of the relation between spatial distance and feature scale for the features extracted from the top image in (a). The heavy trace indicates one feature which exhibits an increase in scale with increasing distance, contrary to the general trend of decreasing scale with increasing distance

Let d(x, f) represent the distance from a position x to the feature f. The relationship between d(x, f) and σ is not straightforward. It depends upon the discretization of the scale-space pyramid, the relative positions of x and f, and the physical size of the object that generates the feature. Nevertheless, we assume that when observing the same feature from two positions such as c and s, the following holds.

sign(Δσ) = sign(d(c, f) − d(s, f))    (3)

The principles described below make use of this relationship, allowing us to compare the scale value of matched features and infer information about the sign of distance changes.
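As an illustration, the following sketch classifies matched features as contracted or expanded from the sign of Δσ. This is not the authors' implementation: it uses OpenCV's SIFT and treats the keypoint size attribute as a stand-in for the scale σ, whereas the paper uses Lowe's original implementation with modified parameters (see Section 4.2).

```python
# Hypothetical sketch (not the authors' code): match SIFT features from the
# snapshot image S to the current image C and classify each match by the sign
# of delta-sigma (Eq. 2). OpenCV's keypoint `size` is used as a proxy for sigma.
import cv2

def classify_matches(snapshot_img, current_img, ratio=0.8):
    sift = cv2.SIFT_create()
    kp_s, des_s = sift.detectAndCompute(snapshot_img, None)
    kp_c, des_c = sift.detectAndCompute(current_img, None)

    # Lowe's ratio test: accept a match only if it is significantly better
    # than the second closest match.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = []
    for pair in matcher.knnMatch(des_s, des_c, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            matches.append(pair[0])

    contracted, expanded = [], []
    for m in matches:
        delta_sigma = kp_s[m.queryIdx].size - kp_c[m.trainIdx].size   # Eq. 2
        if delta_sigma > 0:
            contracted.append(m)   # feature appears smaller: move towards it
        elif delta_sigma < 0:
            expanded.append(m)     # feature appears larger: move away from it
    return kp_s, kp_c, contracted, expanded
```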

3.4 Principles

Homing in Scale Space is based on two simple principles:

1. Move towards features that have contracted (Δσ > 0).

2. Move away from features that have expanded (Δσ < 0).

To determine whether a feature has expanded or contracted, we compute a set of SIFT feature matches from S to C. Let m_i = (S^j, C^k) represent the ith matched pair. We determine whether a feature has contracted or expanded from the sign of Δσ as given in Eq. 2. If Δσ = 0 then we exclude the feature pair, leaving a total of n matched pairs where the feature has either expanded or contracted from S to C. For each m_i we use the angular position of C^k to define a partial movement vector v_i, which is a unit vector directed either towards the feature if it has contracted, or away from it if it has expanded (details in Section 3.4.2). All partial movement vectors are added to produce an overall movement vector h. The overall direction of movement α is then computed from h.

h = (1 / |∑_{i=0}^{n} v_i|) ∑_{i=0}^{n} v_i    (4)

α = atan2(h_y, h_x)    (5)

Notice that h is given as a unit vector, although this is not strictly necessary as we are only interested in its direction α.

3.4.1 Principle 1

Consider the case of a contracted feature as shown in Fig. 4a. The robot's orientation at c is shown by the short thick vector. Feature f is seen at an angle θ with respect to the robot's orientation. This angle is sufficient to specify a unit vector v directed towards f, which makes an angle ε with the line through cs.


Fig. 4 The perpendicular bisector of cs, denoted ρ, separates expanded from contracted features. Here features lie on the same side of ρ as s, indicating contraction. (a) A single contracted feature is present. The partial movement vector v is directed towards this feature, which is at an error angle of ε from cs. (b) Four contracted features are present. h is the normalized sum of v1, v2, v3, and v4

|ε| is the angular error, a value that would be zero in the ideal case (f collinear with cs). Also shown is ρ, the perpendicular bisector of cs. As long as f lies on the same side of ρ as s, the distance from c to f will be greater than the distance from s to f. Hence, the feature will appear to have contracted and Δσ should take on a positive value.

The unit vector v represents a partial motion vector corresponding to contracted feature f. This vector would represent the ideal movement of the robot only if f were collinear with cs. If f lies on the snapshot side of ρ then ε is constrained to lie in the range [−π/2, π/2]. Further, ε = ±π/2 only if f lies directly on ρ at an infinite distance. Thus, a movement towards a contracted feature at a finite distance yields a home vector with an angular error less than π/2. Franz et al. argue that homing under this condition is convergent [19]. Let d(c(t), s) represent the distance from c(t) to s, where c is now a function of time t. Since the angular error is always less than π/2, movements along v will yield a monotonic decrease in d(c(t), s). If guided by a single feature alone, the robot would reach a point at which the feature ceases to be a contracted feature. At this point, the following condition would hold:

d(c(t), f) = d(s, f)    (6)

This condition indicates that c(t) (i.e. the robot) lies on a circle centred at f that intersects s. We will refer to this circle as the scale horizon for feature f, so called because the feature's scale change Δσ will change from positive to negative as the circle is entered. The area enclosed by the scale horizon can be considered a dead zone with respect to the contracted feature. Homing for a single feature converges to this dead zone but goes no further. An example home vector field for a single feature is shown in Fig. 5a.

When multiple contracted features are present, the scale horizons may have some degree of overlap. Only points that lie in the intersection of all scale horizons will belong to the dead zone. Thus, as more features are added the dead zone will tend to shrink and will typically disappear entirely after the addition of just a few features. Examples for two and three features are shown in Figs. 5b and c. In the case of Fig. 5c, three features are sufficient to eliminate the dead zone. In summary, principle 1 yields convergent homing to an area called the dead zone. When multiple contracted features are present the dead zone will typically disappear, yielding convergent homing to s from all points in the plane.
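The dead-zone geometry can be made concrete with a small sketch. The planar positions used below are, of course, not available to the robot; the function only illustrates the definition that a point belongs to the dead zone when it lies inside the scale horizon of every contracted feature.

```python
# Illustrative geometry only (positions are assumed known for illustration):
# a point p lies in the dead zone when it is inside the scale horizon of every
# contracted feature, i.e. closer to each feature-generating object f_i than
# the snapshot position s is.
import numpy as np

def in_dead_zone(p, s, feature_positions):
    p, s = np.asarray(p, float), np.asarray(s, float)
    return all(np.linalg.norm(p - f) <= np.linalg.norm(s - f)
               for f in np.asarray(feature_positions, float))
```

With several features, few points satisfy this test simultaneously, which is why the dead zone typically vanishes after only a handful of contracted features.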

While convergence to the dead zone as t → ∞ is an attractive property, we would prefer an angular error as close to 0 as possible, to minimize the distance travelled.


Fig. 5 Home vectors produced by the application of Eq. 5 on contracted features only, for (a) one, (b) two, and (c) three contracted features. A circle representing a feature's scale horizon surrounds each feature-generating object f_i. Only regions within the intersection of all scale horizons lie in the dead zone. Such regions are shaded grey

If we have an ensemble of contracted features, each denoted f_i, we can compute a unit movement vector for each. If the angular distribution of features is approximately uniform, then the sum of all individual movement vectors h would point approximately towards s (see Fig. 4b). Yet even if the angular distribution of features is not uniform, the sum of individual movement vectors will still exhibit an error less than π/2, yielding convergent homing as described above.

3.4.2 Principle 2

Principle 2 is illustrated in Fig. 6. Features f1 and f2 lie on the same side of ρ as c. Thus, they will appear to have expanded. The vectors v1 and v2 are now directed away from their corresponding features.


Fig. 6 Feature-generating objects f1 and f2 lie on the same side of ρ as c. Therefore, these features will have expanded in the current view image C and the corresponding partial movement vectors v1 and v2 point away from them. Regions A and B are defined with respect to ρ and ρ′ as shown. The partial movement vector v1 for f1 in region A has an angular error less than π/2. However, the vector v2 for f2 has an error greater than π/2

For v1 this yields an angular error ≤ π/2, but not for v2. The difference is that f2 lies in the region between the perpendicular bisector ρ and a parallel line through c called ρ′. We will call this region B (the bad region). Movement vectors for expanded features in region B exhibit angular error greater than π/2. Any expanded features not in region B will lie in region A. These features will yield convergent movement vectors, since there is no dead zone associated with expanded features.

We can argue that region A will tend to be much larger than region B, and therefore will contain more features. If the distance d(c, s) is small relative to the size of the environment then this will likely be the case. If so, then the 'good' features in region A may outweigh the 'bad' features in region B. We have found this to be true in our experimental results. Also, if features are evenly distributed throughout the environment, the larger region A will tend to contain more of them. However, it must be acknowledged that convergent homing cannot be guaranteed for principle 2.

Implementation Details  We use panoramic images of our environment to represent views from the robot's perspective. These images are w pixels wide by h pixels high and represent a complete viewing angle of 2π in the horizontal direction and γ_max radians in the vertical direction. Each pixel represents a spacing of δ_x radians in azimuth and δ_y radians in elevation, computable by:

δ_x = 2π/w        δ_y = γ_max/h

We can therefore convert a feature F_i with pixel coordinates (F_{i,x}, F_{i,y}) to angular coordinates (θ_i, γ_i):

θ_i = δ_x F_{i,x}        γ_i = δ_y F_{i,y}

For movements in the plane, only θ_i is required. We can compute a partial movement vector for feature F_i, which is directed towards contracted features but away from expanded features.

v_i = [cos θ_i, sin θ_i]^T              if Δσ > 0
      [cos(θ_i + π), sin(θ_i + π)]^T    if Δσ < 0    (7)

Our method operates on pairs of features that have been matched from S to C. These matches are determined via the standard match criterion described by Lowe [32], in which a match is accepted only if it is significantly better than the second closest match.
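Putting Eqs. 2, 4, 5 and 7 together, a minimal sketch of the home-direction computation might look as follows. The matches are assumed to be already classified by the sign of Δσ (as in the sketch of Section 3.3); names such as matched_pairs and image_width are illustrative rather than taken from the original implementation.

```python
# Minimal HiSS home-direction sketch (Eqs. 4, 5 and 7), not the authors' code.
import numpy as np

def home_direction(matched_pairs, image_width):
    """matched_pairs: list of (x_pixel_in_C, delta_sigma) for features matched S -> C."""
    delta_x = 2.0 * np.pi / image_width            # azimuth per pixel column
    partial_vectors = []
    for x_pixel, delta_sigma in matched_pairs:
        if delta_sigma == 0:
            continue                               # no detectable scale change: excluded
        theta = delta_x * x_pixel                  # angular position of the feature
        if delta_sigma < 0:
            theta += np.pi                         # expanded feature: point away from it
        partial_vectors.append([np.cos(theta), np.sin(theta)])   # Eq. 7

    if not partial_vectors:
        return None                                # no usable matches
    h = np.sum(partial_vectors, axis=0)
    h = h / np.linalg.norm(h)                      # Eq. 4 (normalization is optional)
    return np.arctan2(h[1], h[0])                  # Eq. 5: home direction alpha
```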

4 Experimental Methods

4.1 Image Databases

Six image databases were used for testing. For each database a capture grid was defined on the floor of the capture area. Images were captured by an upward-facing camera mounted on a robot, viewing a hyperbolic mirror. Each captured image was then projected onto a sphere, and the final image was obtained by sampling positions on the sphere at constant angular increments. This representation is convenient in that all pixels from a single image column correspond to the same azimuth, while all pixels from a single row correspond to the same elevation. Sample images of this format along with information on the image databases are found in Table 1.

The A1OriginalH, CHall1H, and CHall2H databases were captured at the University of Bielefeld.


Table 1 Detailed information for each of the six image databases used

Name           Image size   Capture grid   Grid spacing
A1OriginalH    561×81       10×17          30 cm
CHall1H        561×81       10×20          50 cm
CHall2H        561×81       8×20           50 cm
Kitchen1H      583×81       12×9           10 cm
Moeller1H      583×81       22×11          10 cm
ISLab          346×50       9×8            61 cm

A1OriginalH was captured within the Robotics Lab of the Computer Engineering Group, while CHall1H and CHall2H are of the main hall of the university. Kitchen1H and Moeller1H were captured by Sven Kreft and Sebastian Ruwisch in a small kitchen and living room, respectively. All visible objects remained stationary throughout the collection process. More details on the collection of these databases can be found in [22] (covering A1OriginalH, CHall1H, and CHall2H) and [38] (covering Kitchen1H and Moeller1H). All of these databases have been made publicly accessible at http://www.ti.uni-bielefeld.de/html/research/avardy/index.html.

The ISLab database was captured at the Intelligent Systems laboratory at Memorial University. The setting for the database is a lab with an off-white floor lit by fluorescent lighting. Since it is an active laboratory, some of the images contain people who moved throughout the collection process. This active setting provides a more challenging environment for homing, since features occasionally vanish or change locations between images. The floor of the lab is tiled by square tiles which measure 30.5×30.5 cm. Images were captured on a grid with a spacing of every second tile. The area surrounding the image capture can be seen in the floor plan depicted in Fig. 7. Additional details on the format of our images are available in [39].

In order to demonstrate the invariance of our method to rotation, input images are rotated by a random amount before each test is performed. A circular shift of the image simulates the rotation of the robot about an axis perpendicular to the ground plane. Images are rotated by a randomly chosen angle θ_r in the range [0, 2π). For some experiments we also simulate a change in elevation of the robot by shifting the image upwards or downwards by a random amount v_shift ∈ [0, h), where h is the image height. Unlike the horizontal shift induced by a rotation, vertical shifting will leave some portion of the image undefined. These undefined pixels are filled in with black. See Fig. 8 for an example of an image that has been both rotated and vertically shifted. These vertical shifts allow us to test the robustness of our algorithm to changes in the position of the image horizon.
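The following sketch shows one way such a transformation could be implemented with NumPy; it is an assumed reconstruction of the procedure described above, not the authors' test code.

```python
# Assumed sketch of the test-image transformation: a random circular column shift
# simulates rotation about the vertical axis, and a random row shift with black
# fill simulates a change in the position of the image horizon.
import numpy as np

def randomly_transform(img, max_vshift, rng=None):
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]

    dx = int(rng.integers(0, w))                 # theta_r drawn from [0, 2*pi)
    rotated = np.roll(img, dx, axis=1)           # panorama wraps around horizontally

    dy = int(rng.integers(0, max_vshift + 1))    # vertical shift in pixels
    shifted = np.zeros_like(rotated)             # undefined rows are filled with black
    if dy > 0:
        shifted[dy:] = rotated[:h - dy]          # shift the image downwards by dy rows
    else:
        shifted = rotated
    return shifted
```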

4.2 Configuration

We utilize David Lowe's SIFT implementation available from http://www.cs.ubc.ca/∼lowe/keypoints/. Our method operates best when a large number of SIFT features have been extracted. We therefore modified several parameters in order to maximize feature production, while still maintaining accurate results. The values changed from those of Lowe's original implementation are as follows:

1. The number of scales at which keypoints are extracted is increased from three to six, to increase the number of overall keypoints while maintaining feasible running time.


Fig. 7 Diagram of the Intelligent Systems Lab (ISLab) at Memorial University of Newfoundland. The floor plan shows the image capture area (grid corners (0,0), (0,8), and (9,0)) surrounded by a desk, a wood platform, a bookshelf, a high book case, a filing cabinet, robot parts, and a ledge

Fig. 8 Images from the A1OriginalH database taken at location (1,1). Top image shows the original image taken by the robot. Bottom image shows the image after a random amount of rotation, plus a random vertical shift. The remaining pixels after vertical shifting are filled in with black


Fig. 9 Example of the application of HiSS. For both (a) and (b) the snapshot image shown on top is from position (5, 8) of A1OriginalH. The current image below is from position (3, 8). Lines between the images indicate matches between SIFT keypoints. In (a) these lines connect the contracted features in C with their matches in S. In (b) expanded features are shown. The thin arrow indicates the true home position while the thicker arrow indicates the home direction computed from contracted (a) or expanded (b) features


2. The peak threshold for the magnitude of the difference of Gaussian values is decreased from 0.08 to 0.01 in order to maintain more keypoints from areas of low contrast, since indoor environments often contain such areas.

3. The ratio of scores from best to second best feature match has been increased from 0.6 to 0.8. As discussed in [32], this change results in a marginal decrease in match accuracy while dramatically increasing the number of matches.

Fig. 10 Homing vector images with s set to (2, 3). The two plots on the left show the application of HiSS with random rotation (a, AAE = 12.3°) and combined vertical shift (b, AAE = 18.1°). The two plots on the right show the warping method with random rotation (c, AAE = 39.2°) and combined vertical shift (d, AAE = 59.4°)



We use Ralf Möller's implementation of the warping method. Parameters for the warping method were selected to ensure fairness with respect to running time. We selected a discretization of 36 steps for all three movement parameters (α, ψ, ν). On an Intel Core2 2.13 GHz processor, this parameter selection resulted in an average execution time for the warping method that was 4.8 % faster per snapshot than our method. We consider this a fair basis for comparing results.

4.3 Live Trial Implementation

Live robot trials were conducted using a Pioneer P3-AT robot.

Fig. 11 Grids showing AAE results for the ISLab, A1OriginalH, and CHall1H databases. For each database, angular error grid plots are shown for both Homing in Scale Space and the warping method


The environment for the live trials was exactly the same as described for the collection of the ISLab database. Five different snapshot positions were tested, with the robot manually positioned at five different start positions at the start of each trial. Trials were terminated in the case of collisions with objects in the room or after 12 individual movements had been completed.

5 Results

5.1 Performance Metrics

Given two images S and C, the ideal visual homing algorithm computes α, the direction needed to move in order to reach s from c.

Fig. 12 Grids showing AAE results for the CHall2H, Kitchen1H, and Moeller1H databases. For each database, angular error grid plots are shown for both Homing in Scale Space and the warping method


The robot will then move in the direction of α and determine whether or not it has arrived at the goal. In order to measure the accuracy of a given homing algorithm, we use two different performance metrics [22]. The first metric, angular error, is the difference between α and the true homing direction α_ideal. The second metric is the return ratio, which measures the number of times the robot was able to successfully navigate to the goal location. Angular error results will be averaged over a set of start positions and describe only the error of instantaneous home vectors. The return ratio metric describes the overall success of a homing attempt. It is possible that in some pathological environments the majority of home vectors are accurate, except those close to the goal. The inaccurate home vectors close to the goal are in the minority but may have the effect of preventing the robot from reaching the goal for many start positions. In this case, the average angular error metric will indicate successful homing but the return ratio metric will indicate unsuccessful homing.

For results on the image databases, we have access to the true positions of both s and c. Therefore, we can compute the ideal home angle as follows:

α_ideal(s, c) = atan2(s_y − c_y, s_x − c_x)    (8)

Thus, the angular error AE(s, c) can be found by:

AE(s, c) = diff(α_ideal, α_homing)    (9)

Fig. 13 Database results—angular error. Total average angular error (degrees) over the A1OriginalH, CHall1H, CHall2H, Kitchen1H, Moeller1H, and ISLab databases for HiSS and Warp with maximum vertical shifts of 0, 5, 15, and 24 px


where diff() is a function that yields the difference between two angles. We can then obtain an overall average angular error as follows:

AAE(s) = (1/mn) ∑_{x=1}^{m} ∑_{y=1}^{n} AE(s, c_{xy})    (10)

where AE(s, s) = 0. To obtain a measure of performance for the entire image database we can define the overall average angular error OAAE(db), which computes the overall average of AAE for all snapshot images in database db.

OAAE(db) = (1/mn) ∑_{x=1}^{m} ∑_{y=1}^{n} AAE(s_{xy})    (11)
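As a concrete sketch (not the authors' code), the metrics of Eqs. 8–11 could be computed as follows; the angle wrapping used for diff() is an assumption, one reasonable reading of "the difference between two angles".

```python
# Sketch of the angular-error metrics (Eqs. 8-11) over a capture grid.
import numpy as np

def diff(a, b):
    # absolute angular difference wrapped into [0, pi]
    return abs((a - b + np.pi) % (2.0 * np.pi) - np.pi)

def AE(s, c, alpha_homing):
    alpha_ideal = np.arctan2(s[1] - c[1], s[0] - c[0])    # Eq. 8
    return diff(alpha_ideal, alpha_homing)                # Eq. 9

def AAE(s, grid_positions, homing_angles):
    # grid_positions: list of (x, y) tuples; homing_angles: computed alpha for each
    errs = [AE(s, c, a) for c, a in zip(grid_positions, homing_angles) if c != s]
    return float(np.mean(errs))                           # Eq. 10

# OAAE(db) (Eq. 11) is then the mean of AAE(s) over every snapshot position s.
```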

The second performance metric is the return ratio. The return ratio is computed by carrying out simulated homing trials on the capture grid of a particular database. We declare a trial to be successful if the simulated robot was able to return to within a given distance threshold of s. We define ret(c, s) as a binary valued function with a value of 1 for successful homing and 0 for unsuccessful homing. ret(c, s) is evaluated on image database db as follows:

1. For positions c and s, apply the homing algorithm on C and S to obtain α_homing.

2. Calculate the new position of the simulated robot by moving in the direction of α_homing: c_new = (c_x + round(cos(α_homing)), c_y + round(sin(α_homing))).

3. If c_new = s, homing is successful. If c_new is outside the boundary determined by the capture grid, or is the same as a previously visited c (a loop), then the trial is considered unsuccessful. Otherwise, return to step 1 with c = c_new.

Fig. 14 Database results—return ratio. Total return ratio over the A1OriginalH, CHall1H, CHall2H, Kitchen1H, Moeller1H, and ISLab databases for HiSS and Warp with maximum vertical shifts of 0, 5, 15, and 24 px



If we iterate this process over all possible c for all possible s, we can determine the total return ratio TRR(db) as the percentage of homing trials that succeed.

RR(s) = (1/mn) ∑_{x=1}^{m} ∑_{y=1}^{n} ret(c_{xy}, s)    (12)

TRR(db) = (1/mn) ∑_{x=1}^{m} ∑_{y=1}^{n} RR(s_{xy})    (13)
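The simulated trial described above might be sketched as follows; compute_home_angle is a hypothetical stand-in for running the homing algorithm on the images stored at grid cells c and s, and the bookkeeping follows steps 1–3.

```python
# Sketch of one simulated homing trial for the return ratio (Eqs. 12-13).
import math

def ret(db, c, s, grid_w, grid_h, compute_home_angle):
    visited = set()
    while True:
        if c == s:
            return 1                                  # reached the goal cell
        if c in visited:
            return 0                                  # loop detected
        visited.add(c)
        alpha = compute_home_angle(db, c, s)          # step 1: homing on C and S
        c = (c[0] + round(math.cos(alpha)),           # step 2: one-cell move
             c[1] + round(math.sin(alpha)))
        if not (0 <= c[0] < grid_w and 0 <= c[1] < grid_h):
            return 0                                  # step 3: left the capture grid

# TRR(db) is then the mean of ret over all (c, s) pairs on the grid.
```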

Figure 9 shows an example of the operation of our algorithm. In this case the two images are taken from the same orientation for ease of interpretation.

5.2 Notation

Each test was performed using both homing methods. Wherever 'H' or 'HiSS' is noted in a legend or table, it represents the results for the Homing in Scale Space method. Wherever 'W' or 'Warp' is noted in a legend or table, it represents the results for the warping method. Since all tests were done with a certain level of vertical shifting, wherever 0 px, 5 px, 15 px, or 24 px is noted, it corresponds to the maximum random vertical shift for that particular trial. For example, '15H' or 'HiSS15' both refer to a trial performed by the Homing in Scale Space method under uniform(0, 15) pixel vertical shift, where uniform(a, b) returns a uniformly-distributed random number in the range [a, b].

Table 2 Results from the sign test applied to angular error data for HiSS−Warping

Sign test (Alt. Hyp. HiSS−Warping < 0)—no pixel vertical shift
Database      Samples   Mean     Median   95 % CI        S-value   P-value
A1originalH   28,900    −0.236   −0.054   (−π, −0.051)   11,873    2.2e-16
Chall1H       40,000    −0.318   −0.120   (−π, −0.116)   14,252    2.2e-16
Chall2H       25,600    −0.471   −0.255   (−π, −0.246)   8,571     2.2e-16
Kitchen1H     11,664    −0.375   −0.111   (−π, −0.102)   4,497     2.2e-16
Moeller1H     58,564    −0.197   −0.003   (−π, 0.0)      28,851    0.0057
RobISLab      5,184     −0.707   −0.429   (−π, −0.399)   1,287     2.2e-16

Sign test (Alt. Hyp. HiSS−Warping < 0)—5 pixel vertical shift
A1originalH   28,900    −0.287   −0.069   (−π, −0.066)   11,580    2.2e-16
Chall1H       40,000    −0.556   −0.251   (−π, −0.244)   11,481    2.2e-16
Chall2H       25,600    −0.517   −0.316   (−π, −0.305)   7,991     2.2e-16
Kitchen1H     11,664    −0.309   −0.084   (−π, −0.074)   4,859     2.2e-16
Moeller1H     58,564    −0.285   −0.052   (−π, −0.049)   26,075    2.2e-16
RobISLab      5,184     −0.841   −0.659   (−π, −0.621)   1,140     2.2e-16

Sign test (Alt. Hyp. HiSS−Warping < 0)—15 pixel vertical shift
A1originalH   28,900    −0.778   −0.528   (−π, −0.513)   6,654     2.2e-16
Chall1H       40,000    −0.734   −0.409   (−π, −0.399)   9,888     2.2e-16
Chall2H       25,600    −0.724   −0.541   (−π, −0.525)   6,702     2.2e-16
Kitchen1H     11,664    −0.307   −0.079   (−π, −0.067)   5,016     2.2e-16
Moeller1H     58,564    −0.535   −0.243   (−π, −0.234)   20,712    2.2e-16
ISLab         5,184     −0.927   −0.915   (−π, −0.874)   1,133     2.2e-16

Sign test (Alt. Hyp. HiSS−Warping < 0)—24 pixel vertical shift
A1originalH   28,900    −0.885   −0.718   (−π, −0.703)   5,903     2.2e-16
Chall1H       40,000    −0.867   −0.638   (−π, −0.625)   8,217     2.2e-16
Chall2H       25,600    −0.812   −0.697   (−π, −0.683)   6,057     2.2e-16
Kitchen1H     11,664    −0.375   −0.133   (−π, −0.120)   4,796     2.2e-16
Moeller1H     58,564    −0.609   −0.386   (−π, −0.376)   18,772    2.2e-16
ISLab         5,184     −0.665   −0.606   (−π, −0.573)   1,512     2.2e-16


5.3 Results on Image Databases

In Fig. 10 we see the results of homing to location (2, 3) in the A1OriginalH database from every other location in the database. Computed homing angles are represented by unit vectors.

Figures 11 and 12 show grayscale grids plotted for each (x, y) location within each database.

Fig. 15 Percentage matched vs. distance graphs for each database


The grayscale value for a particular location within a database is scaled from black (0) to white (the maximum of max(OAAE(hiss), OAAE(warping)) for that database). This view allows us to see which locations in a particular environment perform well (darker) or poorly (lighter). Note that the aspect ratio for these figures is not 1:1; refer to the axes for coordinate information. These results are summarized for the angular error metric in Fig. 13 and for the return ratio metric in Fig. 14.

From Figs. 11–13 it appears that the OAAE for our method is lower than that for the warping method. However, we must show that there is indeed a statistically significant difference between the two methods. In order to determine which tests to perform, we first analyzed the distribution of our data. For the angular error data, we used the Shapiro-Wilk, or W, normality test [40, 41]. Upon running the W test for each of the data sets individually, as well as for all data sets combined, each test returned a result of p < 2.2e−16, indicating that our data is not normally distributed. For this reason, we used the sign test, which is applicable even if the data is not normally distributed [42]. The alternative hypothesis tested is that AE(hiss) − AE(warp) < 0, representing superior performance of HiSS over warping. A P-value < 0.05 is sufficient to support this alternative hypothesis [42–44]. The results from these tests are shown in Table 2. In all cases the P-value is < 0.05, indicating superior performance of HiSS over Warping.

5.4 Distance Estimation

As mentioned in Section 3.2, visual homing can be achieved by incremental movements in the direction α. However, we can reach the goal more efficiently using some estimate of the distance r.

In [39] we considered a variety of image and feature-based measures in order to arrive at a quantity with a consistently high correlation with r.

Table 3 Results for the functions plotted in Fig. 15

Database      Trial        a          b          a std. err   b std. err   RSE
ISLab         0 px Vert    10.06780   −5.85665   0.05436      0.04763      0.8335
              5 px Vert     9.78177   −6.43592   0.06124      0.06236      0.9595
              15 px Vert    9.26169   −6.53532   0.06932      0.07979      1.144
              24 px Vert    7.8007    −5.4061    0.0731       0.1028       1.494
A1OriginalH   0 px Vert    17.68985   −7.27702   0.03602      0.02122      1.24
              5 px Vert    17.89868   −7.73030   0.03793      0.02307      1.269
              15 px Vert   18.23209   −8.03045   0.04010      0.02428      1.286
              24 px Vert   18.11404   −8.17131   0.04154      0.02585      1.344
CHall1H       0 px Vert    23.85965   −8.45522   0.05915      0.02441      1.668
              5 px Vert    23.85637   −8.75030   0.06172      0.02625      1.734
              15 px Vert   23.82518   −8.82529   0.06291      0.02698      1.772
              24 px Vert   23.32249   −8.90887   0.06561      0.02945      1.907
CHall2H       0 px Vert    23.37360   −8.50571   0.08667      0.03625      1.88
              5 px Vert    23.71199   −8.87797   0.09259      0.03905      1.941
              15 px Vert   24.10059   −9.11256   0.10206      0.04246      2.037
              24 px Vert   24.07061   −9.26291   0.10868      0.04574      2.144
Kitchen1H     0 px Vert    12.71268   −7.15042   0.07095      0.05730      1.447
              5 px Vert    12.91063   −7.58406   0.07962      0.06509      1.53
              15 px Vert   13.26301   −7.77610   0.08449      0.06660      1.538
              24 px Vert   12.50825   −7.31220   0.09264      0.07631      1.76
Moeller1H     0 px Vert    20.28980   −8.72208   0.05359      0.03502      2.685
              5 px Vert    20.66887   −9.24524   0.05959      0.03914      2.787
              15 px Vert   21.09868   −9.46355   0.06302      0.04028      2.808
              24 px Vert   21.01630   −9.41256   0.06539      0.04160      2.886

a and b correspond to the values output by performing nonlinear regression on the function r = a e^(b·M%). Standard errors for a and b, as well as the residual standard error (RSE), are also included.


The best measure found was the percentage of SIFT keypoints matched, which we denote M%. Figure 15 presents plots of M% versus the true distance r for all image databases. The relationship between these quantities appears to be exponential in nature:

r = a e^(b·M%)    (14)

We used nonlinear regression (using the R stats package) to find the best parameters a and b for each image database. The results are overlaid on the raw data in Fig. 15. The overlaid graphs show four functions (one for each of the 0, 5, 15, and 24 pixel vertical shifts). Note that due to the similarity of the resulting values of a and b, the lines are difficult to distinguish. This indicates that the estimated relationships between M% and r are relatively resistant to vertical shift. Table 3 provides the computed values of a and b.
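The fit itself is straightforward to reproduce. As a sketch (the original analysis used the R stats package, not this code), an equivalent fit in Python could look as follows; m_pct and r_true stand for hypothetical arrays of matched-keypoint percentages and true distances for one database.

```python
# Sketch of fitting r = a * exp(b * M%) (Eq. 14) with nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

def model(m_pct, a, b):
    return a * np.exp(b * m_pct)                  # Eq. 14

def fit_distance_estimator(m_pct, r_true):
    (a, b), cov = curve_fit(model, m_pct, r_true, p0=(10.0, -5.0))
    std_err = np.sqrt(np.diag(cov))               # standard errors of a and b
    return a, b, std_err
```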

We can see from Fig. 15 and Table 3 that the distance estimation function fits the exponential curve well. The function also remains remarkably similar despite large vertical shifting within the image (represented by the different lines), making this method of distance estimation feasible for environments without level movement surfaces. One downside to this approach, however, is that as the true distance from the goal increases, so does the error in the function. In areas of the graph where the slope of the computed function has a larger magnitude, similar values of M% can yield dramatically different distances.

Fig. 16 ISLab live homing trial 1. The plot above shows the positions of the robot as it approaches the goal area, which is indicated by the shaded circle. The table below gives information on the final robot position for the corresponding homing attempt. Distance and error units in the table are given in metres


This would lead us to believe that this distance estimation method will be less accurate for long-range homing, but become more accurate as we approach the goal.

Another issue of note is that this function varies with image dimensions. Homing within an environment using images with a height of 50 pixels will yield a different distance estimation function than images with a height of 100 pixels. Experimentally, we have found that as resolution increases, more keypoints are found, and a higher value of M% results.

5.5 ISLab Trials

To test our algorithm on our live robot, we used the environment of the ISLab database. Five different goal locations were chosen, with five starting locations for each goal location spaced evenly throughout the environment. The robot takes an image at its current location, compares it to the goal image, computes the estimated values for r and α, turns in the direction of α, and moves a distance of r/2. Moving the full distance r on each step can lead the robot to overshoot the goal and then oscillate around it. We found that moving a distance of r/2 yields more stable behaviour. This process repeats until the robot believes it is within 30 cm of the goal (success) or for a maximum of 12 iterations (failure). Values for r were computed from the exponential function of M% fitted on the ISLab database, as discussed above. A real-time distance estimator is discussed in the future work section.
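The trial loop might be sketched as follows. The robot and vision helpers named here (capture_panorama, estimate_alpha_and_r, turn_to, drive) are hypothetical; only the control structure (half-distance steps, 30 cm threshold, 12-step cap) reflects the procedure described above.

```python
# Sketch of the live homing loop; helper methods are hypothetical stand-ins.
def home_to_snapshot(snapshot_img, robot, max_steps=12, arrive_thresh=0.30):
    for _ in range(max_steps):
        current_img = robot.capture_panorama()
        alpha, r = robot.estimate_alpha_and_r(snapshot_img, current_img)
        if r < arrive_thresh:
            return True                  # robot believes it is within 30 cm of the goal
        robot.turn_to(alpha)
        robot.drive(r / 2.0)             # half the estimated distance to avoid overshoot
    return False                         # gave up after 12 movements
```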

Fig. 17 ISLab live homing trial 2


It was our original intention to compare homing in scale space with the warping method live on the robot. However, the warping method was found to be too inaccurate to carry out the trials: in several dozen initial tests, the robot would inevitably veer outside the allotted limits for navigation. We suspect this is due to the nature of the images captured by the robot. Owing to unevenness in the floor and slight discrepancies in the diameter of the robot's wheels, both the height and inclination of the robot's camera varied slightly as it travelled across the floor. Since the warping method relies heavily on the stability of the horizon within an image, we believe that this variance shifted the image horizon enough to make the warping method perform poorly. For this reason, results for live trials using the warping method are not included.

We will define two types of success for our live robot trials. Type A success means the robot came to a stop within both an estimated distance of 30 cm and an actual distance of 30 cm of the goal. Type B success means that at some point the robot came within a true distance of 30 cm of the goal, but did not stop due to error in its distance estimation. If the robot passed within 30 cm of the goal at any point during a trial, but estimated that it was not within the threshold, we record an undetected arrival (UA). Therefore, type B success is equivalent to any trial which recorded an undetected arrival without achieving type A success. Figures 16, 17, 18, 19 and 20 show results for each goal position, along with a table of the associated estimated distance, actual distance, and distance estimate error for the final step of each homing trial.
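As a concrete restatement of these definitions, the sketch below classifies a trial from its sequence of (estimated, actual) distances to the goal. The function and argument names are ours; only the 30 cm threshold and the type A / type B / UA definitions come from the text above.

def classify_trial(trajectory, threshold=0.30):
    # trajectory: list of (estimated, true) distance pairs in metres, one per
    # step; the last pair is where the robot stopped.
    est_final, true_final = trajectory[-1]
    # Undetected arrival: at some step the true distance was within the
    # threshold while the robot's own estimate said otherwise.
    ua = any(true <= threshold < est for est, true in trajectory)
    if est_final <= threshold and true_final <= threshold:
        return "type A success", ua
    if any(true <= threshold for _, true in trajectory):
        return "type B success", ua
    return "failure", ua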

Fig. 18 ISLab live homing trial 3


Of the 25 homing trials conducted, 21 resulted in type A success and four resulted in type B success. Fourteen of the trials recorded an undetected arrival, which means that the method is actually getting closer to the goal than its distance estimation function would lead us to believe.

This effect is illustrated in Fig. 21, where we plot the relationship between the actual distance from the goal, r_a, and the error:

r_err = |r − r_a|    (15)

As r_a increases, so does r_err. Computing the Spearman rank correlation between these two values yields a coefficient of 0.784, which strongly reinforces this relationship. The second graph is a histogram of r_err, showing a possible reason for the high number of UAs in the live trials. The distance estimation function nearly always returns a value which is higher than the actual distance to the goal, with a mean of 0.462 m and a median of 0.195 cm. A possible reason for this is that the distance estimation function was computed from the ISLab database, in which images were spaced 61 cm apart. Since the main purpose of the distance estimation function is to detect close proximity to the goal, it would be preferable to estimate this function with finer-grained resolution, particularly for smaller distance values.
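The correlation reported above can be reproduced with a standard rank-correlation routine; the sketch below uses SciPy's spearmanr on hypothetical per-trial values, since the raw trial data are not reproduced here.

import numpy as np
from scipy.stats import spearmanr

# Hypothetical records from the live trials: actual distance to the goal (r_a)
# and the corresponding estimation error (r_err), both in metres.
r_a   = np.array([0.25, 0.60, 1.10, 1.70, 2.40, 3.10])
r_err = np.array([0.05, 0.15, 0.30, 0.55, 0.80, 1.20])

# Spearman's rank correlation measures how monotonically r_err grows with r_a.
rho, p_value = spearmanr(r_a, r_err)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3g}")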

Fig. 19 ISLab live homing trial 4


Fig. 20 ISLab live homing trial 5


Fig. 21 Graph (left) of actual distance from goal r_a vs. distance error r_err = |r − r_a|, along with the error histogram (right)


6 Discussion

Our tests have demonstrated the superior performance of our method over the warping method on all six image databases. Homing in scale space yielded dramatically lower angular error, as well as a higher return ratio, than the warping method. The random horizontal rotations and vertical shifts incorporated into the database experiments were included to demonstrate our method's invariance to orientation changes and its robustness to vertical image shifts.

Results from the live robot trials were in agreement with those from the image databases. The type A success rate was found to be 84 %. If we combine this with type B successes, we see that homing in scale space was able to bring the robot to within 30 cm of the goal in all cases. These results were obtained in an environment where the warping method was unable to achieve any measurable success.

6.1 Future Work

Recall the value of Δσ which was used to determine whether a feature was classified as contracted or expanded. In the case of images captured at nearby locations, we could see many very small values of Δσ. In the presence of camera noise and improper focus, the chance of misclassification between contracted and expanded features may be high. We experimented with a threshold parameter for filtering matches with low values of Δσ; however, the results were inconsistent across different image databases. A more sophisticated classification strategy should be investigated in the future.
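The thresholding experiment mentioned above amounts to a simple rule of the following form. This is our own sketch of that rule; the function name, argument names, and conventions are assumptions, not the paper's implementation.

def classify_feature(sigma_snapshot, sigma_current, min_delta=0.0):
    # Classify a matched feature by its scale change delta_sigma.
    # delta_sigma > 0: the feature appears at a larger scale in the current
    # view than in the snapshot; delta_sigma < 0: at a smaller scale.
    delta_sigma = sigma_current - sigma_snapshot
    if abs(delta_sigma) < min_delta:
        return None   # ambiguous match, discarded when filtering is enabled
    return "expanded" if delta_sigma > 0 else "contracted"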

Distance estimation is another area where improvements can be made. The distance estimation formula used in our live robot trials was computed using nonlinear regression based on data from the ISLab database. We propose that a distance estimation function for a particular environment could be calculated using relative motion data collected by an inertial measurement unit (IMU), thus eliminating the need for an existing database. Assuming that the robot captured the goal image and then moved away from it (e.g. in the context of learning a route or topological map of the environment), we could estimate the true distance to the goal via the IMU. The relationship between distance and M% could then be learned on-line.
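One way to realise this on-line scheme: taking logarithms of Eq. 14 gives log r = log a + b·M%, so a and b can be re-estimated by ordinary least squares each time the IMU provides a new (M%, distance) sample. The class below is a minimal sketch of that idea; it is not the paper's implementation, and all names are ours.

import numpy as np

class OnlineDistanceModel:
    # Incrementally learn r = a * exp(b * M%) from (M%, r) samples, where r
    # comes from IMU odometry as the robot moves away from the goal.
    def __init__(self):
        self.m_samples = []
        self.log_r_samples = []
        self.a, self.b = None, None

    def add_sample(self, match_pct, imu_distance):
        self.m_samples.append(match_pct)
        self.log_r_samples.append(np.log(imu_distance))
        if len(self.m_samples) >= 2:
            # Refit the line log(r) = b * M% + log(a) over all samples so far.
            self.b, log_a = np.polyfit(self.m_samples, self.log_r_samples, 1)
            self.a = np.exp(log_a)

    def estimate(self, match_pct):
        # Return the current distance estimate, or None before enough samples.
        if self.a is None:
            return None
        return self.a * np.exp(self.b * match_pct)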

In this paper we have shown that homing in scale space is invariant to rotations of an image about an axis perpendicular to the ground plane. However, we would like to demonstrate more conclusively that the algorithm is invariant to any 3D rotation. We have captured a database of images taken from a variety of roll, pitch, and yaw angles. However, since we are using the same camera system as in this paper, the images are not truly omnidirectional. Limitations in the field-of-view have an impact on the algorithm's performance, and we are still determining the best way of analyzing these results. It would be interesting to extend our technique for application on unmanned aerial vehicles (UAVs). Since aerial vehicles travel in 3D, we would need to augment the algorithm by computing both the angle of azimuth (i.e. α) and the angle of elevation. This change could easily be accommodated by making the partial movement vectors defined in Eq. 7 three-dimensional.
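For illustration, a three-dimensional partial movement vector could be formed from the azimuth and elevation angles as follows. The axis convention (z up, azimuth measured in the x-y plane) is our assumption and is not specified by Eq. 7.

import numpy as np

def movement_vector_3d(azimuth, elevation):
    # Unit movement vector from azimuth (alpha) and elevation, in radians.
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])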

Visual homing techniques can be applied only when the robot lies within the catchment area of the goal location. Our image database results indicate a catchment area covering the entire capture grid (areas ranged from 1.08 to 50 m²). Nevertheless, other techniques will certainly be required to guide the robot into the catchment area. One simple strategy is route-based navigation, where routes consist of sets of nodes with overlapping catchment areas. We have investigated some methods for ensuring this overlap, but much more remains to be done [13, 36]. Beyond route-based navigation is complete topological navigation, where route segments are concatenated together to form a graph [45]. There has been considerable work in this area, also known as topological SLAM, in recent years [14–17, 46]. We intend to apply the algorithm presented here in both route-based and topological navigation.

7 Conclusions

We have described a method for performing visual homing using the scale change of SIFT features. In fact, the method is not reliant on SIFT itself, but only requires features with an associated scale parameter. Numerous variants of the SIFT framework have been proposed and could be used for this purpose (e.g. SURF features [47]).

In this paper we have shown that homing in scale space performed significantly better than the warping method, which has been widely used as a benchmark in the field of visual homing. Future work will focus on demonstrations of the technique in 3D and on improving robustness to field-of-view limitations.

Acknowledgements Thanks to David Lowe for the use of his SIFT implementation and to Ralf Möller for his implementation of the warping method and for the use of image databases collected by his students.

References

1. Churchill, D., Vardy, A.: Homing in scale space. In: IEEE/RSJ International Conference on Robots and Systems (IROS), pp. 1307–1312 (2008)
2. Cartwright, B., Collett, T.: Landmark learning in bees. J. Comp. Physiol., A 151, 521–543 (1983)
3. Cartwright, B., Collett, T.: Landmark maps for honeybees. Biol. Cybern. 57, 85–93 (1987)
4. Anderson, A.: A model for landmark learning in the honey-bee. J. Comp. Physiol., A 114, 335–355 (1977)
5. Wehner, R., Michel, B., Antonsen, P.: Visual navigation in insects: coupling of egocentric and geocentric information. J. Exp. Biol. 199, 129–140 (1996)
6. Graham, P., Durier, V., Collett, T.: The binding and recall of snapshot memories in wood ants (Formica rufa L.). J. Exp. Biol. 207, 393–398 (2003)
7. Morris, R.: Spatial localization does not require the presence of local cues. Learn. Motiv. 12, 239–260 (1981)
8. Gillner, S., Weiss, A., Mallot, H.: Visual homing in the absence of feature-based landmark information. Cognition 109, 105–122 (2008)
9. Collett, T., Collett, M.: Memory use in insect visual navigation. Nat. Rev. Neurosci. 3, 542–552 (2002)
10. Kuipers, B., Byun, Y.-T.: A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations. Robot. Auton. Syst. 8, 47–63 (1991)
11. Hong, J., Tan, X., Pinette, B., Weiss, R., Riseman, E.: Image-based homing. In: IEEE ICRA, pp. 620–625 (1991)
12. Argyros, A., Bekris, C., Orphanoudakis, S., Kavraki, L.: Robot homing by exploiting panoramic vision. Auton. Robots 19(1), 7–25 (2005)

13. Vardy, A.: Long-range visual homing. In: Proceedings of the IEEE International Conference on Robotics and Biomimetics. IEEE Xplore (2006)
14. Franz, M., Schölkopf, B., Mallot, H., Bülthoff, H.: Learning view graphs for robot navigation. Auton. Robots 5, 111–125 (1998)
15. Hübner, W., Mallot, H.: Metric embedding of view-graphs: a vision and odometry-based approach to cognitive mapping. Auton. Robots 23, 183–196 (2007)
16. Goedemé, T., Nuttin, M., Tuytelaars, T., Van Gool, L.: Omnidirectional vision based topological navigation. Int. J. Comput. Vis. 74(3), 219–236 (2007)
17. Filliat, D.: Interactive learning of visual topological navigation. In: IEEE/RSJ International Conference on Robots and Systems (IROS) (2008)
18. Dai, D., Lawton, D.: Range-free qualitative navigation. In: IEEE ICRA (1993)
19. Franz, M., Schölkopf, B., Mallot, H., Bülthoff, H.: Where did I take that snapshot? Scene-based homing by image matching. Biol. Cybern. 79, 191–202 (1998)
20. Möller, R., Vardy, A.: Local visual homing by matched-filter descent in image distances. Biol. Cybern. 95, 413–430 (2006)
21. Zeil, J., Hofmann, M., Chahl, J.: Catchment areas of panoramic snapshots in outdoor scenes. J. Opt. Soc. Am. A 20(3), 450–469 (2003)
22. Vardy, A., Möller, R.: Biologically plausible visual homing methods based on optical flow techniques. Connect. Sci. 17(1/2), 47–90 (2005)
23. Zampoglou, M., Szenher, M., Webb, B.: Adaptation of controllers for image-based homing. Adapt. Behav. 14, 245–252 (2006)
24. Möller, R.: Local visual homing by warping of two-dimensional images. Robot. Auton. Syst. 57(1), 87–101 (2009)
25. Möller, R., Krzykawski, M., Gerstmayr, L.: Three 2d-warping schemes for visual robot navigation. Auton. Robots 29(3), 253–291 (2010)
26. Burke, A., Vardy, A.: Visual compass methods for robot navigation. In: Proceedings of the Newfoundland Conference on Electrical and Computer Engineering (2006)
27. Vardy, A.: A simple visual compass with learned pixel weights. In: Proceedings of the Canadian Conference on Electrical and Computer Engineering. IEEE Xplore (2008)
28. Rizzi, A., Duina, D., Inelli, S., Cassinis, R.: Unsupervised matching of visual landmarks for robotic homing using Fourier-Mellin transform. In: Intelligent Autonomous Systems, vol. 6, pp. 455–462 (2000)
29. Vardy, A., Oppacher, F.: Low-level visual homing. In: Banzhaf, W., Christaller, T., Dittrich, P., Kim, J.T., Ziegler, J. (eds.) Advances in Artificial Life—Proceedings of the 7th European Conference on Artificial Life (ECAL). Lecture Notes in Artificial Intelligence, vol. 2801, pp. 875–884. Springer (2003)
30. Weber, K., Venkatesh, S., Srinivasan, M.: Insect-inspired robotic homing. Adapt. Behav. 7, 65–97 (1999)

31. Lambrinos, D., Möller, R., Labhart, T., Pfeifer, R., Wehner, R.: A mobile robot employing insect strategies for navigation. Robot. Auton. Syst., Special Issue: Biomimetic Robots 30, 39–64 (2000)
32. Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
33. Se, S., Lowe, D., Little, J.: Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. Int. J. Rob. Res. 21(8), 735–758 (2001)
34. Briggs, A., Li, Y., Scharstein, D., Wilder, M.: Robot navigation using 1d panoramic images. In: IEEE ICRA, pp. 2679–2685 (2006)

35. Pons, J.S., Hübner, W., Dahmen, J., Mallot, H.: Vision-based robot homing in dynamic environments. In: Schilling, K. (ed.) 13th IASTED International Conference on Robotics and Applications, pp. 293–298 (2007)
36. Vardy, A.: Using feature scale change for robot localization along a route. In: IEEE/RSJ International Conference on Robots and Systems (IROS) (2010)
37. Röfer, T.: Controlling a wheelchair with image-based homing. In: Proceedings of AISB Workshop on Spatial Reasoning in Mobile Robots and Animals. Manchester, UK (1997)
38. Möller, R., Vardy, A., Kreft, S., Ruwisch, S.: Visual homing in environments with anisotropic landmark distribution. Auton. Robots 23, 231–245 (2007)
39. Churchill, D.: Homing in scale space. Master's thesis, Memorial University of Newfoundland (2009)
40. Royston, P.: An extension of Shapiro and Wilk's W test for normality to large samples. In: Applied Statistics, pp. 115–124 (1982)
41. Royston, P.: Algorithm AS 181: the W test for normality. In: Applied Statistics, pp. 176–180 (1982)
42. Gibbons, J., Chakraborti, S.: Nonparametric Statistical Inference. Marcel Dekker, New York (1992)
43. Kitchens, L.: Basic Statistics and Data Analysis. Duxbury (2003)
44. Lehmann, E.L.: Nonparametrics: Statistical Methods Based on Ranks. Holden and Day, San Francisco (1975)
45. Franz, M., Mallot, H.: Biomimetic robot navigation. Robot. Auton. Syst., Special Issue: Biomimetic Robots 30, 133–153 (2000)
46. Ferdaus, S., Vardy, A., Mann, G., Gosine, R.: Comparing global measures of image similarity for use in topological localization of mobile robots. In: Proceedings of the Canadian Conference on Electrical and Computer Engineering. IEEE Xplore (2008)
47. Bay, H., Ess, A., Tuytelaars, T., Gool, L.V.: SURF: speeded up robust features. Comput. Vis. Image Underst. 110(3), 346–359 (2008)