
Map building and localization are two crucial abilities that autonomous robots must develop. Vision sensors have become a widespread option to solve these problems. When using this kind of sensor, the robot must extract the necessary information from the scenes to build a representation of the environment where it has to move and to estimate its position and orientation robustly. The techniques based on the global appearance of the scenes constitute one of the possible approaches to extract this information. They consist of representing each scene with a single descriptor that gathers global information from the scene. These techniques present some advantages compared to classical descriptors based on the extraction of local features. However, a good configuration of the parameters is important to reach a compromise between computational cost and accuracy. In this paper we carry out an exhaustive comparison among several global appearance descriptors to solve the mapping and localization problem. With this aim, we make use of several image sets captured in indoor environments under realistic working conditions. The datasets have been collected using an omnidirectional vision sensor mounted on the robot.

In recent years, omnidirectional cameras have become a widespread sensor in mobile robotics mapping and localization tasks, thanks to their relatively low cost and the richness of the information they provide. When one of these cameras is mounted on a robot, this information can be used to build a model or map of the environment and to estimate the position and orientation of the robot within this map. There are many approaches to carry out these tasks. Amongst them, global-appearance techniques represent a very promising alternative. These techniques lead to conceptually simple algorithms, since each image is represented by a single descriptor and the mapping and localization processes can be carried out by comparing these global descriptors. They also present some advantages over classical local feature extraction and description methods, especially in dynamic and unstructured environments, where it is difficult to extract and describe stable landmarks. However, when we apply them to solve a real-time mapping and localization problem, some restrictions must be taken into account during the design of the algorithms.

In this work, a review and comparison is made taking into consideration different methods to extract the most relevant information from a set of images, based on their global-appearance. We propose to use several descriptors, based on Discrete Fourier Transform, Principal Components Analysis, Histograms of Oriented Gradients, and

For this purpose, we present the results of a set of experiments developed with several large databases composed of panoramic images, captured in different real indoor environments. We also study the effect of common situations that usually happen in real applications:

Changes in lighting conditions, due to the fact that the robot navigates within the environment at different times of day, with or without artificial illumination.

Occlusions. People moving around the robot can temporarily appear in the images, occluding part of them.

Noise produced by the vision sensor.

Visual aliasing. In indoor environments, it often happens that two images captured from two distant points have a similar appearance.

The main objective is to demonstrate the applicability of the different descriptors to robotic mapping and localization tasks, and to measure their accuracy and computational requirements. The experimental setup allows us to validate them and to make a detailed comparative analysis of the different techniques. We prove that it is possible to create an optimal model of the environment where the robot can estimate its position and orientation accurately and in real time, using just the information provided by an omnidirectional vision sensor.

In recent years, omnidirectional vision sensors have gained popularity thanks to the large amount of information they provide, as they have a 360-degree field of view around the robot; the stability of the features that appear in the images, since they remain longer in the field of view as the robot moves; their relatively low cost compared with other sensors; and their low power consumption. These sensors are usually composed of a conventional camera and a convex spherical, parabolic or hyperbolic mirror (a catadioptric system). The visual information can be represented using different projections: omnidirectional, panoramic or bird's-eye view [

The first approach consists of extracting a limited number of relevant local features (such as points, lines or regions) and describing them using an invariant descriptor. Amongst the feature extraction and description methods we can highlight

The second approach works with each scene as a whole, without extracting any local information. Each image is represented by a single descriptor. These approaches have advantages in dynamic and unstructured environments, where it is difficult to extract stable landmarks from the scenes. The main disadvantage is the high memory and time required to store the visual information and to compare the descriptors. Current methods for image description and compression allow us to optimize the size of the databases and to carry out the localization process with relative computational efficiency.

The use of global appearance descriptors is an alternative to the classical methods based on the extraction and description of local features or landmarks. These approaches lead to conceptually simpler algorithms; thus, they constitute a systematic and intuitive alternative to solve the map building and localization problems. Usually, these approaches are used to build topological maps, which do not include any metric information. In these maps, the environment is often represented by a graph where the nodes are images that symbolize distinctive places and the links are the connectivity relationships between those places [

The key point of the global appearance approach is the description algorithm. Several alternatives can be found in the literature on this topic. Some authors make use of the Principal Components Analysis (PCA) to create visual models with mobile robots ([

We have not found in the related literature any work that makes a thorough comparison between global description techniques. In this work we have selected several of the most relevant techniques. We have adapted some of them to describe panoramic scenes. We have also tested their performance depending on their main parameters and we have made a comparative evaluation among them. This comparison has been carried out from several points of view: we have tested them as a tool to solve the mapping and localization problems (both global localization and probabilistic localization) and we have also taken into account the most relevant phenomena that usually happen in a real application: camera occlusions, noise, changes in lighting conditions and visual aliasing. All the tests have been carried out with two large sets of images captured under real working conditions.

The rest of the paper is organized as follows: in the next section we review the main techniques to globally describe scenes. Section 4 formalizes the implementation of the description techniques to optimally solve mapping and localization tasks when we use panoramic scenes. Then, Section 5 presents the experimental setup, the image databases we have used and the results of the experiments. The work finishes with the discussion and conclusion sections.

In this section, we first give a general description of the map building and localization processes using the global appearance of scenes, and then we review the most relevant techniques for image description.

To solve the map building and localization problem using the global appearance of visual information, the first step consists of deciding how to represent such information. Working directly with the pixels of the images would be computationally very expensive. For this reason, we first study some ways to globally describe the information in the scenes. To study the viability of these descriptors in map building and localization, we decompose the experimentation into two steps: (1) learning and (2) validation.

In the first step, the robot is guided in a teleoperated way through the environment to map. During this step, the robot acquires a set of omnidirectional images. We then compute the panoramic scenes and, as a result, we get the set of images {im_1, im_2, …, im_n}, where each im_j ∈ R^(N_x × N_y) represents a panoramic image.

From this set of images, a set of global descriptors is computed, one per original scene. As a result, the model of the environment is composed of the set of descriptors {d_1, d_2, …, d_n}, where d_j ∈ R^(M_x × M_y). Each of these descriptors represents the main information in each scene. They should present some properties to be efficient in map creation and localization tasks:

Each descriptor should contain the main information in the original scene with a lower dimension, i.e., M_x · M_y ≪ N_x · N_y.

There should be a correspondence between the distance among descriptors and the geometric distance between the points where the images were captured.

The descriptors should be robust against some usual situations in mobile robot applications: occlusions in the scenes, changes in the lighting conditions, noise, etc.

The computational cost of computing the descriptor should be low enough to allow the robot to localize itself in real time.

It is advisable that the descriptors can be built incrementally.

It is necessary that the descriptor includes some information about the orientation the robot had when capturing the image. This means that if a robot captures two images from near points in different orientations, the descriptors should allow us to compute the relative orientation.

In the next subsections we present the main description methods existing in the literature on this topic and their main properties.

The Discrete Fourier Transform of an image f(x, y) with N_x × N_y pixels can be defined as:

F(u, v) = Σ_{x=0}^{N_x − 1} Σ_{y=0}^{N_y − 1} f(x, y) · e^{−j2π(ux/N_x + vy/N_y)}, u = 0, …, N_x − 1, v = 0, …, N_y − 1.

Taking these facts into account, the amplitude spectrum can be used as a global descriptor of the scene, as it contains information about the dominant structural patterns and it is invariant with respect to the position of the objects. Some authors have shown how this kind of non-localized information is relevant to solve simple classification tasks [

However, this kind of descriptors which are purely based on the 2D-DFT do not contain any information about the spatial relationships between the main structures in the image. To have a complete description of the appearance of the scene it is necessary to include such information. A possible option based on the 2D-DFT is presented in [

Taking a panoramic image im_j ∈ R^(N_x × N_y) as our starting point, after computing the FS we arrive at a new matrix FS_j ∈ C^(N_x × N_y), where the most important information is concentrated in the low-frequency components of each row. This way, we can retain just the first k_1 columns of the signature (k_1 < N_y), so each image im_j is represented in a compact way.

The matrix FS_j presents an interesting property, derived from the shift theorem of the DFT: if one row of an image {a_n} is a circularly shifted version {a_{n−q}} of the corresponding row of another image (q being the amount of shift), both rows have the same amplitude spectrum, and their phase spectra differ only by a term proportional to q and to the frequency k of each component.

Thanks to this property, the estimation of the robot position and orientation can be made separately. First, we compute the Fourier Signature and retain the first k_1 columns, FS_j ∈ C^(N_x × k_1); then we compute the magnitude matrix and use it to estimate the position of the robot; and finally we compute the argument matrix and use it to estimate the orientation of the robot. Also, this is an inherently incremental method, as the descriptor of each image can be computed independently of the rest of the images.
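This separation can be sketched with a few lines of numpy. This is a minimal illustration, not the authors' exact implementation; in particular, solving for the shift q using only the component at frequency k = 1 is our own simplification:

```python
import numpy as np

def fourier_signature(img, k1):
    """Row-wise DFT of a panoramic image, keeping the k1 low-frequency
    columns; returns the modules matrix A and the arguments matrix Phi."""
    fs = np.fft.fft(img, axis=1)[:, :k1]
    return np.abs(fs), np.angle(fs)

def estimate_shift(phi_ref, phi_test, ny):
    """Estimate the column shift q between two panoramic images from the
    phase difference at frequency k = 1 (DFT shift theorem)."""
    dphi = np.angle(np.exp(1j * (phi_test - phi_ref)))  # wrap to (-pi, pi]
    q = -np.mean(dphi[:, 1]) * ny / (2 * np.pi)
    return q % ny
```

For a pair of images related by a pure rotation of the robot, the modules matrices coincide and `estimate_shift` recovers the column shift, which is proportional to the relative orientation.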

When we have a set of panoramic images im_j ∈ R^(N_x × N_y), each image can be considered a point in an (N_x · N_y)-dimensional space, and PCA can be used to compress this information.

Using the classical formulation of PCA, as exposed in [ ], each image im_j is first reordered as a column vector x_j ∈ R^(N_x · N_y × 1). PCA projects this vector onto the K main directions of the data, K ≤ N_x · N_y, obtaining a compressed descriptor p_j ∈ R^(K × 1), p_j = V^T · x_j, where V contains the first K eigenvectors of the covariance matrix of the data.

Some authors have applied PCA in mobile robots localization [

Histogram of Oriented Gradients (HOG) descriptors were first introduced in [

The experience with this kind of descriptors in robot mapping and localization is very limited. Hofmeister

Inspired by this description method, we have implemented a HOG descriptor applicable to panoramic images that offers rotational invariance and allows us to compute both the position and the orientation of the robot.

More recent works make use of the,

We can find in the literature few applications of this descriptor in mobile robotics. For example, Chang

In this section we detail how we have implemented robust and rotationally invariant descriptors to globally represent the panoramic scenes.

First, in the

Then, to carry out the localization, the robot captures a new image im_t at time t and computes its descriptor d_t. This descriptor is compared with all the descriptors stored in the map, obtaining a set of distances {l_t1, l_t2, …, l_tn}, where each l_tj is the distance between d_t and d_j.

Using a sorting algorithm, we arrange these distances in ascending order. After that, we retain the closest neighbors; we name nearest neighbor the image in the map whose descriptor presents the minimum distance l_tj, and its capture point is taken as an estimation of the robot position.

Once the position has been computed, the next step consists of estimating the orientation of the robot. With this aim, we compare the descriptor of the image captured at time
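The position-estimation steps above can be sketched as follows; this is a minimal illustration, where the Euclidean distance and the function names are our own choices:

```python
import numpy as np

def localize(d_t, map_descriptors, map_positions, k=3):
    """Compare the test descriptor d_t with every descriptor d_j in the map,
    sort the distances l_tj in ascending order and return the k nearest
    candidate positions together with their distances."""
    l_t = np.array([np.linalg.norm(d_t - d_j) for d_j in map_descriptors])
    order = np.argsort(l_t)[:k]          # indices of the closest neighbors
    return [map_positions[i] for i in order], l_t[order]
```

The first returned position corresponds to the nearest neighbor; keeping k > 1 candidates is useful later, when the probabilistic localization stage has to deal with visual aliasing.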

In the following sections we detail these steps for the four description methods compared.

The map is composed of a set of descriptors. Each descriptor is represented by two matrices: a modules matrix A_i ∈ R^(N_x × k_1) and a phase matrix Φ_i ∈ R^(N_x × k_1).

First, we use the modules matrix to estimate the position of the robot. We compute the distance between the modules matrix A_t of the test image and each matrix A_i stored in the map.

Once the position has been computed, we estimate the orientation of the robot, using the argument matrix Φ_t of the test image.

The Fourier Signature parameter we will try to optimize in the experiments is the number of columns retained from the signature, k_1, to arrive at a compromise between the computational cost and the accuracy of the localization process.

The PCA descriptor we use is described in the works of Jogan

Each block X_ij is a circulant matrix whose first row is the vector [x_0, x_1, …, x_{N−1}].

Since all the _{ij}

_{ij}_{ij}

Since _{k}

Thanks to this procedure, the problem of computing the SVD decomposition of

Since the projection basis is complex, so are the coefficients of the projections of the images. It can be proved that the coefficients of an image and its rotated versions have the same modulus, with only a change in argument [

To conclude, in the case of Rotational PCA, the database is made up of the projections (or descriptors) of each scene, arranged in a matrix _{x}_{y}

The localization process is carried out by projecting the input image captured at time t onto this basis and comparing the resulting descriptor with the projections stored in the database.
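The property that makes this decomposition tractable can be checked numerically. The following sketch (our own, in 1-D for brevity) builds the circulant matrix formed by all rotations of a signal and verifies that the unitary DFT matrix diagonalizes it, so the eigenvectors are known in closed form and only the eigenvalues need to be computed:

```python
import numpy as np

def rotations_matrix(signal):
    """Stack all cyclic shifts of a 1-D signal: the result is circulant."""
    n = len(signal)
    return np.array([np.roll(signal, q) for q in range(n)])

n = 8
signal = np.random.default_rng(0).random(n)
C = rotations_matrix(signal)

# Unitary (and symmetric) DFT matrix: W[k, m] = exp(-2j*pi*k*m/n) / sqrt(n)
W = np.fft.fft(np.eye(n)) / np.sqrt(n)

# W^(-1) C W is diagonal, and its diagonal is exactly the DFT of the signal,
# so the SVD of the huge matrix of all rotated images reduces to per-row FFTs.
D = np.conj(W) @ C @ W
```

This is the reason why the eigenspace of the set of artificially rotated images can be obtained without ever forming or decomposing the full matrix.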

These are the steps we follow to build the descriptor:

First, the panoramic image, with N_x × N_y pixels, is divided into a set of horizontal cells that span the whole width of the image. Since a rotation of the robot on the ground plane only produces a circular shift of the columns of the panoramic image, the histogram computed over each horizontal cell is invariant to such rotations.

From the horizontal cells we build the descriptor h_1, and from a set of overlapping vertical cells we build a second descriptor h_2 that allows us to estimate the robot orientation with precision, as shown on

The variable parameters of the HOG descriptor are the number of horizontal cells k_2, the number of vertical cells k_3 and the width of the vertical cells l_1. As a result, we get two descriptors: the first one (h_1) will be used for localization purposes and the second one (h_2) to estimate the robot orientation. Once the HOG descriptors are built, the position is estimated by calculating the minimum distance between each h_1 descriptor in the database and that of the current image. The orientation is obtained by successive rotation and comparison of the h_2 vector of the input image at time
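As an illustration of the rotational invariance of the h_1 descriptor, the following sketch (our own simplified version, not the exact implementation of the paper) splits a panoramic image into k_2 horizontal cells and builds one orientation histogram per cell; circular horizontal gradients make the result exactly invariant to column shifts:

```python
import numpy as np

def hog_h1(img, k2, nbins=8):
    """Simplified rotation-invariant descriptor h1: k2 horizontal cells
    spanning the full image width, one magnitude-weighted histogram of
    gradient orientations per cell."""
    img = img.astype(float)
    # Circular horizontal gradient (the panorama wraps around), vertical
    # gradient with one-sided differences at the borders.
    gx = (np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)) / 2
    gy = np.gradient(img, axis=0)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # orientations in [0, pi)
    cells = np.array_split(np.arange(img.shape[0]), k2)
    h1 = []
    for rows in cells:
        hist, _ = np.histogram(ang[rows], bins=nbins, range=(0, np.pi),
                               weights=mag[rows])
        h1.append(hist / (hist.sum() + 1e-9))      # per-cell normalisation
    return np.concatenate(h1)
```

Because every cell covers entire rows, circularly shifting the columns of the image (a rotation of the robot) only reorders the pixels inside each cell and leaves the histograms unchanged.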

Our

Starting from the original panoramic image, with N_x × N_y pixels, we build a pyramid of images, where each level is a subsampled version of the previous one.

Then, the images are filtered with a bank of Gabor masks with k_4 orientations uniformly distributed between 0 and 180 degrees. As a result, we get k_4 matrices per pyramid level with information on the analyzed directions. We apply this filtering to the first two images of the pyramid, so we get 2 · k_4 resulting matrices.

Once the descriptor is built, the localization and orientation estimations are carried out using, respectively, the horizontal blocks descriptor and the vertical blocks descriptor, following the same procedure as in HOG.

The configurable parameters of this descriptor are: (a) the number of Gabor filter orientations k_4; (b) the number of horizontal cells k_5; (c) the number of vertical cells k_6; and (d) the width of the vertical cells l_2.
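A simplified sketch of this kind of descriptor is shown below. This is our own illustration: the kernel size, wavelength, Gaussian envelope and the two-level pyramid are assumptions, not the paper's exact parameter values:

```python
import numpy as np

def gabor_kernel(theta, size=9, wavelength=4.0, sigma=2.0):
    """Real part of a Gabor filter oriented at theta radians."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x**2 + y**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / wavelength))

def conv_same(img, kern):
    """2-D circular convolution via FFT (panoramas wrap horizontally)."""
    pad = np.zeros_like(img)
    kh, kw = kern.shape
    pad[:kh, :kw] = kern
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center kernel
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))

def gabor_descriptor(img, k4=4, k5=4):
    """Filter two pyramid levels with k4 orientations in [0, 180) degrees and
    average each response magnitude over k5 horizontal cells."""
    levels = [img.astype(float), img[::2, ::2].astype(float)]
    feats = []
    for lv in levels:
        for i in range(k4):
            resp = np.abs(conv_same(lv, gabor_kernel(i * np.pi / k4)))
            feats += [c.mean() for c in np.array_split(resp, k5, axis=0)]
    return np.array(feats)     # length 2 * k4 * k5
```

As with HOG, using horizontal cells that span the full image width keeps this descriptor invariant to rotations of the robot on the ground plane.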

When a robot has to move autonomously in a real environment using vision as input data, it has to cope with the problem of changing lighting conditions. These conditions may vary considerably depending on the moment of the day and of the year and on the use of natural or artificial illumination. These changes will introduce perceptible changes in the appearance of the scenes.

After several works ([

In this section, we compare the performance of the four global appearance descriptors in the tasks of map creation and localization. For these purposes, we make use of two different image databases captured in different environments under realistic lighting conditions. We have carried out four different experiments with this goal. First, we evaluate the computational cost of building the representations of the database. We evaluate the necessary time and memory depending on the values of the most relevant parameters of the descriptors. Second, we test the performance of the descriptors to solve the global localization task as an image retrieval problem. After that, we test the robustness of the descriptors to solve the same task when occlusions, noise or changes in lighting conditions are present. Finally, we study the behavior of the descriptors in a probabilistic localization task, using the Monte Carlo algorithm.

In this section, we first introduce the image databases we have used to carry out the experiments and then we present the results of the four experiments.

We make use of two databases, captured with two different catadioptric systems (with different geometry). This fact does not affect the process to compute the descriptors since we approach the problem from a topological point of view. Therefore, a camera calibration process is not necessary. First, the

The second database, named COLD, has been captured by a third party [

The objective of this section is to compare the performance of the four descriptors during the task of creating a representation or map of the environment using the two image databases. We will show some results about the computational cost of building the map and the memory necessary to store it, depending on the values of the descriptors' parameters. In the following subsections we will make some additional experiments to test the utility of these representations in a localization task. After all the experiments, we will have the necessary information to know which is the best descriptor and the optimal parameters to arrive at a compromise between computational cost and localization accuracy.

First, we show on

First, the main parameter of the Fourier Signature is the number of columns k_1 we retain to compose the descriptor. This descriptor is composed of a modules matrix and a phase matrix, both with size N_x × k_1. From the figure we deduce that both the memory and the time increase proportionally as we select more columns. Anyway, the increase in time is not significant, because the cost of computing the DFT of each row is the same independently of k_1; the only difference is computing the module and phase of more or fewer components.

In the case of Rotational PCA, the database is made up of the projections (or descriptors) of each scene, arranged in a matrix _{x}_{y}

As for HOG, the parameter we have varied to show these graphs is the number of horizontal cells, k_2, and in the other case, the number of Gabor filter orientations, k_4, since in previous works we have shown that these are the parameters with the greatest influence on the behavior of the descriptor [

Comparatively, PCA is the computationally heaviest method, despite using the properties of the circulant matrices to carry out the SVD decomposition. The Fourier Signature is the fastest algorithm and

In this section we test the utility of the map created in the previous section to solve the global localization task. The robot has no information about its position at time

First we show on

We must bear in mind that the results of Rotational PCA are given for a reduced version of the database with only 200 images. The PCA curve shows that the computational cost of the localization process is quite stable (the range shown on the y-axis is very short). This is due to the fact that the size of the descriptor is

We carry out the image recovering experiment (localization) using both databases. In the

We express the result of the image recovering experiments by means of

In

In _{2}). PCA presents the best localization results (when using a limited database of 200 images).

To finish this experiment, we are also interested in testing the performance of the descriptors when estimating the relative orientation between the test image and the retrieved image from the database. These data are shown in

In this section we test the performance of the descriptors in a localization task under some typical situations: different lighting conditions, occlusions and noise.

In the first experiment we make use of the COLD database. We have taken the images in the

The next experiment has been carried out with the

As far as occlusions are concerned, the precision clearly decreases when the percentage of occlusions increases. However, HOG and

To conclude this subsection, the results of global localization that we have obtained show how the behavior of all the descriptors gets worse when noise, occlusions or changes in lighting conditions are present. HOG and

Once we have carried out the global localization experiments, we are interested in testing the performance of the descriptors in a probabilistic localization task. In this section we present the formulation of the Monte Carlo algorithm we have implemented with this aim. In this problem, we not only take into account the current observation but also all the data available up to this moment: we try to estimate the robot's pose x_t using all the observations z_{1:t} = {z_1, z_2, …, z_t} and all the odometry readings u_{1:t} = {u_1, u_2, …, u_t} collected until time t.

We have previously built a map of the environment where the robot moves, which is composed of a set of descriptors {d_1, d_2, …, d_n}. Each descriptor d_j is associated with the coordinates (p_{j,x}, p_{j,y}) of the point where the corresponding image im_j was captured.

To test the performance of the descriptors, we have decided to state this problem in a probabilistic fashion: we will estimate the probability distribution Bel(x_t) = p(x_t | z_{1:t}, u_{1:t}), known as the belief, and we will approximate it with a set of weighted samples or particles.

The initial set of particles represents the initial knowledge _{0}) about the state of the mobile robot on the map. If we have no information about the initial position of the robot, the initial belief is a set of poses drawn according to a uniform distribution over the robot's map. If the initial pose is partially known up to some small margin of error (local localization or tracking), the initial belief is represented by a set of samples drawn from a narrow Gaussian centered at the known starting pose of the mobile robot. From this initial belief, the

Prediction. Starting from the set of particles at time t − 1 and the movement u_t measured by the odometry between t − 1 and t, a new set of particles is generated by sampling from the motion model p(x_t | x_{t−1}, u_t). This set represents the predicted belief p(x_t | z_{1:t−1}, u_{1:t}).

Update. The current observation z_t is used to correct the prediction: each particle receives a weight proportional to the observation model p(z_t | x_t), so the weighted set represents the belief Bel(x_t) = p(x_t | z_{1:t}, u_{1:t}). Finally, the set of particles is resampled according to these weights.

The weight w^i_t of each particle is computed by comparing the descriptor d_t of the current image with the descriptor d_j of the map whose associated coordinates (p_{j,x}, p_{j,y}) are the closest to the particle position (x^i, y^i): the lower the distance between d_t and d_j, the higher the weight assigned to the particle.
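The prediction/update/resampling loop can be sketched as follows. This is a minimal illustration with a simplified additive motion model and an inverse-distance observation model; both are our own assumptions, not necessarily the exact models used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mcl_step(particles, u, z_desc, map_desc, map_pos, motion_noise=0.05):
    """One Monte Carlo localization step.
    particles: (M, 3) array of poses (x, y, theta); u: odometry increment."""
    # 1. Prediction: apply the odometry increment plus Gaussian noise.
    pred = particles + u + rng.normal(0, motion_noise, particles.shape)
    # 2. Update: weight each particle by comparing the current descriptor
    #    with the descriptor of the nearest map node.
    w = np.empty(len(pred))
    for i, p in enumerate(pred):
        j = np.argmin(np.linalg.norm(map_pos - p[:2], axis=1))  # nearest node
        w[i] = 1.0 / (np.linalg.norm(z_desc - map_desc[j]) + 1e-9)
    w /= w.sum()
    # 3. Resampling (SIR): draw particles proportionally to their weights.
    idx = rng.choice(len(pred), size=len(pred), p=w)
    return pred[idx]
```

After a few steps, the particle cloud concentrates around the map nodes whose descriptors best match the current observation, and its dispersion can be monitored to detect visual aliasing.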

To carry out this experiment, we make use of part of the

k_1. HOG presents similar results when the number of cells k_2 is between 16 and 64 and when k_4 is high, but the error is in all cases higher compared to the Fourier Signature.

As far as the step time is concerned, the cost is low when the number of cells k_2 is lower than 16. The Fourier Signature presents a higher computational cost, and it increases as k_1 does.

In these experiments, the Fourier Signature uses k_1 = 32, and HOG and the third descriptor use k_2 = 16 and k_4 = 16, respectively. (a) shows the localization error and the dispersion of the particles. A sudden increase in this dispersion indicates visual aliasing (the nearest images in the database are at far points). In general, the dispersion is high at the beginning and decreases as new information arrives. The algorithm is able to recover from visual aliasing with the three descriptors. (b) corresponds to HOG with k_2 = 16. The blue dots are the positions of the map images, the black curve is the ground truth of the route followed by the robot, the red curve is the trajectory estimated using only the odometry data and the blue curve is the trajectory estimated making use of the probabilistic process. The robot starts at the bottom of the map (coordinates

Once we have shown how the descriptors behave in a probabilistic localization process under usual working conditions, to conclude with the experiments we test them in the resolution of the kidnapped robot problem. In this problem, a robot which is well localized during a probabilistic process is teleported to a different location without noticing it. This is a very interesting problem, as it tests the ability of the localization algorithm to recover from serious localization errors or temporary failures of the sensory systems.

To solve this problem robustly, we have decided to make a slight variation of the Monte Carlo algorithm presented in the previous section. During the resampling process, we add a new set of particles at random positions. This set represents a low percentage of the total number of particles. When the robot is well localized, these random particles are not expected to affect the localization algorithm; but, after the robot is kidnapped, the random particles near the new position of the robot are expected to act as a seed that makes the probability distribution tend toward the real position.

In these experiments, 95% of the particles in the new set X_t are sampled from X_{t−1} using the SIR algorithm, and the remaining 5% are drawn from a uniform distribution over the robot's map.
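The modified resampling step can be sketched as follows. The 5% injection rate follows the text, while the function name and the pose parameterization are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def resample_with_injection(particles, weights, map_bounds, random_frac=0.05):
    """SIR resampling where a small fraction of the particles is drawn
    uniformly over the map, so the filter can recover after a kidnapping."""
    m = len(particles)
    n_rand = int(round(random_frac * m))
    # Resample 95% of the set proportionally to the weights (SIR).
    idx = rng.choice(m, size=m - n_rand, p=weights / weights.sum())
    # Inject 5% of fresh particles uniformly over the map area.
    (xmin, xmax), (ymin, ymax) = map_bounds
    randoms = np.column_stack([rng.uniform(xmin, xmax, n_rand),
                               rng.uniform(ymin, ymax, n_rand),
                               rng.uniform(-np.pi, np.pi, n_rand)])
    return np.vstack([particles[idx], randoms])
```

While the robot is well localized, the injected particles receive negligible weight and disappear at the next resampling; after a kidnapping, those falling near the new true position seed the recovery.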

(a) the Fourier Signature with k_1 = 32; (b) HOG with k_2 = 16; and (c) k_4 = 16. In all cases, the kidnapping occurs at the same point, during the ascending trajectory (the exact point is marked with a green circle). The descriptor which recovers first from the kidnapping is HOG. However, it presents some problems of visual aliasing (see the upper right corner in

Once we have presented the results, in this section we discuss them in the three fields we have analyzed: map building, localization and probabilistic localization. We have arrived at some general conclusions about the use of the four description methods. PCA with rotations presents, by far, the highest computational cost during the creation of the map, which makes this process unfeasible to model large environments. Also, compared to the other three descriptors, PCA is not an incremental method. This means that, if we have created a map with a set of images and we want to add a new image to the map, the mapping process must be started from scratch. Thus, the whole map must be available before starting the localization process. For this reason, this method may not be advisable for certain tasks, such as SLAM (Simultaneous Localization and Mapping). Fourier Signature, HOG and

Comparing these three descriptors in a map building task, Fourier Signature needs, in general, more memory and

During the localization process, PCA with rotations is the quickest algorithm to estimate the position and orientation. HOG is also very quick, and Fourier Signature and

As far as the probabilistic localization process is concerned, the best results have been obtained with the HOG descriptor, as it presents a good compromise between average error and computational cost for an intermediate number of components (between 8 and 16 cells). The Fourier Signature presents good accuracy, but its computational cost is higher, and the accuracy results are worse when using

In this paper we have studied and compared four approaches to describe panoramic scenes based on their global appearance. The methods we have studied are the Fourier Signature, Principal Components Analysis with Rotations, Histogram of Oriented Gradients and

The results presented in this paper show the feasibility of global appearance methods in mapping and localization tasks. We are now working on new description methods that improve the localization results, especially under occlusions and changes in lighting conditions, on new mapping methods to include more information about the relationships between positions and on solving the SLAM problem using global appearance.

This work has been supported by the Spanish government through the project DPI2010-15308, “Exploración integrada de entornos mediante robots cooperativos para la creación de mapas 3D visuales y topológicos que puedan ser usados en navegación con 6 grados de libertad”.

The authors declare no conflicts of interest.

Omnidirectional image (x_1), corresponding panoramic image (x_11) and some samples of artificially rotated versions (x_12, x_13, x_14, …), to carry out PCA with rotated images (

Graphical representation in the complex plane of two components of the projections of a 32 rotations set for an image.

Distribution of horizontal cells on a panoramic image to build a rotationally invariant descriptor h_1.

Distribution of overlapping vertical cells on a panoramic image to build a descriptor h_2 that permits estimating the orientation of the robot.

k_4 = 4 Gabor filters with orientations {0, 45, 90, 135} degrees.

Bird's-eye view of (

COLD database. Necessary time to compute the representation (map) of the environment.

COLD database. Necessary memory to store the database.

COLD database. Computational cost of the localization process.

COLD database. Recall and precision curves for FS and _{1} = (_{2} = (_{4} = (

COLD database. Final precision results (expressed in parts per unit) depending on the parameter of each descriptor (

COLD database. Mean orientation error when comparing each test descriptor with its nearest neighbor in the database depending on the parameter of the descriptor (

COLD database. Precision in localization using the

Some examples of test images with different artificial occlusion percentage and with added Gaussian noise with different variances.

This figure shows the evolution of one of the experiments carried out with the three descriptors: (

This figure shows the average error during the localization process depending on the descriptor parameters (

This figure shows the evolution of three kidnapped robot experiments using (

Evolution of the localization error and the sample dispersion in the previous experiments.

Quorum database. Precision (%) in localization when the test images present occlusion or noise.

| Descriptor | Occl. 0% | Occl. 5% | Occl. 10% | Occl. 20% | Occl. 40% | Noise 0 | Noise 0.01 | Noise 0.02 | Noise 0.04 | Noise 0.08 |
|---|---|---|---|---|---|---|---|---|---|---|
| FS | 53 | 46 | 40 | 32 | 13 | 53 | 53 | 53 | 53 | 46 |
| PCA rot. (*) | 67 | 62 | 54 | 38 | 5 | 67 | 64 | 63 | 63 | 62 |
| HOG | 68 | 60 | 54 | 43 | 17 | 68 | 60 | 43 | 33 | 26 |
| | 54 | 42 | 38 | 35 | 20 | 54 | 49 | 45 | 43 | 25 |