This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license.

In this paper we deal with the problem of map building and localization of a mobile robot in an environment using the information provided by an omnidirectional vision sensor mounted on the robot. Our main objective is to study the feasibility of techniques based on the global appearance of a set of omnidirectional images captured by this vision sensor to solve this problem. First, we study how to globally describe the visual information so that it correctly represents the locations and the geometrical relationships between them. Then, we integrate this information using an approach based on a spring-mass-damper model to create a topological map of the environment. Once the map is built, we propose the use of a Monte Carlo localization approach to estimate the most probable pose of the vision system and its trajectory within the map. We compare the alternatives in terms of computational cost and localization error. The experimental results we present have been obtained with real indoor omnidirectional images.

Two well-known problems in mobile robotics are building a map of the environment where the robot moves and computing its location within this map. Finding a relatively good solution to both problems is crucial during the autonomous navigation of a mobile agent, which is expected to make decisions about its location in the environment and about the trajectory to follow to reach the target points.

During the past years, omnidirectional cameras have become a widespread sensor in mobile robotics mapping and localization tasks, due to their low cost, weight and power consumption and to the richness of the information they provide about the environment. In this work, we use the information captured by a camera that is installed at a fixed position on the robot, pointing upwards towards a hyperbolic mirror. This catadioptric system provides omnidirectional images of the environment. Different representations of the visual information can be used when working with these catadioptric systems (

Different authors have studied the use of omnidirectional images both in robot mapping and localization. These solutions can be categorized into two main groups: feature-based and appearance-based solutions. In the first approach, a number of significant points or regions are extracted from each omnidirectional image and each point is described using an invariant descriptor. As an example, Se

With respect to the mapping problem, the current research can be classified into two approaches: metric and topological. The first one consists of modeling the environment with a map that represents the position of the robot with geometric accuracy. For example, Moravec and Elfes [

The approach we use to carry out the mapping process is inspired by the work of Menegatti

Once the map has been built, it is necessary to test whether the robot is able to compute its pose (position and orientation) within the map with accuracy and robustness. Knowing its location is crucial for an autonomous agent, since the pose is needed for precise navigation. The

In this paper we propose to solve the localization problem using omnidirectional images and global appearance-based methods, as we do in the mapping task. Concerning Monte Carlo localization methods that use global appearance information, some similar works can be found in the literature, such as the one by Menegatti

Another related work is presented by Linaker and Ishikawa [

Hence, in this work we focus on the analysis of different choices that permit building topological maps from a set of omnidirectional images, so that localization can be carried out within these maps. Our main objective consists of evaluating the feasibility of using purely global-appearance methods in these tasks. However, solving the SLAM problem and studying the features of the appearance descriptor are out of the scope of this work. This way, we first present a methodology to build a grid map of the environment using an incremental method, from several omnidirectional images captured throughout the environment. We use a mass-spring-damper model with this aim. The method is able to arrange these images to create a topological map whose layout is similar to the layout of the grid where the images were captured. In a second phase, we propose different methods that allow us to localize the robot using a particle filter, together with new methods to refine the position of the robot rapidly. We have decided to describe each omnidirectional image by a single Fourier descriptor. However, the methods described here are in fact independent of the descriptor used to represent the images, and other appearance-based descriptors may also be applied.

The work is structured as follows. Section 2 presents the fundamentals of global appearance-based approaches and the processing applied to the images to obtain a robust descriptor for the experiments. Section 3 describes a method for batch topological mapping and a new technique for building topological maps incrementally. Section 4 deals with the Monte Carlo algorithm and its application to the localization problem in mobile robotics using the appearance of omnidirectional images. In this section we also describe the different weighting methods employed in the localization experiments. Next, Section 5 presents the results of the mapping and localization experiments. All these experiments have been carried out using a real robotic platform and several sets of omnidirectional images captured with this platform. Finally, we present the conclusions and future work in Section 6.

In this section we present the state of the art and the procedure we have followed to build the global descriptor of each omnidirectional image. These descriptors have been previously used by other authors [

The appearance-based techniques offer a systematic and intuitive way to construct the map and carry out the localization process. With these techniques, each image is described by means of a single global descriptor and the information from these descriptors is used to build the map (

PCA (Principal Components Analysis) is a widely extended method used to extract the most relevant information from a set of images [

Other researchers rely on DFT (Discrete Fourier Transform) methods to extract the most relevant information from the images. In this case there are several possibilities, such as implementing the 2D Discrete Fourier Transform [

Based on a previous work [

Due to the fact that the appearance of an image depends strongly on the lighting conditions of the environment to map [

In this section we present the methodology we have followed to describe the appearance of the scenes in an efficient and robust way. This descriptor must be robust against small changes in the environmental lighting conditions, the computational cost to obtain it must be low to allow mapping and localization in real time, and it has to be built in an incremental way so that it allows the map to be created while the robot is going through the environment (

Among the Fourier-based methods, the advantages of the Fourier signature are its simplicity, its low computational cost and the fact that it better exploits the invariance against ground-plane rotations when panoramic images are used. Each row of the panoramic image, seen as the sequence {a_n} = {a_0, a_1, ..., a_{Ny−1}}, is transformed using the Discrete Fourier Transform into the sequence of complex numbers {A_n} = {A_0, A_1, ..., A_{Ny−1}}.

This Fourier signature presents the same properties as the 2D Fourier Transform. The most important information is concentrated in the low-frequency components of each row, so we can work only with the first k coefficients of each Ny-point sequence. Furthermore, the shift theorem states that if a row is shifted q pixels (as happens with the columns of the panorama when the robot rotates in the ground plane), the transform of the shifted sequence {a_{n−q}} has the same magnitudes and only a phase change proportional to q: F[{a_{n−q}}] = A_n · e^{−j·2πqn/Ny}.

We can separate the computation of the robot position and the orientation thanks to this shift theorem. Finally, it is interesting to highlight also that the Fourier Signature is an inherently incremental method.

To compute the difference between the appearance of two scenes, we use the Euclidean distance between their Fourier signatures. If the bi-dimensional matrix F_i contains the Fourier signature of image i and F_j that of image j, the distance between both scenes is computed as the Euclidean norm of the difference between the magnitudes of their retained coefficients: dist(i, j) = || |F_i| − |F_j| ||.
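The steps above can be sketched in Python. This is a minimal sketch (not the authors' implementation), assuming numpy, a 56 × 256 panorama and k = 8 retained coefficients per row:

```python
import numpy as np

def fourier_signature(panorama, k=8):
    """Row-wise DFT of a panoramic image, keeping the first k coefficients.

    Returns the complex coefficients (rows x k). Their magnitudes are
    invariant to ground-plane rotations, which only shift the columns.
    """
    coeffs = np.fft.fft(panorama, axis=1)   # DFT of each row
    return coeffs[:, :k]                    # low-frequency components only

def signature_distance(sig_a, sig_b):
    """Euclidean distance between the magnitudes of two Fourier signatures."""
    return np.linalg.norm(np.abs(sig_a) - np.abs(sig_b))

# A ground-plane rotation of the robot shifts the panorama columns,
# leaving the magnitudes of the signature (and hence the distance) unchanged.
img = np.random.rand(56, 256)
shifted = np.roll(img, 40, axis=1)          # simulated rotation
d = signature_distance(fourier_signature(img), fourier_signature(shifted))
```

Here `d` is zero up to floating-point error, which illustrates the rotational invariance stated by the shift theorem.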

When the mobile robot moves in a real environment, it has to cope with some typical situations that produce small changes in the appearance of the environment, such as variations in the lighting conditions and changes in the position of some objects. Therefore, the descriptor built must be robust against these small variations in the environment.

There are different methodologies to provide robustness to the map created. In a previous work [

The homomorphic filtering allows us to filter separately the luminance and reflectance components of an image [. The filter applied in the frequency domain takes the form H(u, v) = (γ_h − γ_l) · H_hp(u, v) + γ_l, where H_hp = 1 − H_lp is a high-pass filter built from a low-pass filter H_lp whose cut-off frequency is σ_0, and γ_h > 1 and γ_l < 1 control, respectively, the amplification of the reflectance and the attenuation of the luminance components.
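A minimal sketch of this kind of homomorphic filter, assuming numpy, a Gaussian low-pass filter and illustrative values for σ_0, γ_l and γ_h (the exact filter shape and parameters used in the paper may differ):

```python
import numpy as np

def homomorphic_filter(image, sigma0=10.0, gamma_l=0.5, gamma_h=1.5):
    """Attenuate luminance (low frequencies) and boost reflectance (high
    frequencies), following the classic homomorphic filtering scheme."""
    rows, cols = image.shape
    log_img = np.log1p(image.astype(float))          # multiplicative -> additive
    spectrum = np.fft.fftshift(np.fft.fft2(log_img))
    # Gaussian low-pass filter H_lp with cut-off sigma0, centered on the spectrum
    u = np.arange(rows) - rows / 2
    v = np.arange(cols) - cols / 2
    d2 = u[:, None] ** 2 + v[None, :] ** 2
    h_lp = np.exp(-d2 / (2 * sigma0 ** 2))
    # Emphasis filter: H = (gamma_h - gamma_l) * H_hp + gamma_l, H_hp = 1 - H_lp
    h = (gamma_h - gamma_l) * (1 - h_lp) + gamma_l
    filtered = np.fft.ifft2(np.fft.ifftshift(h * spectrum)).real
    return np.expm1(filtered)                        # back to intensity domain

out = homomorphic_filter(np.random.rand(56, 256) + 0.1)
```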

Once the kind of information to store in the database has been studied, we need to establish some geometrical relationships between the stored locations to complete the mapping process. In other words, the set of scenes the robot has captured along the environment must be arranged so that the resulting map has a layout similar to the actual layout where the images were captured. Several databases with images from different environments have been used to test the algorithms. All of them consist of a set of omnidirectional images captured on a regular grid. The main features of these databases are shown in

To build a topological map of the environment where the images’ layout matches up with the original layout, we make use of a mass-spring-damper model, where each image acts as a particle p_i, i = 1, …, N, with mass m_i and position x_i. Each pair of particles p_i, p_j is connected through a spring with elastic constant k_ij and natural length l_ij, together with a damper with damping constant c_ij.

The force acting over each particle p_i due to the spring that connects it with p_j is F_ij = −k_ij · (||x_i − x_j|| − l_0) · u_ij, where l_0 is the natural length of the spring and u_ij is the unit vector from p_j to p_i. The natural length of each spring is set equal to the distance between the Fourier signatures of both images: l_0 = d_ij.

Taking into account the resulting force F_i on each particle, we can update the position of this particle at each iteration with the following expressions: a_i = F_i / m_i, v_i(t + Δt) = v_i(t) + a_i · Δt, x_i(t + Δt) = x_i(t) + v_i(t + Δt) · Δt.

An interesting parameter to adjust is the elastic constant k_ij. It can be set to a constant value k for all the springs, or made a decreasing function of the natural length l_ij, so that particles whose images are more similar are connected by stiffer springs.

On the other hand, the damping constants c_ij dissipate the energy of the system, so that the particles progressively slow down and the configuration converges to a stable equilibrium.
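One relaxation iteration of this mass-spring-damper model can be sketched as follows (a simplified sketch, assuming numpy, explicit Euler integration and constant k, c and m for all particles; the natural lengths `nat_len` play the role of the distances between Fourier signatures):

```python
import numpy as np

def relax(positions, velocities, nat_len, k=1.0, c=0.5, mass=1.0, dt=0.05):
    """One Euler step of the spring-mass-damper map relaxation.

    positions, velocities: (N, 2) arrays; nat_len: (N, N) natural lengths
    l_ij (the distances between the Fourier signatures of each image pair).
    """
    n = len(positions)
    forces = np.zeros_like(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            delta = positions[i] - positions[j]
            dist = np.linalg.norm(delta) + 1e-12
            # Spring pushes/pulls the pair towards its natural length l_ij
            forces[i] += -k * (dist - nat_len[i, j]) * delta / dist
        forces[i] += -c * velocities[i]          # damping term
    velocities += forces / mass * dt
    positions += velocities * dt
    return positions, velocities

# Toy example: three particles whose springs all have natural length 1
# converge towards an equilateral triangle of side 1.
pos = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.4]])
vel = np.zeros_like(pos)
nat = np.ones((3, 3))
for _ in range(4000):
    pos, vel = relax(pos, vel, nat)
```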

The mapping process is strongly conditioned by the availability of the omnidirectional images along this process. When all the images are available before the mapping process begins, a batch mapping approach must be used. However, if the images are captured gradually and the mapping process must start before all the images are available, an incremental approach must be implemented where the map is updated when a new image is captured. These two approaches are addressed in the next subsections.

When we have a set of panoramic images from an environment, with no information about the order in which they were acquired, a batch approach must be used to build the topological map. In this approach, all the particles are given a random initial position. Then, we let the system evolve by successively applying

This approach may become computationally unfeasible for large environments where a large collection of images has been captured: the computational cost grows rapidly with the number of images, since the number of spring forces acting on the particles grows quadratically with the size of the collection. Also, it must be taken into account that all the images must be available before starting the process, so it is not possible to perform an online mapping process (where the map must be built as the explorer robot goes through the environment).

This approach exploits the fact that during the mapping process the acquisition order of the images is usually known. Thanks to the incremental approach, the map can be built online: when a new image is captured, the map is updated accordingly. As the information is added gradually, the process becomes more robust and computationally more efficient. When we have a balanced set of particles p_i, i = 1, …, m, and a new particle p_{m+1} arrives, we allow this particle to tend to balance before the next one is added. There is no need to give p_{m+1} a random initial position, as it can be approximately inferred from the position of the previous particle in the system, p_m.

Once the map is built, we need a mechanism to evaluate how similar the layout of the resulting map is compared to the real layout of the captures. As the only information we have used to build the map is the distance between Fourier signatures, the resulting map is expected to have a similar shape compared to the original grid, but with a scale factor, a rotation and a possible reflection. These effects should be removed in the resulting map to get an acceptable measure of the shape difference between the original and the resulting map layout. With this aim, we use a method based on the Procrustes analysis. This method is a statistical shape analysis that can be used to evaluate shape correspondence [. If the matrix X = [(x_1, y_1)^T, (x_2, y_2)^T, …, (x_n, y_n)^T]^T contains the coordinates of the points of the original grid and Y = [(x'_1, y'_1)^T, (x'_2, y'_2)^T, …, (x'_n, y'_n)^T]^T the final positions of the particles, the Procrustes analysis finds the translation, rotation, reflection and scaling of Y that minimize the sum of squared distances with respect to X.

Once the translation, rotation and scaling effects have been removed, the goodness-of-fit criterion is the sum of squared errors. Thanks to this analysis we can measure how accurate the layout of the particles is after the mapping process, compared to the layout where the images were taken.

This analysis has a closed form, as detailed in [
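As a sketch of this closed form, the following function removes translation, scale and rotation/reflection and returns the sum of squared errors (standard orthogonal Procrustes via SVD, assuming numpy; not necessarily the authors' exact implementation):

```python
import numpy as np

def procrustes_error(X, Y):
    """Sum of squared errors after optimally removing translation, scale
    and rotation/reflection from Y so that it best fits X (both (n, 2))."""
    X0 = X - X.mean(axis=0)                 # remove translation
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)            # normalize to unit Frobenius norm
    Y0 = Y0 / np.linalg.norm(Y0)
    u, s, vt = np.linalg.svd(Y0.T @ X0)     # optimal rotation/reflection
    rotation = u @ vt
    scale = s.sum()                         # optimal scaling
    Y_fit = scale * Y0 @ rotation
    return np.sum((X0 - Y_fit) ** 2)        # goodness of fit in [0, 1]

# A map that is a rotated, scaled and translated copy of the grid fits perfectly.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
grid = np.array([[i, j] for i in range(5) for j in range(4)], dtype=float)
err = procrustes_error(grid, 2.0 * grid @ R + np.array([3.0, -1.0]))
```

In this example `err` is zero up to floating-point error, since the second point set differs from the grid only by a similarity transform.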

In mobile robot localization we are interested in the estimation of the robot’s pose (position and orientation), typically the state x_t, using the set of observations z_{1:t} = {z_1, z_2, …, z_t} and the set of control signals u_{1:t} = {u_1, u_2, …, u_t}. The knowledge about the state is represented by the belief Bel(x_t) = p(x_t | z_{1:t}, u_{1:t}).

The initial set of particles represents the initial knowledge p(x_0) about the state of the mobile robot on the map. When we use a particle filter algorithm, in global localization, the initial belief is a set of poses drawn according to a uniform distribution over the robot’s map. If the initial pose is partially known up to some small margin of error (local localization or tracking), the initial belief is represented by a set of samples drawn from a narrow Gaussian centered at the known starting pose of the mobile robot. The

Prediction: starting from the previous state x_{t−1} and a control signal u_t, each particle is propagated by sampling from the motion model p(x_t | x_{t−1}, u_t). The resulting set of particles represents the prior belief p(x_t | z_{1:t−1}, u_{1:t}).

Update: each particle is weighted with the observation model, i.e., with the likelihood p(z_t | x_t) of the current observation z_t given the pose of the particle.

The resulting set S_t of weighted particles approximates the posterior belief Bel(x_t) = p(x_t | z_{1:t}, u_{1:t}), and a resampling step draws a new generation of particles according to these weights.
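The prediction-update-resampling cycle can be sketched as follows (a minimal sketch assuming numpy, a simple additive Gaussian motion model and a generic observation model; the actual motion and observation models of the paper may differ):

```python
import numpy as np

def monte_carlo_step(particles, weights_fn, u, motion_noise=0.05):
    """One prediction/update/resampling cycle of the particle filter.

    particles: (M, 3) array of poses (x, y, theta); u: odometry increment
    (dx, dy, dtheta); weights_fn: observation model returning one weight
    per particle (e.g., any of the PW methods of the next section).
    """
    rng = np.random.default_rng()
    # 1. Prediction: propagate each particle through the noisy motion model
    particles = particles + u + rng.normal(0, motion_noise, particles.shape)
    # 2. Update: weight each particle with the observation model p(z_t | x_t)
    w = weights_fn(particles)
    w = w / w.sum()
    # 3. Resampling: draw M particles with probability proportional to w
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# With uniform weights, the cloud simply follows the odometry increment.
cloud = monte_carlo_step(np.zeros((200, 3)),
                         lambda p: np.ones(len(p)),
                         np.array([1.0, 0.0, 0.0]))
```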

By means of computing a weight w^i_t for each particle, we measure how well it explains the current observation. The map is composed of a set of N landmarks {l_1, l_2, …, l_N}, where each landmark l_j = {(l_{j,x}, l_{j,y}), d_j} stores the position where the image I_j was captured and its Fourier descriptor d_j.

We consider that the robot captures an image at time t, whose descriptor d_t is compared with the descriptors d_j of the landmarks of the map to compute the weight of each particle.

To carry out the localization process we have used the Monte Carlo algorithm explained in the previous subsection. The computation of the particle weight is a very important part of the Monte Carlo algorithm. We propose several methods that allow us to compute the weight of each particle

Particle Weight Method 1 (PW1): This method corresponds to a sum of Gaussians centered on each image landmark, considering the difference between the descriptors:
w^i = Σ_{j=1}^{N} exp( −||(l_{j,x}, l_{j,y}) − (x^i, y^i)||² / (2σ_l²) ) · exp( −||d_j − d_t||² / (2σ_d²) ),
so that particles close to landmarks whose descriptors are similar to the current observation receive a high weight, approximating the observation likelihood p(z_t | x_t).

Particle Weight Method 2 (PW2): Unlike the previous method, this method consists of a product of Gaussians centered on each image landmark, considering the difference between the descriptors:
w^i = Π_{j=1}^{N} exp( −||(l_{j,x}, l_{j,y}) − (x^i, y^i)||² / (2σ_l²) ) · exp( −||d_j − d_t||² / (2σ_d²) ).

Particle Weight Method 3 (PW3): In this case, we have used a sum of Gaussians centered on each landmark position, considering the difference between the descriptors as well as the orientation of the landmarks (images):
w^i = Σ_{j=1}^{N} exp( −||(l_{j,x}, l_{j,y}) − (x^i, y^i)||² / (2σ_l²) ) · exp( −||d_j − d_t||² / (2σ_d²) ) · exp( −(θ_j − θ^i)² / (2σ_θ²) ),
where θ_j is the relative orientation between the current image and the landmark image I_j, estimated from the phase of the Fourier signatures.

Particle Weight Method 4 (PW4): This weight method is a product of Gaussians centered on each landmark position, considering the difference between the descriptors as well as the orientation of the landmarks (images):
w^i = Π_{j=1}^{N} exp( −||(l_{j,x}, l_{j,y}) − (x^i, y^i)||² / (2σ_l²) ) · exp( −||d_j − d_t||² / (2σ_d²) ) · exp( −(θ_j − θ^i)² / (2σ_θ²) ).

Particle Weight Method 5 (PW5): In this case we use a Gaussian distribution centered at the mass center of a discrete system of masses. This method is inspired by a system of masses, each one having a mass related to the similarity between the current descriptor d_t and the descriptor d_j of the landmark. The weight of each particle is a Gaussian, with standard deviation σ_f, of the distance between the particle position (x^i, y^i) and the center of mass of this system.

Particle Weight Method 6 (PW6): As in the previous one, this method uses a Gaussian distribution centered at the center of a mass system, but in this case the method obtains this center from a spring-mass system. The elastic constant of each spring is related to the similarity in the description; thus, landmarks more similar to the current observation attract the mass more tightly. To simplify the calculations, the springs are given zero natural length, so the equilibrium point is the average of the landmark positions weighted by the elastic constants. The weight of each particle is a Gaussian, with standard deviation σ_s, of the distance between the particle position (x^i, y^i) and this equilibrium point.

Particle Weight Method 7 (PW7): This weight method consists of a triangular distribution and is inspired by the weight function introduced in [. The weight of each particle decreases linearly with the distance ||(l_{j,x}, l_{j,y}) − (x^i, y^i)|| to the landmarks, and is zero beyond a given radius.
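As an illustration, the PW1 weighting can be sketched as follows (a minimal sketch, not the authors' implementation; the landmark layout, descriptors and σ values are illustrative assumptions):

```python
import numpy as np

def pw1_weights(particles, landmarks, descriptors, d_t,
                sigma_l=0.5, sigma_d=1.0):
    """PW1: sum of Gaussians centered on the landmark positions, each one
    scaled by the similarity between its descriptor d_j and the current d_t."""
    w = np.zeros(len(particles))
    for (lx, ly), d_j in zip(landmarks, descriptors):
        geo = np.sum((particles[:, :2] - (lx, ly)) ** 2, axis=1)
        sim = np.exp(-np.linalg.norm(d_j - d_t) ** 2 / (2 * sigma_d ** 2))
        w += sim * np.exp(-geo / (2 * sigma_l ** 2))
    return w

# A particle near the landmark whose descriptor matches the observation
# receives a larger weight than one near a dissimilar landmark.
landmarks = np.array([[0.0, 0.0], [5.0, 5.0]])
descriptors = [np.zeros(4), 10.0 * np.ones(4)]
particles = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
w = pw1_weights(particles, landmarks, descriptors, d_t=np.zeros(4))
```

PW2 is obtained by replacing the sum over landmarks with a product, and PW3/PW4 add the orientation Gaussian as an extra factor.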

We have designed a complete set of experiments in order to test the validity of the global appearance-based approach both in map building and in localization.

First, we have developed some experiments to assess the accuracy and feasibility of the appearance-based approaches in map building. With this aim, we use several sets of omnidirectional images that have been captured in different structured and non-structured indoor environments (

Our main objective consists of testing the feasibility of the batch and the incremental mapping procedures presented in Section 3. With this aim, we study some features that define the feasibility of the procedures, such as the accuracy of the map built (how similar it is compared to the grid where the images were captured) and the computational cost of the method. We also study how these features are influenced by some typical parameters, such as the number of images, the distance between them (grid step), the degree of compression during the Fourier transformation and the values of the parameters of the mass-spring-damper model.

The preprocessing of the omnidirectional images includes the transformation to panoramic coordinates, homomorphic filtering and the Fourier signature computation. Once all the Fourier signatures of the image sets are available, the spring-mass-damper procedure is applied (

The parameter

The results shown in

At last,

In this section we show separately the results obtained for the

We can see an example of global localization using the method PW1 and 8,000 particles with our Monte Carlo algorithm in

To compare the performance of our

We have separated the global localization from the tracking to compare the weight methods with respect to the number of associations.

The second map we have tested is the

Both in the case of global localization (

At last,

In this paper we have studied the applicability of approaches based on the global appearance of omnidirectional images to topological mapping and localization tasks. The main contributions of the paper include the development of an approach to build a topological map of the environment in an incremental way, the study of the influence of the parameters of the process on the layout of the resulting map and on the processing time, the analysis of an appearance-based Monte Carlo localization using omnidirectional images, the comparison between different weighting methods both in the case of local and global localization, and the study of the influence of the Fourier signature descriptor on the mapping and localization processes.

All the experiments have been carried out with a set of omnidirectional images captured by a catadioptric system mounted on the mobile platform. Each scene is first filtered to reduce its dependence on the lighting conditions and then described through a Fourier-based signature, which presents a good performance in terms of memory and processing time, is invariant to ground-plane rotations and is an inherently incremental method.

We present a methodology to build topological maps incrementally. As shown in the results, when the parameters of this system are correctly tuned, accurate results can be obtained while maintaining a reasonable computational cost. The incremental method clearly outperforms the batch one. However, in this method, it is necessary to reach a compromise between accuracy and processing time when choosing the number of Fourier components in the descriptor.

On the other hand, we have presented a Monte Carlo localization method using omnidirectional images and we have evaluated the performance of different weighting methods in the case of local and global localization, finding different behaviors. We have also evaluated the influence of the descriptor on the localization by varying the number of possible associations. Our system is able to estimate the position of the robot in the case of an unknown initial position and to track the position of the robot while it moves. We have shown how the accuracy of the methods varies with the type of weight and the number of particles and associations. In the evaluated methods, as we increase the number of particles in the system, the average localization error decreases rapidly. With respect to the orientation, we obtain similar results. We have also shown the strong dependence between the number of associations and the type of method used. Finally, it is possible to correct the weighting of the particles by combining a physical system of forces with a Gaussian weight (PW6).

The experimental section includes results both for a structured environment, which presents repetitive visual aspects, and an unstructured one (a corridor and a laboratory, respectively). In both cases, we have obtained relatively good results in the mapping and in the localization processes.

We are now working on the fusion of these two systems to build a topological SLAM (Simultaneous Localization and Mapping) approach using just the global appearance of omnidirectional images, and on different map topologies.

This work has been supported by the Spanish government through the project DPI2010-15308, ‘Exploración integrada de entornos mediante robots cooperativos para la creación de mapas 3D visuales y topológicos que puedan ser usados en navegación con 6 grados de libertad’ (Integrated exploration of environments by means of cooperative robots to create visual and topological 3D maps to be used in navigation with 6 degrees of freedom).

Omnidirectional and panoramic scenes.

A robot rotation in the ground plane produces a shift in the columns of the panoramic images captured.

Bird’s eye view of the

Bird’s eye view of the

Euclidean distance between the Fourier signatures versus geometrical distance between the points where the images were captured.

Example of some intermediate steps during the batch mapping process of the

Example of some intermediate steps during the incremental mapping process of the

Processing time to build the topological map of

This figure shows the same results as

Average trajectory error in position versus number of associations for all methods, using 8,000 particles.

Average trajectory error in position versus number of associations for all methods, using 8,000 particles.

Relevant parameters of the image sets used in the mapping experiments.

Grid size | Image size | Grid step | Environment size
5 × 10 | 56 × 256 pixels | 1 m | 6 × 11 m
10 × 15 | 56 × 256 pixels | 0.3 m | 3.5 × 4.5 m
6 × 8 | 56 × 256 pixels | 0.5 m | 3.5 × 4.5 m
12 × 9 | 56 × 256 pixels | 0.1 m | 1.3 × 1 m
11 × 6 | 56 × 256 pixels | 0.2 m | 2.4 × 1.5 m
35 × 10 | 56 × 256 pixels | 0.1 m | 3.6 × 1.1 m