H-SLAM: Rao-Blackwellized Particle Filter SLAM Using Hilbert Maps

Occupancy Grid (OG) maps provide a probabilistic representation of space that is important for a variety of robotic applications, such as path planning and autonomous manipulation. In this paper, a SLAM (Simultaneous Localization and Mapping) framework capable of obtaining this representation online is presented. H-SLAM (Hilbert Maps SLAM) is based on the Hilbert Map (HM) representation and uses a Particle Filter to represent the robot state. Hilbert Maps offer a continuous probabilistic representation with a small memory footprint. We present a series of experimental results carried out both in simulation and with real AUVs (Autonomous Underwater Vehicles). These results demonstrate that our approach represents the environment more consistently while remaining capable of running online.


Introduction
Robot localization is a fundamental problem in achieving true autonomy. Especially underwater, where global localization systems like the Global Positioning System (GPS) are not available, vehicles often have to rely on Dead Reckoning (DR) navigation that drifts over time. This accumulated drift is problematic when constructing maps, because the same geophysical feature may appear as a different one when it is re-observed after drifting.
To overcome this drift, systems like the Long Baseline (LBL), the Short Baseline (SBL), the Ultra-Short Baseline (USBL), the GPS Intelligent Buoys (GIB), or single beacon navigation are commonly used to provide absolute positioning fixes [1][2][3][4]. However, these systems require time for deployment and constrain the vehicle to their coverage area.
To avoid the use of external structures, a vehicle equipped with exteroceptive sensors such as sonars can make use of Terrain-Based Navigation (TBN) [5] to bound its navigational drift. However, detailed digital terrain maps are not always available. Moreover, those maps are mainly measured from surface ships, so their resolution degrades as depth increases.
Another solution is the use of Simultaneous Localization and Mapping (SLAM) methods [6,7], which require neither external structures nor a pre-obtained digital map. As in TBN, SLAM needs exteroceptive sensors, mainly cameras or sonars. Although underwater cameras suffer from low visibility in turbid waters, they provide higher resolution and a faster refresh rate, and are much cheaper than sonars. Sonar sensors, on the other hand, have lower resolution and refresh rates, but measure up to hundreds of meters regardless of water visibility.
Some of the most successful SLAM methods in the literature use a feature-based approach [8][9][10]. Uniquely identifiable features are detected and associated to continuously correct the navigational drift and the learned map. However, underwater environments make robust feature detection and association difficult.

2. Implement a new SLAM framework, the H-SLAM:
(a) uses sonar measurements with the HM representation;
(b) is capable of running online on an AUV.

3. Simulated experiments and results of the proposed method:
(a) an experiment with a known map, localization only (TBN);
(b) a full SLAM experiment.

4. Real experiments and results of the proposed method:
(a) datasets obtained by an AUV.
Paper Organization
The paper is organized as follows. Section 2 describes the HM representation and the specifics on how to use it for map localization. Section 3 presents the Rao-Blackwellized Particle Filter (RBPF) used in conjunction with the HM representation for the H-SLAM framework. Section 4 describes the datasets used for testing the algorithms while Section 5 discusses the results obtained with them. Finally, in Section 6, we present the conclusions.

Hilbert Maps
HMs were recently introduced in [41] to offer a continuous probabilistic representation of space given a collection of range sensor measurements. In other words, they offer a continuous occupancy map representation. Unlike traditional OGs, there is no cell resolution, so any point in space can be queried. Moreover, HMs capture spatial relationships between measurements, making them more robust to outliers, giving them better generalization performance, and exploiting the inherent structure of environments. For example, if two close points are observed as occupied, the space between them will have a higher probability of being occupied than free, as long as no other measurements are obtained in the neighbourhood.
Developed as an alternative to Gaussian Process Occupancy Maps (GPOMs) [42], they offer similar advantages at a smaller computational cost. While GPOMs have a cubic computational cost O(n^3), the HM update cost is constant O(1). Instead of training the classifier directly on the training points x, HMs project them onto a finite set of features or inducing points Φ(x), on which a simple logistic regression classifier is learned. The dot product of those features approximates popular kernels in the Gaussian Process (GP) framework, k(x, x′) ≈ Φ(x)ᵀΦ(x′), like the Radial Basis Function. Furthermore, the logistic regression can be trained and updated using Stochastic Gradient Descent (SGD), making the computation theoretically independent of the number of observations.

Given a dataset D = {x_i, y_i}, where x_i ∈ R^D is a point in 2D or 3D space and y_i ∈ {−1, 1} is the label corresponding to the occupancy of the point x_i, HMs learn the discriminative model p(y|x, w) on the dataset through SGD. Once the model is learned, the parameters w can be used to predict the probability of occupancy of any query point x* as

p(y* = 1 | x*, w) = 1 / (1 + exp(−wᵀΦ(x*))). (1)

The most important parameters that define a HM are the learning rate of the SGD and the features used. The learning rate η_t can be constant or decaying with time. Regarding the features, many different features have been applied to HMs [41,43,44]; the basic parameters common to them are the feature resolution f_res, which defines how distant the features are from each other, and the neighbourhood radius r_th, which defines how far a feature affects its surroundings (Figure 1). The closer the features are, the smaller the details that can be represented; the lower the radius, the fewer features affect the same point in space. The feature used in this work is a simple triangle feature defined as

Φ_i(x) = max(0, 1 − r/r_th), (2)

where r = ||f_i − x||_2 and f_i is the position of feature i.
Features f_i are spread at f_res distance in a square grid, and the neighbourhood they affect is defined by the radius r_th. When predicting the occupancy of a point x*, one must gather all the feature weights and multiply them by the value of each feature at that point, Φ(x*), according to (1). In the example shown, the query point is outside the f_2 neighbourhood and thus its contribution is zero.
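The feature construction and SGD update above can be sketched as a minimal 2D Hilbert Map. The class name, grid layout, and learning rate value below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def triangle_features(x, centers, r_th):
    """Triangle feature vector Phi(x): 1 at a feature center, decaying
    linearly to 0 at distance r_th (zero outside the neighbourhood)."""
    r = np.linalg.norm(centers - x, axis=1)
    return np.maximum(0.0, 1.0 - r / r_th)

class HilbertMap:
    """Minimal 2D Hilbert Map: logistic regression over triangle features
    placed on a square grid, trained with plain (constant-rate) SGD."""
    def __init__(self, xmin, xmax, ymin, ymax, f_res, r_th, eta=0.1):
        gx = np.arange(xmin, xmax + f_res, f_res)
        gy = np.arange(ymin, ymax + f_res, f_res)
        self.centers = np.array([(a, b) for a in gx for b in gy])
        self.w = np.zeros(len(self.centers))
        self.r_th, self.eta = r_th, eta

    def update(self, x, y):
        """One SGD step on a labeled point (y in {-1, +1})."""
        phi = triangle_features(x, self.centers, self.r_th)
        # Gradient step on the logistic loss log(1 + exp(-y w^T phi))
        self.w += self.eta * y * phi / (1.0 + np.exp(y * (self.w @ phi)))

    def occupancy(self, x):
        """p(y = 1 | x, w): probability that query point x is occupied."""
        phi = triangle_features(x, self.centers, self.r_th)
        return 1.0 / (1.0 + np.exp(-self.w @ phi))
```

Repeatedly learning an occupied point drives the queried occupancy toward 1, while unobserved regions remain at the uninformed value 0.5.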
Being a continuous representation, features can be placed much farther apart than the cells of a traditional OG while achieving a similar representation at a much lower memory footprint, as in the example map of Figure 2.

Figure 2. Comparison between the OG and HM representations queried at the same resolution. Notice that rounded corners are not the most desirable representation for structured environments, but for underwater scenarios this is not usually a drawback.

Hilbert Map Learning and Raycasting
Learning a map from range sensor measurements and querying a point in the map are both clearly defined in the seminal HM work [41]. To include range measurements, they are first discretized into single points. The point at the end of the ray is labeled free if the range is the maximum range, and occupied otherwise. The rest of the ray (from the vehicle position to the measured range) is then sampled randomly, placing a free-labeled point every 1 or 2 m to properly cover the ray (Figure 3). Those points and labels are learned into the HM. However, to develop a SLAM framework based on HMs, a raycast method is needed to compare the real range measurements with the expected range measurements that the vehicle would obtain according to the learned map. On gridded OG maps, the cells along the ray path are queried until an occupancy value larger than a threshold is found [45]. Our HM raycasting method is inspired by the one developed for GPOMs [46].
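The discretization of one range measurement into labeled training points can be sketched as follows. For clarity, this sketch spaces the free samples deterministically rather than randomly, and all names are illustrative:

```python
import numpy as np

def ray_to_points(origin, angle, rng, max_range, step=1.0):
    """Discretize one range measurement into HM training points:
    free samples every `step` metres along the ray, plus the endpoint,
    which is occupied unless the ray returned the maximum range."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    pts, labels = [], []
    d = step
    while d < rng:
        pts.append(origin + d * direction)
        labels.append(-1)          # free space along the ray
        d += step
    pts.append(origin + rng * direction)
    labels.append(-1 if rng >= max_range else +1)  # endpoint occupancy
    return np.array(pts), np.array(labels)
```

Each returned (point, label) pair is then fed to the HM's SGD update.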
The raycast starts from the vehicle position in the HM and points in the same relative direction as the real measurement. Points at increasing distances from the vehicle are queried in the HM to obtain their occupancy values (Figure 4); this distance increment is defined as the query resolution. When a query point has an occupancy value larger than a threshold, it is considered a hit (occupied) and no more points are queried. To obtain the exact position where the threshold was crossed, a linear interpolation between the hit point and the previous point is computed. Finally, the raycasted range is the distance between the vehicle position and the interpolated point.
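A minimal sketch of this raycasting procedure, assuming the map is exposed as a callable returning the occupancy probability of a point (parameter names and default values are illustrative):

```python
import numpy as np

def raycast(hm_occupancy, origin, angle, max_range,
            query_res=0.2, threshold=0.6):
    """March query points along a ray through the continuous map until
    the occupancy crosses `threshold`; refine the hit position with a
    linear interpolation between the hit point and the previous point."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    prev_d, prev_p = 0.0, hm_occupancy(origin)
    d = query_res
    while d <= max_range:
        p = hm_occupancy(origin + d * direction)
        if p > threshold:
            # Interpolate where the occupancy crossed the threshold
            frac = (threshold - prev_p) / (p - prev_p)
            return prev_d + frac * (d - prev_d)
        prev_d, prev_p = d, p
        d += query_res
    return max_range  # no hit: treat as a maximum-range (free) ray
```

With a toy step-shaped occupancy field, the interpolated hit lands between the last free query and the first occupied one, closer to whichever side is nearer the threshold.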

Rao-Blackwellized Particle Filter with Hilbert Maps
AUVs are often equipped with several sensors to provide proper positioning. A depth sensor, an Attitude and Heading Reference System (AHRS) and a Doppler Velocity Log (DVL) provide excellent positioning except for the x and y axes in the absence of GPS, SBL, LBL, USBL or GIB. To estimate this positioning, the state of the vehicle can be represented by a RBPF [47]. Here, states directly observable using vehicle sensors are removed from the PF and are tracked by a single Extended Kalman Filter (EKF), shared by all particles, whose state vector is

x_k = [z_k u_k v_k w_k]ᵀ,

where z_k is the depth of the vehicle in the world frame and [u_k v_k w_k] are the velocities in the vehicle frame at time k. The vehicle orientation (roll φ_k and pitch θ_k) and the yaw rate ψ̇_k in the world frame are taken as inputs u_k of the EKF prediction model and are not estimated. The remaining states are estimated by the PF, where each particle is defined as

p^i_k = {[x^i_k y^i_k ψ^i_k]ᵀ, w^i_k, m^i_k},

where i is the particle index, [x^i_k y^i_k ψ^i_k]ᵀ are the position and yaw in the world frame, w^i_k is the weight of the particle and m^i_k is the HM of the particle. The particle filter is initialized from the on-board DR filter if an absolute positioning system is available. Otherwise, the filter is initialized at the origin for x and y, and the current sensor measurements are used to initialize the state model.

State Propagation
At each sensor measurement, the EKF is predicted forward to the time of the observation. A simple constant velocity model is used for the prediction,

u_k = u_{k−1} + Δt n_{u_k},   v_k = v_{k−1} + Δt n_{v_k},   w_k = w_{k−1} + Δt n_{w_k},
z_k = z_{k−1} + Δt (−u_{k−1} sin θ_k + v_{k−1} sin φ_k cos θ_k + w_{k−1} cos φ_k cos θ_k),

where Δt is the time increment from the previous prediction, u_k = [φ_k θ_k ψ̇_k]ᵀ is the input control vector and n_k = [n_{u_k} n_{v_k} n_{w_k}]ᵀ are the acceleration noises in the linear velocities. Note that the noises in roll and pitch [n_{φ_k} n_{θ_k}] are so small that they can be considered negligible and are not taken into account. The covariance is also predicted as

P_k = F_k P_{k−1} F_kᵀ + Q_k,

where F_k is the Jacobian of the prediction model and Q_k the process noise covariance. Each particle is also predicted forward by randomly sampling the uncertainties of u_k, v_k from the EKF and a user-specified yaw rate uncertainty σ_ψ̇. The velocities and their covariances are transformed for each particle from the body frame to the world frame {W} as

[ẋ_k ẏ_k ż_k]ᵀ = Rot(φ_k, θ_k, ψ^i_k) [u_k v_k w_k]ᵀ,
P^W_k = Rot(φ_k, θ_k, ψ^i_k) P_{u_k,v_k,w_k} Rot(φ_k, θ_k, ψ^i_k)ᵀ,

where Rot(φ_k, θ_k, ψ^i_k) is the rotation matrix given the attitude Euler angles and P_{u_k,v_k,w_k} is the 3 × 3 sub-matrix of P_k containing the velocity uncertainties. The obtained values are used to predict each particle position as

x^i_k = x^i_{k−1} + Δt ẋ_k,   y^i_k = y^i_{k−1} + Δt ẏ_k,   ψ^i_k = ψ^i_{k−1} + Δt ψ̇_k,

where ψ̇_k is taken from u_k.
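The per-particle propagation step (rotate the sampled body-frame velocities into the world frame, then integrate position and yaw) can be sketched as follows; the ZYX Euler convention and the function names are assumptions, not taken from the paper:

```python
import numpy as np

def rot(phi, theta, psi):
    """ZYX Euler rotation matrix from body frame to world frame."""
    cph, sph = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cps, sps = np.cos(psi), np.sin(psi)
    return np.array([
        [cps * cth, cps * sth * sph - sps * cph, cps * sth * cph + sps * sph],
        [sps * cth, sps * sth * sph + cps * cph, sps * sth * cph - cps * sph],
        [-sth,      cth * sph,                   cth * cph]])

def predict_particle(pos, vel_body, phi, theta, yaw_rate, dt):
    """Propagate one particle [x, y, psi]. `vel_body` holds body-frame
    velocities already sampled from the shared EKF; `yaw_rate` has the
    user-specified uncertainty already added."""
    x, y, psi = pos
    v_world = rot(phi, theta, psi) @ vel_body   # body -> world velocities
    return np.array([x + dt * v_world[0],
                     y + dt * v_world[1],
                     psi + dt * yaw_rate])
```

Each particle samples its own velocities and yaw rate, so the particle cloud spreads according to the EKF and yaw-rate uncertainties.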

State Update
Once the prediction has been computed up to the time of the sensor measurement, the EKF state can be updated with the common EKF update equations. The measurement function is defined as

z_k = H_k x_k + v_k,

where z_k is the measurement, H_k defines which states are observed and v_k is the measurement noise.
Depending on the measurements z_k provided by the different sensors (see Section 4.2), the H_k matrix changes. For example, the depth sensor provides depth measurements, so it is defined as

H_k = [1 0 0 0].

The DVL sensor provides velocities in the vehicle frame, and is thus defined as

H_k = [0 1 0 0; 0 0 1 0; 0 0 0 1].

Finally, the AHRS sensor provides the roll and pitch orientation and the yaw angular rate [φ θ ψ̇], which are saved in the input control vector u_k.
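A generic sketch of the per-sensor update, assuming the EKF state [z, u, v, w] described above; the H matrices follow from which states each sensor observes, while the noise covariances R passed in are placeholders:

```python
import numpy as np

# EKF state: [z, u, v, w] (depth plus body-frame velocities).
H_DEPTH = np.array([[1.0, 0.0, 0.0, 0.0]])        # depth sensor observes z
H_DVL = np.hstack([np.zeros((3, 1)), np.eye(3)])  # DVL observes [u, v, w]

def ekf_update(x, P, z, H, R):
    """Standard linear Kalman update for z = H x + v, with the
    measurement noise covariance R chosen per sensor."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

The same `ekf_update` is called with `H_DEPTH` for pressure-sensor fixes and with `H_DVL` for velocity fixes.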

Weighting, Learning and Resampling
Once a sonar measurement is received, it is segmented according to the returned intensities to obtain a single range and occupancy value. If no significant intensity is found, the range is set to the maximum range and the measurement is labeled free. Otherwise, the range is set to the range of the highest intensity and the measurement is labeled occupied.
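This segmentation step can be sketched as a simple intensity thresholding over one beam (threshold values and names are illustrative):

```python
import numpy as np

def segment_beam(intensities, ranges, min_intensity, max_range):
    """Reduce one sonar beam to a single (range, occupied) pair: take the
    strongest return above `min_intensity`, otherwise report a free
    maximum-range measurement."""
    intensities = np.asarray(intensities, dtype=float)
    best = int(np.argmax(intensities))
    if intensities[best] < min_intensity:
        return max_range, False       # no significant echo: free ray
    return float(ranges[best]), True  # strongest echo: occupied
```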
If it is an occupied measurement, its range r^meas_k is compared with each particle map m^i_k to update the particle weight. The expected range measurement r^{i,cast}_k is obtained by casting a ray, as described in Section 2, from the particle position in its respective HM m^i_k. The weight update for each particle is proportional to the difference of those ranges,

w^i_k = w^i_{k−1} exp(−(r^meas_k − r^{i,cast}_k)² / (2σ_r²)),

where σ_r is the range measurement covariance. This can be thought of as a measure of the self-consistency of each particle's HM.
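A sketch of the weighting step, assuming the Gaussian-likelihood form above (function names are illustrative):

```python
import numpy as np

def update_weight(w_prev, r_meas, r_cast, sigma_r):
    """Reweight one particle by a Gaussian likelihood of the difference
    between the measured range and the range raycast in the particle's
    own map."""
    return w_prev * np.exp(-(r_meas - r_cast) ** 2 / (2.0 * sigma_r ** 2))

def normalize(weights):
    """Renormalize the particle weights so they sum to one."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()
```

A particle whose map agrees exactly with the measurement keeps its weight, while larger range discrepancies shrink it exponentially.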
After particle weighting, the measurement is learned in each m i k to be used in future weightings and to properly reconstruct the environment. These ranges are first sampled and then learned as points as explained in Section 2.
Finally, the well-known Sequential Importance Resampling (SIR) is used each time the number of effective particles N_eff = 1/Σ_i (w^i_k)² falls below half the number of particles (N_eff < N/2) [48].
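The resampling criterion can be sketched as follows; systematic (low-variance) resampling is one common SIR variant, used here as an assumption:

```python
import numpy as np

def effective_particles(weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights."""
    w = np.asarray(weights, dtype=float)
    return 1.0 / np.sum(w ** 2)

def systematic_resample(weights, rng=None):
    """Systematic (low-variance) resampling: returns the indices of the
    particles to keep."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)

def maybe_resample(weights, rng=None):
    """Resample only when N_eff drops below half the particle count."""
    n = len(weights)
    if effective_particles(weights) < n / 2.0:
        idx = systematic_resample(weights, rng)
        return idx, np.full(n, 1.0 / n)   # reset to uniform weights
    return np.arange(n), np.asarray(weights, dtype=float)
```

A fully degenerate weight vector (all mass on one particle) triggers resampling and duplicates that particle; a uniform weight vector leaves the set untouched.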
Please note that in the case of TBN, particles carry no HMs; there is a single shared HM, learned beforehand and never updated. The learning step is suppressed in this case.

Datasets
The proposed H-SLAM framework was tested on several datasets. First on a synthetic dataset to ensure correct implementation and to be able to compare against ground truth, and then with two underwater datasets, one structured and one non-structured, gathered by an AUV.

Simulated Dataset
This dataset is used as a proof of concept of the algorithms. It is generated from a set of 53 vehicle poses in a 2D map, where 36 range measurements spaced 10° around the vehicle are obtained for each pose (Figure 5a). The increments between the poses are computed, and linear and angular Gaussian noises are added to obtain the odometry measurements. The range measurements are also corrupted by Gaussian noise (Figure 5b). When predicting particles, the odometry increments [∆x ∆y ∆ψ] are combined with Gaussian noise [σ_lin σ_lin σ_ang] to obtain the particle positions.
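The odometry corruption described above can be sketched as follows (sigma values and function names are illustrative):

```python
import numpy as np

def corrupt_odometry(increments, sigma_lin, sigma_ang, rng=None):
    """Add zero-mean Gaussian noise to ground-truth pose increments
    [dx, dy, dpsi] to obtain simulated odometry measurements."""
    if rng is None:
        rng = np.random.default_rng(42)
    increments = np.asarray(increments, dtype=float)
    noise = rng.normal(0.0, [sigma_lin, sigma_lin, sigma_ang],
                       size=increments.shape)
    return increments + noise
```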
This dataset is used for both TBN and SLAM. In the TBN case, the original map is sampled at 0.2 m resolution and those points are used to learn its HM representation (Figure 6); this map is then used to localize the particles. In the SLAM case, only the noisy odometries and ranges are used as input to the filter, because each particle learns its own HM m^i_k. Using only odometry increments and ranges simplifies the filter explained in Section 3: each particle state is propagated by compounding its current position with the noisy odometry increments.

Real-World Datasets
These datasets were obtained with the Sparus II AUV [49], equipped with a Tritech SeaKing Profiling Sonar for range measurements. The Sparus II AUV provides depth information from a pressure sensor, velocities and altitude from a DVL, and attitude from an AHRS. The profiler is mounted in the payload space of the AUV (Figure 7). With those sensors the AUV is capable of providing DR navigation that drifts over time, as can be observed in the following datasets. Both datasets were taken along the Sant Feliu de Guixols coast (Figure 8) at a constant depth, during the trials of [50].
No GPS or USBL was available to provide global corrections to the navigation drift or a ground truth to compare with. The profiler provides a 120° FOV in front of the vehicle at 1.8° angular increments. This forward-looking configuration complicates SLAM in the sense that, until a loop is closed, the same locations are not measured again.
Each ray measurement provides ranges from 0 m to 10 m at 0.025 m resolution with their corresponding intensity values. Those rays are thresholded according to a minimum and maximum range, and a minimum return intensity to obtain a range measurement to be used in the H-SLAM filter.
The first dataset was taken on the man-made breakwater structure outside the harbour. The three most eastern blocks, of around 14 × 14 m with a spacing of 5 m, were surveyed with the AUV (Figure 9). The dataset contains a total of 12,412 range measurements over a 15 min mission at 1.5 m constant depth. As can clearly be observed in the figure, when the vehicle returns to the starting point the drift is clearly noticeable. This dataset contains three loop closures, where the same features are re-observed after going around each of the three blocks. The second dataset was taken on the natural rock structure next to the so-called Punta del Molar. Like the previous dataset, the AUV navigated around the rock (Figure 10). The dataset contains a total of 14,417 range measurements over a 17 min mission at 2.5 m constant depth. As in the first dataset, the drift is clearly observable when the vehicle returns to the starting position.

Results
All the tests on the different datasets were run with similar parameters to ease the comparison of results (Table 1). The feature resolution and neighbourhood radius were increased for the real datasets, since they are bigger than the simulated one and have fewer details. The range covariance was also increased due to the larger errors obtained when dealing with real sensors.

Simulated Dataset
The simulated dataset is first used in a TBN experiment, where the HM is first learned from samples as explained in Section 4.1. This map is shared between particles and is only queried to modify the particle weights according to the differences between measured and raycasted rays. The results are compared against the ground truth, but also against the provided odometry inputs in a DR filter that simply composes them (Figure 11).
As can be observed, TBN corrects the vehicle trajectory, significantly reducing the position error. While the DR filter error keeps increasing, the TBN error remains almost constant at around 0.4 m (Figure 12). As expected, moving to SLAM increases the error, and the correction of the trajectory is lower than in the TBN case (Figure 13).
However, the errors remain bounded, although they are much higher due to the incremental nature of map learning and the self-consistency checks (Figure 14).
Another way to compare the results is to compare the map learned using the ground truth odometry and measurements against the maps learned by the DR filter and the H-SLAM filter (Figure 15). In this case, the representation obtained by H-SLAM is much closer to the ground truth one than that obtained by the DR filter.

Breakwater Dataset
In the case of the breakwater dataset, we can observe a quite problematic area in the small corridors between the blocks. A multipath echo is clearly present when looking to the east. This multipath returns a maximum range, which is interpreted as a completely free ray. The problem is clearly visible on the rightmost block, causing the HM to represent it as hollow.
When H-SLAM is applied to the breakwater dataset, the result clearly improves the trajectory, providing more consistent sizes for the blocks and avoiding the double wall at the end of the dataset (Figure 16). Observing the measurements reprojected on the corrected trajectory, no major drifts are observed. The validity of the H-SLAM approach can be seen when comparing the results with satellite images, because the blocks maintain the same structure as they have underwater (Figure 17).

Figure 18. Covariance of the particles over time for the Breakwater dataset.

Rocks Dataset
In the rocks dataset, the same parts of the map are not observed until the trajectory finishes. Incremental corrections are made throughout the whole dataset, and at the end a loop closure is achieved. Several outliers are observed at the boundaries of the dataset due to the proximity of other rock formations (Figure 19). Although the natural rock structure does not maintain the same structure underwater, the validity of H-SLAM can be seen when comparing the results with satellite images (Figure 20). Furthermore, the small occupied spots in the south-western part of the explored zone are clearly caused by the nearby rock structures.
Finally, observing the covariance of the particles over time (Figure 21), a loop closing event is observed at around 900 s, corresponding to revisiting the initial area.

Performance
H-SLAM was not run online while the datasets were being obtained, but afterwards, from datasets saved in rosbag files. These files, part of the Robot Operating System (ROS) [51], allow data to be replayed exactly as it was obtained. In this case, the algorithm was run as fast as possible through the bagfile to compare the time it took to gather the data (the total time available for execution) against the time needed to compute the H-SLAM solution (Table 2). As can be observed, the computing time is much lower, thus making the algorithm capable of running online on the AUV, even with many more particles than the 40 used in the tests.

Conclusions
In this work, we have presented a new SLAM framework named H-SLAM for AUVs equipped with sonars. The combination of an RBPF with a HM representation of the environment provided trajectory corrections that increased the consistency of the recorded measurements, both in simulation and on real datasets. Moreover, the computing time required is much lower than the time it took to collect the datasets, making the method capable of being used online on an AUV.
In the simulated dataset, the RBPF provided a significant correction when used for TBN with a known map, and a lesser correction when used for SLAM. However, the final map was much more consistent than the one obtained by the DR filter.
In the real datasets, significantly more consistent maps were also obtained. Especially in the breakwater dataset, the multiple loop closures allowed obtaining a corrected trajectory and a map that matches the satellite image of the structure.

Future Work
The algorithms have been tested at constant depth, providing continuous occupancy maps in 2D. Future work must better reflect the nature of underwater environments by extending H-SLAM to the 3D case. Moreover, the multipath errors observed in the real datasets should be filtered out; our idea is to check the persistence of range measurements over time before using them in H-SLAM.