This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license.

Indoor localization and mapping is an important problem with many applications such as emergency response, architectural modeling, and historical preservation. In this paper, we develop an automatic, off-line pipeline for metrically accurate, GPS-denied, indoor 3D mobile mapping using a human-mounted backpack system consisting of a variety of sensors. There are three novel contributions in our proposed mapping approach. First, we present an algorithm which automatically detects loop closure constraints from an occupancy grid map. In doing so, we ensure that constraints are detected only in locations that are well conditioned for scan matching. Second, we address the problem of scan matching under poor initial conditions by presenting an outlier-resistant, genetic scan matching algorithm that accurately matches scans despite a poor initial estimate. Third, we present two metrics based on the amount and complexity of overlapping geometry in order to vet the estimated loop closure constraints. By doing so, we automatically prevent erroneous loop closures from degrading the accuracy of the reconstructed trajectory. The proposed algorithms are experimentally verified using both controlled and real-world data. The end-to-end system performance is evaluated using 100 surveyed control points in an office environment, obtaining a mean accuracy of 10 cm. Experimental results are also shown on three additional datasets from real-world environments, including a 1500 meter trajectory in a warehouse-sized retail shopping center.

In recent years, there has been great interest in the modeling and analysis of interior building structures using laser range finders (LRFs). Utilizing many range measurements and the location of the sensor during each measurement, a dense view of the environment can be reconstructed. Applications for the reconstructed point clouds, such as energy efficiency simulation, augmented reality, entertainment, architectural modeling, and emergency response, yield significant motivation to increase the accuracy of the modeling process and decrease the time to acquire the data.

For applications where dense, millimeter level accuracy is needed, the standard practice is to use a static scanning approach. During static scanning, a 3D scanning station is placed on a tripod and a small section of the environment is captured with high detail. The tripod is then moved around and the process is repeated until the entire area has been captured. The small 3D point clouds can then be stitched together to build a single, unified representation of the environment. Stitching is typically achieved either by placing small markers throughout the environment or a combination of manual intervention and point matching techniques. While this process is accurate and reliable, it is both slow and invasive.

To improve acquisition efficiency, the scanning equipment is often mounted on a mobile platform such as a wheeled platform or human operator. Mobile mapping is generally faster than static scanning, but reconstructing the point cloud from the captured range measurements can be more complex [

GPS-denied mobile mapping has been an active area of research for many years. The system presented by Holenstein

In order to handle complex indoor environments, such as staircases, human-mounted systems have shown significant promise [

Another human-mounted mapping system combines a 2D LRF, an IMU, a barometer, and an RGBD camera to provide real-time mapping for rapid response missions [

Foot mounted systems have also been considered for indoor navigation [

Human-mounted mobile mapping systems have also been successfully applied to outdoor environments [

Unmanned Aerial Vehicles (UAVs) have also been employed for mobile mapping applications [

Our previous works on mobile mapping include mapping 3D indoor environments using the ambulatory backpack system [

By assuming that the environment consists of vertical planes, we solved the 3D localization problem of the system in

Using sequential images from the cameras in

In [

In order to address the shortcomings of [

First, rather than relying on loop closure constraints detected from optical imagery, we derive them from an occupancy grid map created via Rao-Blackwellized particle filtering. Previous methods for loop closure detection, such as those relying on keypoints [

Our second contribution is our proposed outlier-resistant, genetic scan matching algorithm that accurately matches scans despite a poor initial condition. Previous genetic scan matching algorithms, such as the Hybrid Genetic Scan Matcher [

Lastly, we present two metrics based on the amount and complexity of overlapping geometry in order to vet the estimated loop closure constraints. Indoor environments contain many locations, such as long narrow hallways, where scan matching is ill-conditioned. By examining the quantity and complexity of overlapping geometry we automatically prevent erroneous loop closures from degrading the accuracy of the reconstructed trajectory.

We then use the dead reckoning trajectory to aggregate temporally adjacent LRF readings into local submaps. Since the XY scanner is subjected to pitch and roll motion, it contains many points, such as ceiling or ground strikes, which do not fit the vertical wall assumption. These points must be eliminated before any 2D SLAM approach can be effectively utilized. We accomplish this by employing a Rao-Blackwellized particle filter to merge sequential scans into geometrically consistent local submaps while simultaneously eliminating points that do not fit the vertical wall assumption. Then we fuse the submaps into a single topologically correct occupancy grid map by applying a separate particle filtering step to the generated submaps.

In contrast to [

We then fuse the dead reckoning trajectory with the verified loop closure constraints via graph optimization. Specifically, we form an edge directed graph using the odometry to create pairwise connections between temporally adjacent nodes. We then insert the vetted loop closure constraints between the detected loop closure locations. We apply a graph optimization procedure such as TORO [
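The graph assembly described above can be sketched as follows; the node/edge containers and the abstract transform representation are illustrative, not the paper's actual data structures:

```python
def build_pose_graph(odometry, loop_closures):
    """Assemble the directed edge list handed to a graph optimizer
    (e.g., TORO).  odometry: relative transforms between temporally
    adjacent poses; loop_closures: (i, j, transform) tuples from the
    vetting stage.  The transform type is left abstract here."""
    nodes = list(range(len(odometry) + 1))
    edges = []
    # Odometry creates a chain of pairwise constraints between
    # temporally adjacent nodes.
    for i, t in enumerate(odometry):
        edges.append((i, i + 1, t))
    # Vetted loop closures add long-range global constraints.
    for i, j, t in loop_closures:
        edges.append((i, j, t))
    return nodes, edges

nodes, edges = build_pose_graph(["t01", "t12"], [(0, 2, "l02")])
```

The optimizer then adjusts the node poses to minimize the disagreement across all edges simultaneously.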

The 2D mapping results are extended to a full 6DOF pose by combining the pitch and roll from the IMU with a height estimate at each location. We use the adaptive height estimator of Kua

The rest of the paper is organized as follows: Section 2 describes algorithms used in computing the 3D trajectory of the system. Experimental results for the proposed algorithms are included in Section 3. Finally, applications for the data produced by our approach are presented in Section 4 and conclusions are drawn in Section 5.

This section provides a detailed description of the off-line algorithmic pipeline shown in

Unlike wheeled systems where wheel encoders provide dead reckoning, human-mounted mobile mapping systems, such as the one shown in

Assuming that the walls are perfectly vertical, we project the scan points along the direction of the gravity vector, undoing the warping that non-zero pitch and roll introduce into the XY scanner's data.

Assuming that the environment remains static between scans, we use successive readings from the LRFs to estimate the incremental motion between scans. Many approaches have been suggested to solve the scan matching problem including global approaches [

The classical ICP problem is framed in the following manner. Given two-dimensional point sets A = {a_i} and B = {b_i}, find the rigid transformation (rotation R and translation t) that minimizes the sum of squared distances between each transformed point and its closest counterpart.

Variants of this metric, such as the fractional root mean squared distance (FRMSD), which optimizes over an inlier fraction f, improve robustness to outliers.

Given an initial transform T_0, points in the two scans are placed into correspondence using a nearest-neighbor rule.

Assuming a fixed transform, an optimal set of inlier points D_f is selected by retaining the fraction f of correspondences with the smallest residuals.

Using the inlier set D_f, the transformation parameters are re-estimated, and the correspondence and estimation steps are iterated until convergence.

Although the vertical wall assumption corrects for the natural gait of the human operator, scan points that originate from the ceiling, floor, or dynamic objects in the environment are not well modeled by a vertical plane and must be handled separately. To this end, we use the point-to-line and fractional metrics in the following objective function:
FRMSD(A, B, f, T) = (1 / f^λ) · sqrt( (1 / |D_f|) · Σ_{i ∈ D_f} ( n_i · (T(a_i) − b_i) )² )

where D_f is the inlier set, f is the inlier fraction, λ is a fixed exponent, and n_i is the normal of the line associated with point b_i.

By solving for both the set of inliers and the optimal transformation parameters, the metric in

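The fractional metric can be sketched as below; the residuals are assumed to be precomputed point-to-line distances, and the exponent value is an assumption borrowed from the standard FICP formulation rather than the paper's stated choice:

```python
import math

def frmsd(residuals, f, lam=0.95):
    """Fractional root-mean-squared distance sketch.
    residuals: candidate point-to-line distances for every pair.
    f: fraction of points treated as inliers (0 < f <= 1).
    lam: penalty exponent (0.95 is an assumed default)."""
    n = max(1, round(f * len(residuals)))
    # The inlier set D_f is the f-fraction with smallest residuals.
    inliers = sorted(residuals)[:n]
    rms = math.sqrt(sum(r * r for r in inliers) / n)
    # Dividing by f**lam penalizes shrinking the inlier set, so the
    # optimizer cannot trivially discard all points.
    return rms / (f ** lam)
```

Minimizing this jointly over the transform and f is what lets the matcher ignore ceiling, floor, and clutter strikes.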

We take the incremental changes in position and integrate them to recover the dead reckoning trajectory. Since the path is built recursively, any errors in the transformations are compounded and lead to drift in the reconstructed trajectory. Inaccuracies in the recovered path lead to a geometrically inconsistent map and thus are not suitable for most mapping applications. In order to overcome the accumulated drift in the dead reckoning trajectory, we reformulate the problem to obtain a solution that optimizes both the path and the environment simultaneously. Classical solutions such as those involving Kalman filters, particle filters, and graph based approaches are discussed in [
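The recursive composition of increments can be sketched as follows, assuming each scan-match result is a planar (dx, dy, dθ) expressed in the previous pose's frame:

```python
import math

def integrate_odometry(increments, start=(0.0, 0.0, 0.0)):
    """Compose incremental scan-match results (dx, dy, dtheta) into
    a global dead-reckoning trajectory of (x, y, theta) poses.  Any
    error in one increment propagates to all later poses, which is
    exactly the drift discussed above."""
    x, y, th = start
    path = [start]
    for dx, dy, dth in increments:
        # Rotate the body-frame increment into the global frame.
        x += dx * math.cos(th) - dy * math.sin(th)
        y += dx * math.sin(th) + dy * math.cos(th)
        th += dth
        path.append((x, y, th))
    return path

path = integrate_odometry([(1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)])
```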

Since our system is mounted on a human operator, the sensor readings can contain a large number of outlier points due to clutter, ceiling strikes, or dynamic objects in the environment. In order to apply classical 2D SLAM algorithms, we must first eliminate the outlier points that result from the roll and pitch motion of the system. We aggregate temporally adjacent scanner readings into local submaps to eliminate outlier points and enhance the sensor’s field of view before applying the standard 2D Rao-Blackwellized particle filtering algorithm as described in Section 2.3. This section discusses the submap generation procedure.

In order to fuse sequential scans into a geometrically consistent local submap, we use a particle filter that only merges scans from a small, temporally close, region of the environment. Although typically formalized for large scale mapping, we utilize the RBPF to generate accurate maps of subsections of the environment [

The number of temporally adjacent scans used in submap construction impacts both the construction time and the amount of geometry captured in the reconstructed submap. To limit computation time, we choose local submap sizes based on the following heuristic. Given a location of interest, we collect scans from neighboring locations until either the estimated cumulative translation has exceeded

In this section we discuss the process for fusing the local submaps into a single geometrically consistent occupancy grid map.

Once the submaps have been created, we combine them into a single geometrically consistent occupancy grid map using another round of particle filtering. This time, we match sequential submaps to obtain new odometry estimates. We then fuse the submaps by utilizing the Sampling Importance Resampling (SIR) filter and adaptive proposal distribution [

While the resulting grid map is geometrically consistent, the accuracy is still fundamentally limited by the size of the grid cells used. Despite tracking the average of the contributions to each grid cell, quantization errors still exist. Furthermore, because temporally adjacent scans are first aggregated into local submaps, the system trajectory resulting from the RBPF contains poses for only the locations of the submaps, not the original sensor reading locations. In order to compute poses for all sensor readings we must interpolate between the locations of the submaps. The interpolation is carried out by formulating a graph optimization problem where the poses are the nodes of the graph, the odometry readings serve as the edges between temporally adjacent poses, and loop closures extracted from the occupancy grid map provide global constraints on the graph. By fusing the full-rate odometry and RBPF localization results into a single full-rate trajectory we obtain a geometrically consistent trajectory with no temporal or spatial quantization.

This section describes the proposed methodology for extracting loop closure constraints from the occupancy grid maps created in Section 2.3. To detect loop closures we first convert the map into a representation that defines a measure of similarity between poses in the trajectory. Using both the grid map and accompanying trajectory, we explicitly recover which poses have a point fall in each of the cells from the grid map. We define the correlation between poses i and j in terms of the number of grid cells that contain points observed from both poses.
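A sketch of this correlation computation follows; the paper's exact normalization is not reproduced here, so the raw shared-cell count below is an assumption:

```python
def pose_correlation(cells_by_pose):
    """Correlation between poses i and j, sketched as the number of
    occupied grid cells the two poses observe in common.
    cells_by_pose: one set of (row, col) grid indices per pose."""
    n = len(cells_by_pose)
    corr = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Shared cells imply the two poses viewed the same
            # geometry, making them loop closure candidates.
            corr[i][j] = len(cells_by_pose[i] & cells_by_pose[j])
    return corr

corr = pose_correlation([{(0, 0), (0, 1)}, {(0, 1)}, set()])
```

Poses with high mutual correlation observe overlapping geometry and are promoted to loop closure candidates.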

Next we apply hierarchical clustering to group the poses into smaller sets that contain similar geometry in order to reduce the search space of possible loop closure locations. The clustering algorithm begins with each pose of the trajectory as its own cluster, and iteratively joins the most similar clusters until either the required number of clusters is obtained or all remaining clusters have a similarity metric of 0 [

Additionally, for a hierarchical clustering algorithm it is necessary to first determine the number of desired clusters. We compute this by inspecting the eigenvalues of the correlation matrix.

Once the clusters have been identified, we extract the portion of the correlation matrix that corresponds to each cluster.

Once the candidate pairs have been extracted, the transformation and covariance matrices must be computed before they can be used in a graph optimizer. Since the submaps from Section 2.2 are spatially quantized, we take the original pair of scans corresponding to a given loop closure, apply the scan projection algorithm of Section 2.1, and match them via a genetic scan matching algorithm. The result is a set of metric transformations between loop closure pose indices.

A widely known limitation of non-linear optimization techniques is that they can fall into local minima of the objective function. To overcome this limitation, the algorithm is typically supplied with an initial condition T_0. The resulting solution is often highly dependent on the choice of T_0, and a poor choice of initial condition can lead to an undesirable solution. A class of stochastic optimization techniques, known as genetic algorithms, can overcome this problem by providing a derivative-free approach for exploring the solution space. Each solution, or chromosome, is evaluated according to a cost function and assigned a fitness value. Thus, genetic searches are able to better cope with the presence of local minima or poor initial conditions [

Genetic search (GS) algorithms operate by first considering a randomly distributed set of solution chromosomes. Using an appropriate fitness metric, the population is evaluated and each chromosome is assigned a fitness value. The most fit chromosomes are retained and the rest are discarded. The population is then replenished by creating new chromosomes whose parameters are chosen randomly from the surviving chromosomes. The generated solutions are mutated by adding random variation to the chromosomes’ parameters and the process is iterated until the population reaches a stable configuration. In this manner, GS algorithms avoid local minima while exploring solutions that are not in the original sampling of the solution space.
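The loop described above can be sketched as follows; the population size, iteration count, survivor fraction, and mutation scaling are illustrative choices, not the paper's tuned parameters:

```python
import random

def genetic_search(fitness, initial, spread, pop_size=50, iters=40):
    """Generic genetic-search loop over (tx, ty, theta) chromosomes.
    fitness maps a chromosome to a score (higher is better)."""
    rng = random.Random(0)  # seeded for reproducibility
    # Initial population sampled uniformly around the rough estimate.
    pop = [tuple(p + rng.uniform(-s, s) for p, s in zip(initial, spread))
           for _ in range(pop_size)]
    for _ in range(iters):
        # Evaluate, then keep the fittest half and discard the rest.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(survivors)):
            child = []
            for k in range(len(initial)):
                vals = [c[k] for c in survivors]
                # Parameter drawn from a random survivor, mutated
                # with noise scaled to the survivors' spread.
                noise = rng.gauss(0.0, (max(vals) - min(vals) + 1e-9) / 4)
                child.append(rng.choice(vals) + noise)
            children.append(tuple(child))
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: the best chromosome should approach (1, 2, 0).
best = genetic_search(
    lambda c: -((c[0] - 1.0) ** 2 + (c[1] - 2.0) ** 2 + c[2] ** 2),
    initial=(0.0, 0.0, 0.0), spread=(3.0, 3.0, 1.0))
```

Because the mutation noise shrinks as the survivors cluster, the population contracts around the best basin while early generations still explore widely.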

Similar to the previous works of Martinez

In contrast to the existing approaches of [, our algorithm begins from a population of chromosomes sampled uniformly around a rough initial condition T_0. The initial condition allows a priori information, such as odometry, to be incorporated into the FGSM algorithm. Each individual chromosome encodes a candidate transformation, which is scored using the fractional objective function and assigned a fitness value.

To simulate the mutation between generations, random i.i.d. Gaussian noise is added to the newly generated chromosomes. The variance of the added noise for each parameter is set in proportion to the variance of that parameter among the surviving chromosomes.

The FGSM Algorithm

1: T_0 ← initial estimate
2: P_0 ← population of chromosomes sampled uniformly around T_0
3: n ← 0
4: repeat
5:  for each chromosome c_i in P_n
6:   refine c_i via FICP and evaluate its fitness
7:  retain the most fit chromosomes of P_n and discard the remainder
8:  replenish the population from the survivors' parameters
9:  mutate the new chromosomes with variance-scaled Gaussian noise
10:  P_{n+1} ← the updated population; n ← n + 1
11: until P_{n+1} = P_n
12: return the most fit chromosome and its inlier set D_f

An example of the FGSM algorithm is shown in the accompanying figure, where the initial population is sampled uniformly around the rough initial condition T_0.

As seen in line 6 of

Examining

Following [, we linearize the objective function about the initial condition: T_0 is the linearization point and J ∈ R^{|D_f|×3} is the Jacobian of the residual vector with respect to the transformation parameters.

In order to take advantage of the near-linearity of the Jacobian, we take the following strategy. We first discretize the solution space into small grid cells and bin chromosomes according to their parameters. Each time a chromosome is used as the initial condition for ICP, a record is kept of the resulting transformation. In this way, redundant computation can be eliminated by building a lookup table between initial conditions and the resulting transforms.
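The memoization strategy can be sketched as below; the bin sizes default to the 10 cm and 1 degree spacing reported later, and the wrapped `icp` callable is a stand-in for the actual matcher:

```python
def make_cached_icp(icp, cell=0.1, angle_cell=1.0):
    """Wrap an ICP routine with a lookup table keyed on the
    discretized initial condition, so chromosomes falling in the
    same (10 cm, 1 degree) bin reuse a single ICP result.
    `icp` is any callable mapping (tx, ty, theta_deg) to a result."""
    table = {}

    def cached(tx, ty, theta):
        # Bin the initial condition onto the discretization grid.
        key = (round(tx / cell), round(ty / cell),
               round(theta / angle_cell))
        if key not in table:
            table[key] = icp((tx, ty, theta))
        return table[key]

    cached.table = table
    return cached

calls = []
solver = make_cached_icp(lambda t: (calls.append(t), t)[1])
a = solver(0.01, 0.02, 0.2)   # first occupant of the bin runs ICP
b = solver(0.03, 0.01, 0.4)   # same bin: served from the table
```

Because nearby initial conditions converge to the same transform, the table eliminates most of the redundant ICP invocations during the genetic search.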

The amount of discretization in the solution space represents a trade-off between speed and adherence to the underlying assumptions. The discretization must represent small enough changes so as not to violate the small-angle approximation or the continuity of the objective function, yet be large enough to effectively speed up the genetic search. We have empirically found a spacing of 10 centimeters and 1 degree to provide the best trade-off between computation and accuracy for our experimental setup. Section 3.2 provides empirical justification for these values.

Even though the FGSM algorithm is a robust way of matching loop closure scans with unknown initial conditions, care must still be taken to eliminate erroneous transformations. Poor scan matches can arise when a loop closure has been falsely identified or when geometry is ill-conditioned. In these situations it is best to recognize that the algorithm has failed and avoid using the loop closure constraint. We now propose two metrics, independent of the detection and estimation tasks, to robustly reject erroneous loop closure transformations while maintaining a high true positive rate. The proposed metrics are designed to encompass two important factors for characterizing a loop closure transformation: the amount and complexity of the shared geometry.

The first metric characterizes the amount of shared geometry between matched scans. Using the proposed transformation from Section 2.5, the scans are rotated and translated into alignment. Next, a pair of normalized two-dimensional histograms, H_1 and H_2, are built by binning the scans' points into equally sized spatial bins. Then the histograms are correlated using the intersection kernel [, which sums, over all spatial bins, the smaller of the two histograms' values.
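A sketch of this overlap score follows; the bin size is an illustrative value, not the paper's:

```python
def intersection_score(points_a, points_b, bin_size=0.5):
    """Overlap metric sketch: bin each aligned scan into a
    normalized 2D spatial histogram, then apply the intersection
    kernel (sum over bins of the smaller of the two masses)."""
    def histogram(points):
        h = {}
        for x, y in points:
            key = (int(x // bin_size), int(y // bin_size))
            h[key] = h.get(key, 0.0) + 1.0
        total = sum(h.values()) or 1.0
        # Normalize so the score is comparable across scan sizes.
        return {k: v / total for k, v in h.items()}

    ha, hb = histogram(points_a), histogram(points_b)
    return sum(min(ha[k], hb.get(k, 0.0)) for k in ha)
```

Identical aligned scans score 1.0, while scans with no shared geometry score 0.0, giving a bounded measure of overlap.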

Our second metric measures the complexity of the overlapping geometry. Using the proposed transform from Section 2.5, the inlier points are extracted and a complexity score is computed over the overlapping geometry.

After the optimized 2D path has been obtained, the height of the system must be estimated at each pose before a full 6DOF path can be recovered. We recover height using the adaptive method presented in [

The height information along with the IMU readings and the optimized 2D path are combined to form a full six degree of freedom trajectory. The six degree of freedom pose at time t is composed of the position (x_t, y_t, z_t) and the orientation angles: roll φ_t, pitch θ_t, and yaw ψ_t.

Since the 2D SLAM and height estimation algorithms provide values in the global coordinate system, the position component of the pose is obtained directly: (x_t, y_t) from the optimized 2D path and z_t from the height estimate.

The rotation matrix in
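The composition of the three angles into a rotation matrix can be sketched as follows; the Z-Y-X (yaw-pitch-roll) convention assumed here may differ from the paper's:

```python
import math

def rotation_matrix(roll, pitch, yaw):
    """Compose a 3x3 rotation from roll (x), pitch (y), and yaw (z),
    applied in Z-Y-X order: R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

R = rotation_matrix(0.0, 0.0, math.pi / 2)  # pure 90-degree yaw
```

Roll and pitch come from the IMU and yaw from the optimized 2D path, so this single matrix fuses all three sources into one orientation.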

Using the data collected by the left and right geometry scanners and the recovered trajectory we are able to generate a dense, 3D point cloud by placing the sensor readings at the correct pose location. Since the system is also equipped with 2 fish-eye cameras, the sensor readings can be colorized by projecting them into the temporally closest image.

In this section we present results for the proposed algorithms. Section 3.1 presents performance results of the FGSM algorithm using a variety of initial conditions. Section 3.2 provides empirical justification for the FGSM’s discretization parameters used throughout. A thorough analysis of the vetting criteria is presented in Section

In this section we compare the proposed FGSM algorithm against a few state-of-the-art alternatives. First, we compare against the open source libpointmatcher library [

We construct the following experimental testbed to quantify the performance of each of the three chosen algorithms. We first hand-construct a set of 177 ground truth scan matching results taken from 10 different environments. Then, random initial conditions are created by selecting values from a collection of i.i.d. zero-mean Gaussian random variables with increasing standard deviation. Ten unique trials are conducted for each scan pair resulting in 1,770 attempts for each standard deviation. Each algorithm is run and the resulting transforms are recorded. We consider any transform within 5 cm or 1 degree to have converged to the correct solution.

The first experiment assesses the algorithms’ performance in the presence of translation only error. Using increasing levels of translation offset, ranging from 0.25 m to 10 m in standard deviation, we ran the experimental setup and computed the percentage of successful scan matches.

In the second experiment we repeat the same procedure, but add only rotational offset using standard deviations ranging from 18° to 180°.

In practical scenarios the error in initial condition is typically a mix of both rotation and translation. To characterize the effects of both error sources simultaneously, we repeat the experiment but this time add variations to both the rotational and translational components of the initial condition. As seen in

As stated in Section 2.5, the amount of solution space discretization in the FGSM algorithm represents a trade-off between computational efficiency and accuracy. In order to determine an appropriate level of discretization, we conducted the following test. We selected seven levels of discretization, ranging from no discretization to 1 m and 10 degrees, and evaluated the performance of the FGSM at increasing amounts of error in the algorithm’s initial condition. For each chosen level of discretization and error in initial condition, we conducted a trial of one thousand scan matches drawn from a database of one hundred scan pairs. For each scan match we used two hundred initial chromosomes and computed the accuracy and running time by comparing the FGSM results to the manually defined ground truth alignment.

In order to characterize the effectiveness of the proposed metrics for loop closure verification, a database of 15 datasets has been collected from 10 unique environments. The classification metrics are evaluated in the following two scenarios. First, between 50–100 true loop closures are manually defined for each dataset, resulting in 590 positive examples. To simulate the presence of falsely identified loop closures, 10 loop closures are randomly defined for each dataset, yielding 150 potentially erroneous examples. The FGSM algorithm is then run on each loop closure candidate and the resulting transformations are manually inspected and labeled.


Statistics for various thresholds are computed and the resulting receiver operating characteristic (ROC) curve is shown in

Since the existing work on loop closure verification [

The same test is repeated, but this time using automatically detected loop closures via the method of Section 2.4. Using 5 new datasets, 361 examples are automatically detected from grid map data. The FGSM algorithm is run and the results are manually inspected and labeled.

We evaluated the end-to-end system performance using the following experiment. We collected data along a 350 m trajectory from an office building with approximately 1,500 m^{2} of floor space. The office environment contained 100 pre-surveyed control points, each denoted using a paper target containing a checkerboard-like pattern of known size. Using a Leica HDS6000 and C10 Scanstation, 17 high-density static scans were automatically stitched together to provide millimeter level accuracy for each survey point. We then localized the system using both the proposed algorithm and our previous approach [

To further demonstrate the end-to-end performance of the backpack system, we have applied the proposed algorithms to three more datasets of increasing complexity.

The second set, shown in

The final dataset is the most challenging because the trajectory contained many interior loops and a large portion of the surrounding geometry did not fit the vertical wall assumption made in Section 2.1. Taken from a warehouse-sized retail shopping center, the scanned area exceeded 5,820 m^{2}. The operator traversed the environment for 31 min and covered a total distance of 1,500 m. Despite the large amount of accumulated bias present in

Although the proposed algorithms were designed to be run off-line, it is important to highlight the approximate running time for the various stages of the algorithms.

The 3D trajectories and point clouds generated using the backpack system can be utilized in many applications such as augmented reality, modeling, or architectural planning. However, other applications, such as entertainment and texture mapped models, require a lightweight triangulated mesh. Many methods exist that can convert a point cloud to a triangulated mesh. Using the method presented by Turner and Zakhor [

Textures from the collected images can then be projected onto the mesh elements resulting in a photo-realistic rendering of the captured environment [

In this paper, we highlight three novel contributions in our off-line, end-to-end pipeline for automatically mapping indoor environments using an ambulatory, human-mounted backpack system. First, we present an algorithm which automatically detects loop closure constraints from an occupancy grid map. In doing so, we ensure that constraints are detected only in locations that are well conditioned for scan matching. Second, we address the problem of scan matching under poor initial conditions by presenting an outlier-resistant, genetic scan matching algorithm that accurately matches scans despite a poor initial estimate. Third, we present two metrics based on the amount and complexity of overlapping geometry in order to vet the estimated loop closure constraints. By doing so, we automatically prevent erroneous loop closures from degrading the accuracy of the reconstructed trajectory.

The proposed algorithms were tested on a variety of datasets. The fractional genetic scan matching algorithm was tested using a hand-labeled set of scan pairs disturbed by a random initial condition. The algorithm is shown to be extremely robust to translational initial condition error and moderately robust against rotational error. Two metrics were also proposed to verify proposed transformations. Tested on a set of 1,500 manually defined examples, the verification metrics are able to maintain a true positive rate of 84.7% when a false alarm rate of 1% is required. Finally, a metric evaluation of the end-to-end system was presented using approximately 100 pre-surveyed control points and resulted in a mean error of 10 cm. Further empirical results were presented on three datasets including a 1,500 m trajectory in a warehouse-sized shopping center.

Based on our experimental results, the accuracy of the proposed method makes the system applicable for lower-accuracy, GPS-denied mobile mapping tasks such as architectural pre-design and as-built floor plan recovery. The main advantage of the presented ambulatory system is that the operator can easily map uneven terrain such as staircases or thick carpeting that would be difficult for rolling cart systems such as the TIMMS unit [

Future work is needed to extend the algorithms beyond man-made indoor environments dominated by vertical walls. Since the 2D trajectory is created using a 2D occupancy grid map, we implicitly make the assumption that the environment can be accurately modeled using a single occupancy grid map. For multistory buildings this is not true. We plan to extend the straightforward 2D particle filter to recognize when the operator changes floors and keep track of a separate grid map for each building level. In order to map environments that have little artificial structure, future work is needed to relax the vertical wall assumption while still ensuring accurate and robust operation.

Additionally, a significant amount of data captured by our system is not utilized to improve the performance. The side facing cameras provide rich optical imagery that can be utilized to improve the location estimates. A promising avenue for future research could be to implement an algorithm which fuses both the laser readings and the optical imagery into a single, accurate trajectory.

The authors would like to thank ARPA-E for supporting this research.

The authors declare no conflict of interest.

A CAD model of the ambulatory human-mounted backpack system used for data collection.


Block diagram of the algorithms used for localizing an ambulatory backpack system.

An example of scan projection.

An example of performing the FICP algorithm.

A typical result of RBPF based submapping. The original sensor’s readings are shown in red while the resulting submap is shown in blue.

An example result of applying the Rao-Blackwellized particle filtering algorithm to the generated submaps.




An example of the FGSM process. The scans are shown in red and blue while the chromosomes are shown in green. A black circle denotes the location of the best chromosome in the population.


The effect of solution space discretization on the FGSM algorithm for increasing levels of error in initial condition.

A scatter plot showing the distribution of loop closure candidates using the proposed metrics. A red × corresponds to a failed loop closure transformation, while a green ○ represents a successful candidate pair. The chosen thresholds are denoted using the dashed black lines.

Comparison of receiver operating characteristic curves when using both metrics


Results of applying the end-to-end system.



A comparison of the proposed FGSM algorithm to previous methods in the case of translation only error. The percentage of trials that succeeded for various standard deviations in the translational component of the initial condition.

| σ_T (m) | | | |
|---|---|---|---|
| 0.25 | 93% | 90% | 94% |
| 0.5 | 89% | 91% | 95% |
| 1.0 | 82% | 89% | 94% |
| 2.0 | 68% | 73% | 91% |
| 3.0 | 58% | 59% | 85% |
| 5.0 | 45% | 42% | 75% |
| 8.0 | 30% | 36% | 66% |
| 10.0 | 25% | 30% | 62% |

A comparison of the proposed FGSM algorithm to previous methods in the case of rotational only error. The percentage of trials that succeeded for various standard deviations in the rotational component of the initial condition.

| σ_θ (degrees) | | | |
|---|---|---|---|
| 18 | 74% | 60% | 92% |
| 30 | 59% | 44% | 84% |
| 45 | 45% | 31% | 73% |
| 60 | 34% | 24% | 63% |
| 90 | 25% | 17% | 49% |
| 180 | 17% | 11% | 37% |

A comparison of the proposed FGSM algorithm to previous methods in the case of both rotational and translational error. The percentage of trials that succeeded for various standard deviations in the rotational and translational components of the initial condition.

| (σ_T, σ_θ) | | | |
|---|---|---|---|
| (0.25, 18) | 74% | 60% | 91% |
| (0.5, 30) | 55% | 43% | 84% |
| (1.0, 45) | 42% | 33% | 74% |
| (2.0, 60) | 28% | 21% | 61% |
| (3.0, 90) | 18% | 11% | 45% |
| (5.0, 180) | 9% | 6% | 33% |

A rough estimate of the running time for the various stages of the proposed algorithm for a 40 min data collection.

| Stage | Approximate Time |
|---|---|
| Dead Reckoning | 3 |
| Submap Generation | 50 |
| RBPF Grid Mapping | 60 |
| Loop Closure Detection | 10 |
| Transform Estimation | 2 |
| Verification | 1 |
| Graph Optimization | 2 |
| Height Estimation | 10 |
| 3D Path Generation | 1 |