1. Introduction
With the rapid development of the geospatial information service industry, the demand for geospatial data is growing enormously. Mobile mapping systems have also developed rapidly in recent years owing to their ability to provide efficient, fast, and complete data collection. Simultaneous localization and mapping (SLAM) is considered an effective method for solving positioning and mapping problems in GNSS-denied areas, as it unifies the positioning and mapping problems in a single framework, and it has been intensively investigated in the robotics community for two decades [1]. SLAM builds a consistent map of the environment incrementally and simultaneously determines the system's location within this map [2]. SLAM is essentially a probability-based optimal estimation problem. One advantage of probabilistic estimation is that it can robustly model the noise in the environment and quantify the uncertainty in the measurement and estimation process [3].
There are three basic SLAM paradigms, from which most others are derived. The first, Extended Kalman Filter (EKF) based SLAM, is the earliest and probably the most influential algorithm. Various forms of EKF based algorithms have been proposed for different tasks and environments, and the EKF based SLAM method has been studied in depth [4,5]. EKF based SLAM can achieve good results when the sensor noise satisfies the Gaussian distribution assumption and the system nonlinearity is small [6]. A key concern of the EKF based approach lies in the quadratic nature of the covariance matrix: as landmark points continue to accumulate, the covariance matrix of the system grows quadratically, resulting in a heavy computational burden [7]. EKF based SLAM is therefore not suitable for large-scale environments. The second SLAM paradigm is based on particle filters. The particle filter is another kind of Bayesian estimator, which represents the posterior probability by a set of particles; its core idea is to represent the distribution by a set of random state samples drawn from the posterior [7]. The advantage of the particle filter over the EKF is that, instead of linearizing the model, it uses samples to obtain the state estimate, so it can handle nonlinear, non-Gaussian, and multimodal problems [8]. Rao–Blackwellized particle filtering (RBPF), derived from the particle filter, was applied in the algorithm proposed by Montemerlo, named FastSLAM [9]. However, the number of particles required increases exponentially with the dimension of the underlying state space (the space of maps and robot positions) [7]. The more complicated the environment, the more particles are required and the higher the complexity of the algorithm. In addition, the resampling phase can cause a loss of sample validity and diversity, leading to sample depletion; maintaining the validity and diversity of particles and overcoming sample depletion is another research focus for particle filters [3]. Due to the linearization and update efficiency of filter-based SLAM methods, they are not applicable to large-scale environments. The third SLAM paradigm is graph-based SLAM, which solves the SLAM problem through nonlinear sparse optimization [7]. Graph-based SLAM draws intuition from the fact that the SLAM problem can be modeled as a sparse graph of constraints, where the nodes represent the states of the robot and the environment at different times, and the edges represent the spatial constraints between the nodes [10]. The constraints generally arise from scan matching and from observations such as an odometer or an Inertial Measurement Unit (IMU). Graph-based SLAM is usually divided into a front-end and a back-end: the front-end constructs the nodes and edges of the graph according to the observations and system constraints, and the back-end applies optimization techniques to optimize the graph. Since the graph-based method uses all of the observation information to optimize the robot's complete trajectory and the environment, it yields a globally consistent trajectory and map, so it is also known as the full SLAM method [10]. The graph-based method has the advantage that it can scale to much higher-dimensional maps than EKF based SLAM; the update time of the graph is constant, and the amount of memory required is linear (under some mild assumptions) [11]. It is also easy for graph-based methods to handle the data association problem, which is usually addressed by the SLAM front-end, because additional knowledge (constraints) can easily be integrated into data association. Compared to EKF based and particle filter based methods, graph-based SLAM has undergone a renaissance and currently ranks among the state-of-the-art techniques with respect to speed and accuracy [10].
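The node-and-edge formulation described above can be illustrated with a minimal sketch. The example below is a translation-only 2D pose graph (real SLAM edges also involve rotations), with illustrative odometry and loop-closure measurements, solved as a nonlinear least-squares problem:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy pose graph with translation-only 2D nodes (no heading), so every
# edge is linear; all measurement values are illustrative.
odometry = [(0, 1, [1.0, 0.0]),   # (i, j, measured displacement p_j - p_i)
            (1, 2, [1.0, 0.1]),
            (2, 3, [0.0, 1.0])]
loop = [(3, 0, [-2.0, -1.0])]     # loop-closure edge back to the start

def residuals(x):
    p = x.reshape(-1, 2)          # stacked node positions
    r = [p[0]]                    # gauge constraint: pin node 0 at the origin
    for i, j, z in odometry + loop:
        r.append((p[j] - p[i]) - np.asarray(z))  # edge error terms
    return np.concatenate(r)

sol = least_squares(residuals, np.zeros(8))      # 4 nodes x 2 coordinates
print(sol.x.reshape(-1, 2))
```

Because the odometry and loop-closure measurements are slightly inconsistent, the optimizer distributes the error over all edges, which is exactly the role of the back-end.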
In the framework of graph-based SLAM, the front-end is responsible for providing the initial values of the poses and the map, and the back-end is responsible for optimizing all the poses of scans and submaps. However, due to observation errors and data association errors, earlier errors inevitably accumulate into later estimates, causing incorrect localization and mapping results. In order to eliminate the cumulative errors and build globally consistent trajectories and maps, various methods have been proposed in previous works. A natural idea in SLAM is to recognize previously visited places to reduce cumulative errors, a process known as loop closure detection. Loop closure detection refers to the robot recognizing a place where it has been before, thereby establishing the relationship between the current data and all previous data, and then adding new constraints to the back-end to eliminate error accumulation. Popular loop closure detection approaches can be divided into two categories: the appearance-based approach and the optimization-based approach [12]. The underlying idea of the appearance-based approach is that loop closure detection is performed by comparing all previous “images” with the new one [13]. The core issue in this approach is how to calculate the similarity between previous “images” and the new one. Generally, we extract distinctive features and describe them in a mathematical language as descriptors, and then calculate the similarity score between “images” using the descriptors [14,15,16,17]. Deep learning and machine learning can also be employed for loop closure detection [12,18]. The appearance-based approach is widely used in the computer vision community, where many mature algorithms have been proposed, as an image contains abundant information from which distinctive features can be extracted. However, compared to images, 2D LiDAR scans encode much less information (for example, intensity gradients are absent, and structured features such as corners produce only weak variations in the range measurements), so it is difficult to extract distinctive features from 2D LiDAR scans for loop closure detection [12]. The optimization-based approach is represented by Google's Cartographer LiDAR SLAM algorithm, which combines scan-to-submap matching with loop closure detection and graph optimization. The key idea in Cartographer is to build multiple submaps and align new scans to nearby submaps to generate constraints on a graph. In order to achieve real-time loop closure, Cartographer uses a branch-and-bound approach for computing scan-to-submap matches as constraints [19]. All loop closure detection approaches, including the optimization-based approach of Cartographer, rely on the fact that the robot will return to a place it has already visited; however, some mobile mapping scenarios offer no loop closure conditions, such as a U-shaped mapping scene, where the robot cannot return to where it has already been. Another way is to add additional sensors to assist SLAM, such as GNSS, INS, and so on. In [20,21], GNSS/INS is utilized to improve the positioning accuracy of the SLAM result. Unlike SLAM techniques, GNSS sensors do not drift over time; however, GNSS signals are not always available and suffer from signal occlusion and severe multipath effects in urban or indoor environments. At present, combining additional sensors increases the complexity and hardware cost of the SLAM system. SLAM methods typically detect the environment without taking any special prior knowledge into account; however, prior knowledge can greatly improve the accuracy of the SLAM result, as it can be added to the SLAM problem as constraints that reduce the cumulative error. The use of prior knowledge in graph-based SLAM has been explored before: Kümmerle et al. used correspondences detected between 3D scans and publicly available aerial images as prior knowledge to achieve global consistency [22], and Schuster et al. used a landmark map as prior knowledge to improve localization accuracy based on GraphSLAM [23]. However, these methods still need to extract feature points and find constraint relationships through feature point matching.
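The descriptor-similarity scoring at the heart of the appearance-based approach can be sketched as follows. The place names, descriptor vectors, and threshold below are purely illustrative and not tied to any particular feature extractor:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two descriptor vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical descriptor of the new scan and of previously visited places.
current = np.array([0.9, 0.1, 0.4])
database = {"place_A": np.array([0.88, 0.12, 0.41]),
            "place_B": np.array([0.05, 0.95, 0.10])}

# Score the new descriptor against every previously visited place.
scores = {name: cosine_similarity(current, d) for name, d in database.items()}
best = max(scores, key=scores.get)
if scores[best] > 0.95:   # declare a loop closure candidate above a threshold
    print("loop closure candidate:", best)
```

The weak point noted above for 2D LiDAR is precisely that such descriptors are hard to make distinctive when the range measurements carry little structure.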
Although the use of prior knowledge in SLAM has been studied before, distance measurements differ from other kinds of prior knowledge in that they are easy to measure and easy to use as additional observations, and three or more distance measurements can form a stable geometry that effectively eliminates cumulative errors. In this paper, we propose adding the distance constraints of a control network to the back-end optimization to effectively eliminate cumulative errors. We analyze the necessary conditions for the distance constraints of the control network to take effect, and we examine the influence of different control network construction methods on the SLAM mapping results. The field test shows that the proposed algorithm is an effective way to eliminate cumulative errors and to noticeably improve the accuracy of the SLAM mapping result when there are no loop closure conditions in a large-scale GNSS-denied area, for example, in a narrow corridor, a U-shaped area, or a large parking lot.
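The core idea of adding measured inter-point distances as extra residuals in the back-end least-squares problem can be sketched as follows. This is a translation-only toy problem; the odometry values, surveyed distances, and weight are illustrative, not from the field test:

```python
import numpy as np
from scipy.optimize import least_squares

# p0..p3 are 2D positions of marked scans. Odometry drifts slightly in y;
# total-station distances between control points pull the estimate back.
odometry  = [(0, 1, [10.0, 0.0]), (1, 2, [0.0, 10.5]), (2, 3, [-10.0, 0.0])]
distances = [(0, 1, 10.0), (1, 2, 10.0), (0, 2, 14.14)]   # forms a triangle

def residuals(x, w_dist=5.0):
    p = x.reshape(-1, 2)
    r = [p[0]]                                      # pin the first pose
    for i, j, z in odometry:
        r.append((p[j] - p[i]) - np.asarray(z))     # odometry edges
    for i, j, d in distances:                       # distance-constraint edges
        r.append([w_dist * (np.linalg.norm(p[j] - p[i]) - d)])
    return np.concatenate(r)

# Initialize from odometry dead reckoning, as the SLAM front-end would.
x0 = np.array([0.0, 0.0, 10.0, 0.0, 10.0, 10.5, 0.0, 10.5])
sol = least_squares(residuals, x0)
p = sol.x.reshape(-1, 2)
print(p)
```

Because the three distances form a triangle, they fix the relative geometry of the marked poses and pull the drifted odometry back toward the surveyed shape.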
The remainder of this paper is organized as follows: Section 2 gives the detailed parameters of the hardware; Section 3 describes the methods used in this paper; and Section 4 introduces the experimental results and draws the conclusion.
2. System Overview
In order to verify the performance of the algorithm described in the following section, a LiDAR/IMU integrated system (Figure 1) was designed and implemented. A LiDAR sensor and an inertial measurement unit (IMU) are integrated on the mobile mapping platform to form the hardware of the system. A UTM-30LX-EW LiDAR, manufactured by the Hokuyo Company (Osaka, Japan), was adopted for the platform. The LiDAR operates at 40 Hz, has a scanning angle range of 270° with an angular resolution of 0.25°, and has a maximum effective range of 30 m, with a range accuracy of mm at 0.1 m–10 m and mm at 10 m–30 m. The measurement accuracy of the LiDAR is thus at the millimeter level, making it a medium-precision device. The IMU model is the MTiG; the bias stability of its gyroscopes is about 200.0°/h, the bias stability of its accelerometers is about 2000 mGal (1 Gal = 1 cm/s²), and its sampling frequency is 200 Hz. According to these technical indicators, the IMU we adopted is a MEMS-level (Micro-Electro-Mechanical System) device.
In order to obtain the distance constraints, we first need to lay out a survey control network in the area to be mapped and survey it with a total station. Here, a TIANYU CST-632 total station (Guangzhou, China) was adopted. The angular accuracy of the total station was 2″ and the range accuracy was mm.
Finally, to measure the mapping accuracy of the proposed algorithm, a high-precision terrestrial laser scanner, the FARO Focus3D X130 HDR (Florida, USA), was used to map the experimental area with millimeter accuracy. The detailed parameters of all hardware devices are shown in Table 1.
4. Field Test and Results
In order to verify the performance of the algorithm described above, we conducted corresponding field tests. They were carried out in a typical urban GNSS-weak outdoor area on a campus, as shown in Figure 5a; the survey area was roughly U-shaped. According to the conditions of the area to be mapped, a surveying control network (Figure 5b) containing four control points was arranged and surveyed with the TIANYU CST-632 total station. The solid black line in Figure 5b indicates the measurement route, whose total length is about 300 m. The LiDAR/IMU integrated system was mounted on a vehicle; when the vehicle passed near a control point, we aligned the LiDAR scan center with the control point (so that the origin of the carrier coordinate frame coincided with the control point) by carefully adjusting the movement of the vehicle, and then marked the scanning frame in the data collection software. The relative distances between the marked scans were then added as constraints to the back-end optimization.
The corners and walls of buildings and other clearly fixed objects were selected as the main feature points for accuracy evaluation, and the high-precision map produced by the FARO Focus3D X130 HDR terrestrial laser scanner was used as the reference map. Both the map produced by our algorithm and the reference map can be opened in ArcGIS, where we manually extracted the positions of the common feature points from each map. First, we consider the simple cases: we separately tested adding one, two, and three constraint edges. The mapping results are shown in Figure 6, and the RMS errors of the coordinates of the selected common feature points are shown in Table 2. From Figure 6, we can intuitively see that the mapping results with one or two constraint edges basically coincide with the mapping result without distance constraints, while the mapping result with three constraint edges is obviously improved; the RMS values in Table 2 also illustrate this. The RMS values with one distance constraint (L1) and without distance constraints are 0.1612 m and 0.1904 m, respectively, and the RMS values with two distance constraints (L1, L2) and without distance constraints are 0.6608 m and 0.7303 m, respectively. Considering the errors introduced by manual operation, such as those caused by manually selecting the common feature points, the mapping results with one or two distance constraints basically coincide with those without distance constraints. When three distance constraints (L1, L2, L3) are added, the RMS is significantly reduced to 0.2369 m.
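The RMS figures above are computed from the coordinate differences of common feature points between each result map and the reference map. A sketch of the computation follows; the coordinates are made up for illustration, not taken from Table 2:

```python
import numpy as np

# Planar coordinates (metres) of common feature points picked manually
# in ArcGIS from the result map and the reference map (illustrative values).
result_pts    = np.array([[1.02, 4.98], [10.15, 4.90], [10.05, 15.10]])
reference_pts = np.array([[1.00, 5.00], [10.00, 5.00], [10.00, 15.00]])

# RMS of the point-to-point coordinate errors over all feature points.
errors = np.linalg.norm(result_pts - reference_pts, axis=1)
rms = float(np.sqrt(np.mean(errors ** 2)))
print(f"RMS = {rms:.4f} m")
```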
These results show that adding only one or two distance constraints to the back-end optimization does little to improve the accuracy of the SLAM results: one or two distance constraints do not form a stable geometry and are therefore ineffective at reducing the angular drift of SLAM. The third test shows that accuracy improves effectively when there are three distance constraint edges that form a triangle, because the relative positions of the control points are then fixed in the global coordinate system (which is usually defined by the coordinate system of the first scan in SLAM). As the triangle is the most stable geometric figure, the stable triangular structure formed by the constraint edges effectively reduces both angular and distance drift.
The above experiments confirm that distance constraints without a stable geometry cannot effectively improve the accuracy of SLAM. Only when the distance constraints (at least three) form a stable triangulation does adding them to the back-end optimization effectively reduce the angular and distance drift of SLAM. With four points in the control network, there are six inter-point distances, and, according to Delaunay triangulation theory, L1, L2, L4, L5, and L6 form a stable Delaunay triangulation network. The SLAM result with all six existing distance constraints (MAP_A) and the SLAM result with only the distance constraints of the Delaunay triangulation network (MAP_B) are shown in Figure 7, and the RMS errors of the coordinates of the selected common feature points are shown in Table 3. From Figure 7a, we can intuitively see that MAP_A and MAP_B basically coincide. Compared to the high-precision reference map produced by the terrestrial laser scanner, the RMS values over 49 selected common feature points are 0.3356 m (MAP_A) and 0.3614 m (MAP_B), respectively, while the RMS value in the absence of any distance constraints (MAP_C) is 1.646 m. Considering the error introduced by selecting the common feature points, adding all six existing distance constraints and adding only the distance constraints of the Delaunay triangulation network have essentially the same effect on improving the accuracy of the SLAM result.
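Selecting the constraint edges of a Delaunay triangulation from the control points can be sketched with SciPy. The four coordinates below are illustrative; the actual control network coordinates of the field test are not reproduced here:

```python
import numpy as np
from scipy.spatial import Delaunay

# Illustrative planar coordinates (metres) of four control points.
points = np.array([[0.0, 0.0], [30.0, 0.0], [28.0, 40.0], [0.0, 35.0]])

tri = Delaunay(points)
edges = set()
for simplex in tri.simplices:              # each simplex is a triangle
    for a, b in [(0, 1), (1, 2), (2, 0)]:
        edges.add(tuple(sorted((simplex[a], simplex[b]))))
# For 4 points in convex position this yields 5 unique edges, so only 5
# of the 6 possible point-pair distances need to be added as constraints.
print(sorted(edges))
```

This matches the observation above that the five Delaunay edges constrain the map essentially as well as all six distances.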