Article
Peer-Review Record

POU-SLAM: Scan-to-Model Matching Based on 3D Voxels

Appl. Sci. 2019, 9(19), 4147; https://doi.org/10.3390/app9194147
by Jianwen Jiang 1, Jikai Wang 1, Peng Wang 2 and Zonghai Chen 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 9 September 2019 / Revised: 26 September 2019 / Accepted: 26 September 2019 / Published: 3 October 2019
(This article belongs to the Collection Advances in Automation and Robotics)

Round 1

Reviewer 1 Report

My major concerns have been addressed. 

I recommend grammar and copy editing. 

Author Response

Responses to Reviewer 1:

 

Thanks for your thorough review, which significantly contributed to improving the quality of the manuscript (Title: POU-SLAM: scan-to-model matching based on 3D voxels). We have considered the comments and suggestions and revised our manuscript.

 

 

Thank you.

Sincerely yours,

Jianwen Jiang, Jikai Wang, Peng Wang, Zonghai Chen

 

Author Response File: Author Response.doc

Reviewer 2 Report

Both the abstract and conclusion should present more quantitative data to support the key contributions of this paper. Clearly describing the trade-off between computational efficiency and mapping accuracy is crucial for SLAM applications.

What is “The quantity c as shown in (??)” in Section V-B, and why is it not explained how it influences the model? Please also explain how the non-ground features (planar or edge?) in Figure 5 relate to the translational error. Does the translational error increase significantly when the number of features drops?

Please check the capitalization in Algorithm I and Figure 5. The pseudocode section should be rewritten to improve the readability of the proposed workflow.

The equations in Figure 3 should be relocated into the paragraph. Algebraic symbols described in the legend should be drawn on the figure.

n should be np in Table 3, and this table cannot convince me that the POU model significantly improves the mapping accuracy compared to the plane model. 0.02~0.03% is negligible in short-distance SLAM.

Author Response

Responses to Reviewer 2:

 

Thanks for your thorough review, which significantly contributed to improving the quality of the manuscript (Title: POU-SLAM: scan-to-model matching based on 3D voxels). We have considered the comments and suggestions and revised our manuscript. The responses to your questions are as follows.

 

 

Question 1: Both the abstract and conclusion should present more quantitative data to support the key contributions of this paper. Clearly describing the trade-off between computational efficiency and mapping accuracy is crucial for SLAM applications.

 

Answer: The abstract and conclusion have been modified to present more quantitative data supporting the key contributions of the paper and to clearly describe the trade-off between computational efficiency and mapping accuracy.

 

Abstract

Purpose: Localization and mapping with LiDAR data is a fundamental building block for autonomous vehicles. Though LiDAR point clouds can often encode the scene depth more accurately and stably than visual information, laser-based Simultaneous Localization And Mapping (SLAM) remains challenging because the data is usually sparse, variable in density, and less discriminative. The purpose of this paper is to propose an accurate and reliable laser-based SLAM solution.

Design/methodology/approach: The method starts by constructing voxel grids from the 3D input point cloud. These voxels are then classified into three types that indicate different physical objects, according to the spatial distribution of the points contained in each voxel. During the mapping process, a global environment model with a Partition of Unity (POU) implicit surface is maintained, and the voxels are merged into the model stage by stage using the Levenberg-Marquardt algorithm.

Findings: We propose a laser-based SLAM method. The method uses a POU implicit surface representation to build the model and is evaluated on the KITTI odometry benchmark without loop closure. Our method achieves around 30% improvement in translational estimation precision, with an acceptable sacrifice of efficiency, compared to LOAM. Overall, our method uses a more complex and accurate surface representation than LOAM to increase the mapping accuracy at the expense of computational efficiency. Experimental results indicate that the method achieves accuracy comparable to state-of-the-art methods.

Originality/value: We propose a novel, low-drift SLAM method that falls into the scan-to-model matching paradigm. The method, which operates on point clouds obtained from a Velodyne HDL64, is of value to researchers developing SLAM systems for autonomous vehicles.
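To make the voxel classification step described in the Design paragraph concrete, here is a minimal sketch based on eigenvalue analysis of each voxel's point covariance (the response to Reviewer 3 below mentions per-voxel linear-ness and surface-ness values and a 25-point voxel threshold); the function name, the exact measure definitions, and the default thresholds are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def classify_voxel(points, c_th=0.9, p_th=0.9, min_points=25):
    """Label a voxel 'linear', 'planar', or 'other' from the eigenvalues
    of the covariance of its points. Thresholds are illustrative."""
    if len(points) < min_points:                 # too sparse to classify reliably
        return "other"
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    lam1, lam2, lam3 = np.linalg.eigvalsh(cov)   # ascending: lam1 <= lam2 <= lam3
    if lam3 <= 0.0:                              # degenerate voxel
        return "other"
    linear_ness = (lam3 - lam2) / lam3           # close to 1 for line-like voxels
    surface_ness = (lam2 - lam1) / lam3          # close to 1 for plane-like voxels
    if linear_ness > c_th:
        return "linear"
    if surface_ness > p_th:
        return "planar"
    return "other"
```

Here c_th and p_th play the role of the classification thresholds the reviewers ask about later in this record.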

 

Conclusion

We present a new 3D LiDAR SLAM method composed of a new feature voxel map and a new scan-to-model matching framework. We build a novel feature voxel map whose voxels carry salient shape characteristics. To adapt to the proposed map, we implement scan-to-model matching using a POU implicit surface representation to blend the corresponding voxels in the map together. Our method achieves an average translational error of 0.61%, compared to a 0.84% translational error for LOAM, on the KITTI odometry benchmark. The mapping accuracy is improved by the application of the POU surface model; however, feature matching based on the POU model burdens our system with more computational cost. Our method runs at 2 s per scan with one thread; for comparison, LOAM runs at 1 s per scan on the same KITTI dataset under the scan-to-model matching framework. As shown in the experimental results, the proposed method yields accurate results that are on par with the state of the art. Future work will proceed in two directions. From the research perspective, a specific and efficient octree will be designed to generate the 3D grid. Meanwhile, we will deploy the method for real-time applications with the aid of multiple threads or a GPU to accelerate data processing.
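As a quick arithmetic check (our illustration) of the "around 30%" improvement quoted in the revised abstract, using the two drift values above:

$$\frac{0.84\% - 0.61\%}{0.84\%} \approx 27.4\%,$$

i.e., roughly a 30% reduction in average translational error relative to LOAM.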

 

Question 2: What is “The quantity c as shown in (??)” in Section V-B, and why is it not explained how it influences the model? Please also explain how the non-ground features (planar or edge?) in Figure 5 relate to the translational error. Does the translational error increase significantly when the number of features drops?

 

Answer: “The quantity c as shown in (??)” should read “The quantity c as shown in (3)”. The influence of the threshold on c for the non-ground features on the pose estimation result is shown in Table I. When the threshold is set too large, we cannot extract enough features and the result gets worse. Table II shows the influence of the threshold for the ground features on the result. The quantity c of the ground features is generally large, so this threshold has only a slight influence on the number of features and on the final result.

 

TABLE I: THE DRIFTS VARYING WITH THE THRESHOLD FOR NON-GROUND FEATURES

Parameter    Drift on KITTI training dataset
…            1.03%
…            0.92%
…            0.90%

 

TABLE II: THE DRIFTS VARYING WITH THE THRESHOLD FOR GROUND FEATURES

Parameter    Drift on KITTI training dataset
…            0.94%
…            0.93%
…            0.90%

 

Figure 1 shows the number of non-ground planar features in different frames on sequence 01 when the threshold is set to 0.85. The average number of non-ground planar features is about 7200. Deschaud [1] holds the view that the pose estimation result is related to the number and distribution of features: features at different positions contribute differently to the estimated angles and translations. They sample 1000 features with a sampling strategy that accounts for the feature distribution and achieve a strong result. Compared with their method, we do not use a sampling strategy but instead use more features, which compensates for the influence of the feature distribution on the results to a certain extent. In our method, variation in the number of features affects the distribution of the extracted features and thus the performance. However, since we always extract a sufficient number of features for scan-matching (at least 2000), the translational error is hardly affected when the number of features drops.

Fig. 1. The number of non-ground planar features in different frames on sequence 01.

 

Question 3: Please check the capitalization in Algorithm I and Figure 5. The pseudocode section should be rewritten to improve the readability of the proposed workflow.

 

Answer: The pseudocode section has been rewritten, as shown in Fig. 2, and the capitalization in Figure 5 has been corrected, as shown in Fig. 1.

 

Fig. 2. Algorithm I.

 

Question 4: The equations in Figure 3 should be relocated into the paragraph. Algebraic symbols described in the legend should be drawn on the figure.

 

Answer: The equations in Figure 3 have been relocated into the text as equations (9) and (10). Figure 3 has been modified as follows.

 

$f(p) = \sum_i \varphi_i(p)\, f_i(p)$   (9)

$\varphi_i(p) = w_i(p) \Big/ \sum_j w_j(p)$   (10)

 

Fig. 3. Illustration of the POU implicit surface. The figure shows the POU implicit surface representation. Two cells, $c_1$ and $c_2$, are associated with their support radii $R_1$ and $R_2$, respectively. The value of a point $p$ in the slashed region can be evaluated by $f(p) = \big(w_1(p) f_1(p) + w_2(p) f_2(p)\big) / \big(w_1(p) + w_2(p)\big)$; $w_1(p) = b(d_1/R_1)$; $w_2(p) = b(d_2/R_2)$, where $d_1$ and $d_2$ are the distances from the point $p$ to $c_1$ and $c_2$, respectively, $R_1$ and $R_2$ are the support radii, $f_1$ and $f_2$ are local plane functions, and $b$ is the B-spline weight function.
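For concreteness, here is a minimal numeric sketch of the blended evaluation described by equations (9)-(10) and the caption above; the quadratic falloff and the data layout are placeholder assumptions, not the paper's B-spline implementation:

```python
import numpy as np

def bspline_weight(t):
    """Falloff weight on [0, 1]: 1 at the cell center, 0 at the support
    boundary. A quadratic placeholder for the paper's B-spline weight."""
    t = np.clip(t, 0.0, 1.0)
    return (1.0 - t) ** 2

def pou_value(p, cells):
    """Evaluate the blend of equations (9)-(10):
    f(p) = sum_i phi_i(p) f_i(p), with phi_i = w_i / sum_j w_j, where each
    cell contributes a local plane f_i(p) = n_i . (p - q_i)."""
    num = den = 0.0
    for q, n, R in cells:            # cell center q, unit normal n, radius R
        d = np.linalg.norm(p - q)
        if d >= R:                   # outside this cell's compact support
            continue
        w = bspline_weight(d / R)
        num += w * np.dot(n, p - q)  # weighted signed distance to local plane
        den += w
    return num / den if den > 0.0 else None
```

The guard on d >= R reflects the compact support of each cell's weight; where no support covers the point, the blend is undefined, matching the partition-of-unity construction.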

 

Question 5: n should be np in Table 3, and this table cannot convince me that the POU model significantly improves the mapping accuracy compared to the plane model. 0.02~0.03% is negligible in short-distance SLAM.

 

Answer: Yes, n is np in Table 3. In the plane model, all the points in a planar patch are used to compute the normal of that patch, and the distance between a planar feature and its corresponding planar patch is then calculated with the same method as LOAM. In the POU model, we first compute the normal of each point in the input point cloud using PCL tools; the distance between a planar feature p and the corresponding planar patch is then the point-to-plane distance defined by the patch point q and its normal n_q. Since less information is used to compute the normal in the plane model, errors are introduced into the computed distance. Because we precompute the point normals over the whole input point cloud, the normals in the POU model are more accurate, which leads to a more accurate distance between the planar feature and the corresponding planar patch; the normals of the POU model are therefore more stable, and the result improves slightly, as shown in Table III (a sketch contrasting the two distance computations is given after Table IV). The core idea of the POU model is to use the weight function to blend the subdomains together, and our intention is to demonstrate the effectiveness of applying the POU model. However, in our implementation the plane model still adopts the same B-spline weight as the POU model; technically, the plane model therefore also belongs to the POU framework, which means the results in Table III alone are not sufficient to illustrate the effectiveness of the POU model. To better illustrate it, we ran comparison experiments in which all planar patches receive the same weight. The results are shown in Table IV: the performance is significantly improved by the application of the POU framework.

 

TABLE III: THE COMPARISON RESULTS BETWEEN THE POU MODEL AND THE PLANE MODEL

Parameter    Plane Model    POU Model
…            0.93%          0.90%
…            0.94%          0.92%
…            0.94%          0.92%

 

TABLE IV: THE COMPARISON RESULTS BETWEEN THE POU WEIGHT AND THE SAME WEIGHT

Parameter    The Same Weight    POU Weight
…            1.84%              0.90%
…            1.91%              0.92%
…            1.94%              0.92%
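To make the distinction in the answer above concrete, here is a small sketch of the two distance computations as we understand them; the function names, the covariance-based plane fit, and the weighting shown are our illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def plane_model_distance(p, patch_points):
    """Plane model: fit one normal from all patch points, then take the
    point-to-plane distance, as in LOAM."""
    q = patch_points.mean(axis=0)
    cov = np.cov((patch_points - q).T)
    n = np.linalg.eigh(cov)[1][:, 0]   # eigenvector of the smallest eigenvalue
    return abs(np.dot(n, p - q))

def pou_model_distance(p, patch_points, normals, weights):
    """POU model: every patch point q keeps its own precomputed normal n_q,
    and the residuals n_q . (p - q) are blended by the patch weights."""
    res = sum(w * abs(np.dot(n, p - q))
              for q, n, w in zip(patch_points, normals, weights))
    return res / sum(weights)
```

In the plane model a single fitted normal summarizes the whole patch, while the POU variant keeps per-point normals precomputed over the full input cloud, which is why its distances are more stable.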

 

 

[1] Jean-Emmanuel Deschaud. IMLS-SLAM: scan-to-model matching based on 3D data. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 2480–2485, 2018.

 

 

Thank you.

Sincerely yours,

Jianwen Jiang, Jikai Wang, Peng Wang, Zonghai Chen

 

Author Response File: Author Response.doc

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The quality is very good. The presentation follows a good logic, and I can easily follow the authors. I think this paper should attract much attention from readers, so I support its publication.

Author Response

Thanks for your thorough review, which significantly contributed to improving the quality of the manuscript (Title: POU-SLAM: scan-to-model matching based on 3D voxels). We have considered the comments and suggestions and revised our manuscript.

 

 

Thank you.

Sincerely yours,

Jianwen Jiang, Jikai Wang, Peng Wang, Zonghai Chen

 

Reviewer 2 Report

General notes:

Interesting approach and a well-detailed paper. With many methods out there for SLAM using Lidar data, novelty is a concern. On the other hand, there is sufficient detail, and the work appears scientifically sound, so it is publishable.


Major problems:


- POU is central to the algorithm and approach and is even in the title. The original paper on POU is not cited (https://www.cc.gatech.edu/~turk/my_papers/mpu_implicits.pdf), and no prior work on using POU for odometry or other purposes is presented. Related Work section should discuss POU extensively. The paper shouldn't look like POU is introduced here. 

- A very interesting method to differentiate object shapes using eigenvalues and eigenvectors. The paper should cite where this idea came from. It would be dishonest to pretend it is newly introduced in this paper. For example, the idea appears here: https://www.scientific.net/AMR.424-425.894

- Without loop closure, it is not technically a SLAM algorithm, but rather odometry. Is there a good rationale as to why it is called POU-SLAM? Perhaps it should be changed. If "SLAM" is used in this way in prior work (i.e. odometry without loop closure), then it should be ok here. As far as I know, for example in ORB-SLAM and others, SLAM deals with loop closure.



Other concerns:

- Acronyms should be explained, for example, LOAM and ICP under related work.

- English and grammar need work. I recommend using the software "Grammarly"; it is free and available as a plug-in for Word. It can correct most of the issues I'm spotting (mostly in the Abstract).

- Title word capitalization is not consistent. 


Author Response

Thanks for your thorough review, which significantly contributed to improving the quality of the manuscript (Title: POU-SLAM: scan-to-model matching based on 3D voxels). We have considered the comments and suggestions and revised our manuscript. The responses to your questions are as follows.

 

 

Question 1: POU is central to the algorithm and approach and is even in the title. The original paper on POU is not cited (https://www.cc.gatech.edu/~turk/my_papers/mpu_implicits.pdf), and no prior work on using POU for odometry or other purposes is presented. Related Work section should discuss POU extensively. The paper shouldn't look like POU is introduced here.

 

Answer: We cited the original paper on POU in Section IV-B and added an introduction to POU in the related work. Ireneusz Tobor et al. [1] show how to reconstruct multi-scale implicit surfaces with attributes, given discrete point sets with attributes. Tung-Ying Lee et al. [2] propose a new 3D non-rigid registration algorithm to register two multi-level partition of unity (MPU) implicit surfaces with a variational formulation.

 

[1] Ireneusz Tobor, Patrick Reuter, and Christophe Schlick. Reconstructing multi-scale variational partition of unity implicit surfaces with attributes. International Conference on Shape Modeling and Applications, 68(1):25–41, 2006.

[2] Tung-Ying Lee and Shang-Hong Lai. 3D non-rigid registration for MPU implicit surfaces. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 1–8, 2008.

 

Question 2: A very interesting method to differentiate object shapes using eigenvalues and eigenvectors. The paper should cite where this idea came from. It would be dishonest to pretend it is newly introduced in this paper. For example, the idea appears here: https://www.scientific.net/AMR.424-425.894

 

Answer: We cited a related paper where the idea came from in Section II, and we stated its origin again in Section III.

 

Question 3: Without loop closure, it is not technically a SLAM algorithm, but rather odometry. Is there a good rationale as to why it is called POU-SLAM? Perhaps it should be changed. If "SLAM" is used in this way in prior work (i.e. odometry without loop closure), then it should be ok here. As far as I know, for example in ORB-SLAM and others, SLAM deals with loop closure.

 

Answer: Up to now, even though SLAM has been extensively researched and various solutions have been proposed, there is still no strict definition of SLAM. For example, Ji Zhang et al. proposed LOAM, which runs without loop closure and does not use the term “SLAM”; Jean-Emmanuel Deschaud proposed IMLS-SLAM, which uses “SLAM” in its name even though it also has no loop closure.

 

Question 4: There is an aspect ratio problem in Figure 3.

 

Answer: We have adjusted the aspect ratio of Figure 3.

 

Question 5: Some abbreviations were not defined in this paper. Eg. LOAM, NDT.

 

Answer: LOAM is the abbreviation for Lidar Odometry and Mapping. NDT is the abbreviation for Normal Distributions Transform. These abbreviations have been defined in the paper.

 

 

Thank you.

Sincerely yours,

Jianwen Jiang, Jikai Wang, Peng Wang, Zonghai Chen

 

Author Response File: Author Response.pdf

Reviewer 3 Report

My major comments are listed as follows:

1. In Section III-A, only surface and linear features were extracted from voxels. How would sphere- or cylinder-shaped point clouds appear under your assumption?

2. Please describe the suggested threshold (25) of voxel points in more detail; I think this threshold should vary based on the sampling density.

3. In the voxel sampling stage, how are the cth and pth thresholds defined in each scan? What is your strategy for setting these thresholds in the real-time process?

4. The methodology section does not clearly explain how to avoid blunders and error propagation in the motion estimation stage.

5. Please explain the reasons why the authors chose these specific sequences from the KITTI benchmark. How is the performance in a long-path test case?

6. I would like to see some discussion or advice regarding the computational cost of the proposed method. What are your operating platform and runtime?

 

Some minor comments are listed as follows:

1. There is an aspect ratio problem in Figure 3.

2. Some abbreviations were not defined in this paper, e.g., LOAM, NDT.


Author Response

Thanks for your thorough review, which significantly contributed to improving the quality of the manuscript (Title: POU-SLAM: scan-to-model matching based on 3D voxels). We have considered the comments and suggestions and revised our manuscript. The responses to your questions are as follows.

Question 1: In Section III-A, only surface and linear features were extracted from voxels. How would sphere- or cylinder-shaped point clouds appear under your assumption?

 

Answer: Whether sphere- or cylinder-shaped point clouds appear depends on the thresholds cth and pth. In our method, obtaining precise scan-matching results requires a sufficient number of features. For laser scans from different kinds of environments, structured or unstructured, we aim to extract enough features by selecting appropriate thresholds. In a structured environment, the thresholds are set high because there are many planar and linear features. In a mixed environment of structured and unstructured features, the thresholds are set lower than in a structured environment in order to extract enough features. In an unstructured environment, the thresholds are set lowest. The smaller the thresholds, the more likely features on spheres or cylinders are to be extracted. Even if such features are extracted, our POU implicit surface representation uses the weight function to blend the planar patches together, which represents the surface better than a single plane.

 

Question 2: Please describe the suggested threshold (25) of voxel points in more detail; I think this threshold should vary based on the sampling density.

Answer: The threshold on the number of points per voxel is related to the horizontal and vertical resolution of the LiDAR point cloud. During feature extraction, the density of the features in the corresponding voxels varies, and we take this density into account while searching for the neighbors of the features: the lower the density of the features, the lower the probability of finding enough neighbors within a certain radius. Our experimental results are obtained with the threshold set to 25.
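As an illustration of this neighbor check, here is a minimal sketch; the radius, the SciPy stand-in for the PCL neighbor search, and the helper name are our assumptions, not the paper's code:

```python
import numpy as np
from scipy.spatial import cKDTree

def enough_neighbors(tree, p, radius=1.0, k_min=25):
    """Keep a feature only if at least k_min scan points fall within
    `radius` of it; features in sparse regions fail this test more often."""
    return len(tree.query_ball_point(p, r=radius)) >= k_min

# Hypothetical usage:
# tree = cKDTree(scan_points)               # scan_points: (N, 3) array
# kept = [f for f in features if enough_neighbors(tree, f)]
```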

 

 

Question 3: In the voxel sampling stage, how are the cth and pth thresholds defined in each scan? What is your strategy for setting these thresholds in the real-time process?

 

Answer: Similar to our response to Question 6, our method is evaluated by processing the KITTI datasets rather than being deployed on an autonomous vehicle. With regard to a real-time implementation, which depends heavily on the hardware configuration, we cannot assert whether our method is real-time or not. In our experiment, we define the thresholds cth and pth based on KITTI Odometry sequence 01. The laser scans in this sequence were collected on a highway; since the environment contains very few distinct structures usable for scan-matching, extracting enough features is more challenging than in the other sequences. Most of the laser returns correspond to the flat street, and only a few correspond to traffic signs or some sparse trees and bushes along the highway. Therefore, if the thresholds perform well on such data, they are applicable to the other sequences too. For structured environments, the thresholds can be raised and enough features can still be extracted.

 

Question 4: The methodology section does not clearly explain how to avoid blunders and error propagation in the motion estimation stage.

 

Answer: We use a scan-to-model matching framework. The basic idea of scan-to-model matching is that the current scan is matched against a model aggregated from the previously matched scans, so the pose estimate rests on many past scans rather than on a single one; matching the current scan with this historical model suppresses error propagation. As for blunders, if we understand correctly, the reviewer means a mismatch between the scan and the model. We limit the scale of the relative transformation between the current scan and the model; when a blunder occurs, we discard the result of the current scan, and subsequent scan-matching is only slightly affected.
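A minimal sketch of the transform-magnitude gate described above; the limit values are hypothetical, since the response does not state them:

```python
import numpy as np

MAX_TRANS = 3.0               # hypothetical limit, meters per scan period
MAX_ROT = np.deg2rad(10.0)    # hypothetical limit, radians per scan period

def is_blunder(T_rel):
    """Flag a scan-to-model match whose relative transform T_rel (4x4
    homogeneous matrix) is implausibly large for one LiDAR period."""
    t = T_rel[:3, 3]
    R = T_rel[:3, :3]
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.linalg.norm(t) > MAX_TRANS or np.arccos(cos_angle) > MAX_ROT
```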

 

Question 5: Please explain the reasons why the authors chose these specific sequences from the KITTI benchmark. How is the performance in a long-path test case?

 

Answer: The sequences were not specially chosen. We chose KITTI Odometry sequences 00-10 because these sequences come with ground truth. Moreover, other methods, such as LOAM, also conduct their experiments on these sequences. By using this data, we can therefore evaluate our method both against other baselines and against the ground truth.

 

Question 6: I would like to see some discussion or advice regarding the computational cost of the proposed method. What are your operating platform and runtime?

 

Answer: The operating platform is an Intel CPU with 16 GB RAM. Our method is evaluated by processing the KITTI datasets rather than being deployed on an autonomous vehicle. With regard to a real-time implementation, which depends heavily on the hardware configuration, we cannot assert whether our method is real-time; however, we can provide some figures that indicate its runtime performance. First, we split the point cloud into voxels containing a certain number of points and calculate the linear-ness and surface-ness values for these voxels; this takes 0.5 s. Second, we perform scan-matching for each feature, which takes time proportional to the number of features; on KITTI Odometry sequence 01 the number of features is about 6000 and this step takes about 1.5 s. With this dataset and our implementation, our SLAM runs at 2 s per scan with one thread. For comparison, LOAM runs at 1 s per scan on the same KITTI dataset.
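For reproducibility of the timing breakdown above, a trivial sketch of how the two stages could be timed; the stage names and pipeline functions are hypothetical:

```python
import time

def timed(label, fn, *args):
    """Run fn(*args), print the elapsed wall-clock time, return the result."""
    t0 = time.perf_counter()
    out = fn(*args)
    print(f"{label}: {time.perf_counter() - t0:.2f} s")
    return out

# Hypothetical pipeline, matching the ~0.5 s + ~1.5 s breakdown above:
# voxels = timed("voxelization + shape scores", build_voxel_map, scan)
# pose   = timed("scan-to-model matching", match_scan_to_model, voxels, model)
```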

 

Question 7: Acronyms should be explained, for example, LOAM and ICP under related work.

 

Answer: LOAM is the abbreviation for Lidar Odometry and Mapping. ICP is the abbreviation for Iterative Closest Point. These acronyms have been explained in the related work.

 

Question 8: English and grammar need work. I recommend using the software "Grammarly"; it is free and available as a plug-in for Word. It can correct most of the issues I'm spotting (mostly in the Abstract).

 

Answer: We have revised the English and grammar in the Abstract.

 

Question 9: Title word capitalization is not consistent.

 

Answer: We have made the title word capitalization consistent.

 

 

Thank you.

Sincerely yours,

Jianwen Jiang, Jikai Wang, Peng Wang, Zonghai Chen

 

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Thank you for the detailed responses to my questions. After reviewing the feedback, I have decided to reject this manuscript for the following reasons:

1. From the experimental evaluation, the authors should address how many surface features and linear features were found and how they enhance the POU model. There is no explanation of the performance of the scan-to-model matching framework either.

2. From Table 1, the highway case (sequence 01) shows a 50% improvement compared to LOAM and SUMA. However, it should be the most featureless case among the others. A discussion of how the proposed methods contribute to the motion estimation is missing from the experimental results.

3. The robustness of the proposed algorithm is not solid enough; many thresholds were given on a case-by-case basis.

4. The LiDAR motion model should disclose how the misalignment between each surface is compensated for.

 

The novelty of this study is good; however, the content and presentation still need further improvement.

