Efficient Hardware Accelerator Design of Non-Linear Optimization Correlative Scan Matching Algorithm in 2D LiDAR SLAM for Mobile Robots
Abstract
1. Introduction
- (1) To address the high computational and resource cost of the conventional CSM algorithm, this paper adopts a two-step NLO-CSM algorithm. In the first step, the CSM algorithm performs scan matching on a down-sampled, low-resolution map to reduce computation. In the second step, starting from the good initial pose found by the first step, the NLO algorithm iterates to obtain the optimal pose. The NLO-CSM algorithm therefore avoids the high computational complexity of a brute-force search over the grid map in the conventional CSM algorithm, which reduces both the computation time and the energy consumption of the computing hardware. It also achieves good scan matching performance, because the first step avoids the divergence that a poor initial pose can cause when the NLO algorithm is run alone.
- (2) This paper also presents a comprehensive analysis of the adopted NLO-CSM algorithm. Based on this analysis, it proposes an efficient hardware accelerator for the algorithm's major computation-intensive tasks. The two steps share similar computations and operators, so a module-reuse technique further reduces the hardware overhead of the two-step computation. In addition, a pipelined processing strategy enables fast computation and therefore high energy efficiency. The algorithmic analysis and the corresponding hardware design provide a practical reference for efficient hardware implementations of scan matching algorithms.
- (3) A systematic hardware evaluation has been carried out on both FPGA and ASIC (Application-Specific Integrated Circuit) implementations of the proposed NLO-CSM accelerator. The CPU software solution, the FPGA-based accelerator and the ASIC-based accelerator are compared with one another and with existing state-of-the-art designs. These comparisons demonstrate the improvements of the proposed work in computing speed and energy efficiency.
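
Contribution (1) relies on running the first-step CSM on a down-sampled, low-resolution map. The downsampling operator is not specified here. The minimal sketch below assumes the max-pooling rule used in Olson's multi-resolution correlative scan matching [12]: each coarse cell stores the maximum of the k × k fine cells it covers, so a coarse-map lookup never underestimates the fine-map values it summarizes. The helper name `downsample_max` and the zero padding are illustrative assumptions. The padding presumes non-negative occupancy values.

```python
import numpy as np


def downsample_max(grid: np.ndarray, k: int) -> np.ndarray:
    """Max-pool a p x q occupancy grid by a factor k (hypothetical helper).

    Coarse cell (i, j) holds the maximum of grid[i*k:(i+1)*k, j*k:(j+1)*k].
    The grid is zero-padded up to a multiple of k, which assumes occupancy
    values are non-negative.
    """
    p, q = grid.shape
    pp = -(-p // k) * k  # p rounded up to a multiple of k
    qq = -(-q // k) * k  # q rounded up to a multiple of k
    padded = np.zeros((pp, qq), dtype=grid.dtype)
    padded[:p, :q] = grid
    # Split each axis into (block index, offset inside block), then reduce over the offsets.
    return padded.reshape(pp // k, k, qq // k, k).max(axis=(1, 3))


fine = np.arange(16).reshape(4, 4)
print(downsample_max(fine, 2))
# [[ 5  7]
#  [13 15]]
```

For a fixed search window, a coarse translation search on such a map visits roughly k² fewer positions than a full-resolution one. The second step then only has to refine locally.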
2. Algorithm Analysis of NLO-CSM
2.1. Conventional CSM Algorithm
Algorithm 1: Conventional CSM Algorithm
1: Function CSM(Map, xm, xn, ym, yn, θm, θn)
2:   max_score = 0, best_pose = (xm, ym, θm)
3:   for xi in [xm, xn]
4:     for yi in [ym, yn]
5:       for θi in [θm, θn]
6:         score = get_score(Map, Li), Li = (xi, yi, θi)
7:         if score > max_score then best_pose = Li, max_score = score
8:       end for
9:     end for
10:   end for
11:   return best_pose
12: Function get_score(Map, Li), Li = (xi, yi, θi)
13:   θi → sin θi, cos θi, Scan_points = [s0, …, si, …, sk], si = (si,x, si,y), score = 0
14:   for si in [s0, …, si, …, sk]
15:     Si,x = si,x·cos θi − si,y·sin θi + xi, Si,y = si,x·sin θi + si,y·cos θi + yi
16:     score = score + Map(Si,x, Si,y)
17:   end for
18:   return score
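
For illustration, Algorithm 1 can be sketched in Python. The sketch makes four assumptions:

- poses and scan points are expressed in grid-cell units;
- x indexes map columns and y indexes rows;
- each transformed point is rounded to the nearest cell before the lookup;
- points that fall outside the map contribute nothing.

The names `get_score` and `csm` mirror the pseudocode but are otherwise hypothetical.

```python
import itertools
import math

import numpy as np


def get_score(grid, scan, pose):
    """Algorithm 1, get_score: sum the map values hit by the transformed scan."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)  # line 13: theta -> sin, cos
    score = 0.0
    for sx, sy in scan:
        # Line 15: S_i = R(theta) s_i + (x, y), rounded to the nearest cell (assumption).
        col = math.floor(sx * c - sy * s + x + 0.5)
        row = math.floor(sx * s + sy * c + y + 0.5)
        if 0 <= row < grid.shape[0] and 0 <= col < grid.shape[1]:
            score += float(grid[row, col])  # out-of-map points add nothing
    return score


def csm(grid, scan, xs, ys, thetas):
    """Algorithm 1, CSM: exhaustive search over every (x, y, theta) candidate."""
    best_pose, max_score = (xs[0], ys[0], thetas[0]), 0.0
    for x, y, th in itertools.product(xs, ys, thetas):
        score = get_score(grid, scan, (x, y, th))
        if score > max_score:
            best_pose, max_score = (x, y, th), score
    return best_pose, max_score


# Toy map built from a known scan at pose (15, 14, 0), then searched for.
scan = [(8, 0), (0, 8), (-6, 3), (5, -6)]
grid = np.zeros((30, 30))
for sx, sy in scan:
    grid[sy + 14, sx + 15] = 1.0
pose, score = csm(grid, scan, list(range(12, 19)), list(range(11, 18)), [-0.1, 0.0, 0.1])
print(pose, score)  # (15, 14, 0.0) 4.0
```

Even this toy window visits 7 × 7 × 3 = 147 candidate poses, and each one costs one map lookup per scan point. This multiplicative growth with window size and angular resolution is the cost that the two-step NLO-CSM algorithm is designed to reduce.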
2.2. NLO-CSM Algorithm
Algorithm 2: NLO-CSM Algorithm
1: Map = matrix[p, q], Scan_points = [s0, …, si, …, sk], si = (si,x, si,y) // n = k + 1 is the number of LiDAR points
2: Function Coarse_match
3:   max_score = 0, best_pose = (xm, ym, θm)
4:   for xi in [xm, xn]
5:     for yi in [ym, yn]
6:       for θi in [θm, θn]
7:         score = get_score(Li), Li = (xi, yi, θi)
8:         if score > max_score then
9:           best_pose = Li, max_score = score
10: Function Fine_match(L0), L0 = (L0,x, L0,y, L0,θ)
11:   ∆L = (∆Lx, ∆Ly, ∆Lθ) = (0, 0, 0), H = 0, K = 0
12:   for i in [1, λ] // λ is the number of Gauss-Newton iterations
13:     Li = Li-1 + ∆L = (Li-1,x, Li-1,y, Li-1,θ) + (∆Lx, ∆Ly, ∆Lθ)
14:     (H, K) = get_HessianDerivs(Li)
15:     ∆L = (∆Lx, ∆Ly, ∆Lθ) = H−1·K
16: SubFunction get_score(Li), Li = (xi, yi, θi)
17:   θi → sin θi, cos θi, score = 0
18:   for si in [s0, …, si, …, sk]
19:     Si,x = si,x·cos θi − si,y·sin θi + xi, Si,y = si,x·sin θi + si,y·cos θi + yi
20:     score = score + Map(Si,x, Si,y)
21:   return score
22: SubFunction get_HessianDerivs(Li), Li = (Li,x, Li,y, Li,θ)
23:   Li,θ → sin Li,θ, cos Li,θ
24:   for i in [0, k]
25:
26:
27:
28:
29:   end for
30:
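
The fine-matching step can be sketched in Python under one explicit assumption: get_HessianDerivs follows the Gauss-Newton formulation of Hector SLAM [14]. Under that formulation it minimizes Σ(1 − M(Si))², and it obtains the map value M and its gradient ∇M by bilinear interpolation. It then accumulates H = Σ JiᵀJi and K = Σ Jiᵀ(1 − M(Si)) with Ji = ∇M(Si)·∂Si/∂L, so that ∆L = H⁻¹·K is the Gauss-Newton step. A few further conventions apply:

- All function names are hypothetical.
- Poses and scan points are in grid-cell units, with x indexing columns and y indexing rows.
- Each update is applied immediately after it is computed.

```python
import numpy as np


def bilinear(grid, x, y):
    """Interpolated map value M and gradient (dM/dx, dM/dy) at a continuous cell position."""
    rows, cols = grid.shape
    x = min(max(x, 0.0), cols - 1.000001)  # clamp so the 2x2 neighbourhood stays in the map
    y = min(max(y, 0.0), rows - 1.000001)
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    m00, m10 = grid[y0, x0], grid[y0, x0 + 1]
    m01, m11 = grid[y0 + 1, x0], grid[y0 + 1, x0 + 1]
    value = (1 - fy) * ((1 - fx) * m00 + fx * m10) + fy * ((1 - fx) * m01 + fx * m11)
    dmdx = (1 - fy) * (m10 - m00) + fy * (m11 - m01)
    dmdy = (1 - fx) * (m01 - m00) + fx * (m11 - m10)
    return value, dmdx, dmdy


def transform(scan, pose):
    """S_i = R(theta) s_i + t and dS_i/dtheta, both built from the same four products."""
    x, y, th = pose
    c, s = np.cos(th), np.sin(th)
    a, b = scan[:, 0] * c, scan[:, 1] * s  # sx*cos, sy*sin
    d, e = scan[:, 0] * s, scan[:, 1] * c  # sx*sin, sy*cos
    pts = np.stack([a - b + x, d + e + y], axis=1)
    dpts_dth = np.stack([-(d + e), a - b], axis=1)
    return pts, dpts_dth


def residuals_and_jacobians(grid, scan, pose):
    """r_i = 1 - M(S_i) and J_i = grad M(S_i) . dS_i/dL for every scan point."""
    pts, dpts_dth = transform(scan, pose)
    r = np.empty(len(scan))
    J = np.empty((len(scan), 3))
    for i, ((px, py), (tx, ty)) in enumerate(zip(pts, dpts_dth)):
        value, gx, gy = bilinear(grid, px, py)
        r[i] = 1.0 - value
        J[i] = (gx, gy, gx * tx + gy * ty)  # dS_i/dL = [[1, 0, tx], [0, 1, ty]]
    return r, J


def get_hessian_derivs(grid, scan, pose):
    """H = sum J_i^T J_i (3x3) and K = sum J_i^T r_i (3,)."""
    r, J = residuals_and_jacobians(grid, scan, pose)
    return J.T @ J, J.T @ r


def fine_match(grid, scan, pose0, iterations):
    """Gauss-Newton refinement: L_i = L_{i-1} + dL with dL = H^-1 K."""
    pose = np.asarray(pose0, dtype=float)
    for _ in range(iterations):
        H, K = get_hessian_derivs(grid, scan, pose)
        pose = pose + np.linalg.solve(H, K)  # solve instead of forming H^-1 explicitly
    return pose
```

A typical call is `fine_match(grid, np.array(points), coarse_pose, iterations)`. In this call, `coarse_pose` is the result of the coarse step and `iterations` plays the role of λ. The sketch does not guard against a singular H, which can occur when too few scan points fall on map regions with a non-zero gradient.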
3. NLO-CSM Algorithm Hardware Accelerator Design
3.1. Overall System Architecture
- (1) The preprocessing module updates the stored pose and preprocesses the pose angle, i.e., it computes sin θ and cos θ for the get_score and get_HessianDerivs processes.
- (2) The local memory module stores the occupancy grid map used for matching and the LiDAR points of each scan frame captured by the LiDAR sensor.
- (3) The score/K&H matrix calculation module is the core computing unit of the accelerator. It contains the derivative and coordinate calculator, the grid map read controller, the matrix multiplier, the gradient calculator and the matrix MAC unit.
- (1) According to Equation (1), the coordinate and derivative calculations in get_score and get_HessianDerivs share the same trigonometric functions and multiplications, so they are grouped into subtask 1. The same hardware circuit in the score/K&H matrix calculation module can then be reused, which avoids repeated computation and reduces hardware overhead, as shown in Figure 10a.
- (2) In the local memory module, the two-dimensional grid map is stored in one-dimensional form. Because the grid-map values of four LiDAR points must be read at a time, local memory access is grouped into subtask 2.
- (3) As shown in Figure 10b, the two quantities computed in this step use the same input data and require identical coordinate calculations. The same operation can therefore be reused to avoid repeated computation and reduce hardware overhead, and this step is grouped into subtask 3.
- (4) The small-size matrix multiplication is assigned to subtask 4.
- (5) The multiplication and accumulation of the H and K matrices is grouped into subtask 5.
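
Subtask 2 depends on the grid map being stored as a one-dimensional array. The sketch below assumes a row-major layout for the p × q map of Algorithm 2; the actual memory layout is not given here. Under that assumption, cell (row, col) lives at address row · q + col, and a batch of four cells is fetched by computing four addresses. The helper names are hypothetical.

```python
import numpy as np


def flatten_map(grid):
    """Store a p x q grid map as one contiguous 1-D array in row-major order (assumed layout)."""
    return np.ascontiguousarray(grid).reshape(-1)


def flat_address(row, col, q):
    """Row-major 1-D address of cell (row, col) in a map with q columns."""
    return row * q + col


def read_cells(mem, q, cells):
    """Fetch a batch of cells (e.g., four per access) from the flattened map."""
    return [int(mem[flat_address(r, c, q)]) for r, c in cells]


grid = np.arange(12).reshape(3, 4)  # p = 3 rows, q = 4 columns
mem = flatten_map(grid)
print(read_cells(mem, 4, [(0, 0), (1, 2), (2, 3), (2, 0)]))  # [0, 6, 11, 8]
```

If q is a power of two, the product row · q reduces to a left shift. A hardware address generator may exploit this, although whether this accelerator does so is not stated.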
3.2. Architectures of Subunits
4. Implementation Results and Discussion
4.1. FPGA Implementation and Evaluation
4.2. ASIC Implementation and Discussion
4.3. Comparison with the State-of-the-Art and Discussions
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Thrun, S. Probabilistic robotics. Commun. ACM 2002, 45, 52–57.
- Smith, R.C.; Cheeseman, P. On the representation and estimation of spatial uncertainty. Int. J. Robot. Res. 1986, 5, 56–68.
- Durrant-Whyte, H.; Bailey, T. Simultaneous localization and mapping: Part I. IEEE Robot. Autom. Mag. 2006, 13, 99–110.
- Zhao, J.; Huang, S.; Zhao, L.; Chen, Y.; Luo, X. Conic Feature Based Simultaneous Localization and Mapping in Open Environment via 2D Lidar. IEEE Access 2017, 7, 173703–173718.
- Zhang, C.; Yong, L.; Chen, Y.; Zhang, S.; Ge, L.; Wang, S.; Li, W. A rubber-tapping robot forest navigation and information collection system based on 2D LiDAR and a gyroscope. Sensors 2019, 19, 2136.
- Santos, J.M.; Portugal, D.; Rocha, R.P. An evaluation of 2D SLAM techniques available in Robot Operating System. In Proceedings of the 2013 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Linköping, Sweden, 21–26 October 2013; pp. 1–6.
- Barczyk, M.; Bonnabel, S.; Deschaud, J.-E.; Goulette, F. Invariant EKF Design for Scan Matching-Aided Localization. IEEE Trans. Control. Syst. Technol. 2015, 23, 2440–2448.
- Wang, D.; Liang, H.; Mei, T.; Zhu, H.; Fu, J.; Tao, X. Lidar Scan matching EKF-SLAM using the differential model of vehicle motion. In Proceedings of the 2013 IEEE Intelligent Vehicles Symposium (IV), Gold Coast, QLD, Australia, 23–26 June 2013; pp. 908–912.
- Murphy, K.; Russell, S. Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks; Springer: Berlin/Heidelberg, Germany, 2001; pp. 2440–2448.
- Pitt, M.K.; Shephard, N. Filtering via simulation: Auxiliary particle filters. J. Am. Stat. Assoc. 2012, 94, 590–599.
- Jo, H.; Cho, H.M.; Jo, S.; Kim, E. Efficient Grid-Based Rao–Blackwellized Particle Filter SLAM With Interparticle Map Sharing. IEEE/ASME Trans. Mechatron. 2018, 23, 714–724.
- Olson, E.B. Real-time correlative scan matching. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, 12–17 May 2009; pp. 4387–4393.
- Karto SLAM. Available online: http://www.ros.org/wiki/karto (accessed on 15 September 2015).
- Kohlbrecher, S.; von Stryk, O.; Meyer, J.; Klingauf, U. A flexible and scalable SLAM system with full 3D motion estimation. In Proceedings of the 2011 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Kyoto, Japan, 1–5 November 2011; pp. 155–160.
- Hess, W.; Kohler, D.; Rapp, H.; Andor, D. Real-time loop closure in 2D LIDAR SLAM. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 1271–1278.
- Sugiura, K.; Matsutani, H. A Universal LiDAR SLAM Accelerator System on Low-Cost FPGA. IEEE Access 2022, 10, 26931–26947.
- Bao, M.; Wang, K.; Li, R.; Ma, B.; Fan, Z. RIA-CSM: A Real-Time Impact-Aware Correlative Scan Matching Using Heterogeneous Multi-Core SoC. IEEE Sens. J. 2022, 22, 5787–5796.
- Wang, C.; Luo, J.; Zhou, J. A 1-V to 0.29-V sub-100-pJ/operation ultra-low power fast-convergence CORDIC processor in 0.18-µm CMOS. Microelectron. J. 2018, 76, 52–62.

Dataset | Error Type | Localization Error (This Work) | Localization Error (IEEE Sensors Journal’2022 [17]) |
---|---|---|---|
Deutsches Museum | Abs translational error (m) | 0.02332 ± 0.01634 | 0.03034 ± 0.02193 |
Deutsches Museum | Sqr translational error (m²) | 0.00081 ± 0.00155 | 0.00140 ± 0.00241 |
Deutsches Museum | Abs rotational error (deg) | 0.14480 ± 0.13388 | 0.15550 ± 0.15115 |
Deutsches Museum | Sqr rotational error (deg²) | 0.03889 ± 0.08288 | 0.04687 ± 0.09481 |
Revo LDS | Abs translational error (m) | 0.02092 ± 0.02595 | 0.03185 ± 0.01916 |
Revo LDS | Sqr translational error (m²) | 0.00111 ± 0.00299 | 0.00138 ± 0.00156 |
Revo LDS | Abs rotational error (deg) | 0.23718 ± 0.23081 | 0.22776 ± 0.16350 |
Revo LDS | Sqr rotational error (deg²) | 0.10941 ± 0.19017 | 0.07839 ± 0.09769 |

Computing Task | Hardware Module |
---|---|
Subtask 1 | Derivative & Coordinate calculator |
Subtask 2 | Grid map read controller |
Subtask 3 | Gradient calculator |
Subtask 4 | Matrix multiplier |
Subtask 5 | Matrix MAC unit |

Resource | FPGA Implementation |
---|---|
LUT | 4142 |
FF | 3193 |
DSP | 19 |
BRAM | 297 KB |
Power | 0.79 W @ 100 MHz |

Parameter | ASIC Implementation |
---|---|
Process | 65 nm |
Area | 1.49 × 1.23 mm² |
Supply Voltage | 1.08 V |
Gates | 1.37 M |
Memory | 259 KB |
Power | 11.2 mW @ 116 MHz |

Publication | IEEE Access’2022 [16] | IEEE Sensors Journal’2022 [17] | This Work |
---|---|---|---|
FPGA Platform | Zynq-7020 (SoC) (28 nm FPGA) | AX7Z100 (SoC) (28 nm FPGA) | Zynq-7020 (SoC) (28 nm FPGA) |
Algorithm | Conventional CSM | RIA-CSM (Real-Time Impact-Aware CSM) | NLO-CSM |
Abs translational error (m) | 0.0376 ± 0.0307 (Based on MIT-CSAIL dataset) | 0.03185 ± 0.01916 (Based on Revo LDS dataset) | 0.02092 ± 0.02595 (Based on Revo LDS dataset) |
Grid Resolution (cm) | 5 | 5 | 5 |
Frequency (MHz) | 100 | 133 | 100 |
LUTs | 21026 | 13870 | 4142 |
FFs | 20121 | 22747 | 3193 |
BRAM (KB) | 444 | 1962.5 | 297 |
DSPs | 24 | 32 | 19 |
Frame Rate (fps) | 12.13 @100 MHz | 89.58 @133 MHz (67.19 @100 MHz) | 111.29 @100 MHz |
Power Consumption | 2.3 W @100 MHz | 2.113 W @133 MHz (1.58 W @100 MHz) | 0.79 W @100 MHz |
Energy Efficiency | 189.52 mJ/frame | 23.58 mJ/frame | 7.15 mJ/frame |
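
The Energy Efficiency row is energy per processed frame (lower is better), i.e., average power divided by frame rate. The short check below recomputes it from the table's own power and frame-rate entries. The results agree with the reported figures to within about 1%. It also computes the frame-rate ratios of this work against the other designs at the common 100 MHz clock. The dictionary labels are illustrative.

```python
# (power in W, frame rate in fps), copied from the comparison table
designs = {
    "IEEE Access'2022 [16] @100 MHz": (2.3, 12.13),
    "IEEE Sensors Journal'2022 [17] @133 MHz": (2.113, 89.58),
    "This Work @100 MHz": (0.79, 111.29),
}


def energy_per_frame_mj(power_w: float, fps: float) -> float:
    """Energy per processed frame in mJ: (J/s) / (frames/s) * 1000."""
    return power_w / fps * 1000.0


for name, (power_w, fps) in designs.items():
    print(f"{name}: {energy_per_frame_mj(power_w, fps):.2f} mJ/frame")

# Frame-rate ratio of this work over the other designs at the same 100 MHz clock
print(f"{111.29 / 12.13:.2f}x vs [16], {111.29 / 67.19:.2f}x vs [17] scaled to 100 MHz")
```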
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hu, A.; Yu, G.; Wang, Q.; Han, D.; Zhao, S.; Liu, B.; Yu, Y.; Li, Y.; Wang, C.; Zou, X. Efficient Hardware Accelerator Design of Non-Linear Optimization Correlative Scan Matching Algorithm in 2D LiDAR SLAM for Mobile Robots. Sensors 2022, 22, 8947. https://doi.org/10.3390/s22228947
Chicago/Turabian Style: Hu, Ao, Guoyi Yu, Qianjin Wang, Dongxiao Han, Shilun Zhao, Bingqiang Liu, Yu Yu, Yuwen Li, Chao Wang, and Xuecheng Zou. 2022. "Efficient Hardware Accelerator Design of Non-Linear Optimization Correlative Scan Matching Algorithm in 2D LiDAR SLAM for Mobile Robots" Sensors 22, no. 22: 8947. https://doi.org/10.3390/s22228947