Multi-Scale Attentive Aggregation for LiDAR Point Cloud Segmentation
Abstract
1. Introduction
1.1. Background
1.2. Related Work
1.3. Our Work
- (1) An Attentive Skip Connection (ASC) module based on the attention mechanism was proposed to replace the traditional skip connection and bridge the semantic gap between point cloud features in the encoder and decoder.
- (2) A multi-scale aggregation scheme was introduced to fuse point cloud features of different scales, drawn not only from the decoder but also from the encoder.
- (3) A Channel Attentive Enhancement (CAE) module was introduced into the local spatial encoding module of RandLA-Net [23] to further increase the representation ability of local features (an illustrative sketch of the ASC and CAE ideas follows this list).
- (4) Our MSAAN significantly outperformed state-of-the-art methods on the CSPC and Toronto3D datasets, improving the mean intersection over union (mIoU) score by at least 5%.
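
To make contributions (1) and (3) concrete, the sketch below gives a minimal PyTorch-style implementation of an attention-gated skip connection and a channel attention block. This is not the authors' released code: the class names (`AttentiveSkipConnection`, `ChannelAttentiveEnhancement`), the squeeze-and-excitation style gating, and the layer sizes are illustrative assumptions based only on the descriptions above. Contribution (2) would, in addition, upsample point features from several encoder and decoder levels to a common resolution and concatenate them before the final classification layers.

```python
# Minimal sketch of the two attention ideas described above (illustrative only).
# Point features follow the (B, C, N) convention: batch, channels, points.
import torch
import torch.nn as nn


class ChannelAttentiveEnhancement(nn.Module):
    """Squeeze-and-excitation style channel attention over point features
    (a hypothetical reading of the CAE module in contribution (3))."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, N)
        w = self.gate(x.mean(dim=-1))   # (B, C) channel descriptor -> channel weights
        return x * w.unsqueeze(-1)      # re-weight every point's channels


class AttentiveSkipConnection(nn.Module):
    """Attention-gated fusion of an encoder feature map with the corresponding
    decoder feature map, replacing plain concatenation (contribution (1))."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv1d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv1d(2 * channels, channels, kernel_size=1)

    def forward(self, enc: torch.Tensor, dec: torch.Tensor) -> torch.Tensor:
        # enc, dec: (B, C, N) features at the same resolution level
        a = self.gate(torch.cat([enc, dec], dim=1))        # per-point, per-channel gate in (0, 1)
        gated_enc = a * enc                                # suppress encoder channels that disagree with the decoder
        return self.fuse(torch.cat([gated_enc, dec], dim=1))


if __name__ == "__main__":
    enc = torch.randn(2, 64, 1024)
    dec = torch.randn(2, 64, 1024)
    out = AttentiveSkipConnection(64)(ChannelAttentiveEnhancement(64)(enc), dec)
    print(out.shape)  # torch.Size([2, 64, 1024])
```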
2. Methods
2.1. Backbone of the Encoder
2.2. PFE (Point Feature Enrichment) Module
2.3. LAE (Local Attention Enhancement) Module
2.4. ASC (Attentive Skip Connection) Module
2.5. Multi-Scale Aggregation
3. Experiments and Analysis
3.1. Experiment Design
3.2. Experiments and Analysis
3.3. Ablation Study
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Zhang, J.X.; Lin, X.G.; Ning, X.G. SVM-based classification of segmented airborne LiDAR point clouds in urban areas. Remote Sens. 2013, 5, 3749–3775.
- Chehata, N.; Li, G.; Mallet, C. Airborne LiDAR feature selection for urban classification using random forests. Geomat. Inf. Sci. Wuhan Univ. 2009, 38, 207–212.
- Zhuang, Y.; Liu, Y.; He, G.; Wang, W. Contextual classification of 3D laser points with conditional random fields in urban environments. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Hamburg, Germany, 28 September–2 October 2015; pp. 3908–3913.
- Lu, Y.; Rasmussen, C. Simplified Markov random fields for efficient semantic labeling of 3D point clouds. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, 7–12 October 2012; pp. 2690–2697.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Lawin, F.J.; Danelljan, M.; Tosteberg, P.; Bhat, G.; Khan, F.S.; Felsberg, M. Deep projective 3D semantic segmentation. In Proceedings of the 17th International Conference on Computer Analysis of Images and Patterns, Ystad, Sweden, 22–24 August 2017; pp. 95–107.
- Boulch, A.; Saux, B.L.; Audebert, N. Unstructured point cloud semantic labeling using deep segmentation networks. In Eurographics Workshop on 3D Object Retrieval; The Eurographics Association: Geneva, Switzerland, 2017.
- Wu, B.; Wan, A.; Yue, X.; Keutzer, K. SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud. arXiv 2017, arXiv:1710.07368.
- Wu, B.; Zhou, X.; Zhao, S.; Yue, X.; Keutzer, K. SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud. arXiv 2018, arXiv:1809.08495.
- Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. RangeNet++: Fast and Accurate LiDAR Semantic Segmentation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 4213–4220.
- Meng, H.Y.; Gao, L.; Lai, Y.; Manocha, D. VV-Net: Voxel VAE Net with Group Convolutions for Point Cloud Segmentation. arXiv 2018, arXiv:1811.04337.
- Rethage, D.; Wald, J.; Sturm, J.; Navab, N.; Tombari, F. Fully-convolutional point networks for large-scale point clouds. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 596–611.
- Graham, B.; Engelcke, M.; van der Maaten, L. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
- Su, H.; Jampani, V.; Sun, D.; Maji, S.; Kalogerakis, E.; Yang, M.-H.; Kautz, J. SPLATNet: Sparse lattice networks for point cloud processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–21 June 2018.
- Rosu, R.A.; Schutt, P.; Quenzel, J.; Behnke, S. LatticeNet: Fast Point Cloud Segmentation Using Permutohedral Lattices. arXiv 2019, arXiv:1912.05905.
- Dai, A.; Nießner, M. 3DMV: Joint 3D-multi-view prediction for 3D semantic scene segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 452–468.
- Jaritz, M.; Gu, J.; Su, H. Multi-view PointNet for 3D Scene Understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Seoul, Korea, 1 October 2019; pp. 3995–4003.
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 3–9 December 2017.
- Jiang, M.; Wu, Y.; Zhao, T.; Zhao, Z.; Lu, C. PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation. arXiv 2018, arXiv:1807.00652.
- Zhao, H.; Jiang, L.; Fu, C.W.; Jia, J. PointWeb: Enhancing local neighborhood features for point cloud processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 5565–5573.
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 11108–11117.
- Li, Y.Y.; Bu, R.; Sun, M.C.; Wu, W.; Di, X.H.; Chen, B.Q. PointCNN: Convolution on X-Transformed Points. Adv. Neural Inf. Process. Syst. 2018, 31, 820–830.
- Thomas, H.; Qi, C.R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. KPConv: Flexible and Deformable Convolution for Point Clouds. arXiv 2019, arXiv:1904.08889.
- Ye, X.; Li, J.; Huang, H.; Du, L.; Zhang, X. 3D Recurrent neural networks with context fusion for point cloud semantic segmentation. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 415–430.
- Landrieu, L.; Simonovsky, M. Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4558–4567.
- Wang, L.; Huang, Y.; Hou, Y.; Zhang, S.; Shan, J. Graph Attention Convolution for Point Cloud Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 10296–10305.
- Bello, I.; Zoph, B.; Vaswani, A.; Shlens, J.; Le, Q.V. Attention augmented convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 3286–3295.
- Chen, L.; Zhang, H.; Xiao, J.; Nie, L.; Shao, J.; Liu, W.; Chua, T.-S. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5659–5667.
- Li, H.; Xiong, P.; An, J.; Wang, L. Pyramid attention network for semantic segmentation. arXiv 2018, arXiv:1805.10180.
- Fan, L.; Wang, W.C.; Zha, F.; Yan, J. Exploring new backbone and attention module for semantic segmentation in street scenes. IEEE Access 2018, 6, 71566–71580.
- Wang, X.; He, J.; Ma, L. Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–15 December 2019; pp. 4573–4583.
- Jia, M.; Li, A.; Wu, Z. A Global Point-SIFT Attention Network for 3D Point Cloud Semantic Segmentation. In Proceedings of the International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 5065–5068.
- Ji, S.; Wei, S.; Lu, M. A scale robust convolutional neural network for automatic building extraction from aerial and satellite imagery. Int. J. Remote Sens. 2018, 40, 3308–3322.
- Wei, S.; Ji, S.; Lu, M. Toward Automatic Building Footprint Delineation from Aerial Images Using CNN and Regularization. IEEE Trans. Geosci. Remote Sens. 2020, 58, 2178–2189.
- Pintore, G.; Agus, M.; Gobbetti, E. AtlantaNet: Inferring the 3D Indoor Layout from a Single 360° Image Beyond the Manhattan World Assumption. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020.
- Tong, G.; Li, Y.; Chen, D.; Sun, Q.; Cao, W.; Xiang, G. CSPC-Dataset: New LiDAR Point Cloud Dataset and Benchmark for Large-scale Semantic Segmentation. IEEE Access 2020, 8, 87695–87718.
- Tan, W.; Qin, N.; Ma, L.; Li, Y.; Du, J.; Cai, G.; Yang, K.; Li, J. Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways. arXiv 2020, arXiv:2003.08284.
- Boulch, A.; Guerry, J.; Le Saux, B.; Audebert, N. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks. Comput. Graph. 2017, 71, 189–198.
- Huang, J.; You, S. Point cloud labeling using 3D Convolutional Neural Network. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancún, Mexico, 4–8 December 2016; pp. 2670–2675.
- Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark. arXiv 2017, arXiv:1704.03847.
- Wang, Y.; Sun, Y.B.; Liu, Z.W.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 1–12.
- Ma, L.F.; Li, Y.; Li, J.; Tan, W.K.; Yu, Y.T.; Chapman, M. Multi-scale Point-wise Convolutional Neural Networks for 3D Object Segmentation from LiDAR Point Clouds in Large-scale Environments. IEEE Trans. Intell. Transport. Syst. 2019, 99, 1–16.
- Li, Y.; Ma, L.; Zhong, Z.; Cao, D.; Li, J. TGNet: Geometric Graph CNN on 3D Point Cloud Segmentation. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3588–3600.
Number of points per class in each of the five CSPC scenes.

Scenes | Ground | Building | Car | Bridge | Vegetation | Pole | Total |
---|---|---|---|---|---|---|---|
Scene-1 | 6,082,987 | 9,032,520 | 651,442 | 0 | 641,970 | 24,034 | 16,433,953 |
Scene-2 | 4,358,082 | 3,992,075 | 525,815 | 90,637 | 257,708 | 43,930 | 9,268,247 |
Scene-3 | 8,736,662 | 599,645 | 469,273 | 97,712 | 163,830 | 46,579 | 15,510,510 |
Scene-4 | 10,282,388 | 835,169 | 71,577 | 0 | 5,116,352 | 8285 | 16,323,771 |
Scene-5 | 5,332,925 | 4,197,404 | 34,960 | 0 | 322,488 | 49,397 | 9,937,174 |
Per-class point statistics for the four sections (L001–L004) of the Toronto3D dataset.

Section | Road | Road Marking | Natural | Building | Utility Line | Pole | Car | Fence |
---|---|---|---|---|---|---|---|---|
L001 | 11,178 | 433 | 1408 | 6037 | 210 | 263 | 1564 | 83 |
L002 | 6353 | 301 | 1942 | 866 | 84 | 155 | 199 | 24 |
L003 | 20,587 | 786 | 1908 | 11,672 | 332 | 408 | 1969 | 300 |
L004 | 3738 | 281 | 1310 | 525 | 37 | 71 | 200 | 4 |
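
The comparison tables that follow report per-class intersection over union (IoU), mean IoU (mIoU), and overall accuracy (OA), all in percent. Assuming the standard definitions used in point cloud segmentation benchmarks, with $TP_c$, $FP_c$, and $FN_c$ the numbers of true positive, false positive, and false negative points of class $c$, $C$ the number of classes, and $N$ the total number of evaluated points:

$$
\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c}, \qquad
\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{IoU}_c, \qquad
\mathrm{OA} = \frac{1}{N}\sum_{c=1}^{C} TP_c .
$$

In the tables, mIoU is the unweighted mean of the per-class IoU columns; for example, in the first results table the proposed method obtains (89.7 + 88.2 + 61.0 + 63.2 + 20.6 + 64.0)/6 ≈ 64.5.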
Quantitative results on a CSPC test scene containing a bridge: per-class IoU, mIoU, and OA (%).

Network | Ground | Building | Car | Vegetation | Pole | Bridge | mIoU | OA |
---|---|---|---|---|---|---|---|---|
SnapNet [40] | 42.8 | 43.9 | 6.0 | 10.8 | 0.0 | 0.0 | 17.3 | 54.8 |
PointNet++ [20] | 46.9 | 47.7 | 5.9 | 0.5 | 0.0 | 0.0 | 16.8 | 56.9 |
3D CNN [41] | 78.2 | 90.5 | 1.3 | 5.4 | 0.5 | 0.2 | 19.2 | 58.4 |
DeepNet [42] | 79.9 | 35.3 | 8.7 | 8.6 | 0.3 | 0.0 | 22.2 | 61.2 |
KPConv [25] | 94.1 | 87.8 | 66.6 | 77.5 | 0.0 | 0.0 | 54.3 | 93.6 |
RandLA-Net [23] | 85.6 | 84.3 | 48.6 | 63.6 | 11.5 | 1.4 | 49.2 | 87.7 |
Ours | 89.7 | 88.2 | 61.0 | 63.2 | 20.6 | 64.0 | 64.5 | 91.9 |
Quantitative results on a CSPC test scene without a bridge ("-" indicates the class is absent): per-class IoU, mIoU, and OA (%).

Network | Ground | Building | Car | Vegetation | Pole | Bridge | mIoU | OA |
---|---|---|---|---|---|---|---|---|
SnapNet [40] | 40.2 | 38.4 | 0.2 | 8.4 | 0.0 | - | 17.5 | 52.3 |
PointNet++ [20] | 47.2 | 48.0 | 5.9 | 0.6 | 0.0 | - | 20.3 | 57.1 |
3D CNN [41] | 71.0 | 56.5 | 1.3 | 9.1 | 1.5 | - | 27.9 | 69.9 |
DeepNet [42] | 71.3 | 44.9 | 0.9 | 10.6 | 0.5 | - | 25.6 | 63.3 |
KPConv [25] | 87.5 | 88.7 | 63.2 | 54.8 | 0.0 | - | 58.8 | 92.4 |
RandLA-Net [23] | 90.6 | 89.3 | 32.9 | 48.0 | 22.1 | - | 56.6 | 92.7 |
Ours | 92.0 | 90.9 | 39.2 | 52.0 | 34.7 | - | 61.8 | 93.9 |
Quantitative results on the Toronto3D dataset: per-class IoU, mIoU, and OA (%).

Network | Road | Road Marking | Natural | Utility Line | Building | Pole | Car | Fence | mIoU | OA |
---|---|---|---|---|---|---|---|---|---|---|
PointNet++ [20] | 91.4 | 7.6 | 89.8 | 68.6 | 74.0 | 59.5 | 54.0 | 7.5 | 56.6 | 91.2 |
DGCNN [43] | 90.6 | 0.4 | 81.3 | 47.1 | 64.0 | 53.9 | 49.3 | 7.3 | 49.6 | 89.0 |
KPConv [25] | 90.2 | 0.0 | 86.8 | 81.1 | 86.8 | 73.1 | 42.9 | 21.6 | 60.3 | 91.7 |
MS-PCNN [44] | 91.2 | 3.5 | 90.5 | 62.3 | 77.3 | 68.5 | 53.6 | 17.1 | 58.0 | 91.5 |
TGNet [45] | 91.4 | 10.6 | 91.0 | 68.3 | 76.9 | 66.3 | 54.1 | 8.2 | 58.3 | 91.6 |
RandLA [23] | 93.8 | 49.0 | 93.4 | 79.6 | 83.5 | 62.7 | 76.8 | 8.5 | 68.4 | 93.5 |
Ours | 96.1 | 59.9 | 94.4 | 85.8 | 85.4 | 77.0 | 83.7 | 17.7 | 75.0 | 95.9 |
Ablation study on the CSPC dataset: per-class IoU, mIoU, and OA (%). MS: multi-scale aggregation; CA: channel attentive enhancement; ASC: attentive skip connection.

Network | Ground | Building | Car | Vegetation | Pole | Bridge | mIoU | OA |
---|---|---|---|---|---|---|---|---|
RandLA | 85.6 | 84.3 | 48.6 | 63.6 | 11.5 | 1.4 | 49.2 | 87.7 |
RandLA + MS (decoder) | 88.5 | 88.5 | 47.6 | 59.0 | 23.6 | 7.9 | 52.5 | 91.2 |
RandLA + MS (ours) | 89.1 | 88.1 | 49.6 | 53.1 | 19.7 | 25.8 | 54.2 | 91.2 |
RandLA + MS + CA | 91.8 | 85.2 | 67.6 | 38.5 | 42.1 | 8.3 | 55.6 | 91.0 |
RandLA + MS + CA + ASC | 92.0 | 90.7 | 71.0 | 69.7 | 33.9 | 8.4 | 60.9 | 94.1 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).