Visual Localization and Target Perception Based on Panoptic Segmentation
Abstract
1. Introduction
- A visual localization method with semantic consistency is proposed to improve localization accuracy when the query images and the reference data are captured at different times.
- A segmentation-based matching optimization method is proposed that effectively improves matching accuracy.
- Extensive experiments demonstrate the performance of the proposed method and its superiority over state-of-the-art methods.
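The matching optimization idea in the second contribution can be illustrated with a short sketch. It is not the authors' implementation. It assumes that keypoints and matches come from a matcher such as SuperGlue as (x, y) pixel coordinates and index pairs, and that the panoptic segmentations of the query image and the rendered image are integer label maps sharing one class vocabulary. The function name and the `ignore_labels` option (for discarding classes such as movable objects) are hypothetical.

```python
import numpy as np


def filter_matches_by_semantics(kpts_query, kpts_render, matches,
                                seg_query, seg_render, ignore_labels=()):
    """Keep only matches whose two endpoints carry the same semantic label.

    kpts_query, kpts_render: (N, 2) and (M, 2) float arrays of (x, y) pixels.
    matches: (K, 2) int array of (query_index, render_index) pairs.
    seg_query, seg_render: (H, W) int label maps (class id per pixel).
    ignore_labels: hypothetical set of class ids that may never support a match.
    """
    h_q, w_q = seg_query.shape
    h_r, w_r = seg_render.shape
    # Round matched keypoints to integer pixels and clip them to the image.
    q = np.round(kpts_query[matches[:, 0]]).astype(int)
    r = np.round(kpts_render[matches[:, 1]]).astype(int)
    q[:, 0] = np.clip(q[:, 0], 0, w_q - 1)
    q[:, 1] = np.clip(q[:, 1], 0, h_q - 1)
    r[:, 0] = np.clip(r[:, 0], 0, w_r - 1)
    r[:, 1] = np.clip(r[:, 1], 0, h_r - 1)

    # Label maps are indexed [row, col] = [y, x].
    lab_q = seg_query[q[:, 1], q[:, 0]]
    lab_r = seg_render[r[:, 1], r[:, 0]]

    # Same class on both sides, and that class is not in the ignore set.
    keep = (lab_q == lab_r) & ~np.isin(lab_q, list(ignore_labels))
    return matches[keep]
```

Because the test is a per-match label lookup, it is cheap enough to run on every candidate match before pose estimation.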
2. Related Work
3. Methodology
3.1. Initial Positioning and 2D Image Rendering
3.2. Segmentation-Based Matching Optimization
3.2.1. Panoptic Segmentation
3.2.2. SuperGlue Matching
3.2.3. Matching Optimization
3.3. RANSAC with Semantic Consistency
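One way to read "semantic consistency" inside RANSAC is as a change to how each pose hypothesis is scored. Under that reading, a 2D-3D correspondence counts as an inlier only if it reprojects close to its observation and the semantic label stored with the 3D point agrees with the label of the query pixel. The sketch below covers only this scoring step. It assumes a pinhole camera with intrinsics `K`, a world-to-camera pose `(R, t)`, and hypotheses already generated from minimal samples (e.g., by a P3P solver). The function names and the 8-pixel threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np


def semantic_inliers(R, t, K, pts3d, pts2d, labels3d, labels2d, px_thresh=8.0):
    """Boolean inlier mask for one pose hypothesis (R, t), world-to-camera.

    A correspondence is an inlier only if (a) the 3D point lies in front of
    the camera, (b) its reprojection error is below px_thresh pixels, and
    (c) the label stored with the 3D point equals the query pixel's label.
    """
    cam = pts3d @ R.T + t                    # world -> camera coordinates, (N, 3)
    in_front = cam[:, 2] > 0
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = cam @ K.T                     # homogeneous pixel coordinates
        uv = proj[:, :2] / proj[:, 2:3]      # perspective division
        err = np.linalg.norm(uv - pts2d, axis=1)
    geometric = in_front & (err < px_thresh)
    semantic = np.asarray(labels3d) == np.asarray(labels2d)
    return geometric & semantic


def best_hypothesis(hypotheses, K, pts3d, pts2d, labels3d, labels2d, px_thresh=8.0):
    """Score each (R, t) hypothesis by its number of semantically consistent inliers."""
    scores = [int(semantic_inliers(R, t, K, pts3d, pts2d, labels3d, labels2d,
                                   px_thresh).sum())
              for R, t in hypotheses]
    return int(np.argmax(scores)), scores
```

In a full RANSAC loop, the best-scoring hypothesis would then be refined on its inlier set. Matches that are geometrically plausible but semantically inconsistent no longer inflate a wrong hypothesis's score.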
3.4. Target Perception
4. Experimental Details
4.1. Experimental Data
4.2. Evaluation Measures
4.3. RobotCar Seasons Dataset
4.4. Self-Collected Dataset
4.5. Target Perception
4.6. Timing Analysis
5. Discussion
5.1. Matching Optimization
5.2. RANSAC with Semantic Consistency
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, Y.; Zhang, Z.; Tong, X.; Ji, S.; Yu, Y.; Lai, G. Progress and challenges of geospatial artificial intelligence. Acta Geod. Et Cartogr. Sin. 2021, 50, 1137–1146.
- Yu, K.; Eck, U.; Pankratz, F.; Lazarovici, M.; Wilhelm, D.; Navab, N. Duplicated Reality for Co-located Augmented Reality Collaboration. IEEE Trans. Vis. Comput. Graph. 2022, 28, 2190–2200.
- Wang, Y. Multi-Sensor Fusion Tracking Algorithm Based on Augmented Reality System. IEEE Sens. J. 2021, 21, 25010–25017.
- Herb, M.; Lemberger, M.; Schmitt, M.M.; Kurz, A.; Weiherer, T.; Navab, N.; Tombari, F. Semantic Image Alignment for Vehicle Localization. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1124–1131.
- Guo, C.; Lin, M.; Guo, H.; Liang, P.; Cheng, E. Coarse-to-fine Semantic Localization with HD Map for Autonomous Driving in Structural Scenes. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1146–1153.
- Kälin, U.; Staffa, L.; Grimm, D.E.; Wendt, A. Highly Accurate Pose Estimation as a Reference for Autonomous Vehicles in Near-Range Scenarios. Remote Sens. 2022, 14, 90.
- Chen, Z.; Jacobson, A.; Sünderhauf, N.; Upcroft, B.; Liu, L.; Shen, C.; Reid, I.D.; Milford, M. Deep learning features at scale for visual place recognition. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017.
- Sünderhauf, N.; Shirazi, S.; Jacobson, A.; Dayoub, F.; Pepperell, E.; Upcroft, B.; Milford, M. Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free. In Proceedings of the Robotics: Science and Systems Conference, Rome, Italy, 13–17 July 2015.
- Sattler, T.; Torii, A.; Sivic, J.; Pollefeys, M.; Taira, H.; Okutomi, M.; Pajdla, T. Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization? IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 814–829.
- Arandjelović, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1437–1451.
- Wang, R.; Wan, W.; Di, K.; Chen, R.; Feng, X. A High-Accuracy Indoor-Positioning Method with Automated RGB-D Image Database Construction. Remote Sens. 2019, 11, 2572.
- Wilson, D.; Alshaabi, T.; Van Oort, C.; Zhang, X.; Nelson, J.; Wshah, S. Object Tracking and Geo-Localization from Street Images. Remote Sens. 2022, 14, 2575.
- Svärm, L.; Enqvist, O.; Kahl, F.; Oskarsson, M. City-Scale Localization for Cameras with Known Vertical Direction. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1455–1461.
- Liu, L.; Li, H.; Dai, Y. Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2391–2400.
- Sattler, T.; Leibe, B.; Kobbelt, L. Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1744–1756.
- Germain, H.; Bourmaud, G.; Lepetit, V. Sparse-to-Dense Hypercolumn Matching for Long-Term Visual Localization. In Proceedings of the 2019 International Conference on 3D Vision (3DV), Quebec City, QC, Canada, 16–19 September 2019; pp. 513–523.
- Shi, T.; Shen, S.; Gao, X.; Zhu, L. Visual Localization Using Sparse Semantic 3D Map. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 315–319.
- Gridseth, M.; Barfoot, T.D. Keeping an Eye on Things: Deep Learned Features for Long-Term Visual Localization. IEEE Robot. Autom. Lett. 2022, 7, 1016–1023.
- Spencer, J.; Bowden, R.; Hadfield, S. Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6458–6467.
- Baatz, G.; Saurer, O.; Köser, K.; Pollefeys, M. Large Scale Visual Geo-Localization of Images in Mountainous Terrain. In European Conference on Computer Vision; Springer: Berlin, Germany, 2012; pp. 517–530.
- Deng, C.; You, X.; Zhi, M. Evaluation on Geo-registration Accuracy of Outdoor Augmented Reality. J. Syst. Simul. 2020, 32, 1693–1704.
- He, X.; Pan, S.; Gao, W.; Lu, X. LiDAR-Inertial-GNSS Fusion Positioning System in Urban Environment: Local Accurate Registration and Global Drift-Free. Remote Sens. 2022, 14, 2104.
- Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-Free Local Feature Matching with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8918–8927.
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 337–33712.
- Toft, C.; Stenborg, E.; Hammarstrand, L.; Brynte, L.; Pollefeys, M.; Sattler, T.; Kahl, F. Semantic match consistency for long-term visual localization. In European Conference on Computer Vision; Springer: Berlin, Germany, 2018; pp. 391–408.
- Stenborg, E.; Toft, C.; Hammarstrand, L. Long-Term Visual Localization Using Semantically Segmented Images. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 6484–6490.
- Sattler, T.; Maddern, W.; Toft, C.; Torii, A.; Hammarstrand, L.; Stenborg, E.; Safari, D.; Okutomi, M.; Pollefeys, M.; Sivic, J.; et al. Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8601–8610.
- Cheng, B.; Schwing, A.G.; Kirillov, A. Per-Pixel Classification is Not All You Need for Semantic Segmentation. Adv. Neural Inf. Process. Syst. 2021, 34, 17864–17875.
- Sarlin, P.-E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning Feature Matching with Graph Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4937–4946.
- Toft, C.; Maddern, W.; Torii, A.; Hammarstrand, L.; Stenborg, E.; Safari, D.; Okutomi, M.; Pollefeys, M.; Sivic, J.; Pajdla, T.; et al. Long-Term Visual Localization Revisited. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2074–2088.
- Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-Net: A Trainable CNN for Joint Description and Detection of Local Features. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8084–8093.
- Revaud, J.; De Souza, C.; Humenberger, M.; Weinzaepfel, P. R2D2: Reliable and Repeatable Detector and Descriptor. Adv. Neural Inf. Process. Syst. 2019, 32, 12405–12415.
- Hu, H.; Wang, H.; Liu, Z.; Chen, W. Domain-Invariant Similarity Activation Map Contrastive Learning for Retrieval-Based Long-Term Visual Localization. IEEE/CAA J. Autom. Sin. 2022, 9, 313–328.
- Shi, T.; Cui, H.; Song, Z.; Shen, S. Dense Semantic 3D Map Based Long-Term Visual Localization with Hybrid Features. arXiv 2020, arXiv:2005.10766.
- Fan, H.; Zhou, Y.; Li, A.; Gao, S.; Li, J.; Guo, Y. Visual Localization Using Semantic Segmentation and Depth Prediction. arXiv 2020, arXiv:2005.11922.
- Larsson, M.; Stenborg, E.; Toft, C.; Hammarstrand, L.; Sattler, T.; Kahl, F. Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 31–41.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers; Springer: Cham, Switzerland, 2020.
- Zhai, R.; Yuan, Y. A Method of Vision Aided GNSS Positioning Using Semantic Information in Complex Urban Environment. Remote Sens. 2022, 14, 869.
- Kirillov, A.; He, K.; Girshick, R.; Rother, C.; Dollár, P. Panoptic Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9396–9405.
- Cheng, B.; Collins, M.D.; Zhu, Y.; Liu, T.; Huang, T.S.; Adam, H.; Chen, L.C. Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 12472–12482.
- Tian, Z.; Zhang, B.; Chen, H.; Shen, C. Instance and Panoptic Segmentation Using Conditional Convolutions. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 2022, 3145407.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Diaz-Zapata, M.; Erkent, Ö.; Laugier, C. YOLO-based Panoptic Segmentation Network. In Proceedings of the 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), Madrid, Spain, 12–16 July 2021; pp. 1230–1234.
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
Method | Dawn | Dusk | Overcast Summer | Overcast Winter | Rain | Snow | Sun | Average |
---|---|---|---|---|---|---|---|---|
Threshold (m / °) | 0.25/0.50/5.0 m; 2/5/10° | 0.25/0.50/5.0 m; 2/5/10° | 0.25/0.50/5.0 m; 2/5/10° | 0.25/0.50/5.0 m; 2/5/10° | 0.25/0.50/5.0 m; 2/5/10° | 0.25/0.50/5.0 m; 2/5/10° | 0.25/0.50/5.0 m; 2/5/10° | 0.25/0.50/5.0 m; 2/5/10° |
Image-retrieval-based methods | | | | | | | | |
DenseVLAD | 15.4/45.8/97.4 | 7.6/35.5/98.5 | 9.0/30.3/88.2 | 2.4/28.0/97.6 | 13.7/50.7/100 | 10.2/38.1/93.5 | 8.0/22.3/78.1 | 9.5/35.8/93.3 |
NetVLAD | 10.1/28.2/87.7 | 4.6/25.4/97.5 | 10.0/35.1/97.6 | 2.4/28.0/100 | 12.2/46.8/100 | 8.8/32.6/95.3 | 8.5/22.8/88.8 | 8.1/31.3/95.1 |
Structure-based methods | | | | | | | | |
Active Search v1.1 | 53.7/94.7/100 | 72.1/94.9/100 | 40.3/90.0/100 | 43.9/98.2/100 | 78.5/93.7/100 | 63.7/97.2/100 | 50.0/76.3/98.2 | 57.4/92.1/99.7 |
CityScaleLocalization | 54.2/89.4/96.9 | 75.1/95.4/100 | 37.4/82.9/91.5 | 48.2/96.3/100 | 73.7/94.6/100 | 61.4/94.9/97.2 | 33.9/52.7/71.0 | 54.8/86.6/93.8 |
Hierarchical methods | | | | | | | | |
Hierarchical-Localization | 60.4/91.6/99.6 | 70.1/95.9/100 | 52.1/93.4/99.5 | 54.3/98.8/100 | 77.1/93.7/100 | 69.3/97.7/100 | 60.3/82.6/96.0 | 63.4/93.4/99.3 |
DenseVLAD & D2-Net | 56.4/93.8/99.6 | 77.2/95.9/100 | 43.6/91.0/98.1 | 53.7/96.3/100 | 77.1/94.6/100 | 64.7/95.8/99.1 | 56.2/79.0/92.4 | 62.3/92.3/95.5 |
Methods with semantics | | | | | | | | |
VLU Sparse Semantic 3D Map | 58.1/93.8/99.1 | 76.6/95.4/100 | 47.4/94.3/100 | 51.2/98.8/100 | 78.5/94.6/100 | 65.1/97.7/100 | 55.4/83.0/99.1 | 61.8/93.9/99.7 |
Semantic Match Consistency | 56.4/94.7/100 | 72.6/94.9/100 | 44.5/93.8/100 | 47.6/95.7/100 | 78.0/94.6/100 | 60.5/97.7/100 | 52.2/80.8/100 | 58.8/93.2/100 |
Semantic 3D Map & Hybrid Features | 59.1/83.2/95.9 | 55.1/80.3/95.4 | 43.8/79.3/99.8 | 47.9/83.1/96.7 | 63.2/84.1/98.6 | 61.6/85.9/96.3 | 51.5/77.6/95.7 | 47.2/81.9/96.9 |
Ours | 63.5/94.9/100 | 79.9/95.9/100 | 52.7/95.4/99.8 | 79.1/98.1/100 | 81.1/95.7/100 | 67.9/97.9/100 | 62.7/83.3/99.0 | 69.6/94.4/99.8 |
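The entries above follow the usual convention for this kind of benchmark. Each entry is the percentage of query images whose estimated pose lies within both a position threshold and a rotation threshold, evaluated at (0.25 m, 2°), (0.5 m, 5°), and (5.0 m, 10°). The following is a minimal sketch of that measure. It assumes a world-to-camera pose convention, defines position error as the distance between estimated and ground-truth camera centers, and defines rotation error as the angle of the relative rotation. The function names are hypothetical.

```python
import numpy as np


def position_error_m(R_est, t_est, R_gt, t_gt):
    # Distance between camera centers c = -R^T t (world-to-camera poses assumed).
    c_est = -R_est.T @ t_est
    c_gt = -R_gt.T @ t_gt
    return float(np.linalg.norm(c_est - c_gt))


def rotation_error_deg(R_est, R_gt):
    # Angle of the relative rotation R_est^T R_gt, in degrees.
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def localization_recall(pos_err_m, rot_err_deg,
                        thresholds=((0.25, 2.0), (0.5, 5.0), (5.0, 10.0))):
    # Percentage of queries whose position AND rotation errors are both
    # within each (meters, degrees) threshold pair.
    pos = np.asarray(pos_err_m, dtype=float)
    rot = np.asarray(rot_err_deg, dtype=float)
    return [float(100.0 * np.mean((pos <= t) & (rot <= r))) for t, r in thresholds]


# Usage with four hypothetical queries.
print(localization_recall([0.1, 0.3, 0.6, 10.0], [1.0, 1.0, 3.0, 1.0]))  # [25.0, 50.0, 75.0]
```

Because the thresholds are nested, the three numbers in each cell can only stay the same or increase from left to right.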
Stage | Panoptic Segmentation and SuperGlue Matching | Matching Point Filtering | RANSAC with Semantic Consistency | Pose Estimation and Target Perception |
---|---|---|---|---|
Average time | 0.47 s | 0.94 s | 3.49 s | 0.17 s |
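If the four stages run one after another for each query image, the average per-query time is the sum of the stage times. Under that assumption, the semantic-consistency RANSAC stage accounts for most of the total:

$$
T_{\text{total}} \approx 0.47 + 0.94 + 3.49 + 0.17 = 5.07\ \text{s}, \qquad \frac{T_{\text{RANSAC}}}{T_{\text{total}}} \approx \frac{3.49}{5.07} \approx 68.8\%
$$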
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lv, K.; Zhang, Y.; Yu, Y.; Zhang, Z.; Li, L. Visual Localization and Target Perception Based on Panoptic Segmentation. Remote Sens. 2022, 14, 3983. https://doi.org/10.3390/rs14163983