Unified Depth-Guided Feature Fusion and Reranking for Hierarchical Place Recognition
Abstract
1. Introduction
2. Related Works
2.1. Visual Place Recognition
2.2. Graph Matching
3. Method
3.1. Acquisition of Depth Images
3.2. Acquisition of Global Features
3.3. Discrete Wavelet Transform Fusion (DWTF) Module
3.4. Spiking Neuron Graph Matching (SNGM) Module
4. Experiment
4.1. Experimental Preparation
4.1.1. Datasets
4.1.2. Compared Methods
4.1.3. Evaluation Metrics
4.1.4. Implementation Details
4.2. Comparisons with State-of-the-Art Methods
4.2.1. VPR Benchmarks
4.2.2. Latency and Memory
4.2.3. Loop Closure Detection Performance
4.2.4. Ablation Studies
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Lowry, S.; Sünderhauf, N.; Newman, P.; Leonard, J.J.; Cox, D.; Corke, P.; Milford, M.J. Visual place recognition: A survey. IEEE Trans. Robot. 2015, 32, 1–19.
2. Zaffar, M.; Garg, S.; Milford, M.; Kooij, J.; Flynn, D.; McDonald-Maier, K.; Ehsan, S. Vpr-bench: An open-source visual place recognition evaluation framework with quantifiable viewpoint and appearance change. Int. J. Comput. Vis. 2021, 129, 2136–2174.
3. Zhang, X.; Wang, L.; Su, Y. Visual place recognition: A survey from deep learning perspective. Pattern Recognit. 2021, 113, 107760.
4. Masone, C.; Caputo, B. A survey on deep visual place recognition. IEEE Access 2021, 9, 19516–19547.
5. Schubert, S.; Neubert, P. What makes visual place recognition easy or hard? arXiv 2021, arXiv:2106.12671.
6. Garg, S.; Fischer, T.; Milford, M. Where is your place, visual place recognition? arXiv 2021, arXiv:2103.06443.
7. Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5297–5307.
8. Jin Kim, H.; Dunn, E.; Frahm, J.M. Learned contextual feature reweighting for image geo-localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2136–2145.
9. Ge, Y.; Wang, H.; Zhu, F.; Zhao, R.; Li, H. Self-supervising fine-grained region similarities for large-scale image localization. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 369–386.
10. Peng, G.; Zhang, J.; Li, H.; Wang, D. Attentional pyramid pooling of salient visual residuals for place recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 885–894.
11. Peng, G.; Yue, Y.; Zhang, J.; Wu, Z.; Tang, X.; Wang, D. Semantic reinforced attention learning for visual place recognition. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 13415–13422.
12. Ali-bey, A.; Chaib-draa, B.; Giguère, P. Mixvpr: Feature mixing for visual place recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 2998–3007.
13. Ali-bey, A.; Chaib-draa, B.; Giguère, P. Gsv-cities: Toward appropriate supervised visual place recognition. Neurocomputing 2022, 513, 194–203.
14. Ali-bey, A.; Chaib-draa, B.; Giguère, P. BoQ: A place is worth a bag of learnable queries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 17794–17803.
15. Hausler, S.; Garg, S.; Xu, M.; Milford, M.; Fischer, T. Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14141–14152.
16. Wang, R.; Shen, Y.; Zuo, W.; Zhou, S.; Zheng, N. TransVPR: Transformer-based place recognition with multi-level attention aggregation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13648–13657.
17. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395.
18. Zhu, S.; Yang, L.; Chen, C.; Shah, M.; Shen, X.; Wang, H. R2former: Unified retrieval and reranking transformer for place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–27 June 2023; pp. 19370–19380.
19. Lu, F.; Zhang, L.; Lan, X.; Dong, S.; Wang, Y.; Yuan, C. Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
20. Lu, F.; Dong, S.; Zhang, L.; Liu, B.; Lan, X.; Jiang, D.; Yuan, C. Deep homography estimation for visual place recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 10341–10349.
21. Dharmasiri, T.; Spek, A.; Drummond, T. Eng: End-to-end neural geometry for robust depth and pose estimation using cnns. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 625–642.
22. Sun, L.C.; Bhatt, N.P.; Liu, J.C.; Fan, Z.; Wang, Z.; Humphreys, T.E.; Topcu, U. Mm3dgs slam: Multi-modal 3d gaussian splatting for slam using vision, depth, and inertial measurements. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 10159–10166.
23. Yuan, J.; Zhu, S.; Tang, K.; Sun, Q. ORB-TEDM: An RGB-D SLAM approach fusing ORB triangulation estimates and depth measurements. IEEE Trans. Instrum. Meas. 2022, 71, 1–15.
24. Piasco, N.; Sidibé, D.; Gouet-Brunet, V.; Demonceaux, C. Learning scene geometry for visual localization in challenging conditions. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 9094–9100.
25. Piasco, N.; Sidibé, D.; Gouet-Brunet, V.; Demonceaux, C. Improving image description with auxiliary modality for visual localization in challenging conditions. Int. J. Comput. Vis. 2021, 129, 185–202.
26. Garg, S.; Babu, M.; Dharmasiri, T.; Hausler, S.; Suenderhauf, N.; Kumar, S.; Drummond, T.; Milford, M. Look no deeper: Recognizing places from opposing viewpoints under varying scene appearance using single-view depth estimation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4916–4923.
27. Hu, H.; Qiao, Z.; Cheng, M.; Liu, Z.; Wang, H. Dasgil: Domain adaptation for semantic and geometric-aware image-based localization. IEEE Trans. Image Process. 2020, 30, 1342–1353.
28. Khaliq, A.; Xu, M.; Hausler, S.; Milford, M.; Garg, S. VLAD-BuFF: Burst-aware fast feature aggregation for visual place recognition. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 447–466.
29. Berton, G.; Masone, C.; Caputo, B. Rethinking visual geo-localization for large-scale applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4878–4888.
30. Ali-bey, A.; Chaib-draa, B.; Giguère, P. Global proxy-based hard mining for visual place recognition. arXiv 2023, arXiv:2302.14217.
31. Berton, G.; Trivigno, G.; Caputo, B.; Masone, C. Eigenplaces: Training viewpoint robust models for visual place recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 11080–11090.
32. Leyva-Vallina, M.; Strisciuglio, N.; Petkov, N. Data-efficient large scale place recognition with graded similarity supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 23487–23496.
33. Izquierdo, S.; Civera, J. Optimal transport aggregation for visual place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 17658–17668.
34. Lu, F.; Lan, X.; Zhang, L.; Jiang, D.; Wang, Y.; Yuan, C. Cricavpr: Cross-image correlation-aware representation learning for visual place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16772–16782.
35. Cao, B.; Araujo, A.; Sim, J. Unifying deep local and global features for image search. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XX 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 726–743.
36. Berton, G.; Masone, C.; Paolicelli, V.; Caputo, B. Viewpoint invariant dense matching for visual geolocalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 12169–12178.
37. Hausler, S.; Moghadam, P. Pair-vpr: Place-aware pre-training and contrastive pair classification for visual place recognition with vision transformers. IEEE Robot. Autom. Lett. 2025, 10, 4013–4020.
38. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv. Neural Inf. Process. Syst. 2016, 29.
39. Wang, R.; Yan, J.; Yang, X. Learning combinatorial embedding networks for deep graph matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3056–3065.
40. Yu, T.; Wang, R.; Yan, J.; Li, B. Learning deep graph matching with channel-independent embedding and hungarian attention. In Proceedings of the International Conference on Learning Representations 2019, New Orleans, LA, USA, 6–9 May 2019.
41. Babai, L. Group, graphs, algorithms: The graph isomorphism problem. In Proceedings of the International Congress of Mathematicians: Rio de Janeiro 2018, Rio de Janeiro, Brazil, 1–9 August 2018; World Scientific: Singapore, 2018; pp. 3319–3336.
42. Yan, J.; Cho, M.; Zha, H.; Yang, X.; Chu, S.M. Multi-graph matching via affinity optimization with graduated consistency regularization. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 1228–1242.
43. Yan, J.; Wang, J.; Zha, H.; Yang, X.; Chu, S. Consistency-driven alternating optimization for multigraph matching: A unified approach. IEEE Trans. Image Process. 2015, 24, 994–1009.
44. He, J.; Huang, Z.; Wang, N.; Zhang, Z. Learnable Graph Matching: A Practical Paradigm for Data Association. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4880–4895.
45. Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947.
46. Mena, G.; Belanger, D.; Linderman, S.; Snoek, J. Learning latent permutations with gumbel-sinkhorn networks. arXiv 2018, arXiv:1802.08665.
47. Lin, Y.; Yang, M.; Yu, J.; Hu, P.; Zhang, C.; Peng, X. Graph matching with bi-level noisy correspondence. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 23362–23371.
48. Wang, R.; Guo, Z.; Jiang, S.; Yang, X.; Yan, J. Deep learning of partial graph matching via differentiable top-k. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6272–6281.
49. Gao, P.; Zhang, H. Long-term place recognition through worst-case graph matching to integrate landmark appearances and spatial relationships. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1070–1076.
50. Godard, C.; Mac Aodha, O.; Firman, M.; Brostow, G.J. Digging into self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3828–3838.
51. Zhang, N.; Nex, F.; Vosselman, G.; Kerle, N. Lite-mono: A lightweight cnn and transformer architecture for self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18537–18546.
52. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
53. He, H.; Zhang, J.; Cai, Y.; Chen, H.; Hu, X.; Gan, Z.; Wang, Y.; Wang, C.; Wu, Y.; Xie, L. Mobilemamba: Lightweight multi-receptive visual mamba network. arXiv 2024, arXiv:2411.15941.
54. Yin, D.; Hu, L.; Li, B.; Zhang, Y.; Yang, X. 5% > 100%: Breaking performance shackles of full fine-tuning on visual recognition tasks. arXiv 2024, arXiv:2408.08345.
55. Duan, Y.; Liu, F.; Jiao, L.; Zhao, P.; Zhang, L. SAR image segmentation based on convolutional-wavelet neural network and Markov random field. Pattern Recognit. 2017, 64, 255–267.
56. Li, Z.; Kuang, Z.S.; Zhu, Z.L.; Wang, H.P.; Shao, X.L. Wavelet-based texture reformation network for image super-resolution. IEEE Trans. Image Process. 2022, 31, 2647–2660.
57. Pu, T.; Ni, G. Contrast-based image fusion using the discrete wavelet transform. Opt. Eng. 2000, 39, 2075–2082.
58. Lai, H.; Yin, P.; Scherer, S. Adafusion: Visual-lidar fusion with adaptive weights for place recognition. IEEE Robot. Autom. Lett. 2022, 7, 12038–12045.
59. Zhang, Y.; Wu, C.; Zhang, T.; Zheng, Y. Full-scale feature aggregation and grouping feature reconstruction based uav image target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–11.
60. Phung, H.; Dao, Q.; Tran, A. Wavelet diffusion models are fast and scalable image generators. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–27 June 2023; pp. 10199–10208.
61. Shi, X.; Hao, Z.; Yu, Z. SpikingResformer: Bridging ResNet and vision transformer in spiking neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5610–5619.
62. Warburg, F.; Hauberg, S.; Lopez-Antequera, M.; Gargallo, P.; Kuang, Y.; Civera, J. Mapillary street-level sequences: A dataset for lifelong place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2626–2635.
63. Torii, A.; Arandjelovic, R.; Sivic, J.; Okutomi, M.; Pajdla, T. 24/7 place recognition by view synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1808–1817.
64. Torii, A.; Sivic, J.; Pajdla, T.; Okutomi, M. Visual place recognition with repetitive structures. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 883–890.
65. DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 224–236.
66. Zhong, H.; Chen, Z.; Qin, C.; Huang, Z.; Zheng, V.W.; Xu, T.; Chen, E. Adam revisited: A weighted past gradients perspective. Front. Comput. Sci. 2020, 14, 145309.
67. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
68. Smith, M.; Baldwin, I.; Churchill, W.; Paul, R.; Newman, P. The new college vision and laser data set. Int. J. Robot. Res. 2009, 28, 595–599.
69. Cummins, M.; Newman, P. FAB-MAP: Probabilistic localization and mapping in the space of appearance. Int. J. Robot. Res. 2008, 27, 647–665.
70. Ning, J.; Zhang, Y.; Zhao, X.; Coleman, S.; Li, K.; Kerr, D. Samloc: Structure-aware constraints with multi-task distillation for long-term visual localization. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 11719–11725.
71. Li, K.; Zhang, Y.; Ning, J.; Zhao, X.; Wang, G.; Liu, W. Neighborhood Consensus Guided Matching Based Place Recognition with Spatial-Channel Embedding. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 3291–3296.

Dataset | Environment: Urban | Environment: Suburban | Environment: Natural | Variation: Viewpoint | Variation: Day/Night | Variation: Weather | Variation: Seasonal | Variation: Dynamic | Number: Database | Number: Queries |
---|---|---|---|---|---|---|---|---|---|---|
MSLS val [62] | ✔ | ✔ | ✔ | + | + | + | + | + | 19k | 740 |
MSLS Challenge [62] | ✔ | ✔ | ✔ | + | + | + | + | + | 39k | 27,092 |
Pitts30k [7] | ✔ | | | + | − | − | − | + | 10k | 6816 |
Tokyo 24/7 [63] | ✔ | | | + | + | − | − | + | 76k | 315 |

Method | MSLS val R@1 | MSLS val R@5 | MSLS val R@10 | MSLS Challenge R@1 | MSLS Challenge R@5 | MSLS Challenge R@10 | Pitts30k Test R@1 | Pitts30k Test R@5 | Pitts30k Test R@10 | Tokyo 24/7 R@1 | Tokyo 24/7 R@5 | Tokyo 24/7 R@10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
NetVLAD [7] | 53.1 | 66.5 | 71.1 | 35.1 | 47.4 | 51.7 | 81.9 | 91.2 | 93.7 | 64.4 | 78.4 | 81.6 |
SFRS [9] | 69.2 | 80.3 | 83.1 | 41.5 | 52.0 | 56.3 | 89.4 | 94.7 | 95.9 | 85.4 | 91.1 | 93.3 |
GCL [32] | 80.9 | 90.7 | 92.6 | 62.3 | 76.2 | 81.1 | 79.2 | 90.4 | 93.2 | 58.1 | 74.3 | 78.1 |
SelaVPR (global) [19] | 87.7 | 95.8 | 96.6 | 69.6 | 86.9 | 90.1 | 90.2 | 96.1 | 97.1 | 81.9 | 94.9 | 96.5 |
Ours (global) | 90.2 | 96.2 | 96.8 | 73.2 | 87.3 | 90.4 | 92.7 | 95.4 | 96.8 | 87.5 | 95.3 | 96.2 |
SP-SuperGlue [45,65] | 78.1 | 81.9 | 84.3 | 50.6 | 56.9 | 58.3 | 87.2 | 94.8 | 96.4 | 88.2 | 90.2 | 90.2 |
Patch-NetVLAD [15] | 79.5 | 86.2 | 87.7 | 48.1 | 57.6 | 60.5 | 88.7 | 94.5 | 95.9 | 86.0 | 88.6 | 90.5 |
TransVPR [16] | 86.8 | 91.2 | 92.4 | 63.9 | 74.0 | 77.5 | 89.0 | 94.9 | 96.2 | - | - | - |
R2Former [18] | 89.7 | 95.0 | 96.2 | 73.0 | 85.9 | 88.8 | 91.1 | 95.2 | 96.3 | 88.6 | 91.4 | 91.7 |
SelaVPR [19] | 90.8 | 96.4 | 97.2 | 73.5 | 87.5 | 90.6 | 92.8 | 96.8 | 97.7 | 94.0 | 96.8 | 97.5 |
Ours | 93.8 | 97.3 | 97.5 | 80.9 | 89.3 | 91.3 | 94.3 | 97.2 | 98.1 | 97.8 | 98.3 | 99.1 |
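
The R@1, R@5, and R@10 columns report Recall@K. As a minimal sketch, assuming the common VPR convention that a query counts as correct when at least one of its top-K retrieved database images is a ground-truth match (the matching criterion used in these experiments is not stated in this excerpt), the metric could be computed as follows. The function name `recall_at_k` and the toy data are hypothetical and only illustrate the arithmetic.

```python
def recall_at_k(ranked_ids, positives, ks=(1, 5, 10)):
    """Return {k: percentage of queries with a true match in the top-k}.

    ranked_ids: one list per query of database indices, best match first.
    positives:  one set per query of ground-truth matching database indices.
    """
    if not ranked_ids:
        raise ValueError("need at least one query")
    hits = {k: 0 for k in ks}
    for ranked, pos in zip(ranked_ids, positives):
        for k in ks:
            # A query is a hit at k if any of its first k candidates is a true match.
            if any(idx in pos for idx in ranked[:k]):
                hits[k] += 1
    n = len(ranked_ids)
    return {k: 100.0 * hits[k] / n for k in ks}


# Toy data (hypothetical): three queries, three ranked candidates each.
ranked = [[3, 1, 2], [0, 4, 5], [9, 8, 7]]
positives = [{3}, {5}, {6}]
scores = recall_at_k(ranked, positives, ks=(1, 2, 3))
print(scores)
```

In the toy data the first query is matched at rank 1, the second only at rank 3, and the third never, so one of three queries is a hit at K = 1 and K = 2, and two of three at K = 3.
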

Method | Extraction Latency (ms) | Matching Latency (s) | Memory (MB) |
---|---|---|---|
SP-SuperGlue [45,65] | 163 | 7.6 | 1.93 |
Patch-NetVLAD [15] | 1330 | 7.5 | 44.14 |
TransVPR [16] | 45 | 3.2 | 1.17 |
R2Former [18] | 27.4 | 0.59 | 0.27 |
SelaVPR [19] | 36.2 | 0.15 | 0.52 |
Ours | 19.3 | 0.09 | 0.29 |

Method | New College Recall (%) | New College Time (ms) | City Centre Recall (%) | City Centre Time (ms) |
---|---|---|---|---|
SP-SuperGlue [45,65] | 85.2 | 2367.2 | 79.1 | 2354.8 |
Patch-NetVLAD [15] | 93.4 | 2819.4 | 86.3 | 2823.9 |
R2Former [18] | 91.5 | 193.5 | 85.4 | 186.4 |
SelaVPR [19] | 94.7 | 45.6 | 89.5 | 44.5 |
Ours | 98.1 | 17.3 | 95.0 | 15.2 |

RGB | Depth | MM-MB | DWTF | SNGM | MSLS val R@1 | MSLS val R@5 | MSLS val R@10 |
---|---|---|---|---|---|---|---|
✔ | × | × | × | × | 64.2 | 69.4 | 75.1 |
✔ | ✔ | × | × | × | 73.4 | 77.7 | 79.5 |
✔ | ✔ | ✔ | × | × | 84.3 | 87.5 | 89.9 |
✔ | ✔ | ✔ | ✔ | × | 90.2 | 96.2 | 96.8 |
✔ | ✔ | ✔ | ✔ | ✔ | 93.8 | 97.3 | 97.5 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, K.; Ou, Y.; Ning, J.; Kong, F.; Cai, H.; Li, H. Unified Depth-Guided Feature Fusion and Reranking for Hierarchical Place Recognition. Sensors 2025, 25, 4056. https://doi.org/10.3390/s25134056