Cross-Detector Visual Localization with Coplanarity Constraints for Indoor Environments
Abstract
1. Introduction
- Formalizing the novel problem of cross-detector visual localization and analyzing its key challenge, the cross-detector error.
- Proposing CoplaMatch, the first approach to solve cross-detector VL by leveraging geometric coplanarity constraints over descriptor similarity.
- Providing an open-source implementation of our method, publicly available at https://github.com/EricssonResearch/copla-match (accessed on 11 November 2025).
2. Related Work
2.1. Local Keypoint Detectors
2.2. Local Feature Matching
2.3. Coplanarity Constraints in Computer Vision Tasks
3. Analyzing Keypoint Cross-Detection in Visual Localization
3.1. Cross-Detector Feature Matching
3.2. Cross-Detector Error
4. CoplaMatch: Coplanarity-Constrained Feature Matching
Algorithm 1: CoplaMatch feature matching
- Rotation Verification. The estimated homography must be orientation-preserving; that is, the keypoints of one CFG must keep the spatial ordering of their counterparts in the matched CFG. Otherwise, the two CFGs belong to different physical planes. This property is verified through the determinant of the rotational submatrix of the estimated homography H (its upper-left $2 \times 2$ submatrix), which must be positive: $\det(H_{1:2,\,1:2}) > 0$.
- Translation Verification. The decomposition of a homography, H [35], gives an up-to-scale translation vector, t, between the plane in the map and the camera image plane. This scale can be approximated from the depth information contained in the reference map, which yields a metric estimate of t. This step is critical for filtering degenerate homographies caused by the spatial noise inherent to cross-detector keypoints. Since such noise often results in ill-conditioned homographies that yield physically implausible (over-scaled) translations, we set a threshold on the maximum acceptable translation.
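The two checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 5.0 m threshold, and the convention that the decomposition returns the translation divided by the plane distance (so that multiplying by a depth taken from the map restores metric scale) are all hypothetical choices made for the example.

```python
import math
from typing import Sequence

Matrix3 = Sequence[Sequence[float]]  # 3x3 homography stored as nested rows


def is_orientation_preserving(H: Matrix3) -> bool:
    """Rotation check: the upper-left 2x2 block of H must have a positive determinant.

    The sign of this determinant does not depend on the arbitrary scale of H,
    because scaling H by s multiplies the 2x2 determinant by s**2.
    """
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return det > 0.0


def is_translation_plausible(
    t_up_to_scale: Sequence[float],
    plane_depth_m: float,
    max_translation_m: float,
) -> bool:
    """Translation check: rescale the up-to-scale translation and threshold its norm.

    Assumption: t_up_to_scale is t / d from a homography decomposition, and
    plane_depth_m (read from the reference map) approximates d.
    """
    norm = plane_depth_m * math.sqrt(sum(c * c for c in t_up_to_scale))
    return norm <= max_translation_m


def passes_verification(
    H: Matrix3,
    t_up_to_scale: Sequence[float],
    plane_depth_m: float,
    max_translation_m: float = 5.0,  # hypothetical value, not taken from the paper
) -> bool:
    """A candidate homography is kept only if both checks succeed."""
    return is_orientation_preserving(H) and is_translation_plausible(
        t_up_to_scale, plane_depth_m, max_translation_m
    )


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    mirrored = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(passes_verification(identity, (0.1, 0.0, 0.2), plane_depth_m=2.0))
    print(passes_verification(mirrored, (0.1, 0.0, 0.2), plane_depth_m=2.0))
```

In this sketch the mirrored homography is rejected by the rotation check alone, while an over-scaled translation such as (3, 0, 4) at a plane depth of 2 m would be rejected by the translation check, since its rescaled norm of 10 m exceeds the assumed threshold.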
5. Experimental Validation
5.1. Experimental Setup
5.2. Cross-Detector Error in CoplaMatch Correspondences
5.3. Analyzing the Computational Cost of CoplaMatch
5.4. Visual Localization Experiments
6. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Piasco, N.; Sidibé, D.; Demonceaux, C.; Gouet-Brunet, V. A survey on Visual-Based Localization: On the benefit of heterogeneous data. Pattern Recognit. 2018, 74, 90–109.
2. Edstedt, J.; Bökman, G.; Wadenbäck, M.; Felsberg, M. DeDoDe: Detect, don’t describe—Describe, don’t detect for local feature matching. In Proceedings of the 2024 International Conference on 3D Vision (3DV), Davos, Switzerland, 18–21 March 2024; IEEE: New York, NY, USA, 2024; pp. 148–157.
3. Dusmanu, M.; Miksik, O.; Schönberger, J.L.; Pollefeys, M. Cross-Descriptor Visual Localization and Mapping. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 6038–6047.
4. Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning Feature Matching with Graph Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4937–4946.
5. Tan, X.; Sun, C.; Sirault, X.; Furbank, R.; Pham, T.D. Feature matching in stereo images encouraging uniform spatial distribution. Pattern Recognit. 2015, 48, 2530–2542.
6. Kabalar, J.; Wu, S.C.; Wald, J.; Tateno, K.; Navab, N.; Tombari, F. Towards long-term retrieval-based visual localization in indoor environments with changes. IEEE Robot. Autom. Lett. 2023, 8, 1975–1982.
7. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
8. Rosten, E.; Drummond, T. Machine Learning for High-Speed Corner Detection. In Proceedings of the Computer Vision—ECCV 2006, Graz, Austria, 7–13 May 2006; pp. 430–443.
9. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
10. Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary Robust invariant scalable keypoints. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2548–2555.
11. Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. LIFT: Learned Invariant Feature Transform. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 467–483.
12. DeTone, D.; Malisiewicz, T.; Rabinovich, A. Toward Geometric Deep SLAM. arXiv 2017, arXiv:1707.07410.
13. DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake, UT, USA, 18–22 June 2018; pp. 337–33712.
14. Čech, J.; Matas, J.; Perdoch, M. Efficient Sequential Correspondence Selection by Cosegmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1568–1581.
15. Bian, J.; Lin, W.Y.; Matsushita, Y.; Yeung, S.K.; Nguyen, T.D.; Cheng, M.M. GMS: Grid-Based Motion Statistics for Fast, Ultra-Robust Feature Correspondence. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2828–2837.
16. Lindenberger, P.; Sarlin, P.E.; Pollefeys, M. LightGlue: Local Feature Matching at Light Speed. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 17581–17592.
17. Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-Free Local Feature Matching with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8918–8927.
18. Edstedt, J.; Sun, Q.; Bökman, G.; Wadenbäck, M.; Felsberg, M. RoMa: Robust dense feature matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 19790–19800.
19. Zhou, Q.; Agostinho, S.; Ošep, A.; Leal-Taixé, L. Is Geometry Enough for Matching in Visual Localization? In Proceedings of the Computer Vision—ECCV 2022, Tel Aviv, Israel, 23–27 October 2022; pp. 407–425.
20. Taira, H.; Okutomi, M.; Sattler, T.; Cimpoi, M.; Pollefeys, M.; Sivic, J.; Pajdla, T.; Torii, A. InLoc: Indoor visual localization with dense matching and view synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7199–7209.
21. Shi, Y.; Xu, K.; Nießner, M.; Rusinkiewicz, S.; Funkhouser, T. PlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 767–784.
22. Matez-Bandera, J.L.; Monroy, J.; Gonzalez-Jimenez, J. Sigma-FP: Robot Mapping of 3D Floor Plans with an RGB-D Camera Under Uncertainty. IEEE Robot. Autom. Lett. 2022, 7, 12539–12546.
23. Arndt, C.; Sabzevari, R.; Civera, J. Do Planar Constraints Improve Camera Pose Estimation in Monocular SLAM? In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 2–6 October 2023; pp. 2213–2222.
24. Kaess, M. Simultaneous localization and mapping with infinite planes. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 4605–4611.
25. Hou, Z.; Ding, Y.; Wang, Y.; Yang, H.; Kong, H. Visual Odometry for Indoor Mobile Robot by Recognizing Local Manhattan Structures. In Proceedings of the Computer Vision—ACCV 2018, Perth, Australia, 2–6 December 2019; pp. 168–182.
26. Frohlich, R.; Tamas, L.; Kato, Z. Absolute Pose Estimation of Central Cameras Using Planar Regions. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 377–391.
27. Reyes-Aviles, F.; Fleck, P.; Schmalstieg, D.; Arth, C. Bag of World Anchors for Instant Large-Scale Localization. IEEE Trans. Vis. Comput. Graph. 2023, 29, 4730–4739.
28. Chuan, Z.; Long, T.D.; Feng, Z.; Li, D.Z. A planar homography estimation method for camera calibration. In Proceedings of the 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation. Computational Intelligence in Robotics and Automation for the New Millennium (Cat. No.03EX694), Kobe, Japan, 16–20 July 2003; Volume 1, pp. 424–429.
29. Knorr, M.; Niehsen, W.; Stiller, C. Online extrinsic multi-camera calibration using ground plane induced homographies. In Proceedings of the 2013 IEEE Intelligent Vehicles Symposium (IV), Gold Coast, Australia, 23–26 June 2013; pp. 236–241.
30. Zhao, Y.; Huang, X.; Zhang, Z. Deep Lucas-Kanade Homography for Multimodal Image Alignment. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 15945–15954.
31. Hong, M.; Lu, Y.; Ye, N.; Lin, C.; Zhao, Q.; Liu, S. Unsupervised Homography Estimation with Coplanarity-Aware GAN. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17642–17651.
32. Freund, R.; Wilson, W.; Sa, P. Regression Analysis; Elsevier: Amsterdam, The Netherlands, 2006.
33. Liu, C.; Kim, K.; Gu, J.; Furukawa, Y.; Kautz, J. PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 15–20 June 2019; pp. 4445–4454.
34. Hughes, N.; Chang, Y.; Carlone, L. Hydra: A Real-time Spatial Perception System for 3D Scene Graph Construction and Optimization. arXiv 2022, arXiv:2201.13360.
35. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2004.
36. Cuturi, M. Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances. arXiv 2013, arXiv:1306.0895.
37. Crouse, D.F. On implementing 2D rectangular assignment algorithms. IEEE Trans. Aerosp. Electron. Syst. 2016, 52, 1679–1696.
38. Sarlin, P.; Cadena, C.; Siegwart, R.; Dymczyk, M. From Coarse to Fine: Robust Hierarchical Localization at Large Scale. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 15–20 June 2019; pp. 12708–12717.
39. Schönberger, J.L.; Frahm, J.M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
40. Arandjelović, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1437–1451.
41. Xie, Y.; Shu, F.; Rambach, J.; Pagani, A.; Stricker, D. PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image. arXiv 2022, arXiv:2110.11219.
42. Alqobali, R.; Alshmrani, M.; Alnasser, R.; Rashidi, A.; Alhmiedat, T.; Alia, O.M. A survey on robot semantic navigation systems for indoor environments. Appl. Sci. 2023, 14, 89.
43. Alqobali, R.; Alnasser, R.; Rashidi, A.; Alshmrani, M.; Alhmiedat, T. A Real-Time Semantic Map Production System for Indoor Robot Navigation. Sensors 2024, 24, 6691.
44. Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 573–580.
45. Pan, X.; Charron, N.; Yang, Y.; Peters, S.; Whelan, T.; Kong, C.; Parkhi, O.; Newcombe, R.; Ren, Y. Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 20076–20086.
46. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary Robust Independent Elementary Features. In Proceedings of the Computer Vision—ECCV 2010, Crete, Greece, 5–11 September 2010; pp. 778–792.
47. Hazem, Z.B.; Saidi, F.; Guler, N.; Altaif, A.H. Reinforcement learning-based intelligent trajectory tracking for a 5-DOF Mitsubishi robotic arm: Comparative evaluation of DDPG, LC-DDPG, and TD3-ADX. Int. J. Intell. Robot. Appl. 2025, 9, 1982–2002.








| Method | Average Time (ms) |
|---|---|
| PlaneRecNet (RGB-based) | |
| Sigma-FP (depth-based) | |

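| Case | Method | Map Detector | Query Detector | % Localized Queries (0.1 m, 1°) | % Localized Queries (0.25 m, 2°) | % Localized Queries (0.5 m, 5°) | Median Matches per Query |
|---|---|---|---|---|---|---|---|
| Standard | Descriptor similarity | SIFT | SIFT | 98.28 | 100.00 | 100.00 | 345 |
| Standard | Descriptor similarity | ORB | ORB | 94.83 | 100.00 | 100.00 | 330 |
| Standard | CoplaMatch | SIFT | SIFT | 98.28 | 100.00 | 100.00 | 324 |
| Standard | CoplaMatch | ORB | ORB | 87.93 | 100.00 | 100.00 | 547 |
| Cross-Detector (Common Descriptor: BRIEF) | Descriptor similarity | SIFT | SIFT | 63.79 | 82.76 | 93.10 | 33 |
| Cross-Detector (Common Descriptor: BRIEF) | Descriptor similarity | SIFT | BRISK | 53.45 | 75.86 | 91.38 | 36 |
| Cross-Detector (Common Descriptor: BRIEF) | Descriptor similarity | SIFT | FAST | 25.86 | 39.66 | 72.41 | 15 |
| Cross-Detector (Common Descriptor: BRIEF) | Descriptor similarity | SIFT | ORB | 39.66 | 65.52 | 87.93 | 56 |
| Cross-Detector (Common Descriptor: BRIEF) | Descriptor similarity | ORB | SIFT | 0.00 | 10.34 | 13.79 | 11 |
| Cross-Detector (Common Descriptor: BRIEF) | Descriptor similarity | ORB | BRISK | 6.90 | 18.97 | 27.59 | 15 |
| Cross-Detector (Common Descriptor: BRIEF) | Descriptor similarity | ORB | FAST | 5.17 | 12.07 | 17.24 | 7 |
| Cross-Detector (Common Descriptor: BRIEF) | Descriptor similarity | ORB | ORB | 15.52 | 24.14 | 29.31 | 25 |
| Cross-Detector (Common Descriptor: BRIEF) | CoplaMatch | SIFT | SIFT | 84.48 | 96.55 | 100.00 | 126 |
| Cross-Detector (Common Descriptor: BRIEF) | CoplaMatch | SIFT | BRISK | 81.03 | 98.28 | 100.00 | 130 |
| Cross-Detector (Common Descriptor: BRIEF) | CoplaMatch | SIFT | FAST | 56.90 | 79.31 | 98.28 | 88 |
| Cross-Detector (Common Descriptor: BRIEF) | CoplaMatch | SIFT | ORB | 58.62 | 89.66 | 100.00 | 135 |
| Cross-Detector (Common Descriptor: BRIEF) | CoplaMatch | ORB | SIFT | 34.48 | 62.07 | 77.59 | 77 |
| Cross-Detector (Common Descriptor: BRIEF) | CoplaMatch | ORB | BRISK | 55.17 | 77.59 | 87.93 | 105 |
| Cross-Detector (Common Descriptor: BRIEF) | CoplaMatch | ORB | FAST | 50.00 | 68.97 | 74.14 | 80 |
| Cross-Detector (Common Descriptor: BRIEF) | CoplaMatch | ORB | ORB | 51.72 | 68.97 | 86.21 | 139 |
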
| Common Descriptor | Query Configuration | Feature Matching Method | TUM RGB-D fr3/Structure_Texture: % Localized Queries (0.1 m, 1°) | TUM RGB-D fr3/Structure_Texture: % Localized Queries (0.25 m, 2°) | TUM RGB-D fr3/Structure_Texture: % Localized Queries (0.5 m, 5°) | TUM RGB-D fr3/Structure_Texture: Median Correct Matches/Query | ARIA Digital Twin Apartment_Release: % Localized Queries (0.1 m, 1°) | ARIA Digital Twin Apartment_Release: % Localized Queries (0.25 m, 2°) | ARIA Digital Twin Apartment_Release: % Localized Queries (0.5 m, 5°) | ARIA Digital Twin Apartment_Release: Median Correct Matches/Query |
|---|---|---|---|---|---|---|---|---|---|---|
| BRIEF | Same Detector | Descriptor similarity | 53.45 | 65.52 | 70.69 | 45 | 30.00 | 53.33 | 76.67 | 64 |
| BRIEF | Same Detector | SuperGlue | – | – | – | – | – | – | – | – |
| BRIEF | Same Detector | CoplaMatch | 89.66 | 96.55 | 100.00 | 163 | 26.67 | 63.33 | 78.33 | 82 |
| BRIEF | Cross-Detector | Descriptor similarity | 33.19 | 55.17 | 65.08 | 49 | 17.92 | 39.17 | 57.09 | 42 |
| BRIEF | Cross-Detector | SuperGlue | – | – | – | – | – | – | – | – |
| BRIEF | Cross-Detector | CoplaMatch | 59.48 | 90.09 | 98.71 | 165 | 14.17 | 47.50 | 70.00 | 69 |
| SuperPoint | Same Detector | Descriptor similarity | 87.93 | 96.55 | 100.00 | 74 | 43.33 | 75.00 | 88.33 | 133 |
| SuperPoint | Same Detector | SuperGlue | 94.83 | 100.00 | 100.00 | 1047 | 58.33 | 86.67 | 93.33 | 455 |
| SuperPoint | Same Detector | CoplaMatch | 93.10 | 100.00 | 100.00 | 363 | 53.33 | 78.33 | 91.67 | 205 |
| SuperPoint | Cross-Detector | Descriptor similarity | 21.38 | 69.40 | 87.07 | 71 | 20.00 | 51.25 | 76.25 | 81 |
| SuperPoint | Cross-Detector | SuperGlue | 43.97 | 50.00 | 50.00 | 223 | 14.59 | 35.41 | 47.50 | 107 |
| SuperPoint | Cross-Detector | CoplaMatch | 79.74 | 98.27 | 100.00 | 356 | 30.00 | 65.00 | 85.42 | 162 |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Matez-Bandera, J.-L.; Jaenal, A.; Gomez, C.; Hernandez, A.C.; Monroy, J.; Araújo, J.; Gonzalez-Jimenez, J. Cross-Detector Visual Localization with Coplanarity Constraints for Indoor Environments. Sensors 2025, 25, 7593. https://doi.org/10.3390/s25247593


