Hybrid Deep–Geometric Approach for Efficient Consistency Assessment of Stereo Images
Abstract
Highlights
- We propose a self-contained, single-pair stereo consistency check that fuses epipolar geometry with Transformer-based object detection.
- Our method flags both global camera misalignments and localized semantic or geometric anomalies without external calibration data.
- The approach enables on-the-fly quality assurance of stereo rigs in applications ranging from robotics to 3D cinematography.
- It lays the groundwork for combining semantic scene understanding with classical stereo geometry.
1. Introduction
- We propose a hybrid geometric consistency assessment method (HGC-Net) that fuses classical epipolar geometry analysis with deep learning techniques.
- The core metric, a scalar score A, quantifies stereo alignment from a single image pair without requiring calibration or depth ground truth.
- We enhance robustness and accuracy by integrating deep feature matchers (e.g., LoFTR), Transformer-based disparity estimators (e.g., STTR), and attention-based diagnostic modules.
- Our method outperforms baseline techniques in detecting both global miscalibration and local geometric anomalies, and its score A correlates with the simulated distortion level.
2. Related Work
2.1. Stereo Geometric Consistency and Calibration Methods
2.2. Disparity-Based Consistency Metrics
2.3. No-Reference Stereoscopic Image Quality (NR-SIQA)
2.4. Learning-Based Approaches for Stereo Geometry
3. Materials and Methods
3.1. Classical Epipolar Geometry Method (Base Algorithm)
- Lowe’s ratio test—to ensure unique best matches;
- symmetry check—keeping only matches that are mutual;
- robust estimation via RANSAC—to filter any remaining incorrect matches.
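The first two filters in this list can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the 0.75 ratio threshold, and the 2-D toy descriptors are all assumptions made for the example. The RANSAC stage is omitted; in an OpenCV-based pipeline the surviving matches would typically be passed on to `cv2.findFundamentalMat` with the `cv2.FM_RANSAC` flag.

```python
from math import dist


def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b.

    A match is kept only if the best distance is clearly smaller than the
    second-best one (Lowe's ratio test). The 0.75 threshold is an assumed
    value for illustration, not one taken from the paper.
    """
    matches = {}
    for i, da in enumerate(desc_a):
        # Rank all candidates in desc_b by Euclidean distance to da.
        ranked = sorted((dist(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs a second-best candidate
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches[i] = j1
    return matches


def mutual_matches(desc_left, desc_right, ratio=0.75):
    """Symmetry check: keep (i, j) only if left->right and right->left agree."""
    forward = ratio_test_matches(desc_left, desc_right, ratio)
    backward = ratio_test_matches(desc_right, desc_left, ratio)
    return sorted((i, j) for i, j in forward.items() if backward.get(j) == i)


# Toy 2-D descriptors (hypothetical data).
left = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
right = [(0.1, 0.0), (10.0, 10.2), (20.0, 20.0)]
kept = mutual_matches(left, right)
```

In this toy data the third left descriptor is almost equidistant from two right descriptors, so the ratio test drops it, and the third right descriptor matches a left descriptor that prefers a different partner, so the symmetry check drops that pairing as well.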
- A non-zero slope in either epipolar line means the line is tilted, suggesting the cameras are rotated relative to the horizontal baseline (e.g., pitch or roll).
- A difference in intercept between the left and right epipolar lines indicates a vertical offset, often caused by camera height mismatch or vertical misalignment.
- Slope error :
- Offset error :
- A scalar score quantifying stereo geometric consistency;
- No need for ground truth depth or camera calibration;
- High sensitivity to tilt and vertical shift misalignments;
- Deterministic and fast execution suitable for real-time use.
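The slope and intercept readings used in this subsection follow from the standard relation that a point x in one image maps to the epipolar line l' = F x in the other. The sketch below shows that step only, with hypothetical function names and a hand-built fundamental matrix. How the slopes and intercepts are combined into the slope error, the offset error, and the score A is not reproduced here.

```python
def epipolar_line(F, x, y):
    """Return (a, b, c) of the line a*u + b*v + c = 0 in the second image
    that corresponds to the point (x, y) in the first image (l' = F x)."""
    p = (x, y, 1.0)
    return tuple(sum(F[r][k] * p[k] for k in range(3)) for r in range(3))


def slope_intercept(line):
    """Convert (a, b, c) to slope and intercept of v = slope * u + intercept."""
    a, b, c = line
    if abs(b) < 1e-12:
        raise ValueError("vertical epipolar line: slope is undefined")
    return -a / b, -c / b


# Fundamental matrix of an ideal rectified pair: matching points share a row.
F_ideal = [[0, 0, 0],
           [0, 0, -1],
           [0, 1, 0]]

# Same geometry with the second image displaced 3 px vertically
# (an assumed, hand-built example, not data from the paper).
F_shifted = [[0, 0, 0],
             [0, 0, -1],
             [0, 1, 3]]

ideal = slope_intercept(epipolar_line(F_ideal, 100.0, 50.0))
shifted = slope_intercept(epipolar_line(F_shifted, 100.0, 50.0))
```

For the point at row 50, the ideal matrix gives a horizontal line at row 50, while the shifted matrix gives a horizontal line at row 53: the slope stays zero and the 3 px displacement appears only in the intercept.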
3.2. Integration of Deep Learning Techniques (Hybrid Approach)
- Input enhancement: Deep feature matchers improve the quality of correspondences for F estimation.
- Auxiliary validation: Networks like STTR provide an independent check on stereo geometry quality.
- Interpretability: Attention-based diagnostics localize misalignment sources.
4. Results
Quantitative Evaluation on Benchmark Data
- Mild: 0.5° tilt or 2 px vertical shift.
- Moderate: 1° tilt or 5 px shift.
- Severe: 2° tilt or 10 px shift.
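To give a feel for the three distortion levels, the sketch below applies each tilt and each shift to a single image point and reports the vertical displacement it causes. It rests on two assumptions that do not come from the paper: tilt is modeled as an in-plane rotation about the image center, and the image is 1920 × 1080 px. All function and variable names are hypothetical.

```python
import math

# Distortion levels from the list above: (tilt in degrees, shift in pixels).
LEVELS = {"mild": (0.5, 2), "moderate": (1.0, 5), "severe": (2.0, 10)}


def apply_tilt(x, y, degrees, cx, cy):
    """Rotate (x, y) about the image center (cx, cy) by the given angle."""
    t = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


def apply_shift(x, y, pixels):
    """Move (x, y) vertically by a fixed number of pixels."""
    return (x, y + pixels)


def vertical_error_at_edge(width, height):
    """Vertical displacement of the point at the right edge, mid-height,
    for every level: {level: (error from tilt, error from shift)}."""
    cx, cy = width / 2, height / 2
    errors = {}
    for name, (deg, px) in LEVELS.items():
        _, y_tilt = apply_tilt(width, cy, deg, cx, cy)
        _, y_shift = apply_shift(width, cy, px)
        errors[name] = (y_tilt - cy, y_shift - cy)
    return errors


errors = vertical_error_at_edge(1920, 1080)
```

Under the rotation model the vertical error grows with distance from the center (960 · sin θ at the right edge of a 1920 px wide image), whereas a shift produces the same error everywhere, so the two distortion types at the same nominal level are not equivalent in pixel terms.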
5. Discussion
- Advantages of the Hybrid Approach.
- Comparison to Prior Work.
- Limitations.
- Threshold Selection.
- Perfect calibration: .
- Minor errors: .
- Clear misalignment: .
- Future Improvements.
- Implications.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Type | Level | Ours (A) | Detection by A | Detection by disparity | Detection by calibration |
---|---|---|---|---|---|
Tilt | Mild 0.5° | 0.94 (±0.03) | 80% | 20% | 30% |
Tilt | Mod. 1° | 0.85 (±0.05) | 100% | 70% | 75% |
Tilt | Sev. 2° | 0.60 (±0.10) | 100% | 100% | 100% |
Shift | Mild 2 px | 0.96 (±0.02) | 70% | 10% | 40% |
Shift | Mod. 5 px | 0.88 (±0.04) | 100% | 60% | 80% |
Shift | Sev. 10 px | 0.70 (±0.08) | 100% | 100% | 100% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kowalczyk, M.; Napieralski, P.; Szajerman, D. Hybrid Deep–Geometric Approach for Efficient Consistency Assessment of Stereo Images. Sensors 2025, 25, 4507. https://doi.org/10.3390/s25144507