This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Sparse Self-Prompt-Guided Stereo Matching for Real-World Generalization

1 School of Information and Engineering, Nanchang Hangkong University, Nanchang 330063, China
2 North Lian Chuang Communication Co., Ltd., Nanchang 330096, China
3 School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
4 School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
* Author to whom correspondence should be addressed.
Sensors 2026, 26(10), 3173; https://doi.org/10.3390/s26103173
Submission received: 4 April 2026 / Revised: 8 May 2026 / Accepted: 14 May 2026 / Published: 17 May 2026
(This article belongs to the Section Sensing and Imaging)

Abstract

Stereo matching has advanced rapidly on curated benchmarks, yet deploying models in unconstrained real-world environments remains a fundamental challenge. This paper presents a sparse self-prompt-guided network (SSPGNet) for stereo matching with strong generalization across diverse environments. Our core innovation is a sparse self-prompt guidance mechanism: (1) a sparse disparity map, used as a prompt, is self-estimated from vision foundation model features via cost aggregation; (2) this sparse disparity map is progressively refined into a dense disparity map through cross-attention-based stereo feature interaction, enabling sparse-to-dense disparity prediction. Additionally, we collected a diverse set of indoor and outdoor stereo pairs with a ZED 2 camera to assess the real-world performance of our model. Extensive experiments demonstrate that the proposed sparse-to-dense prompt mechanism not only preserves the semantic awareness of vision foundation models but also enhances stereo correspondence reasoning, achieving strong performance on public benchmarks and on our in-the-wild dataset. Specifically, under the cross-domain (zero-shot) protocol, SSPGNet achieves bad-pixel error rates of 3.6% on KITTI 2012 (>3 px), 4.4% on KITTI 2015 (>3 px), 7.6% on Middlebury (>2 px), and 2.1% on ETH3D (>1 px), ranking first on three of the four public benchmarks. These results highlight the potential of SSPGNet for direct deployment in real-world stereo perception systems. The code is publicly available at GitHub.
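Step (1) above, estimating a sparse disparity prompt from a matching cost, can be pictured with a minimal NumPy sketch. It assumes rectified views and already-extracted feature maps. A plain correlation volume with a softmax confidence test stands in for the paper's cost aggregation over vision foundation model features, and the function names, the temperature, and the 0.9 confidence threshold are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Correlation cost volume of shape (max_disp, H, W).

    feat_l, feat_r: (C, H, W) feature maps of the rectified left and right views.
    cost[d, y, x] is the channel mean of feat_l[:, y, x] * feat_r[:, y, x - d];
    entries with x < d have no match inside the image and are set to -inf.
    """
    _, H, W = feat_l.shape
    cost = np.full((max_disp, H, W), -np.inf)
    cost[0] = (feat_l * feat_r).mean(axis=0)
    for d in range(1, max_disp):
        cost[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :-d]).mean(axis=0)
    return cost


def sparse_disparity_prompt(cost, conf_thresh=0.9, temperature=0.05):
    """Soft-argmax disparity, kept only where the matching distribution is peaked.

    Hypothetical selection rule: a pixel joins the prompt when the peak of its
    softmax distribution over disparities reaches conf_thresh.
    Returns (sparse, mask): sparse holds disparities at confident pixels and 0
    elsewhere; mask marks the confident pixels that act as the prompt.
    """
    scaled = cost / temperature
    scaled = scaled - scaled.max(axis=0, keepdims=True)  # stabilise the softmax
    prob = np.exp(scaled)                                 # -inf entries become 0
    prob /= prob.sum(axis=0, keepdims=True)
    candidates = np.arange(cost.shape[0]).reshape(-1, 1, 1)
    disp = (prob * candidates).sum(axis=0)                # expected disparity
    conf = prob.max(axis=0)                               # peak probability
    mask = conf >= conf_thresh
    return np.where(mask, disp, 0.0), mask
```

For example, if the right feature map is the left one shifted by 3 px (`np.roll(feat_l, -3, axis=2)`), pixels far enough from the left border pass the confidence test with a disparity close to 3. Near the left border only a few disparities can be tested (at x = 0 only d = 0), so the peak probability there can be high for the wrong reason and this crude confidence signal is unreliable.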
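The sparse-to-dense idea in step (2) can be illustrated with single-head cross-attention in which every pixel queries the prompt pixels. This is a hypothetical NumPy sketch, not the paper's module: the abstract describes cross-attention-based interaction between the stereo views with learned layers, whereas here the queries and keys are raw left-view features and the values are the prompt disparities, so each pixel simply inherits the disparity of the prompt pixels it resembles. The function names and the temperature parameter are illustrative assumptions.

```python
import numpy as np


def cross_attention(queries, keys, values, temperature=1.0):
    """Single-head scaled dot-product cross-attention without learned projections.

    queries: (N_q, C), keys: (N_k, C), values: (N_k, V) -> output (N_q, V).
    """
    scores = queries @ keys.T / (np.sqrt(queries.shape[1]) * temperature)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ values


def densify_sparse_prompt(feat, sparse_disp, mask, temperature=1.0):
    """Propagate prompt disparities to every pixel of the left view.

    feat: (C, H, W) left-view features; sparse_disp, mask: (H, W).
    Every pixel (query) attends to the prompt pixels (keys) and returns an
    attention-weighted average of their disparities (values).
    """
    C, H, W = feat.shape
    tokens = feat.reshape(C, H * W).T                   # (H*W, C), one token per pixel
    prompt_idx = np.flatnonzero(mask)                   # flat indices of prompt pixels
    if prompt_idx.size == 0:
        raise ValueError("the prompt mask selects no pixels")
    keys = tokens[prompt_idx]                           # (N_k, C)
    values = sparse_disp.ravel()[prompt_idx][:, None]   # (N_k, 1)
    dense = cross_attention(tokens, keys, values, temperature)
    return dense.reshape(H, W)
```

Because the output is a convex combination of prompt disparities, this toy version can never predict a disparity outside the range of the prompt, which is one reason a learned refinement stage is needed to correct the prompt rather than merely spread it.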
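The bad-pixel error rates quoted in the abstract are the percentage of pixels with valid ground truth whose absolute disparity error exceeds a threshold (3 px for KITTI 2012 and 2015, 2 px for Middlebury, 1 px for ETH3D). A minimal sketch of that metric follows. Treating a ground-truth disparity of 0 as missing is an assumed convention (each benchmark ships its own validity masks), and the optional `rel_threshold` reflects the official KITTI 2015 D1 metric, which counts a pixel as an outlier only if its error also exceeds 5% of the true disparity; the abstract does not state which variant was used.

```python
import numpy as np


def bad_pixel_rate(pred, gt, threshold, valid=None, rel_threshold=None):
    """Percentage of valid pixels whose absolute disparity error exceeds `threshold` px.

    If rel_threshold is given (e.g. 0.05 for the KITTI 2015 D1 convention), a pixel
    counts as bad only when its error also exceeds rel_threshold * |gt|.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = gt > 0  # assumed convention: gt disparity 0 marks missing ground truth
    valid = np.asarray(valid, dtype=bool)
    if not valid.any():
        raise ValueError("no pixels with valid ground truth")
    err = np.abs(pred - gt)
    bad = err > threshold
    if rel_threshold is not None:
        bad &= err > rel_threshold * np.abs(gt)
    return 100.0 * bad[valid].mean()
```

For instance, `bad_pixel_rate(pred, gt, threshold=1.0)` corresponds to the ETH3D >1 px setting, and `bad_pixel_rate(pred, gt, threshold=3.0, rel_threshold=0.05)` gives a D1-style score for KITTI 2015.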
Keywords: stereo matching; domain generalization; vision foundation models; sparse prompt; real-world perception; disparity estimation

Share and Cite

MDPI and ACS Style

Li, H.; Mo, H.; Li, X.; Fang, T.; Liu, S.; Yu, S.; Rao, Z. Sparse Self-Prompt-Guided Stereo Matching for Real-World Generalization. Sensors 2026, 26, 3173. https://doi.org/10.3390/s26103173

AMA Style

Li H, Mo H, Li X, Fang T, Liu S, Yu S, Rao Z. Sparse Self-Prompt-Guided Stereo Matching for Real-World Generalization. Sensors. 2026; 26(10):3173. https://doi.org/10.3390/s26103173

Chicago/Turabian Style

Li, Hangbiao, Haojun Mo, Xing Li, Tao Fang, Sikun Liu, Shuzhen Yu, and Zhibo Rao. 2026. "Sparse Self-Prompt-Guided Stereo Matching for Real-World Generalization" Sensors 26, no. 10: 3173. https://doi.org/10.3390/s26103173

APA Style

Li, H., Mo, H., Li, X., Fang, T., Liu, S., Yu, S., & Rao, Z. (2026). Sparse Self-Prompt-Guided Stereo Matching for Real-World Generalization. Sensors, 26(10), 3173. https://doi.org/10.3390/s26103173

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
