Sensors
  • Article
  • Open Access

22 January 2026

PriorSAM-DBNet: A SAM-Prior-Enhanced Dual-Branch Network for Efficient Semantic Segmentation of High-Resolution Remote Sensing Images

1 College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
2 State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
* Author to whom correspondence should be addressed.
Sensors 2026, 26(2), 749; https://doi.org/10.3390/s26020749
This article belongs to the Special Issue Artificial Intelligence-Based Target Recognition and Remote Sensing Data Processing

Abstract

Semantic segmentation of high-resolution remote sensing imagery is a critical technology for the intelligent interpretation of sensor data, supporting automated environmental monitoring and urban sensing systems. However, processing data from dense urban scenarios remains challenging due to sensor signal occlusions (e.g., shadows) and the complexity of parsing multi-scale targets from optical sensors. Existing approaches often exhibit a trade-off between the accuracy of global semantic modeling and the precision of complex boundary recognition. While the Segment Anything Model (SAM) offers powerful zero-shot structural priors, its direct application to remote sensing is hindered by domain gaps and the lack of inherent semantic categorization. To address these limitations, we propose a dual-branch cooperative network, PriorSAM-DBNet. The main branch employs a Densely Connected Swin (DC-Swin) Transformer to capture cross-scale global features via a hierarchical shifted window attention mechanism. The auxiliary branch leverages SAM’s zero-shot capability to exploit structural universality, generating object-boundary masks as robust signal priors while bypassing semantic domain shifts. Crucially, we introduce a parameter-efficient Scaled Subsampling Projection (SSP) module that employs a weight-sharing mechanism to align cross-modal features, freezing the massive SAM backbone to ensure computational viability for practical sensor applications. Furthermore, a novel Attentive Cross-Modal Fusion (ACMF) module is designed to dynamically resolve semantic ambiguities by calibrating the global context with local structural priors. Extensive experiments on the ISPRS Vaihingen, Potsdam, and LoveDA-Urban datasets demonstrate that PriorSAM-DBNet outperforms state-of-the-art approaches. By fine-tuning only 0.91 million parameters in the auxiliary branch, our method achieves mIoU scores of 82.50%, 85.59%, and 53.36%, respectively. 
The proposed framework offers a scalable, high-precision solution for remote sensing semantic segmentation, particularly effective for disaster emergency response where rapid feature recognition from sensor streams is paramount.
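To make the dual-branch design described above concrete, the following is a minimal PyTorch sketch of the overall data flow: a trainable main branch, a frozen prior encoder standing in for the SAM backbone, a lightweight trainable projection standing in for the SSP module, and a gated fusion standing in for ACMF. All module internals, channel sizes, and names here are illustrative assumptions, not the authors' implementation; in particular, the "DC-Swin" main branch and the SAM encoder are replaced by simple convolutional stems so the sketch stays self-contained.

```python
# Illustrative sketch only: stand-in modules, not the paper's architecture.
import torch
import torch.nn as nn


class FrozenPriorEncoder(nn.Module):
    """Stand-in for the frozen SAM backbone producing structural priors."""
    def __init__(self, out_ch=32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, out_ch, 3, stride=4, padding=1), nn.ReLU())
        for p in self.parameters():  # frozen, as in the paper's setup
            p.requires_grad = False

    def forward(self, x):
        return self.stem(x)


class ScaledSubsamplingProjection(nn.Module):
    """Lightweight trainable projection aligning prior features to the
    main branch; in the paper, only a small auxiliary-branch parameter
    set (~0.91 M) is fine-tuned."""
    def __init__(self, in_ch=32, out_ch=64):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.proj(x)


class AttentiveFusion(nn.Module):
    """Gate the main-branch global context with the structural prior."""
    def __init__(self, ch=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, main_feat, prior_feat):
        g = self.gate(torch.cat([main_feat, prior_feat], dim=1))
        return main_feat * g + prior_feat * (1 - g)


class DualBranchSeg(nn.Module):
    def __init__(self, num_classes=6, ch=64):
        super().__init__()
        self.main = nn.Sequential(  # stand-in for the DC-Swin branch
            nn.Conv2d(3, ch, 3, stride=4, padding=1), nn.ReLU())
        self.prior = FrozenPriorEncoder(32)
        self.ssp = ScaledSubsamplingProjection(32, ch)
        self.fuse = AttentiveFusion(ch)
        self.head = nn.Conv2d(ch, num_classes, 1)

    def forward(self, x):
        m = self.main(x)                 # global context features
        p = self.ssp(self.prior(x))      # projected structural priors
        return self.head(self.fuse(m, p))


model = DualBranchSeg()
out = model(torch.randn(1, 3, 256, 256))
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

The key training property the sketch reproduces is that the prior encoder contributes features while staying entirely out of the optimizer: only the main branch, projection, fusion gate, and head receive gradients.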
