An Intelligent Gated Fusion Network for Waterbody Recognition in Multispectral Remote Sensing Imagery
Highlights
- This study proposes a novel Intelligent Gated Fusion Network (IGF-Net). The dual-branch feature encoder is designed to alleviate the input channel mismatch between pre-trained RGB models and multi-band data. Its core Intelligent Gated Fusion Module (IGFM) facilitates adaptive fusion of spectral and visual features.
- Extensive experiments indicate that IGF-Net achieves the highest IoU and Dice among the compared mainstream segmentation models, both on the newly constructed dataset (IoU: 0.8742, Dice: 0.9239) and on an independent Sentinel-2 dataset (IoU: 0.5009, Dice: 0.6370), which suggests favorable generalization.
- The work provides an effective and robust deep learning solution for accurate water body recognition from multispectral imagery, which can directly benefit practical applications such as hydrological monitoring, environmental management, and disaster assessment.
- We construct and publicly release the “Tiangong-2 Remote Sensing Image Water Body Semantic Segmentation Dataset” along with the complete model implementation code, offering a valuable benchmark and reproducible research foundation for the community.
Abstract
1. Introduction
2. Dataset
2.1. Tiangong-2 Remote Sensing Image Water Body Semantic Segmentation Dataset
2.2. Sentinel-2 Water Segmentation Dataset
- (1) Spatial Resolution Unification: Given the lower spatial resolution of SWIR bands (typically 20 m for Sentinel-2) compared to RGB/NIR bands (10 m), bilinear interpolation was applied to upsample SWIR data to 10 m, ensuring consistent spatial scale across all bands.
- (2) Image Cropping and Tiling: A grid-based cropping method was employed to divide the resolution-unified multi-band images into non-overlapping square patches of size 256 × 256 pixels.
- (3) Dataset Splitting: In accordance with the 9:1 ratio established for the custom-built dataset, all valid image patches were randomly partitioned into training and test sets.
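The three preparation steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' preprocessing script: the function names, the half-pixel alignment used for interpolation, the choice to drop incomplete edge tiles, and the random seed are all assumptions made for the example.

```python
import numpy as np


def upsample_bilinear_2x(band: np.ndarray) -> np.ndarray:
    """Bilinearly upsample one 2-D band by a factor of 2 (e.g. 20 m -> 10 m)."""
    h, w = band.shape
    # Map target pixel centres back to source coordinates (half-pixel convention, assumed).
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = band[y0][:, x0] * (1 - wx) + band[y0][:, x1] * wx
    bottom = band[y1][:, x0] * (1 - wx) + band[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def tile_non_overlapping(image: np.ndarray, patch: int = 256) -> np.ndarray:
    """Split a (C, H, W) array into non-overlapping patch x patch tiles.

    Incomplete tiles at the right and bottom edges are dropped (an assumption).
    """
    c, h, w = image.shape
    tiles = [
        image[:, r:r + patch, col:col + patch]
        for r in range(0, h - patch + 1, patch)
        for col in range(0, w - patch + 1, patch)
    ]
    if not tiles:
        return np.empty((0, c, patch, patch), dtype=image.dtype)
    return np.stack(tiles)


def split_train_test(n_patches: int, train_ratio: float = 0.9, seed: int = 0):
    """Randomly partition patch indices into training and test sets (9:1 by default)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_patches)
    cut = int(round(n_patches * train_ratio))
    return order[:cut], order[cut:]
```

For a scene whose height or width is not a multiple of 256, this sketch discards the remainder; padding the scene instead would be an equally plausible reading of "grid-based cropping".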
2.3. Comparison of the Two Datasets
3. Methodology
3.1. Intelligent Gated Fusion Network Architecture
- (1) Dual-Branch Feature Encoder Module: This module employs a parallel dual-branch structure to process distinct band subsets of the multispectral data. The visual branch takes the red, green, and blue bands as input and loads weights pre-trained on ImageNet to extract general visual features. The spectral branch processes the remaining bands with randomly initialized weights, specializing in learning spectral-specific features from the multispectral data. Both branches adopt the ResNet-50 backbone [26], producing feature maps at 1/4 of the input spatial resolution with 256 channels.
- (2) Intelligent Gated Fusion Module: As the core innovation of this study, this module receives outputs from the dual-branch feature encoder. Through a gating-based adaptive mechanism, it dynamically learns branch-wise attention weights via trainable parameters, enabling selective fusion of visual and spectral features to enhance the model’s focus on key discriminative characteristics.
- (3) Atrous Spatial Pyramid Pooling (ASPP) Module [27]: Following further processing by the deep encoder, the fused features are passed to the ASPP module. This module aggregates multi-scale contextual information through parallel atrous convolutions with varying dilation rates, thereby strengthening the model’s capacity to capture water bodies across diverse spatial scales.
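To make the idea of gating-based branch weighting concrete, the sketch below fuses two feature maps with per-pixel weights obtained from a temperature-scaled softmax, then adds a simple residual term. It is a generic illustration and not the IGFM itself: the linear gate `w_gate`, the fixed `temperature` argument, and the residual defined as the mean of the two branches are all hypothetical stand-ins for the learned components described in Section 3.3.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def gated_fusion(f_vis, f_spec, w_gate, temperature=1.0):
    """Fuse two (C, H, W) feature maps with per-pixel branch weights.

    w_gate is a hypothetical (2, 2C) projection standing in for trainable
    gate parameters; it maps concatenated features to one logit per branch.
    """
    stacked = np.concatenate([f_vis, f_spec], axis=0)        # (2C, H, W)
    logits = np.einsum("bc,chw->bhw", w_gate, stacked)       # (2, H, W)
    weights = softmax(logits / temperature, axis=0)          # sums to 1 over branches
    fused = weights[0] * f_vis + weights[1] * f_spec         # weights broadcast over channels
    residual = 0.5 * (f_vis + f_spec)                        # assumed residual path
    return fused + residual, weights


# Usage with random features: two 256-channel maps at 64 x 64, as in the 1/4-resolution stage.
rng = np.random.default_rng(0)
f_vis = rng.normal(size=(256, 64, 64))
f_spec = rng.normal(size=(256, 64, 64))
fused, weights = gated_fusion(f_vis, f_spec, rng.normal(size=(2, 512)) * 0.01)
print(fused.shape, weights.shape)
```

A lower temperature pushes the two weights toward a hard selection of one branch, while a higher temperature moves them toward an even average.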
3.2. Dual-Branch Feature Encoder Module
3.3. Intelligent Gated Fusion Module
3.3.1. Preprocessing Module
3.3.2. Weight Generation Module
3.3.3. Residual Fidelity Mechanism
4. Experiments
4.1. Experimental Setup
4.1.1. Implementation Details
4.1.2. Evaluation Metrics
4.1.3. Model Complexity and Inference Efficiency
4.2. Comparative Experiments
4.2.1. Comparative Experimental Setup
4.2.2. Quantitative Comparison Results
4.2.3. Qualitative Comparison Results and Discussion
4.2.4. Cross-Temporal Migration Experiment and Result Analysis
4.3. Ablation Study: Replacing the Fusion Module
4.3.1. Ablation Experiment Setup
4.3.2. Quantitative Ablation Results
4.3.3. Qualitative Ablation Results and Discussion
5. Discussion
6. Conclusions
- (1) The Tiangong-2 Remote Sensing Image Water Body Semantic Segmentation Dataset was constructed and made publicly available. This dataset provides 3776 multispectral image groups with pixel-level fine annotations for water body segmentation. To systematically validate model generalization, the public Sentinel-2 Water Segmentation Dataset was also introduced, enabling comprehensive performance evaluation across different sensors, band combinations, and scene complexities.
- (2) This study proposes the IGF-Net architecture to address the challenges of poor transferability in pre-trained models and inadequate multi-source feature fusion. The network employs a dual-branch encoder as its backbone, centered around the IGFM. Through a cascaded mechanism integrating difference-co-occurrence parallel modeling, channel-context prior, and adaptive temperature control, this module achieves adaptive deep fusion of visual and spectral features. This design not only promotes training stability but also improves water body segmentation accuracy.
- (3) Extensive experiments validate the effectiveness and generalization capability of IGF-Net and its core module, IGFM. On TG2-WaterSeg, IGF-Net achieves highly competitive performance (IoU 0.8742), demonstrating its ability to further refine segmentation details in scenarios with relatively limited room for improvement. On the more challenging S2-WaterSeg, IGF-Net shows clearer advantages in key metrics such as IoU (0.5009) and Dice (0.6370), indicating that IGFM is more effective at coordinating complementary information from the visual and spectral branches and achieves a better balance between segmentation completeness and false-positive suppression. Ablation studies comparing five alternative fusion modules further confirm that IGFM provides stronger overall stability, particularly in complex scenarios. Visualization results further show that IGF-Net concentrates cross-modal discrepancies on true boundaries and critical structural regions while suppressing background interference, producing more continuous and clearer responses to complex boundaries and elongated water bodies. Cross-temporal experiments demonstrate that IGF-Net maintains the continuity of major channels despite seasonal variations and imaging condition differences, indicating its ability to learn relatively robust features based on structural and contextual cues.
- (4) Complexity analysis reveals that IGF-Net’s dual-branch design leads to higher computational costs compared to lightweight methods, which may limit deployment in resource-constrained scenarios. The model has not been validated under extreme atmospheric conditions or with auxiliary data such as DEM or SAR. Additionally, ablation results suggest that while IGFM achieves the best overall balance, optimal fusion strategies may vary with specific precision-recall requirements.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. McFeeters, S.K. The Use of the Normalized Difference Water Index (NDWI) in the Delineation of Open Water Features. Int. J. Remote Sens. 1996, 17, 1425–1432.
2. Cao, R.; Li, C.; Liu, L. Extracting Miyun Reservoir’s Water Area and Monitoring Its Change Based on a Revised Normalized Difference Water Index. Sci. Surv. Mapp. 2008, 33, 158–160. (In Chinese)
3. Chen, W.; Ding, J.; Li, Y.; Niu, Z. Extraction of Water Information Based on China-Made GF-1 Remote Sensing Image. Resour. Sci. 2015, 37, 1166–1172. (In Chinese)
4. Wang, S.; Baig, M.H.A.; Zhang, L.; Jiang, H.; Ji, Y.; Zhao, H.; Tian, J. A Simple Enhanced Water Index (EWI) for Percent Surface Water Estimation Using Landsat Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 90–97.
5. Chen, C.; Fu, J.Q.; Sui, X.X.; Lu, X.; Tan, A.H. Construction and Application of Knowledge Decision Tree after a Disaster for Water Body Information Extraction from Remote Sensing Images. J. Remote Sens. 2018, 22, 792–801. (In Chinese)
6. Wang, Z.; Liu, J.; Li, J.; Zhang, D.D. Multi-Spectral Water Index (MuWI): A Native 10-m Multi-Spectral Water Index for Accurate Water Mapping on Sentinel-2. Remote Sens. 2018, 10, 1643.
7. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
8. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
9. Zhang, L.; Fan, Y.; Yan, R.; Shao, Y.; Wang, G.; Wu, J. Fine-Grained Tidal Flat Waterbody Extraction Method (FYOLOv3) for High-Resolution Remote Sensing Images. Remote Sens. 2021, 13, 2594.
10. Weng, Y.; Li, Z.; Tang, G.; Wang, Y. OCNet-Based Water Body Extraction from Remote Sensing Images. Water 2023, 15, 3557.
11. Li, L.; Yan, Z.; Shen, Q.; Cheng, G.; Gao, L.; Zhang, B. Water Body Extraction from Very High Spatial Resolution Remote Sensing Data Based on Fully Convolutional Networks. Remote Sens. 2019, 11, 1162.
12. Isikdogan, F.; Bovik, A.C.; Passalacqua, P. Surface Water Mapping by Deep Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4909–4918.
13. Liao, D.; Sun, J.; Deng, Z.; Zhao, Y.; Zhang, J.; Ou, D. A Lightweight Network for Water Body Segmentation in Agricultural Remote Sensing Using Learnable Kalman Filters and Attention Mechanisms. Appl. Sci. 2025, 15, 6292.
14. Cao, H.; Tian, Y.; Liu, Y.; Wang, R. Water body extraction from high spatial resolution remote sensing images based on enhanced U-Net and multi-scale information fusion. Sci. Rep. 2024, 14, 16132.
15. Li, Y.; Zhou, P.; Wang, Y.; Li, X.; Zhang, Y.; Li, X. Deep Learning Small Water Body Mapping by Transfer Learning from Sentinel-2 to PlanetScope. Remote Sens. 2025, 17, 2738.
16. Ngo, P.L.; Pham, V.H.; Bui, N.L.; Phan, H.A.T.; Vo, H.B.; Velavan, T.P.; Tran, D.K. Detection of small water bodies for vector control using deep learning on multispectral imagery from unmanned aerial vehicles. Discov. Artif. Intell. 2025, 5, 170.
17. Weng, Z.; Li, Q.; Zheng, Z.; Wang, L. SCR-Net: A Dual-Channel Water Body Extraction Model Based on Multi-Spectral Remote Sensing Imagery—A Case Study of Daihai Lake, China. Sensors 2025, 25, 763.
18. Yuan, K.; Zhuang, X.; Schaefer, G.; Feng, J.; Guan, L.; Fang, H. Deep-Learning-Based Multispectral Satellite Image Segmentation for Water Body Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 7422–7434.
19. Hu, H.; He, Z.; Zheng, H. An Algorithm for Multispectral Water Body Detection in Complex Environments. J. Beijing Univ. Aeronaut. Astronaut. 2025, early access. (In Chinese)
20. Yang, S.; Wang, L.; Yuan, Y.; Fan, L.; Wu, Y.; Sun, W.; Yang, G. Recognition of Small Water Bodies under Complex Terrain Based on SAR and Optical Image Fusion Algorithm. Sci. Total Environ. 2024, 946, 174329.
21. Wang, R.; Zhang, C.; Chen, C.; Hao, H.; Li, W.; Jiao, L. A Multi-Modality Fusion and Gated Multi-Filter U-Net for Water Area Segmentation in Remote Sensing. Remote Sens. 2024, 16, 419.
22. Song, W.; Zhao, Y.; Tu, J.; Chen, M.; Xie, Y.; Cui, X. A Visual Attention-Guided Approach for Concrete Crack Detection in Complex Environments. Eng. Appl. Artif. Intell. 2026, 173, 114439.
23. Zhou, Z.; Li, S.; Wu, W.; Guo, W.; Li, X.; Xia, G.; Zhao, Z. NaSC-TG2 (Tiangong-2 Remote Sensing Image Natural Scene Classification Dataset), v1.0; National Basic Science Data Center (NBSDC). 2021. Available online: https://cstr.cn/CSTR:16666.11.nbsdc.tfpbwtqf (accessed on 1 January 2026).
24. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A Database and Web-Based Tool for Image Annotation. Int. J. Comput. Vis. 2008, 77, 157–173.
25. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder–Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
27. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder–Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
28. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
29. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
30. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
31. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241.
33. Fu, Y.; Lou, M.; Yu, Y. SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 19077–19087.
34. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Proceedings of the Computer Vision—ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; pp. 205–218.
35. Dai, Y.; Giesecke, F.; Oehmcke, S.; Wu, Y.; Barnard, M.; Xing, Y. Attentional Feature Fusion. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Las Vegas, NV, USA, 5–9 January 2021; pp. 3559–3568.
36. Kim, J.-H.; Jun, J.; Zhang, B.-T. Bilinear Attention Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montréal, QC, Canada, 3–8 December 2018; pp. 1583–1593.
37. Fang, Q.Y.; Wang, Z.K. Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery. Pattern Recognit. 2022, 130, 108786.
38. Ates, G.C.; Mohan, P.; Celik, E. Dual Cross-Attention for Medical Image Segmentation. Eng. Appl. Artif. Intell. 2023, 126, 107139.

| Representative Method | Input Type | Key Architecture | Main Strengths |
|---|---|---|---|
| Weng et al.: SCR-Net [17] | RGB + NIR | ConvFormer branch + ResNet-50 branch + GAM attention module | Balances global context and local details for effective multispectral fusion and accurate water body segmentation. |
| Yang et al.: Multispectral and SAR Fusion algorithm [20] | Multispectral + SAR | MASF + multi-scale segmentation + random forest | Fuses complementary multispectral and SAR information to identify fragmented small water bodies in complex terrain. |
| Wang et al.: MFGF-UNet [21] | SAR and the seven water indexes | U-Net + gated multi-filter inception module + GCT skip connection | Leverages multimodal and multiscale features for strong, robust, and low-complexity performance on the WIPI, Chengdu, and GF2020 datasets. |
| Liao et al.: LKF-DCANet [13] | RGB | Channel attention-enhanced deformable convolution module + convolutional additive token mixer + learnable Kalman filter | Achieves precise boundary delineation and strong robustness to noise and appearance ambiguity with only 0.22 M parameters. |
| Cao et al.: EU-Net [14] | RGB + NIR | Improved residual connections + multi-scale dilated convolution module + multi-scale feature fusion module + channel and spatial attention mechanisms | Maintains water-body geometry and clear boundaries, especially in small water bodies, narrow channels, and complex scenes. |
| Li et al.: transfer learning framework from Sentinel-2 to PlanetScope [15] | RGB + NIR | Transfer learning framework from Sentinel-2 to PlanetScope + assessment of VMamba for small water-body mapping | Reduces manual annotation effort and improves small water body mapping in cross-sensor transfer learning. |

| Channel Number | Spectral Range (μm) |
|---|---|
| V1 | 0.970–0.990 |
| V2 | 0.930–0.950 |
| V3 | 0.895–0.915 |
| V4 | 0.845–0.885 |
| V5 | 0.810–0.830 |
| V6 | 0.740–0.760 |
| V7 | 0.6775–0.6875 |
| V8 | 0.655–0.675 |
| V9 | 0.610–0.630 |
| V10 | 0.555–0.575 |
| V11 | 0.510–0.530 |
| V12 | 0.480–0.500 |
| V13 | 0.433–0.453 |
| V14 | 0.403–0.423 |

| Aspect | Tiangong-2 Remote Sensing Image Water Body Semantic Segmentation Dataset | Sentinel-2 Water Segmentation Dataset |
|---|---|---|
| Data source | Custom-built from the Tiangong-2 Remote Sensing Image Natural Scene Classification Dataset. | Public dataset released by Yuan et al. |
| Spectral characteristics | 14 discrete spectral bands spanning 0.403–0.990 μm. | Multispectral imagery including RGB, NIR, and SWIR bands. |
| Annotation strategy | Manually annotated in this study using LabelMe. | Labels provided in the original public benchmark dataset. |
| Data preparation | TIFF images converted to NPY format; binary labels generated by a custom script. | SWIR bands upsampled from 20 m to 10 m and cropped into 256 × 256 patches. |
| Temporal setting in this study | Used for supervised training and quantitative evaluation. | April 2018 data used for training, validation, and quantitative testing; December 2018 and February 2019 data used for qualitative visual analysis. |
| Primary role in this study | Task-specific evaluation on a custom-built multispectral dataset. | Benchmark evaluation and cross-temporal generalization analysis on a public dataset. |

| Stage/Module | Input Size | Output Size | Details |
|---|---|---|---|
| Input Data | C × H × W | (C-3) × H × W and 3 × H × W | Split the input into RGB and multispectral data by channel indices. |
| Encoder: Visual branch | 3 × H × W | 256 × H/4 × W/4 | Extract RGB features using a shallow ResNet50 encoder with pretrained initialization. |
| Encoder: Spectral branch | (C-3) × H × W | 256 × H/4 × W/4 | Extract multispectral features using a shallow ResNet50 encoder with random initialization. |
| IGFM | 512 × H/4 × W/4 | 256 × H/4 × W/4 | Adaptively fuse dual-branch features with dynamic weights while preserving residual information. |
| Deep Encoder | 256 × H/4 × W/4 | 2048 × H/8 × W/8 | Further extract high-level semantic features using the deep ResNet50 encoder. |
| ASPP | 2048 × H/8 × W/8 | 256 × H/8 × W/8 | Extract multi-scale contextual features through parallel convolutions with different dilation rates. |
| Decoder | ASPP: 256 × H/8 × W/8; IGFM: 256 × H/4 × W/4; RGB Encoder Stage2: 256 × H/4 × W/4; MS Encoder Stage2: 256 × H/4 × W/4 | Num classes × H × W | Generate the final prediction result via transposed convolution upsampling, channel concatenation, and convolutional refinement. |
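
As a worked example of the stage sizes in the table above, the helper below evaluates each listed shape for a concrete input. The function name and dictionary keys are hypothetical, and `num_classes=2` (water versus background) is an assumption; only the channel counts and the 1/4 and 1/8 scale factors come from the table.

```python
def igfnet_stage_shapes(c: int, h: int, w: int, num_classes: int = 2) -> dict:
    """Return the (channels, height, width) listed for each stage of the table."""
    if c <= 3:
        raise ValueError("need more than 3 bands to feed the spectral branch")
    if h % 8 or w % 8:
        raise ValueError("height and width are assumed to be multiples of 8")
    quarter = (h // 4, w // 4)   # resolution after the shallow encoders
    eighth = (h // 8, w // 8)    # resolution after the deep encoder
    return {
        "visual_input": (3, h, w),
        "spectral_input": (c - 3, h, w),
        "visual_encoder": (256, *quarter),
        "spectral_encoder": (256, *quarter),
        "igfm_input": (512, *quarter),    # two 256-channel maps concatenated
        "igfm_output": (256, *quarter),
        "deep_encoder": (2048, *eighth),
        "aspp": (256, *eighth),
        "decoder": (num_classes, h, w),
    }


# A 14-band Tiangong-2 patch of 256 x 256 pixels (patch size assumed for illustration).
for stage, shape in igfnet_stage_shapes(14, 256, 256).items():
    print(f"{stage:>17}: {shape}")
```

For this input the fusion stage operates on 64 × 64 maps and the ASPP stage on 32 × 32 maps, so most of the spatial detail available to the decoder comes from the 1/4-resolution skip features.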

| Mode | Model | Params (M) | FLOPs (G) | FPS | Avg-Time (ms) | Dice | Precision | Recall | IoU |
|---|---|---|---|---|---|---|---|---|---|
| RGB mode | DeepLabv3+ | 40.35 | 8.68 | 207.95 | 4.81 | 0.8936 | 0.9103 | 0.8934 | 0.8335 |
| RGB mode | FCN | 9.41 | 7.34 | 807.11 | 1.24 | 0.6668 | 0.7370 | 0.6493 | 0.6086 |
| RGB mode | PSPNet | 48.94 | 23.51 | 202.25 | 4.94 | 0.8746 | 0.8928 | 0.8727 | 0.8094 |
| RGB mode | U-Net | 31.04 | 27.37 | 258.05 | 3.88 | 0.8927 | 0.9204 | 0.8865 | 0.8324 |
| RGB mode | Swin-Unet | 27.15 | 3.85 | 107.05 | 9.34 | 0.8085 | 0.8765 | 0.7903 | 0.7415 |
| RGB mode | LKF-DCANet | 4.30 | 10.15 | 334.81 | 2.99 | 0.8987 | 0.9056 | 0.9056 | 0.8414 |
| RGB mode | SegMAN | 26.25 | 15.75 | 15.38 | 65.00 | 0.9021 | 0.9212 | 0.8982 | 0.8445 |
| Full-band mode | DeepLabv3+ | 40.38 | 8.96 | 202.52 | 4.94 | 0.9096 | 0.9208 | 0.9109 | 0.8542 |
| Full-band mode | FCN | 9.41 | 7.54 | 806.30 | 1.24 | 0.6801 | 0.7304 | 0.6691 | 0.6213 |
| Full-band mode | PSPNet | 48.98 | 23.79 | 203.62 | 4.91 | 0.8891 | 0.8991 | 0.8928 | 0.8281 |
| Full-band mode | U-Net | 31.04 | 27.58 | 256.77 | 3.89 | 0.8941 | 0.9180 | 0.8907 | 0.8388 |
| Full-band mode | Swin-Unet | 27.17 | 3.89 | 102.23 | 9.78 | 0.8834 | 0.9123 | 0.8752 | 0.8237 |
| Full-band mode | LKF-DCANet | 4.31 | 10.25 | 324.26 | 3.08 | 0.9122 | 0.9223 | 0.9125 | 0.8585 |
| Full-band mode | SegMAN | 26.25 | 15.78 | 15.10 | 66.22 | 0.9138 | 0.9264 | 0.9123 | 0.8605 |
| Full-band mode | IGF-Net (ours) | 45.15 | 51.48 | 97.92 | 10.21 | 0.9239 | 0.9331 | 0.9235 | 0.8742 |

| Mode | Model | Params (M) | FLOPs (G) | FPS | Avg-Time (ms) | Dice | Precision | Recall | IoU |
|---|---|---|---|---|---|---|---|---|---|
| RGB mode | DeepLabv3+ | 40.35 | 34.73 | 160.03 | 6.25 | 0.4464 | 0.6188 | 0.4031 | 0.3403 |
| RGB mode | FCN | 9.41 | 29.35 | 296.13 | 3.38 | 0.1930 | 0.3492 | 0.1641 | 0.1368 |
| RGB mode | PSPNet | 48.94 | 93.71 | 90.99 | 10.99 | 0.4230 | 0.5849 | 0.3782 | 0.3188 |
| RGB mode | U-Net | 31.04 | 109.48 | 89.53 | 11.17 | 0.4800 | 0.6567 | 0.4320 | 0.3660 |
| RGB mode | Swin-Unet | 27.15 | 15.41 | 73.99 | 13.51 | 0.2966 | 0.5391 | 0.2456 | 0.2172 |
| RGB mode | LKF-DCANet | 4.30 | 40.60 | 115.37 | 8.67 | 0.3079 | 0.5509 | 0.2578 | 0.2270 |
| RGB mode | SegMAN | 26.25 | 61.78 | 9.15 | 109.24 | 0.4411 | 0.6316 | 0.3903 | 0.3327 |
| Full-band mode | DeepLabv3+ | 40.35 | 34.93 | 160.05 | 6.25 | 0.6152 | 0.7225 | 0.5798 | 0.4785 |
| Full-band mode | FCN | 9.41 | 29.50 | 300.05 | 3.33 | 0.2484 | 0.4249 | 0.2141 | 0.1735 |
| Full-band mode | PSPNet | 48.95 | 93.92 | 90.17 | 11.09 | 0.5937 | 0.7063 | 0.5616 | 0.4577 |
| Full-band mode | U-Net | 31.04 | 109.63 | 90.69 | 11.03 | 0.6272 | 0.7395 | 0.5877 | 0.4909 |
| Full-band mode | Swin-Unet | 27.15 | 15.44 | 78.22 | 12.78 | 0.5192 | 0.6977 | 0.4683 | 0.3916 |
| Full-band mode | LKF-DCANet | 4.30 | 40.68 | 115.78 | 8.64 | 0.6346 | 0.7343 | 0.5949 | 0.4969 |
| Full-band mode | SegMAN | 26.25 | 61.80 | 9.11 | 109.81 | 0.6236 | 0.7123 | 0.5949 | 0.4855 |
| Full-band mode | IGF-Net (ours) | 45.12 | 205.00 | 55.87 | 17.90 | 0.6370 | 0.7090 | 0.6198 | 0.5009 |
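
The FPS and Avg-Time columns appear to be reciprocal views of the same measurement, which is what one would expect if latency is averaged per image at batch size 1 (the measurement protocol of Section 4.1.3 is not reproduced here, so this is an assumption). The short check below recomputes FPS from three of the reported latencies; the function name is illustrative.

```python
def fps_from_latency_ms(avg_time_ms: float) -> float:
    """Convert an average per-image latency in milliseconds to frames per second."""
    return 1000.0 / avg_time_ms


# (reported FPS, reported Avg-Time in ms) for three full-band rows of the table above.
reported = {
    "FCN": (300.05, 3.33),
    "SegMAN": (9.11, 109.81),
    "IGF-Net": (55.87, 17.90),
}

for name, (fps, latency_ms) in reported.items():
    derived = fps_from_latency_ms(latency_ms)
    print(f"{name}: reported {fps:.2f} FPS, derived {derived:.2f} FPS")
```

The small differences between reported and derived values are consistent with the latencies having been rounded to two decimals.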

| Dataset | Model | IoU | Dice | Precision | Recall |
|---|---|---|---|---|---|
| TG2-WaterSeg | Model 1 | 0.8712 | 0.9224 | 0.9269 | 0.9287 |
| TG2-WaterSeg | Model 2 | 0.8713 | 0.9227 | 0.9271 | 0.9284 |
| TG2-WaterSeg | Model 3 | 0.8674 | 0.9207 | 0.9339 | 0.9166 |
| TG2-WaterSeg | Model 4 | 0.8734 | 0.9239 | 0.9221 | 0.9367 |
| TG2-WaterSeg | Model 5 | 0.8686 | 0.9210 | 0.9336 | 0.9184 |
| TG2-WaterSeg | IGF-Net (ours) | 0.8742 | 0.9239 | 0.9331 | 0.9235 |
| S2-WaterSeg | Model 1 | 0.4918 | 0.6283 | 0.7194 | 0.6024 |
| S2-WaterSeg | Model 2 | 0.4893 | 0.6263 | 0.7158 | 0.5957 |
| S2-WaterSeg | Model 3 | 0.4774 | 0.6145 | 0.7036 | 0.5926 |
| S2-WaterSeg | Model 4 | 0.4971 | 0.6322 | 0.7006 | 0.6154 |
| S2-WaterSeg | Model 5 | 0.4601 | 0.5946 | 0.6914 | 0.5740 |
| S2-WaterSeg | IGF-Net (ours) | 0.5009 | 0.6370 | 0.7090 | 0.6198 |
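
For readers who want to reproduce the four metrics reported in the tables above, the sketch below computes them for one pair of binary masks using the standard confusion-matrix definitions. The paper's exact formulation and its averaging scheme (per image or over the whole test set) are given in Section 4.1.2 and are not reproduced here, so the function name, the `eps` smoothing term, and the single-image scope are assumptions.

```python
import numpy as np


def binary_segmentation_metrics(pred, target, eps: float = 1e-7) -> dict:
    """IoU, Dice, precision and recall for the positive (water) class of two binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    tp = np.logical_and(pred, target).sum()     # water predicted as water
    fp = np.logical_and(pred, ~target).sum()    # background predicted as water
    fn = np.logical_and(~pred, target).sum()    # water predicted as background
    return {
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
    }


# Tiny example: one true positive, one false positive, one false negative.
pred_mask = [[1, 1], [0, 0]]
true_mask = [[1, 0], [1, 0]]
print(binary_segmentation_metrics(pred_mask, true_mask))
```

Precision penalizes false water detections and recall penalizes missed water, which is why the ablation variants can trade one for the other while IoU and Dice stay close.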

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Zhao, T.; Hou, C.; Zhang, Z.; Zhou, Z. An Intelligent Gated Fusion Network for Waterbody Recognition in Multispectral Remote Sensing Imagery. Remote Sens. 2026, 18, 1088. https://doi.org/10.3390/rs18071088
