MMA-Net: A Semantic Segmentation Network for High-Resolution Remote Sensing Images Based on Multimodal Fusion and Multi-Scale Multi-Attention Mechanisms
Highlights
- We propose MMA-Net, a novel dual-branch network that integrates CLMF and MSMA modules to achieve state-of-the-art performance, with mIoU scores of 88.74% and 84.92% on the Potsdam and Vaihingen datasets, respectively.
- The CLMF module uses a two-stage fusion strategy to effectively preserve spatial details and suppress DSM noise. The MSMA module innovatively incorporates multi-scale depthwise separable convolutions into the attention block, which significantly boosts the model’s ability to perceive and segment ground objects of vastly different sizes.
- MMA-Net provides a robust solution for accurate semantic segmentation of high-resolution remote sensing images, particularly improving boundary clarity and small-object recognition, which is critical for applications such as urban planning and disaster monitoring.
- The proposed modular design offers a generalizable framework for multimodal data fusion, demonstrating significant potential for extending to other remote sensing tasks involving multi-source data.
Abstract
1. Introduction
- (1) We propose MMA-Net, a novel dual-branch encoder–decoder architecture that systematically integrates the Cross-Layer Multimodal Fusion (CLMF) module and the Multi-Scale Multi-Attention (MSMA) module. This design enables complementary feature enhancement across both the spatial and semantic dimensions, leading to significant improvements in semantic segmentation accuracy. On the ISPRS Vaihingen and Potsdam datasets, the model achieves mIoU scores of 84.92% and 88.74%, respectively, outperforming state-of-the-art methods by 0.76% to 1.43%. (An illustrative sketch of this dual-branch design is given after this list.)
- (2) We propose a novel Cross-Layer Multimodal Fusion (CLMF) module, which adopts a two-stage strategy of “cross-layer fusion followed by multimodal fusion” to counteract the loss of spatial detail (shallow features) during the encoder’s downsampling. In the Cross-Layer Fusion (CLF) stage, adjacent hierarchical features are integrated via concatenation and channel-attention weighting, dynamically combining shallow detail with high-level semantics. The Multi-Modal Fusion (MMF) stage incorporates a redundancy-filtering mechanism that suppresses the noise and interference introduced by directly integrating raw DSM features.
- (3) We propose the Multi-Scale Multi-Attention (MSMA) block, which integrates Self-Attention (SA) and Cross-Attention (CA) to capture intra-modal local context and inter-modal complementary features simultaneously. With a three-layer structural design, the block effectively models long-range correlations between the RGB and DSM modalities. Moreover, we incorporate multi-scale depthwise separable convolutions into the attention block, enhancing the network’s ability to model ground objects of varying scales without significantly increasing computational cost.
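Since the architecture is only described in prose here, the following is a minimal, illustrative PyTorch sketch of the dual-branch encoder–decoder layout from contribution (1). It is not the authors’ implementation: the ResNet-18 backbones, channel widths, and the simple concatenation/convolution stand-ins for the CLMF and MSMA modules are assumptions made for readability.

```python
import torch
import torch.nn as nn
import torchvision


class MMANetSketch(nn.Module):
    """Illustrative dual-branch encoder-decoder: an RGB encoder and a DSM encoder,
    a stand-in fusion step (placeholder for CLMF), a stand-in attention bottleneck
    (placeholder for MSMA), and a simple upsampling decoder head."""

    def __init__(self, num_classes=6):
        super().__init__()
        # Two ResNet-18 encoders (assumption; the paper also reports ResNet-50/101 and VGG-16).
        self.rgb_enc = torchvision.models.resnet18(weights=None)
        self.dsm_enc = torchvision.models.resnet18(weights=None)
        # The DSM branch takes a single-channel height map.
        self.dsm_enc.conv1 = nn.Conv2d(1, 64, 7, stride=2, padding=3, bias=False)
        # Stand-in fusion: 1x1 conv over concatenated RGB/DSM features (the real model uses CLMF).
        self.fuse = nn.Conv2d(512 * 2, 512, kernel_size=1)
        # Stand-in bottleneck (the real model uses the MSMA block).
        self.bottleneck = nn.Conv2d(512, 512, kernel_size=3, padding=1)
        self.head = nn.Conv2d(512, num_classes, kernel_size=1)

    def _backbone(self, enc, x):
        x = enc.relu(enc.bn1(enc.conv1(x)))
        x = enc.maxpool(x)
        x = enc.layer1(x); x = enc.layer2(x)
        x = enc.layer3(x); x = enc.layer4(x)  # 1/32 resolution, 512 channels
        return x

    def forward(self, rgb, dsm):
        f_rgb = self._backbone(self.rgb_enc, rgb)
        f_dsm = self._backbone(self.dsm_enc, dsm)
        fused = self.fuse(torch.cat([f_rgb, f_dsm], dim=1))
        fused = self.bottleneck(fused)
        logits = self.head(fused)
        # Upsample back to the input resolution for dense prediction.
        return nn.functional.interpolate(logits, size=rgb.shape[-2:],
                                         mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = MMANetSketch(num_classes=6)
    out = net(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
    print(out.shape)  # torch.Size([1, 6, 256, 256])
```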
2. Related Work
2.1. Single-Modal Remote Sensing Image Semantic Segmentation Methods
2.2. Multimodal Remote Sensing Image Semantic Segmentation Methods
- ① Data-level fusion: the raw data from different modalities are concatenated or otherwise combined before being fed into the network [23]. This approach easily introduces redundant cross-modal information and ignores the differing physical properties of the modalities, resulting in inefficient feature learning.
- ② Decision-level fusion: single-modal models first generate separate classification results, which are then fused by voting, weighting, or other rules [24]. This offers high flexibility but neglects early feature interactions between modalities, making it difficult to capture deep semantic correlations.
- ③ Feature-level fusion: features from different modalities are fused in the network’s intermediate layers, which is currently the mainstream strategy. Early methods such as FuseNet [25] and vFuseNet [26] adopted simple concatenation or summation, failing to fully exploit inter-modal dependencies. In recent years, attention mechanisms have been widely applied to feature-level fusion: ASFFuse [27] dynamically adjusts modal weights through spatial and channel attention, while SA-Gate [15] and CFNet [28] use gating mechanisms to filter effective features, significantly improving fusion efficiency. (A minimal sketch of attention-gated feature-level fusion is given below.)
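To make the contrast between these strategies concrete, the snippet below sketches feature-level fusion in its simplest attention-based form: a channel-attention gate, computed from the concatenated modalities, reweights each modality before they are summed. This is a generic illustration of the idea behind gated or attention-weighted fusion (in the spirit of ASFFuse or SA-Gate), not a reimplementation of any cited method; all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


class ChannelGatedFusion(nn.Module):
    """Feature-level fusion: each modality is reweighted by a channel-attention gate
    computed from the joint (concatenated) features, then the gated maps are summed.
    Purely illustrative of attention-based feature-level fusion."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global average pooling
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),                                   # per-channel weights in (0, 1)
        )

    def forward(self, f_rgb: torch.Tensor, f_dsm: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([f_rgb, f_dsm], dim=1)
        w = self.gate(joint)                                # [B, 2C, 1, 1]
        w_rgb, w_dsm = torch.chunk(w, 2, dim=1)
        return w_rgb * f_rgb + w_dsm * f_dsm                # gated sum, back to C channels


# Data-level fusion, by contrast, would simply be torch.cat([rgb, dsm], dim=1) at the input.
fusion = ChannelGatedFusion(channels=64)
out = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```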
3. Methods
3.1. Overview
3.2. Cross-Layer Multimodal Fusion Module (CLMF)
- (1) Cross-Layer Feature Fusion (CLF)
- (2) Multi-Modal Feature Fusion (MMF)
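A hedged reconstruction of the two stages named above, based solely on the description in contribution (2): the CLF stage concatenates an upsampled deeper feature with the adjacent shallower one and reweights channels with a squeeze-and-excitation style gate, while the MMF stage learns a spatial mask that filters redundant or noisy DSM responses before merging the modalities. Channel widths, normalization, and the exact form of the redundancy filter are assumptions, not the authors’ code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CLF(nn.Module):
    """Cross-Layer Fusion (sketch): fuse adjacent encoder stages via
    concatenation followed by channel-attention reweighting."""

    def __init__(self, c_shallow, c_deep, c_out, reduction=4):
        super().__init__()
        self.proj = nn.Conv2d(c_shallow + c_deep, c_out, kernel_size=1)
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out // reduction, c_out, 1), nn.Sigmoid(),
        )

    def forward(self, shallow, deep):
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear", align_corners=False)
        x = self.proj(torch.cat([shallow, deep], dim=1))
        return x * self.att(x)          # dynamic recombination of detail and semantics


class MMF(nn.Module):
    """Multi-Modal Fusion (sketch): a spatial gate filters redundant or noisy
    DSM responses before the two modalities are merged."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_rgb, f_dsm):
        joint = torch.cat([f_rgb, f_dsm], dim=1)
        f_dsm = f_dsm * self.gate(joint)              # suppress unreliable DSM activations
        return self.merge(torch.cat([f_rgb, f_dsm], dim=1))


# Example: stage-2 (64 ch, 64x64) and stage-3 (128 ch, 32x32) features, plus a DSM counterpart.
clf, mmf = CLF(64, 128, 64), MMF(64)
rgb_fused = clf(torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32))
out = mmf(rgb_fused, torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```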
3.3. Multi-Scale Multi-Attention (MSMA)
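Likewise, a rough sketch of one MSMA layer as described in contribution (3): parallel depthwise separable convolutions at several kernel sizes supply multi-scale local context, self-attention (SA) models intra-modal dependencies, and cross-attention (CA) lets the RGB stream query the DSM stream. The head count, kernel scales, and normalization placement are assumptions; per the paper, the full block stacks three such layers.

```python
import torch
import torch.nn as nn


class MultiScaleDWConv(nn.Module):
    """Parallel depthwise convolutions at several kernel sizes, merged by a
    pointwise convolution (multi-scale local context, depthwise separable)."""

    def __init__(self, channels, scales=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)  # depthwise
            for k in scales
        )
        self.pointwise = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(torch.cat([b(x) for b in self.branches], dim=1))


class MSMALayer(nn.Module):
    """One self-attention + cross-attention layer operating on flattened feature maps
    (illustrative arrangement only)."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.local_rgb = MultiScaleDWConv(channels)
        self.local_dsm = MultiScaleDWConv(channels)
        self.self_att = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.cross_att = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, f_rgb, f_dsm):
        b, c, h, w = f_rgb.shape
        f_rgb, f_dsm = self.local_rgb(f_rgb), self.local_dsm(f_dsm)       # multi-scale local context
        t_rgb = f_rgb.flatten(2).transpose(1, 2)                          # [B, HW, C]
        t_dsm = f_dsm.flatten(2).transpose(1, 2)
        t_rgb = self.norm(t_rgb + self.self_att(t_rgb, t_rgb, t_rgb)[0])  # intra-modal SA
        t_rgb = self.norm(t_rgb + self.cross_att(t_rgb, t_dsm, t_dsm)[0]) # RGB queries DSM (CA)
        return t_rgb.transpose(1, 2).reshape(b, c, h, w)


layer = MSMALayer(channels=64)
out = layer(torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16))
print(out.shape)  # torch.Size([1, 64, 16, 16])
```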
3.4. Loss Function
4. Results
4.1. Dataset
4.2. Evaluation Metrics
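The tables in Section 4.4 report per-class scores together with mIoU, mean F1 (mF1), and overall accuracy (OA). These follow the standard confusion-matrix definitions, summarized in the small reference implementation below (this is not the authors’ evaluation code; the class list follows the ISPRS labels used in the tables).

```python
import numpy as np


def segmentation_metrics(conf: np.ndarray):
    """Per-class IoU/F1 plus mIoU, mF1 and OA from a confusion matrix whose
    rows are ground-truth classes and columns are predicted classes."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp          # predicted as class c but actually another class
    fn = conf.sum(axis=1) - tp          # class c pixels predicted as another class
    iou = tp / np.maximum(tp + fp + fn, 1e-12)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1e-12)
    oa = tp.sum() / conf.sum()
    return {"IoU": iou, "F1": f1, "mIoU": iou.mean(), "mF1": f1.mean(), "OA": oa}


# Toy example with the five ISPRS foreground classes.
classes = ["Imp.Surf", "Building", "Low Veg.", "Tree", "Car"]
conf = np.array([[90,  3,  4,  2,  1],
                 [ 2, 95,  1,  1,  1],
                 [ 5,  1, 80, 13,  1],
                 [ 2,  1, 10, 86,  1],
                 [ 3,  2,  1,  1, 93]])
m = segmentation_metrics(conf)
print({k: np.round(v, 4) for k, v in m.items()})
```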
4.3. Experimental Details
4.4. Performance Comparison
4.4.1. Metrics Comparison
- (1) Vaihingen Dataset
- (2) Potsdam Dataset
4.4.2. Visualization Results Comparison
4.4.3. Few-Shot Class Segmentation Performance Validation
4.5. Ablation Experiments
- (1) Impact of Individual Modules
- (2) Number of Multi-Scale Multi-Attention Layers
4.6. Complexity Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DL | Deep learning |
| HRRSI | High-resolution remote sensing imagery |
| RGB | Red, green, blue |
| FCN | Fully convolutional network |
| nDSM | Normalized digital surface model |
| MMA-Net | Multi-scale Multimodal Fusion Network |
| CLMF | Cross-Layer Multimodal Fusion |
| CLF | Cross-Layer Fusion |
| MMF | Multi-Modal Fusion |
| MSMA | Multi-Scale Multi-Attention |
| ReLU | Rectified linear unit |
| GAP | Global average pooling |
| CA | Cross-Attention |
| SA | Self-Attention |
References
- Ye, C.; Li, Y.; Cui, P.; Liang, L.; Pirasteh, S.; Marcato, J.; Goncalves, W.N.; Li, J. Landslide Detection of Hyperspectral Remote Sensing Data Based on Deep Learning with Constrains. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 99, 5047–5060. [Google Scholar] [CrossRef]
- Zhang, X.; Yu, W.; Pun, M.-O.; Shi, W. Cross-domain landslide mapping from large-scale remote sensing images using prototype-guided domain-aware progressive representation learning. ISPRS J. Photogramm. Remote Sens. 2023, 197, 1–17. [Google Scholar] [CrossRef]
- Xu, Z.; Shen, Z.; Li, Y.; Xia, L.; Wang, H.; Li, S.; Jiao, S.; Lei, Y. Road extraction in mountainous regions from high-resolution images based on DSDNet and terrain optimization. Remote Sens. 2020, 13, 90. [Google Scholar] [CrossRef]
- Meng, Y.; Chen, S.; Liu, Y.; Li, L.; Zhang, Z.; Ke, T.; Hu, X. Unsupervised building extraction from multimodal aerial data based on accurate vegetation removal and image feature consistency constraint. Remote Sens. 2022, 14, 1912. [Google Scholar] [CrossRef]
- Li, R.; Zheng, S.; Duan, C.; Wang, L.; Zhang, C. Land cover classification from remote sensing images based on multi-scale fully convolutional network. Geo-Spatial Inf. Sci. 2022, 25, 278–294. [Google Scholar] [CrossRef]
- Liu, H.; Li, W.; Xia, X.-G.; Zhang, M.; Gao, C.-Z.; Tao, R. Central attention network for hyperspectral imagery classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 8989–9003. [Google Scholar] [CrossRef] [PubMed]
- Zhang, M.; Li, W.; Zhang, Y.; Tao, R.; Du, Q. Hyperspectral and LiDAR data classification based on structural optimization transmission. IEEE Trans. Cybern. 2023, 53, 3153–3164. [Google Scholar] [CrossRef]
- Liu, Y.; Fan, B.; Wang, L.; Bai, J.; Xiang, S.; Pan, C. Semantic labeling in very high resolution images via a self-cascaded convolutional neural network. ISPRS J. Photogramm. Remote Sens. 2018, 145, 78–95. [Google Scholar] [CrossRef]
- Yao, H.; Qin, R.; Chen, X. Unmanned aerial vehicle for remote sensing applications—A review. Remote Sens. 2019, 11, 1443. [Google Scholar] [CrossRef]
- Yan, L.; Fan, B.; Liu, H.; Huo, C.; Xiang, S.; Pan, C. Triplet adversarial domain adaptation for pixel-level classification of VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2019, 58, 3558–3573. [Google Scholar] [CrossRef]
- Yang, X.; Li, S.; Chen, Z.; Chanussot, J.; Jia, X.; Zhang, B.; Li, B.; Chen, P. An attention-fused network for semantic segmentation of very-high-resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2021, 177, 238–262. [Google Scholar] [CrossRef]
- Zhou, W.; Jin, J.; Lei, J.; Yu, L. CIMFNet: Cross-layer interaction and multiscale fusion network for semantic segmentation of high-resolution remote sensing images. IEEE J. Sel. Top. Signal Process. 2022, 16, 666–676. [Google Scholar] [CrossRef]
- Ma, J.; Tang, L.; Fan, F.; Huang, J.; Mei, X.; Ma, Y. SwinFusion: Cross-domain long-range learning for general image fusion via Swin transformer. IEEE/CAA J. Autom. Sin. 2022, 9, 1200–1217. [Google Scholar] [CrossRef]
- Ma, X.; Zhang, X.; Pun, M.O.; Liu, M. A multilevel multimodal fusion transformer for remote sensing semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5403215. [Google Scholar] [CrossRef]
- Chen, X.; Lin, K.-Y.; Wang, J.; Wu, W.; Qian, C.; Li, H.; Zeng, G. Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 561–577. [Google Scholar] [CrossRef]
- Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar] [CrossRef]
- Xing, H.; Wei, W.; Zhang, L.; Zhang, Y. Multi-scale feature extraction and fusion with attention interaction for RGB-T tracking. Pattern Recognit. 2025, 157, 110917. [Google Scholar] [CrossRef]
- Hou, J.; Guo, Z.; Wu, Y.; Diao, W.; Xu, T. BSNet: Dynamic hybrid gradient convolution based boundary-sensitive network for remote sensing image segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5624022. [Google Scholar] [CrossRef]
- Niu, R.; Sun, X.; Tian, Y.; Diao, W.; Chen, K.; Fu, K. Hybrid multiple attention network for semantic segmentation in aerial images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5603018. [Google Scholar] [CrossRef]
- Ma, X.; Zhang, X.; Pun, M.-O. A crossmodal multiscale fusion network for semantic segmentation of remote sensing data. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2022, 15, 3463–3474. [Google Scholar] [CrossRef]
- Hosseinpour, H.; Samadzadegan, F.; Javan, F.D. CMGFNet: A deep cross-modal gated fusion network for building extraction from very high-resolution remote sensing images. ISPRS J. Photogramm. Remote Sens. 2022, 184, 96–115. [Google Scholar] [CrossRef]
- Mohammadi, H.; Samadzadegan, F. An object based framework for building change analysis using 2D and 3D information of high resolution satellite images. Adv. Space Res. 2020, 66, 1386–1404. [Google Scholar] [CrossRef]
- Hazirbas, C.; Ma, L.; Domokos, C.; Cremers, D. Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 213–228. [Google Scholar] [CrossRef]
- Audebert, N.; Le Saux, B.; Lefèvre, S. Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS J. Photogramm. Remote Sens. 2018, 140, 20–32. [Google Scholar] [CrossRef]
- Liu, K.; Li, M.; Zuo, E.; Chen, C.; Chen, C.; Wang, B.; Wang, Y.; Lv, X. ASFFuse: Infrared and visible image fusion model based on adaptive selection feature maps. Pattern Recognit. 2024, 149, 110226. [Google Scholar] [CrossRef]
- Xing, M.; Liu, G.; Tang, H.; Qian, Y.; Zhang, J. CFNet: An infrared and visible image compression fusion network. Pattern Recognit. 2024, 156, 110774. [Google Scholar] [CrossRef]
- Prakash, A.; Chitta, K.; Geiger, A. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. arXiv 2021, arXiv:2104.09224. [Google Scholar] [CrossRef]
- Wang, T.; Chen, G.; Zhang, X.; Liu, C.; Wang, J.; Tan, X.; Zhou, W.; He, C. LMFNet: Lightweight Multimodal Fusion Network for high-resolution remote sensing image segmentation. Pattern Recognit. 2025, 164, 111579. [Google Scholar] [CrossRef]
- Yan, L.; Huang, J.; Xie, H.; Wei, P.; Gao, Z. Efficient depth fusion transformer for aerial image semantic segmentation. Remote Sens. 2022, 14, 1294–1304. [Google Scholar] [CrossRef]
- He, S.; Yang, H.; Zhang, X.; Li, X. MFTransNet: A multi-modal fusion with CNN-transformer network for semantic segmentation of HSR remote sensing images. Mathematics 2023, 11, 722–735. [Google Scholar] [CrossRef]
- Feng, H.; Hu, Q.; Zhao, P.; Wang, S.; Ai, M.; Zheng, D.; Liu, T. FTransDeepLab: Multimodal Fusion Transformer-Based DeepLabv3+ for Remote Sensing Semantic Segmentation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4406618. [Google Scholar] [CrossRef]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934. [Google Scholar] [CrossRef]
- Xu, Z.; Zhang, W.; Zhang, T.; Li, J. HRCNet: High-resolution context extraction network for semantic segmentation of remote sensing images. Remote Sens. 2020, 13, 71. [Google Scholar] [CrossRef]
- Li, R.; Zheng, S.; Zhang, C.; Duan, C.; Su, J.; Wang, L.; Atkinson, P.M. Multiattention network for semantic segmentation of fine-resolution remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5607713. [Google Scholar] [CrossRef]
- Yao, M.; Zhang, Y.; Liu, G.; Pang, D. SSNet: A novel transformer and CNN hybrid network for remote sensing semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3023–3037. [Google Scholar] [CrossRef]
- Zhu, L.; Kang, Z.; Zhou, M.; Yang, X.; Wang, Z.; Cao, Z.; Ye, C. CMANet: Cross-Modality Attention Network for Indoor-Scene Semantic Segmentation. Sensors 2022, 22, 8520. [Google Scholar] [CrossRef] [PubMed]
- Pan, C.; Fan, X.; Tjahjadi, T.; Guan, H.; Fu, L.; Ye, Q.; Wang, R. Vision foundation model guided multi-modal fusion network for remote sensing semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 9409–9431. [Google Scholar] [CrossRef]
Quantitative comparison on the Vaihingen dataset.
| Method | Multi-Modal | mIoU (%) | mF1 (%) | OA: Imp.Surf (%) | OA: Building (%) | OA: Low Veg. (%) | OA: Tree (%) | OA: Car (%) | OA: Total (%) |
|---|---|---|---|---|---|---|---|---|---|
| HRCNet (2020) | × | 67.69 | 79.75 | 79.16 | 85.90 | 64.36 | 74.94 | 60.93 | 73.06 |
| MANet (2022) | × | 79.56 | 85.94 | 91.57 | 93.77 | 76.29 | 89.91 | 86.03 | 87.51 |
| SSNet (2024) | × | 76.68 | 85.91 | 80.10 | 87.53 | 69.17 | 71.76 | 76.34 | 76.98 |
| CMANet (2022) | √ | 77.94 | 85.94 | 92.02 | 95.29 | 77.35 | 90.48 | 73.56 | 85.94 |
| CIMFNet (2022) | √ | 81.44 | 89.31 | 92.19 | 95.33 | 77.96 | 90.32 | 88.76 | 88.91 |
| VSGNet (2025) | √ | 84.16 | 91.58 | 93.61 | 98.16 | 83.02 | 88.78 | 91.07 | 91.09 |
| FTransUNet (2024) | √ | 83.69 | 90.84 | 92.48 | 97.73 | 80.68 | 90.49 | 91.27 | 90.59 |
| LMFNet (2025) | √ | 84.09 | 91.27 | 92.61 | 96.75 | 85.61 | 86.98 | 90.34 | 90.46 |
| Ours (ResNet18) | √ | 84.25 | 91.75 | 92.34 | 97.59 | 84.86 | 91.30 | 91.63 | 91.34 |
| Ours (ResNet50) | √ | 84.92 | 92.53 | 93.44 | 98.92 | 85.32 | 91.79 | 92.73 | 92.77 |
| Ours (ResNet101) | √ | 84.42 | 91.92 | 93.19 | 98.01 | 84.80 | 90.93 | 92.70 | 91.62 |
| Ours (VGG-16) | √ | 84.74 | 92.09 | 92.78 | 98.32 | 84.95 | 91.68 | 91.59 | 91.86 |
Quantitative comparison on the Potsdam dataset.
| Method | Multi-Modal | mIoU (%) | mF1 (%) | OA: Imp.Surf (%) | OA: Building (%) | OA: Low Veg. (%) | OA: Tree (%) | OA: Car (%) | OA: Total (%) |
|---|---|---|---|---|---|---|---|---|---|
| HRCNet (2020) | × | 85.05 | 89.62 | 91.27 | 92.17 | 86.34 | 86.94 | 91.40 | 89.62 |
| MANet (2022) | × | 72.63 | 81.11 | 90.48 | 87.58 | 88.78 | 82.69 | 90.96 | 88.09 |
| SSNet (2024) | × | 77.67 | 85.92 | 80.35 | 88.09 | 69.58 | 72.81 | 77.54 | 77.67 |
| CMANet (2022) | √ | 74.99 | 84.20 | 91.75 | 93.64 | 86.42 | 84.70 | 92.74 | 89.85 |
| CIMFNet (2022) | √ | 75.67 | 84.80 | 91.72 | 93.48 | 87.56 | 85.02 | 85.42 | 88.64 |
| VSGNet (2025) | √ | 87.31 | 92.29 | 85.40 | 98.14 | 91.63 | 90.47 | 93.42 | 91.81 |
| FTransUNet (2024) | √ | 86.14 | 91.83 | 93.67 | 97.78 | 87.92 | 87.84 | 94.31 | 92.30 |
| LMFNet (2025) | √ | 86.09 | 91.36 | 89.38 | 95.31 | 82.11 | 79.07 | 86.08 | 86.39 |
| Ours (ResNet18) | √ | 87.97 | 92.73 | 91.75 | 97.44 | 92.23 | 91.74 | 94.74 | 93.57 |
| Ours (ResNet50) | √ | 88.74 | 93.67 | 93.25 | 98.95 | 93.22 | 91.92 | 95.68 | 94.65 |
| Ours (ResNet101) | √ | 87.68 | 92.65 | 93.23 | 98.56 | 91.68 | 90.19 | 94.40 | 92.89 |
| Ours (VGG-16) | √ | 88.63 | 93.29 | 92.84 | 98.63 | 92.51 | 90.85 | 95.27 | 94.27 |
Ablation study of the CLMF and MSMA modules on the Potsdam and Vaihingen datasets.
| Baseline | CLMF (No CLF) | CLMF | MSMA (No Multi-Scale) | MSMA (No Multimodal Attention) | MSMA | Potsdam mF1 (%) | Potsdam mIoU (%) | Potsdam OA (%) | Vaihingen mF1 (%) | Vaihingen mIoU (%) | Vaihingen OA (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| √ | × | × | × | × | × | 84.67 | 78.18 | 87.58 | 83.53 | 77.08 | 86.84 |
| √ | √ | × | × | × | × | 88.77 | 82.67 | 89.94 | 88.32 | 81.60 | 89.26 |
| √ | × | √ | × | × | × | 89.75 | 84.30 | 90.86 | 89.51 | 82.06 | 90.53 |
| √ | × | × | √ | × | × | 91.95 | 86.25 | 92.52 | 90.05 | 83.52 | 91.39 |
| √ | × | × | × | √ | × | 90.42 | 84.83 | 91.28 | 89.23 | 82.41 | 90.17 |
| √ | × | × | × | × | √ | 92.38 | 87.16 | 93.24 | 91.28 | 84.35 | 91.89 |
| √ | × | √ | × | × | √ | 93.67 | 88.74 | 94.65 | 92.53 | 84.92 | 92.77 |
Effect of the number of MSMA layers on accuracy and complexity.
| MSMA Layers | mIoU (%) | mF1 (%) | OA (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|
| 1 | 85.21 | 90.17 | 91.35 | 42.63 | 38.72 |
| 2 | 87.35 | 92.04 | 93.12 | 48.95 | 45.26 |
| 3 | 88.74 | 93.67 | 94.65 | 53.28 | 51.83 |
| 4 | 88.62 | 93.51 | 94.52 | 61.57 | 63.41 |
Complexity comparison of different methods (mIoU reported on the Potsdam dataset).
| Method | Multimodal | FLOPs (G) | Params (M) | Memory (MB) | Speed (FPS) | mIoU (%) |
|---|---|---|---|---|---|---|
| HRCNet | × | 49.03 | 46.72 | 3124 | 66.01 | 85.05 |
| MANet | × | 42.58 | 38.96 | 2890 | 58.37 | 72.63 |
| SSNet | × | 39.21 | 35.42 | 2650 | 52.14 | 77.67 |
| CMANet | √ | 68.35 | 59.87 | 3420 | 22.63 | 74.99 |
| CIMFNet | √ | 72.16 | 63.54 | 3610 | 20.89 | 75.67 |
| VSGNet | √ | 58.72 | 72.31 | 3850 | 18.42 | 87.31 |
| FTransUNet | √ | 65.38 | 160.88 | 4210 | 10.25 | 86.14 |
| LMFNet | √ | 55.62 | 68.93 | 3520 | 16.73 | 86.09 |
| Ours (ResNet50) | √ | 56.82 | 68.45 | 3280 | 19.63 | 88.74 |
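For completeness, parameter counts and inference speed of the kind listed above can be measured roughly as in the snippet below. The model here is a stand-in, and FLOPs are omitted because they are usually obtained with an external profiler (e.g., thop or fvcore), whose exact API is not assumed here.

```python
import time
import torch
import torch.nn as nn


def count_params_m(model: nn.Module) -> float:
    """Trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


@torch.no_grad()
def measure_fps(model: nn.Module, input_shape=(1, 3, 512, 512), iters=50, device="cpu"):
    """Average forward passes per second on a fixed-size input."""
    model = model.eval().to(device)
    x = torch.randn(*input_shape, device=device)
    for _ in range(5):                      # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.time() - start)


# Stand-in model; replace with the actual network to reproduce a table row.
toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 6, 1))
print(f"Params: {count_params_m(toy):.2f} M, Speed: {measure_fps(toy):.1f} FPS")
```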