MSAFNet: Multi-Modal Marine Aquaculture Segmentation via Spatial–Frequency Adaptive Fusion
Highlights
- Multi-temporal compositing combined with multi-dimensional cloud detection yields high-quality, cloud-free imagery while reducing the temporal uncertainty introduced by seasonal harvesting (a hedged compositing sketch follows this list).
- A spatial–frequency adaptive fusion design deeply fuses MSI and SAR data, fully exploiting the complementary strengths of the two sensors.
- Ensures temporal continuity for remote sensing monitoring in complex marine environments, overcoming the limitations of single-temporal imagery.
- Establishes a new multi-modal fusion framework for all-weather, high-precision identification of aquaculture areas.
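Below is a minimal Google Earth Engine (Python API) sketch of the cloud-screening and compositing step named above: Sentinel-2 pixels flagged by the QA60 band are masked, the mask is tightened with the s2cloudless Cloud Probability product, and the multi-temporal stack is mean-composited. The region, date range, and 40% probability threshold are illustrative assumptions rather than the paper's exact settings.

```python
import ee

ee.Initialize()  # assumes an authenticated Earth Engine session

# Hypothetical coastal study area; the paper uses five Chinese coastal regions.
region = ee.Geometry.Rectangle([119.5, 26.0, 120.5, 27.0])

def mask_clouds(img):
    """Drop pixels flagged by QA60 (bit 10 clouds, bit 11 cirrus) or with a
    high s2cloudless cloud probability (threshold assumed, not from the paper)."""
    qa = img.select('QA60')
    qa_clear = qa.bitwiseAnd(1 << 10).eq(0).And(qa.bitwiseAnd(1 << 11).eq(0))
    prob = ee.Image(
        ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
        .filter(ee.Filter.eq('system:index', img.get('system:index')))
        .first()
    ).select('probability')
    return img.updateMask(qa_clear.And(prob.lt(40)))

# Multi-temporal mean composite: averaging the cloud-masked stack smooths
# the appearance changes caused by seasonal harvesting cycles.
composite = (
    ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
    .filterBounds(region)
    .filterDate('2023-01-01', '2023-12-31')
    .map(mask_clouds)
    .mean()
    .clip(region)
)
```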
Abstract
1. Introduction
- We developed a pixel-level multi-modal dataset integrating Sentinel-1 SAR and Sentinel-2 MSI data from five representative Chinese coastal regions, covering floating raft aquaculture (FRA) and cage aquaculture (CA) with detailed pixel-level annotations, to establish a baseline for multi-modal marine aquaculture extraction.
- We proposed a multi-dimensional cloud detection methodology that integrates the Sentinel-2 QA60 quality band and Cloud Probability data with multi-temporal mean compositing. This mitigates the interference of varying harvest cycles on data consistency and addresses the susceptibility of single-temporal imagery to cloud obstruction, improving the spatiotemporal continuity and reliability of data for aquaculture monitoring.
- We designed the Multi-modal Spatial–Frequency Adaptive Fusion Network (MSAFNet), a computationally efficient multi-modal fusion architecture that improves aquaculture extraction accuracy in complex marine environments through multi-scale feature extraction and frequency-domain adaptive fusion, enabling precise all-weather identification of aquaculture areas at modest computational cost.
- We developed the Multi-scale Dual-path Feature Module (MDFM) and the Dynamic Frequency-domain Adaptive Fusion Module (DFAFM), which integrate CNN and Transformer capabilities to enrich multi-modal feature representation and address the limitations of single-modal approaches in complex marine environments (an illustrative sketch of the frequency-domain fusion idea follows this list).
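As referenced above, the following is an illustrative PyTorch sketch of the frequency-domain adaptive fusion concept behind DFAFM: feature maps from the MSI and SAR branches are transformed with a 2-D FFT, blended with a learned per-channel gate, and mapped back to the spatial domain with a residual spatial path. The gating layout and residual design are assumptions made for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FrequencyAdaptiveFusion(nn.Module):
    """Blends MSI and SAR feature spectra with a learned per-channel gate."""

    def __init__(self, channels: int):
        super().__init__()
        # Squeeze both modalities to a per-channel gate in [0, 1].
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, msi: torch.Tensor, sar: torch.Tensor) -> torch.Tensor:
        _, _, h, w = msi.shape
        g = self.gate(torch.cat([msi, sar], dim=1))       # (B, C, 1, 1)
        msi_f = torch.fft.rfft2(msi, norm='ortho')        # complex spectra
        sar_f = torch.fft.rfft2(sar, norm='ortho')
        fused_f = g * msi_f + (1.0 - g) * sar_f           # adaptive spectral blend
        fused = torch.fft.irfft2(fused_f, s=(h, w), norm='ortho')
        return fused + msi + sar                          # residual spatial path

# Usage on dummy encoder features from the two branches:
fuse = FrequencyAdaptiveFusion(channels=64)
out = fuse(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```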
2. Materials and Methods
2.1. Study Area
2.2. Multi-Modal Dataset and Data Processing Strategies
2.3. Method
2.3.1. Overall MSAFNet Architecture
2.3.2. Multi-Scale Dual-Path Feature Module
2.3.3. Dynamic Frequency-Domain Adaptive Fusion Module
3. Results
3.1. Experimental Setups
3.2. Evaluation Metrics
3.3. Comparative Experiments
3.4. Ablation Experiments
4. Discussion
4.1. Impact of Combining Different Spectral Bands
4.2. Application of the Model
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
FRA | Floating raft aquaculture
CA | Cage aquaculture
MSAFNet | Multi-modal Spatial–Frequency Adaptive Fusion Network
MDFM | Multi-scale Dual-path Feature Module
DFAFM | Dynamic Frequency-domain Adaptive Fusion Module
MSI | Multispectral image
SAR | Synthetic aperture radar
GEE | Google Earth Engine
IRB | Inverted residual block
Appendix A
References
- Belton, B.; Little, D.C.; Zhang, W.; Edwards, P.; Skladany, M.; Thilsted, S.H. Farming fish in the sea will not nourish the world. Nat. Commun. 2020, 11, 5804. [Google Scholar] [CrossRef]
- Cao, L.; Chen, Y.; Dong, S.; Hanson, A.; Huang, B.; Leadbitter, D.; Little, D.C.; Pikitch, E.K.; Qiu, Y.; Sadovy de Mitcheson, Y. Opportunity for marine fisheries reform in China. Proc. Natl. Acad. Sci. USA 2017, 114, 435–442. [Google Scholar] [CrossRef]
- Naylor, R.L.; Hardy, R.W.; Buschmann, A.H.; Bush, S.R.; Cao, L.; Klinger, D.H.; Little, D.C.; Lubchenco, J.; Shumway, S.E.; Troell, M. A 20-year retrospective review of global aquaculture. Nature 2021, 591, 551–563. [Google Scholar] [CrossRef]
- Costello, C.; Cao, L.; Gelcich, S.; Cisneros-Mata, M.Á.; Free, C.M.; Froehlich, H.E.; Golden, C.D.; Ishimura, G.; Maier, J.; Macadam-Somer, I. The future of food from the sea. Nature 2020, 588, 95–100. [Google Scholar] [CrossRef]
- Long, L.; Liu, H.; Cui, M.; Zhang, C.; Liu, C. Offshore aquaculture in China. Rev. Aquacult. 2024, 16, 254–270. [Google Scholar]
- Zhang, C.; Meng, Q.; Chu, J.; Liu, G.; Wang, C.; Zhao, Y.; Zhao, J. Analysis on the status of mariculture in China and the effectiveness of mariculture management in the Bohai Sea. Mar. Environ. Sci. 2021, 40, 887–894. [Google Scholar]
- Liu, Y.; Wang, Z.; Yang, X.; Wang, S.; Liu, X.; Liu, B.; Zhang, J.; Meng, D.; Ding, K.; Gao, K. Changes in the spatial distribution of mariculture in China over the past 20 years. J. Geogr. Sci. 2023, 33, 2377–2399. [Google Scholar] [CrossRef]
- Wang, Z.; Liu, K. Dynamic evolution of aquaculture along the Bohai sea coastline and implications for ECO-coastal vegetation restoration based on remote sensing. Plants 2024, 13, 160. [Google Scholar] [CrossRef]
- Lu, Y.; Shao, W.; Sun, J. Extraction of offshore aquaculture areas from medium-resolution remote sensing images based on deep learning. Remote Sens. 2021, 13, 3854. [Google Scholar]
- Cheng, B.; Liang, C.; Liu, X.; Liu, Y.; Ma, X.; Wang, G. Research on a novel extraction method using Deep Learning based on GF-2 images for aquaculture areas. Int. J. Remote Sens. 2020, 41, 3575–3591. [Google Scholar]
- Xu, Y.; Lu, L. An attention-fused deep learning model for accurately monitoring cage and raft aquaculture at large-scale using sentinel-2 data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 9099–9109. [Google Scholar] [CrossRef]
- Wang, J.; Fan, J.; Wang, J. MDOAU-Net: A lightweight and robust deep learning model for SAR image segmentation in aquaculture raft monitoring. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4504505. [Google Scholar] [CrossRef]
- Zhang, Y.; Wang, C.; Chen, J.; Wang, F. Shape-constrained method of remote sensing monitoring of marine raft aquaculture areas on multitemporal synthetic sentinel-1 imagery. Remote Sens. 2022, 14, 1249. [Google Scholar] [CrossRef]
- Li, J.; Hong, D.; Gao, L.; Yao, J.; Zheng, K.; Zhang, B.; Chanussot, J. Deep learning in multimodal remote sensing data fusion: A comprehensive review. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102926. [Google Scholar] [CrossRef]
- Mena, F.; Arenas, D.; Nuske, M.; Dengel, A. Common practices and taxonomy in deep multiview fusion for remote sensing applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4797–4818. [Google Scholar] [CrossRef]
- Samadzadegan, F.; Toosi, A.; Dadrass Javan, F. A critical review on multi-sensor and multi-platform remote sensing data fusion approaches: Current status and prospects. Int. J. Remote Sens. 2025, 46, 1327–1402. [Google Scholar] [CrossRef]
- Vivone, G.; Deng, L.-J.; Deng, S.; Hong, D.; Jiang, M.; Li, C.; Li, W.; Shen, H.; Wu, X.; Xiao, J.-L. Deep learning in remote sensing image fusion: Methods, protocols, data, and future perspectives. IEEE Geosci. Remote Sens. Mag. 2024, 13, 269–310. [Google Scholar] [CrossRef]
- Liu, Y.; Gao, K.; Wang, H.; Yang, Z.; Wang, P.; Ji, S.; Huang, Y.; Zhu, Z.; Zhao, X. A Transformer-based multi-modal fusion network for semantic segmentation of high-resolution remote sensing imagery. Int. J. Appl. Earth Obs. Geoinf. 2024, 133, 104083. [Google Scholar] [CrossRef]
- Wu, H.; Huang, P.; Zhang, M.; Tang, W.; Yu, X. CMTFNet: CNN and multiscale transformer fusion network for remote-sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 2004612. [Google Scholar] [CrossRef]
- Zhou, W.; Jin, J.; Lei, J.; Yu, L. CIMFNet: Cross-layer interaction and multiscale fusion network for semantic segmentation of high-resolution remote sensing images. IEEE J. Sel. Top. Signal Process. 2022, 16, 666–676. [Google Scholar] [CrossRef]
- Wei, K.; Dai, J.; Hong, D.; Ye, Y. MGFNet: An MLP-dominated gated fusion network for semantic segmentation of high-resolution multi-modal remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2024, 135, 104241. [Google Scholar] [CrossRef]
- Zhang, C.; Chang, Y.; Wu, Y.; Shui, Y.; Wang, Z.; Zhu, J. Semantic information guided diffusion posterior sampling for remote sensing image fusion. Sci. Rep. 2024, 14, 27259. [Google Scholar] [CrossRef]
- Bi, H.; Feng, Y.; Tong, B.; Wang, M.; Yu, H.; Mao, Y.; Chang, H.; Diao, W.; Wang, P.; Yu, Y. RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation. arXiv 2025, arXiv:2504.03166. [Google Scholar]
- Liu, C.; Sun, Y.; Zhang, X.; Xu, Y.; Lei, L.; Kuang, G. OSHFNet: A heterogeneous dual-branch dynamic fusion network of optical and SAR images for land use classification. Int. J. Appl. Earth Obs. Geoinf. 2025, 141, 104609. [Google Scholar] [CrossRef]
- Wei, Q.; Liu, Y.; Jiang, X.; Zhang, B.; Su, Q.; Yu, M. DDFNet-A: Attention-Based Dual-Branch Feature Decomposition Fusion Network for Infrared and Visible Image Fusion. Remote Sens. 2024, 16, 1795. [Google Scholar] [CrossRef]
- Liu, X.; Wang, Z.; Yang, X.; Liu, Y.; Liu, B.; Zhang, J.; Gao, K.; Meng, D.; Ding, Y. Mapping China’s offshore mariculture based on dense time-series optical and radar data. Int. J. Digit. Earth 2022, 15, 1326–1349. [Google Scholar] [CrossRef]
- Wang, S.; Huang, C.; Li, H.; Liu, Q. Synergistic integration of time series optical and SAR satellite data for mariculture extraction. Remote Sens. 2023, 15, 2243. [Google Scholar] [CrossRef]
- Liu, J.; Lu, Y.; Guo, X.; Ke, W. A deep learning method for offshore raft aquaculture extraction based on medium-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 6296–6309. [Google Scholar] [CrossRef]
- Yu, H.; Wang, F.; Hou, Y.; Wang, J.; Zhu, J.; Guo, J. MSARG-Net: A multimodal offshore floating raft aquaculture area extraction network for remote sensing images based on multiscale SAR guidance. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 18319–18334. [Google Scholar] [CrossRef]
- Yu, H.; Wang, F.; Hou, Y.; Wang, J.; Zhu, J.; Cui, Z. CMFPNet: A cross-modal multidimensional frequency perception network for extracting offshore aquaculture areas from MSI and SAR images. Remote Sens. 2024, 16, 2825. [Google Scholar] [CrossRef]
- Sun, Z.; Xu, R.; Du, W.; Wang, L.; Lu, D. High-resolution urban land mapping in China from sentinel 1A/2 imagery based on Google Earth Engine. Remote Sens. 2019, 11, 752. [Google Scholar] [CrossRef]
- Wang, M.; Liu, Z.; Chen, Y. Comparisons of image cloud detection effect based on Sentinel-2 bands/products. Remote Sens. Technol. Appl. 2020, 35, 1167–1177. [Google Scholar]
- Gao, B.-C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266. [Google Scholar] [CrossRef]
- Pettorelli, N.; Vik, J.O.; Mysterud, A.; Gaillard, J.-M.; Tucker, C.J.; Stenseth, N.C. Using the satellite-derived NDVI to assess ecological responses to environmental change. Trends Ecol. Evol. 2005, 20, 503–510. [Google Scholar] [CrossRef] [PubMed]
- Lu, Y.; Li, Q.; Du, X.; Wang, H.; Liu, J. A method of coastal aquaculture area automatic extraction with high spatial resolution images. Remote Sens. Technol. Appl. 2015, 30, 486–494. [Google Scholar]
- Gao, L.; Wang, C.; Liu, K.; Chen, S.; Dong, G.; Su, H. Extraction of floating raft aquaculture areas from sentinel-1 SAR images by a dense residual U-Net model with pre-trained Resnet34 as the encoder. Remote Sens. 2022, 14, 3003. [Google Scholar] [CrossRef]
- Zhang, Y.; Wang, C.; Ji, Y.; Chen, J.; Deng, Y.; Chen, J.; Jie, Y. Combining segmentation network and nonsubsampled contourlet transform for automatic marine raft aquaculture area extraction from sentinel-1 images. Remote Sens. 2020, 12, 4182. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Hu, X.; Zhang, P.; Zhang, Q.; Yuan, F. GLSANet: Global-local self-attention network for remote sensing image semantic segmentation. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6000105. [Google Scholar] [CrossRef]
- Ma, X.; Lian, R.; Wu, Z.; Guo, H.; Yang, F.; Ma, M.; Wu, S.; Du, Z.; Zhang, W.; Song, S. Logcan++: Adaptive local-global class-aware network for semantic segmentation of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4404216. [Google Scholar] [CrossRef]
- Zhu, S.; Zhao, L.; Xiao, Q.; Ding, J.; Li, X. Glffnet: Global–local feature fusion network for high-resolution remote sensing image semantic segmentation. Remote Sens. 2025, 17, 1019. [Google Scholar] [CrossRef]
- Bhatt, D.; Patel, C.; Talsania, H.; Patel, J.; Vaghela, R.; Pandya, S.; Modi, K.; Ghayvat, H. CNN variants for computer vision: History, architecture, application, challenges and future scope. Electronics 2021, 10, 2470. [Google Scholar] [CrossRef]
- Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [CrossRef]
- Nie, J.; Sun, H.; Sun, X.; Ni, L.; Gao, L. Cross-modal feature fusion and interaction strategy for CNN-transformer-based object detection in visual and infrared remote sensing imagery. IEEE Geosci. Remote Sens. Lett. 2023, 21, 5000405. [Google Scholar] [CrossRef]
- Ren, K.; Chen, X.; Wang, Z.; Liang, X.; Chen, Z.; Miao, X. HAM-Transformer: A hybrid adaptive multi-scaled transformer net for remote sensing in complex scenes. Remote Sens. 2023, 15, 4817. [Google Scholar] [CrossRef]
- Xu, X.; Feng, Z.; Cao, C.; Li, M.; Wu, J.; Wu, Z.; Shang, Y.; Ye, S. An improved swin transformer-based model for remote sensing object detection and instance segmentation. Remote Sens. 2021, 13, 4779. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Hafner, S. Multi-Modal Deep Learning with Sentinel-1 and Sentinel-2 Data for Urban Mapping and Change Detection. Licentiate Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2022. [Google Scholar]
- Hafner, S.; Nascetti, A.; Azizpour, H.; Ban, Y. Sentinel-1 and sentinel-2 data fusion for urban change detection using a dual stream u-net. IEEE Geosci. Remote Sens. Lett. 2021, 19, 4019805. [Google Scholar] [CrossRef]
- Fan, X.; Zhou, W.; Qian, X.; Yan, W. Progressive adjacent-layer coordination symmetric cascade network for semantic segmentation of multimodal remote sensing images. Expert Syst. Appl. 2024, 238, 121999. [Google Scholar] [CrossRef]
- Xiao, T.; Liu, Y.; Huang, Y.; Li, M.; Yang, G. Enhancing multiscale representations with transformer for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5605116. [Google Scholar] [CrossRef]
- Ye, Z.; Li, Y.; Li, Z.; Liu, H.; Zhang, Y.; Li, W. Attention-Multi-Scale Network for Semantic Segmentation of Multi-Modal Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5610315. [Google Scholar] [CrossRef]
- Guo, M.-H.; Lu, C.-Z.; Liu, Z.-N.; Cheng, M.-M.; Hu, S.-M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752. [Google Scholar] [CrossRef]
- Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
- Qin, Z.; Zhang, P.; Wu, F.; Li, X. Fcanet: Frequency channel attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 783–792. [Google Scholar]
- Li, X.; Xu, F.; Yu, A.; Lyu, X.; Gao, H.; Zhou, J. A frequency decoupling network for semantic segmentation of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5607921. [Google Scholar] [CrossRef]
- Duhamel, P.; Vetterli, M. Fast Fourier transforms: A tutorial review and a state of the art. Signal Process. 1990, 19, 259–299. [Google Scholar] [CrossRef]
- Jiang, X.; Zhang, X.; Gao, N.; Deng, Y. When fast fourier transform meets transformer for image restoration. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 381–402. [Google Scholar]
- Ho, Y.; Wookey, S. The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling. IEEE Access 2019, 8, 4806–4813. [Google Scholar] [CrossRef]
- Jadon, S. A survey of loss functions for semantic segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Via del Mar, Chile, 27–29 October 2020; pp. 1–7. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 205–218. [Google Scholar]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
Parameter | Value
---|---
Optimizer | SGD
Initial learning rate | 0.01
Number of epochs | 80
Batch size | 8
Momentum | 0.9
Data augmentation | Horizontal flipping, vertical flipping, diagonal mirroring
Learning rate scheduler | Step decay
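A minimal PyTorch loop matching this configuration is sketched below; the step-decay step size and factor, the stand-in model, and the random data are assumptions, since the table does not specify them.

```python
import torch
import torch.nn as nn

# Stand-in 1x1-conv "segmenter" and random tensors keep the sketch runnable;
# the real setup trains MSAFNet on the multi-modal dataset.
model = nn.Conv2d(3, 2, kernel_size=1)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Step decay: the step size and factor are assumptions (not given in the table).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(80):                      # 80 epochs
    for _ in range(10):                      # placeholder batches
        images = torch.randn(8, 3, 64, 64)   # batch size 8
        labels = torch.randint(0, 2, (8, 64, 64))
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                         # decay the learning rate
```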
Methods | FRA IoU | FRA F1 | FRA Kappa | CA IoU | CA F1 | CA Kappa | Mean IoU | Mean F1 | Mean Kappa | Params (M) | FLOPs (G)
---|---|---|---|---|---|---|---|---|---|---|---
DLV3+ (R50) [64] | 71.40 | 83.32 | 82.58 | 65.06 | 78.83 | 98.94 | 68.23 | 81.08 | 90.76 | 40.36 | 35.14
DLV3+ (R101) [64] | 71.98 | 83.71 | 83.00 | 68.57 | 81.35 | 99.01 | 70.28 | 82.53 | 91.01 | 59.35 | 44.89
ResNet50 [63] | 74.04 | 85.08 | 84.34 | 32.76 | 49.35 | 98.00 | 53.40 | 67.22 | 91.17 | 28.79 | 69.34
ResNet101 [63] | 72.94 | 84.36 | 83.78 | 60.75 | 75.58 | 98.74 | 66.85 | 79.97 | 91.26 | 47.79 | 108.34
SwinUnet [65] | 70.13 | 82.44 | 81.44 | 33.36 | 50.03 | 98.11 | 51.75 | 66.24 | 89.78 | 27.15 | 15.52
TransUnet [66] | 78.19 | 87.76 | 87.23 | 72.80 | 84.26 | 99.16 | 75.50 | 86.01 | 93.20 | 93.23 | 64.50
SwinT [49] | 75.19 | 85.84 | 85.24 | 64.61 | 78.50 | 98.82 | 69.90 | 82.17 | 92.03 | 34.47 | 22.65
SRUnet [28] | 72.98 | 84.38 | 83.59 | 38.61 | 55.71 | 98.14 | 55.80 | 70.05 | 90.87 | 30.61 | 18.12
CMFPNet [30] | 78.15 | 87.73 | 87.23 | 68.37 | 81.22 | 99.05 | 73.26 | 84.48 | 93.14 | 21.28 | 22.82
MSAFNet (ours) | 78.22 | 87.78 | 87.25 | 75.64 | 86.13 | 99.26 | 76.93 | 86.96 | 93.26 | 24.10 | 20.70
Methods | FRA IoU | FRA F1 | FRA Kappa | CA IoU | CA F1 | CA Kappa | Mean IoU | Mean F1 | Mean Kappa | Params (M) | FLOPs (G)
---|---|---|---|---|---|---|---|---|---|---|---
Baseline | 76.94 | 86.97 | 86.44 | 67.01 | 80.25 | 99.01 | 71.98 | 83.61 | 92.73 | 13.824 | 20.778
+MDFM | 78.12 | 87.82 | 87.17 | 72.85 | 84.29 | 99.20 | 75.49 | 86.01 | 93.19 | 17.524 | 21.253
+MDFM+DFAFM | 78.22 | 87.78 | 87.25 | 75.64 | 86.13 | 99.26 | 76.93 | 86.96 | 93.26 | 24.104 | 20.695
Methods | FRA IoU | FRA F1 | FRA Kappa | CA IoU | CA F1 | CA Kappa | Mean IoU | Mean F1 | Mean Kappa | Params (M) | FLOPs (G)
---|---|---|---|---|---|---|---|---|---|---|---
Branch (a) | 75.81 | 86.24 | 85.49 | 64.81 | 78.65 | 99.02 | 70.31 | 82.45 | 92.26 | 16.839 | 18.428
Branch (b) | 75.41 | 85.98 | 85.39 | 74.80 | 85.58 | 99.25 | 75.11 | 85.78 | 92.32 | 19.703 | 18.606
Branch (a + b + c) | 78.22 | 87.78 | 87.25 | 75.64 | 86.13 | 99.26 | 76.93 | 86.96 | 93.26 | 24.104 | 20.695
Groups | FRA IoU | FRA F1 | FRA Kappa | CA IoU | CA F1 | CA Kappa | Mean IoU | Mean F1 | Mean Kappa
---|---|---|---|---|---|---|---|---|---
Group 1 | 75.09 | 85.77 | 85.12 | 70.99 | 83.03 | 99.15 | 73.04 | 84.40 | 92.14
Group 2 | 76.73 | 86.83 | 86.20 | 71.44 | 83.34 | 99.15 | 74.09 | 85.09 | 92.68
Group 3 | 77.69 | 87.44 | 86.92 | 72.97 | 84.37 | 99.16 | 75.33 | 85.91 | 93.04
Group 4 | 78.22 | 87.78 | 87.25 | 75.64 | 86.13 | 99.26 | 76.93 | 86.96 | 93.26
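The per-class IoU, F1, and Kappa values in the tables above follow the standard confusion-matrix definitions. The small sketch below computes them from hypothetical pixel counts; the paper's exact counting protocol is not reproduced.

```python
def metrics(tp: int, fp: int, fn: int, tn: int):
    """Standard binary-class IoU, F1, and Cohen's kappa from a confusion matrix."""
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    n = tp + fp + fn + tn
    po = (tp + tn) / n                       # observed agreement
    pe = ((tp + fp) * (tp + fn)              # chance agreement
          + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return iou, f1, kappa

# Hypothetical pixel counts for one class, for illustration only.
print(metrics(tp=7800, fp=1100, fn=1080, tn=90020))
```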
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).