MambaSegNet: A Fast and Accurate High-Resolution Remote Sensing Imagery Ship Segmentation Network
Highlights
- This study constructs a novel ship semantic segmentation dataset for high-resolution remote sensing images.
- To improve segmentation precision for small objects, a novel Mamba-style segmentation model is proposed.
- Experiments on large-scale real-world scenarios demonstrate the excellent performance of the proposed model.
- The generated dataset and the proposed model can substantially benefit ocean monitoring and other intelligent interpretation tasks.
Abstract
1. Introduction
2. Related Work
2.1. Object Detection and Ship Orientation Handling
2.2. Semantic Segmentation for Ship Extraction
2.3. Mamba Long-Range Dependency Modeling
3. Data and Methods
3.1. Data
3.2. MambaSegNet
3.2.1. Framework Overview
3.2.2. Convolutional Layers
3.2.3. Mamba Layer and ResMamba Block
3.2.4. Encoder
3.2.5. Decoder
3.2.6. Bottleneck and Skip Connections
3.3. Loss Function
3.4. Evaluation Metrics
4. Results
4.1. Comparative Model Selection
4.2. Results Presentation
4.3. Quantitative Accuracy Comparison
4.4. Ablation Study
5. Discussion
5.1. MambaSegNet’s Outstanding Performance and Overall Advantages in Semantic Segmentation
5.2. The Broad Applicability and Future Expansion Potential of the Dataset
5.3. Evaluation of MambaSegNet’s Performance on Real-World Remote Sensing Imagery
5.4. Robust Performance of MambaSegNet in Complex Maritime Environments
5.5. Impact of Training Set Sample Size on Model Weights and Performance
5.6. Limitations
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Li, X.N.; Chen, P.; Yang, J.S.; An, W.T.; Zheng, G.; Luo, D.; Lu, A.Y.; Wang, Z.M. TKP-Net: A Three Keypoint Detection Network for Ships Using SAR Imagery. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2024, 17, 364–376.
2. Wang, Y.; Wang, C.; Zhang, H.; Dong, Y.; Wei, S. A SAR dataset of ship detection for deep learning under complex backgrounds. Remote Sens. 2019, 11, 765.
3. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access 2020, 8, 120234–120254.
4. Li, C.; Li, L.; Wang, S.; Gao, S.; Ye, X. MMShip: Medium resolution multispectral satellite imagery ship dataset. Opt. Precis. Eng. 2023, 31, 1962–1972.
5. Li, J.; Li, Z.; Chen, M.; Wang, Y.; Luo, Q. A New Ship Detection Algorithm in Optical Remote Sensing Images Based on Improved R3Det. Remote Sens. 2022, 14, 5048.
6. Ali, S.; Siddique, A.; Ateş, H.F.; Güntürk, B.K. Improved YOLOv4 for aerial object detection. In Proceedings of the 2021 29th Signal Processing and Communications Applications Conference (SIU), Istanbul, Turkey, 9–11 June 2021; pp. 1–4.
7. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3141–3149.
8. Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114.
9. Liu, Y.; Gao, K.; Wang, H.; Yang, Z.; Wang, P.; Ji, S.; Huang, Y.; Zhu, Z.; Zhao, X. A Transformer-based multi-modal fusion network for semantic segmentation of high-resolution remote sensing imagery. Int. J. Appl. Earth Obs. Geoinf. 2024, 133, 104083.
10. Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Mu, Y.; Wang, X.; Liu, W.; Wang, J. High-Resolution Representations for Labeling Pixels and Regions. arXiv 2019, arXiv:1904.04514.
11. Xie, J.; Pan, B.; Xu, X.; Shi, Z. MiSSNet: Memory-Inspired Semantic Segmentation Augmentation Network for Class-Incremental Learning in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5607913.
12. Gao, T.; Ao, W.; Wang, X.-A.; Zhao, Y.; Ma, P.; Xie, M.; Fu, H.; Ren, J.; Gao, Z. Enrich Distill and Fuse: Generalized Few-Shot Semantic Segmentation in Remote Sensing Leveraging Foundation Model's Assistance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 2771–2780.
13. Chen, Y.; Wang, J.; Zhang, Y.; Liu, Y. Arbitrary-oriented ship detection based on Kullback–Leibler divergence regression in remote sensing images. Earth Sci. Inform. 2023, 16, 3243–3255.
14. Chen, Y.; Wang, J.; Zhang, Y.; Liu, Y.; Wang, J. P2RNet: Fast Maritime Object Detection From Key Points to Region Proposals in Large-Scale Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2024, 17, 9294–9308.
15. Chen, Y.; Yan, J.; Liu, Y.; Gao, Z. LRS2-DM: Small Ship Target Detection in Low-Resolution Remote Sensing Images Based on Diffusion Models. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5628615.
16. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
17. Hatamizadeh, A.; Kautz, J. MambaVision: A Hybrid Mamba-Transformer Vision Backbone. arXiv 2024, arXiv:2407.08083.
18. Ma, J.; Li, F.; Wang, B. U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation. arXiv 2024, arXiv:2401.04722.
19. Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual State Space Model. arXiv 2024, arXiv:2401.10166.
20. Wang, Z.; Zheng, J.-Q.; Zhang, Y.; Cui, G.; Li, L. Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation. arXiv 2024, arXiv:2402.05079.
21. Zhang, H.; Chen, K.; Liu, C.; Chen, H.; Zou, Z.; Shi, Z. CDMamba: Remote sensing image change detection with Mamba. arXiv 2024, arXiv:2406.04207.
22. Zhao, S.; Chen, H.; Zhang, X.; Xiao, P.; Bai, L.; Ouyang, W. RS-Mamba for Large Remote Sensing Image Dense Prediction. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5633314.
23. Ma, X.; Zhang, X.; Pun, M.O. RS3Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6011405.
24. Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
25. Zou, Z.; Shi, Z. Random access memories: A new paradigm for target detection in high resolution aerial remote sensing images. IEEE Trans. Image Process. 2017, 27, 1100–1111.
26. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 4015–4026.
27. Ravi, N.; Gabeur, V.; Hu, Y.-T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. SAM 2: Segment Anything in Images and Videos. arXiv 2024, arXiv:2408.00714.
28. Ma, N.; Liu, C. Multi-Feature FCM Segmentation Algorithm Combining Morphological Reconstruction and Superpixels. Comput. Syst. Appl. 2021, 30, 194–200.
29. Gu, A. Modeling Sequences with Structured State Spaces; Stanford University: Stanford, CA, USA, 2023.
30. Mehta, H.; Gupta, A.; Cutkosky, A.; Neyshabur, B. Long range language modeling via gated state spaces. arXiv 2022, arXiv:2206.13947.
31. Wang, J.; Zhu, W.; Wang, P.; Yu, X.; Liu, L.; Omar, M.; Hamid, R. Selective structured state-spaces for long-form video understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6387–6397.
32. Howard, A.G. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
33. Eckle, K.; Schmidt-Hieber, J. A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Netw. 2019, 110, 232–242.
34. Shazeer, N. GLU variants improve Transformer. arXiv 2020, arXiv:2002.05202.
35. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like pure Transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 205–218.
36. Zhang, Y.; Zhao, W.; Sun, B.; Zhang, Y.; Wen, W. Point cloud upsampling algorithm: A systematic review. Algorithms 2022, 15, 124.
37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
38. Li, X.; He, M.; Li, H.; Shen, H. A combined loss-based multiscale fully convolutional network for high-resolution remote sensing image change detection. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8017505.
39. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
40. Cheng, B.; Girshick, R.; Dollár, P.; Berg, A.C.; Kirillov, A. Boundary IoU: Improving object-centric image segmentation evaluation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15334–15342.
41. Yacouby, R.; Axman, D. Probabilistic extension of precision, recall, and F1 score for more thorough evaluation of classification models. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Online, 20 November 2020; pp. 79–91.
42. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; pp. 234–241.
43. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Granada, Spain, 20 September 2018; pp. 3–11.
44. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
45. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018, arXiv:1802.02611.
46. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
47. Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
48. Zhang, Y.; Zhang, M. LUN-BiSeNetV2: A lightweight unstructured network based on BiSeNetV2 for road scene segmentation. Comput. Sci. Inf. Syst. 2023, 20, 1749–1770.
49. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv 2021, arXiv:2105.15203.
50. Guo, M.-H.; Lu, C.-Z.; Hou, Q.; Liu, Z.; Cheng, M.-M.; Hu, S.-M. SegNeXt: Rethinking convolutional attention design for semantic segmentation. Adv. Neural Inf. Process. Syst. 2022, 35, 1140–1156.
51. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
52. Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A real-time semantic segmentation network inspired by PID controllers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19529–19539.
53. Xu, Q.; Ma, Z.; He, N.; Duan, W. DCSAU-Net: A deeper and more compact split-attention U-Net for medical image segmentation. Comput. Biol. Med. 2023, 154, 106626.
54. Chen, J.; Mei, J.; Li, X.; Lu, Y.; Yu, Q.; Wei, Q.; Luo, X.; Xie, Y.; Adeli, E.; Wang, Y.; et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med. Image Anal. 2024, 97, 103280.
55. Yu, W.; Wang, X. MambaOut: Do we really need Mamba for vision? arXiv 2024, arXiv:2405.07992.
56. Su, H.; Wei, S.; Liu, S.; Liang, J.; Wang, C.; Shi, J.; Zhang, X. HQ-ISNet: High-quality instance segmentation for remote sensing imagery. Remote Sens. 2020, 12, 989.
57. Su, H.; Wei, S.; Yan, M.; Wang, C.; Shi, J.; Zhang, X. Object detection and instance segmentation in remote sensing imagery based on precise Mask R-CNN. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1454–1457.
Datasets | Source | Number of Images | Image Size/Pixels | Classes | Resolution/m | Bands | Year |
---|---|---|---|---|---|---|---|
Dataset [2] | Sentinel-1 | 39,729 | 256 × 256 | 1 | 1.7–25 | RGB | 2019 |
HRSID [3] | Sentinel-1B, TerraSAR-X | 5604 | 800 × 800 | 1 | 0.5–3 | RGB | 2020 |
MMShip [4] | Sentinel-2 | 5016 | 512 × 512 | 1 | 10 | RGB/NIR | 2023 |
DIOR_SHIP | Google Earth | 1258 | 800 × 800 | 1 | 0.5–30 | RGB | This work |
LEVIR_SHIP | Google Earth | 1461 | 800 × 600 | 1 | 0.2–1 | RGB | This work |
Model Name | Extracted Features | Advantages |
---|---|---|
Improved R3Det | Ships in optical remote sensing images | Detects ships against complex backgrounds and small targets |
Improved YOLOv4 | Aerial objects such as vehicles, buildings, and trees | High detection accuracy for aerial objects |
DANet | Scene segmentation, recognizing “objects” and “background” categories | Captures spatial and channel dependencies for precise segmentation |
ResUNet-a | Single-temporal high-resolution aerial data | High-resolution segmentation |
TMFNet | Multimodal remote sensing features, including DSM | Identifies complex target types |
HRNet | Semantic segmentation, facial keypoint detection, object detection | Maintains high-resolution representations via multi-scale fusion |
MiSSNet | Local semantic features, class-specific information | Solves class-incremental learning and catastrophic forgetting problems |
FoMA | General knowledge from foundation models used to support query images | Improves few-shot segmentation performance by leveraging foundation model knowledge |
KRNet | Ships in arbitrary directions, densely arranged | Detects dense and arbitrarily directed ships |
P2RNet | Keypoint extraction, region proposal generation | Fast detection for large-scale images |
LRS2-DM | Small ship targets | Detects small ship targets in low-resolution images |
MambaVision | Visual data | Combines Mamba and Transformer advantages, suitable for various visual tasks |
U-Mamba | Medical images | Efficiently models local and long-range dependencies in medical images |
VMamba | Visual data | Linear-complexity visual state space backbone with efficient global context modeling for general visual tasks |
Mamba-UNet | Medical image segmentation | Efficient long-range dependency modeling in medical images |
CDMamba | Remote sensing image change detection | Combines global and local features for efficient remote sensing image change detection |
RS-Mamba | High-resolution remote sensing images, semantic segmentation, and change detection | Efficient global context modeling for large-scale remote sensing image processing |
RS3Mamba | Remote sensing images, land cover classification, and semantic segmentation | Combines Mamba and VSS advantages, suitable for complex remote sensing tasks |
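All of the Mamba variants listed above build on the selective state space model of Gu and Dao [16]. For reference, the standard state space formulation and its zero-order-hold discretization are summarized below (a textbook summary, not specific to any model in the table):

```latex
% Continuous-time SSM underlying S4/Mamba:
\begin{align}
  h'(t) &= A\,h(t) + B\,x(t), & y(t) &= C\,h(t) \\
% Zero-order-hold discretization with step size \Delta:
  \bar{A} &= \exp(\Delta A), &
  \bar{B} &= (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B \\
% Discrete recurrence applied over the token sequence:
  h_t &= \bar{A}\,h_{t-1} + \bar{B}\,x_t, & y_t &= C\,h_t
\end{align}
```

Mamba's "selective" mechanism additionally makes Δ, B, and C input-dependent functions of x_t, which lets these models filter context adaptively while keeping time complexity linear in sequence length.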
Datasets | Small Ships | Medium Ships | Large Ships |
---|---|---|---|
DIOR_SHIP | 637 | 212 | 409 |
LEVIR_SHIP | 1222 | 192 | 10 |
Model | Basic Architecture | Advantages | Disadvantages |
---|---|---|---|
UNet [42] | Based on a convolutional neural network (CNN), utilizing an encoder–decoder structure. | Simple structure, easy to train. Strong performance in medical image segmentation. Works well with small training sets. | Limited capability for modeling long-range dependencies. Poor performance on complex backgrounds or multi-scale targets. |
UNet++ [43] | Builds on U-Net with denser, nested skip connections. | Improves upon U-Net by enhancing multi-scale feature fusion, yielding better performance in complex segmentation tasks. | High computational cost and longer training times. Prone to overfitting due to the larger number of parameters. |
DeepLabV3+ [44,45] | Based on CNN, utilizing dilated convolutions within an encoder–decoder structure. | Excellent performance in most segmentation tasks. Expands the receptive field with dilated convolutions and captures contextual information effectively. | Requires significant computational resources. Suboptimal performance on small objects. |
SegNet [46] | Based on a CNN encoder–decoder structure whose decoder upsamples using stored max-pooling indices. | Strong at handling low-level features. Excellent feature extraction capabilities. | Performance may degrade in complex-background segmentation tasks. |
SegFormer [49] | Based on the Transformer architecture, employing a self-attention mechanism. | Well suited to modeling long-range dependencies and global information. Strong multi-scale feature representation. | Slow training and high computational resource demands. Model mechanisms are difficult to interpret. |
SegNeXt [50] | Based on a hybrid of attention mechanisms and convolutional layers. | Combines the advantages of Transformers and CNNs. Strong feature extraction capability and an efficient design with few parameters. | Long training times; performance highly dependent on data scale and quality. |
Swin Transformer [35,51] | Based on the Transformer architecture, it employs a hierarchical window attention mechanism. | Powerful long-range dependency modeling ability. Excellent performance in visual tasks, particularly in large-scale image segmentation. | Larger model size, extended training time. Performance may not outperform CNN on smaller datasets. |
BiSeNetV2 [47,48] | Based on a CNN dual-path design: a spatial path preserves details while a context path extracts semantics. Computationally efficient. | Real-time performance suitable for mobile deployment. Excellent edge-detail retention. | Weak adaptability to complex backgrounds. Limited multi-scale feature fusion capability. |
PIDNet [52] | Based on the CNN three-branch structure. PID control-optimized feature fusion. | High boundary segmentation accuracy in complex scenarios. Effective multi-scale feature fusion. | Large model parameter size. Slow training convergence. Low sensitivity to small targets. |
DCSAUnet [53] | Based on CNN and attention mechanism. CA and SA dynamically enhance key features. | Enhanced feature selection in complex backgrounds. End-to-end training supported. High segmentation accuracy. | Significantly increased computational cost. Prone to overfitting in small-sample training. |
TransUnet [54] | Based on a Transformer + CNN hybrid architecture. Complementary global and local features. | Strong global information capture in medical image segmentation. Precise boundary recovery. | Dependent on large-scale training data. High computational resource consumption. Prone to overfitting with insufficient data. |
MambaSegNet | Based on a hybrid CNN and Mamba architecture, combining multi-scale feature extraction with visual state space modeling. | Excellent handling of small objects, complex scenes, and boundary details. Outstanding performance in high-precision segmentation tasks. | Requires substantial computational resources. Complex training process with difficult parameter tuning. |
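Section 3.2.3 describes the ResMamba block only at a high level; as a rough illustration of how a residual Mamba layer over flattened spatial tokens can be wired between convolutional stages of a U-shaped network, here is a minimal PyTorch sketch. It assumes the `mamba_ssm` package's `Mamba` layer (whose selective-scan kernels are CUDA-only); the class and parameter names are ours, not the authors' implementation.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm (requires CUDA)

class ResMambaBlock(nn.Module):
    """Residual Mamba block over flattened spatial tokens (illustrative sketch)."""
    def __init__(self, channels: int, d_state: int = 16):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.mamba = Mamba(d_model=channels, d_state=d_state, d_conv=4, expand=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a convolutional encoder stage.
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        tokens = tokens + self.mamba(self.norm(tokens))  # residual SSM token mixing
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Usage: drop the block between convolutional stages of an encoder.
if torch.cuda.is_available():
    block = ResMambaBlock(channels=64).cuda()
    feat = torch.randn(2, 64, 32, 32, device="cuda")
    print(block(feat).shape)  # torch.Size([2, 64, 32, 32])
```

The pre-norm residual form keeps the convolutional features intact while letting the SSM add long-range context, which is the usual way such blocks are combined with CNN encoders.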
Quantitative comparison on the DIOR_SHIP dataset:

Framework | Parameters | Inference Speed | Theoretical FLOPs | IoU | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|---|---|
UNet | 0.90 M | 0.0877 s | 44.18 G | 0.7650 | 0.8320 | 0.8060 | 0.7820 | 0.7940 |
UNet++ | 9.16 M | 0.1252 s | 1397.13 G | 0.7880 | 0.8470 | 0.8290 | 0.8430 | 0.8239 |
DeepLabV3+ | 58.04 M | 0.1191 s | 1655.55 G | 0.7923 | 0.8520 | 0.8300 | 0.8509 | 0.8403 |
SegNet | 29.5 M | 0.1183 s | 286.12 G | 0.6700 | 0.8680 | 0.8380 | 0.8670 | 0.8609 |
SegFormer | 32 M | 0.0619 s | 390.67 G | 0.7791 | 0.8776 | 0.8441 | 0.8776 | 0.8758 |
SegNeXt | 29.5 M | 0.0399 s | 286.34 G | 0.7771 | 0.9110 | 0.8479 | 0.9110 | 0.8745 |
Swin Transformer | 28 M | 0.1090 s | 330.22 G | 0.7991 | 0.8928 | 0.8840 | 0.8928 | 0.8884 |
BiSeNetV2 | 55.3 M | 0.0210 s | 118.00 G | 0.6902 | 0.9398 | 0.6653 | 0.7043 | 0.7551 |
PIDNet | 29.6 M | 0.5450 s | 41.10 G | 0.5573 | 0.9703 | 0.7316 | 0.6789 | 0.6764 |
DCSAU-Net | 9.4 M | 0.0769 s | 26.50 G | 0.8240 | 0.9886 | 0.9061 | 0.9012 | 0.9099 |
TransUNet | 105.3 M | 0.1740 s | 56.70 G | 0.8296 | 0.9903 | 0.9263 | 0.9114 | 0.9169 |
MambaSegNet | 22.43 M | 0.0912 s | 201.34 G | 0.8208 | 0.9176 | 0.9276 | 0.9076 | 0.9176 |
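The efficiency columns (parameters, inference speed, FLOPs) depend heavily on the measurement protocol. For reproducibility, a minimal PyTorch routine for the first two might look like the following; the input size, warm-up policy, and device are our assumptions, not the authors' stated protocol:

```python
import time
import torch

@torch.no_grad()
def profile_model(model: torch.nn.Module, input_shape=(1, 3, 800, 800),
                  warmup: int = 10, runs: int = 50, device: str = "cuda"):
    """Return (parameter count in millions, mean per-image latency in seconds)."""
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    for _ in range(warmup):              # warm up kernels and caches
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()         # flush queued GPU work before timing
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    return params_m, (time.perf_counter() - start) / runs
```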
Quantitative comparison on the LEVIR_SHIP dataset:

Framework | Parameters | Inference Speed | Theoretical FLOPs | IoU | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|---|---|
UNet | 0.90 M | 0.0877 s | 44.18 G | 0.5980 | 0.7810 | 0.7400 | 0.7250 | 0.7324 |
UNet++ | 9.16 M | 0.1252 s | 1397.13 G | 0.7330 | 0.7970 | 0.8300 | 0.8450 | 0.8374 |
DeepLabV3+ | 58.04 M | 0.1191 s | 1655.55 G | 0.7789 | 0.8030 | 0.8030 | 0.6830 | 0.7382 |
SegNet | 29.5 M | 0.1183 s | 286.12 G | 0.6850 | 0.8160 | 0.8130 | 0.7850 | 0.7988 |
SegFormer | 32 M | 0.0619 s | 390.67 G | 0.5603 | 0.7606 | 0.7260 | 0.7106 | 0.7182 |
SegNeXt | 29.5 M | 0.0399 s | 286.34 G | 0.6319 | 0.7611 | 0.7882 | 0.7611 | 0.7744 |
Swin Transformer | 28 M | 0.1090 s | 330.22 G | 0.7333 | 0.8089 | 0.8434 | 0.8089 | 0.8000 |
BiSeNetV2 | 55.3 M | 0.0210 s | 118.00 G | 0.5524 | 0.9948 | 0.7081 | 0.7015 | 0.6900 |
PIDNet | 29.6 M | 0.5450 s | 41.10 G | 0.4518 | 0.9954 | 0.6717 | 0.5329 | 0.5426 |
DCSAU-Net | 9.4 M | 0.0769 s | 26.50 G | 0.7091 | 0.9969 | 0.8312 | 0.8218 | 0.8091 |
TransUNet | 105.3 M | 0.1740 s | 56.70 G | 0.7292 | 0.9970 | 0.8346 | 0.8335 | 0.8177 |
MambaSegNet | 22.43 M | 0.0912 s | 201.34 G | 0.8094 | 0.8595 | 0.8695 | 0.8995 | 0.8795 |
Ablation study of MambaSegNet variants (LEVIR_SHIP dataset):

Framework | IoU | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|
MambaSegNet | 0.8094 | 0.8595 | 0.8695 | 0.8995 | 0.8795 |
MambaSegNet-C | 0.7945 | 0.8326 | 0.8499 | 0.8782 | 0.8594 |
MambaSegNet-U | 0.8002 | 0.8423 | 0.8507 | 0.8812 | 0.8678 |
MambaSegNet performance by training dataset:

Training Dataset | IoU | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|
DIOR_SHIP | 0.8208 | 0.9176 | 0.9276 | 0.9076 | 0.9176 |
LEVIR_SHIP | 0.8094 | 0.8595 | 0.8695 | 0.8995 | 0.8795 |
Mixed dataset | 0.8453 | 0.9393 | 0.9493 | 0.9293 | 0.9293 |
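All of the tables above report IoU, accuracy, precision, recall, and F1-score. For binary ship masks these reduce to confusion-matrix ratios; here is a minimal NumPy sketch of the standard definitions (illustrative, not the authors' evaluation code):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Binary segmentation metrics from two masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # ship pixels correctly found
    fp = np.logical_and(pred, ~gt).sum()     # background predicted as ship
    fn = np.logical_and(~pred, gt).sum()     # ship pixels missed
    tn = np.logical_and(~pred, ~gt).sum()    # background correctly rejected
    iou = tp / (tp + fp + fn + eps)
    accuracy = (tp + tn) / (tp + tn + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return {"IoU": iou, "Accuracy": accuracy, "Precision": precision,
            "Recall": recall, "F1-Score": f1}

# Toy example: 3 of 4 predicted ship pixels overlap the 3 ground-truth pixels.
pred = np.zeros((4, 4)); pred[:2, :2] = 1
gt = np.zeros((4, 4)); gt[0, :2] = 1; gt[1, 0] = 1
print(segmentation_metrics(pred, gt))  # IoU = 0.75, Recall = 1.0, ...
```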
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wen, R.; Yuan, Y.; Xu, X.; Yin, S.; Chen, Z.; Zeng, H.; Wang, Z. MambaSegNet: A Fast and Accurate High-Resolution Remote Sensing Imagery Ship Segmentation Network. Remote Sens. 2025, 17, 3328. https://doi.org/10.3390/rs17193328