Search Results (27)

Search Parameters:
Keywords = Mamba U-Net
22 pages, 10839 KB  
Article
Multi-Pattern Scanning Mamba for Cloud Removal
by Xiaomeng Xin, Ye Deng, Wenli Huang, Yang Wu, Jie Fang and Jinjun Wang
Remote Sens. 2025, 17(21), 3593; https://doi.org/10.3390/rs17213593 - 30 Oct 2025
Viewed by 170
Abstract
Detection of changes in remote sensing relies on clean multi-temporal images, but cloud cover may considerably degrade image quality. Cloud removal, a critical image-restoration task, demands effective modeling of long-range spatial dependencies to reconstruct information under cloud occlusions. While Transformer-based models excel at handling such spatial modeling, their quadratic computational complexity limits practical application. The recently proposed Mamba, a state space model, offers a computationally efficient alternative for long-range modeling, but its inherent 1D sequential processing is ill-suited to capturing complex 2D spatial contexts in images. To bridge this gap, we propose the multi-pattern scanning Mamba (MPSM) block. Our MPSM block adapts the Mamba architecture for vision tasks by introducing a set of diverse scanning patterns that traverse features along horizontal, vertical, and diagonal paths. This multi-directional approach ensures that each feature aggregates comprehensive contextual information from the entire spatial domain. Furthermore, we introduce a dynamic path-aware (DPA) mechanism to adaptively recalibrate feature contributions from different scanning paths, enhancing the model’s focus on position-sensitive information. To effectively capture both global structures and local details, our MPSM blocks are embedded within a U-Net architecture enhanced with multi-scale supervision. Extensive experiments on the RICE1, RICE2, and T-CLOUD datasets demonstrate that our method achieves state-of-the-art performance while maintaining favorable computational efficiency.
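The multi-pattern idea above amounts to choosing different flattening orders before the 1D Mamba scan, so that each position aggregates context from several directions. A minimal sketch of the three traversals (function names are illustrative, not from the paper's code):

```python
def scan_orders(h: int, w: int) -> dict:
    """Return (row, col) visit orders for three scan patterns over an h x w grid."""
    horizontal = [(r, c) for r in range(h) for c in range(w)]
    vertical = [(r, c) for c in range(w) for r in range(h)]
    # Anti-diagonals: cells with equal r + c are visited together.
    diagonal = [(r, d - r) for d in range(h + w - 1)
                for r in range(max(0, d - w + 1), min(h, d + 1))]
    return {"horizontal": horizontal, "vertical": vertical, "diagonal": diagonal}

def scan(grid, order):
    """Flatten a 2D grid into the 1D sequence a Mamba block would consume."""
    return [grid[r][c] for r, c in order]
```

Running the three scans over the same feature map yields three different 1D sequences; in the MPSM design their outputs are fused so every position has seen the whole spatial domain.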

20 pages, 9830 KB  
Article
DB-YOLO: A Dual-Branch Parallel Industrial Defect Detection Network
by Ziling Fan, Yan Zhao, Chaofu Liu and Jinliang Qiu
Sensors 2025, 25(21), 6614; https://doi.org/10.3390/s25216614 - 28 Oct 2025
Viewed by 433
Abstract
Insulator defect detection in power inspection tasks faces significant challenges due to the large variations in defect sizes and complex backgrounds, which hinder the accurate identification of both small and large defects. To overcome these issues, we propose a novel dual-branch YOLO-based algorithm (DB-YOLO), built upon the YOLOv11 architecture. The model introduces two dedicated branches, tailored to large and small defects, respectively, thereby enhancing robustness and precision across multiple scales. To further strengthen global feature representation, the Mamba mechanism is integrated, improving the detection of large defects in cluttered scenes. An adaptive weighted CIoU loss function, designed based on defect size, is employed to refine localization during training. Additionally, ShuffleNetV2 is embedded as a lightweight backbone to boost inference speed without compromising accuracy. We evaluate DB-YOLO on the following three datasets: the open-source CPLID, a self-built insulator defect dataset, and GC-10. Experimental results demonstrate that DB-YOLO achieves superior performance in both accuracy and real-time efficiency compared to existing state-of-the-art methods. These findings suggest that the proposed approach offers strong potential for practical deployment in real-world power inspection applications.
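The size-adaptive weighting of the CIoU loss is described only qualitatively above. The underlying idea, scaling a box-regression loss by target size so small defects are not drowned out, can be sketched with plain IoU (the weight schedule below is illustrative, not the paper's):

```python
import math

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter)

def size_weighted_iou_loss(pred, target, img_area=640 * 640):
    """Weight the IoU loss by relative target size: small defects receive a
    larger weight so their localization errors are not averaged away."""
    rel = ((target[2] - target[0]) * (target[3] - target[1])) / img_area
    weight = 1.0 + math.exp(-rel * 100)   # illustrative schedule, not the paper's
    return weight * (1.0 - iou(pred, target))
```

With the same IoU, a small target produces a larger loss than a large one, which is the behavior the abstract attributes to the adaptive weighting.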

18 pages, 2632 KB  
Article
Adverse-Weather Image Restoration Method Based on VMT-Net
by Zhongmin Liu, Xuewen Yu and Wenjin Hu
J. Imaging 2025, 11(11), 376; https://doi.org/10.3390/jimaging11110376 - 26 Oct 2025
Viewed by 254
Abstract
To address global semantic loss, local detail blurring, and spatial–semantic conflict during image restoration under adverse weather conditions, we propose an image restoration network that integrates Mamba with Transformer architectures. We first design a Vision-Mamba–Transformer (VMT) module that combines the long-range dependency modeling of Vision Mamba with the global contextual reasoning of Transformers, facilitating the joint modeling of global structures and local details, thus mitigating information loss and detail blurring during restoration. Second, we introduce an Adaptive Content Guidance (ACG) module that employs dynamic gating and spatial–channel attention to enable effective inter-layer feature fusion, thereby enhancing cross-layer semantic consistency. Finally, we embed the VMT and ACG modules into a U-Net backbone, achieving efficient integration of multi-scale feature modeling and cross-layer fusion, significantly improving reconstruction quality under complex weather conditions. The experimental results show that on Snow100K-S/L, VMT-Net improves PSNR over the baseline by approximately 0.89 dB and 0.36 dB, with SSIM gains of about 0.91% and 0.11%, respectively. On Outdoor-Rain and Raindrop, it performs similarly to the baseline and exhibits superior detail recovery in real-world scenes. Overall, the method demonstrates robustness and strong detail restoration across diverse adverse-weather conditions.
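For context on the dB gains quoted above: PSNR is a logarithmic function of mean squared error (the standard definition, not code from the paper), so a fixed-dB gain translates into a multiplicative MSE reduction.

```python
import math

def psnr(mse: float, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error,
    with pixel values normalized to [0, peak]."""
    return 10.0 * math.log10(peak ** 2 / mse)

# A gain of 0.89 dB corresponds to an MSE ratio of 10 ** (0.89 / 10) ~ 1.23,
# i.e. roughly an 18.5% reduction in mean squared error.
```

This is why sub-dB PSNR improvements, which look small, still reflect a noticeable drop in reconstruction error.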

22 pages, 6497 KB  
Article
Semantic Segmentation of High-Resolution Remote Sensing Images Based on RS3Mamba: An Investigation of the Extraction Algorithm for Rural Compound Utilization Status
by Xinyu Fang, Zhenbo Liu, Su’an Xie and Yunjian Ge
Remote Sens. 2025, 17(20), 3443; https://doi.org/10.3390/rs17203443 - 15 Oct 2025
Viewed by 310
Abstract
In this study, we utilize Gaofen-2 satellite remote sensing images to optimize and enhance the extraction of feature information from rural compounds, addressing key challenges in high-resolution remote sensing analysis: traditional methods struggle to effectively capture long-distance spatial dependencies for scattered rural compounds. To this end, we implement the RS3Mamba+ deep learning model, which introduces the Mamba state space model (SSM) into its auxiliary branching—leveraging Mamba’s sequence modeling advantage to efficiently capture long-range spatial correlations of rural compounds, a critical capability for analyzing sparse rural buildings. This Mamba-assisted branch, combined with multi-directional selective scanning (SS2D) and the enhanced STEM network framework (replacing a single 7 × 7 convolution with two-stage 3 × 3 convolutions to reduce information loss), works synergistically with a ResNet-based main branch for local feature extraction. We further introduce a multiscale attention feature fusion mechanism that optimizes feature extraction and fusion, enhances edge contour extraction accuracy in courtyards, and improves the recognition and differentiation of courtyards from regions with complex textures. The feature information of courtyard utilization status is finally extracted using empirical methods. A typical rural area in Weifang City, Shandong Province, is selected as the experimental sample area. Results show that the extraction accuracy reaches a mean intersection over union (mIoU) of 79.64% and a Kappa coefficient of 0.7889, improving the F1 score by at least 8.12% and mIoU by 4.83% compared with models such as DeepLabv3+ and Transformer. The algorithm is particularly effective at suppressing false alarms caused by shadows and intricate textures, underscoring its potential as a practical tool for extracting rural vacancy rates.
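The stem redesign mentioned above (two-stage 3×3 convolutions replacing a single 7×7) can be checked with the standard receptive-field recurrence; a sketch, assuming the first 3×3 carries the stride-2 downsampling (the abstract does not specify the strides):

```python
def receptive_field(layers):
    """Receptive field of a conv stack on its input.
    layers: list of (kernel_size, stride) pairs, applied in order.
    Uses rf = 1 + sum over layers of (k - 1) * (product of earlier strides)."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# A single 7x7 stride-2 stem and two stacked 3x3 convs (stride 2, then 1)
# cover the same 7x7 input window, but the stacked version inserts an extra
# nonlinearity and discards less information at the first downsampling step.
single_7x7 = receptive_field([(7, 2)])
stacked_3x3 = receptive_field([(3, 2), (3, 1)])
```

Under these assumed strides both stems see a 7×7 window, which is consistent with the abstract's claim that the replacement reduces information loss rather than context.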

21 pages, 6844 KB  
Article
MMFNet: A Mamba-Based Multimodal Fusion Network for Remote Sensing Image Semantic Segmentation
by Jingting Qiu, Wei Chang, Wei Ren, Shanshan Hou and Ronghao Yang
Sensors 2025, 25(19), 6225; https://doi.org/10.3390/s25196225 - 8 Oct 2025
Viewed by 793
Abstract
Accurate semantic segmentation of high-resolution remote sensing imagery is challenged by substantial intra-class variability, inter-class similarity, and the limitations of single-modality data. This paper proposes MMFNet, a novel multimodal fusion network that leverages the Mamba architecture to efficiently capture long-range dependencies for semantic segmentation tasks. MMFNet adopts a dual-encoder design, combining ResNet-18 for local detail extraction and VMamba for global contextual modelling, striking a balance between segmentation accuracy and computational efficiency. A Multimodal Feature Fusion Block (MFFB) is introduced to effectively integrate complementary information from optical imagery and digital surface models (DSMs), thereby enhancing multimodal feature interaction and improving segmentation accuracy. Furthermore, a frequency-aware upsampling module (FreqFusion) is incorporated in the decoder to enhance boundary delineation and recover fine spatial details. Extensive experiments on the ISPRS Vaihingen and Potsdam benchmarks demonstrate that MMFNet achieves mean IoU scores of 83.50% and 86.06%, outperforming eight state-of-the-art methods while maintaining relatively low computational complexity. These results highlight MMFNet’s potential for efficient and accurate multimodal semantic segmentation in remote sensing applications.
(This article belongs to the Section Remote Sensors)

20 pages, 5150 KB  
Article
VSM-UNet: A Visual State Space Reconstruction Network for Anomaly Detection of Catenary Support Components
by Shuai Xu, Jiyou Fei, Haonan Yang, Xing Zhao, Xiaodong Liu and Hua Li
Sensors 2025, 25(19), 5967; https://doi.org/10.3390/s25195967 - 25 Sep 2025
Viewed by 486
Abstract
Anomaly detection of catenary support components (CSCs) is an important element of railway condition monitoring systems. However, because the visual signs of CSC loosening are subtle, and current CNN and vision Transformer models suffer from limited long-range modeling capability and quadratic computational complexity, respectively, existing deep learning anomaly detection methods struggle to perform well on this task. The state space model (SSM) exemplified by Mamba is not only good at long-range modeling but also maintains linear computational complexity. In this paper, building on the SSM, we propose a new visual state space reconstruction network (VSM-UNet) for detecting CSC loosening anomalies. First, based on the UNet structure, a visual state space block (VSS block) is introduced to capture extensive contextual information and multi-scale features, and an asymmetric encoder–decoder structure is constructed through patch merging and patch expanding operations. Second, the CBAM attention mechanism is introduced between the encoder and decoder to strengthen the model’s focus on key abnormal features. Finally, a stable anomaly score calculation module is designed using an MLP to evaluate the degree of abnormality of components. Experiments show that the proposed VSM-UNet model, learning strategy, and anomaly score calculation method are effective, achieving an AUROC of 0.986 and 26.56 FPS on the loosening detection task for positioning clamp nuts, U-shaped hoop nuts, and cotter pins. The method can therefore be effectively applied to the detection of CSC abnormalities.
(This article belongs to the Special Issue AI-Enabled Smart Sensors for Industry Monitoring and Fault Diagnosis)
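The abstract does not detail its MLP-based anomaly score, but the reconstruction-network idea it builds on can be illustrated with a minimal stand-in (mean squared reconstruction error; the `anomaly_score` function here is a hypothetical simplification, not the paper's module):

```python
def anomaly_score(image, reconstruction):
    """Mean squared reconstruction error as a per-image anomaly score.
    A reconstruction network trained only on normal components reproduces
    them accurately, so a large error flags a loosened part."""
    return sum((a - b) ** 2 for a, b in zip(image, reconstruction)) / len(image)

normal = [0.2, 0.4, 0.4, 0.2]        # toy "image" as a flat vector
recon_good = [0.2, 0.4, 0.4, 0.2]    # near-perfect reconstruction -> normal
recon_bad = [0.2, 0.9, 0.1, 0.2]     # fails on the anomalous region -> flagged
```

Thresholding such a score (or, as in the paper, learning the scoring with an MLP) turns the reconstruction network into a detector.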

22 pages, 3608 KB  
Article
A Multi-Scale Feature Fusion Dual-Branch Mamba-CNN Network for Landslide Extraction
by Zhiheng Yang, Hua Zhang and Nanshan Zheng
Appl. Sci. 2025, 15(18), 10063; https://doi.org/10.3390/app151810063 - 15 Sep 2025
Viewed by 625
Abstract
Automatically extracting landslide regions from remote sensing images plays a vital role in the landslide inventory compilation. However, this task remains challenging due to the considerable diversity of landslides in terms of morphology, triggering mechanisms, and internal structure. Thanks to its efficient long-sequence modeling, Mamba has emerged as a promising candidate for semantic segmentation tasks. This study adopts Mamba for landslide extraction to improve the recognition of complex geomorphic features. While Mamba demonstrates strong performance, it still faces challenges in capturing spatial dependencies and preserving fine-grained local information. To address these challenges, we propose a multi-scale spatial context-guided network (MSCG-Net). MSCG-Net features a dual-branch architecture, comprising a convolutional neural network (CNN) branch that captures detailed spatial features and an omnidirectional multi-scale Mamba (OMM) branch that models long-range contextual dependencies. We introduce an adaptive feature enhancement module (AFEM) to further enhance feature representation by effectively integrating global context with local details, which enhances both multiscale feature richness and boundary clarity. Additionally, we develop an omnidirectional multiscale scanning (OMSS) mechanism to improve contextual modeling and preserve computational efficiency by integrating omnidirectional attention with multi-scale feature extraction. Comprehensive evaluations on two benchmark datasets demonstrate that MSCG-Net outperforms existing approaches, achieving IoU scores of 78.04% on the Bijie dataset and 81.13% on the GVLM dataset. Furthermore, it exceeds the second-best methods by 2.28% and 4.25% in Boundary IoU, respectively.
(This article belongs to the Section Environmental Sciences)

27 pages, 13123 KB  
Article
Symmetric Boundary-Enhanced U-Net with Mamba Architecture for Glomerular Segmentation in Renal Pathological Images
by Shengnan Zhang, Xinming Cui, Guangkun Ma and Ronghui Tian
Symmetry 2025, 17(9), 1506; https://doi.org/10.3390/sym17091506 - 10 Sep 2025
Viewed by 2784
Abstract
Accurate glomerular segmentation in renal pathological images is a key challenge for chronic kidney disease diagnosis and assessment. Due to the high visual similarity between pathological glomeruli and surrounding tissues in color, texture, and morphology, significant “camouflage phenomena” exist, leading to boundary identification difficulties. To address this problem, we propose BM-UNet, a novel segmentation framework that embeds boundary guidance mechanisms into a Mamba architecture with a symmetric encoder–decoder design. The framework enhances feature transmission through explicit boundary detection, incorporating four core modules designed for key challenges in pathological image segmentation. The Multi-scale Adaptive Fusion (MAF) module processes irregular tissue morphology, the Hybrid Boundary Detection (HBD) module handles boundary feature extraction, the Boundary-guided Attention (BGA) module achieves boundary-aware feature refinement, and the Mamba-based Fused Decoder Block (MFDB) completes boundary-preserving reconstruction. By introducing explicit boundary supervision mechanisms, the framework achieves significant segmentation accuracy improvements while maintaining linear computational complexity. Validation on the KPIs2024 glomerular dataset and HuBMAP renal tissue samples demonstrates that BM-UNet achieves a 92.4–95.3% mean Intersection over Union across different CKD pathological conditions, with a 4.57% improvement over the Mamba baseline and a processing speed of 113.7 FPS.
(This article belongs to the Section Computer)

20 pages, 5187 KB  
Article
IceSnow-Net: A Deep Semantic Segmentation Network for High-Precision Snow and Ice Mapping from UAV Imagery
by Yulin Liu, Shuyuan Yang, Guangyang Zhang, Minghui Wu, Feng Xiong, Pinglv Yang and Zeming Zhou
Remote Sens. 2025, 17(17), 2964; https://doi.org/10.3390/rs17172964 - 27 Aug 2025
Viewed by 829
Abstract
Accurate monitoring of snow and ice cover is essential for climate research and disaster management, but conventional remote sensing methods often struggle in complex terrain and fog-contaminated conditions. To address the challenges of high-resolution UAV-based snow and ice segmentation—including visual similarity, fragmented spatial distributions, and terrain shadow interference—we introduce IceSnow-Net, a U-Net-based architecture enhanced with three key components: (1) a ResNet50 backbone with atrous convolutions to expand the receptive field, (2) an Atrous Spatial Pyramid Pooling (ASPP) module for multi-scale context aggregation, and (3) an auxiliary path loss for deep supervision to enhance boundary delineation and training stability. The model was trained and validated on UAV-captured orthoimagery from Ganzi Prefecture, Sichuan, China. The experimental results demonstrate that IceSnow-Net outperformed competing models, attaining a mean Intersection over Union (mIoU) of 98.74% while delivering 27% higher computational efficiency than U-Mamba. Ablation studies further validated the individual contributions of each module. Overall, IceSnow-Net provides an effective and accurate solution for cryosphere monitoring in topographically complex environments using UAV imagery.
(This article belongs to the Special Issue Recent Progress in UAV-AI Remote Sensing II)
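The ASPP idea above, parallel atrous (dilated) convolutions aggregating context at several rates, can be sketched in 1D (toy kernels and dilations; the real module is 2D with learned filters):

```python
def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1D convolution with holes: taps are spaced `dilation`
    apart, so the receptive field grows without adding parameters."""
    k = len(kernel)
    pad = (k - 1) // 2 * dilation
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j * dilation] for j in range(k))
            for i in range(len(x))]

def aspp_1d(x, kernel, dilations=(1, 2, 4)):
    """Toy ASPP: run parallel dilated branches over the same input and
    return them side by side, aggregating context at several scales."""
    return [dilated_conv1d(x, kernel, d) for d in dilations]
```

In the real module the parallel branch outputs are concatenated and projected back down, giving each pixel access to several context radii at once.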

23 pages, 3410 KB  
Article
LinU-Mamba: Visual Mamba U-Net with Linear Attention to Predict Wildfire Spread
by Henintsoa S. Andrianarivony and Moulay A. Akhloufi
Remote Sens. 2025, 17(15), 2715; https://doi.org/10.3390/rs17152715 - 6 Aug 2025
Viewed by 1372
Abstract
Wildfires have become increasingly frequent and intense due to climate change, posing severe threats to ecosystems, infrastructure, and human lives. As a result, accurate wildfire spread prediction is critical for effective risk mitigation, resource allocation, and decision making in disaster management. In this study, we develop a deep learning model to predict wildfire spread using remote sensing data. We propose LinU-Mamba, a model with a U-Net-based vision Mamba architecture, with light spatial attention in skip connections, and an efficient linear attention mechanism in the encoder and decoder to better capture salient fire information in the dataset. The model is trained and evaluated on the two-dimensional remote sensing dataset Next Day Wildfire Spread (NDWS), which maps fire data across the United States with fire entries, topography, vegetation, weather, drought index, and population density variables. The results demonstrate that our approach achieves superior performance compared to existing deep learning methods applied to the same dataset, while showing an efficient training time. Furthermore, we highlight the impacts of pre-training and feature selection in remote sensing, as well as the impacts of linear attention use in our model. As far as we know, LinU-Mamba is the first model based on Mamba used for wildfire spread prediction, making it a strong foundation for future research.

21 pages, 6892 KB  
Article
Enhanced Temporal Action Localization with Separated Bidirectional Mamba and Boundary Correction Strategy
by Xiangbin Liu and Qian Peng
Mathematics 2025, 13(15), 2458; https://doi.org/10.3390/math13152458 - 30 Jul 2025
Viewed by 1094
Abstract
Temporal action localization (TAL) is a research hotspot in video understanding, which aims to locate and classify actions in videos. However, existing methods have difficulties in capturing long-term actions due to focusing on local temporal information, which leads to poor performance in localizing long-term temporal sequences. In addition, most methods ignore the boundary importance for action instances, resulting in inaccurate localized boundaries. To address these issues, this paper proposes a state space model for temporal action localization, called Separated Bidirectional Mamba (SBM), which innovatively understands frame changes from the perspective of state transformation. It adapts to different sequence lengths and incorporates state information from the forward and backward for each frame through forward Mamba and backward Mamba to obtain more comprehensive action representations, enhancing modeling capabilities for long-term temporal sequences. Moreover, this paper designs a Boundary Correction Strategy (BCS). It calculates the contribution of each frame to action instances based on the pre-localized results, then adjusts weights of frames in boundary regression to ensure the boundaries are shifted towards the frames with higher contributions, leading to more accurate boundaries. To demonstrate the effectiveness of the proposed method, this paper reports mean Average Precision (mAP) under temporal Intersection over Union (tIoU) thresholds on four challenging benchmarks: THUMOS13, ActivityNet-1.3, HACS, and FineAction, where the proposed method achieves mAPs of 73.7%, 42.0%, 45.2%, and 29.1%, respectively, surpassing the state-of-the-art approaches.
(This article belongs to the Special Issue Advances in Applied Mathematics in Computer Vision)
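The forward/backward state fusion described above can be illustrated with a toy linear recurrence, a drastic simplification of Mamba's input-dependent state space (`decay` is an illustrative constant, not a learned parameter):

```python
def directional_scan(xs, decay=0.5, reverse=False):
    """Simplified SSM-style recurrence h[t] = decay * h[t-1] + x[t].
    Each output summarizes everything seen so far in one direction."""
    seq = list(reversed(xs)) if reverse else list(xs)
    h, out = 0.0, []
    for x in seq:
        h = decay * h + x
        out.append(h)
    return list(reversed(out)) if reverse else out

def bidirectional_features(xs, decay=0.5):
    """Fuse forward and backward states so every frame carries context
    from both its past and its future, as in the separated design."""
    fwd = directional_scan(xs, decay)
    bwd = directional_scan(xs, decay, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]
```

A frame in the middle of a long action thus receives evidence from both ends of the sequence, which a single forward scan cannot provide.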

21 pages, 5527 KB  
Article
SGNet: A Structure-Guided Network with Dual-Domain Boundary Enhancement and Semantic Fusion for Skin Lesion Segmentation
by Haijiao Yun, Qingyu Du, Ziqing Han, Mingjing Li, Le Yang, Xinyang Liu, Chao Wang and Weitian Ma
Sensors 2025, 25(15), 4652; https://doi.org/10.3390/s25154652 - 27 Jul 2025
Viewed by 820
Abstract
Segmentation of skin lesions in dermoscopic images is critical for the accurate diagnosis of skin cancers, particularly malignant melanoma, yet it is hindered by irregular lesion shapes, blurred boundaries, low contrast, and artifacts, such as hair interference. Conventional deep learning methods, typically based on UNet or Transformer architectures, often face limitations in fully exploiting lesion features and incur high computational costs, compromising precise lesion delineation. To overcome these challenges, we propose SGNet, a structure-guided network, integrating a hybrid CNN–Mamba framework for robust skin lesion segmentation. The SGNet employs the Visual Mamba (VMamba) encoder to efficiently extract multi-scale features, followed by the Dual-Domain Boundary Enhancer (DDBE), which refines boundary representations and suppresses noise through spatial and frequency-domain processing. The Semantic-Texture Fusion Unit (STFU) adaptively integrates low-level texture with high-level semantic features, while the Structure-Aware Guidance Module (SAGM) generates coarse segmentation maps to provide global structural guidance. The Guided Multi-Scale Refiner (GMSR) further optimizes boundary details through a multi-scale semantic attention mechanism. Comprehensive experiments based on the ISIC2017, ISIC2018, and PH2 datasets demonstrate SGNet’s superior performance, with average improvements of 3.30% in mean Intersection over Union (mIoU) and 1.77% in Dice Similarity Coefficient (DSC) compared to state-of-the-art methods. Ablation studies confirm the effectiveness of each component, highlighting SGNet’s exceptional accuracy and robust generalization for computer-aided dermatological diagnosis.
(This article belongs to the Section Biomedical Sensors)
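Since results like the above report both mIoU and Dice, it is worth noting that the two metrics are monotonically related for a single mask (a standard set-overlap identity, not something from the paper):

```python
def dice_from_iou(iou: float) -> float:
    """Dice (F1) and IoU measure the same overlap on different scales:
    Dice = 2 * IoU / (1 + IoU). Handy for sanity-checking tables that
    report both metrics for the same predictions."""
    return 2 * iou / (1 + iou)
```

Note the identity holds exactly only per mask; averages over a dataset (mIoU vs. mean DSC) need not satisfy it.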

21 pages, 5917 KB  
Article
VML-UNet: Fusing Vision Mamba and Lightweight Attention Mechanism for Skin Lesion Segmentation
by Tang Tang, Haihui Wang, Qiang Rao, Ke Zuo and Wen Gan
Electronics 2025, 14(14), 2866; https://doi.org/10.3390/electronics14142866 - 17 Jul 2025
Viewed by 1372
Abstract
Deep learning has advanced medical image segmentation, yet existing methods struggle with complex anatomical structures. Mainstream models, such as CNN, Transformer, and hybrid architectures, face challenges including insufficient information representation and redundant complexity, which limit their clinical deployment. Developing efficient and lightweight networks is crucial for accurate lesion localization and optimized clinical workflows. We propose the VML-UNet, a lightweight segmentation network with core innovations including the CPMamba module and the multi-scale local supervision module (MLSM). The CPMamba module integrates the visual state space (VSS) block and a channel prior attention mechanism to enable efficient modeling of spatial relationships with linear computational complexity through dynamic channel-space weight allocation, while preserving channel feature integrity. The MLSM enhances local feature perception and reduces the inference burden. Comparative experiments were conducted on three public datasets: ISIC2017, ISIC2018, and PH2, with ablation experiments performed on ISIC2017. VML-UNet achieves 0.53 M parameters, 2.18 MB memory usage, and 1.24 GFLOPs time complexity, and it outperforms the comparison networks on these datasets, validating its effectiveness. This study provides valuable references for developing lightweight, high-performance skin lesion segmentation networks.
(This article belongs to the Section Bioelectronics)

18 pages, 1995 KB  
Article
A U-Shaped Architecture Based on Hybrid CNN and Mamba for Medical Image Segmentation
by Xiaoxuan Ma, Yingao Du and Dong Sui
Appl. Sci. 2025, 15(14), 7821; https://doi.org/10.3390/app15147821 - 11 Jul 2025
Cited by 2 | Viewed by 1453
Abstract
Accurate medical image segmentation plays a critical role in clinical diagnosis, treatment planning, and a wide range of healthcare applications. Although U-shaped CNNs and Transformer-based architectures have shown promise, CNNs struggle to capture long-range dependencies, whereas Transformers suffer from quadratic growth in computational cost as image resolution increases. To address these issues, we propose HCMUNet, a novel medical image segmentation model that innovatively combines the local feature extraction capabilities of CNNs with the efficient long-range dependency modeling of Mamba, enhancing feature representation while reducing computational cost. In addition, HCMUNet features a redesigned skip connection and a novel attention module that integrates multi-scale features to recover spatial details lost during down-sampling and to promote richer cross-dimensional interactions. HCMUNet achieves Dice Similarity Coefficients (DSC) of 90.32%, 81.52%, and 92.11% on the ISIC 2018, Synapse multi-organ, and ACDC datasets, respectively, outperforming baseline methods by 0.65%, 1.05%, and 1.39%. Furthermore, HCMUNet consistently outperforms U-Net and Swin-UNet, achieving average Dice score improvements of approximately 5% and 2% across the evaluated datasets. These results collectively affirm the effectiveness and reliability of the proposed model across different segmentation tasks.

20 pages, 3616 KB  
Article
A Mamba U-Net Model for Reconstruction of Extremely Dark RGGB Images
by Yiyao Huang, Xiaobao Zhu, Fenglian Yuan, Jing Shi, Kintak U, Junshuo Qin, Xiangjie Kong and Yiran Peng
Sensors 2025, 25(8), 2464; https://doi.org/10.3390/s25082464 - 14 Apr 2025
Viewed by 1199
Abstract
Currently, most images captured by high-pixel devices such as mobile phones, camcorders, and drones are in RGGB format. However, image quality in extremely dark scenes often needs improvement. Traditional methods for processing these dark RGGB images typically rely on end-to-end U-Net networks and their enhancement techniques, which require substantial resources and processing time. To tackle this issue, we first converted RGGB images into RGB three-channel images by subtracting the black level and applying linear interpolation. During the training stage, we leveraged the computational efficiency of the state-space model (SSM) and developed a Mamba U-Net end-to-end model to enhance the restoration of extremely dark RGGB images. We utilized the See-in-the-Dark (SID) dataset for training, assessing the effectiveness of our approach. Experimental results indicate that our method significantly reduces resource consumption compared to existing single-step training and prior multi-step training techniques, while achieving improved peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) outcomes.
(This article belongs to the Section Sensing and Imaging)
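The RGGB pre-processing step described above (black-level subtraction followed by interpolation to three channels) can be sketched roughly. This toy version collapses each 2×2 RGGB block into one RGB pixel and uses typical 14-bit sensor levels; it is a simplification, not the paper's exact pipeline, which interpolates to full resolution:

```python
def rggb_to_rgb(bayer, black_level=512, white_level=16383):
    """Subtract the black level, normalize to [0, 1], and collapse each
    RGGB 2x2 block into one RGB pixel (the two greens are averaged).
    black_level/white_level are typical 14-bit values, assumed here."""
    scale = white_level - black_level
    norm = [[max(v - black_level, 0) / scale for v in row] for row in bayer]
    h, w = len(norm) // 2, len(norm[0]) // 2
    rgb = []
    for i in range(h):
        row = []
        for j in range(w):
            r = norm[2 * i][2 * j]
            g = (norm[2 * i][2 * j + 1] + norm[2 * i + 1][2 * j]) / 2
            b = norm[2 * i + 1][2 * j + 1]
            row.append((r, g, b))
        rgb.append(row)
    return rgb
```

Black-level subtraction matters most in extremely dark scenes, where the useful signal sits only slightly above the sensor's pedestal value.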
