Search Results (116)

Search Parameters:
Keywords = Swin–Unet

30 pages, 23715 KB  
Article
Intelligent Landslide Susceptibility Assessment Framework Using the Swin Transformer Technique: A Case Study of Changbai County, Jilin Province, China
by Jiachen Liu, Xiangjin Ran and Xi Wang
Appl. Sci. 2026, 16(1), 301; https://doi.org/10.3390/app16010301 (registering DOI) - 27 Dec 2025
Abstract
Frequent geological hazards such as landslides and rockfalls, intensified by human activities and extreme rainfall, highlight the urgent need for rapid, accurate, and interpretable susceptibility assessment. However, existing methods often struggle with insufficient characterization of spatial heterogeneity, fragmented spatial structures, and limited mechanistic interpretability. To overcome these challenges, this study proposes an intelligent landslide susceptibility assessment framework based on the Swin-UNet architecture, which combines the window-based self-attention mechanism of the Swin Transformer with the encoder–decoder structure of U-Net. Eleven conditioning factors derived from remote sensing data were used to characterize the influencing conditions. Comprehensive experiments conducted in Changbai County, Jilin Province, China, demonstrate that the proposed Swin-UNet framework outperforms traditional models, including the information value method and the standard U-Net. It achieves a maximum overall accuracy of 99.87% and consistently yields higher AUROC, AUPRC, F1-score, and IoU metrics. The generated susceptibility maps exhibit enhanced spatial continuity, improved geomorphological coherence, and greater interpretability of contributing factors. These results confirm the robustness and generalizability of the proposed framework and highlight its potential as a powerful and interpretable tool for large-scale geological hazard assessment, providing a solid technical foundation for refined disaster prevention and mitigation strategies. Full article
(This article belongs to the Section Earth Sciences)
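The window-based self-attention this framework relies on restricts attention to small non-overlapping windows of the feature map. The sketch below is a minimal PyTorch illustration of that generic mechanism (window partitioning followed by per-window multi-head attention); it is not the authors' implementation, and the window size, channel count, and head count are arbitrary.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into (num_windows*B, ws*ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

class WindowAttention(nn.Module):
    """Multi-head self-attention applied independently inside each local window."""
    def __init__(self, dim, num_heads, window_size):
        super().__init__()
        self.ws = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                               # x: (B, H, W, C)
        windows = window_partition(x, self.ws)
        out, _ = self.attn(windows, windows, windows)   # attention within each window
        return out                                      # (num_windows*B, ws*ws, C)

feat = torch.randn(2, 32, 32, 96)                       # toy encoder feature map
print(WindowAttention(96, 3, window_size=8)(feat).shape)  # torch.Size([32, 64, 96])
```
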
25 pages, 3835 KB  
Article
BuildFunc-MoE: An Adaptive Multimodal Mixture-of-Experts Network for Fine-Grained Building Function Identification
by Ru Wang, Zhan Zhang, Daoyu Shu, Nan Jia, Fang Wan, Wenkai Hu, Xiaoling Chen and Zhenghong Peng
Remote Sens. 2026, 18(1), 90; https://doi.org/10.3390/rs18010090 (registering DOI) - 26 Dec 2025
Abstract
Fine-grained building function identification (BFI) is essential for sustainable urban development, land-use analysis, and data-driven spatial planning. Recent progress in fully supervised semantic segmentation has advanced multimodal BFI; however, most approaches still rely on static fusion and lack explicit multi-scale alignment. As a result, they struggle to adaptively integrate heterogeneous inputs and suppress cross-modal interference, which constrains representation learning. To overcome these limitations, we propose BuildFunc-MoE, an adaptive multimodal Mixture-of-Experts (MoE) network built on an effective end-to-end Swin-UNet backbone. The model treats high-resolution remote sensing imagery as the primary input and integrates auxiliary geospatial data such as nighttime light imagery, DEM, and point-of-interest information. An Adaptive Multimodal Fusion Gate (AMMFG) first refines auxiliary features into informative fused representations, which are then combined with the primary modality and passed through multi-scale Swin-MoE blocks that extend standard Swin Transformer blocks with MoE routing. This enables fine-grained, dynamic fusion and alignment between primary and auxiliary modalities across feature scales. BuildFunc-MoE further introduces a Shared Task-Expert Module (STEM), which extends the MoE framework to share experts between the main BFI task and auxiliary tasks (road extraction, green space segmentation, and water body detection), enabling parameter-level transfer. This design enables complementary feature learning, where structural and contextual information jointly enhance the discrimination of building functions, thereby improving identification accuracy while maintaining model compactness. Experiments on the proposed Wuhan-BF multimodal dataset show that, under identical supervision, BuildFunc-MoE outperforms the strongest multimodal baseline by over 2% on average across metrics. Both PyTorch and LuoJiaNET implementations validate its effectiveness, while the latter achieves higher accuracy and faster inference through optimized computation. Overall, BuildFunc-MoE offers a scalable solution for fine-grained BFI with strong potential for urban planning and sustainable governance. Full article
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
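The two generic building blocks named here, a learned fusion gate over auxiliary modalities and Mixture-of-Experts routing, can be illustrated with the toy PyTorch sketch below. The module names, top-1 routing choice, and dimensions are assumptions for illustration; the paper's AMMFG and Swin-MoE blocks are not described in enough detail in the abstract to reproduce.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionGate(nn.Module):
    """Toy gated fusion: learn per-channel weights for an auxiliary modality."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, primary, auxiliary):               # both (B, N, C)
        g = self.gate(torch.cat([primary, auxiliary], dim=-1))
        return primary + g * auxiliary                   # gate suppresses noisy aux features

class Top1MoE(nn.Module):
    """Token-wise top-1 Mixture-of-Experts routing over small MLP experts."""
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)])

    def forward(self, x):                                # x: (B, N, C)
        probs = F.softmax(self.router(x), dim=-1)        # routing probabilities
        idx = probs.argmax(dim=-1)                       # chosen expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask]) * probs[..., e][mask].unsqueeze(-1)
        return out

tokens = torch.randn(2, 196, 128)
fused = FusionGate(128)(tokens, torch.randn(2, 196, 128))
print(Top1MoE(128)(fused).shape)                          # torch.Size([2, 196, 128])
```
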
20 pages, 7656 KB  
Article
Remote Sensing Extraction and Spatiotemporal Change Analysis of Time-Series Terraces in Complex Terrain on the Loess Plateau Based on a New Swin Transformer Dual-Branch Deformable Boundary Network (STDBNet)
by Guobin Kan, Jianhua Xiao, Benli Liu, Bao Wang, Chenchen He and Hong Yang
Remote Sens. 2026, 18(1), 85; https://doi.org/10.3390/rs18010085 - 26 Dec 2025
Viewed by 51
Abstract
Terrace construction is a critical engineering practice for soil and water conservation as well as sustainable agricultural development on the Loess Plateau (LP), China, where high-precision dynamic monitoring is essential for informed regional ecological governance. To address the challenges of inadequate extraction accuracy and poor model generalization in time-series terrace mapping amid complex terrain and spectral confounding, this study proposes a novel Swin Transformer-based Terrace Dual-Branch Deformable Boundary Network (STDBNet) that seamlessly integrates multi-source remote sensing (RS) data with deep learning (DL). The STDBNet model integrates the Swin Transformer architecture with a dual-branch attention mechanism and introduces a boundary-assisted supervision strategy, thereby significantly enhancing terrace boundary recognition, multi-source feature fusion, and model generalization capability. Leveraging Sentinel-2 multi-temporal optical imagery and terrain-derived features, we constructed the first 10-m-resolution spatiotemporal dataset of terrace distribution across the LP, encompassing nine annual periods from 2017 to 2025. Performance evaluations demonstrate that STDBNet achieved an overall accuracy (OA) of 95.26% and a mean intersection over union (MIoU) of 86.84%, outperforming mainstream semantic segmentation models including U-Net and DeepLabV3+ by a significant margin. Further analysis reveals the spatiotemporal evolution dynamics of terraces over the nine-year period and their distribution patterns across gradients of key terrain factors. This study not only provides robust data support for research on terraced ecosystem processes and assessments of soil and water conservation efficacy on the LP but also lays a scientific foundation for informing the formulation of regional ecological restoration and land management policies. Full article
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)
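Boundary-assisted supervision, as named in this abstract, generally adds an auxiliary loss on a boundary map derived from the segmentation label. A minimal PyTorch sketch of that generic idea is shown below; the morphological-gradient boundary target and the loss weight are illustrative assumptions, not the STDBNet formulation.

```python
import torch
import torch.nn.functional as F

def boundary_target(mask):
    """Approximate boundary map of a binary mask via a max-pool morphological gradient."""
    m = mask.float()
    dil = F.max_pool2d(m, 3, stride=1, padding=1)        # dilation
    ero = -F.max_pool2d(-m, 3, stride=1, padding=1)      # erosion
    return (dil - ero).clamp(0, 1)

def seg_with_boundary_loss(seg_logits, bnd_logits, mask, w=0.5):
    """Main segmentation BCE plus an auxiliary boundary BCE supervision term."""
    seg = F.binary_cross_entropy_with_logits(seg_logits, mask.float())
    bnd = F.binary_cross_entropy_with_logits(bnd_logits, boundary_target(mask))
    return seg + w * bnd

logits = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(seg_with_boundary_loss(logits, torch.randn_like(logits), mask))
```
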
16 pages, 4888 KB  
Article
PGSUNet: A Phenology-Guided Deep Network for Tea Plantation Extraction from High-Resolution Remote Sensing Imagery
by Xiaoyong Zhang, Bochen Jiang and Hongrui Sun
Appl. Sci. 2025, 15(24), 13062; https://doi.org/10.3390/app152413062 - 11 Dec 2025
Viewed by 255
Abstract
Tea, recognized as one of the world's three principal beverages, plays a significant role both economically and culturally. The accurate, large-scale mapping of tea plantations is crucial for quality control, industry regulation, and ecological assessments. Mapping them from high-resolution imagery is challenging, however, because tea plantations are spectrally similar to other land covers and have intricate boundaries. We introduce the Phenology-Guided SwinUnet (PGSUNet), a semantic segmentation network that combines Swin Transformer encoding with a parallel phenology context branch. A fusion module within this network generates spatial attention informed by phenological priors, while a dual-head decoder enhances precision through explicit edge supervision. Using Hangzhou City as the case study, PGSUNet was compared with seven mainstream models, including DeepLabV3+ and SegFormer. It achieved an F1-score of 0.84, outperforming the second-best model by 0.03, and obtained an mIoU of 84.53%, about 2% higher than the next-best result. This study demonstrates that integrating phenological priors with edge supervision significantly improves the fine-scale extraction of agricultural land covers from complex remote sensing imagery. Full article
(This article belongs to the Section Agricultural Science and Technology)
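The fusion idea described here, spatial attention driven by a phenological prior, can be sketched generically as an auxiliary branch that produces a per-pixel weight map for the image features. The PyTorch toy module below illustrates only that general pattern; the class name and channel sizes are hypothetical and it is not PGSUNet's actual fusion module.

```python
import torch
import torch.nn as nn

class PriorGuidedAttention(nn.Module):
    """Toy fusion: an auxiliary prior branch produces a spatial attention map
    that reweights the main image features (a sketch of the general idea only)."""
    def __init__(self, img_ch, prior_ch):
        super().__init__()
        self.to_attn = nn.Sequential(
            nn.Conv2d(prior_ch, img_ch, kernel_size=3, padding=1),
            nn.Conv2d(img_ch, 1, kernel_size=1),
            nn.Sigmoid())                                 # per-pixel weight in [0, 1]

    def forward(self, img_feat, prior_feat):              # (B,Ci,H,W), (B,Cp,H,W)
        attn = self.to_attn(prior_feat)
        return img_feat + img_feat * attn                 # residual reweighting

x = torch.randn(1, 64, 128, 128)                           # image features
p = torch.randn(1, 8, 128, 128)                            # phenology features
print(PriorGuidedAttention(64, 8)(x, p).shape)              # torch.Size([1, 64, 128, 128])
```
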
22 pages, 2302 KB  
Article
MAF-GAN: A Multi-Attention Fusion Generative Adversarial Network for Remote Sensing Image Super-Resolution
by Zhaohe Wang, Hai Tan, Zhongwu Wang, Jinlong Ci and Haoran Zhai
Remote Sens. 2025, 17(24), 3959; https://doi.org/10.3390/rs17243959 - 7 Dec 2025
Viewed by 319
Abstract
Existing Generative Adversarial Networks (GANs) frequently yield remote sensing images with blurred fine details, distorted textures, and compromised spatial structures when applied to super-resolution (SR) tasks. To address these limitations, this study proposes a Multi-Attention Fusion Generative Adversarial Network (MAF-GAN). The generator of MAF-GAN is built on a U-Net backbone that incorporates Oriented Convolutions (OrientedConv) to enhance the extraction of directional features and textures, while a novel co-calibration mechanism incorporating channel, spatial, gating, and spectral attention is embedded in the encoding path and skip connections, supplemented by an adaptive weighting strategy for effective multi-scale feature fusion. A composite loss function is further designed that integrates adversarial loss, perceptual loss, hybrid pixel loss, total variation loss, and feature consistency loss to optimize model performance. Extensive experiments on the GF7-SR4×-MSD dataset demonstrate that MAF-GAN achieves state-of-the-art performance, delivering a Peak Signal-to-Noise Ratio (PSNR) of 27.14 dB, a Structural Similarity Index (SSIM) of 0.7206, a Learned Perceptual Image Patch Similarity (LPIPS) of 0.1017, and a Spectral Angle Mapper (SAM) of 1.0871. It significantly outperforms mainstream models including SRGAN, ESRGAN, SwinIR, HAT, and ESatSR, and exceeds traditional interpolation methods (e.g., Bicubic) by a substantial margin, while maintaining an excellent balance between reconstruction quality and inference efficiency. Ablation studies validate the individual contribution of each proposed component to the model's overall performance. The method generates super-resolution remote sensing images with more natural visual perception, clearer spatial structures, and superior spectral fidelity, offering a reliable technical solution for high-precision remote sensing applications. Full article
(This article belongs to the Section Environmental Remote Sensing)
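A composite super-resolution loss of the kind described, adversarial plus pixel plus total-variation terms, can be sketched as below; the perceptual and feature-consistency terms are omitted because they require a pretrained backbone, and the weights are arbitrary rather than the values used in MAF-GAN.

```python
import torch
import torch.nn.functional as F

def total_variation(img):
    """Anisotropic total-variation regularizer encouraging smooth reconstructions."""
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw

def generator_loss(fake_score, sr, hr, w_adv=1e-3, w_pix=1.0, w_tv=1e-4):
    """Composite SR generator loss: adversarial + pixel (L1) + TV terms.
    Perceptual / feature-consistency terms would need a pretrained backbone."""
    adv = F.binary_cross_entropy_with_logits(fake_score, torch.ones_like(fake_score))
    pix = F.l1_loss(sr, hr)
    return w_adv * adv + w_pix * pix + w_tv * total_variation(sr)

sr = torch.rand(2, 3, 96, 96)          # super-resolved output
hr = torch.rand(2, 3, 96, 96)          # ground-truth high-resolution patch
print(generator_loss(torch.randn(2, 1), sr, hr))
```
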
24 pages, 3036 KB  
Article
MPG-SwinUMamba: High-Precision Segmentation and Automated Measurement of Eye Muscle Area in Live Sheep Based on Deep Learning
by Zhou Zhang, Yaojing Yue, Fuzhong Li, Leifeng Guo and Svitlana Pavlova
Animals 2025, 15(24), 3509; https://doi.org/10.3390/ani15243509 - 5 Dec 2025
Viewed by 269
Abstract
Accurate eye muscle area (EMA) assessment in live sheep is crucial for genetic breeding and production management within the meat sheep industry. However, the segmentation accuracy and reliability of existing automated methods are limited by challenges inherent to B-mode ultrasound images, such as low contrast and noise interference. To address these challenges, we present MPG-SwinUMamba, a novel deep learning-based segmentation network. This model uniquely combines the state-space model with a U-Net architecture. It also integrates an edge-enhancement multi-scale attention module (MSEE) and a pyramid attention refinement module (PARM) to improve the detection of indistinct boundaries and better capture global context. The global context aggregation decoder (GCAD) is employed to precisely reconstruct the segmentation mask, enabling automated measurement of the EMA. Compared to 12 other leading segmentation models, MPG-SwinUMamba achieved superior performance, with an intersection-over-union of 91.62% and a Dice similarity coefficient of 95.54%. Additionally, automated measurements show excellent agreement with expert manual assessments (correlation coefficient r = 0.9637), with a mean absolute percentage error of only 4.05%. This method offers a non-invasive, efficient, and objective evaluation of carcass performance in live sheep, with the potential to reduce measurement costs and enhance breeding efficiency. Full article
(This article belongs to the Section Animal System and Management)
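Once the eye muscle is segmented, the area measurement itself reduces to counting mask pixels and multiplying by the physical pixel size, and agreement with manual readings can be summarized by the mean absolute percentage error. The NumPy sketch below shows that bookkeeping; the pixel spacing and example values are assumptions, not the study's calibration.

```python
import numpy as np

def area_from_mask(mask, pixel_spacing_mm=(0.1, 0.1)):
    """Area of a binary segmentation mask in cm^2, given pixel spacing in mm."""
    px_area_cm2 = (pixel_spacing_mm[0] / 10.0) * (pixel_spacing_mm[1] / 10.0)
    return mask.astype(bool).sum() * px_area_cm2

def mape(pred, ref):
    """Mean absolute percentage error between automated and manual measurements."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.mean(np.abs(pred - ref) / ref) * 100.0)

mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:300, 200:400] = 1                     # toy stand-in for a segmented muscle
print(area_from_mask(mask), "cm^2")
print(mape([12.1, 10.4], [12.5, 10.0]), "%")
```
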
21 pages, 11281 KB  
Article
Developing Interpretable Deep Learning Model for Subtropical Forest Type Classification Using Beijing-2, Sentinel-1, and Time-Series NDVI Data of Sentinel-2
by Shudan Chen, Xuefeng Wang, Mengmeng Shi, Guofeng Tao, Shijiao Qiao and Zhulin Chen
Forests 2025, 16(11), 1709; https://doi.org/10.3390/f16111709 - 10 Nov 2025
Viewed by 483
Abstract
Accurate forest type classification in subtropical regions is essential for ecological monitoring and sustainable management. Multimodal remote sensing data provide rich information support, yet the synergy between network architectures and fusion strategies in deep learning models remains insufficiently explored. This study established a multimodal deep learning framework with integrated interpretability analysis by combining high-resolution Beijing-2 RGB imagery, Sentinel-1 data, and time-series Sentinel-2 NDVI data. Two representative architectures (U-Net and Swin-UNet) were systematically combined with three fusion strategies, including feature concatenation (Concat), gated multimodal fusion (GMU), and Squeeze-and-Excitation (SE). To quantify feature contributions and decision patterns, three complementary interpretability methods were also employed: Shapley Additive Explanations (SHAP), Grad-CAM++, and occlusion sensitivity. Results show that Swin-UNet consistently outperformed U-Net. The SwinUNet-SE model achieved the highest overall accuracy (OA) of 82.76%, exceeding the best U-Net model by 3.34%, with the largest improvement of 5.8% for mixed forest classification. The effectiveness of fusion strategies depended strongly on architecture. In U-Net, SE and Concat improved OA by 0.91% and 0.23% compared with the RGB baseline, while GMU slightly declined. In Swin-UNet, all strategies achieved higher gains between 1.03% and 2.17%, and SE effectively reduced NDVI sensitivity. SHAP analysis showed that RGB features contributed most (values > 0.0015), NDVI features from winter and spring ranked among the top 50%, and Sentinel-1 features contributed less. These findings reveal how architecture and fusion design interact to enhance multimodal forest classification. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
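Of the three interpretability tools listed, occlusion sensitivity is the simplest to illustrate: slide an occluding patch over the input and record how much the class score drops. The sketch below shows the generic procedure with an arbitrary toy classifier; patch size, stride, and fill value are illustrative choices, not the authors' setup.

```python
import torch

@torch.no_grad()
def occlusion_sensitivity(model, image, target_class, patch=16, stride=16, fill=0.0):
    """Slide an occluding patch over the image and record the drop in class score."""
    model.eval()
    base = model(image)[0, target_class].item()
    _, _, H, W = image.shape
    heat = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.clone()
            occluded[..., y:y + patch, x:x + patch] = fill
            heat[i, j] = base - model(occluded)[0, target_class].item()
    return heat                                   # large values = important regions

# usage with any classifier mapping (1, C, H, W) -> (1, num_classes)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 5))
print(occlusion_sensitivity(model, torch.rand(1, 3, 64, 64), target_class=2).shape)
```
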
16 pages, 2865 KB  
Article
Deep Learning Model for Volume Measurement of the Remnant Pancreas After Pancreaticoduodenectomy and Distal Pancreatectomy
by Young Jae Kim, Juhui Lee, Yeon-Ho Park, Jaehun Yang, Doojin Kim, Kwang Gi Kim and Doo-Ho Lee
Diagnostics 2025, 15(22), 2834; https://doi.org/10.3390/diagnostics15222834 - 8 Nov 2025
Viewed by 444
Abstract
Background/Objectives: Accurate volumetry of the remnant pancreas after pancreatectomy is crucial for assessing postoperative endocrine and exocrine function but remains challenging due to anatomical variability and complex postoperative morphology. This study aimed to develop and validate a deep learning (DL)-based model for automatic segmentation and volumetry of the remnant pancreas using abdominal CT images. Methods: A total of 1067 CT scans from 341 patients who underwent pancreaticoduodenectomy and 512 scans from 184 patients who underwent distal pancreatectomy were analyzed. Ground truth masks were manually delineated and verified through multi-expert consensus. Six 3D segmentation models were trained and compared, including four convolution-based U-Net variants (basic, dense, residual, and residual dense) and two transformer-based models (Trans U-Net and Swin U-Net). Model performance was evaluated using five-fold cross-validation with sensitivity, specificity, precision, accuracy, and Dice similarity coefficient. Results: The Residual Dense U-Net achieved the best performance among convolutional models, with dice similarity coefficient (DSC) values of 0.7655 ± 0.0052 for pancreaticoduodenectomy and 0.8086 ± 0.0091 for distal pancreatectomy. Transformer-based models showed slightly higher DSCs (Swin U-Net: 0.7787 ± 0.0062 and 0.8132 ± 0.0101), with statistically significant but numerically small improvements (p < 0.01). Conclusions: The proposed DL-based approach enables accurate and reproducible postoperative pancreas segmentation and volumetry. Automated volumetric assessment may support objective evaluation of remnant pancreatic function and provide a foundation for predictive modeling in long-term clinical management after pancreatectomy. Full article
(This article belongs to the Special Issue Abdominal Diseases: Diagnosis, Treatment and Management)
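Volumetry from a 3D segmentation is essentially voxel counting scaled by the voxel volume, and segmentation quality is summarized by the Dice similarity coefficient. A minimal NumPy sketch of both computations follows; the voxel spacing is an assumed example, not the study's CT protocol.

```python
import numpy as np

def volume_ml(mask, spacing_mm=(1.0, 0.7, 0.7)):
    """Volume of a 3D binary mask in millilitres (voxel count x voxel volume)."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return mask.astype(bool).sum() * voxel_mm3 / 1000.0     # 1 mL = 1000 mm^3

def dice(pred, ref, eps=1e-7):
    """Dice similarity coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    return 2.0 * (pred & ref).sum() / (pred.sum() + ref.sum() + eps)

pred = np.zeros((64, 128, 128), bool); pred[20:40, 40:80, 40:80] = True
ref = np.zeros_like(pred);             ref[22:42, 42:82, 42:82] = True
print(round(volume_ml(pred), 1), "mL   Dice:", round(dice(pred, ref), 3))
```
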
17 pages, 2642 KB  
Article
RE-XswinUnet: Rotary Positional Encoding and Inter-Slice Contextual Connections for Multi-Organ Segmentation
by Hang Yang, Chuanghua Yang, Dan Yang, Xiaojing Hang and Wu Liu
Big Data Cogn. Comput. 2025, 9(11), 274; https://doi.org/10.3390/bdcc9110274 - 31 Oct 2025
Viewed by 645
Abstract
Medical image segmentation has been a central research focus in deep learning, but convolution-based methods have limitations in modeling long-range dependencies in images. To address this issue, hybrid CNN-Transformer architectures have been explored, with SwinUNet being a classic approach. However, SwinUNet still faces challenges such as insufficient modeling of relative position information, limited feature fusion capabilities in skip connections, and the loss of translational invariance caused by Patch Merging. To overcome these limitations, the RE-XswinUnet architecture is presented as a novel solution for medical image segmentation. In our design, relative position biases are replaced with rotary position embedding to enhance the model's ability to extract detailed information. During the decoding stage, XskipNet is designed to improve cross-scale feature fusion and learning capabilities. Additionally, an SCAR Block downsampling module is incorporated to preserve translational invariance more effectively. The experimental results demonstrate that RE-XswinUnet achieves improvements of 2.65% and 0.95% in Dice coefficients on the Synapse multi-organ and ACDC datasets, respectively, validating its superiority in medical image segmentation tasks. Full article
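Rotary position embedding, which this work uses in place of relative position biases, rotates each pair of feature channels by an angle proportional to the token position before the attention dot product. The sketch below is a standard 1D RoPE illustration in PyTorch, not the RE-XswinUnet implementation; dimensions are arbitrary.

```python
import torch

def rope(x, base=10000.0):
    """Rotary position embedding: rotate each (even, odd) feature pair of a
    (B, N, D) sequence by an angle that grows with token position."""
    B, N, D = x.shape
    pos = torch.arange(N, dtype=torch.float32).unsqueeze(-1)           # (N, 1)
    freqs = base ** (-torch.arange(0, D, 2, dtype=torch.float32) / D)  # (D/2,)
    angles = pos * freqs                                               # (N, D/2)
    cos, sin = angles.cos(), angles.sin()
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rot_even = x_even * cos - x_odd * sin
    rot_odd = x_even * sin + x_odd * cos
    out = torch.empty_like(x)
    out[..., 0::2], out[..., 1::2] = rot_even, rot_odd
    return out

q = torch.randn(1, 49, 64)                  # queries for one 7x7 window
k = torch.randn(1, 49, 64)
attn = (rope(q) @ rope(k).transpose(-2, -1)) / 64 ** 0.5   # position-aware scores
print(attn.shape)                           # torch.Size([1, 49, 49])
```
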
17 pages, 3889 KB  
Article
STGAN: A Fusion of Infrared and Visible Images
by Liuhui Gong, Yueping Han and Ruihong Li
Electronics 2025, 14(21), 4219; https://doi.org/10.3390/electronics14214219 - 29 Oct 2025
Viewed by 554
Abstract
The fusion of infrared and visible images provides critical value in computer vision by integrating their complementary information, especially in industrial detection, where it provides a more reliable data basis for subsequent defect recognition. This paper presents STGAN, a novel Generative Adversarial Network framework based on a Swin Transformer for high-quality infrared and visible image fusion. Firstly, the generator employs a Swin Transformer backbone within a U-Net architecture for feature extraction, and an improved W-MSA is introduced into the bottleneck layer to enhance local attention and improve the expression of cross-modal features. Secondly, a Markovian discriminator is used to distinguish generated results from real images. The core GAN framework then ensures that both infrared thermal radiation and visible-light texture details are retained in the generated image, improving the clarity and contrast of the fused result. Finally, in simulation verification, STGAN ranked in the top two on six of seven indicators, achieving optimal or suboptimal values on key metrics such as PSNR, VIF, MI, and EN. The experimental results on the general dataset show that this method is superior to advanced methods in terms of subjective vision and objective indicators, and it can effectively enhance fine structures and thermal anomaly information in the image, giving it great potential for industrial surface defect detection. Full article
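A Markovian discriminator of the kind mentioned scores local patches rather than the whole image, which is usually realized as a small fully convolutional network whose output is a grid of real/fake logits. The sketch below shows that generic design; layer widths and depth are illustrative, not STGAN's discriminator.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Markovian discriminator: a small conv stack whose output is a grid of
    per-patch real/fake scores rather than a single image-level score."""
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1))   # patch score map

    def forward(self, x):
        return self.net(x)

fused = torch.randn(2, 1, 128, 128)          # e.g. a fused infrared/visible image
print(PatchDiscriminator()(fused).shape)      # torch.Size([2, 1, 31, 31])
```
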
27 pages, 3948 KB  
Article
Fully Automated Segmentation of Cervical Spinal Cord in Sagittal MR Images Using Swin-Unet Architectures
by Rukiye Polattimur, Emre Dandıl, Mehmet Süleyman Yıldırım and Utku Şenol
J. Clin. Med. 2025, 14(19), 6994; https://doi.org/10.3390/jcm14196994 - 2 Oct 2025
Cited by 1 | Viewed by 1162
Abstract
Background/Objectives: The spinal cord is a critical component of the central nervous system that transmits neural signals between the brain and the body’s peripheral regions through its nerve roots. Despite being partially protected by the vertebral column, the spinal cord remains highly vulnerable to trauma, tumors, infections, and degenerative or inflammatory disorders. These conditions can disrupt neural conduction, resulting in severe functional impairments, such as paralysis, motor deficits, and sensory loss. Therefore, accurate and comprehensive spinal cord segmentation is essential for characterizing its structural features and evaluating neural integrity. Methods: In this study, we propose a fully automated method for segmentation of the cervical spinal cord in sagittal magnetic resonance (MR) images. This method facilitates rapid clinical evaluation and supports early diagnosis. Our approach uses a Swin-Unet architecture, which integrates vision transformer blocks into the U-Net framework. This enables the model to capture both local anatomical details and global contextual information. This design improves the delineation of the thin, curved, low-contrast cervical cord, resulting in more precise and robust segmentation. Results: In experimental studies, the proposed Swin-Unet model (SWU1), which uses transformer blocks in the encoder layer, achieved Dice Similarity Coefficient (DSC) and Hausdorff Distance 95 (HD95) scores of 0.9526 and 1.0707 mm, respectively, for cervical spinal cord segmentation. These results confirm that the model can consistently deliver precise, pixel-level delineations that are structurally accurate, which supports its reliability for clinical assessment. Conclusions: The attention-enhanced Swin-Unet architecture demonstrated high accuracy in segmenting thin and complex anatomical structures, such as the cervical spinal cord. Its ability to generalize with limited data highlights its potential for integration into clinical workflows to support diagnosis, monitoring, and treatment planning. Full article
(This article belongs to the Special Issue Artificial Intelligence and Deep Learning in Medical Imaging)
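The two reported metrics, the Dice similarity coefficient and the 95th-percentile Hausdorff distance (HD95), can be computed from binary masks as sketched below with NumPy and SciPy distance transforms; the pixel spacing argument is an assumption and the toy masks are for demonstration only.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_erosion

def dice(pred, ref):
    """Dice similarity coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    return 2.0 * (pred & ref).sum() / (pred.sum() + ref.sum())

def hd95(pred, ref, spacing=(1.0, 1.0)):
    """95th-percentile symmetric surface distance (HD95) in physical units."""
    def surface(m):
        return m & ~binary_erosion(m)
    ps, rs = surface(pred.astype(bool)), surface(ref.astype(bool))
    d_to_ref = distance_transform_edt(~rs, sampling=spacing)[ps]    # pred surface -> ref
    d_to_pred = distance_transform_edt(~ps, sampling=spacing)[rs]   # ref surface -> pred
    return np.percentile(np.concatenate([d_to_ref, d_to_pred]), 95)

pred = np.zeros((128, 128), bool); pred[30:90, 40:70] = True
ref = np.zeros_like(pred);          ref[32:92, 41:71] = True
print(round(dice(pred, ref), 4), round(hd95(pred, ref), 2))
```
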
23 pages, 18084 KB  
Article
WetSegNet: An Edge-Guided Multi-Scale Feature Interaction Network for Wetland Classification
by Li Chen, Shaogang Xia, Xun Liu, Zhan Xie, Haohong Chen, Feiyu Long, Yehong Wu and Meng Zhang
Remote Sens. 2025, 17(19), 3330; https://doi.org/10.3390/rs17193330 - 29 Sep 2025
Cited by 1 | Viewed by 655
Abstract
Wetlands play a crucial role in climate regulation, pollutant filtration, and biodiversity conservation. Accurate wetland classification through high-resolution remote sensing imagery is pivotal for the scientific management, ecological monitoring, and sustainable development of these ecosystems. However, the intricate spatial details in such imagery pose significant challenges to conventional interpretation techniques, necessitating precise boundary extraction and multi-scale contextual modeling. In this study, we propose WetSegNet, an edge-guided Multi-Scale Feature Interaction network for wetland classification, which integrates a convolutional neural network (CNN) and Swin Transformer within a U-Net architecture to synergize local texture perception and global semantic comprehension. Specifically, the framework incorporates two novel components: (1) a Multi-Scale Feature Interaction (MFI) module employing cross-attention mechanisms to mitigate semantic discrepancies between encoder–decoder features, and (2) a Multi-Feature Fusion (MFF) module that hierarchically enhances boundary delineation through edge-guided spatial attention (EGA). Experimental validation on GF-2 satellite imagery of Dongting Lake wetlands demonstrates that WetSegNet achieves state-of-the-art performance, with an overall accuracy (OA) of 90.81% and a Kappa coefficient of 0.88. Notably, it achieves classification accuracies exceeding 90% for water, sedge, and reed habitats, surpassing the baseline U-Net by 3.3% in overall accuracy and 0.05 in Kappa. The proposed model effectively addresses heterogeneous wetland classification challenges, validating its capability to reconcile local–global feature representation. Full article
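The cross-attention used in the MFI module to reconcile encoder and decoder features can be illustrated generically: decoder tokens act as queries over the encoder (skip) tokens. The PyTorch sketch below shows only that pattern; the module name and dimensions are hypothetical and it is not WetSegNet's MFI block.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Toy encoder-decoder fusion: decoder tokens query encoder (skip) tokens,
    so the skip features are re-weighted by what the decoder currently needs."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_tokens, enc_tokens):         # both (B, N, C)
        fused, _ = self.attn(query=dec_tokens, key=enc_tokens, value=enc_tokens)
        return self.norm(dec_tokens + fused)            # residual connection

dec = torch.randn(2, 256, 96)                           # decoder feature tokens
enc = torch.randn(2, 256, 96)                           # encoder skip tokens
print(CrossAttentionFusion(96)(dec, enc).shape)          # torch.Size([2, 256, 96])
```
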
27 pages, 5776 KB  
Article
R-SWTNet: A Context-Aware U-Net-Based Framework for Segmenting Rural Roads and Alleys in China with the SQVillages Dataset
by Jianing Wu, Junqi Yang, Xiaoyu Xu, Ying Zeng, Yan Cheng, Xiaodong Liu and Hong Zhang
Land 2025, 14(10), 1930; https://doi.org/10.3390/land14101930 - 23 Sep 2025
Viewed by 573
Abstract
Rural road networks are vital for rural development, yet narrow alleys and occluded segments remain underrepresented in digital maps due to irregular morphology, spectral ambiguity, and limited model generalization. Traditional segmentation models struggle to balance local detail preservation and long-range dependency modeling, prioritizing either local features or global context alone. Hypothesizing that integrating hierarchical local features and global context will mitigate these limitations, this study aims to accurately segment such rural roads by proposing R-SWTNet, a context-aware U-Net-based framework, and constructing the SQVillages dataset. R-SWTNet integrates ResNet34 for hierarchical feature extraction, Swin Transformer for long-range dependency modeling, ASPP for multi-scale context fusion, and CAM-Residual blocks for channel-wise attention. The SQVillages dataset, built from multi-source remote sensing imagery, includes 18 diverse villages with adaptive augmentation to mitigate class imbalance. Experimental results show R-SWTNet achieves a validation IoU of 54.88% and F1-score of 70.87%, outperforming U-Net and Swin-UNet, and with less overfitting than R-Net and D-LinkNet. Its lightweight variant supports edge deployment, enabling on-site road management. This work provides a data-driven tool for infrastructure planning under China’s Rural Revitalization Strategy, with potential scalability to global unstructured rural road scenes. Full article
(This article belongs to the Section Land Innovations – Data and Machine Learning)
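ASPP, one of the components R-SWTNet integrates, fuses context at several scales by running parallel dilated convolutions and concatenating their outputs. The sketch below is the standard textbook form of such a block; the dilation rates and channel counts are illustrative rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions capture
    context at several receptive-field sizes, then are fused by a 1x1 conv."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]            # same spatial size per branch
        return self.project(torch.cat(feats, dim=1))

x = torch.randn(1, 512, 32, 32)                           # bottleneck features
print(ASPP(512, 256)(x).shape)                             # torch.Size([1, 256, 32, 32])
```
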
26 pages, 3973 KB  
Article
ViT-DCNN: Vision Transformer with Deformable CNN Model for Lung and Colon Cancer Detection
by Aditya Pal, Hari Mohan Rai, Joon Yoo, Sang-Ryong Lee and Yooheon Park
Cancers 2025, 17(18), 3005; https://doi.org/10.3390/cancers17183005 - 15 Sep 2025
Cited by 1 | Viewed by 1068
Abstract
Background/Objectives: Lung and colon cancers remain among the most prevalent and fatal diseases worldwide, and their early detection is a serious challenge. The data used in this study was obtained from the Lung and Colon Cancer Histopathological Images Dataset, which comprises five different classes of image data, namely colon adenocarcinoma, colon normal, lung adenocarcinoma, lung normal, and lung squamous cell carcinoma, split into training (80%), validation (10%), and test (10%) subsets. In this study, we propose the ViT-DCNN (Vision Transformer with Deformable CNN) model, with the aim of improving cancer detection and classification using medical images. Methods: The combination of the ViT’s self-attention capabilities with deformable convolutions allows for improved feature extraction, while also enabling the model to learn both holistic contextual information as well as fine-grained localized spatial details. Results: On the test set, the model performed remarkably well, with an accuracy of 94.24%, an F1 score of 94.23%, recall of 94.24%, and precision of 94.37%, confirming its robustness in detecting cancerous tissues. Furthermore, our proposed ViT-DCNN model outperforms several state-of-the-art models, including ResNet-152, EfficientNet-B7, SwinTransformer, DenseNet-201, ConvNext, TransUNet, CNN-LSTM, MobileNetV3, and NASNet-A, across all major performance metrics. Conclusions: By using deep learning and advanced image analysis, this model enhances the efficiency of cancer detection, thus representing a valuable tool for radiologists and clinicians. This study demonstrates that the proposed ViT-DCNN model can reduce diagnostic inaccuracies and improve detection efficiency. Future work will focus on dataset enrichment and enhancing the model’s interpretability to evaluate its clinical applicability. This paper demonstrates the promise of artificial-intelligence-driven diagnostic models in transforming lung and colon cancer detection and improving patient diagnosis. Full article
(This article belongs to the Special Issue Image Analysis and Machine Learning in Cancers: 2nd Edition)
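The deformable convolutions combined with the ViT here let the kernel sample at learned offsets instead of a fixed grid. torchvision ships a DeformConv2d operator that takes a per-pixel offset map; the sketch below shows its generic usage with an offset-predicting side convolution, and is not the ViT-DCNN architecture itself.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """A 3x3 deformable convolution: a side conv predicts 2 offsets (x, y) for
    each of the 9 kernel sampling locations, letting the kernel adapt its shape."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))

x = torch.randn(1, 64, 56, 56)                  # e.g. patch-embedding features
print(DeformableBlock(64, 128)(x).shape)         # torch.Size([1, 128, 56, 56])
```
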
16 pages, 551 KB  
Article
The ASC Module: A GPU Memory-Efficient, Physiology-Aware Approach for Improving Segmentation Accuracy on Poorly Contrast-Enhanced CT Scans—A Preliminary Study
by Zuoyuan Zhao, Toru Higaki, Yanlei Gu and Bisser Raytchev
Bioengineering 2025, 12(9), 974; https://doi.org/10.3390/bioengineering12090974 - 12 Sep 2025
Viewed by 728
Abstract
At present, aging societies such as Japan face an underlying risk of inadequate medical resources. Using neural networks to assist doctors in locating the aorta on computed tomography (CT) before surgery is therefore a task with practical value. While UNet and some of its derived models are efficient for the semantic segmentation of optimally contrast-enhanced CT images, their segmentation accuracy on poorly contrast-enhanced or non-contrast CT images is too low to provide usable results. To solve this problem, we propose a data-processing module based on the physical-spatial structure and anatomical properties of the aorta, which we call the Automatic Spatial Contrast (ASC) Module. In an experiment using UNet, Attention UNet, TransUNet, and Swin-UNet as baselines, modified versions of these models using the proposed ASC Module showed improvements of up to 24.84% in Intersection-over-Union (IoU) and 28.13% in Dice Similarity Coefficient (DSC). Furthermore, the proposed approach entails only a small increase in GPU memory when compared with the baseline models. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Medical Imaging Processing)
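The abstract does not detail the ASC Module's internals, so the sketch below shows only loosely related background: conventional Hounsfield-unit windowing, optionally restricted to a spatial region of interest, which is the kind of contrast manipulation such a physiology-aware preprocessing step builds on. Window bounds and the ROI are purely illustrative assumptions, not the proposed module.

```python
import numpy as np

def hu_window(ct_slice, center=40.0, width=400.0):
    """Conventional Hounsfield-unit windowing: clip to [center - width/2,
    center + width/2] and rescale to [0, 1] to boost soft-tissue contrast."""
    lo, hi = center - width / 2.0, center + width / 2.0
    return np.clip((ct_slice - lo) / (hi - lo), 0.0, 1.0)

def windowed_roi(ct_slice, roi, **kw):
    """Apply a tighter window only inside a spatial region of interest."""
    out = hu_window(ct_slice)                   # default window elsewhere
    out[roi] = hu_window(ct_slice[roi], **kw)   # stronger window inside the ROI
    return out

ct = np.random.randint(-1000, 1000, size=(512, 512)).astype(np.float32)
roi = (slice(180, 330), slice(180, 330))        # rough central ROI (illustrative)
print(windowed_roi(ct, roi, center=100.0, width=200.0).shape)
```
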