MDPI - Publisher of Open Access Journals

28 pages, 3552 KB

Open Access Article

GCN-Embedding Swin–Unet for Forest Remote Sensing Image Semantic Segmentation

by Pingbo Liu, Gui Zhang and Jianzhong Li

Remote Sens. 2026, 18(2), 242; https://doi.org/10.3390/rs18020242 - 12 Jan 2026

Cited by 2 | Viewed by 919

Forest resources are among the most important ecosystems on Earth. The semantic segmentation and accurate positioning of ground objects in forest remote sensing (RS) imagery are crucial to the emergency treatment of forest natural disasters, especially forest fires. Currently, most existing methods for image semantic segmentation are built upon convolutional neural networks (CNNs). Nevertheless, these techniques face difficulties in directly accessing global contextual information and accurately detecting geometric transformations within the image’s target regions. This limitation stems from the inherent locality of convolution operations, which are restricted to processing data structured in Euclidean space and confined to square-shaped regions. Inspired by the graph convolution network (GCN), with its robust capabilities in processing irregular and complex targets, and by Swin Transformers, renowned for exceptional global context modeling, we present a hybrid semantic segmentation framework for forest RS imagery termed GSwin–Unet. This framework embeds the GCN model into the Swin–Unet architecture to address the low semantic segmentation accuracy of RS imagery in forest scenarios, which is caused by the complex texture features, diverse shapes, and unclear boundaries of land objects. GSwin–Unet features a parallel dual-encoder architecture of GCN and Swin Transformer. First, we integrate the Zero-DCE (Zero-Reference Deep Curve Estimation) algorithm into GSwin–Unet to enhance forest RS image feature representation. Second, a feature aggregation module (FAM) is proposed to bridge the dual encoders by fusing GCN-derived local aggregated features with Swin Transformer-extracted features. Our study demonstrates that, compared with the baseline models TransUnet, Swin–Unet, Unet, and DeepLab V3+, GSwin–Unet achieves improvements of 7.07%, 5.12%, 8.94%, and 2.69% in the mean Intersection over Union (MIoU) and 3.19%, 1.72%, 4.3%, and 3.69% in the average F1 score (Ave.F1), respectively, on the RGB forest RS dataset. On the NIRGB forest RS dataset, the improvements in MIoU are 5.75%, 3.38%, 6.79%, and 2.44%, and the improvements in Ave.F1 are 4.02%, 2.38%, 4.72%, and 1.67%, respectively. Meanwhile, GSwin–Unet shows excellent adaptability on the selected GID dataset with high forest coverage, where the MIoU and Ave.F1 reach 72.92% and 84.3%, respectively.
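For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how mean IoU and average F1 are commonly computed from a per-class confusion matrix. The function names and the toy matrix are illustrative assumptions; the abstract does not describe the authors' evaluation code or their averaging convention.

```python
def per_class_iou_f1(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    n = len(confusion)
    ious, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true class differs
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return ious, f1s


def mean(values):
    return sum(values) / len(values)


# Toy 2-class confusion matrix (hypothetical numbers, not taken from the paper).
toy = [[8, 2],
       [1, 9]]
ious, f1s = per_class_iou_f1(toy)
miou, ave_f1 = mean(ious), mean(f1s)
print(f"MIoU = {miou:.4f}, Ave.F1 = {ave_f1:.4f}")
```

Both metrics are averaged over classes here with equal weight, which is one common convention.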

24 pages, 1471 KB

Open Access Article

WDM-UNet: A Wavelet-Deformable Gated Fusion Network for Multi-Scale Retinal Vessel Segmentation

by Xinlong Li and Hang Zhou

Sensors 2025, 25(15), 4840; https://doi.org/10.3390/s25154840 - 6 Aug 2025

Cited by 3 | Viewed by 1988

Abstract

Retinal vessel segmentation in fundus images is critical for diagnosing microvascular and ophthalmologic diseases. However, the task remains challenging due to significant vessel width variation and low vessel-to-background contrast. To address these limitations, we propose WDM-UNet, a novel spatial-wavelet dual-domain fusion architecture that integrates spatial and wavelet-domain representations to simultaneously enhance the local detail and global context. The encoder combines a Deformable Convolution Encoder (DCE), which adaptively models complex vascular structures through dynamic receptive fields, and a Wavelet Convolution Encoder (WCE), which captures the semantic and structural contexts through low-frequency components and hierarchical wavelet convolution. These features are further refined by a Gated Fusion Transformer (GFT), which employs gated attention to enhance multi-scale feature integration. In the decoder, depthwise separable convolutions are used to reduce the computational overhead without compromising the representational capacity. To preserve fine structural details and facilitate contextual information flow across layers, the model incorporates skip connections with a hierarchical fusion strategy, enabling the effective integration of shallow and deep features. We evaluated WDM-UNet on three public datasets: DRIVE, STARE, and CHASE_DB1. The quantitative evaluations demonstrate that WDM-UNet consistently outperforms state-of-the-art methods, achieving 96.92% accuracy, 83.61% sensitivity, and an 82.87% F1-score on the DRIVE dataset, with superior performance across all the benchmark datasets in both segmentation accuracy and robustness, particularly in complex vascular scenarios.
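The claim that depthwise separable convolutions reduce computational overhead can be illustrated with a parameter count. The sketch below compares a standard convolution with its depthwise separable counterpart, ignoring bias terms. The layer sizes are hypothetical and are not taken from the WDM-UNet decoder.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
ratio = sep / std  # equals 1/c_out + 1/k**2
print(std, sep, round(ratio, 4))
```

For this hypothetical layer the separable form needs roughly 12% of the weights, and the ratio approaches 1/k² as the number of output channels grows.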

(This article belongs to the Section Sensing and Imaging)

22 pages, 4136 KB

Open Access Article

DepthCrackNet: A Deep Learning Model for Automatic Pavement Crack Detection

by Alireza Saberironaghi and Jing Ren

J. Imaging 2024, 10(5), 100; https://doi.org/10.3390/jimaging10050100 - 26 Apr 2024

Cited by 19 | Viewed by 6970

Abstract

Detecting cracks in the pavement is a vital component of ensuring road safety. Since manual identification of these cracks can be time-consuming, an automated method is needed to speed up this process. However, creating such a system is challenging due to factors including crack variability, variations in pavement materials, and the occurrence of miscellaneous objects and anomalies on the pavement. Motivated by the latest progress in deep learning applied to computer vision, we propose an effective U-Net-shaped model named DepthCrackNet. Our model employs the Double Convolution Encoder (DCE), composed of a sequence of convolution layers, for robust feature extraction while keeping parameters optimally efficient. We have incorporated the TriInput Multi-Head Spatial Attention (TMSA) module into our model; in this module, each head operates independently, capturing various spatial relationships and boosting the extraction of rich contextual information. Furthermore, DepthCrackNet employs the Spatial Depth Enhancer (SDE) module, specifically designed to augment the feature extraction capabilities of our segmentation model. The performance of DepthCrackNet was evaluated on two public crack datasets: Crack500 and DeepCrack. In our experimental studies, the network achieved mIoU scores of 77.0% and 83.9% on the Crack500 and DeepCrack datasets, respectively.
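To illustrate what "each head operates independently" means in a multi-head attention layer, here is a minimal sketch of generic multi-head self-attention over flattened spatial positions. It is not the TMSA module: learned query, key, and value projections are omitted, and all function names are assumptions made for this example.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(queries, keys, values):
    """Scaled dot-product attention for one head; inputs are lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def multi_head_attention(tokens, num_heads):
    """Split channels into num_heads slices, attend within each slice, concatenate."""
    dim = len(tokens[0])
    assert dim % num_heads == 0, "channel count must divide evenly across heads"
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        part = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        heads.append(attention_head(part, part, part))  # no learned projections here
    return [sum((heads[h][i] for h in range(num_heads)), [])
            for i in range(len(tokens))]


# Three spatial positions with four channels each (hypothetical values).
tokens = [[1.0, 0.0, 0.0, 1.0],
          [0.0, 1.0, 1.0, 0.0],
          [1.0, 1.0, 0.0, 0.0]]
out = multi_head_attention(tokens, num_heads=2)
```

Each head sees only its own channel slice, so the attention weights of one head never depend on another head's channels.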

(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)

26 pages, 54207 KB

Open Access Article

Multilocation and Multiscale Learning Framework with Skip Connection for Fault Diagnosis of Bearing under Complex Working Conditions

by Hongwei Ban, Dazhi Wang, Sihan Wang and Ziming Liu

Sensors 2021, 21(9), 3226; https://doi.org/10.3390/s21093226 - 6 May 2021

Cited by 4 | Viewed by 3109

Abstract

Considering various fault states under severe working conditions, comprehensive feature extraction from the raw vibration signal is still a challenge for the diagnosis of rolling bearings. To deal with strong coupling and high nonlinearity of the vibration signal, this article proposes a novel multilocation and multikernel scale learning framework based on deep convolution encoder (DCE) and bidirectional long short-term memory network (BiLSTM). The procedure of the proposed method using a cascade structure is developed in three stages. In the first stage, each parallel branch of the multifeature learning combines the skip connection and the DCE, and uses kernels of different sizes. The multifeature learning network can automatically extract and fuse global and local features from different network depths and time scales of the raw vibration signal. In the second stage, the BiLSTM as the feature protection network is designed to employ the internal calculating data of the forward propagation and backward propagation at the same network propagation node. The feature protection network is used for further mining sensitive and complementary features. In the third stage of bearing diagnosis, the classifier identifies the fault types. Consequently, the proposed network scheme generalizes well. The performance of the proposed method is verified on two kinds of bearing datasets. The diagnostic results demonstrate that the proposed method can diagnose multiple fault types more accurately. Also, the method performs better in load and speed adaptation compared with other intelligent fault classification methods.
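The first stage described in this abstract (parallel branches with different kernel sizes, each combined with a skip connection) can be sketched on a 1-D signal as follows. This is only an illustration under stated assumptions: moving-average kernels stand in for learned weights, the kernel sizes are hypothetical, and the BiLSTM and classifier stages are not shown.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]


def branch(signal, kernel_size):
    """One branch: filter at a single time scale, then add the input back (skip)."""
    kernel = [1.0 / kernel_size] * kernel_size  # stand-in for learned weights
    filtered = conv1d_same(signal, kernel)
    return [f + s for f, s in zip(filtered, signal)]


def multiscale_features(signal, kernel_sizes=(3, 5, 7)):
    """Run every branch on the same raw signal; one feature sequence per scale."""
    return [branch(signal, k) for k in kernel_sizes]


# Constant toy signal (hypothetical, not bearing data).
signal = [1.0] * 9
feats = multiscale_features(signal)
```

Small kernels respond to local structure and large kernels to slower trends, while the skip path keeps the raw signal available to later stages.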

(This article belongs to the Collection Artificial Intelligence for Data-Driven Fault Detection and Diagnosis)

Search Results (4)
