Search Results (79)

Search Parameters:
Keywords = a multi-scale feature gate fusion network

39 pages, 13725 KB  
Article
SRTSOD-YOLO: Stronger Real-Time Small Object Detection Algorithm Based on Improved YOLO11 for UAV Imageries
by Zechao Xu, Huaici Zhao, Pengfei Liu, Liyong Wang, Guilong Zhang and Yuan Chai
Remote Sens. 2025, 17(20), 3414; https://doi.org/10.3390/rs17203414 - 12 Oct 2025
Viewed by 761
Abstract
To address the challenges of small target detection in UAV aerial images—such as difficulty in feature extraction, complex background interference, high miss rates, and stringent real-time requirements—this paper proposes an innovative model series named SRTSOD-YOLO, based on YOLO11. The backbone network incorporates a Multi-scale Feature Complementary Aggregation Module (MFCAM), designed to mitigate the loss of small target information as network depth increases. By integrating channel and spatial attention mechanisms with multi-scale convolutional feature extraction, MFCAM effectively locates small objects in the image. Furthermore, we introduce a novel neck architecture termed Gated Activation Convolutional Fusion Pyramid Network (GAC-FPN). This module enhances multi-scale feature fusion by emphasizing salient features while suppressing irrelevant background information. GAC-FPN employs three key strategies: adding a detection head with a small receptive field while removing the original largest one, leveraging large-scale features more effectively, and incorporating gated activation convolutional modules. To tackle the issue of positive-negative sample imbalance, we replace the conventional binary cross-entropy loss with an adaptive threshold focal loss in the detection head, accelerating network convergence. Additionally, to accommodate diverse application scenarios, we develop multiple versions of SRTSOD-YOLO by adjusting the width and depth of the network modules: a nano version (SRTSOD-YOLO-n), small (SRTSOD-YOLO-s), medium (SRTSOD-YOLO-m), and large (SRTSOD-YOLO-l). Experimental results on the VisDrone2019 and UAVDT datasets demonstrate that SRTSOD-YOLO-n improves the mAP@0.5 by 3.1% and 1.2% compared to YOLO11n, while SRTSOD-YOLO-l achieves gains of 7.9% and 3.3% over YOLO11l, respectively. Compared to other state-of-the-art methods, SRTSOD-YOLO-l attains the highest detection accuracy while maintaining real-time performance, underscoring the superiority of the proposed approach.
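
The abstract names gated activation convolution as the working unit of GAC-FPN but does not give its layer layout. As a rough PyTorch sketch, one plausible gated activation convolution pairs a feature branch with a sigmoid gate branch (the tanh/sigmoid pairing and channel sizes are illustrative assumptions, not the paper's exact block):

```python
import torch
import torch.nn as nn

class GatedActivationConv(nn.Module):
    """Sketch of a gated activation convolution: a sigmoid gate branch
    modulates a feature branch, emphasizing salient responses while
    suppressing background (layout assumed, not the paper's GAC-FPN)."""
    def __init__(self, channels: int):
        super().__init__()
        self.feat = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.feat(x)) * torch.sigmoid(self.gate(x))

x = torch.randn(1, 64, 40, 40)
print(GatedActivationConv(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```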

23 pages, 1950 KB  
Article
Multi-Classification Model for PPG Signal Arrhythmia Based on Time–Frequency Dual-Domain Attention Fusion
by Yubo Sun, Keyu Meng, Shipan Lang, Pei Li, Wentao Wang and Jun Yang
Sensors 2025, 25(19), 5985; https://doi.org/10.3390/s25195985 - 27 Sep 2025
Viewed by 642
Abstract
Cardiac arrhythmia is a leading cause of sudden cardiac death. Its early detection and continuous monitoring hold significant clinical value. Photoplethysmography (PPG) signals, owing to their non-invasive nature, low cost, and convenience, have become a vital information source for monitoring cardiac activity and vascular health. However, the inherent non-stationarity of PPG signals and significant inter-individual variations pose a major challenge in developing highly accurate and efficient arrhythmia classification methods. To address this challenge, we propose a Fusion Deep Multi-domain Attention Network (Fusion-DMA-Net). Within this framework, we innovatively introduce a cross-scale residual attention structure to comprehensively capture discriminative features in both the time and frequency domains. Additionally, to exploit complementary information embedded in PPG signals across these domains, we develop a fusion strategy integrating interactive attention, self-attention, and gating mechanisms. The proposed Fusion-DMA-Net model is evaluated for classifying four major types of cardiac arrhythmias. Experimental results demonstrate its outstanding classification performance, achieving an overall accuracy of 99.05%, precision of 99.06%, and an F1-score of 99.04%. These results demonstrate the feasibility of the Fusion-DMA-Net model in classifying four types of cardiac arrhythmias using single-channel PPG signals, thereby contributing to the early diagnosis and treatment of cardiovascular diseases and supporting the development of future wearable health technologies.
(This article belongs to the Special Issue Systems for Contactless Monitoring of Vital Signs)
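
The fusion strategy combines interactive attention, self-attention, and gating; the gating step alone can be pictured as a learned blend of the time-domain and frequency-domain embeddings. A minimal sketch (the single-gate design and feature size are assumptions):

```python
import torch
import torch.nn as nn

class GatedDomainFusion(nn.Module):
    """Hypothetical gated fusion of time- and frequency-domain PPG
    embeddings: a learned sigmoid gate weighs each domain per feature."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_time, h_freq):
        g = torch.sigmoid(self.gate(torch.cat([h_time, h_freq], dim=-1)))
        return g * h_time + (1.0 - g) * h_freq

fuse = GatedDomainFusion(128)
print(fuse(torch.randn(8, 128), torch.randn(8, 128)).shape)  # [8, 128]
```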

22 pages, 2395 KB  
Article
Multimodal Alignment and Hierarchical Fusion Network for Multimodal Sentiment Analysis
by Jiasheng Huang, Huan Li and Xinyue Mo
Electronics 2025, 14(19), 3828; https://doi.org/10.3390/electronics14193828 - 26 Sep 2025
Viewed by 721
Abstract
The widespread emergence of multimodal data on social platforms has presented new opportunities for sentiment analysis. However, previous studies have often overlooked the issue of detail loss during modal interaction fusion. They also exhibit limitations in addressing semantic alignment challenges and the sensitivity of modalities to noise. To enhance analytical accuracy, a novel model named MAHFNet is proposed. The proposed architecture is composed of three main components. Firstly, an attention-guided gated interaction alignment module is developed for modeling the semantic interaction between text and image using a gated network and a cross-modal attention mechanism. Next, a contrastive learning mechanism is introduced to encourage the aggregation of semantically aligned image-text pairs. Subsequently, an intra-modality emotion extraction module is designed to extract local emotional features within each modality. This module serves to compensate for detail loss during interaction fusion. The intra-modal local emotion features and cross-modal interaction features are then fed into a hierarchical gated fusion module, where the local features are fused through a cross-gated mechanism to dynamically adjust the contribution of each modality while suppressing modality-specific noise. Then, the fusion results and cross-modal interaction features are further fused using a multi-scale attention gating module to capture hierarchical dependencies between local and global emotional information, thereby enhancing the model’s ability to perceive and integrate emotional cues across multiple semantic levels. Finally, extensive experiments have been conducted on three public multimodal sentiment datasets, with results demonstrating that the proposed model outperforms existing methods across multiple evaluation metrics. Specifically, on the TumEmo dataset, our model achieves improvements of 2.55% in ACC and 2.63% in F1 score compared to the second-best method. On the HFM dataset, these gains reach 0.56% in ACC and 0.9% in F1 score, respectively. On the MVSA-S dataset, these gains reach 0.03% in ACC and 1.26% in F1 score. These findings collectively validate the overall effectiveness of the proposed model.
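
The cross-gated mechanism in the hierarchical gated fusion module can be read as each modality gating the other before combination. A compact sketch under that reading (names, dimensions, and the additive merge are illustrative):

```python
import torch
import torch.nn as nn

class CrossGatedFusion(nn.Module):
    """Cross-gated fusion sketch: each modality produces a sigmoid gate
    for the other, damping modality-specific noise before merging."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate_t = nn.Linear(dim, dim)  # image features gate the text
        self.gate_v = nn.Linear(dim, dim)  # text features gate the image

    def forward(self, text, image):
        t = text * torch.sigmoid(self.gate_t(image))
        v = image * torch.sigmoid(self.gate_v(text))
        return t + v

fuse = CrossGatedFusion(256)
print(fuse(torch.randn(4, 256), torch.randn(4, 256)).shape)  # [4, 256]
```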

22 pages, 5746 KB  
Article
AGSK-Net: Adaptive Geometry-Aware Stereo-KANformer Network for Global and Local Unsupervised Stereo Matching
by Qianglong Feng, Xiaofeng Wang, Zhenglin Lu, Haiyu Wang, Tingfeng Qi and Tianyi Zhang
Sensors 2025, 25(18), 5905; https://doi.org/10.3390/s25185905 - 21 Sep 2025
Viewed by 501
Abstract
The performance of unsupervised stereo matching in complex regions such as weak textures and occlusions is constrained by the inherently local receptive fields of convolutional neural networks (CNNs), the absence of geometric priors, and the limited expressiveness of MLP in conventional ViTs. To address these problems, we propose an Adaptive Geometry-aware Stereo-KANformer Network (AGSK-Net) for unsupervised stereo matching. Firstly, to resolve the conflict between the isotropic nature of traditional ViT and the epipolar geometry priors in stereo matching, we propose Adaptive Geometry-aware Multi-head Self-Attention (AG-MSA), which embeds epipolar priors via an adaptive hybrid structure of geometric modulation and penalty, enabling geometry-aware global context modeling. Secondly, we design Spatial Group-Rational KAN (SGR-KAN), which integrates the nonlinear capability of rational functions with the spatial awareness of deep convolutions, replacing the MLP with flexible, learnable rational functions to enhance the nonlinear expression ability of complex regions. Finally, we propose a Dynamic Candidate Gated Fusion (DCGF) module that employs dynamic dual-candidate states and spatially aware pre-enhancement to adaptively fuse global and local features across scales. Experiments demonstrate that AGSK-Net achieves state-of-the-art accuracy and generalizability on Scene Flow, KITTI 2012/2015, and Middlebury 2021.
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
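
SGR-KAN replaces the MLP with "flexible, learnable rational functions". A toy Padé-style rational activation shows the basic ingredient; the degrees and initial coefficients below are placeholders, and the paper's group structure and convolutional wrapping are omitted:

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Learnable rational activation P(x)/Q(x), the kind of unit
    rational-KAN layers build on. Degrees (3, 2) and the initial
    coefficients are illustrative defaults, not the paper's values."""
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.tensor([0.0, 1.0, 0.0, 0.1]))  # numerator coeffs
        self.q = nn.Parameter(torch.tensor([0.0, 0.1]))            # denominator coeffs

    def forward(self, x):
        num = sum(c * x**i for i, c in enumerate(self.p))
        # abs() keeps the denominator positive, a common stabilization
        den = 1.0 + torch.abs(sum(c * x**(i + 1) for i, c in enumerate(self.q)))
        return num / den

print(RationalActivation()(torch.linspace(-2.0, 2.0, 5)))
```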

27 pages, 9667 KB  
Article
REU-YOLO: A Context-Aware UAV-Based Rice Ear Detection Model for Complex Field Scenes
by Dongquan Chen, Kang Xu, Wenbin Sun, Danyang Lv, Songmei Yang, Ranbing Yang and Jian Zhang
Agronomy 2025, 15(9), 2225; https://doi.org/10.3390/agronomy15092225 - 20 Sep 2025
Viewed by 435
Abstract
Accurate detection and counting of rice ears serve as a critical indicator for yield estimation, but the complex conditions of paddy fields limit the efficiency and precision of traditional sampling methods. We propose REU-YOLO, a model specifically designed for UAV low-altitude remote sensing to collect images of rice ears, to address issues such as high-density and complex spatial distribution with occlusion in field scenes. Initially, we combine the Additive Block containing Convolutional Additive Self-attention (CAS) and Convolutional Gated Linear Unit (CGLU) to propose a novel module called Additive-CGLU-C2F (AC-C2f) as a replacement for the original C2f in YOLOv8. It can capture the contextual information between different regions of images and improve the feature extraction ability of the model. We also introduce the Dropblock strategy to reduce model overfitting and replace the original SPPF module with the SPPFCSPC-G module to enhance feature representation and improve the capacity of the model to extract features across varying scales. We further propose a feature fusion network called Multi-branch Bidirectional Feature Pyramid Network (MBiFPN), which introduces a small object detection head and adjusts the head to focus more on small and medium-sized rice ear targets. By using adaptive average pooling and bidirectional weighted feature fusion, shallow and deep features are dynamically fused to enhance the robustness of the model. Finally, the Inner-PIoU loss function is introduced to improve the adaptability of the model to rice ear morphology. On the self-developed dataset UAVR, REU-YOLO achieves a precision (P) of 90.76%, a recall (R) of 86.94%, an mAP0.5 of 93.51%, and an mAP0.5:0.95 of 78.45%, which are 4.22%, 3.76%, 4.85%, and 8.27% higher than the corresponding values obtained with YOLOv8s, respectively. Furthermore, three public datasets, DRPD, MrMT, and GWHD, were used to perform a comprehensive evaluation of REU-YOLO. The results show that REU-YOLO exhibits strong generalization capabilities and more stable detection performance.
(This article belongs to the Section Precision and Digital Agriculture)
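
The AC-C2f module builds on the Convolutional Gated Linear Unit (CGLU). A minimal CGLU-style unit, following the common formulation in which a depthwise convolution enriches the value branch before an elementwise gate (channel sizes are illustrative):

```python
import torch
import torch.nn as nn

class ConvGLU(nn.Module):
    """Convolutional GLU sketch: a depthwise 3x3 conv adds local spatial
    context to the value branch, which a sigmoid gate then modulates."""
    def __init__(self, channels: int):
        super().__init__()
        self.value = nn.Conv2d(channels, channels, 1)
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.gate = nn.Conv2d(channels, channels, 1)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.proj(self.dw(self.value(x)) * torch.sigmoid(self.gate(x)))

print(ConvGLU(32)(torch.randn(1, 32, 56, 56)).shape)  # [1, 32, 56, 56]
```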

24 pages, 2338 KB  
Article
DynaNet: A Dynamic Feature Extraction and Multi-Path Attention Fusion Network for Change Detection
by Xue Li, Dong Li, Jiandong Fang and Xueying Feng
Sensors 2025, 25(18), 5832; https://doi.org/10.3390/s25185832 - 18 Sep 2025
Viewed by 502
Abstract
Existing change detection methods often struggle with both inadequate feature fusion and interference from background noise when processing bi-temporal remote sensing imagery. These challenges are particularly pronounced in building change detection, where capturing subtle spatial and semantic dependencies is critical. To address these issues, we propose DynaNet, a dynamic feature extraction and multi-path attention fusion network for change detection. Specifically, we design a Dynamic Feature Extractor (DFE) that leverages a cross-temporal gating mechanism to amplify relevant change signals while suppressing irrelevant variations, enabling high-quality feature alignment. A Contextual Attention Module (CAM) is then employed to incorporate global contextual information, further enhancing the discriminative capability of change regions. Additionally, a Multi-Branch Attention Fusion Module (MBAFM) is introduced to model inter-scale semantic relationships through self- and cross-attention mechanisms, thereby improving the detection of fine-grained structural changes. To facilitate robust evaluation, we present a new benchmark dataset, Inner-CD, comprising 800 pairs of 256 × 256 bi-temporal satellite images with 0.5–2 m spatial resolution. Unlike existing datasets, Inner-CD features abundant buildings in both temporal images, with changes manifested as subtle morphological variations. Extensive experiments demonstrate that DynaNet achieves state-of-the-art performance, obtaining F1-scores of 90.92% on Inner-CD, 92.38% on LEVIR-CD, and 94.35% on WHU-CD.
(This article belongs to the Section Sensing and Imaging)
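
The DFE's cross-temporal gating can be sketched as a difference-driven sigmoid gate applied to both epochs, so shared background is attenuated while change signals pass through; the exact DFE wiring is an assumption here:

```python
import torch
import torch.nn as nn

class CrossTemporalGate(nn.Module):
    """Sketch of cross-temporal gating for bi-temporal features: the
    gate is driven by the feature difference, amplifying genuine change
    and suppressing content shared by both epochs (layout assumed)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f_t1, f_t2):
        g = torch.sigmoid(self.gate(f_t2 - f_t1))
        return g * f_t1, g * f_t2

f1, f2 = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)
g1, g2 = CrossTemporalGate(32)(f1, f2)
print(g1.shape, g2.shape)  # both [1, 32, 64, 64]
```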

28 pages, 1812 KB  
Article
An Integrated Hybrid Deep Learning Framework for Intrusion Detection in IoT and IIoT Networks Using CNN-LSTM-GRU Architecture
by Doaa Mohsin Abd Ali Afraji, Jaime Lloret and Lourdes Peñalver
Computation 2025, 13(9), 222; https://doi.org/10.3390/computation13090222 - 14 Sep 2025
Viewed by 1097
Abstract
Intrusion detection systems (IDSs) are critical for securing modern networks, particularly in IoT and IIoT environments where traditional defenses such as firewalls and encryption are insufficient against evolving cyber threats. This paper proposes an enhanced hybrid deep learning model that integrates convolutional neural networks (CNNs), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU) in a multi-branch architecture designed to capture spatial and temporal dependencies while minimizing redundant computations. Unlike conventional hybrid approaches, the proposed parallel–sequential fusion framework leverages the strengths of each component independently before merging features, thereby improving detection granularity and learning efficiency. A rigorous preprocessing pipeline is employed to handle real-world data challenges: missing values are imputed using median filling, class imbalance is mitigated through SMOTE (Synthetic Minority Oversampling Technique), and feature scaling is performed with Min–Max normalization to ensure convergence consistency. The methodology is validated on the TON_IoT and CICIDS2017 datasets, chosen for their diversity and realism in IoT/IIoT attack scenarios. Three hybrid models—CNN-LSTM, CNN-GRU, and the proposed CNN-LSTM-GRU—are assessed for binary and multiclass intrusion detection. Experimental results demonstrate that the CNN-LSTM-GRU architecture achieves superior performance, attaining 100% accuracy in binary classification and 97% in multiclass detection, with balanced precision, recall, and F1-scores across all classes. Furthermore, evaluation on the CICIDS2017 dataset confirms the model’s generalization ability, achieving 99.49% accuracy with precision, recall, and F1-scores of 0.9954, 0.9943, and 0.9949, respectively, outperforming CNN-LSTM and CNN-GRU baselines. Compared to existing IDS models, our approach delivers higher robustness, scalability, and adaptability, making it a promising candidate for next-generation IoT/IIoT security.
(This article belongs to the Section Computational Engineering)
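
The parallel-then-fuse layout, with independent CNN, LSTM, and GRU branches merged before the classifier, can be sketched as below; widths, depths, and pooling choices are illustrative rather than the paper's tuned architecture:

```python
import torch
import torch.nn as nn

class HybridIDS(nn.Module):
    """Sketch of a parallel CNN/LSTM/GRU hybrid over a flow-feature
    vector treated as a length-n sequence; branch outputs are
    concatenated and classified (all sizes are illustrative)."""
    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, hidden, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.gru = nn.GRU(1, hidden, batch_first=True)
        self.head = nn.Linear(3 * hidden, n_classes)

    def forward(self, x):                  # x: (batch, n_features)
        seq = x.unsqueeze(-1)              # (batch, n_features, 1)
        c = self.cnn(x.unsqueeze(1)).squeeze(-1)
        _, (h_l, _) = self.lstm(seq)
        _, h_g = self.gru(seq)
        return self.head(torch.cat([c, h_l[-1], h_g[-1]], dim=-1))

model = HybridIDS(n_features=40, n_classes=8)
print(model(torch.randn(4, 40)).shape)     # torch.Size([4, 8])
```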

22 pages, 2230 KB  
Article
A Load Forecasting Model Based on Spatiotemporal Partitioning and Cross-Regional Attention Collaboration
by Xun Dou, Ruiang Yang, Zhenlan Dou, Chunyan Zhang, Chen Xu and Jiacheng Li
Sustainability 2025, 17(18), 8162; https://doi.org/10.3390/su17188162 - 10 Sep 2025
Viewed by 399
Abstract
With the advancement of new power system construction, thermostatically controlled loads represented by regional air conditioning systems are being extensively integrated into the grid, leading to a surge in the number of user nodes. This large-scale integration of new loads creates challenges for the grid, as the resulting load data exhibits strong periodicity and randomness over time. These characteristics are influenced by factors like temperature and user behavior. At the same time, spatially adjacent nodes show similarities and clustering in electricity usage. This creates complex spatiotemporal coupling features. These complex spatiotemporal characteristics challenge traditional forecasting methods. Their high model complexity and numerous parameters often lead to overfitting or the curse of dimensionality, which hinders both prediction accuracy and efficiency. To address this issue, this paper proposes a load forecasting method based on spatiotemporal partitioning and collaborative cross-regional attention. First, a spatiotemporal similarity matrix is constructed using the Shape Dynamic Time Warping (ShapeDTW) algorithm and an adaptive Gaussian kernel function based on the Haversine distance. Spectral clustering combined with the Gap Statistic criterion is then applied to adaptively determine the optimal number of partitions, dividing all load nodes in the power grid into several sub-regions with homogeneous spatiotemporal characteristics. Second, for each sub-region, a local Spatiotemporal Graph Convolutional Network (STGCN) model is built. By integrating gated temporal convolution with spatial feature extraction, the model accurately captures the spatiotemporal evolution patterns within each sub-region. On this basis, a cross-regional attention mechanism is designed to dynamically learn the correlation weights among sub-regions, enabling collaborative fusion of global features. Finally, the proposed method is evaluated on a multi-node load dataset. The effectiveness of the approach is validated through comparative experiments and ablation studies (that is, by removing key components of the model to evaluate their contribution to the overall performance). Experimental results demonstrate that the proposed method achieves excellent performance in short-term load forecasting tasks across multiple nodes.
(This article belongs to the Special Issue Energy Conservation Towards a Low-Carbon and Sustainability Future)
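
The cross-regional collaboration step reduces to attention over sub-region embeddings: each sub-region's STGCN output attends to all others, and the attention weights play the role of learned inter-region correlations. A sketch using torch's built-in layer (embedding size and head count are assumptions):

```python
import torch
import torch.nn as nn

# Each of 5 hypothetical sub-regions contributes a 64-d STGCN embedding;
# self-attention across them yields the collaboratively fused features
# plus the learned inter-region weight matrix.
regions = torch.randn(1, 5, 64)   # (batch, n_subregions, embed_dim)
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
fused, weights = attn(regions, regions, regions)
print(fused.shape, weights.shape)  # [1, 5, 64] [1, 5, 5]
```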

30 pages, 5137 KB  
Article
High-Resolution Remote Sensing Imagery Water Body Extraction Using a U-Net with Cross-Layer Multi-Scale Attention Fusion
by Chunyan Huang, Mingyang Wang, Zichao Zhu and Yanling Li
Sensors 2025, 25(18), 5655; https://doi.org/10.3390/s25185655 - 10 Sep 2025
Viewed by 646
Abstract
The accurate extraction of water bodies from remote sensing imagery is crucial for water resource monitoring and flood disaster warning. However, this task faces significant challenges due to complex land cover, large variations in water body morphology and spatial scales, and spectral similarities between water and non-water features, leading to misclassification and low accuracy. While deep learning-based methods have become a research hotspot, traditional convolutional neural networks (CNNs) struggle to represent multi-scale features and capture global water body information effectively. To enhance water feature recognition and precisely delineate water boundaries, we propose the AMU-Net model. Initially, an improved residual connection module was embedded into the U-Net backbone to enhance complex feature learning. Subsequently, a multi-scale attention mechanism was introduced, combining grouped channel attention with multi-scale convolutional strategies for lightweight yet precise segmentation. Thereafter, a dual-attention gated modulation module dynamically fusing channel and spatial attention was employed to strengthen boundary localization. Furthermore, a cross-layer geometric attention fusion module, incorporating grouped projection convolution and a triple-level geometric attention mechanism, optimizes segmentation accuracy and boundary quality. Finally, a triple-constraint loss framework synergistically optimized global classification, regional overlap, and background specificity to boost segmentation performance. Evaluated on the GID and WHDLD datasets, AMU-Net achieved remarkable IoU scores of 93.6% and 95.02%, respectively, providing an effective new solution for remote sensing water body extraction.
(This article belongs to the Section Remote Sensors)
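
One plausible instantiation of the triple-constraint loss is binary cross-entropy for global classification, a Dice term for regional overlap, and a true-negative-rate penalty for background specificity; the term choices and unit weights below are assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def triple_constraint_loss(pred, target, eps=1e-6):
    """Sketch of a three-term segmentation loss: BCE (global
    classification) + Dice (regional overlap) + a specificity term
    that penalizes false positives on background (weights assumed 1)."""
    bce = F.binary_cross_entropy(pred, target)
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    tn = ((1 - pred) * (1 - target)).sum()
    spec = 1 - (tn + eps) / ((1 - target).sum() + eps)
    return bce + dice + spec

pred = torch.sigmoid(torch.randn(2, 1, 64, 64))
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(triple_constraint_loss(pred, target).item())
```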

27 pages, 7274 KB  
Article
Intelligent Identification of Internal Leakage of Spring Full-Lift Safety Valve Based on Improved Convolutional Neural Network
by Shuxun Li, Kang Yuan, Jianjun Hou and Xiaoqi Meng
Sensors 2025, 25(17), 5451; https://doi.org/10.3390/s25175451 - 3 Sep 2025
Viewed by 724
Abstract
In modern industry, the spring full-lift safety valve is a key device for safe pressure relief of pressure-bearing systems. Its valve seat sealing surface is easily damaged after long-term use, causing internal leakage that results in safety hazards and economic losses. Therefore, it is of great significance to quickly and accurately diagnose its internal leakage state. Among the current methods for identifying fluid machinery faults, model-based methods have difficulties in parameter determination. Although the data-driven convolutional neural network (CNN) has great potential in the field of fault diagnosis, it has problems such as hyperparameter selection relying on experience, insufficient capture of time series and multi-scale features, and lack of research on valve internal leakage type identification. To this end, this study proposes a safety valve internal leakage identification method based on high-frequency FPGA data acquisition and improved CNN. The acoustic emission signals of different internal leakage states are acquired through the high-frequency FPGA acquisition system, converted into two-dimensional time–frequency diagrams by short-time Fourier transform, and input into the improved model. The model uses the leaky rectified linear unit (LReLU) activation function to enhance nonlinear expression, introduces random pooling to prevent overfitting, optimizes hyperparameters with the help of the horned lizard optimization algorithm (HLOA), and integrates the bidirectional gated recurrent unit (BiGRU) and selective kernel attention module (SKAM) to enhance temporal feature extraction and multi-scale feature capture. Experiments show that the average recognition accuracy of the model for the internal leakage state of the safety valve is 99.7%, outperforming comparison models such as ResNet-18. This method provides an effective solution for the diagnosis of internal leakage of safety valves, and the signal conversion method can be extended to the fault diagnosis of other mechanical equipment. In the future, we will explore the fusion of lightweight networks and multi-source data to improve real-time performance and robustness.
(This article belongs to the Section Intelligent Sensors)
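
The signal-to-image step is a standard short-time Fourier transform of the acoustic-emission trace into a 2-D time-frequency map. A SciPy sketch (the sampling rate, window length, and dB scaling are placeholder choices, not the paper's settings):

```python
import numpy as np
from scipy.signal import stft

fs = 1_000_000                        # assumed high-frequency sample rate (Hz)
signal = np.random.randn(fs // 10)    # stand-in for 0.1 s of raw AE data
f, t, Z = stft(signal, fs=fs, nperseg=1024)
tf_image = 20 * np.log10(np.abs(Z) + 1e-12)   # dB-scaled time-frequency map
print(tf_image.shape)                 # (frequency bins, time frames) for the CNN
```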

21 pages, 7413 KB  
Article
PA-MSFormer: A Phase-Aware Multi-Scale Transformer Network for ISAR Image Enhancement
by Jiale Huang, Xiaoyong Li, Lei Liu, Xiaoran Shi and Feng Zhou
Remote Sens. 2025, 17(17), 3047; https://doi.org/10.3390/rs17173047 - 2 Sep 2025
Viewed by 871
Abstract
Inverse Synthetic Aperture Radar (ISAR) imaging plays a crucial role in reconnaissance and target monitoring. However, the presence of uncertain factors often leads to indistinct component visualization and significant noise contamination in imaging results, where weak scattering components are frequently submerged by noise. To address these challenges, this paper proposes a Phase-Aware Multi-Scale Transformer network (PA-MSFormer) that simultaneously enhances weak component regions and suppresses noise. Unlike existing methods that struggle with this fundamental trade-off, our approach achieves 70.93 dB PSNR on electromagnetic simulation data, surpassing the previous best method by 0.6 dB, while maintaining only 1.59 million parameters. Specifically, we introduce a phase-aware attention mechanism that separates noise from weak scattering features through complex-domain modulation, a dual-branch fusion network that establishes frequency-domain separability criteria, and a progressive gate fuser that achieves pixel-level alignment between high- and low-frequency features. Extensive experiments on electromagnetic simulation and real-measured datasets demonstrate that PA-MSFormer effectively suppresses noise while significantly enhancing target visualization, establishing a solid foundation for subsequent interpretation tasks.
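
One minimal reading of "complex-domain modulation" in the phase-aware attention is to reweight FFT magnitudes while leaving phase untouched, since phase carries the scatterer geometry. A sketch under that assumption (the per-channel scale and the rfft2 choice are assumptions, not the paper's mechanism):

```python
import torch
import torch.nn as nn

class PhaseAwareGate(nn.Module):
    """Sketch of phase-preserving spectral modulation: split features
    into magnitude and phase via a 2-D FFT, rescale the magnitude per
    channel, and reconstruct the real-valued feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(channels, 1, 1))

    def forward(self, x):
        spec = torch.fft.rfft2(x)
        mod = self.scale * spec.abs() * torch.exp(1j * spec.angle())
        return torch.fft.irfft2(mod, s=x.shape[-2:])

print(PhaseAwareGate(8)(torch.randn(1, 8, 32, 32)).shape)  # [1, 8, 32, 32]
```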

36 pages, 25793 KB  
Article
DATNet: Dynamic Adaptive Transformer Network for SAR Image Denoising
by Yan Shen, Yazhou Chen, Yuming Wang, Liyun Ma and Xiaolu Zhang
Remote Sens. 2025, 17(17), 3031; https://doi.org/10.3390/rs17173031 - 1 Sep 2025
Viewed by 1026
Abstract
Aiming at the problems of detail blurring and structural distortion caused by speckle noise, additive white noise and hybrid noise interference in synthetic aperture radar (SAR) images, this paper proposes a Dynamic Adaptive Transformer Network (DAT-Net) integrating a dynamic gated attention module and a frequency-domain multi-expert enhancement module for SAR image denoising. The proposed model leverages a multi-scale encoder–decoder framework, combining local convolutional feature extraction with global self-attention mechanisms to transcend the limitations of conventional approaches restricted to single noise types, thereby achieving adaptive suppression of multi-source noise contamination. Key innovations comprise the following: (1) A Dynamic Gated Attention Module (DGAM) employing dual-path feature embedding and dynamic thresholding mechanisms to precisely characterize noise spatial heterogeneity; (2) A Frequency-domain Multi-Expert Enhancement (FMEE) Module utilizing Fourier decomposition and expert network ensembles for collaborative optimization of high-frequency and low-frequency components; (3) Lightweight Multi-scale Convolution Blocks (MCB) enhancing cross-scale feature fusion capabilities. Experimental results demonstrate that DAT-Net achieves quantifiable performance enhancement in both simulated and real SAR environments. Compared with other denoising algorithms, the proposed methodology exhibits superior noise suppression across diverse noise scenarios while preserving intrinsic textural features.
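
The FMEE module combines Fourier decomposition with expert networks. A two-expert sketch with a hard low-pass split (the cutoff, the separable mask, and the small conv experts are assumptions; the paper's expert ensemble is richer):

```python
import torch
import torch.nn as nn

class FrequencyExperts(nn.Module):
    """Sketch of frequency-split experts: a low-pass mask divides the
    spectrum, one conv expert refines each band, and the refined bands
    are recombined (cutoff and expert design are assumptions)."""
    def __init__(self, channels: int, cutoff: float = 0.25):
        super().__init__()
        self.cutoff = cutoff
        self.low = nn.Conv2d(channels, channels, 3, padding=1)
        self.high = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        spec = torch.fft.fft2(x)
        h, w = x.shape[-2:]
        fy = torch.fft.fftfreq(h, device=x.device).abs().view(-1, 1)
        fx = torch.fft.fftfreq(w, device=x.device).abs().view(1, -1)
        mask = ((fy < self.cutoff) & (fx < self.cutoff)).float()
        low = torch.fft.ifft2(spec * mask).real
        high = torch.fft.ifft2(spec * (1 - mask)).real
        return self.low(low) + self.high(high)

print(FrequencyExperts(16)(torch.randn(1, 16, 64, 64)).shape)  # [1, 16, 64, 64]
```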

21 pages, 3089 KB  
Article
Lightweight SCL-YOLOv8: A High-Performance Model for Transmission Line Foreign Object Detection
by Houling Ji, Xishi Chen, Jingpan Bai and Chengjie Gong
Sensors 2025, 25(16), 5147; https://doi.org/10.3390/s25165147 - 19 Aug 2025
Viewed by 919
Abstract
Transmission lines are widely distributed in complex environments, making them susceptible to foreign object intrusion, which can lead to serious consequences such as power outages. Currently, foreign object detection on transmission lines is primarily conducted through UAV-based field inspections. However, the captured data must be transmitted back to a central facility for analysis, resulting in low efficiency and the inability to perform real-time, industrial-grade detection. Although recent YOLO series models can be deployed on UAVs for object detection, these models’ substantial computational requirements often exceed the processing capabilities of UAV platforms, limiting their ability to perform real-time inference tasks. In this study, we propose a novel lightweight detection algorithm, SCL-YOLOv8, which is based on the original YOLO model. We introduce StarNet to replace the CSPDarknet53 backbone as the feature extraction network, thereby reducing computational complexity while maintaining high feature extraction efficiency. We design a lightweight module, CGLU-ConvFormer, which enhances multi-scale feature representation and local feature extraction by integrating convolutional operations with gating mechanisms. Furthermore, the detection head of the original YOLO model is improved by introducing shared convolutional layers and group normalization, which helps reduce redundant computations and enhances multi-scale feature fusion. Experimental results demonstrate that the proposed model not only improves the detection accuracy but also significantly reduces the number of model parameters. Specifically, SCL-YOLOv8 achieves a mAP@0.5 of 94.2% while reducing the number of parameters by 56.8%, FLOPs by 45.7%, and model size by 50% compared with YOLOv8n.
(This article belongs to the Section Intelligent Sensors)
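
The improved head shares its convolutional stack across FPN scales and normalizes with GroupNorm, so per-scale weights are not duplicated. A compact sketch (channel count, group count, and the single shared stack are illustrative; GroupNorm requires channels divisible by the group count):

```python
import torch
import torch.nn as nn

class SharedGNHead(nn.Module):
    """Sketch of a lightweight shared detection head: one conv stack
    with group normalization is reused across all pyramid scales."""
    def __init__(self, channels: int, n_outputs: int):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GroupNorm(16, channels), nn.SiLU())
        self.pred = nn.Conv2d(channels, n_outputs, 1)

    def forward(self, feats):  # feats: list of multi-scale maps
        return [self.pred(self.shared(f)) for f in feats]

feats = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
print([o.shape for o in SharedGNHead(64, 84)(feats)])
```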

26 pages, 7726 KB  
Article
Multi-Branch Channel-Gated Swin Network for Wetland Hyperspectral Image Classification
by Ruopu Liu, Jie Zhao, Shufang Tian, Guohao Li and Jingshu Chen
Remote Sens. 2025, 17(16), 2862; https://doi.org/10.3390/rs17162862 - 17 Aug 2025
Viewed by 561
Abstract
Hyperspectral classification of wetland environments remains challenging due to high spectral similarity, class imbalance, and blurred boundaries. To address these issues, we propose a novel Multi-Branch Channel-Gated Swin Transformer network (MBCG-SwinNet). In contrast to previous CNN-based designs, our model introduces a Swin Transformer spectral branch to enhance global contextual modeling, enabling improved spectral discrimination. To effectively fuse spatial and spectral features, we design a residual feature interaction chain comprising a Residual Spatial Fusion (RSF) module, a channel-wise gating mechanism, and a multi-scale feature fusion (MFF) module, which together enhance spatial adaptivity and feature integration. Additionally, a DenseCRF-based post-processing step is employed to refine classification boundaries and suppress salt-and-pepper noise. Experimental results on three UAV-based hyperspectral wetland datasets from the Yellow River Delta (Shandong, China)—NC12, NC13, and NC16—demonstrate that MBCG-SwinNet achieves superior classification performance, with overall accuracies of 97.62%, 82.37%, and 97.32%, respectively—surpassing state-of-the-art methods. The proposed architecture offers a robust and scalable solution for hyperspectral image classification in complex ecological settings.
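
The channel-wise gating mechanism in the residual feature interaction chain can be read as squeeze-and-excitation-style reweighting; the pooling and reduction ratio below are assumptions, as the abstract does not specify the gate:

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Channel gating sketch: pool to per-channel statistics, then
    sigmoid-reweight the channels (an SE-style reading of the gate)."""
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(),
            nn.Linear(channels // r, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))            # (B, C) channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)   # broadcast over H, W

print(ChannelGate(32)(torch.randn(2, 32, 16, 16)).shape)  # [2, 32, 16, 16]
```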

24 pages, 3961 KB  
Article
Hierarchical Multi-Scale Mamba with Tubular Structure-Aware Convolution for Retinal Vessel Segmentation
by Tao Wang, Dongyuan Tian, Haonan Zhao, Jiamin Liu, Weijie Wang, Chunpei Li and Guixia Liu
Entropy 2025, 27(8), 862; https://doi.org/10.3390/e27080862 - 14 Aug 2025
Viewed by 964
Abstract
Retinal vessel segmentation plays a crucial role in diagnosing various retinal and cardiovascular diseases and serves as a foundation for computer-aided diagnostic systems. Blood vessels in color retinal fundus images, captured using fundus cameras, are often affected by illumination variations and noise, making it difficult to preserve vascular integrity and posing a significant challenge for vessel segmentation. In this paper, we propose HM-Mamba, a novel hierarchical multi-scale Mamba-based architecture that incorporates tubular structure-aware convolution to extract both local and global vascular features for retinal vessel segmentation. First, we introduce a tubular structure-aware convolution to reinforce vessel continuity and integrity. Building on this, we design a multi-scale fusion module that aggregates features across varying receptive fields, enhancing the model’s robustness in representing both primary trunks and fine branches. Second, we integrate multi-branch Fourier transform with the dynamic state modeling capability of Mamba to capture both long-range dependencies and multi-frequency information. This design enables robust feature representation and adaptive fusion, thereby enhancing the network’s ability to model complex spatial patterns. Furthermore, we propose a hierarchical multi-scale interactive Mamba block that integrates multi-level encoder features through gated Mamba-based global context modeling and residual connections, enabling effective multi-scale semantic fusion and reducing detail loss during downsampling. Extensive evaluations on five widely used benchmark datasets—DRIVE, CHASE_DB1, STARE, IOSTAR, and LES-AV—demonstrate the superior performance of HM-Mamba, yielding Dice coefficients of 0.8327, 0.8197, 0.8239, 0.8307, and 0.8426, respectively.
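
The multi-scale fusion module "aggregates features across varying receptive fields"; parallel dilated convolutions are one standard realization. A sketch (the dilation rates and 1x1 aggregation are assumptions, not the paper's design):

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Sketch of multi-scale fusion: parallel 3x3 convolutions with
    increasing dilation cover growing receptive fields (trunks vs. fine
    branches), and a 1x1 conv aggregates the concatenated responses."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 4))
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(MultiScaleFusion(32)(torch.randn(1, 32, 48, 48)).shape)  # [1, 32, 48, 48]
```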
