Search Results (360)

Search Parameters:
Keywords = frequency domain fusion

22 pages, 24173 KiB  
Article
ScaleViM-PDD: Multi-Scale EfficientViM with Physical Decoupling and Dual-Domain Fusion for Remote Sensing Image Dehazing
by Hao Zhou, Yalun Wang, Wanting Peng, Xin Guan and Tao Tao
Remote Sens. 2025, 17(15), 2664; https://doi.org/10.3390/rs17152664 - 1 Aug 2025
Abstract
Remote sensing images are often degraded by atmospheric haze, which not only reduces image quality but also complicates information extraction, particularly in high-level visual analysis tasks such as object detection and scene classification. State-space models (SSMs) have recently emerged as a powerful paradigm for vision tasks, showing great promise due to their computational efficiency and robust capacity to model global dependencies. However, most existing learning-based dehazing methods lack physical interpretability, leading to weak generalization. Furthermore, they typically rely on spatial features while neglecting crucial frequency domain information, resulting in incomplete feature representation. To address these challenges, we propose ScaleViM-PDD, a novel network that enhances an SSM backbone with two key innovations: a Multi-scale EfficientViM with Physical Decoupling (ScaleViM-P) module and a Dual-Domain Fusion (DD Fusion) module. The ScaleViM-P module synergistically integrates a Physical Decoupling block within a Multi-scale EfficientViM architecture. This design enables the network to mitigate haze interference in a physically grounded manner at each representational scale while simultaneously capturing global contextual information to adaptively handle complex haze distributions. To further address detail loss, the DD Fusion module replaces conventional skip connections by incorporating a novel Frequency Domain Module (FDM) alongside channel and position attention. This allows for a more effective fusion of spatial and frequency features, significantly improving the recovery of fine-grained details, including color and texture information. Extensive experiments on nine publicly available remote sensing datasets demonstrate that ScaleViM-PDD consistently surpasses state-of-the-art baselines in both qualitative and quantitative evaluations, highlighting its strong generalization ability.
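
The dual-domain idea above — filtering features in the Fourier domain and fusing the result back with the spatial branch — can be sketched in a few lines. This is a minimal PyTorch illustration under our own assumptions (a learnable spectral mask and a 1×1-conv fusion); the paper's FDM and its channel/position attention are more elaborate, and every name here is illustrative.

```python
import torch
import torch.nn as nn

class FrequencyDomainBlock(nn.Module):
    """Illustrative frequency-domain module: filter features with a
    learnable per-channel mask in the Fourier domain, then fuse the
    result with the spatial branch through a 1x1 convolution."""

    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # Learnable filter over the half-spectrum produced by rfft2.
        self.mask = nn.Parameter(torch.ones(channels, height, width // 2 + 1))
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")          # complex spectrum
        filtered = torch.fft.irfft2(spec * self.mask,
                                    s=x.shape[-2:], norm="ortho")
        return self.fuse(torch.cat([x, filtered], dim=1))

x = torch.randn(2, 32, 64, 64)
y = FrequencyDomainBlock(32, 64, 64)(x)   # same shape as x
```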

24 pages, 3953 KiB  
Article
A New Signal Separation and Sampling Duration Estimation Method for ISRJ Based on FRFT and Hybrid Modality Fusion Network
by Siyu Wang, Chang Zhu, Zhiyong Song, Zhanling Wang and Fulai Wang
Remote Sens. 2025, 17(15), 2648; https://doi.org/10.3390/rs17152648 - 30 Jul 2025
Viewed by 159
Abstract
Accurate estimation of Interrupted Sampling Repeater Jamming (ISRJ) sampling duration is essential for effective radar anti-jamming. However, in complex electromagnetic environments, the simultaneous presence of suppressive and deceptive jamming, coupled with significant signal overlap in the time–frequency domain, renders ISRJ separation and parameter estimation considerably challenging. To address this challenge, this paper proposes a method utilizing the Fractional Fourier Transform (FRFT) and a Hybrid Modality Fusion Network (HMFN) for ISRJ signal separation and sampling-duration estimation. The proposed method first employs FRFT and a time–frequency mask to separate the ISRJ and target echo from the mixed signal. This process effectively suppresses interference and extracts the ISRJ signal. Subsequently, an HMFN is employed for high-precision estimation of the ISRJ sampling duration, offering crucial parameter support for active electromagnetic countermeasures. Simulation results validate the performance of the proposed method. Specifically, even under strong interference conditions with a Signal-to-Jamming Ratio (SJR) of −5 dB for deceptive jamming and as low as −10 dB for suppressive jamming, the regression model's coefficient of determination still reaches 0.91. This result clearly demonstrates the method's robustness and effectiveness in complex electromagnetic environments.
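
As a rough illustration of the separation step, the sketch below applies a binary time–frequency mask to a mixed signal. SciPy provides no FRFT routine, so an ordinary STFT stands in purely to show the masking mechanics; the window length, threshold, and test signal are arbitrary stand-ins, not the paper's settings.

```python
import numpy as np
from scipy.signal import stft, istft

def tf_mask_separate(mixture: np.ndarray, fs: float, rel_threshold: float = 0.5):
    """Split a mixed signal with a binary time-frequency mask.
    Cells whose magnitude exceeds a fraction of the peak are kept
    as the strong component (e.g. jamming); the rest is residual."""
    f, t, Z = stft(mixture, fs=fs, nperseg=256)
    mask = np.abs(Z) > rel_threshold * np.abs(Z).max()
    _, strong = istft(Z * mask, fs=fs)
    _, weak = istft(Z * ~mask, fs=fs)
    return strong, weak

fs = 1e4
tt = np.arange(0, 1, 1 / fs)
mix = np.sin(2 * np.pi * 500 * tt) + 0.1 * np.random.randn(tt.size)
jam, echo = tf_mask_separate(mix, fs)
```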

26 pages, 62045 KiB  
Article
CML-RTDETR: A Lightweight Wheat Head Detection and Counting Algorithm Based on the Improved RT-DETR
by Yue Fang, Chenbo Yang, Chengyong Zhu, Hao Jiang, Jingmin Tu and Jie Li
Electronics 2025, 14(15), 3051; https://doi.org/10.3390/electronics14153051 - 30 Jul 2025
Viewed by 108
Abstract
Wheat is one of the most important grain crops, and spike counting is crucial for predicting yield. In complex farmland environments, however, wheat plants vary widely in scale, their color closely resembles the background, and ears frequently overlap, all of which make wheat ear detection challenging. At the same time, the demand for high accuracy and fast response calls for lightweight models that reduce hardware costs. This study therefore proposes CML-RTDETR, a lightweight wheat ear detection model for efficient and accurate detection in real, complex farmland environments. In the model construction, the lightweight network CSPDarknet is first introduced as the backbone of CML-RTDETR to improve feature extraction efficiency. In addition, an FM module modifies the bottleneck layer in the C2f component, realizing hybrid feature extraction by concatenating spatial- and frequency-domain features to strengthen the representation of wheat in complex scenes. Second, to improve detection across target scales, a multi-scale feature enhancement pyramid (MFEP) is designed, consisting of GHSDConv for efficiently obtaining low-level detail and CSPDWOK for building a multi-scale semantic fusion structure. Finally, channel pruning based on Layer-Adaptive Magnitude Pruning (LAMP) scoring reduces model parameters and runtime memory. Experimental results on the GWHD2021 dataset show that CML-RTDETR reaches an AP50 of 90.5%, a 1.2% improvement over the baseline RTDETR-R18 model, while the parameters and GFLOPs drop to 11.03 M and 37.8 G, reductions of 42% and 34%, respectively. The real-time frame rate reaches 73 fps, delivering both parameter savings and speed gains.
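
The LAMP score driving the pruning step has a compact closed form: each weight's squared magnitude is normalized by the sum of squared magnitudes of all weights in the same layer that are no smaller. A small PyTorch sketch of that scoring rule, per our reading of Layer-Adaptive Magnitude Pruning; the surrounding pipeline (channel grouping, fine-tuning) is not shown.

```python
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """LAMP score per weight: w_u^2 / sum of w_v^2 over all v with
    |w_v| >= |w_u| in the same tensor. Lowest scores prune first."""
    w2 = weight.detach().flatten() ** 2
    order = torch.argsort(w2, descending=True)
    sorted_w2 = w2[order]
    # Cumulative sum of the descending sequence = sum over no-smaller weights.
    denom = torch.cumsum(sorted_w2, dim=0)
    scores_sorted = sorted_w2 / denom
    # Undo the sort so scores line up with the original positions.
    scores = torch.empty_like(scores_sorted)
    scores[order] = scores_sorted
    return scores.view_as(weight)

w = torch.randn(64, 32, 3, 3)
keep_mask = lamp_scores(w) > 1e-6   # e.g. threshold chosen for a target sparsity
```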
(This article belongs to the Section Artificial Intelligence)

22 pages, 2525 KiB  
Article
mmHSE: A Two-Stage Framework for Human Skeleton Estimation Using mmWave FMCW Radar Signals
by Jiake Tian, Yi Zou and Jiale Lai
Appl. Sci. 2025, 15(15), 8410; https://doi.org/10.3390/app15158410 - 29 Jul 2025
Viewed by 111
Abstract
We present mmHSE, a two-stage framework for human skeleton estimation using dual millimeter-wave (mmWave) Frequency-Modulated Continuous-Wave (FMCW) radar signals. To enable data-driven model design and evaluation, we collect and process over 30,000 range–angle maps from 12 users across three representative indoor environments using a dual-node radar acquisition platform. Leveraging the collected data, we develop a two-stage neural architecture for human skeleton estimation. The first stage employs a dual-branch network with depthwise separable convolutions and self-attention to extract multi-scale spatiotemporal features from dual-view radar inputs. A cross-modal attention fusion module is then used to generate initial estimates of 21 skeletal keypoints. The second stage refines these estimates using a skeletal topology module based on graph convolutional networks, which captures spatial dependencies among joints to enhance localization accuracy. Experiments show that mmHSE achieves a Mean Absolute Error (MAE) of 2.78 cm. In cross-domain evaluations, the MAE remains at 3.14 cm, demonstrating the method's generalization ability and robustness for non-intrusive human pose estimation from mmWave FMCW radar signals.
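
A minimal sketch of cross-modal attention fusion between the two radar views, assuming token-shaped features and PyTorch's stock nn.MultiheadAttention; the paper's module sits behind a dual-branch encoder and differs in detail, so treat this as the fusion idea only.

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Each radar view attends to the other; the two attended
    sequences are averaged into one fused token sequence."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        a2b, _ = self.attn_ab(feat_a, feat_b, feat_b)  # view A queries view B
        b2a, _ = self.attn_ba(feat_b, feat_a, feat_a)  # view B queries view A
        return 0.5 * (a2b + b2a)

fused = CrossViewFusion()(torch.randn(2, 49, 128), torch.randn(2, 49, 128))
```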

28 pages, 2918 KiB  
Article
Machine Learning-Powered KPI Framework for Real-Time, Sustainable Ship Performance Management
by Christos Spandonidis, Vasileios Iliopoulos and Iason Athanasopoulos
J. Mar. Sci. Eng. 2025, 13(8), 1440; https://doi.org/10.3390/jmse13081440 - 28 Jul 2025
Viewed by 275
Abstract
The maritime sector faces escalating demands to minimize emissions and optimize operational efficiency under tightening environmental regulations. Although technologies such as the Internet of Things (IoT), Artificial Intelligence (AI), and Digital Twins (DT) offer substantial potential, their deployment in real-time ship performance analytics is still at an early stage. This paper proposes a machine learning-driven framework for real-time ship performance management. The framework starts with data collected from onboard sensors and culminates in a decision support system that is easily interpretable, even by non-experts. It also provides a method to forecast vessel performance by extrapolating Key Performance Indicator (KPI) values. Furthermore, it offers a flexible methodology for defining KPIs for every crucial component or aspect of vessel performance, illustrated through a use case focusing on fuel oil consumption. Leveraging Artificial Neural Networks (ANNs), hybrid multivariate data fusion, and high-frequency sensor streams, the system facilitates continuous diagnostics, early fault detection, and data-driven decision-making. Unlike conventional static performance models, the framework employs dynamic KPIs that evolve with the vessel's operational state, enabling advanced trend analysis, predictive maintenance scheduling, and compliance assurance. Experimental comparison against classical KPI models highlights superior predictive fidelity, robustness, and temporal consistency. Furthermore, the paper delineates AI and ML applications across core maritime operations and introduces a scalable, modular system architecture applicable to both commercial and naval platforms. This approach bridges advanced simulation ecosystems with in situ operational data, laying a robust foundation for digital transformation and sustainability in maritime domains.
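
To make the dynamic-KPI idea concrete, here is one hedged reading: compare measured fuel consumption against a model-predicted baseline and track the smoothed ratio as a KPI that evolves with the operational state. The window, baseline, and drift below are invented for illustration and are not the paper's KPI definitions.

```python
import numpy as np

def fuel_kpi(measured: np.ndarray, predicted: np.ndarray, window: int = 24) -> np.ndarray:
    """Rolling KPI: ratio of model-predicted (expected) fuel
    consumption to the measured value. Values drifting below 1
    flag degradation relative to the learned baseline."""
    ratio = predicted / np.clip(measured, 1e-9, None)
    kernel = np.ones(window) / window
    return np.convolve(ratio, kernel, mode="valid")   # smoothed trend

rng = np.random.default_rng(0)
measured = 10 + rng.normal(0, 0.3, 200) + np.linspace(0, 1, 200)  # slow drift up
predicted = np.full(200, 10.0)            # stand-in for the ANN baseline
kpi = fuel_kpi(measured, predicted)       # sinks below 1 as consumption rises
```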
(This article belongs to the Section Ocean Engineering)

29 pages, 3125 KiB  
Article
Tomato Leaf Disease Identification Framework FCMNet Based on Multimodal Fusion
by Siming Deng, Jiale Zhu, Yang Hu, Mingfang He and Yonglin Xia
Plants 2025, 14(15), 2329; https://doi.org/10.3390/plants14152329 - 27 Jul 2025
Viewed by 404
Abstract
Precisely recognizing diseases in tomato leaves plays a crucial role in enhancing the health, productivity, and quality of tomato crops. However, disease identification methods that rely on single-modality information often suffer from insufficient accuracy and weak generalization. Therefore, this paper proposes FCMNet, a tomato leaf disease recognition framework based on multimodal fusion, which combines tomato leaf disease images with text descriptions to enhance the ability to capture disease characteristics. The Fourier-guided Attention Mechanism (FGAM) is designed; it systematically embeds Fourier frequency-domain information into the spatial-channel attention structure for the first time, enhances the stability and noise resistance of feature expression through spectral transforms, and achieves more accurate lesion localization through multi-scale fusion of local and global features. To realize deep semantic interaction between the image and text modalities, a Cross Vision–Language Alignment module (CVLA) is further proposed. This module generates visual representations compatible with BERT embeddings by utilizing block segmentation and feature mapping techniques. Additionally, it incorporates a probability-based weighting mechanism to achieve enhanced multimodal fusion, significantly strengthening the model's comprehension of semantic relationships across modalities. Furthermore, to enhance both the training efficiency and parameter optimization capability of the model, we introduce a Multi-strategy Improved Coati Optimization Algorithm (MSCOA). This algorithm integrates Good Point Set initialization with a Golden Sine search strategy, thereby boosting global exploration, accelerating convergence, and effectively preventing entrapment in local optima. Consequently, it exhibits robust adaptability and stable performance in high-dimensional search spaces. Experimental results show that, compared with the baseline model on a self-built dataset of tomato leaf diseases, FCMNet increases accuracy and precision by 2.61% and 2.85%, respectively, and recall and F1 score by 3.03% and 3.06%, respectively, significantly outperforming existing methods. This research provides a new solution for the identification of tomato leaf diseases and has broad potential for agricultural applications.
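
A toy version of the Fourier-guided attention idea — letting the feature map's spectral magnitude drive channel gating — assuming PyTorch. FGAM itself couples spatial and channel attention and is structured differently; this only shows how frequency content can steer attention weights.

```python
import torch
import torch.nn as nn

class FourierGuidedChannelGate(nn.Module):
    """Channel attention driven by spectral magnitude: pool the
    rFFT magnitude per channel, map it through a small MLP, and
    gate the input channels with the result."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mag = torch.fft.rfft2(x, norm="ortho").abs()   # spectral magnitude
        gates = self.mlp(mag.mean(dim=(-2, -1)))       # (B, C) channel gates
        return x * gates[:, :, None, None]

out = FourierGuidedChannelGate(64)(torch.randn(2, 64, 32, 32))
```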
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)

21 pages, 5527 KiB  
Article
SGNet: A Structure-Guided Network with Dual-Domain Boundary Enhancement and Semantic Fusion for Skin Lesion Segmentation
by Haijiao Yun, Qingyu Du, Ziqing Han, Mingjing Li, Le Yang, Xinyang Liu, Chao Wang and Weitian Ma
Sensors 2025, 25(15), 4652; https://doi.org/10.3390/s25154652 - 27 Jul 2025
Viewed by 278
Abstract
Segmentation of skin lesions in dermoscopic images is critical for the accurate diagnosis of skin cancers, particularly malignant melanoma, yet it is hindered by irregular lesion shapes, blurred boundaries, low contrast, and artifacts such as hair interference. Conventional deep learning methods, typically based on UNet or Transformer architectures, often fail to fully exploit lesion features and incur high computational costs, compromising precise lesion delineation. To overcome these challenges, we propose SGNet, a structure-guided network integrating a hybrid CNN–Mamba framework for robust skin lesion segmentation. SGNet employs the Visual Mamba (VMamba) encoder to efficiently extract multi-scale features, followed by the Dual-Domain Boundary Enhancer (DDBE), which refines boundary representations and suppresses noise through spatial and frequency-domain processing. The Semantic-Texture Fusion Unit (STFU) adaptively integrates low-level texture with high-level semantic features, while the Structure-Aware Guidance Module (SAGM) generates coarse segmentation maps to provide global structural guidance. The Guided Multi-Scale Refiner (GMSR) further optimizes boundary details through a multi-scale semantic attention mechanism. Comprehensive experiments on the ISIC2017, ISIC2018, and PH2 datasets demonstrate SGNet's superior performance, with average improvements of 3.30% in mean Intersection over Union (mIoU) and 1.77% in Dice Similarity Coefficient (DSC) compared to state-of-the-art methods. Ablation studies confirm the effectiveness of each component, highlighting SGNet's exceptional accuracy and robust generalization for computer-aided dermatological diagnosis.
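
The frequency-domain half of boundary enhancement can be illustrated with a fixed high-pass step: suppress the lowest spatial frequencies and add the residue back, which sharpens edges. The DDBE is learned and two-domain; this sketch, with an arbitrary cutoff radius and gain, only shows the underlying mechanism.

```python
import torch

def frequency_boundary_boost(x: torch.Tensor, radius: int = 4, gain: float = 0.5) -> torch.Tensor:
    """Zero a small low-frequency block of the centered spectrum,
    invert the FFT to get an edge map, and add it back to the input."""
    spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
    h, w = x.shape[-2:]
    cy, cx = h // 2, w // 2
    spec[..., cy - radius:cy + radius, cx - radius:cx + radius] = 0
    edges = torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1)),
                            norm="ortho").real
    return x + gain * edges

sharpened = frequency_boundary_boost(torch.randn(1, 3, 128, 128))
```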
(This article belongs to the Section Biomedical Sensors)

20 pages, 28899 KiB  
Article
MSDP-Net: A Multi-Scale Domain Perception Network for HRRP Target Recognition
by Hongxu Li, Xiaodi Li, Zihan Xu, Xinfei Jin and Fulin Su
Remote Sens. 2025, 17(15), 2601; https://doi.org/10.3390/rs17152601 - 26 Jul 2025
Viewed by 321
Abstract
High-resolution range profile (HRRP) recognition serves as a foundational task in radar automatic target recognition (RATR), enabling robust classification under all-day and all-weather conditions. However, existing approaches often struggle to simultaneously capture the multi-scale spatial dependencies and global spectral relationships inherent in HRRP signals, limiting their effectiveness in complex scenarios. To address these limitations, we propose a novel multi-scale domain perception network tailored for HRRP-based target recognition, called MSDP-Net. MSDP-Net introduces a hybrid spatial–spectral representation learning strategy through a multiple-domain perception HRRP (DP-HRRP) encoder, which integrates multi-head convolutions to extract spatial features across diverse receptive fields, and frequency-aware filtering to enhance critical spectral components. To further enhance feature fusion, we design a hierarchical scale fusion (HSF) branch that employs stacked semantically enhanced scale fusion (SESF) blocks to progressively aggregate information from fine to coarse scales in a bottom-up manner. This architecture enables MSDP-Net to effectively model complex scattering patterns and aspect-dependent variations. Extensive experiments on both simulated and measured datasets demonstrate the superiority of MSDP-Net, achieving 80.75% accuracy on the simulated dataset and 94.42% on the measured dataset, highlighting its robustness and practical applicability.
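
Frequency-aware filtering of a 1-D profile can be sketched as a learnable gate over the rFFT spectrum; a minimal PyTorch version under our own assumptions (the paper's filter design and placement in the encoder are not specified here).

```python
import torch
import torch.nn as nn

class FrequencyAwareFilter(nn.Module):
    """Learnable gate over the half-spectrum of a 1-D signal:
    emphasize informative spectral components, then return to the
    range domain via the inverse rFFT."""

    def __init__(self, length: int):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(length // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, L)
        spec = torch.fft.rfft(x, norm="ortho")
        return torch.fft.irfft(spec * self.gate, n=x.shape[-1], norm="ortho")

y = FrequencyAwareFilter(256)(torch.randn(4, 16, 256))   # same shape as input
```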

21 pages, 1936 KiB  
Article
FFT-RDNet: A Time–Frequency-Domain-Based Intrusion Detection Model for IoT Security
by Bingjie Xiang, Renguang Zheng, Kunsan Zhang, Chaopeng Li and Jiachun Zheng
Sensors 2025, 25(15), 4584; https://doi.org/10.3390/s25154584 - 24 Jul 2025
Viewed by 277
Abstract
Resource-constrained Internet of Things (IoT) devices demand efficient and robust intrusion detection systems (IDSs) to counter evolving cyber threats. Traditional IDS models, however, struggle with high computational complexity and inadequate feature extraction, limiting their accuracy and generalizability in IoT environments. To address this, we propose FFT-RDNet, a lightweight IDS framework leveraging depthwise separable convolution and frequency-domain feature fusion. An ADASYN–Tomek Links hybrid strategy first addresses class imbalance. The core innovation of FFT-RDNet lies in its novel two-dimensional spatial feature modeling approach, realized through a dedicated dual-path feature embedding module. One branch extracts discriminative statistical features in the time domain, while the other transforms the data into the frequency domain via the Fast Fourier Transform (FFT) to capture the essential energy distribution characteristics. These time–frequency features are fused to construct a two-dimensional feature space, which is then processed by a streamlined residual network using depthwise separable convolution. This network effectively captures complex periodic attack patterns with minimal computational overhead. Comprehensive evaluation on the NSL-KDD and CIC-IDS2018 datasets shows that FFT-RDNet outperforms state-of-the-art neural network IDSs in accuracy, precision, recall, and F1 score (improvements of 0.22–1%). Crucially, it achieves superior accuracy with significantly reduced computational complexity, demonstrating high efficiency for resource-constrained IoT security deployments.
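
The ADASYN–Tomek Links rebalancing step can be reproduced with imbalanced-learn's stock samplers. A sketch on toy data; the dataset shape and parameters are placeholders, not the paper's configuration.

```python
from collections import Counter

from imblearn.over_sampling import ADASYN
from imblearn.under_sampling import TomekLinks
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification

# Imbalanced toy data standing in for NSL-KDD-style flow features.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

resampler = Pipeline([
    ("adasyn", ADASYN(random_state=0)),   # oversample the minority (attack) class
    ("tomek", TomekLinks()),              # then drop borderline majority/minority pairs
])
X_res, y_res = resampler.fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))
```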
(This article belongs to the Section Internet of Things)

19 pages, 9361 KiB  
Article
A Multi-Domain Enhanced Network for Underwater Image Enhancement
by Tianmeng Sun, Yinghao Zhang, Jiamin Hu, Haiyuan Cui and Teng Yu
Information 2025, 16(8), 627; https://doi.org/10.3390/info16080627 - 23 Jul 2025
Viewed by 157
Abstract
Owing to the intricate variability of underwater environments, images suffer from degradation including light absorption, scattering, and color distortion. U-Net architectures severely limit global context utilization due to fixed-receptive-field convolutions, while traditional attention mechanisms incur quadratic complexity and fail to efficiently fuse spatial–frequency features. Unlike methods focused on local enhancement, the proposed HMENet integrates a transformer sub-network for long-range dependency modeling and dual-domain attention for bidirectional spatial–frequency fusion. This design enlarges the receptive field while maintaining linear complexity. On the UIEB and EUVP datasets, HMENet achieves PSNR/SSIM of 25.96/0.946 and 27.92/0.927, surpassing HCLR-Net by 0.97 dB and 1.88 dB, respectively.

27 pages, 8957 KiB  
Article
DFAN: Single Image Super-Resolution Using Stationary Wavelet-Based Dual Frequency Adaptation Network
by Gyu-Il Kim and Jaesung Lee
Symmetry 2025, 17(8), 1175; https://doi.org/10.3390/sym17081175 - 23 Jul 2025
Viewed by 277
Abstract
Single image super-resolution is the inverse problem of reconstructing a high-resolution image from its low-resolution counterpart. Although recent Transformer-based architectures leverage global context integration to improve reconstruction quality, they often overlook frequency-specific characteristics, resulting in the loss of high-frequency information. To address this limitation, we propose the Dual Frequency Adaptive Network (DFAN). DFAN first decomposes the input into low- and high-frequency components via the Stationary Wavelet Transform. In the low-frequency branch, Swin Transformer layers restore global structures and color consistency. In contrast, the high-frequency branch features a dedicated module that combines Directional Convolution with Residual Dense Blocks, precisely reinforcing edges and textures. A frequency fusion module then adaptively merges these complementary features using depthwise and pointwise convolutions, achieving a balanced reconstruction. During training, we introduce a frequency-aware multi-term loss alongside the standard pixel-wise loss to explicitly encourage high-frequency preservation. Extensive experiments on the Set5, Set14, BSD100, Urban100, and Manga109 benchmarks show that DFAN achieves gains of up to +0.64 dB in peak signal-to-noise ratio, +0.01 in structural similarity index measure, and −0.01 in learned perceptual image patch similarity over the strongest frequency-domain baselines, while also delivering visibly sharper textures and cleaner edges. By unifying spatial and frequency-domain advantages, DFAN effectively mitigates high-frequency degradation and enhances SISR performance.
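
The stationary wavelet split that feeds DFAN's two branches is available off the shelf in PyWavelets; a minimal sketch of the decomposition and its exact reconstruction (the Haar wavelet and random image are placeholders).

```python
import numpy as np
import pywt

# Undecimated (stationary) wavelet transform: every band keeps the
# full spatial resolution, unlike the ordinary DWT.
img = np.random.rand(256, 256)                     # side must divide 2**level
(cA, (cH, cV, cD)), = pywt.swt2(img, wavelet="haar", level=1)

low_freq = cA                        # global structure branch input
high_freq = np.stack([cH, cV, cD])   # edge/texture branch input

recon = pywt.iswt2([(cA, (cH, cV, cD))], wavelet="haar")
assert np.allclose(recon, img)       # SWT is exactly invertible
```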
(This article belongs to the Section Computer)

21 pages, 2308 KiB  
Article
Forgery-Aware Guided Spatial–Frequency Feature Fusion for Face Image Forgery Detection
by Zhenxiang He, Zhihao Liu and Ziqi Zhao
Symmetry 2025, 17(7), 1148; https://doi.org/10.3390/sym17071148 - 18 Jul 2025
Viewed by 293
Abstract
The rapid development of deepfake technologies has led to the widespread proliferation of facial image forgeries, raising significant concerns over identity theft and the spread of misinformation. Although recent dual-domain detection approaches that integrate spatial and frequency features have achieved noticeable progress, they still suffer from limited sensitivity to local forgery regions and inadequate interaction between spatial and frequency information in practical applications. To address these challenges, we propose a novel forgery-aware guided spatial–frequency feature fusion network. A lightweight U-Net is employed to generate pixel-level saliency maps by leveraging structural symmetry and semantic consistency, without relying on ground-truth masks. These maps dynamically guide the fusion of spatial features (from an improved Swin Transformer) and frequency features (via Haar wavelet transforms). Cross-domain attention, channel recalibration, and spatial gating are introduced to enhance feature complementarity and regional discrimination. Extensive experiments conducted on two benchmark face forgery datasets, FaceForensics++ and Celeb-DFv2, show that the proposed method consistently outperforms existing state-of-the-art techniques in terms of detection accuracy and generalization capability. Future work includes improving robustness under compression, incorporating temporal cues, extending to multimodal scenarios, and evaluating model efficiency for real-world deployment.
(This article belongs to the Section Computer)

22 pages, 4882 KiB  
Article
Dual-Branch Spatio-Temporal-Frequency Fusion Convolutional Network with Transformer for EEG-Based Motor Imagery Classification
by Hao Hu, Zhiyong Zhou, Zihan Zhang and Wenyu Yuan
Electronics 2025, 14(14), 2853; https://doi.org/10.3390/electronics14142853 - 17 Jul 2025
Viewed by 248
Abstract
The decoding of motor imagery (MI) electroencephalogram (EEG) signals is crucial for motor control and rehabilitation. However, as feature extraction is the core component of the decoding process, traditional methods, often limited to single-feature domains or shallow time-frequency fusion, struggle to comprehensively capture the spatio-temporal-frequency characteristics of the signals, thereby limiting decoding accuracy. To address these limitations, this paper proposes a dual-branch neural network architecture with multi-domain feature fusion, the dual-branch spatio-temporal-frequency fusion convolutional network with Transformer (DB-STFFCNet). The DB-STFFCNet model consists of three modules: the spatiotemporal feature extraction module (STFE), the frequency feature extraction module (FFE), and the feature fusion and classification module. The STFE module employs a lightweight multi-dimensional attention network combined with a temporal Transformer encoder, capable of simultaneously modeling local fine-grained features and global spatiotemporal dependencies, effectively integrating spatiotemporal information and enhancing feature representation. The FFE module constructs a hierarchical feature refinement structure by leveraging the fast Fourier transform (FFT) and multi-scale frequency convolutions, while a frequency-domain Transformer encoder captures the global dependencies among frequency domain features, thus improving the model's ability to represent key frequency information. Finally, the fusion module effectively consolidates the spatiotemporal and frequency features to achieve accurate classification. To evaluate the feasibility of the proposed method, experiments were conducted on the BCI Competition IV-2a and IV-2b public datasets, achieving accuracies of 83.13% and 89.54%, respectively, outperforming existing methods. This study provides a novel solution for joint time-frequency representation learning in EEG analysis.
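
As a hand-crafted analogue of the FFE branch's frequency features, the sketch below computes per-channel mu- and beta-band power with Welch's method. The sampling rate, channel count, and bands are conventional motor imagery choices, not values taken from the paper.

```python
import numpy as np
from scipy.signal import welch

def band_powers(eeg: np.ndarray, fs: float = 250.0) -> np.ndarray:
    """Average PSD per channel in the mu (8-13 Hz) and beta
    (13-30 Hz) bands -- classic frequency-domain MI features."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=-1)
    feats = []
    for lo, hi in ((8, 13), (13, 30)):
        band = (freqs >= lo) & (freqs < hi)
        feats.append(psd[..., band].mean(axis=-1))
    return np.stack(feats, axis=-1)        # shape: (channels, 2)

eeg = np.random.randn(22, 1000)            # 22 channels, 4 s at 250 Hz
features = band_powers(eeg)
```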
(This article belongs to the Special Issue Artificial Intelligence Methods for Biomedical Data Processing)

21 pages, 4199 KiB  
Article
Time–Frequency-Domain Fusion Cross-Attention Fault Diagnosis Method Based on Dynamic Modeling of Bearing Rotor System
by Shiyu Xing, Zinan Wang, Rui Zhao, Xirui Guo, Aoxiang Liu and Wenfeng Liang
Appl. Sci. 2025, 15(14), 7908; https://doi.org/10.3390/app15147908 - 15 Jul 2025
Viewed by 267
Abstract
Deep learning (DL) and machine learning (ML) have advanced rapidly, driving significant progress in intelligent fault diagnosis (IFD) of bearings. However, methods such as self-attention capture features only within a single sequence and fail to effectively extract and fuse time- and frequency-domain characteristics from raw signals, which remains a critical bottleneck. To tackle this, a dual-channel cross-attention dynamic fault diagnosis network for time–frequency signals is proposed. The model exploits the intrinsic correlations between time-domain and frequency-domain features, overcoming the single-sequence limitation. Simulation and experimental data validate the method: it achieves over 95% diagnostic accuracy and effectively captures complex fault patterns. This work provides a theoretical basis for improved fault identification in bearing–rotor systems.
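
The core cross-attention step — tokens from one domain querying the other — reduces to a few lines. A bare-bones sketch with random matrices standing in for the learned projections; the full network adds heads, residual paths, and normalization.

```python
import torch
import torch.nn.functional as F

def cross_attention(q_feats, kv_feats, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: q_feats queries kv_feats,
    so one domain's tokens are re-expressed in terms of the other's."""
    q = q_feats @ w_q                      # (B, Nq, d)
    k = kv_feats @ w_k                     # (B, Nk, d)
    v = kv_feats @ w_v
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

d = 64
time_tokens = torch.randn(8, 100, d)       # embedding of the raw signal
freq_tokens = torch.randn(8, 100, d)       # embedding of its FFT magnitude
w = [torch.randn(d, d) / d ** 0.5 for _ in range(3)]
t2f = cross_attention(time_tokens, freq_tokens, *w)  # time queries frequency
```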

23 pages, 1603 KiB  
Article
Uncertainty-Based Fusion Method for Structural Modal Parameter Identification
by Xiaoteng Liu, Zirui Dong, Hongxia Ji, Zhenjiang Yue and Jie Kang
Sensors 2025, 25(14), 4397; https://doi.org/10.3390/s25144397 - 14 Jul 2025
Viewed by 322
Abstract
Structural modal parameter identification methods can be classified into time-domain and frequency-domain approaches. In practice, the two types of methods offer different advantages, and the estimated modal parameters are always subject to statistical uncertainty due to measurement noise. In this work, an uncertainty-based fusion method for structural mode identification is proposed to merge the advantages of the different methods. The extensively applied time-domain AutoRegressive (AR) and frequency-domain Left-Matrix Fraction (LMF) models are expressed in a unified parametric model. With this unified model, a generalized framework is developed to identify the modal parameters of structures and to compute the variances associated with the modal parameter estimates. The final modal parameter estimates are computed as the inverse-variance weighted sum of the results identified by the different methods. A numerical and an experimental example demonstrate that the proposed method obtains reliable modal parameter estimates, substantially mitigating the occurrence of extremely large estimation errors. Furthermore, the fusion method demonstrates enhanced identification capability, effectively reducing the likelihood of missing structural modes.
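
The fusion rule itself is classical inverse-variance weighting: the fused estimate is x_hat = (Σ x_i/σ_i²)/(Σ 1/σ_i²) with variance 1/(Σ 1/σ_i²), so the more certain method dominates. A sketch; the numbers are invented for illustration.

```python
import numpy as np

def inverse_variance_fusion(estimates: np.ndarray, variances: np.ndarray):
    """Fuse estimates of one modal parameter from several methods.
    Low-variance estimates get large weights, which is what
    suppresses the occasional large error from any single method."""
    w = 1.0 / variances
    fused = np.sum(w * estimates) / np.sum(w)
    fused_var = 1.0 / np.sum(w)
    return fused, fused_var

# Natural frequency of one mode from AR (time) and LMF (frequency) fits.
freq, var = inverse_variance_fusion(np.array([2.513, 2.498]),
                                    np.array([1e-4, 4e-4]))
```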
(This article belongs to the Section Fault Diagnosis & Sensors)
