Search Results (806)

Search Parameters:
Keywords = deep frequency features extraction

25 pages, 34242 KB  
Article
ImbDef-GAN: Defect Image-Generation Method Based on Sample Imbalance
by Dengbiao Jiang, Nian Tao, Kelong Zhu, Yiming Wang and Haijian Shao
J. Imaging 2025, 11(10), 367; https://doi.org/10.3390/jimaging11100367 - 16 Oct 2025
Abstract
In industrial settings, defect detection using deep learning typically requires large numbers of defective samples. However, defective products are rare on production lines, creating a scarcity of defect samples and an overabundance of samples that contain only background. We introduce ImbDef-GAN, a sample-imbalance generative framework, to address three persistent limitations in defect image generation: unnatural transitions at defect-background boundaries, misalignment between defects and their masks, and out-of-bounds defect placement. The framework operates in two stages: (i) background image generation and (ii) defect image generation conditioned on the generated background. In the background image-generation stage, a lightweight StyleGAN3 variant jointly generates the background image and its segmentation mask. A Progress-coupled Gated Detail Injection module uses global scheduling driven by training progress and per-pixel gating to inject high-frequency information in a controlled manner, thereby enhancing background detail while preserving training stability. In the defect image-generation stage, the design augments the background generator with a residual branch that extracts defect features. By blending defect features with a smoothing coefficient, the resulting defect boundaries transition more naturally and gradually. A mask-aware matching discriminator enforces consistency between each defect image and its mask. In addition, an Edge Structure Loss and a Region Consistency Loss strengthen morphological fidelity and spatial constraints within the valid mask region. Extensive experiments on the MVTec AD dataset demonstrate that ImbDef-GAN surpasses existing methods in both the realism and diversity of generated defects. When the generated data are used to train a downstream detector, YOLOv11 achieves a 5.4% improvement in mAP@0.5, indicating that the proposed approach effectively improves detection accuracy under sample imbalance.
(This article belongs to the Section Image and Video Processing)
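
As a rough illustration of the boundary-smoothing idea described above (an assumption about the mechanism, not the authors' formulation), the Python sketch below composites a defect patch onto a background through a Gaussian-softened mask so the defect-background transition is gradual rather than abrupt; all names and parameters are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def soft_composite(background: np.ndarray, defect: np.ndarray,
                   mask: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Blend a defect patch into a background using a softened binary mask.

    background, defect: float arrays in [0, 1] with the same shape.
    mask: binary array (1 inside the defect region).
    sigma: width of the Gaussian used to soften the mask edge; larger
           values give a more gradual defect-background transition.
    """
    soft = gaussian_filter(mask.astype(float), sigma=sigma)
    soft = np.clip(soft, 0.0, 1.0)
    return soft * defect + (1.0 - soft) * background

# Toy usage: a bright square "defect" blended into a flat background.
bg = np.full((64, 64), 0.2)
df = np.full((64, 64), 0.9)
m = np.zeros((64, 64))
m[24:40, 24:40] = 1.0
out = soft_composite(bg, df, m, sigma=3.0)
```
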
24 pages, 2221 KB  
Article
Multi-Scale Frequency-Aware Transformer for Pipeline Leak Detection Using Acoustic Signals
by Menghan Chen, Yuchen Lu, Wangyu Wu, Yanchen Ye, Bingcai Wei and Yao Ni
Sensors 2025, 25(20), 6390; https://doi.org/10.3390/s25206390 - 16 Oct 2025
Abstract
Pipeline leak detection through acoustic signal measurement faces critical challenges, including insufficient utilization of time-frequency domain features, poor adaptability to noisy environments, and inadequate exploitation of frequency-domain prior knowledge in existing deep learning approaches. This paper proposes a Multi-Scale Frequency-Aware Transformer (MSFAT) architecture that integrates measurement-based acoustic signal analysis with artificial intelligence techniques. The MSFAT framework consists of four core components: a frequency-aware embedding layer that achieves joint representation learning of time-frequency dual-domain features through parallel temporal convolution and frequency transformation, a multi-head frequency attention mechanism that dynamically adjusts attention weights based on spectral distribution using frequency features as modulation signals, an adaptive noise filtering module that integrates noise detection, signal enhancement, and adaptive fusion functions through end-to-end joint optimization, and a multi-scale feature aggregation mechanism that extracts discriminative global representations through complementary pooling strategies. The proposed method addresses the fundamental limitations of traditional measurement-based detection systems by incorporating domain-specific prior knowledge into neural network architecture design. Experimental validation demonstrates that MSFAT achieves 97.2% accuracy, with accuracy and F1-score improvements of 10.5% and 10.9%, respectively, over standard Transformer approaches. The model maintains robust detection performance across signal-to-noise ratio conditions ranging from 5 to 30 dB, demonstrating superior adaptability to complex industrial measurement environments. Ablation studies confirm the effectiveness of each innovative module, with frequency-aware mechanisms contributing most significantly to the enhanced measurement precision and reliability in pipeline leak detection applications.
(This article belongs to the Section Intelligent Sensors)
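
A minimal sketch of the dual-domain embedding idea (parallel temporal convolution plus a frequency branch), assuming a simple concatenation design rather than the exact MSFAT layer:

```python
import torch
import torch.nn as nn

class FrequencyAwareEmbedding(nn.Module):
    """Joint time-frequency embedding: a temporal convolution branch and an
    FFT-magnitude branch run in parallel and are concatenated per frame.
    Illustrative only; layer sizes are placeholders."""
    def __init__(self, d_model: int = 64, win: int = 256):
        super().__init__()
        self.time_conv = nn.Conv1d(1, d_model // 2, kernel_size=9, padding=4)
        self.freq_proj = nn.Linear(win // 2 + 1, d_model // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, win) -- one acoustic frame per item
        t = self.time_conv(x).mean(dim=-1)                 # (batch, d_model//2)
        spec = torch.fft.rfft(x.squeeze(1), dim=-1).abs()  # (batch, win//2+1)
        f = self.freq_proj(spec)                           # (batch, d_model//2)
        return torch.cat([t, f], dim=-1)                   # (batch, d_model)

emb = FrequencyAwareEmbedding()
frame = torch.randn(8, 1, 256)
print(emb(frame).shape)  # torch.Size([8, 64])
```
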

25 pages, 2474 KB  
Article
Data Augmentation-Enhanced Myocardial Infarction Classification and Localization Using a ResNet-Transformer Cascaded Network
by Yunfan Chen, Qi Gao, Jinxing Ye, Yuting Li and Xiangkui Wan
Biology 2025, 14(10), 1425; https://doi.org/10.3390/biology14101425 - 16 Oct 2025
Abstract
Accurate diagnosis of myocardial infarction (MI) holds significant clinical importance for public health systems. Deep learning-based ECG classification and localization methods can automatically extract features, thereby overcoming the dependence on manual feature extraction in traditional methods. However, these methods still face challenges such as insufficient utilization of dynamic information in cardiac cycles, inadequate ability to capture both global and local features, and data imbalance. To address these issues, this paper proposes a ResNet-Transformer cascaded network (RTCN) to process time-frequency features of ECG signals generated by the S-transform. First, the S-transform is applied to adaptively extract global time-frequency features from the time-frequency domain of ECG signals. Its scalable Gaussian window and high phase resolution can effectively capture the dynamic changes in cardiac cycles that traditional methods often fail to extract. Then, we develop an architecture that combines the Transformer attention mechanism with ResNet to extract multi-scale local features and global temporal dependencies collaboratively. This compensates for existing deep learning models’ insufficient ability to capture both global and local features simultaneously. To address the data imbalance problem, the Denoising Diffusion Probabilistic Model (DDPM) is applied to synthesize high-quality ECG samples for minority classes, increasing the inter-patient accuracy from 61.66% to 68.39%. Gradient-weighted Class Activation Mapping (Grad-CAM) visualization confirms that the model’s attention areas are highly consistent with pathological features, verifying its clinical interpretability.
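
For context, the S-transform mentioned above (the Stockwell transform) is commonly defined as follows; this is the standard textbook form, not a detail taken from the paper:

```latex
S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\,
\exp\!\left( -\frac{(\tau - t)^2 f^2}{2} \right)
\exp(-i 2 \pi f t)\, dt
```

The Gaussian window width scales as 1/|f|, which is what the abstract calls the scalable Gaussian window: finer time resolution at high frequencies and finer frequency resolution at low frequencies.
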

21 pages, 1706 KB  
Article
Spatiotemporal Feature Learning for Daily-Life Cough Detection Using FMCW Radar
by Saihu Lu, Yuhan Liu, Guangqiang He, Zhongrui Bai, Zhenfeng Li, Pang Wu, Xianxiang Chen, Lidong Du, Peng Wang and Zhen Fang
Bioengineering 2025, 12(10), 1112; https://doi.org/10.3390/bioengineering12101112 - 15 Oct 2025
Abstract
Cough is a key symptom reflecting respiratory health, with its frequency and pattern providing valuable insights into disease progression and clinical management. Objective and reliable cough detection systems are therefore of broad significance for healthcare and remote monitoring. However, existing algorithms often struggle to jointly model spatial and temporal information, limiting their robustness in real-world applications. To address this issue, we propose a cough recognition framework based on frequency-modulated continuous-wave (FMCW) radar, integrating a deep convolutional neural network (CNN) with a Self-Attention mechanism. The CNN extracts spatial features from range-Doppler maps, while Self-Attention captures temporal dependencies, and effective data augmentation strategies enhance generalization by simulating position variations and masking local dependencies. To rigorously evaluate practicality, we collected a large-scale radar dataset covering diverse positions, orientations, and activities. Experimental results demonstrate that, under subject-independent five-fold cross-validation, the proposed model achieved a mean F1-score of 0.974 ± 0.016 and an accuracy of 99.05 ± 0.55%, further supported by high precision of 98.77 ± 1.05%, recall of 96.07 ± 2.16%, and specificity of 99.73 ± 0.23%. These results confirm that our method is not only robust in realistic scenarios but also provides a practical pathway toward continuous, non-invasive, and privacy-preserving respiratory health monitoring in both clinical and telehealth applications.
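
For background, range-Doppler maps of the kind the CNN consumes are conventionally obtained by a two-dimensional FFT over the fast-time (range) and slow-time (Doppler) axes of a frame of de-chirped FMCW signals; this generic sketch is not the authors' processing chain:

```python
import numpy as np

def range_doppler_map(frame: np.ndarray) -> np.ndarray:
    """Compute a range-Doppler magnitude map from one FMCW frame.

    frame: (n_chirps, n_samples) array of de-chirped beat signals,
           chirps along slow time, ADC samples along fast time.
    """
    rng = np.fft.fft(frame, axis=1)                         # range FFT
    rd = np.fft.fftshift(np.fft.fft(rng, axis=0), axes=0)   # Doppler FFT
    return 20 * np.log10(np.abs(rd) + 1e-12)                # magnitude in dB

frame = np.random.randn(64, 256)  # 64 chirps x 256 samples (synthetic)
rd_map = range_doppler_map(frame)
print(rd_map.shape)  # (64, 256)
```
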

20 pages, 4914 KB  
Article
Dual-Channel Parallel Multimodal Feature Fusion for Bearing Fault Diagnosis
by Wanrong Li, Haichao Cai, Xiaokang Yang, Yujun Xue, Jun Ye and Xiangyi Hu
Machines 2025, 13(10), 950; https://doi.org/10.3390/machines13100950 - 15 Oct 2025
Abstract
In recent years, the powerful feature extraction capabilities of deep learning have attracted widespread attention in the field of bearing fault diagnosis. To address the limitations of single-modal and single-channel feature extraction methods, which often result in incomplete information representation and difficulty in obtaining high-quality fault features, this paper proposes a dual-channel parallel multimodal feature fusion model for bearing fault diagnosis. In this method, the one-dimensional vibration signals are first transformed into two-dimensional time–frequency representations using the continuous wavelet transform (CWT). Subsequently, both the one-dimensional vibration signals and the two-dimensional time–frequency representations are fed simultaneously into the dual-branch parallel model. Within this architecture, the first branch employs a combination of a one-dimensional convolutional neural network (1DCNN) and a bidirectional gated recurrent unit (BiGRU) to extract temporal features from the one-dimensional vibration signals. The second branch utilizes a dilated convolutional network to capture spatial time–frequency information from the CWT-derived two-dimensional time–frequency representations. The features extracted by both branches are then input into the feature fusion layer. Furthermore, to leverage fault features more comprehensively, a channel attention mechanism is embedded after the feature fusion layer. This enables the network to focus more effectively on salient features across channels while suppressing interference from redundant features, thereby enhancing the performance and accuracy of the dual-branch network. Finally, the fused fault features are passed to a softmax classifier for fault classification. Experimental results demonstrate that the proposed method achieved an average accuracy of 99.50% on the Case Western Reserve University (CWRU) bearing dataset and 97.33% on the Southeast University (SEU) bearing dataset. These results confirm that the proposed model effectively improves fault diagnosis accuracy and exhibits strong generalization capability.
(This article belongs to the Section Machines Testing and Maintenance)
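
To illustrate the CWT step, a one-dimensional vibration signal can be mapped to a two-dimensional time-frequency image with PyWavelets; the sampling rate, scales, and wavelet below are placeholders, not the paper's settings:

```python
import numpy as np
import pywt

fs = 12_000                       # sampling rate in Hz (assumed)
t = np.arange(0, 0.1, 1 / fs)
signal = np.sin(2 * np.pi * 500 * t) + 0.3 * np.random.randn(t.size)

scales = np.arange(1, 129)        # 128 scales -> 128-row scalogram
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

scalogram = np.abs(coeffs)        # 2D time-frequency representation
print(scalogram.shape)            # (128, 1200)
```
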

16 pages, 5754 KB  
Article
PPG-Net 4: Deep-Learning-Based Approach for Classification of Blood Flow Using Non-Invasive Dual Photoplethysmography (PPG) Signals
by Manisha Samant and Utkarsha Pacharaney
Sensors 2025, 25(20), 6362; https://doi.org/10.3390/s25206362 - 15 Oct 2025
Abstract
Cardiovascular disease diagnosis heavily relies on accurate blood flow assessments, traditionally performed using invasive and often uncomfortable methods like catheterization. This research introduces PPG-Net 4, an innovative deep learning approach for non-invasive blood flow pattern classification using dual photoplethysmography (PPG) signals. By leveraging advanced machine learning techniques, the proposed method addresses critical limitations in current diagnostic technologies. The study employed a novel dual-sensor arrangement capturing PPG signals from two body locations, generating a comprehensive dataset from 75 participants. Advanced signal processing techniques, including mel spectrogram generation and mel-frequency cepstral coefficient extraction, enabled sophisticated feature representation. The deep learning model, PPG-Net 4, demonstrated strong capability in classifying five distinct blood flow patterns: laminar, turbulent, stagnant, pulsatile, and oscillatory. The experimental results revealed strong classification performance, with F1-scores ranging from 0.86 to 0.92 across the different flow patterns. The highest accuracy was observed for pulsatile flow (F1-score: 0.92), underscoring the model’s precision and reliability. This approach not only provides a non-invasive alternative to traditional diagnostic methods but also offers a potentially useful technique for early cardiovascular disease detection and continuous monitoring.
(This article belongs to the Section Optical Sensors)
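
The mel-spectrogram and MFCC extraction mentioned above can be sketched with librosa; the signal and all parameters below are illustrative stand-ins, not the study's configuration:

```python
import numpy as np
import librosa

# Synthetic stand-in for one PPG channel; a real signal would come from
# the acquisition hardware (parameters here are illustrative only).
fs = 100                                   # PPG sampling rate in Hz (assumed)
ppg = np.sin(2 * np.pi * 1.2 * np.arange(0, 10, 1 / fs))  # ~72 bpm pulse

mel = librosa.feature.melspectrogram(y=ppg, sr=fs, n_fft=256,
                                     hop_length=64, n_mels=32)
log_mel = librosa.power_to_db(mel)         # mel spectrogram in dB
mfcc = librosa.feature.mfcc(S=log_mel, n_mfcc=13)  # cepstral coefficients
print(log_mel.shape, mfcc.shape)
```
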

30 pages, 23104 KB  
Article
MSAFNet: Multi-Modal Marine Aquaculture Segmentation via Spatial–Frequency Adaptive Fusion
by Guolong Wu and Yimin Lu
Remote Sens. 2025, 17(20), 3425; https://doi.org/10.3390/rs17203425 - 13 Oct 2025
Abstract
Accurate mapping of marine aquaculture areas is critical for environmental management, marine ecosystem protection, and sustainable resource utilization. However, remote sensing imagery based on single-sensor modalities has inherent limitations when extracting aquaculture zones in complex marine environments. To address this challenge, we constructed a multi-modal dataset from five Chinese coastal regions using cloud detection methods and developed the Multi-modal Spatial–Frequency Adaptive Fusion Network (MSAFNet) for optical-radar data fusion. MSAFNet employs a dual-path architecture utilizing a Multi-scale Dual-path Feature Module (MDFM) that combines CNN and Transformer capabilities to extract multi-scale features. Additionally, it implements a Dynamic Frequency Domain Adaptive Fusion Module (DFAFM) to achieve deep integration of multi-modal features in both the spatial and frequency domains, effectively leveraging the complementary advantages of different sensor data. Results demonstrate that MSAFNet achieves 76.93% mean intersection over union (mIoU), 86.96% mean F1 score (mF1), and 93.26% mean Kappa coefficient (mKappa) in extracting floating raft aquaculture (FRA) and cage aquaculture (CA), significantly outperforming existing methods. Applied to China’s coastal waters, the model generated nearshore aquaculture distribution maps for 2020, demonstrating its generalization capability and practical value in complex marine environments. This approach provides reliable technical support for marine resource management and ecological monitoring.
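
A minimal sketch of fusing two modality feature maps in the frequency domain, in the spirit of DFAFM (an assumed structure, not the published module): each channel's spectra from the two modalities are blended with a learnable gate, then transformed back to the spatial domain.

```python
import torch
import torch.nn as nn

class FrequencyFusion(nn.Module):
    """Fuse two modality feature maps (e.g., optical and SAR) in the
    frequency domain with a learnable per-channel gate. Illustrative only."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(channels, 1, 1))

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (batch, channels, H, W)
        fa = torch.fft.rfft2(a)
        fb = torch.fft.rfft2(b)
        g = torch.sigmoid(self.gate)           # per-channel mixing weight
        fused = g * fa + (1.0 - g) * fb        # convex blend of spectra
        return torch.fft.irfft2(fused, s=a.shape[-2:])

fuse = FrequencyFusion(channels=16)
opt = torch.randn(2, 16, 64, 64)
sar = torch.randn(2, 16, 64, 64)
print(fuse(opt, sar).shape)  # torch.Size([2, 16, 64, 64])
```
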

25 pages, 14038 KB  
Article
Infrared Target Detection Based on Image Enhancement and an Improved Feature Extraction Network
by Peng Wu, Zhen Zuo, Shaojing Su and Boyuan Zhao
Drones 2025, 9(10), 695; https://doi.org/10.3390/drones9100695 - 10 Oct 2025
Abstract
Small unmanned aerial vehicles (UAVs) pose significant security challenges due to their low detectability in infrared imagery, particularly when appearing as small, low-contrast targets against complex backgrounds. This paper presents a novel infrared target detection framework that addresses these challenges through two key innovations: an improved Gaussian filtering-based image enhancement module and a hierarchical feature extraction network. The proposed image enhancement module incorporates a vertical weight function to handle abnormal feature values while preserving edge information, effectively improving image contrast and reducing noise. The detection network introduces the SODMamba backbone with Deep Feature Perception Modules (DFPMs) that leverage high-frequency components to enhance small target features. Extensive experiments on the custom SIDD dataset demonstrate that our method achieves superior detection performance across diverse backgrounds (urban, mountain, sea, and sky), with mAP@0.5 reaching 96.0%, 74.1%, 92.0%, and 98.7%, respectively. Notably, our model maintains a lightweight profile with only 6.2M parameters and enables real-time inference, which is crucial for practical deployment. Real-world validation experiments confirm the effectiveness and efficiency of the proposed approach for practical UAV detection applications.
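
The enhancement idea of boosting contrast while suppressing noise can be sketched generically as Gaussian smoothing plus an amplified high-frequency residual; this unsharp-mask-style sketch is an assumption and omits the paper's vertical weight function:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance(image: np.ndarray, sigma: float = 2.0, k: float = 1.5) -> np.ndarray:
    """Denoise with a Gaussian, then amplify the high-frequency residual
    so small, low-contrast targets stand out. image: float array in [0, 1]."""
    smooth = gaussian_filter(image, sigma=sigma)   # noise-suppressed base
    high = image - smooth                          # high-frequency detail
    return np.clip(smooth + k * high, 0.0, 1.0)

ir = np.random.rand(128, 128).astype(np.float32)   # synthetic IR frame
out = enhance(ir)
```
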

13 pages, 2381 KB  
Article
DCNN–Transformer Hybrid Network for Robust Feature Extraction in FMCW LiDAR Ranging
by Wenhao Xu, Pansong Zhang, Guohui Yuan, Shichang Xu, Longfei Li, Junxiang Zhang, Longfei Li, Tianyu Li and Zhuoran Wang
Photonics 2025, 12(10), 995; https://doi.org/10.3390/photonics12100995 - 10 Oct 2025
Abstract
Frequency-Modulated Continuous-Wave (FMCW) Laser Detection and Ranging (LiDAR) systems are widely used due to their high accuracy and resolution. Nevertheless, conventional distance extraction methods often lack robustness in noisy and complex environments. To address this limitation, we propose a deep learning-based signal extraction framework that integrates a Dual Convolutional Neural Network (DCNN) with a Transformer model. The DCNN extracts multi-scale spatial features through multi-layer and pointwise convolutions, while the Transformer employs a self-attention mechanism to capture global temporal dependencies of the beat-frequency signals. The proposed DCNN–Transformer network is evaluated through beat-frequency signal inversion experiments across distances ranging from 3 m to 40 m. The experimental results show that the method achieves a mean absolute error (MAE) of 4.1 mm and a root-mean-square error (RMSE) of 3.08 mm. These results demonstrate that the proposed approach provides stable and accurate predictions, with strong generalization ability and robustness for FMCW LiDAR systems.
(This article belongs to the Section Optical Interaction Science)
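
For reference, in a sawtooth-modulated FMCW system the beat frequency f_b maps to range R through the standard relation below, where B is the sweep bandwidth, T the sweep period, and c the speed of light (textbook form, not taken from the paper):

```latex
f_b = \frac{2 B R}{c T} \qquad \Longrightarrow \qquad R = \frac{c\, T\, f_b}{2 B}
```

For example, with B = 1 GHz and T = 1 ms, a 40 m target produces a beat frequency of roughly 267 kHz.
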

26 pages, 52162 KB  
Article
ASFT-Transformer: A Fast and Accurate Framework for EEG-Based Pilot Fatigue Recognition
by Jiming Liu, Yi Zhou, Qileng He and Zhenxing Gao
Sensors 2025, 25(19), 6256; https://doi.org/10.3390/s25196256 - 9 Oct 2025
Abstract
Objective evaluation of pilot fatigue is crucial for enhancing aviation safety. Although electroencephalography (EEG) is regarded as an effective tool for recognizing pilot fatigue, the direct application of deep learning models to raw EEG signals faces significant challenges due to issues such as massive data volume, excessively long training time, and model overfitting. Moreover, existing feature-based methods often suffer from data redundancy due to the lack of effective feature and channel selection, which compromises the model’s recognition efficiency and accuracy. To address these issues, this paper proposes a framework, named ASFT-Transformer, for fast and accurate detection of pilot fatigue. This framework first extracts time-domain and frequency-domain features from the four EEG frequency bands. Subsequently, it introduces a feature and channel selection strategy based on one-way analysis of variance and support vector machine (ANOVA-SVM) to identify the most fatigue-relevant features and pivotal EEG channels. Finally, the FT-Transformer (Feature Tokenizer + Transformer) model is employed for classification based on the selected features, transforming the fatigue recognition problem into a tabular data classification task. EEG data were collected from 32 pilots before and after actual simulator training to validate the proposed method. The results show that ASFT-Transformer achieved average accuracies of 97.24% and 87.72% under cross-clip and cross-subject data partitioning, respectively, which were significantly superior to several mainstream machine learning and deep learning models. Under the two types of cross-validation, the proposed feature and channel selection strategy not only improved the average accuracy by 2.45% and 8.07%, respectively, but also drastically reduced the average training time from above 1 h to under 10 min. This study offers civil aviation authorities and airline operators a tool to manage pilot fatigue objectively and effectively, thereby contributing to flight safety.
(This article belongs to the Section Biomedical Sensors)
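
A minimal scikit-learn sketch of an ANOVA-based feature selection step feeding an SVM (a generic pipeline; the data, k, and SVM settings are placeholders, not the paper's):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 200 clips x 120 EEG features, binary fatigue label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 120))
y = rng.integers(0, 2, size=200)

# One-way ANOVA F-test ranks features; the SVM validates the selection.
model = make_pipeline(
    SelectKBest(score_func=f_classif, k=30),  # keep 30 most relevant features
    StandardScaler(),
    SVC(kernel="rbf"),
)
model.fit(X, y)
print(model.score(X, y))
```
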

20 pages, 4284 KB  
Article
An Adaptive Deep Ensemble Learning for Specific Emitter Identification
by Peng Shang, Lishu Guo, Decai Zou, Xue Wang, Pengfei Liu and Shuaihe Gao
Sensors 2025, 25(19), 6245; https://doi.org/10.3390/s25196245 - 9 Oct 2025
Abstract
Specific emitter identification (SEI), which classifies radio transmitters by extracting hardware-intrinsic radio frequency fingerprints (RFFs), faces critical challenges in noise robustness and in generalization under limited training data and class imbalance. To address these limitations, we propose adaptive deep ensemble learning (ADEL), a framework that integrates heterogeneous neural networks, including convolutional neural networks (CNN), multilayer perceptrons (MLP), and Transformers, for hierarchical feature extraction. Crucially, ADEL also adaptively weights the predictions of the three base classifiers based on reconstruction errors and hybrid losses for robust classification. The methodology employs (1) three heterogeneous neural networks for robust feature extraction; (2) hybrid losses that refine the feature space structure and preserve feature integrity for better feature generalization; and (3) collaborative decision-making via adaptive weighting of the base learners’ reconstruction errors for precise inference. Extensive experiments are performed to validate the effectiveness of ADEL. The results indicate that the proposed method significantly outperforms other competing methods. ADEL establishes a new SEI paradigm through robust feature extraction and adaptive decision integration, enabling potential deployment in space target identification and situational awareness under limited training samples and imbalanced class conditions.
(This article belongs to the Section Electronic Sensors)
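
The adaptive weighting described above might be read as follows (an assumed rule, not ADEL's exact one): each base learner receives a softmax weight derived from the negative of its reconstruction error, and the fused prediction is the weighted sum of class probabilities.

```python
import numpy as np

def adaptive_ensemble(probs: np.ndarray, recon_errors: np.ndarray) -> np.ndarray:
    """Weight base-learner predictions by reconstruction quality.

    probs: (n_learners, n_classes) class probabilities from each learner.
    recon_errors: (n_learners,) reconstruction error per learner; lower
                  error earns a larger softmax weight.
    """
    w = np.exp(-recon_errors)
    w /= w.sum()
    return w @ probs  # (n_classes,) fused probabilities

p = np.array([[0.7, 0.3], [0.4, 0.6], [0.6, 0.4]])  # CNN, MLP, Transformer
e = np.array([0.1, 0.8, 0.3])
print(adaptive_ensemble(p, e))  # leans toward the low-error learners
```
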

29 pages, 5154 KB  
Article
Spatial-Frequency-Scale Variational Autoencoder for Enhanced Flow Diagnostics of Schlieren Data
by Ronghua Yang, Hao Wu, Rongfei Yang, Xingshuang Wu, Yifan Song, Meiying Lü and Mingrui Wang
Sensors 2025, 25(19), 6233; https://doi.org/10.3390/s25196233 - 8 Oct 2025
Abstract
Schlieren imaging is a powerful optical sensing technique that captures flow-induced refractive index gradients, offering valuable visual data for analyzing complex fluid dynamics. However, the large volume and structural complexity of the data generated by this sensor pose significant challenges for extracting key physical insights and performing efficient reconstruction and temporal prediction. In this study, we propose a Spatial-Frequency-Scale variational autoencoder (SFS-VAE), a deep learning framework designed for the unsupervised feature decomposition of Schlieren sensor data. To address the limitations of the traditional β-variational autoencoder (β-VAE) in capturing complex flow regions, the Progressive Frequency-enhanced Spatial Multi-scale Module (PFSM) is designed, which enhances the structures of different frequency bands through Fourier transform and multi-scale convolution; the Feature-Spatial Enhancement Module (FSEM) employs a gradient-driven spatial attention mechanism to extract key regional features. Experiments on flat plate film-cooled jet schlieren data show that SFS-VAE can effectively preserve the information of the mainstream region and more accurately capture the high-gradient features of the jet region, reducing the Root Mean Square Error (RMSE) by approximately 16.9% and increasing the Peak Signal-to-Noise Ratio (PSNR) by approximately 1.6 dB. Furthermore, when integrated with a Transformer for temporal prediction, the model exhibits significantly improved stability and accuracy in forecasting flow field evolution. Overall, the model’s physical interpretability and generalization ability make it a powerful new tool for advanced flow diagnostics through the robust analysis of Schlieren sensor data.
(This article belongs to the Section Optical Sensors)
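
For reference, the two reconstruction metrics quoted above are related by their standard definitions; the sketch below uses synthetic data and an assumed peak value of 1.0:

```python
import numpy as np

def rmse(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.sqrt(np.mean((x - y) ** 2)))

def psnr(x: np.ndarray, y: np.ndarray, peak: float = 1.0) -> float:
    # PSNR in dB; note that a ~16.9% drop in RMSE corresponds to roughly
    # +1.6 dB PSNR, consistent with the figures reported above.
    return float(20 * np.log10(peak / rmse(x, y)))

ref = np.random.rand(64, 64)
rec = ref + 0.05 * np.random.randn(64, 64)
print(rmse(ref, rec), psnr(ref, rec))
```
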

27 pages, 32995 KB  
Article
Recognition of Wood-Boring Insect Creeping Signals Based on Residual Denoising Vision Network
by Henglong Lin, Huajie Xue, Jingru Gong, Cong Huang, Xi Qiao, Liping Yin and Yiqi Huang
Sensors 2025, 25(19), 6176; https://doi.org/10.3390/s25196176 - 5 Oct 2025
Abstract
Currently, the customs inspection of wood-boring pests in timber still primarily relies on manual visual inspection, which involves observing insect holes on the timber surface and splitting the timber for confirmation. However, this method has significant drawbacks such as long detection time, high labor cost, and accuracy relying on human experience, making it difficult to meet the practical needs of efficient and intelligent customs quarantine. To address this issue, this paper develops a rapid identification system based on the peristaltic signals of wood-boring pests through the PyQt framework. The system employs a deep learning model with multi-attention mechanisms, namely the Residual Denoising Vision Network (RDVNet). Firstly, a LabVIEW-based hardware–software system is used to collect pest peristaltic signals in an environment free of vibration interference. Subsequently, the original signals are clipped, converted to audio format, and mixed with external noise. Signal features are then extracted using three cepstral feature extraction methods, namely Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and RelAtive SpecTrAl-Perceptual Linear Prediction (RASTA-PLP), and input into the model. In the experimental stage, this paper compares the denoising module of RDVNet (de-RDVNet) with four classic denoising models under five noise intensity conditions. Finally, it evaluates the performance of RDVNet and four other noise reduction classification models in classification tasks. The results show that PNCC has the most comprehensive feature extraction capability. When PNCC is used as the model input, de-RDVNet achieves an average peak signal-to-noise ratio (PSNR) of 29.8 and a Structural Similarity Index Measure (SSIM) of 0.820 in denoising experiments, both being the best among the comparative models. In classification experiments, RDVNet has an average F1 score of 0.878 and an accuracy of 92.8%, demonstrating the best performance. Overall, the application of this system in customs timber quarantine can effectively improve detection efficiency and reduce labor costs, and it has significant practical value and promotion prospects.
(This article belongs to the Section Smart Agriculture)
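
The noise-mixing step (clips mixed with external noise at several intensities) can be sketched with the standard target-SNR scaling recipe below; this is an assumption about the procedure, not the authors' code:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested signal-to-noise ratio."""
    noise = noise[: clean.size]
    p_sig = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

sig = np.sin(2 * np.pi * 40 * np.linspace(0, 1, 8000))  # synthetic creep signal
noise = np.random.randn(8000)
noisy = mix_at_snr(sig, noise, snr_db=5.0)
```
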

22 pages, 1556 KB  
Article
Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights
by Tommaso Senatori, Daniela Nardone, Michele Lo Giudice and Alessandro Salvini
Information 2025, 16(10), 864; https://doi.org/10.3390/info16100864 - 5 Oct 2025
Abstract
This paper presents an automatic system for the classification of musical instruments from audio recordings. The project leverages deep learning (DL) techniques to achieve its objective, exploring three different classification approaches based on distinct input representations. The first method involves the extraction of Mel-Frequency Cepstral Coefficients (MFCCs) from the audio files, which are then fed into a two-dimensional convolutional neural network (Conv2D). The second approach makes use of mel-spectrogram images as input to a similar Conv2D architecture. The third approach employs conventional machine learning (ML) classifiers, including Logistic Regression, K-Nearest Neighbors, and Random Forest, trained on MFCC-derived feature vectors. To gain insight into the behavior of the DL model, explainability techniques were applied to the Conv2D model using mel-spectrograms, allowing for a better understanding of how the network interprets relevant features for classification. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was employed on the MFCC vectors to visualize how instrument classes are organized in the feature space. One of the main challenges encountered was the class imbalance within the dataset, which was addressed by assigning class-specific weights during training. The results, in terms of classification accuracy, were very satisfactory across all approaches, with the convolutional models and Random Forest achieving around 97–98%, and Logistic Regression yielding slightly lower performance. In conclusion, the proposed methods proved effective for the selected dataset, and future work may focus on further improving class balance techniques.
(This article belongs to the Special Issue Artificial Intelligence for Acoustics and Audio Signal Processing)
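
A minimal sketch of the t-SNE step on MFCC mean vectors (generic scikit-learn usage; the feature dimensions and class count are placeholders):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for MFCC mean vectors: 300 clips x 13 coefficients, 4 classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 13))
labels = rng.integers(0, 4, size=300)

# Project to 2D; scattering emb[:, 0] vs emb[:, 1] colored by `labels`
# would show how instrument classes cluster in the feature space.
emb = TSNE(n_components=2, perplexity=30, random_state=1).fit_transform(X)
print(emb.shape)  # (300, 2)
```
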

16 pages, 2720 KB  
Article
Shale Oil T2 Spectrum Inversion Method Based on Autoencoder and Fourier Transform
by Jun Zhao, Shixiang Jiao, Li Bai, Bing Xie, Yan Chen, Zhenguan Wu and Shaomin Zhang
Geosciences 2025, 15(10), 387; https://doi.org/10.3390/geosciences15100387 - 4 Oct 2025
Abstract
Accurate inversion of the T2 spectrum of shale oil reservoir fluids is crucial for reservoir evaluation. However, traditional nuclear magnetic resonance inversion methods face challenges in extracting features from multi-exponential decay signals. This study proposes an inversion method that combines an autoencoder (AE) with the Fourier transform, aiming to enhance the accuracy and stability of T2 spectrum estimation for shale oil reservoirs. The autoencoder is employed to automatically extract deep features from the echo train, while the Fourier transform is used to enhance the frequency-domain features of the multi-exponential decay information. Furthermore, this paper designs a customized weighted loss function based on a self-attention mechanism to focus the model’s learning capability on peak regions, thereby mitigating the negative impact of zero-value regions on model training. Experimental results demonstrate significant improvements in inversion accuracy, noise resistance, and computational efficiency compared to traditional inversion methods. This research provides an efficient and reliable new approach for the precise evaluation of the T2 spectrum in shale oil reservoirs.
(This article belongs to the Section Geophysics)
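
One plausible reading of combining the echo train with Fourier-domain features (an assumption about the preprocessing, not the paper's exact design) is to concatenate the raw echoes with their FFT magnitudes before encoding:

```python
import torch
import torch.nn as nn

def with_frequency_features(echo: torch.Tensor) -> torch.Tensor:
    """Concatenate an echo train with its FFT magnitude spectrum.
    echo: (batch, n_echoes) multi-exponential decay signals."""
    spec = torch.fft.rfft(echo, dim=-1).abs()
    return torch.cat([echo, spec], dim=-1)

n_echoes = 512
encoder = nn.Sequential(                 # toy AE encoder over the joint input
    nn.Linear(n_echoes + n_echoes // 2 + 1, 128), nn.ReLU(),
    nn.Linear(128, 32),
)
echo = torch.randn(4, n_echoes)
z = encoder(with_frequency_features(echo))
print(z.shape)  # torch.Size([4, 32])
```
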
