Search Results (314)

Search Parameters:
Keywords = singlemode-multimode-singlemode

23 pages, 5772 KB  
Article
A Multimodal Voice Phishing Detection System Integrating Text and Audio Analysis
by Jiwon Kim, Seuli Gu, Youngbeom Kim, Sukwon Lee and Changgu Kang
Appl. Sci. 2025, 15(20), 11170; https://doi.org/10.3390/app152011170 (registering DOI) - 18 Oct 2025
Abstract
Voice phishing has emerged as a critical security threat, exploiting both linguistic manipulation and advances in synthetic speech technologies. Traditional keyword-based approaches often fail to capture contextual patterns or detect forged audio, limiting their effectiveness in real-world scenarios. To address this gap, we propose a multimodal voice phishing detection system that integrates text and audio analysis. The text module employs a KoBERT-based transformer classifier with self-attention interpretation, while the audio module leverages MFCC features and a CNN–BiLSTM classifier to identify synthetic speech. A fusion mechanism combines the outputs of both modalities, with experiments conducted on real-world call transcripts, phishing datasets, and synthetic voice corpora. The results demonstrate that the proposed system consistently achieves high accuracy, precision, recall, and F1-scores on validation data while maintaining robust performance in noisy and diverse real-call scenarios. Furthermore, attention-based interpretability enhances trustworthiness by revealing cross-token and discourse-level interaction patterns specific to phishing contexts. These findings highlight the potential of the proposed system as a reliable, explainable, and deployable solution for preventing the financial and social damage caused by voice phishing. Unlike prior studies limited to single-modality or shallow fusion, our work presents a fully integrated text–audio detection pipeline optimized for Korean real-world datasets and robust to noisy, multi-speaker conditions. Full article
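The abstract above describes a fusion mechanism that combines the text and audio classifier outputs. As a minimal sketch of what such a late fusion can look like (a generic illustration, not the authors' implementation; the class layout and weight are assumptions):

```python
import numpy as np

def late_fusion(p_text: np.ndarray, p_audio: np.ndarray, w_text: float = 0.6) -> np.ndarray:
    """Weighted late fusion of per-class probabilities from two modality classifiers.

    p_text, p_audio: arrays of shape (n_classes,) that each sum to 1.
    w_text: hypothetical weight for the text branch; the paper's actual fusion
    rule and weighting are not reproduced here.
    """
    fused = w_text * p_text + (1.0 - w_text) * p_audio
    return fused / fused.sum()  # renormalize for numerical safety

# Toy example: [legitimate, phishing] probabilities from the two branches.
p_text = np.array([0.15, 0.85])
p_audio = np.array([0.40, 0.60])
print(late_fusion(p_text, p_audio))  # -> [0.25, 0.75]
```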
22 pages, 964 KB  
Article
Multi-Modal Emotion Detection and Tracking System Using AI Techniques
by Werner Mostert, Anish Kurien and Karim Djouani
Computers 2025, 14(10), 441; https://doi.org/10.3390/computers14100441 - 16 Oct 2025
Abstract
Emotion detection significantly impacts healthcare by enabling personalized patient care and improving treatment outcomes. Single-modality emotion recognition often lacks reliability due to the complexity and subjectivity of human emotions. This study proposes a multi-modal emotion detection platform integrating visual, audio, and heart rate data using AI techniques, including convolutional neural networks and support vector machines. The system outperformed single-modality approaches, demonstrating enhanced accuracy and robustness. This improvement underscores the value of multi-modal AI in emotion detection, offering potential benefits across healthcare, education, and human–computer interaction. Full article
(This article belongs to the Special Issue Advances in Semantic Multimedia and Personalized Digital Content)

15 pages, 2694 KB  
Article
Seismic Facies Recognition Based on Multimodal Network with Knowledge Graph
by Binpeng Yan, Mutian Li, Rui Pan and Jiaqi Zhao
Appl. Sci. 2025, 15(20), 11087; https://doi.org/10.3390/app152011087 - 16 Oct 2025
Abstract
Seismic facies recognition constitutes a fundamental task in seismic data interpretation, playing an essential role in characterizing subsurface geological structures, sedimentary environments, and hydrocarbon reservoir distributions. Conventional approaches primarily depend on expert interpretation, which often introduces substantial subjectivity and operational inefficiency. Although deep learning-based methods have been introduced, most rely solely on unimodal data—namely, seismic images—and encounter challenges such as limited annotated samples and inadequate generalization capability. To overcome these limitations, this study proposes a multimodal seismic facies recognition framework named GAT-UKAN, which integrates a U-shaped Kolmogorov–Arnold Network (U-KAN) with a Graph Attention Network (GAT). This model is designed to accept dual-modality inputs. By fusing visual features with knowledge embeddings at intermediate network layers, the model achieves knowledge-guided feature refinement. This approach effectively mitigates issues related to limited samples and poor generalization inherent in single-modality frameworks. Experiments were conducted on the F3 block dataset from the North Sea. A knowledge graph comprising 47 entities and 12 relation types was constructed to incorporate expert knowledge. The results indicate that GAT-UKAN achieved a Pixel Accuracy of 89.7% and a Mean Intersection over Union of 70.6%, surpassing the performance of both U-Net and U-KAN. Furthermore, the model was transferred to the Parihaka field in New Zealand via transfer learning. After fine-tuning, the predictions exhibited strong alignment with seismic profiles, demonstrating the model’s robustness under complex geological conditions. Although the proposed model demonstrates excellent performance in accuracy and robustness, it has so far been validated only on 2D seismic profiles. Its capability to characterize continuous 3D geological features therefore remains limited. Full article
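For readers unfamiliar with graph attention, the generic GAT layer that this kind of knowledge-guided model builds on computes attention coefficients over a node's neighbourhood as follows (the standard formulation, not a claim about GAT-UKAN's exact variant):

```latex
\alpha_{ij} = \frac{\exp\!\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j]\big)\big)}
{\sum_{k \in \mathcal{N}(i)} \exp\!\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_k]\big)\big)},
\qquad
\mathbf{h}_i' = \sigma\!\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\Big)
```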

15 pages, 3464 KB  
Article
Multimode Magneto-Optical Fiber Based on Borogermanate Glass Containing Tb³⁺ for Sensing Applications
by Douglas F. Franco, Steeve Morency, Younès Messaddeq and Marcelo Nalin
Materials 2025, 18(20), 4736; https://doi.org/10.3390/ma18204736 - 16 Oct 2025
Viewed by 83
Abstract
A multimode magneto-optical fiber based on Tb³⁺-containing borogermanate glass was designed, fabricated, and characterized, aiming at potential sensing applications. There are continuing challenges in the development of single-mode (SMF) or multimode (MMF) optical fibers doped with rare-earth (RE) ions and exhibiting high Verdet constants, related to devitrification of the precursor glass. Most RE-doped glass compositions are not suitable as precursors for core-cladding fiber production due to devitrification processes and consequent poor optical quality. Application as Faraday rotators is limited by the intrinsically low Verdet constant of silica (~0.589 rad T⁻¹ m⁻¹ at 1550 nm and 0.876 rad T⁻¹ m⁻¹ at 1310 nm). Borogermanate glasses are good candidates for manufacturing optical fibers due to their excellent potential to solubilize high concentrations of Tb³⁺ ions as well as satisfactory thermal stability. In this work, a magneto-optical core-cladding borogermanate fiber with a 227 μm diameter was fabricated, with characterization using differential scanning calorimetry (DSC), thermomechanical analysis (TMA), viscosity measurements, M-lines spectroscopy, UV-Vis-NIR absorption spectroscopy, the cut-back technique, and magneto-optical measurements. The measured numerical aperture (NA) was 0.183, with minimum attenuation of 13 dB m⁻¹ at 1270 nm. The Verdet constant (VB) reached −6.74 rad T⁻¹ m⁻¹ at 1330 nm. Full article
(This article belongs to the Special Issue Advanced Rare Earth Doped Functional Materials)
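Two of the figures quoted above follow from standard fiber-optics relations: the Faraday rotation angle is set by the Verdet constant, and the numerical aperture fixes the acceptance angle (textbook relations, not equations taken from the article):

```latex
\theta_F = V\,B\,L, \qquad \mathrm{NA} = \sqrt{n_{\mathrm{core}}^2 - n_{\mathrm{clad}}^2} = \sin\theta_{\max}
```

For illustration, with the reported V = −6.74 rad T⁻¹ m⁻¹ and assumed values B = 1 T and L = 0.1 m, the rotation would be about −0.67 rad (roughly −38.6°), and NA = 0.183 corresponds to an acceptance half-angle of about 10.5°.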

20 pages, 4914 KB  
Article
Dual-Channel Parallel Multimodal Feature Fusion for Bearing Fault Diagnosis
by Wanrong Li, Haichao Cai, Xiaokang Yang, Yujun Xue, Jun Ye and Xiangyi Hu
Machines 2025, 13(10), 950; https://doi.org/10.3390/machines13100950 - 15 Oct 2025
Viewed by 163
Abstract
In recent years, the powerful feature extraction capabilities of deep learning have attracted widespread attention in the field of bearing fault diagnosis. To address the limitations of single-modal and single-channel feature extraction methods, which often result in incomplete information representation and difficulty in obtaining high-quality fault features, this paper proposes a dual-channel parallel multimodal feature fusion model for bearing fault diagnosis. In this method, the one-dimensional vibration signals are first transformed into two-dimensional time-frequency representations using continuous wavelet transform (CWT). Subsequently, both the one-dimensional vibration signals and the two-dimensional time-frequency representations are fed simultaneously into the dual-branch parallel model. Within this architecture, the first branch employs a combination of a one-dimensional convolutional neural network (1DCNN) and a bidirectional gated recurrent unit (BiGRU) to extract temporal features from the one-dimensional vibration signals. The second branch utilizes dilated convolutions to capture spatial time–frequency information from the CWT-derived two-dimensional time–frequency representations. The features extracted by both branches are then input into the feature fusion layer. Furthermore, to leverage fault features more comprehensively, a channel attention mechanism is embedded after the feature fusion layer. This enables the network to focus more effectively on salient features across channels while suppressing interference from redundant features, thereby enhancing the performance and accuracy of the dual-branch network. Finally, the fused fault features are passed to a softmax classifier for fault classification. Experimental results demonstrate that the proposed method achieved an average accuracy of 99.50% on the Case Western Reserve University (CWRU) bearing dataset and 97.33% on the Southeast University (SEU) bearing dataset. These results confirm that the suggested model effectively improves fault diagnosis accuracy and exhibits strong generalization capability. Full article
(This article belongs to the Section Machines Testing and Maintenance)
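The preprocessing step described above, converting a 1D vibration signal into a 2D time-frequency image with the continuous wavelet transform, can be sketched as follows (a generic PyWavelets example; the wavelet, scale range, and sampling rate are assumptions, not the paper's settings):

```python
import numpy as np
import pywt

fs = 12_000                                   # assumed sampling rate (Hz)
t = np.arange(0, 0.1, 1 / fs)
signal = np.sin(2 * np.pi * 60 * t) + 0.3 * np.random.randn(t.size)  # toy vibration signal

# Continuous wavelet transform: rows correspond to scales (frequencies), columns to time.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, wavelet="morl", sampling_period=1 / fs)

scalogram = np.abs(coeffs)                    # 2D time-frequency representation
print(scalogram.shape)                        # (64, 1200): resize and feed to a 2D CNN branch
```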

18 pages, 4337 KB  
Article
A Transformer-Based Multimodal Fusion Network for Emotion Recognition Using EEG and Facial Expressions in Hearing-Impaired Subjects
by Shuni Feng, Qingzhou Wu, Kailin Zhang and Yu Song
Sensors 2025, 25(20), 6278; https://doi.org/10.3390/s25206278 - 10 Oct 2025
Viewed by 323
Abstract
Hearing-impaired people face challenges in expressing and perceiving emotions, and traditional single-modal emotion recognition methods demonstrate limited effectiveness in complex environments. To enhance recognition performance, this paper proposes a multimodal multi-head attention fusion neural network (MMHA-FNN). This method utilizes differential entropy (DE) and bilinear interpolation features as inputs, learning the spatial–temporal characteristics of brain regions through an MBConv-based module. By incorporating the Transformer-based multi-head self-attention mechanism, we dynamically model the dependencies between EEG and facial expression features, enabling adaptive weighting and deep interaction of cross-modal characteristics. The experiments involved a four-class classification task on the MED-HI dataset (15 subjects, 300 trials). The taxonomy included happy, sad, fear, and calmness, where ‘calmness’ corresponds to a low-arousal neutral state as defined in the MED-HI protocol. Results indicate that the proposed method achieved an average accuracy of 81.14%, significantly outperforming feature concatenation (71.02%) and decision layer fusion (69.45%). This study demonstrates the complementary nature of EEG and facial expressions in emotion recognition among hearing-impaired individuals and validates the effectiveness of feature layer interaction fusion based on attention mechanisms in enhancing emotion recognition performance. Full article
(This article belongs to the Section Biomedical Sensors)
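Differential entropy (DE), the EEG feature used above, is usually computed per frequency band from the Gaussian closed form 0.5·ln(2πeσ²); a generic sketch (the band boundaries and filter order are illustrative, not the paper's configuration) is:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def differential_entropy(x: np.ndarray) -> float:
    """DE of an approximately Gaussian signal segment: 0.5 * ln(2*pi*e*var)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def band_de(eeg: np.ndarray, fs: float, low: float, high: float) -> float:
    """Band-pass filter one EEG channel, then take its differential entropy."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    return differential_entropy(filtfilt(b, a, eeg))

# Toy 4-second segment at 200 Hz with illustrative bands.
fs, eeg = 200.0, np.random.randn(800)
for name, (lo, hi) in {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}.items():
    print(name, round(band_de(eeg, fs, lo, hi), 3))
```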

21 pages, 6844 KB  
Article
MMFNet: A Mamba-Based Multimodal Fusion Network for Remote Sensing Image Semantic Segmentation
by Jingting Qiu, Wei Chang, Wei Ren, Shanshan Hou and Ronghao Yang
Sensors 2025, 25(19), 6225; https://doi.org/10.3390/s25196225 - 8 Oct 2025
Viewed by 586
Abstract
Accurate semantic segmentation of high-resolution remote sensing imagery is challenged by substantial intra-class variability, inter-class similarity, and the limitations of single-modality data. This paper proposes MMFNet, a novel multimodal fusion network that leverages the Mamba architecture to efficiently capture long-range dependencies for semantic segmentation tasks. MMFNet adopts a dual-encoder design, combining ResNet-18 for local detail extraction and VMamba for global contextual modelling, striking a balance between segmentation accuracy and computational efficiency. A Multimodal Feature Fusion Block (MFFB) is introduced to effectively integrate complementary information from optical imagery and digital surface models (DSMs), thereby enhancing multimodal feature interaction and improving segmentation accuracy. Furthermore, a frequency-aware upsampling module (FreqFusion) is incorporated in the decoder to enhance boundary delineation and recover fine spatial details. Extensive experiments on the ISPRS Vaihingen and Potsdam benchmarks demonstrate that MMFNet achieves mean IoU scores of 83.50% and 86.06%, outperforming eight state-of-the-art methods while maintaining relatively low computational complexity. These results highlight MMFNet’s potential for efficient and accurate multimodal semantic segmentation in remote sensing applications. Full article
(This article belongs to the Section Remote Sensors)
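Mean IoU, the headline metric reported above, is the per-class intersection over union averaged across classes; a compact confusion-matrix implementation (generic, not MMFNet's evaluation code) is:

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection over union computed from flat label maps."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (target.ravel(), pred.ravel()), 1)        # confusion matrix
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter        # pred + target - intersection
    return float(np.nanmean(inter / np.where(union == 0, np.nan, union)))

pred = np.array([[0, 1], [1, 2]])
target = np.array([[0, 1], [2, 2]])
print(mean_iou(pred, target, num_classes=3))   # (1/1 + 1/2 + 1/2) / 3 = 0.667
```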

29 pages, 23948 KB  
Article
CAGMC-Defence: A Cross-Attention-Guided Multimodal Collaborative Defence Method for Multimodal Remote Sensing Image Target Recognition
by Jiahao Cui, Hang Cao, Lingquan Meng, Wang Guo, Keyi Zhang, Qi Wang, Cheng Chang and Haifeng Li
Remote Sens. 2025, 17(19), 3300; https://doi.org/10.3390/rs17193300 - 25 Sep 2025
Viewed by 375
Abstract
With the increasing diversity of remote sensing modalities, multimodal image fusion improves target recognition accuracy but also introduces new security risks. Adversaries can inject small, imperceptible perturbations into a single modality to mislead model predictions, which undermines system reliability. Most existing defences are designed for single-modal inputs and face two key challenges in multimodal settings: (1) vulnerability to perturbation propagation due to static fusion strategies, and (2) a lack of collaborative mechanisms, which limits overall robustness to that of the weakest modality. To address these issues, we propose CAGMC-Defence, a cross-attention-guided multimodal collaborative defence framework for multimodal remote sensing. It contains two main modules. The Multimodal Feature Enhancement and Fusion (MFEF) module adopts a pseudo-Siamese network and cross-attention to decouple features, capture intermodal dependencies, and suppress perturbation propagation through weighted regulation and consistency alignment. The Multimodal Adversarial Training (MAT) module jointly generates optical and SAR adversarial examples and optimizes network parameters under consistency loss, enhancing robustness and generalization. Experiments on the WHU-OPT-SAR dataset show that CAGMC-Defence maintains stable performance under various typical adversarial attacks, such as FGSM, PGD, and MIM, retaining 85.74% overall accuracy even under the strongest white-box MIM attack (ϵ=0.05), significantly outperforming existing multimodal defence baselines. Full article
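For context, the attacks named in the evaluation perturb an input x with label y against a model with loss J as follows (standard FGSM and PGD definitions, included only to make the ϵ = 0.05 budget concrete):

```latex
x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\!\big(\nabla_{x} J(\theta, x, y)\big) \quad \text{(FGSM)}, \qquad
x^{(t+1)} = \Pi_{\|x' - x\|_\infty \le \epsilon}\!\Big(x^{(t)} + \alpha \cdot \mathrm{sign}\!\big(\nabla_{x} J(\theta, x^{(t)}, y)\big)\Big) \quad \text{(PGD)}
```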

72 pages, 4170 KB  
Systematic Review
Digital Twin Cognition: AI-Biomarker Integration in Biomimetic Neuropsychology
by Evgenia Gkintoni and Constantinos Halkiopoulos
Biomimetics 2025, 10(10), 640; https://doi.org/10.3390/biomimetics10100640 - 23 Sep 2025
Viewed by 1272
Abstract
(1) Background: The convergence of digital twin technology, artificial intelligence, and multimodal biomarkers heralds a transformative era in neuropsychological assessment and intervention. Digital twin cognition represents an emerging paradigm that creates dynamic, personalized virtual models of individual cognitive systems, enabling continuous monitoring, predictive modeling, and precision interventions. This systematic review comprehensively examines the integration of AI-driven biomarkers within biomimetic neuropsychological frameworks to advance personalized cognitive health. (2) Methods: Following PRISMA 2020 guidelines, we conducted a systematic search across six major databases spanning medical, neuroscience, and computer science disciplines for literature published between 2014 and 2024. The review synthesized evidence addressing five research questions examining framework integration, predictive accuracy, clinical translation, algorithm effectiveness, and neuropsychological validity. (3) Results: Analysis revealed that multimodal integration approaches combining neuroimaging, physiological, behavioral, and digital phenotyping data substantially outperformed single-modality assessments. Deep learning architectures demonstrated superior pattern recognition capabilities, while traditional machine learning maintained advantages in interpretability and clinical implementation. Successful frameworks, particularly for neurodegenerative diseases and multiple sclerosis, achieved earlier detection, improved treatment personalization, and enhanced patient outcomes. However, significant challenges persist in algorithm interpretability, population generalizability, and the integration of healthcare systems. Critical analysis reveals that high-accuracy claims (85–95%) predominantly derive from small, homogeneous cohorts with limited external validation. Real-world performance in diverse clinical settings likely ranges 10–15% lower, emphasizing the need for large-scale, multi-site validation studies before clinical deployment. (4) Conclusions: Digital twin cognition establishes a new frontier in personalized neuropsychology, offering unprecedented opportunities for early detection, continuous monitoring, and adaptive interventions while requiring continued advancement in standardization, validation, and ethical frameworks. Full article

18 pages, 4817 KB  
Article
A Multimodal Deep Learning Framework for Accurate Wildfire Segmentation Using RGB and Thermal Imagery
by Tao Yue, Hong Huang, Qingyang Wang, Bo Song and Yun Chen
Appl. Sci. 2025, 15(18), 10268; https://doi.org/10.3390/app151810268 - 21 Sep 2025
Viewed by 496
Abstract
Wildfires pose serious threats to ecosystems, human life, and climate stability, underscoring the urgent need for accurate monitoring. Traditional approaches based on either optical or thermal imagery often fail under challenging conditions such as lighting interference, varying data sources, or small-scale flames, as they do not account for the hierarchical nature of feature representations. To overcome these limitations, we propose a multimodal deep learning framework that integrates visible (RGB) and thermal infrared (TIR) imagery for accurate wildfire segmentation. The framework incorporates edge-guided supervision and multilevel fusion to capture fine fire boundaries while exploiting complementary information from both modalities. To assess its effectiveness, we constructed a multi-scale flame segmentation dataset and validated the method across diverse conditions, including different data sources, lighting environments, and five flame size categories ranging from small to large. Experimental results show that BFCNet achieves an IoU of 88.25% and an F1 score of 93.76%, outperforming both single-modality and existing multimodal approaches across all evaluation tasks. These results demonstrate the potential of multimodal deep learning to enhance wildfire monitoring, offering practical value for disaster management, ecological protection, and the deployment of autonomous aerial surveillance systems. Full article
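The two reported segmentation scores are tied together by standard identities for a binary mask (not specific to BFCNet):

```latex
\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \qquad
F_1 = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
F_1 = \frac{2\,\mathrm{IoU}}{1 + \mathrm{IoU}}
```

As a consistency check, the reported IoU of 88.25% implies F1 = 2(0.8825)/(1.8825) ≈ 93.76%, which matches the reported F1 score.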

9 pages, 1995 KB  
Article
Silicon-Based Multimode Complex Bragg Gratings for Spectra-Tailored Filter
by Xiuqiu Shen, Huifang Kang, Wangping Wang, Xiong Liang and Huiye Qiu
Photonics 2025, 12(9), 924; https://doi.org/10.3390/photonics12090924 - 17 Sep 2025
Viewed by 528
Abstract
Multimode waveguide Bragg gratings (MWBGs) provide significant advantages over traditional single-mode counterparts through their mode-coupling operations. Nevertheless, flexible spectral response design methodologies for MWBG-based filters remain less studied. This work introduces a spectral tailoring methodology enabling physically realizable complex responses in MWBGs. We demonstrate silicon-based multi-channel Gaussian-shaped MWBGs using lateral phase delay modulation (LPDM) apodization. Experimental results confirm close conformance between measured spectral responses and target design specifications. Full article
(This article belongs to the Special Issue Recent Advancement in Microwave Photonics)
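In a multimode waveguide Bragg grating, each reflection channel arises from coupling between a forward mode i and a backward mode j, governed by the generic phase-matching condition (not a design equation taken from this article):

```latex
\lambda_{ij} = \big(n_{\mathrm{eff},i} + n_{\mathrm{eff},j}\big)\,\Lambda
```

This reduces to the familiar single-mode condition λ_B = 2 n_eff Λ when i = j; the multiple effective-index pairs are what give MWBGs their additional spectral degrees of freedom.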

33 pages, 13243 KB  
Article
Maize Yield Prediction via Multi-Branch Feature Extraction and Cross-Attention Enhanced Multimodal Data Fusion
by Suning She, Zhiyun Xiao and Yulong Zhou
Agronomy 2025, 15(9), 2199; https://doi.org/10.3390/agronomy15092199 - 16 Sep 2025
Viewed by 500
Abstract
This study conducted field experiments in 2024 in Meidaizhao Town, Tumed Right Banner, Baotou City, Inner Mongolia Autonomous Region, adopting a plant-level sampling design with 10 maize plots selected as sampling areas (20 plants per plot). At four critical growth stages—jointing, heading, filling, and maturity—multimodal data, including leaf spectra, root-zone soil spectra, and leaf chlorophyll and nitrogen content, were synchronously collected from each plant. In response to the prevalent limitations of the existing yield prediction methods, such as insufficient accuracy and limited generalization ability due to reliance on single-modal data, this study takes the acquired multimodal maize data as the research object and innovatively proposes a multimodal fusion prediction network. First, to handle the heterogeneous nature of multimodal data, a parallel feature extraction architecture is designed, utilizing independent feature extraction branches—leaf spectral branch, soil spectral branch, and biochemical parameter branch—to preserve the distinct characteristics of each modality. Subsequently, a dual-path feature fusion method, enhanced by a cross-attention mechanism, is introduced to enable dynamic interaction and adaptive weight allocation between cross-modal features, specifically between leaf spectra–soil spectra and leaf spectra–biochemical parameters, thereby significantly improving maize yield prediction accuracy. The experimental results demonstrate that the proposed model outperforms single-modal approaches by effectively leveraging complementary information from multimodal data, achieving an R² of 0.951, an RMSE of 8.68, an RPD of 4.50, and an MAE of 5.28. Furthermore, the study reveals that deep fusion between soil spectra, leaf biochemical parameters, and leaf spectral data substantially enhances prediction accuracy. This work not only validates the effectiveness of multimodal data fusion in maize yield prediction but also provides valuable insights for accurate and non-destructive yield prediction. Full article
(This article belongs to the Section Precision and Digital Agriculture)
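The four reported scores can all be computed from predicted and measured yields; a compact generic implementation (not the authors' code, with RPD taken in its common definition as the standard deviation of the reference values divided by the RMSE) is:

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = 1.0 - float(np.sum(resid ** 2)) / float(np.sum((y_true - y_true.mean()) ** 2))
    rpd = float(np.std(y_true, ddof=1) / rmse)   # ratio of performance to deviation
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "RPD": rpd}

# Toy per-plant yields (units arbitrary).
y_true = np.array([900.0, 950.0, 1010.0, 1100.0])
y_pred = np.array([910.0, 940.0, 1000.0, 1120.0])
print(regression_metrics(y_true, y_pred))
```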

22 pages, 9649 KB  
Article
DTC-YOLO: Multimodal Object Detection via Depth-Texture Coupling and Dynamic Gating Optimization
by Wei Xu, Xiaodong Du, Ruochen Li and Lei Xing
Sensors 2025, 25(18), 5731; https://doi.org/10.3390/s25185731 - 14 Sep 2025
Viewed by 784
Abstract
To address the inherent limitations of single-modality sensors constrained by physical properties and data modalities, we propose DTC-YOLO (Depth-Texture Coupling Mechanism YOLO), a depth-texture coupled multimodal detection framework. The main contributions are as follows: RGB-LiDAR (RGB-Light Detection and Ranging) Fusion: We propose a depth-color mapping and weighted fusion strategy to effectively integrate depth and texture features. ADF3-Net (Adaptive Dimension-aware Focused Fusion Network): A feature fusion network with hierarchical perception, channel decoupling, and spatial adaptation. A dynamic gated fusion mechanism enables adaptive weighting across multidimensional features, thereby enhancing depth-texture representation. Adown Module: A dual-path adaptive downsampling module that separates high-frequency details from low-frequency semantics, reducing GFLOPs (Giga Floating-point Operations Per Second) by 10.53% while maintaining detection performance. DTC-YOLO achieves substantial improvements over the baseline: +3.50% mAP50, +3.40% mAP50-95, and +3.46% precision. Moreover, it maintains moderate improvements for medium-scale objects while significantly enhancing detection of extremely large and small objects, effectively mitigating the scale-related accuracy discrepancies of vision-only models in complex traffic environments. Full article
(This article belongs to the Section Sensing and Imaging)
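The depth-color mapping step, turning a LiDAR depth map into an image-like array that can be weighted and fused with the RGB frame, can be sketched generically as follows (a simplified grayscale mapping and naive pixel-wise weighting; the paper's actual mapping and fusion weights are not reproduced here):

```python
import numpy as np

def depth_to_image(depth: np.ndarray, d_max: float = 80.0) -> np.ndarray:
    """Map an HxW depth map (meters) to an HxWx3 uint8 pseudo-image.

    Grayscale is used here for simplicity; a real pipeline would typically
    apply a colormap so that depth variations remain visible after fusion.
    """
    d = np.clip(depth, 0.0, d_max) / d_max            # normalize to [0, 1]
    gray = (d * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)

def weighted_fuse(rgb: np.ndarray, depth_img: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Naive pixel-wise weighted fusion of the camera image and the depth image."""
    mix = w * rgb.astype(np.float32) + (1.0 - w) * depth_img.astype(np.float32)
    return mix.astype(np.uint8)

depth = np.random.uniform(0.0, 80.0, size=(4, 6))
camera = np.random.randint(0, 256, size=(4, 6, 3), dtype=np.uint8)
print(weighted_fuse(camera, depth_to_image(depth)).shape)   # (4, 6, 3)
```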

21 pages, 11250 KB  
Article
Fault Diagnosis of Wind Turbine Rotating Bearing Based on Multi-Mode Signal Enhancement and Fusion
by Shaohu Ding, Guangsheng Zhou, Xinyu Wang and Weibin Li
Entropy 2025, 27(9), 951; https://doi.org/10.3390/e27090951 - 13 Sep 2025
Viewed by 432
Abstract
Wind turbines operate under harsh conditions, heightening the risk of rotating bearing failures. While fault diagnosis using acoustic or vibration signals is feasible, single-modal methods are highly vulnerable to environmental noise and system uncertainty, reducing diagnostic accuracy. Existing multi-modal approaches also struggle with noise interference and lack causal feature exploration, limiting fusion performance and generalization. To address these issues, this paper proposes CAVF-Net—a novel framework integrating bidirectional cross-attention (BCA) and causal inference (CI). It enhances the Mel-Frequency Cepstral Coefficient (MFCC) features of the acoustic signal and the short-time Fourier transform (STFT) features of the vibration signal via BCA and employs CI to derive adaptive fusion weights, effectively preserving causal relationships and achieving robust cross-modal integration. The fused features are classified for fault diagnosis under real-world conditions. Experiments show that CAVF-Net attains 99.2% accuracy with few iterations on clean data and maintains 95.42% accuracy in high-entropy multi-noise environments—outperforming single-modal acoustic and vibration models by 16.32% and 8.86%, respectively, while significantly reducing information uncertainty in downstream classification. Full article
(This article belongs to the Special Issue Failure Diagnosis of Complex Systems)
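The two per-modality features named above, MFCCs for the acoustic channel and an STFT spectrogram for the vibration channel, can be extracted with standard tools as follows (librosa/scipy defaults; the frame sizes and coefficient count are assumptions, not the paper's settings):

```python
import numpy as np
import librosa
from scipy.signal import stft

fs = 16_000
acoustic = np.random.randn(fs).astype(np.float32)   # 1 s of toy acoustic signal
vibration = np.random.randn(fs)                     # 1 s of toy vibration signal

# Acoustic branch: Mel-frequency cepstral coefficients, shape (n_mfcc, frames).
mfcc = librosa.feature.mfcc(y=acoustic, sr=fs, n_mfcc=20)

# Vibration branch: magnitude of the short-time Fourier transform, shape (freqs, frames).
f, t, Z = stft(vibration, fs=fs, nperseg=256, noverlap=128)
spec = np.abs(Z)

print(mfcc.shape, spec.shape)   # e.g. (20, 32) and (129, 126)
```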

34 pages, 3067 KB  
Article
NRGAMTE: Neurophysiological Residual Gated Attention Multimodal Transformer Encoder for Sleep Disorder Detection
by Jayapoorani Subramaniam, Aruna Mogarala Guruvaya, Anupama Vijaykumar and Puttamadappa Chaluve Gowda
Brain Sci. 2025, 15(9), 985; https://doi.org/10.3390/brainsci15090985 - 13 Sep 2025
Viewed by 579
Abstract
Background/Objective: Sleep is essential for human mental and physical health. Sleep disorders pose a serious risk to human health, and a large portion of the world population suffers from them. The efficient identification of sleep disorders is therefore important for effective treatment. However, the precise and automatic detection of sleep disorders remains challenging due to inter-subject variability, overlapping symptoms, and reliance on single-modality physiological signals. Methods: To address these challenges, a Neurophysiological Residual Gated Attention Multimodal Transformer Encoder (NRGAMTE) model was developed for robust sleep disorder detection using multimodal physiological signals, including Electroencephalogram (EEG), Electromyogram (EMG), and Electrooculogram (EOG). Initially, raw signals are segmented into 30-s windows and processed to capture the significant time- and frequency-domain features. Each modality is independently embedded by a One-Dimensional Convolutional Neural Network (1D-CNN), which preserves signal-specific characteristics. A Modality-wise Residual Gated Cross-Attention Fusion (MRGCAF) mechanism is introduced to select significant cross-modal interactions, while the learnable residual path ensures that the most relevant features are retained during the gating process. Results: The developed NRGAMTE model achieved an accuracy of 94.51% on the Sleep-EDF expanded dataset and 99.64% on the Cyclic Alternating Pattern (CAP) Sleep database, significantly outperforming the existing single- and multimodal algorithms in terms of robustness and computational efficiency. Conclusions: The results show that NRGAMTE achieves high performance across multiple datasets, significantly improving detection accuracy and demonstrating its potential as a reliable tool for clinical sleep disorder detection. Full article
(This article belongs to the Section Sleep and Circadian Neuroscience)
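The fusion pattern described above, cross-attention between modality embeddings with a gate and a residual path, can be illustrated with a minimal PyTorch module (a hypothetical sketch of the general idea, not the authors' MRGCAF implementation; dimensions are arbitrary):

```python
import torch
import torch.nn as nn

class GatedCrossAttentionFusion(nn.Module):
    """Query one modality (e.g. EEG) with another (e.g. EOG or EMG), gate the
    attended features, and keep a residual path so the query modality is preserved."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_mod: torch.Tensor, context_mod: torch.Tensor) -> torch.Tensor:
        # query_mod, context_mod: (batch, seq_len, dim) embeddings from per-modality 1D-CNN encoders
        attended, _ = self.attn(query_mod, context_mod, context_mod)
        g = self.gate(torch.cat([query_mod, attended], dim=-1))   # element-wise gate in (0, 1)
        return self.norm(query_mod + g * attended)                # gated residual fusion

fusion = GatedCrossAttentionFusion()
eeg = torch.randn(8, 30, 128)   # batch of 8, 30 windows, 128-dim embeddings
eog = torch.randn(8, 30, 128)
print(fusion(eeg, eog).shape)   # torch.Size([8, 30, 128])
```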