Search Results (138)

Search Parameters:
Keywords = RNN encoder

47 pages, 3959 KB  
Review
A Review of Deep Learning in Rotating Machinery Fault Diagnosis and Its Prospects for Port Applications
by Haifeng Wang, Hui Wang and Xianqiong Tang
Appl. Sci. 2025, 15(21), 11303; https://doi.org/10.3390/app152111303 - 22 Oct 2025
Viewed by 242
Abstract
As port operations rapidly evolve toward intelligent and heavy-duty applications, fault diagnosis for core equipment demands higher levels of real-time performance and robustness. Deep learning, with its powerful autonomous feature learning capabilities, demonstrates significant potential in mechanical fault prediction and health management. This paper first provides a systematic review of deep learning research advances in rotating machinery fault diagnosis over the past eight years, focusing on the technical approaches and application cases of four representative models: Deep Belief Networks (DBNs), Convolutional Neural Networks (CNNs), Auto-encoders (AEs), and Recurrent Neural Networks (RNNs). These models respectively embody four core paradigms: unsupervised feature generation, spatial pattern extraction, data reconstruction learning, and temporal dependency modeling, forming the technological foundation of contemporary intelligent diagnostics. Building upon this foundation, this paper delves into the unique challenges encountered when transferring these methods from generic laboratory components to specialized port equipment such as shore cranes and yard cranes, including complex operating conditions, harsh environments, and system coupling. It further explores future research directions, including cross-condition transfer, multi-source information fusion, and lightweight deployment, aiming to provide theoretical references and implementation pathways for the technological advancement of intelligent operation and maintenance in port equipment. Full article
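Of the four paradigms listed, data reconstruction learning (the AE family) is simple to illustrate end to end. The following is a minimal sketch, not any reviewed method: a single-hidden-layer autoencoder trained by full-batch gradient descent to reconstruct synthetic vibration-like windows; all sizes and the signal model are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "vibration" windows: 5-cycle sinusoids with random phase plus noise.
t = np.linspace(0, 1, 32, endpoint=False)
X = np.stack([np.sin(2 * np.pi * 5 * t + 2 * np.pi * rng.random())
              + 0.1 * rng.standard_normal(32) for _ in range(256)])

# Single-hidden-layer autoencoder: tanh encoder, linear decoder,
# trained with full-batch gradient descent on reconstruction MSE.
d, h, lr = 32, 8, 0.05
W1 = 0.1 * rng.standard_normal((d, h))
W2 = 0.1 * rng.standard_normal((h, d))
mse_start = None
for step in range(2000):
    Z = np.tanh(X @ W1)                                  # encode: 32 -> 8 features
    R = Z @ W2                                           # decode: 8 -> 32 samples
    err = R - X
    if step == 0:
        mse_start = float(np.mean(err ** 2))
    gW2 = Z.T @ err / len(X)                             # gradient w.r.t. decoder
    gW1 = X.T @ ((err @ W2.T) * (1 - Z ** 2)) / len(X)   # backprop through tanh
    W1 -= lr * gW1
    W2 -= lr * gW2

mse_end = float(np.mean((np.tanh(X @ W1) @ W2 - X) ** 2))
```

After training, the hidden code summarizes each window, and the reconstruction error itself can serve as a simple anomaly score.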

15 pages, 3296 KB  
Article
EmbTCN-Transformer: An Embedding Temporal Convolutional Network–Transformer Model for Multi-Trajectory Prediction
by Ao Chen, Haotian Chen, Zhenxin Zhang, Mingkai Yang and Yang-Yang Chen
Mathematics 2025, 13(20), 3306; https://doi.org/10.3390/math13203306 - 16 Oct 2025
Viewed by 256
Abstract
This paper addresses the multi-trajectory prediction problem by designing an Embedded-TCN-Transformer (EmbTCN-Transformer) model that uses the real-time historical trajectories in a formation. A temporal convolutional network (TCN) is utilized as the input embedding to introduce temporal awareness capabilities into the model. Then, the self-attention mechanism is incorporated as the backbone to extract correlations among different positions of the trajectory. An encoder–decoder structure is adopted to generate future trajectories. Ablation experiments validate the effectiveness of the EmbTCN-Transformer, showing that the TCN-based input embedding and the self-attention mechanism contribute to 30% and 80% reductions in prediction error, respectively. Comparative experiments further demonstrate the superiority of the proposed model, achieving at least 60% and 10% performance improvements over Recurrent Neural Network (RNN)-based networks and the conventional Transformer, respectively. Full article
(This article belongs to the Special Issue Augmented Control: Algorithms and Applications)
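The TCN embedding described above rests on causal convolution, where the output at time t depends only on inputs at or before t. A minimal NumPy sketch of a causal dilated 1-D convolution, the TCN building block (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution: y[t] = sum_j w[j] * x[t - j*dilation],
    so the output at time t never sees samples after t."""
    k, pad = len(w), (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])
```

Stacking such layers with growing dilation (1, 2, 4, ...) gives an exponentially large causal receptive field, which is what lets a TCN embed long histories cheaply.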

14 pages, 2107 KB  
Article
Agricultural Knowledge-Enhanced Deep Learning for Joint Intent Detection and Slot Filling
by Mingtang Liu, Shanshan Wu, Wenlong Tian, Shuo Lei and Jiahao Miao
Appl. Sci. 2025, 15(20), 10932; https://doi.org/10.3390/app152010932 - 11 Oct 2025
Viewed by 283
Abstract
Intent detection and slot filling are fundamental components for constructing intelligent question-answering systems in agricultural domains. Existing approaches show notable limitations in semantic feature extraction and achieve relatively low accuracy when processing domain-specific agricultural queries with complex terminology and contextual dependencies. To address these challenges, this paper proposes an agricultural knowledge-enhanced deep learning approach that integrates agricultural domain knowledge and terminology with advanced neural architectures. The method integrates HanLP-based agricultural terminology processing with BERT contextual encoding, TextCNN feature extraction, and attention-based fusion. Experimental validation on a curated domain-specific agricultural dataset of 8041 melon cultivation queries shows that the proposed model achieves an accuracy of 79.6%, recall of 80.1%, and F1-score of 79.8%, a significant improvement (7–22% performance gain) over baseline methods including TextRNN, TextRCNN, TextCNN, and BERT-TextCNN. The results demonstrate strong potential for advancing intelligent agricultural advisory systems and domain-specific natural language understanding, particularly in precision agriculture. Full article
(This article belongs to the Section Agricultural Science and Technology)

27 pages, 2279 KB  
Article
HQRNN-FD: A Hybrid Quantum Recurrent Neural Network for Fraud Detection
by Yao-Chong Li, Yi-Fan Zhang, Rui-Qing Xu, Ri-Gui Zhou and Yi-Lin Dong
Entropy 2025, 27(9), 906; https://doi.org/10.3390/e27090906 - 27 Aug 2025
Cited by 1 | Viewed by 977
Abstract
Detecting financial fraud is a critical aspect of modern intelligent financial systems. Despite the advances brought by deep learning in predictive accuracy, challenges persist—particularly in capturing complex, high-dimensional nonlinear features. This study introduces a novel hybrid quantum recurrent neural network for fraud detection (HQRNN-FD). The model utilizes variational quantum circuits (VQCs) incorporating angle encoding, data reuploading, and hierarchical entanglement to project transaction features into quantum state spaces, thereby facilitating quantum-enhanced feature extraction. For sequential analysis, the model integrates a recurrent neural network (RNN) with a self-attention mechanism to effectively capture temporal dependencies and uncover latent fraudulent patterns. To mitigate class imbalance, the synthetic minority over-sampling technique (SMOTE) is employed during preprocessing, enhancing both class representation and model generalizability. Experimental evaluations reveal that HQRNN-FD attains an accuracy of 0.972 on publicly available fraud detection datasets, outperforming conventional models by 2.4%. In addition, the framework exhibits robustness against quantum noise and improved predictive performance with increasing qubit numbers, validating its efficacy and scalability for imbalanced financial classification tasks. Full article
(This article belongs to the Special Issue Quantum Computing in the NISQ Era)
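SMOTE, used above to mitigate class imbalance, synthesizes minority-class samples by interpolating between a minority sample and one of its k nearest minority neighbours. A minimal sketch of the idea (illustrative only; production code would typically use the `imbalanced-learn` implementation):

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples: pick a minority sample,
    pick one of its k nearest minority neighbours, and interpolate."""
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]        # nearest neighbours, skipping self
        j = rng.choice(nn)
        lam = rng.random()                 # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled set stays inside the minority class's convex hull rather than duplicating points outright.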

24 pages, 3398 KB  
Article
DEMNet: Dual Encoder–Decoder Multi-Frame Infrared Small Target Detection Network with Motion Encoding
by Feng He, Qiran Zhang, Yichuan Li and Tianci Wang
Remote Sens. 2025, 17(17), 2963; https://doi.org/10.3390/rs17172963 - 26 Aug 2025
Viewed by 931
Abstract
Infrared dim and small target detection aims to accurately localize targets within complex backgrounds or clutter. However, under extremely low signal-to-noise ratio (SNR) conditions, single-frame detection methods often fail to effectively detect such targets. In contrast, multi-frame detection can exploit temporal cues to significantly improve the probability of detection (Pd) and reduce false alarms (Fa). Existing multi-frame approaches often employ 3D convolutions/RNNs to implicitly extract temporal features. However, they typically lack explicit modeling of target motion. To address this, we propose a Dual Encoder–Decoder Multi-Frame Infrared Small Target Detection Network with Motion Encoding (DEMNet) that explicitly incorporates motion information into the detection process. The first multi-level encoder–decoder module leverages spatial and channel attention mechanisms to fuse hierarchical features across multiple scales, enabling robust spatial feature extraction from each frame of the temporally aligned input sequence. The second encoder–decoder module encodes both inter-frame target motion and intra-frame target positional information, followed by 3D convolution to achieve effective motion information fusion. Extensive experiments demonstrate that DEMNet achieves state-of-the-art performance, outperforming recent advanced methods such as DTUM and SSTNet. For the DAUB dataset, compared to the second-best model, DEMNet improves Pd by 2.42 percentage points and reduces Fa by 4.13 × 10⁻⁶ (a 68.72% reduction). For the NUDT dataset, it improves Pd by 1.68 percentage points and reduces Fa by 0.67 × 10⁻⁶ (a 7.26% reduction) compared to the next-best model. Notably, DEMNet demonstrates even greater advantages on test sequences with SNR ≤ 3. Full article
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)

19 pages, 24320 KB  
Article
Hierarchical Attention Transformer-Based Sensor Anomaly Detection in Structural Health Monitoring
by Dong Hu, Yizhou Lin, Shilong Li, Jing Wu and Hongwei Ma
Sensors 2025, 25(16), 4959; https://doi.org/10.3390/s25164959 - 11 Aug 2025
Viewed by 1100
Abstract
Structural health monitoring (SHM) is vital for ensuring structural integrity by continuously evaluating conditions through sensor data. However, sensor anomalies caused by external disturbances can severely compromise the effectiveness of SHM systems. Traditional anomaly detection methods face significant challenges due to reliance on large labeled datasets, difficulties in handling long-term dependencies, and issues stemming from class imbalance. To address these limitations, this study introduces a hierarchical attention Transformer (HAT)-based method specifically designed for sensor anomaly detection in SHM applications. HAT leverages hierarchical temporal modeling with local and global Transformer encoders to effectively capture complex, multi-scale anomaly patterns. Evaluated on a real-world dataset from a large cable-stayed bridge, HAT achieves superior accuracy (96.3%) and robustness even with limited labeled data (20%), significantly outperforming traditional models like CNN, LSTM, and RNN. Additionally, this study visualizes the convergence process of the model, demonstrating its fast convergence and strong generalization capabilities. Thus, the proposed HAT method provides a practical and effective solution for anomaly detection in complex SHM scenarios. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
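The local and global Transformer encoders mentioned above can be sketched as scaled dot-product attention applied twice: first within fixed-length windows, then over mean-pooled window summaries. A toy NumPy version (the window size, pooling, and shared weight matrices are assumptions for illustration, not the paper's configuration):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a (T, d) sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # (T, T) attention weights
    return A @ V, A

def hierarchical_attention(X, win, Wq, Wk, Wv):
    """Local attention within fixed windows, then global attention
    over the mean-pooled window summaries."""
    summaries = [self_attention(X[s:s + win], Wq, Wk, Wv)[0].mean(axis=0)
                 for s in range(0, len(X), win)]
    return self_attention(np.stack(summaries), Wq, Wk, Wv)[0]
```

The local stage captures short-range anomaly patterns inside each window; the global stage relates the windows to one another, which is how the hierarchy handles multi-scale, long-term dependencies without a full T × T attention map.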

16 pages, 22555 KB  
Technical Note
A Hybrid RNN-CNN Approach with TPI for High-Precision DEM Reconstruction
by Ruizhe Cao, Chunjing Yao, Hongchao Ma, Bin Guo, Jie Wang and Junhao Xu
Remote Sens. 2025, 17(16), 2770; https://doi.org/10.3390/rs17162770 - 9 Aug 2025
Viewed by 668
Abstract
Digital elevation models (DEMs), as the fundamental unit of terrain morphology, are crucial for understanding surface processes and for land use planning. However, automated DEM generation faces challenges due to inefficient terrain feature extraction from raw LiDAR point clouds and the limitations of traditional methods in capturing fine-scale topographic variations. To address this, we propose a novel hybrid RNN-CNN framework that integrates multi-scale Topographic Position Index (TPI) features to enhance DEM generation. Our approach first models voxelated LiDAR point clouds as spatially ordered sequences, using Recurrent Neural Networks (RNNs) to encode vertical elevation dependencies and Convolutional Neural Networks (CNNs) to extract planar spatial features. By incorporating TPI as a semantic constraint, the model learns to distinguish terrain structures at multiple scales. Residual connections refine feature representations to preserve micro-topographic details during DEM reconstruction. Extensive experiments in the complex terrains of Jiuzhaigou, China, demonstrate that our lightweight hybrid framework not only achieves excellent DEM reconstruction accuracy in complex terrains, but also improves computational efficiency by more than 20% on average compared to traditional interpolation methods, making it highly suitable for resource-constrained applications. Full article

19 pages, 821 KB  
Article
Multimodal Multisource Neural Machine Translation: Building Resources for Image Caption Translation from European Languages into Arabic
by Roweida Mohammed, Inad Aljarrah, Mahmoud Al-Ayyoub and Ali Fadel
Computation 2025, 13(8), 194; https://doi.org/10.3390/computation13080194 - 8 Aug 2025
Viewed by 1042
Abstract
Neural machine translation (NMT) models combining textual and visual inputs generate more accurate translations compared with unimodal models. Moreover, translation models with an under-resourced target language benefit from multisource inputs (source sentences are provided in different languages). Building MultiModal MultiSource NMT (M3S-NMT) systems requires significant effort to curate datasets suitable for such a multifaceted task. This work uses image caption translation as an example of multimodal translation and presents a novel public dataset for translating captions from multiple European languages (viz., English, German, French, and Czech) into the distant and under-resourced Arabic language. Moreover, it presents multitask learning models trained and tested on this dataset to serve as solid baselines to help further research in this area. These models involve two parts: one for learning the visual representations of the input images, and the other for translating the textual input based on these representations. The translations are produced from a framework of attention-based encoder–decoder architectures. The visual features are learned from a pretrained convolutional neural network (CNN). These features are then integrated with textual features learned through the basic yet well-known recurrent neural networks (RNNs) with GloVe or BERT word embeddings. Despite the challenges associated with the task at hand, the results of these systems are very promising, reaching 34.57 and 42.52 METEOR scores. Full article
(This article belongs to the Section Computational Social Science)

20 pages, 855 KB  
Article
SegmentedCrossformer—A Novel and Enhanced Cross-Time and Cross-Dimensional Transformer for Multivariate Time Series Forecasting
by Zijiang Yang and Tad Gonsalves
Forecasting 2025, 7(3), 41; https://doi.org/10.3390/forecast7030041 - 3 Aug 2025
Viewed by 1627
Abstract
Multivariate Time Series Forecasting (MTSF) has seen a succession of models over the last two decades, ranging from traditional statistical approaches to RNN-based models. More recently, deep learning has made huge progress on time series problems through a series of Transformer-based models. Despite the breakthroughs brought by attention mechanisms, many challenges remain. Existing Transformer-based (attention) models outperform classical models, capturing temporal dependencies and learning dependencies among variables and across the time domain in an efficient manner. To address the remaining issues, we propose SegmentedCrossformer (SCF), a novel Transformer-based model that considers both temporal dependencies and dependencies among variables efficiently. The model is built upon the encoder–decoder architecture at different scales and is compared with the previous state of the art. Experimental results on different datasets show the effectiveness of SCF, with unique advantages and efficiency. Full article
(This article belongs to the Section AI Forecasting)

24 pages, 2815 KB  
Article
Blockchain-Powered LSTM-Attention Hybrid Model for Device Situation Awareness and On-Chain Anomaly Detection
by Qiang Zhang, Caiqing Yue, Xingzhe Dong, Guoyu Du and Dongyu Wang
Sensors 2025, 25(15), 4663; https://doi.org/10.3390/s25154663 - 28 Jul 2025
Viewed by 646
Abstract
With the increasing scale of industrial devices and the growing complexity of multi-source heterogeneous sensor data, traditional methods struggle to address challenges in fault detection, data security, and trustworthiness. Ensuring tamper-proof data storage and improving prediction accuracy for imbalanced anomaly detection for potential deployment in the Industrial Internet of Things (IIoT) remain critical issues. This study proposes a blockchain-powered Long Short-Term Memory Network (LSTM)–Attention hybrid model: an LSTM-based Encoder–Attention–Decoder (LEAD) for industrial device anomaly detection. The model utilizes an encoder–attention–decoder architecture for processing multivariate time series data generated by industrial sensors and smart contracts for automated on-chain data verification and tampering alerts. Experiments on real-world datasets demonstrate that the LEAD achieves an F0.1 score of 0.96, outperforming baseline models (Recurrent Neural Network (RNN): 0.90; LSTM: 0.94; Bi-directional LSTM (Bi-LSTM): 0.94). We simulate the system using a private FISCO-BCOS network with a multi-node setup to demonstrate contract execution, anomaly data upload, and tamper alert triggering. The blockchain system successfully detects unauthorized access and data tampering, offering a scalable solution for device monitoring. Full article
(This article belongs to the Section Internet of Things)
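The F0.1 score reported for LEAD is the F-beta score with β = 0.1, which weights precision far more heavily than recall, a sensible choice when false alarms are costly. Computed from confusion counts:

```python
def f_beta(tp, fp, fn, beta=0.1):
    """F-beta score from confusion counts; beta < 1 favours precision,
    beta = 1 recovers the ordinary F1 score."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With β = 0.1 a detector with perfect precision but only 50% recall still scores about 0.99, which is why F0.1 suits settings where tampering alerts must almost never be spurious.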

26 pages, 829 KB  
Article
Enhanced Face Recognition in Crowded Environments with 2D/3D Features and Parallel Hybrid CNN-RNN Architecture with Stacked Auto-Encoder
by Samir Elloumi, Sahbi Bahroun, Sadok Ben Yahia and Mourad Kaddes
Big Data Cogn. Comput. 2025, 9(8), 191; https://doi.org/10.3390/bdcc9080191 - 22 Jul 2025
Viewed by 1209
Abstract
Face recognition (FR) in unconstrained conditions remains an open research topic and an ongoing challenge. The facial images exhibit diverse expressions, occlusions, variations in illumination, and heterogeneous backgrounds. This work aims to produce an accurate and robust system for enhanced security and surveillance. A parallel hybrid deep learning model for feature extraction and classification is proposed. An ensemble of three parallel extraction-layer models learns the most representative features using CNNs and RNNs. 2D LBP and 3D Mesh LBP are computed on face images to extract image features as input to two RNNs. A stacked autoencoder (SAE) merges the feature vectors extracted from the three CNN-RNN parallel layers. We tested the designed 2D/3D CNN-RNN framework on four standard datasets and achieved an accuracy of 98.9%. The hybrid deep learning model significantly improves FR against similar state-of-the-art methods. The proposed model was also tested on an unconstrained-conditions human crowd dataset, with very promising results: an accuracy of 95%. Furthermore, our model shows an 11.5% improvement over similar hybrid CNN-RNN architectures, proving its robustness in complex environments where the face can undergo different transformations. Full article

24 pages, 3714 KB  
Article
DTCMMA: Efficient Wind-Power Forecasting Based on Dimensional Transformation Combined with Multidimensional and Multiscale Convolutional Attention Mechanism
by Wenhan Song, Enguang Zuo, Junyu Zhu, Chen Chen, Cheng Chen, Ziwei Yan and Xiaoyi Lv
Sensors 2025, 25(15), 4530; https://doi.org/10.3390/s25154530 - 22 Jul 2025
Viewed by 550
Abstract
With the growing global demand for clean energy, the accuracy of wind-power forecasting plays a vital role in ensuring the stable operation of power systems. However, wind-power generation is significantly influenced by meteorological conditions and is characterized by high uncertainty and multiscale fluctuations. Traditional recurrent neural network (RNN) and long short-term memory (LSTM) models, although capable of handling sequential data, struggle with modeling long-term temporal dependencies due to the vanishing gradient problem; thus, they are now rarely used. Recently, Transformer models have made notable progress in sequence modeling compared to RNNs and LSTM models. Nevertheless, when dealing with long wind-power sequences, their quadratic computational complexity (O(L²)) leads to low efficiency, and their global attention mechanism often fails to capture local periodic features accurately, tending to overemphasize redundant information while overlooking key temporal patterns. To address these challenges, this paper proposes a wind-power forecasting method based on dimension-transformed collaborative multidimensional multiscale attention (DTCMMA). This method first employs the fast Fourier transform (FFT) to automatically identify the main periodic components in wind-power data, reconstructing the one-dimensional time series as a two-dimensional spatiotemporal representation, thereby explicitly encoding periodic features. Based on this, a collaborative multidimensional multiscale attention (CMMA) mechanism is designed, which hierarchically integrates channel, spatial, and pixel attention to adaptively capture complex spatiotemporal dependencies. Considering the geometric characteristics of the reconstructed data, asymmetric convolution kernels are adopted to enhance feature extraction efficiency. Experiments on multiple wind-farm datasets and energy-related datasets demonstrate that DTCMMA outperforms mainstream methods such as Transformer, iTransformer, and TimeMixer in long-sequence forecasting tasks, achieving improvements in MSE performance by 34.22%, 2.57%, and 0.51%, respectively. The model’s training speed also surpasses that of the fastest baseline by 300%, significantly improving both prediction accuracy and computational efficiency. This provides an efficient and accurate solution for wind-power forecasting and contributes to the further development and application of wind energy in the global energy mix. Full article
(This article belongs to the Section Intelligent Sensors)
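The FFT-based period detection and 1D-to-2D reconstruction described above can be sketched concretely: take the highest-amplitude non-DC frequency bin, use its reciprocal as the period, and fold the series into a (cycles × phase) array. A minimal sketch assuming a single dominant period (names are illustrative, not from the paper):

```python
import numpy as np

def dominant_period(x):
    """Dominant period of a 1-D series via the FFT amplitude spectrum."""
    spec = np.abs(np.fft.rfft(x - x.mean()))   # mean-subtract to kill the DC bin
    freqs = np.fft.rfftfreq(len(x))
    k = 1 + np.argmax(spec[1:])                # strongest non-DC frequency
    return int(round(1 / freqs[k]))

def fold_to_2d(x, period):
    """Reshape a 1-D series into (n_cycles, period), truncating the remainder."""
    n = len(x) // period
    return x[:n * period].reshape(n, period)
```

In the folded array, within-row structure is intra-period (local) variation and within-column structure is cycle-to-cycle (trend) variation, which is what makes 2-D convolutional attention applicable.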

31 pages, 7723 KB  
Article
A Hybrid CNN–GRU–LSTM Algorithm with SHAP-Based Interpretability for EEG-Based ADHD Diagnosis
by Makbal Baibulova, Murat Aitimov, Roza Burganova, Lazzat Abdykerimova, Umida Sabirova, Zhanat Seitakhmetova, Gulsiya Uvaliyeva, Maksym Orynbassar, Aislu Kassekeyeva and Murizah Kassim
Algorithms 2025, 18(8), 453; https://doi.org/10.3390/a18080453 - 22 Jul 2025
Viewed by 1423
Abstract
This study proposes an interpretable hybrid deep learning framework for classifying attention deficit hyperactivity disorder (ADHD) using EEG signals recorded during cognitively demanding tasks. The core architecture integrates convolutional neural networks (CNNs), gated recurrent units (GRUs), and long short-term memory (LSTM) layers to jointly capture spatial and temporal dynamics. In addition to the final hybrid architecture, the CNN–GRU–LSTM model alone demonstrates excellent accuracy (99.63%) with minimal variance, making it a strong baseline for clinical applications. To evaluate the role of global attention mechanisms, transformer encoder models with two and three attention blocks, along with a spatiotemporal transformer employing 2D positional encoding, are benchmarked. A hybrid CNN–RNN–transformer model is introduced, combining convolutional, recurrent, and transformer-based modules into a unified architecture. To enhance interpretability, SHapley Additive exPlanations (SHAP) are employed to identify key EEG channels contributing to classification outcomes. Experimental evaluation using stratified five-fold cross-validation demonstrates that the proposed hybrid model achieves superior performance, with average accuracy exceeding 99.98%, F1-scores above 0.9999, and near-perfect AUC and Matthews correlation coefficients. In contrast, transformer-only models, despite high training accuracy, exhibit reduced generalization. SHAP-based analysis confirms the hybrid model’s clinical relevance. This work advances the development of transparent and reliable EEG-based tools for pediatric ADHD screening. Full article

20 pages, 5700 KB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Viewed by 1417
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation. Full article
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)

21 pages, 5160 KB  
Article
A Spatiotemporal Sequence Prediction Framework Based on Mask Reconstruction: Application to Short-Duration Precipitation Radar Echoes
by Zhi Yang, Changzheng Liu, Ping Mei and Lei Wang
Remote Sens. 2025, 17(13), 2326; https://doi.org/10.3390/rs17132326 - 7 Jul 2025
Viewed by 608
Abstract
Short-term precipitation forecasting is a core task in meteorological science, aiming to achieve accurate predictions by modeling the spatiotemporal evolution of radar echo sequences, thereby supporting meteorological services and disaster warning systems. However, existing spatiotemporal sequence prediction methods still struggle to disentangle complex spatiotemporal dependencies effectively and fail to capture the nonlinear chaotic characteristics of precipitation systems. This often results in ambiguous predictions, attenuation of echo intensity, and spatial localization errors. To address these challenges, this paper proposes a unified spatiotemporal sequence prediction framework based on spatiotemporal masking, which comprises two stages: self-supervised pre-training and task-oriented fine-tuning. During pre-training, the model learns global structural features of meteorological systems from sparse contexts by randomly masking local spatiotemporal regions of radar images. In the fine-tuning stage, considering the importance of the temporal dimension in short-term precipitation forecasting and the complex long-range dependencies in the spatiotemporal evolution of precipitation systems, we design an RNN-based recurrent temporal masked autoencoder model (MAE-RNN) and a Transformer-based spatiotemporal attention model (STMT). The former focuses on capturing short-term temporal dynamics, while the latter simultaneously models long-range dependencies across space and time via a self-attention mechanism, thereby avoiding the smoothing effects on high-frequency details that are typical of conventional convolutional or recurrent structures. The experimental results show that STMT improves the key CSI and HSS indices by 3.73% and 2.39%, respectively, compared with existing advanced models, and generates radar echo sequences that are closer to the real data in terms of air-mass morphology evolution and reflectivity intensity grading. Full article
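The random spatiotemporal masking used in pre-training can be sketched as zeroing out randomly chosen (t, h, w) patches of a radar sequence, leaving a sparse visible context for reconstruction. A toy NumPy version (patch size and mask ratio are assumptions; the paper's exact scheme may differ):

```python
import numpy as np

def random_mask(seq, ratio=0.5, patch=(2, 4, 4), rng=None):
    """Zero random (t, h, w) patches of a (T, H, W) sequence.

    Returns the masked sequence and the boolean visibility mask
    (True = pixel left visible). Assumes shapes divide evenly by patch.
    """
    rng = np.random.default_rng(rng)
    grid = tuple(s // p for s, p in zip(seq.shape, patch))
    keep = (rng.random(grid) >= ratio).astype(seq.dtype)
    mask = np.kron(keep, np.ones(patch, dtype=seq.dtype))  # patch grid -> pixels
    return seq * mask, mask.astype(bool)
```

During pre-training the model would be penalized only on the hidden (False) region, forcing it to infer the masked echoes from the surrounding visible spatiotemporal context.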
