MDPI - Publisher of Open Access Journals

23 pages, 5079 KB

Open Access Article

Dual-Stream Transformer with Kalman-Based Sensor Fusion for Wearable Fall Detection

by Abheek Pradhan, Sana Alamgeer, Rakesh Suvvari, Syed Tousiful Haque and Anne H. H. Ngu

Big Data Cogn. Comput. 2026, 10(3), 90; https://doi.org/10.3390/bdcc10030090 - 17 Mar 2026

Wearable fall detection systems face a fundamental challenge: while gyroscope data provide valuable orientation cues, naively combining raw gyroscope and accelerometer signals can degrade performance due to noise contamination. To overcome this challenge, we present a dual-stream transformer architecture that (i) uses Kalman-based sensor fusion to convert noisy gyroscope angular velocities into stable orientation estimates (roll, pitch, yaw), maintaining an internal state of body pose, and (ii) processes accelerometer and orientation streams in separate encoder pathways before fusion to prevent cross-modal interference. Our architecture further integrates Squeeze-and-Excitation channel attention and Temporal Attention Pooling to focus on fall-critical temporal patterns. Evaluated on the SmartFallMM dataset using 21-fold leave-one-subject-out cross-validation, the dual-stream Kalman transformer achieves 91.10% F1, outperforming single-stream Kalman transformers (89.80% F1) by 1.30 percentage points and single-stream baseline transformers (88.96% F1) by 2.14 percentage points. We further evaluate the model in real time using a watch-based SmartFall App on five participants, maintaining an average F1 score of 83% and an accuracy of 90%. These results indicate robust performance in both offline and real-world deployment settings, establishing a new state-of-the-art for inertial-measurement-unit-based fall detection on commodity smartwatch devices.
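
As an illustration of the Kalman-based fusion idea in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes a scalar filter for a single angle (roll), with the gyroscope rate driving the prediction step and an accelerometer-derived tilt angle used as the measurement. The function names and the noise parameters `q` and `r` are hypothetical.

```python
import math


def accel_roll_deg(ay, az):
    """Roll angle (degrees) implied by the gravity direction; an assumed measurement model."""
    return math.degrees(math.atan2(ay, az))


def kalman_roll(gyro_rates_dps, accel_samples, dt, q=1e-3, r=0.5):
    """Scalar Kalman filter: integrate the gyro rate, then correct with accelerometer tilt.

    q and r are assumed process and measurement noise variances.
    """
    angle, p = 0.0, 1.0  # state estimate (degrees) and its variance
    estimates = []
    for rate, (ay, az) in zip(gyro_rates_dps, accel_samples):
        # Predict: propagate the angle with the angular velocity.
        angle += rate * dt
        p += q
        # Update: blend in the accelerometer-derived angle.
        z = accel_roll_deg(ay, az)
        k = p / (p + r)  # Kalman gain
        angle += k * (z - angle)
        p *= 1.0 - k
        estimates.append(angle)
    return estimates
```

With a stationary sensor tilted by 10 degrees and a zero gyro rate, the estimate starts at 0 and settles close to 10 degrees as accelerometer measurements accumulate.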

26 pages, 12081 KB

Open Access Article

DEPART: Multi-Task Interpretable Depression and Parkinson’s Disease Detection from In-the-Wild Video Data

by Elena Ryumina, Alexandr Axyonov, Mikhail Dolgushin, Dmitry Ryumin and Alexey Karpov

Big Data Cogn. Comput. 2026, 10(3), 89; https://doi.org/10.3390/bdcc10030089 - 16 Mar 2026

Abstract

Automated video-based detection of cognitive disorders can enable scalable, non-invasive health monitoring. However, existing methods focus on a single disease and provide limited interpretability, whereas real-world videos often contain co-occurring conditions. We propose DEPART (DEpression and PArkinson’s Recognition Technique), a novel unified multi-task method for detecting depression and Parkinson’s disease (PD) from in-the-wild video data. It performs body region extraction, Contrastive Language-Image Pre-training (CLIP)-based visual encoding, Transformer-based temporal modeling, and prototype-aware classification with a gated fusion technique. Gradient-based attention maps are used to visualize task-specific regions that drive predictions. Experiments on the In-the-Wild Speech Medical (WSM) corpus demonstrate competitive performance: the multi-task model achieves Recall of 82.39% for depression and 78.20% for PD, compared with 87.76% and 78.20%, respectively, for the best single-task models. Multi-task learning initially increases false positives for healthy persons in the PD subset, mainly due to annotation–modality mismatches, static visual content misinterpreted as motor impairments, and occasional body detection failures. After cleaning the test data, Recall for healthy individuals becomes comparable across models; the multi-task model improves Recall for both depression (from 82.39% to 87.50%) and PD (from 78.20% to 86.14%), suggesting better robustness for real-life clinical applications.

16 pages, 950 KB

Open Access Article

A CTC-Based Speech Recognition Network Fusing Local Convolution and Global Attention

by Huijuan Hu, Chenyang Tang, Ping Tan and He Xu

Sensors 2026, 26(6), 1865; https://doi.org/10.3390/s26061865 - 16 Mar 2026

Abstract

Integrating wav2vec 2.0 with Connectionist Temporal Classification (CTC) for automatic speech recognition (ASR) often involves a trade-off between capturing global semantic consistency and maintaining local feature discriminability. This study proposes DBA-wav2vec 2.0, an architecture designed to manage these modeling requirements by decoupling temporal modeling into parallel local and global streams at the encoder–decoder interface. Depthwise separable convolutions are used to capture local acoustic structures, while a self-attention path is retained for long-range dependencies. A task-aware gating mechanism is introduced to integrate these heterogeneous features. By adjusting fusion weights based on acoustic input characteristics, the gate facilitates the refinement of posterior probability distributions, leading to more distinct alignment points. Experimental results on the AISHELL-1 and ST-CMDS datasets show relative Character Error Rate (CER) reductions of 6.4% and 7.4%, respectively, compared to a baseline wav2vec 2.0 model. Further evaluations under varying speaking rates demonstrate a 15.3% relative improvement in fast-speech scenarios, suggesting that structural adaptation at the decoding interface can enhance the robustness of CTC-based systems against temporal variations.

(This article belongs to the Section Intelligent Sensors)
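
The gating idea in the abstract above (input-dependent weights that blend a local stream with a global one) can be sketched as follows. This is only an assumed, generic form of gated fusion, an element-wise sigmoid gate producing a convex combination. The abstract does not specify the actual gate, and all names and parameters here are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local_feat, global_feat, w_local, w_global, bias):
    """Element-wise convex blend of two feature vectors.

    A gate near 1 favours the local (convolutional) stream and a gate near 0
    favours the global (self-attention) stream. The weights are hypothetical
    learned parameters, one per feature dimension.
    """
    fused = []
    for l, g, wl, wg, b in zip(local_feat, global_feat, w_local, w_global, bias):
        gate = sigmoid(wl * l + wg * g + b)  # gate depends on the inputs themselves
        fused.append(gate * l + (1.0 - gate) * g)
    return fused
```

With all gate parameters at zero, the gate is 0.5 everywhere and the output is the plain average of the two streams; training would move the gate away from that neutral point.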

24 pages, 4894 KB

Open Access Article

Power Load Probabilistic Prediction Based on Multi-Value Quantile Regression and Timing Fusion Ensemble Learning Model

by Yuhang Liu, Fei Mei, Jun Zhang, Xiang Dai and Wen Li

Entropy 2026, 28(3), 329; https://doi.org/10.3390/e28030329 - 16 Mar 2026

Abstract

Probabilistic prediction of the 10 kV bus load is a core component for ensuring the refined and safe operation of distribution network scheduling. However, existing probabilistic prediction methods suffer from insufficient dynamic feature extraction and compromised prediction reliability caused by quantile crossing. To address these issues, this paper proposes a 10 kV bus load probabilistic prediction method integrating multi-value quantile regression (MQR) and a temporal fusion ensemble learning model (ELM). Firstly, a temporal fusion ensemble learning model is constructed, which integrates multiple temporal fusion network (TFN) sub-models through a stacking framework to extract multi-dimensional temporal features of loads in parallel, enhancing its feature capture capability for complex load data. Secondly, MQR is introduced as the core objective function to generate multi-quantile load forecasting results simultaneously, comprehensively depicting the load probability distribution. Finally, a Listwise Maximum Likelihood Estimation (ListMLE) ranking constraint mechanism is embedded, which optimizes quantile ordering through monotonicity constraints, significantly reducing the degree of quantile crossing and improving the interpretability of forecasting results. The results show that the MQR-ELM algorithm achieves a Prediction Interval Coverage Probability of 94.624% (close to the nominal coverage rate of 95%), a Prediction Interval Averaged Width of 588.526, a Crossing Degree Index of only 0.0476, and a Continuous Ranked Probability Score as low as 84.931. All core indicators are significantly superior to those of the comparative algorithms.

(This article belongs to the Topic Game Theory and Artificial Intelligence Methods in Sustainable and Renewable Energy Power Systems)
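
To make the quantities named in the abstract above concrete, here is a minimal sketch of the standard pinball (quantile) loss, the Prediction Interval Coverage Probability, and a simple quantile-crossing check. The crossing measure below is a plain fraction of samples with out-of-order quantiles, which is an assumption; the paper's Crossing Degree Index may be defined differently. All function names are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss for quantile level tau (0 < tau < 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def picp(y_true, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


def crossing_rate(quantile_preds):
    """Fraction of samples whose predicted quantiles are not non-decreasing.

    Each row holds one sample's predictions, ordered by increasing quantile level.
    """
    crossed = sum(
        1 for row in quantile_preds if any(a > b for a, b in zip(row, row[1:]))
    )
    return crossed / len(quantile_preds)
```

A PICP close to the nominal level (for example 0.95 for a 95% interval) together with a low crossing rate is what the reported results describe.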

21 pages, 3485 KB

Open Access Article

Research on BiLSTM–Transformer Power Load Forecasting Method Based on Dynamic Adaptive Fusion

by Jialong Xu, Lei Zhang and Zhenxiong Zhang

Energies 2026, 19(6), 1473; https://doi.org/10.3390/en19061473 - 15 Mar 2026

Abstract

Power load forecasting is a core technical component for achieving safe, stable, and economic operation in smart grids. This paper proposes a hybrid BiLSTM–Transformer forecasting method based on a Dynamic Adaptive Fusion (DAF) module. At its core, the method uses the DAF module to adaptively weight different feature channels to highlight key influencing factors, while simultaneously employing a temporal attention mechanism to capture the contributions of various time steps. Building on this, the model combines the strengths of BiLSTM networks in capturing bidirectional dependencies with the capability of Transformer models to extract global contextual features, thereby achieving a multi-level dynamic fusion of load characteristics. Experiments on real-world grid datasets demonstrate that the proposed method achieves a significant performance improvement over traditional models, particularly in terms of load peak prediction accuracy and stability. This provides effective technical support for the refined scheduling of power systems.

24 pages, 2850 KB

Open Access Article

A Psychoacoustic Feature Extraction and Spatio-Temporal Analysis Framework for Continuous Aircraft Noise Monitoring

by Tianlun He, Jiayu Hou and Da Chen

Sensors 2026, 26(6), 1842; https://doi.org/10.3390/s26061842 - 14 Mar 2026

Abstract

Aircraft noise monitoring systems deployed at major airports typically rely on scalar energy-based indicators, which primarily describe integrated sound energy but provide limited representation of the spectral–temporal structure and perceptual attributes of aircraft noise. To address this limitation, this study proposes a sensor-based psychoacoustic feature extraction and spatiotemporal analysis framework for continuous aircraft noise monitoring under high-density operational conditions. An automatic noise monitoring system compliant with ISO 20906 was deployed to synchronously acquire acoustic waveforms and ADS-B trajectory data. A cascaded spatiotemporal fusion algorithm was developed to associate noise events with aircraft flight paths, followed by a model-stratified multidimensional IQR-based data cleaning strategy to suppress environmental interference and non-stationary outliers. Based on the cleaned dataset, a suite of psychoacoustic features (loudness, sharpness, roughness, fluctuation strength, and tonality) was extracted to characterize the perceptual structure of aircraft noise beyond conventional energy metrics. Experimental results demonstrate that, under equivalent sound exposure levels, psychoacoustic features retain substantial discriminative information that is lost in scalar energy indicators. The coefficients of variation for fluctuation strength and tonality reach 43.2% and 22.1%, respectively, corresponding to 15–69 times higher sensitivity compared to traditional energy-based metrics. Furthermore, nonlinear manifold mapping using UMAP reveals clear topological separation between new-generation and legacy aircraft models in the psychoacoustic feature space, whereas severe overlap persists in energy-based representations. Correlation analysis further indicates decoupling between macro-level physical design parameters (e.g., bypass ratio, thrust) and perceptual feature dimensions, highlighting the limitations of energy-centric monitoring schemes. The proposed framework demonstrates the feasibility of integrating psychoacoustic feature extraction into continuous sensor-based aircraft noise monitoring systems. It provides a scalable signal processing pipeline for enhancing the resolution and interpretability of aircraft noise measurements in complex operational environments.

(This article belongs to the Section Environmental Sensing)
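
Two of the statistics mentioned in the abstract above are easy to illustrate: the coefficient of variation and IQR-based outlier removal. The sketch below is a one-dimensional simplification; the paper's cleaning strategy is described as model-stratified and multidimensional, which this does not reproduce. The use of the population standard deviation and the fence factor `k=1.5` are assumptions, and the function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation relative to the mean (population form, an assumption)."""
    return statistics.pstdev(values) / statistics.mean(values)


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is the conventional default."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]
```

A higher coefficient of variation for a feature means it spreads more across events relative to its mean, which is the sense in which the abstract compares psychoacoustic features with energy-based metrics.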

23 pages, 6722 KB

Open Access Article

TLE-FEDformer: A Frequency-Domain Transformer Framework for Multi-Sensor Multi-Temporal Flood Inundation Mapping

by Pouya Ahmadi, Mohammad Javad Valadan Zoej, Mehdi Mokhtarzade, Nazila Kardan, Parya Ahmadi and Ebrahim Ghaderpour

Remote Sens. 2026, 18(6), 895; https://doi.org/10.3390/rs18060895 - 14 Mar 2026

Abstract

Floods are among the most devastating natural hazards, intensified by climate change and rapid urbanization. This study introduces a novel deep learning framework, Transfer Learning-Enhanced FEDformer (TLE-FEDformer), designed for accurate and temporally consistent flood inundation mapping. The framework integrates pre-trained Xception backbones for robust multi-sensor feature extraction from Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 optical imagery, a cross-modal fusion module to align heterogeneous modalities, and the Frequency Enhanced Decomposed Transformer (FEDformer) for efficient frequency-domain temporal modeling. This architecture captures long-range dependencies and flood dynamics, including onset, peak, duration, and recession, while addressing challenges such as cloud contamination, speckle noise, and limited labeled data. Comprehensive experiments demonstrate superior performance, achieving an overall accuracy of 98.12%, an F1-score of 98.55%, and an Intersection over Union (IoU) of 97.38%, outperforming baselines including Convolutional Neural Networks, Capsule Networks, and transfer learning alone. Ablation studies validate the contributions of each component, while sensitivity analyses confirm robustness across hyperparameters. Uncertainty quantification via Monte Carlo dropout highlights high confidence in core flooded regions. Preliminary generalization tests on independent events yield IoU > 94%, indicating strong transferability. TLE-FEDformer advances operational flood monitoring by providing reliable, scalable, and temporally consistent mapping from multi-sensor remote sensing data. This approach offers significant potential for real-time disaster response, early warning systems, and damage assessment in flood-prone regions worldwide.

(This article belongs to the Special Issue Intelligent Methods and Deep Learning Advances with Multimodal Remote Sensing Data for Environmental Hazard Applications)
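
For readers less familiar with the segmentation metrics quoted in the abstract above, the following sketch computes overall accuracy, F1-score, and IoU for a binary flood mask flattened to a list of 0/1 values. These are the textbook definitions; how the paper aggregates them (per class, per scene, or otherwise) is not stated in the abstract, and the function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Return (tp, fp, fn, tn) for two equal-length binary sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return tp, fp, fn, tn


def accuracy(pred, truth):
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(pred, truth):
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)


def iou(pred, truth):
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return tp / (tp + fp + fn)
```

For a single binary mask, F1 and IoU are monotonically related (F1 = 2·IoU / (1 + IoU)), so F1 is never lower than IoU.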

19 pages, 1198 KB

Open Access Article

GSMTNet: Dual-Stream Video Anomaly Detection via Gated Spatio-Temporal Graph and Multi-Scale Temporal Learning

by Di Jiang, Huicheng Lai, Guxue Gao, Dan Ma and Liejun Wang

Electronics 2026, 15(6), 1200; https://doi.org/10.3390/electronics15061200 - 13 Mar 2026

Abstract

Video Anomaly Detection aims to identify video segments containing abnormal events. Detecting anomalies relies heavily on temporal modeling, particularly when anomalies exhibit only subtle deviations from normal events. However, most existing methods inadequately model the heterogeneity in spatiotemporal relationships, especially the dynamic interactions between human pose and video appearance. To address this, we propose GSMTNet, a dual-stream heterogeneous unsupervised network integrating gated spatio-temporal graph convolution and multi-scale temporal learning. First, we introduce a dynamic graph structure learning module, which leverages gated spatio-temporal graph convolutions with manifold transformations to model latent spatial relationships via human pose graphs. This is coupled with a normalizing flow-based density estimation module to model the probability distribution of normal samples in a latent space. Second, we design a hybrid dilated temporal module that employs multi-scale temporal feature learning to simultaneously capture long- and short-term dependencies, thereby enhancing the separability between normal patterns and potential deviations. Finally, we propose a dual-stream fusion module to hierarchically integrate features learned from pose graphs and raw video sequences, followed by a prediction head that computes anomaly scores from the fused features. Extensive experiments demonstrate state-of-the-art performance, achieving 86.81% AUC on ShanghaiTech and 70.43% on UBnormal, outperforming existing methods in rare anomaly scenarios.

(This article belongs to the Special Issue Advanced Scene Understanding Methods and Applications in Multi-Modal Data)

18 pages, 4314 KB

Open Access Article

Remaining Useful Life Prediction for Rotating Machinery via Multi-Graph-Based Spatiotemporal Feature Fusion

by Xiangang Cao, Chenjian Gao and Xinyuan Zhang

Appl. Sci. 2026, 16(6), 2738; https://doi.org/10.3390/app16062738 - 13 Mar 2026

Abstract

Rotating machinery serves as a critical component in various engineering systems, making accurate prediction of its Remaining Useful Life (RUL) essential for ensuring operational stability. To address the limitations of mainstream RUL prediction models in comprehensively capturing spatial correlations among multiple sensors, this paper proposes a multi-graph-structured spatiotemporal feature fusion model for RUL prediction of rotating machinery. Rather than constructing a single correlation graph, the model first builds two distinct graphs: a prior correlation graph based on the structural mechanism of the rotating machinery and a similarity correlation graph derived from monitoring data distribution characteristics. These dual-perspective graphs collectively characterize the potential spatial dependencies among multiple sensors. Subsequently, a Graph Attention Network (GAT) is introduced to aggregate spatial features from both graphs, and a feature concatenation fusion strategy is adopted to achieve a comprehensive representation of the inter-sensor spatial dependencies. Finally, a Long Short-Term Memory (LSTM) network is employed to extract temporal evolution features from the operational data. The effective fusion of these spatial and temporal features enhances the model’s RUL prediction performance. Simulation experiments conducted on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset validated the robustness of the proposed method.

33 pages, 4366 KB

Open Access Article

Structured and Factorized Multi-Modal Representation Learning for Physiological Affective State and Music Preference Inference

by Wenli Qu and Mu-Jiang-Shan Wang

Symmetry 2026, 18(3), 488; https://doi.org/10.3390/sym18030488 - 12 Mar 2026

Abstract

Emotions and affective responses are core intervention targets in music therapy. Through acoustic elements, music can evoke emotional responses at physiological and neurological levels, influencing cognition and behavior while providing an important dimension for evaluating therapeutic efficacy. However, emotions are inherently abstract and difficult to represent directly. Artificial intelligence models therefore provide a promising tool for modeling and quantifying such abstract affective states from physiological signals. In this paper, we propose a structured and explicitly factorized multi-modal representation learning framework for joint affective state and preference inference. Instead of entangling heterogeneous dynamics within monolithic encoders, the framework decomposes representation learning into cross-channel interaction modeling and intra-channel temporal–spectral organization modeling. The framework integrates electroencephalography (EEG), peripheral physiological signals (GSR, BVP, EMG, respiration, and temperature), and eye-movement data (EOG) within a unified temporal modeling paradigm. At its core, a Dynamic Token Feature Extractor (DTFE) transforms raw time series into compact token representations and factorizes representation learning into (i) explicit channel-wise cross-series interaction modeling and (ii) temporal–spectral refinement via learnable frequency-domain gating. These complementary structural modules are implemented through Cross-Series Intersection (CSI) and Intra-Series Intersection (ISI), which perform low-rank channel dependency learning and adaptive spectral modulation, respectively. A hierarchical cross-modal fusion strategy integrates modality-level tokens in a representation-consistent and interaction-aware manner, enabling coordinated modeling of neural, autonomic, and attentional responses. The entire framework is optimized under a unified multi-task objective for valence, arousal, and liking prediction. Experiments on the DEAP dataset demonstrate consistent improvements over state-of-the-art methods. The model achieves 98.32% and 98.45% accuracy for valence and arousal prediction, 97.96% for quadrant classification in single-task evaluation, and 92.8%, 91.8%, and 93.6% accuracy for valence, arousal, and liking in joint multi-task settings. Overall, this work establishes a structure-aware and factorized multi-modal representation learning framework for robust affective decoding and intelligent music therapy systems.

(This article belongs to the Section Computer)

23 pages, 4778 KB

Open Access Article

A Dual-Attentional Gated Residual Framework for Robust Travel Time Prediction

by Jiajun Wu, Yongchuan Zhang, Yiduo Bai, Jun Xia and Yong He

ISPRS Int. J. Geo-Inf. 2026, 15(3), 120; https://doi.org/10.3390/ijgi15030120 - 12 Mar 2026

Abstract

Travel time prediction (TTP) is a fundamental pillar of intelligent transportation systems (ITS). However, deploying highly parameterized deep learning models in data-scarce environments (the “cold-start” problem) remains a critical bottleneck, frequently leading to overfitting and severe error accumulation on ultra-long trajectories. To overcome these limitations, this study proposes the Dual-Attentional Gated Residual Network (DAGRN), a data-efficient forecasting framework driven by a novel topology-temporal coordination mechanism. Specifically, the framework introduces three integrated innovations: (1) transforming the primal network into a physics-aware Line Graph to explicitly filter out illegal movements and dynamically modulating topological propagation via Feature-wise Linear Modulation (FiLM); (2) coupling a Bidirectional GRU backbone with a Multi-Head Attention module to simultaneously capture global trends and localized intersection delays; (3) employing a Gated Residual Fusion mechanism that preserves dimensional consistency and facilitates gradient flow in extensive sequences. To validate the model’s robustness, we conduct evaluations on a highly constrained, stratified dataset comprising only 2000 trajectories. Experimental results demonstrate that DAGRN achieves state-of-the-art predictive precision with an RMSE of 415.485 s and an R² of 0.848, significantly outperforming 12 advanced baseline models and reducing error by up to 13.8% against the strongest graph baseline. Comprehensive ablation studies confirm the necessity of the Multi-Head Attention module, whose removal causes the most severe performance degradation (RMSE rising to 521.495 s). Ultimately, DAGRN presents a readily deployable solution for sparse-data ITS regimes, paving the way for future hybrid integrations with microscopic traffic simulations and evolutionary road network optimization algorithms.

31 pages, 7238 KB

Open Access Article

Multimodal Fault Diagnosis of Rolling Bearings Based on GRU–ResNet–CBAM

by Kunbo Xu, Jingyang Zhang, Dongjun Liu, Chaoge Wang, Ran Wang and Funa Zhou

Machines 2026, 14(3), 318; https://doi.org/10.3390/machines14030318 - 11 Mar 2026

Abstract

Rolling bearings exhibit nonlinear and non-stationary fault signals under complex working conditions, rendering single-modal representation insufficient for accurate diagnosis. To address this limitation, this paper proposes a novel parallel multimodal fusion fault diagnosis model based on a Gated Recurrent Unit (GRU), a Residual Network (ResNet), and a Convolutional Block Attention Module (CBAM). First, a systematic multimodal representation selection framework is introduced, identifying the Markov Transition Field (MTF) as the optimal two-dimensional (2D) image modality due to its superior texture clarity and noise resistance compared to other methods. Second, a parallel dual-branch architecture is designed to simultaneously process heterogeneous data. The 1D-GRU branch captures long-range temporal dependencies directly from raw vibration signals, while the 2D ResNet-CBAM branch extracts deep spatial features from the MTF images, adaptively focusing on key fault regions. These heterogeneous features are then fused through concatenation to retain complementary diagnostic information. Experimental validation on the Case Western Reserve University (CWRU) dataset demonstrates that the proposed model achieves 99.57% accuracy in a 10-class classification task. Furthermore, it exhibits significant parameter efficiency and outstanding robustness, with the accuracy decreasing by no more than 1.2% under noise interference and cross-load scenarios, comprehensively outperforming existing single-modal and advanced fusion methods.

(This article belongs to the Special Issue Health Condition Monitoring, Intelligent Operation and Maintenance of Wind Turbines)
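
The Markov Transition Field mentioned in the abstract above turns a 1D signal into a 2D image. The sketch below follows the commonly described construction: quantile-bin the samples, estimate a first-order transition matrix between bins, then look up the transition probability for every pair of time steps. It is a generic illustration under those assumptions, not the paper's preprocessing pipeline; the rank-based binning and the function name are hypothetical choices.

```python
def markov_transition_field(series, n_bins=4):
    """Return an n x n matrix whose (i, j) entry is P(bin of sample i -> bin of sample j)."""
    n = len(series)
    # Rank-based quantile binning: roughly the same number of samples per bin.
    order = sorted(range(n), key=lambda i: series[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n
    # Count first-order transitions between the bins of consecutive samples.
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1.0
    # Row-normalise the counts into transition probabilities.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    # Spread the bin-level probabilities over all pairs of time steps.
    return [[probs[bins[i]][bins[j]] for j in range(n)] for i in range(n)]
```

The resulting matrix can be rendered as a greyscale image and passed to a 2D convolutional branch, while the raw series goes to a recurrent branch.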

27 pages, 4985 KB

Open Access Article

Hybrid Spatio-Temporal Deep Learning Models for Multi-Task Forecasting in Renewable Energy Systems

by Gulnaz Tolegenova, Alma Zakirova, Maksat Kalimoldayev and Zhanar Akhayeva

Computers 2026, 15(3), 183; https://doi.org/10.3390/computers15030183 - 11 Mar 2026

Abstract

Short-term forecasting of solar and wind power generation is critical for smart grid management but challenging due to non-stationarity and extreme generation events. This study addresses a multi-task learning problem: regression-based forecasting of power output and binary detection of extreme events defined by a quantile-based threshold (q = 0.90). A hybrid spatio-temporal model, DP-STH++, is proposed, implementing parallel causal fusion of LSTM, GRU, a causal Conv1D stack, and a lightweight causal transformer. The architecture employs regression and classification heads, while an uncertainty-weighted mechanism stabilizes multi-task optimization in the regression tasks; extreme event detection performance is evaluated using AUC. Training and evaluation follow a leakage-safe protocol with chronological data processing, calendar feature integration, time-aware splitting, and training-only estimation of scaling parameters and extreme thresholds. Experimental results obtained with a one-hour forecasting horizon and a 24 h context window demonstrate that DP-STH++ achieves the best regression performance on the hold-out set (RMSE = 257.18, MAE = 174.86–287.90, MASE = 0.2438, R² = 0.9440) and the highest extreme event detection accuracy (AUC = 0.9896), ranking 1st among all compared architectures. In time-series cross-validation, the model retains the leading position with a mean MASE = 0.3883 and AUC = 0.9709. The advantages are particularly pronounced for wind power forecasting, where DP-STH++ simultaneously minimizes regression errors and maximizes AUC (0.9880–0.9908).

(This article belongs to the Special Issue AI Applications for Smart Grid Energy Management and Industrial Electrical Systems)
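
The leakage-safe labelling step described in the abstract above (a chronological split, with the q = 0.90 extreme-event threshold estimated from training data only) can be sketched as follows. The 80/20 split fraction, the strict "greater than" comparison, and the function names are assumptions for illustration; the abstract does not give these details.

```python
import statistics


def chronological_split(series, train_fraction=0.8):
    """Split without shuffling, so the test set lies strictly after the training set."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def extreme_threshold(train_values):
    """0.90 quantile of the training data only (last of the nine decile cut points)."""
    return statistics.quantiles(train_values, n=10, method="inclusive")[-1]


def label_extremes(values, threshold):
    """Binary labels: 1 where the value exceeds the training-derived threshold."""
    return [1 if v > threshold else 0 for v in values]
```

Estimating the threshold on the training portion and then applying it unchanged to later data keeps information from the evaluation period out of the labels.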

23 pages, 2294 KB

Open Access Article

Electric Load Forecasting for a Quicklime Company Using a Temporal Fusion Transformer

by Jersson X. Leon-Medina, Diego A. Tibaduiza, Claudia Patricia Siachoque Celys, Bernardo Umbarila Suarez and Francesc Pozo

Algorithms 2026, 19(3), 208; https://doi.org/10.3390/a19030208 - 10 Mar 2026

Viewed by 122

Abstract

Accurate short-term electric load forecasting is essential for the operation and management of energy-intensive manufacturing processes such as quicklime production, for which power demand is driven by stage-based operation, fixed schedules, and abrupt load transitions. This study presents a data-driven forecasting framework based on a Temporal Fusion Transformer (TFT) model applied to real industrial measurements collected during 2024 from an operating quicklime production plant. The dataset comprises hourly average power demand records (kW) measured at a plant level, stage-dependent motor operation, and a fixed working schedule from 08:00 to 18:00 (Monday to Friday), with weekends and non-operational hours characterized by near-zero load. Coke consumption during the calcination stage is included as an additional contextual variable. The TFT model is trained for multi-horizon forecasting and provides probabilistic prediction intervals through quantile regression. Weekly evaluations demonstrate that the proposed approach accurately captures start–stop behavior, peak-load periods, and structured inactivity intervals. In addition to point-wise accuracy metrics, cumulative energy is evaluated by integrating hourly power over the forecasting horizon, allowing the assessment of energy preservation at the operational level. The resulting energy deviation reaches 4.78% for the full horizon and 5.25% when restricted to active production hours, confirming strong consistency between predicted and actual cumulative energy. A comparative analysis against LSTM, GRU, and N-BEATS models shows that recurrent architectures achieve lower MAE and RMSE values, while the TFT model delivers superior cumulative energy consistency, highlighting a trade-off between instantaneous accuracy and operational energy fidelity. 
Overall, the results demonstrate that the proposed TFT-based framework provides a robust and practically relevant solution for short-term industrial electric load forecasting and decision support in stage-driven manufacturing systems under real operating conditions. Full article
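The cumulative energy check mentioned in this abstract can be sketched as follows. The sketch assumes hourly average power in kW, so that summing the samples gives energy in kWh, and it assumes the deviation is the absolute relative difference between predicted and actual energy. The abstract does not give the exact formula, so both the definition and the function names are assumptions.

```python
import numpy as np

def cumulative_energy_kwh(power_kw, step_hours=1.0):
    """Integrate average power over fixed-length steps (rectangle rule)."""
    return float(np.sum(power_kw) * step_hours)

def energy_deviation_pct(actual_kw, predicted_kw, mask=None):
    """Absolute relative difference in cumulative energy, in percent.

    mask optionally restricts the comparison, e.g. to active production hours.
    """
    actual = np.asarray(actual_kw, dtype=float)
    predicted = np.asarray(predicted_kw, dtype=float)
    if mask is not None:
        actual, predicted = actual[mask], predicted[mask]
    e_actual = cumulative_energy_kwh(actual)
    e_pred = cumulative_energy_kwh(predicted)
    return 100.0 * abs(e_pred - e_actual) / e_actual

# Toy values (assumption: illustrative only, not plant measurements).
actual = np.array([0.0, 100.0, 200.0, 100.0, 0.0])
predicted = np.array([10.0, 90.0, 210.0, 95.0, 5.0])

full_horizon = energy_deviation_pct(actual, predicted)
active_only = energy_deviation_pct(actual, predicted, mask=actual > 0)
```

A forecast can have a small energy deviation while still showing larger hourly errors, because over- and under-predictions cancel in the sum. This is consistent with the trade-off between MAE/RMSE and cumulative energy consistency that the abstract reports.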

(This article belongs to the Special Issue 2026 and 2027 Selected Papers from Algorithms Editorial Board Members)

25 pages, 4347 KB

Open AccessArticle

A Gated Attention-Based Multi-Model Fusion Framework for Dynamic Topic Evolution and Complaint-Driven Latent Issue Mining in Online Tourism Reviews

by Liangwu Xu, Xiangjin Ran, Lili Yao and Zhaoji Lin

Information 2026, 17(3), 270; https://doi.org/10.3390/info17030270 - 9 Mar 2026

Viewed by 179

Abstract

To address the limitations of static and coarse-grained analysis in mining online tourism reviews, this study proposes a gated attention-based multi-model fusion framework for dynamic topic evolution and complaint-driven latent issue pattern mining. Using 300,000 reviews from Ctrip and Meituan, we fuse global semantics from Sentence-BERT with attention (SBERT-Attention), local features from Bidirectional Encoder Representations from Transformers–Text Convolutional Neural Network (BERT-TextCNN), and topic distributions from the Biterm Topic Model (BTM) via a learnable gating mechanism. The fused model achieves an F1-score of 92.3% in review classification. We partition the corpus quarterly and apply Uniform Manifold Approximation and Projection (UMAP) followed by K-means++ clustering to the fused vectors, yielding interpretable topics, including Scenery, Transportation, Amenities, Management, Culture, and Value for Money, and enabling dynamic topic discovery over time. River map visualizations and negative review analysis reveal seasonal evolution patterns and recurring complaint patterns associated with specific topics. The framework enables dynamic, interpretable semantic mining, advancing intelligent processing of short-text user content and offering a generalizable approach for temporal knowledge discovery in smart tourism and beyond. Full article
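One common form of a learnable gating mechanism is a softmax gate over the concatenated branch features. The sketch below shows that generic form, not the authors' architecture: the abstract does not specify the gate, so the shapes, the softmax choice, and the assumption that all three branches are first projected to a common dimension are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_fusion(features, gate_weights, gate_bias):
    """Fuse k feature vectors of equal dimension d with a softmax gate.

    features:     list of k arrays, each of shape (d,)
    gate_weights: array of shape (k, k * d), learned in a real model
    gate_bias:    array of shape (k,)
    """
    stacked = np.stack(features)                 # (k, d)
    concat = stacked.reshape(-1)                 # (k * d,)
    gates = softmax(gate_weights @ concat + gate_bias)  # (k,), sums to 1
    fused = gates @ stacked                      # (d,), weighted sum of branches
    return fused, gates

# Stand-ins for the three branches (global, local, topic), already projected to d = 4.
d = 4
branches = [np.ones(d), 2 * np.ones(d), 3 * np.ones(d)]

# With zero weights and bias the gate is uniform, so the result is the plain average.
fused, gates = gated_fusion(branches, np.zeros((3, 3 * d)), np.zeros(3))
```

In a trained model the gate parameters would be optimized jointly with the classifier, letting each review weight the three branches differently.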

(This article belongs to the Topic The Applications of Artificial Intelligence in Tourism)
