Search Results (133)

Search Parameters:
Keywords = multimodal spatio-temporal data

35 pages, 5497 KB  
Article
Robust Localization of Flange Interface for LNG Tanker Loading and Unloading Under Variable Illumination: A Fusion Approach of Monocular Vision and LiDAR
by Mingqin Liu, Han Zhang, Jingquan Zhu, Yuming Zhang and Kun Zhu
Appl. Sci. 2026, 16(2), 1128; https://doi.org/10.3390/app16021128 - 22 Jan 2026
Abstract
The automated localization of the flange interface in LNG tanker loading and unloading imposes stringent requirements for accuracy and illumination robustness. Traditional monocular vision methods are prone to localization failure under extreme illumination conditions, such as intense glare or low light, while LiDAR, despite being unaffected by illumination, suffers from limitations like a lack of texture information. This paper proposes an illumination-robust localization method for LNG tanker flange interfaces by fusing monocular vision and LiDAR, with three scenario-specific innovations beyond generic multi-sensor fusion frameworks. First, an illumination-adaptive fusion framework is designed to dynamically adjust detection parameters via grayscale mean evaluation, addressing extreme illumination (e.g., glare, low light with water film). Second, a multi-constraint flange detection strategy is developed by integrating physical dimension constraints, K-means clustering, and weighted fitting to eliminate background interference and distinguish dual flanges. Third, a customized fusion pipeline (ROI extraction-plane fitting-3D circle center solving) is established to compensate for monocular depth errors and sparse LiDAR point cloud limitations using flange radius prior. High-precision localization is achieved via four key steps: multi-modal data preprocessing, LiDAR-camera spatial projection, fusion-based flange circle detection, and 3D circle center fitting. While basic techniques such as LiDAR-camera spatiotemporal synchronization and K-means clustering are adapted from prior works, their integration with flange-specific constraints and illumination-adaptive design forms the core novelty of this study. 
Comparative experiments between the proposed fusion method and the monocular vision-only localization method are conducted under four typical illumination scenarios: uniform illumination, local strong illumination, uniform low illumination, and low illumination with water film. The experimental results based on 20 samples per illumination scenario (80 valid data sets in total) show that, compared with the monocular vision method, the proposed fusion method reduces the Mean Absolute Error (MAE) of localization accuracy by 33.08%, 30.57%, and 75.91% in the X, Y, and Z dimensions, respectively, with the overall 3D MAE reduced by 61.69%. Meanwhile, the Root Mean Square Error (RMSE) in the X, Y, and Z dimensions is decreased by 33.65%, 32.71%, and 79.88%, respectively, and the overall 3D RMSE is reduced by 64.79%. The expanded sample size verifies the statistical reliability of the proposed method, which exhibits significantly superior robustness to extreme illumination conditions. Full article
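The illumination-adaptive idea summarized above, evaluating a frame's grayscale mean and switching detection parameters accordingly, can be sketched as follows. The thresholds and Canny parameter values here are hypothetical placeholders for illustration, not the paper's actual settings:

```python
import numpy as np

# Hypothetical parameter sets -- the paper does not publish its values;
# these only illustrate the grayscale-mean switching mechanism.
PARAM_SETS = {
    "low_light": {"canny_low": 20, "canny_high": 60},
    "normal":    {"canny_low": 50, "canny_high": 150},
    "glare":     {"canny_low": 80, "canny_high": 200},
}

def select_detection_params(gray_image: np.ndarray) -> dict:
    """Pick edge-detection parameters from the image's grayscale mean."""
    mean = float(gray_image.mean())
    if mean < 60:                      # assumed low-light threshold
        return PARAM_SETS["low_light"]
    if mean > 180:                     # assumed glare threshold
        return PARAM_SETS["glare"]
    return PARAM_SETS["normal"]

# Example: a dark frame selects the low-light parameter set.
dark = np.full((480, 640), 30, dtype=np.uint8)
print(select_detection_params(dark))  # {'canny_low': 20, 'canny_high': 60}
```

In a real pipeline the selected parameters would feed the subsequent edge and circle detection stages.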

19 pages, 4184 KB  
Article
Bearing Anomaly Detection Method Based on Multimodal Fusion and Self-Adversarial Learning
by Han Liu, Yong Qin and Dilong Tu
Sensors 2026, 26(2), 629; https://doi.org/10.3390/s26020629 - 17 Jan 2026
Abstract
In the context of bearing anomaly detection, challenges such as imbalanced sample distribution and complex operational conditions present significant difficulties for data-driven deep learning models. These issues often result in overfitting and high false positive rates in complex real-world scenarios. This paper proposes a strategy that leverages multimodal fusion and Self-Adversarial Training (SAT) to construct and train a deep learning model. First, the one-dimensional bearing vibration time-series data are converted into Gramian Angular Difference Field (GADF) images, and multimodal feature fusion is performed with the original time-series data to capture richer spatiotemporal correlation features. Second, a composite data augmentation strategy combining time-domain and image-domain transformations is employed to effectively expand the anomaly samples, mitigating data scarcity and class imbalance. Finally, the SAT mechanism is introduced, where adversarial samples are generated within the fused feature space to compel the model to learn more generalized and robust feature representations, thereby significantly enhancing its performance in realistic and noisy environments. Experimental results demonstrate that the proposed method outperforms traditional baseline models across key metrics such as accuracy, precision, recall, and F1-score in bearing anomaly detection. It exhibits exceptional robustness against rail-specific interference, offering a specialized solution strictly tailored for the unique, high-noise operational environments of intelligent railway maintenance. Full article
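The GADF transform mentioned above has a compact closed form: after rescaling the series to [-1, 1] and mapping each value to a polar angle phi = arccos(x), the field is GADF[i, j] = sin(phi_i - phi_j) = sqrt(1 - x_i^2) x_j - x_i sqrt(1 - x_j^2). A minimal NumPy sketch, independent of the paper's implementation:

```python
import numpy as np

def gadf(series: np.ndarray) -> np.ndarray:
    """Gramian Angular Difference Field of a 1-D series.

    Rescales to [-1, 1], maps values to polar angles phi = arccos(x),
    and returns GADF[i, j] = sin(phi_i - phi_j).
    """
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    x = 2.0 * (x - lo) / (hi - lo) - 1.0 if hi > lo else np.zeros_like(x)
    x = np.clip(x, -1.0, 1.0)
    root = np.sqrt(1.0 - x ** 2)        # sin(phi), since cos(phi) = x
    return np.outer(root, x) - np.outer(x, root)

field = gadf(np.sin(np.linspace(0, 4 * np.pi, 64)))
print(field.shape)  # (64, 64)
```

The resulting field is antisymmetric with a zero diagonal, which is what makes it encode the *difference* of temporal phases, unlike the summation-field (GASF) variant.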
(This article belongs to the Special Issue Sensor-Based Fault Diagnosis and Prognosis)

20 pages, 5073 KB  
Article
SAWGAN-BDCMA: A Self-Attention Wasserstein GAN and Bidirectional Cross-Modal Attention Framework for Multimodal Emotion Recognition
by Ning Zhang, Shiwei Su, Haozhe Zhang, Hantong Yang, Runfang Hao and Kun Yang
Sensors 2026, 26(2), 582; https://doi.org/10.3390/s26020582 - 15 Jan 2026
Abstract
Emotion recognition from physiological signals is pivotal for advancing human–computer interaction, yet unimodal pipelines frequently underperform due to limited information, constrained data diversity, and suboptimal cross-modal fusion. Addressing these limitations, the Self-Attention Wasserstein Generative Adversarial Network with Bidirectional Cross-Modal Attention (SAWGAN-BDCMA) framework is proposed. This framework reorganizes the learning process around three complementary components: (1) a Self-Attention Wasserstein GAN (SAWGAN) that synthesizes high-quality Electroencephalography (EEG) and Photoplethysmography (PPG) to expand diversity and alleviate distributional imbalance; (2) a dual-branch architecture that distills discriminative spatiotemporal representations within each modality; and (3) a Bidirectional Cross-Modal Attention (BDCMA) mechanism that enables deep two-way interaction and adaptive weighting for robust fusion. Evaluated on the DEAP and ECSMP datasets, SAWGAN-BDCMA significantly outperforms multiple contemporary methods, achieving 94.25% accuracy for binary and 87.93% for quaternary classification on DEAP. Furthermore, it attains 97.49% accuracy for six-class emotion recognition on the ECSMP dataset. Compared with state-of-the-art multimodal approaches, the proposed framework achieves an accuracy improvement ranging from 0.57% to 14.01% across various tasks. These findings offer a robust solution to the long-standing challenges of data scarcity and modal imbalance, providing a profound theoretical and technical foundation for fine-grained emotion recognition and intelligent human–computer collaboration. Full article
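Cross-modal attention of the kind named above, one modality's features querying another's, can be illustrated with a single-head sketch. The dimensions, weights, and single-head form are illustrative assumptions, not the SAWGAN-BDCMA architecture:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats, wq, wk, wv):
    """Single-head cross-attention: one modality attends over the other."""
    q = query_feats @ wq
    k = context_feats @ wk
    v = context_feats @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # rows sum to 1
    return weights @ v

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 16))   # 32 EEG tokens, feature dim 16 (assumed)
ppg = rng.standard_normal((8, 16))    # 8 PPG tokens, feature dim 16 (assumed)
wq, wk, wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))

# "Bidirectional" here simply means running the attention both ways.
eeg_attends_ppg = cross_modal_attention(eeg, ppg, wq, wk, wv)
ppg_attends_eeg = cross_modal_attention(ppg, eeg, wq, wk, wv)
print(eeg_attends_ppg.shape, ppg_attends_eeg.shape)  # (32, 16) (8, 16)
```

The two attended outputs would then be fused (e.g. concatenated or adaptively weighted) before classification.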
(This article belongs to the Special Issue Advanced Signal Processing for Affective Computing)

14 pages, 1872 KB  
Article
An AI-Driven Trainee Performance Evaluation in XR-Based CPR Training System for Enhancing Personalized Proficiency
by Junhyung Kwon and Won-Tae Kim
Electronics 2026, 15(2), 376; https://doi.org/10.3390/electronics15020376 - 15 Jan 2026
Abstract
Cardiac arrest is a life-threatening emergency requiring immediate intervention, with bystander-initiated Cardiopulmonary resuscitation (CPR) being critical for survival, especially in out-of-hospital situations where medical help is often delayed. Given that over 70% of out-of-hospital cases occur in private residences, there is a growing imperative to provide widespread CPR training to the public. However, conventional instructor-led CPR training faces inherent limitations regarding spatiotemporal constraints and the lack of personalized feedback. To address these issues, this paper proposes an AI-integrated XR-based CPR training system designed as an advanced auxiliary tool for skill acquisition. The system integrates vision-based pose estimation with multimodal sensor data to assess the trainee’s posture and compression metrics in accordance with Korean regional CPR guidelines. Moreover, it utilizes a Large Language Model to evaluate verbal protocols, including requesting an emergency call that aligns with the guidelines. Experimental validation of the proof-of-concept reveals a verbal evaluation accuracy of 88% and a speech recognition accuracy of approximately 95%. Furthermore, the optimized concurrent architecture provides a real-time response latency under 0.5 s, and the automated marker-based tracking ensures precise spatial registration without manual calibration. These results confirm the technical feasibility of the system as a complementary solution for basic life support education. Full article
(This article belongs to the Special Issue Virtual Reality Applications in Enhancing Human Lives)

21 pages, 2506 KB  
Article
Collaborative Dispatch of Power–Transportation Coupled Networks Based on Physics-Informed Priors
by Zhizeng Kou, Yingli Wei, Shiyan Luan, Yungang Wu, Hancong Guo, Bochao Yang and Su Su
Electronics 2026, 15(2), 343; https://doi.org/10.3390/electronics15020343 - 13 Jan 2026
Abstract
Under China’s “dual-carbon” strategic goals and the advancement of smart city development, the rapid adoption of electric vehicles (EVs) has deepened the spatiotemporal coupling between transportation networks and distribution grids, posing new challenges for integrated energy systems. To address this, we propose a collaborative optimization framework for power–transportation coupled networks that integrates multi-modal data with physical priors. The framework constructs a joint feature space from traffic flow, pedestrian density, charging behavior, and grid operating states, and employs hypergraph modeling—guided by power flow balance and traffic flow conservation principles—to capture high-order cross-domain coupling. For prediction, spatiotemporal graph convolution combined with physics-informed attention significantly improves the accuracy of EV charging load forecasting. For optimization, a hierarchical multi-agent strategy integrating federated learning and the Alternating Direction Method of Multipliers (ADMM) enables privacy-preserving, distributed charging load scheduling. Case studies conducted on a 69-node distribution network using real traffic and charging data demonstrate that the proposed method reduces the grid’s peak–valley difference by 20.16%, reduces system operating costs by approximately 25%, and outperforms mainstream baseline models in prediction accuracy, algorithm convergence speed, and long-term operational stability. This work provides a practical and scalable technical pathway for the deep integration of energy and transportation systems in future smart cities. Full article
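The ADMM component mentioned above can be illustrated on a toy consensus problem, where each agent holds a private quadratic cost and shares only its local iterate, which is the basic privacy-preserving pattern. This is a generic consensus-ADMM sketch, not the paper's scheduling formulation:

```python
import numpy as np

def consensus_admm(local_targets, rho=1.0, iters=100):
    """Consensus ADMM for agents with private costs f_i(x) = (x - a_i)^2.

    Each agent i keeps a_i private; only x_i + u_i is shared with the
    coordinator, which maintains the consensus variable z.
    """
    a = np.asarray(local_targets, dtype=float)
    x = np.zeros_like(a)   # local primal variables
    u = np.zeros_like(a)   # scaled dual variables
    z = 0.0                # consensus variable
    for _ in range(iters):
        # Local update: argmin_x (x - a_i)^2 + (rho/2)(x - z + u_i)^2
        x = (2 * a + rho * (z - u)) / (2 + rho)
        z = np.mean(x + u)             # coordinator (consensus) update
        u = u + x - z                  # dual update
    return z

print(round(consensus_admm([1.0, 2.0, 6.0]), 4))  # 3.0, the consensus mean
```

For identical quadratic costs, the consensus optimum is the mean of the private targets, which the iteration recovers without any agent revealing its own target directly.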

28 pages, 12746 KB  
Article
Spatiotemporal Dynamics of Forest Biomass in the Hainan Tropical Rainforest Based on Multimodal Remote Sensing and Machine Learning
by Zhikuan Liu, Qingping Ling, Wenlu Zhao, Zhongke Feng, Huiqing Pei, Pietro Grimaldi and Zixuan Qiu
Forests 2026, 17(1), 85; https://doi.org/10.3390/f17010085 - 8 Jan 2026
Abstract
Tropical rainforests play a vital role in maintaining global ecological balance, carbon cycling, and biodiversity conservation, making research on their biomass dynamics scientifically significant. This study integrates multi-source remote sensing data, including canopy height derived from GEDI and ICESat-2 satellite-borne lidar, Landsat imagery, and environmental variables, to estimate forest biomass dynamics in Hainan’s tropical rainforests at a 30 m spatial resolution, involving a correlation analysis of factors influencing spatiotemporal changes in Hainan Tropical Rainforest biomass. The research aims to investigate the spatiotemporal variations in forest biomass and identify key environmental drivers influencing biomass accumulation. Four machine learning algorithms—Backpropagation Neural Network (BP), Convolutional Neural Network (CNN), Random Forest (RF), and Gradient Boosting Decision Tree (GBDT)—were applied to estimate biomass across five forest types from 2003 to 2023. Results indicate the Random Forest model achieved the highest accuracy (R2 = 0.82). Forest biomass and carbon stocks in Hainan Tropical Rainforest National Park increased significantly, with total carbon stocks rising from 29.03 million tons of carbon to 42.47 million tons of carbon—a 46.36% increase over 20 years. These findings demonstrate that integrating multimodal remote sensing data with advanced machine learning provides an effective approach for accurately assessing biomass dynamics, supporting forest management and carbon sink evaluations in tropical rainforest ecosystems. Full article
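The Random Forest regression workflow the study reports can be sketched as follows, here on synthetic stand-ins for canopy height, a spectral index, and elevation. The predictor set and the target relation are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic stand-ins for the paper's predictors; the functional form of
# `biomass` below is an assumption made purely to generate example data.
rng = np.random.default_rng(42)
n = 500
canopy_height = rng.uniform(5, 40, n)    # m, e.g. GEDI/ICESat-2 derived
ndvi = rng.uniform(0.2, 0.9, n)          # Landsat-derived vegetation index
elevation = rng.uniform(0, 1500, n)      # m, from a DEM
X = np.column_stack([canopy_height, ndvi, elevation])
biomass = 8.0 * canopy_height * ndvi + 0.01 * elevation + rng.normal(0, 5, n)

# Train on the first 400 samples, evaluate on the held-out 100.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:400], biomass[:400])
r2 = r2_score(biomass[400:], model.predict(X[400:]))
print(round(r2, 2))
```

On real data the held-out R² (0.82 in the study) reflects noise and unmodeled drivers; on this clean synthetic target it will be considerably higher.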

21 pages, 2865 KB  
Article
Multimodal Clustering and Spatiotemporal Analysis of Wearable Sensor Data for Occupational Health Risk Monitoring
by Yangsheng Wang, Shukun Lai, Honglin Mu, Shenyang Xu, Rong Hu and Chih-Yu Hsu
Technologies 2026, 14(1), 38; https://doi.org/10.3390/technologies14010038 - 5 Jan 2026
Abstract
Accurate interpretation of multimodal wearable data remains challenging in occupational environments due to heterogeneous sensing modalities, motion artifacts, and dynamic work conditions. This study proposes and validates an adaptive multimodal clustering framework for occupational health monitoring. The framework jointly models physiological, activity, and location data from 24 highway-maintenance workers, incorporating a silhouette-guided feature-weighting mechanism, multi-scale temporal change-point detection, and KDE-based spatial analysis. Specifically, the analysis identified three distinct and interpretable behavioral–physiological states that exhibit significant physiological differences (p < 0.001). Notably, it reveals a predominant yet heterogeneous baseline state alongside acute high-intensity and episodic surge states, offering a nuanced view of occupational risk beyond single-modality thresholds. The integrated framework provides a principled analytical workflow for spatiotemporal health risk assessment in field settings, particularly for vibration-intensive work scenarios, while highlighting the complementary role of physiological indicators in low- or static-motion tasks. This framework is particularly effective for vibration-intensive tasks involving powered tools. However, to mitigate potential biases in detecting static heavy-load activities with limited wrist motion (e.g., lifting or carrying), future extensions should incorporate complementary weighting of physiological indicators such as heart rate variability. Full article
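The KDE-based spatial analysis mentioned above can be sketched with a plain Gaussian kernel density estimate over 2-D location fixes. The bandwidth and the two-zone example data are illustrative assumptions:

```python
import numpy as np

def gaussian_kde_2d(points, grid_xy, bandwidth=1.0):
    """Evaluate a 2-D Gaussian kernel density estimate at grid locations."""
    diffs = grid_xy[:, None, :] - points[None, :, :]        # (G, N, 2)
    sq = (diffs ** 2).sum(axis=-1) / (2 * bandwidth ** 2)
    kernel = np.exp(-sq) / (2 * np.pi * bandwidth ** 2)
    return kernel.mean(axis=1)                              # (G,)

# Worker GPS fixes clustered around two hypothetical work zones.
rng = np.random.default_rng(1)
zone_a = rng.normal([0, 0], 0.5, size=(200, 2))
zone_b = rng.normal([5, 5], 0.5, size=(50, 2))
fixes = np.vstack([zone_a, zone_b])

queries = np.array([[0.0, 0.0], [5.0, 5.0], [2.5, 2.5]])
density = gaussian_kde_2d(fixes, queries, bandwidth=0.8)
print(density.argmax())  # 0: zone A is densest (it has 4x the fixes)
```

In a monitoring workflow, high-density regions would be cross-referenced with the clustered physiological states to flag locations of sustained exposure.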

37 pages, 1846 KB  
Review
Visualization Techniques for Spray Monitoring in Unmanned Aerial Spraying Systems: A Review
by Jungang Ma, Hua Zhuo, Peng Wang, Pengchao Chen, Xiang Li, Mei Tao and Zongyin Cui
Agronomy 2026, 16(1), 123; https://doi.org/10.3390/agronomy16010123 - 4 Jan 2026
Abstract
Unmanned Aerial Spraying Systems (UASSs) have rapidly advanced precision crop protection. However, the spray performance of UASSs is influenced by nozzle atomization, rotor-induced airflow, and external environmental conditions. These factors cause strong spatiotemporal coupling and high uncertainty. As a result, visualization-based monitoring techniques are now essential for understanding these dynamics and supporting spray modeling and drift-mitigation design. This review highlights developments in spray visualization technologies along the “droplet–airflow–target” chain in UASS spraying. We first outline the physical fundamentals of droplet formation, liquid-sheet breakup, droplet size distribution, and transport mechanisms in rotor-induced flow. Dominant processes are identified across near-field, mid-field, and far-field scales. Next, we summarize major visualization methods. These include optical imaging (PDPA/PDIA, HSI, DIH), laser-based scattering and ranging (LD, LiDAR), and flow-field visualization (PIV). We compare their spatial resolution, measurement range, 3D reconstruction capabilities, and possible sources of error. We then review wind-tunnel trials, field experiments, and point-cloud reconstruction studies. These studies show how downwash flow and tip vortices affect plume structure, canopy disturbance, and deposition patterns. Finally, we discuss emerging intelligent analysis for large-scale monitoring, such as image-based droplet recognition, multimodal data fusion, and data-driven modeling. We outline future directions, including unified feature systems, vortex-coupled models, and embedded closed-loop spray control. This review provides a comprehensive reference for advancing UASS analysis, drift assessment, spray optimization, and smart support systems. Full article
(This article belongs to the Special Issue New Trends in Agricultural UAV Application—2nd Edition)

26 pages, 3866 KB  
Article
PALC-Net: A Partial Convolution Attention-Enhanced CNN-LSTM Network for Aircraft Engine Remaining Useful Life Prediction
by Lingrui Wu, Shikai Song, Hanfang Li, Chaozhu Hu and Youxi Luo
Electronics 2026, 15(1), 131; https://doi.org/10.3390/electronics15010131 - 27 Dec 2025
Abstract
Remaining Useful Life (RUL) prediction for aeroengines represents a core challenge in Prognostics and Health Management (PHM), with significant implications for condition-based maintenance, operational cost reduction, and flight safety enhancement. Current deep learning-based approaches encounter three major limitations when handling multi-source sensor data: conventional convolution operations struggle to model heterogeneous sensor feature distributions, leading to computational redundancy; simplistic multimodal fusion strategies often induce semantic conflicts; and high model complexity hinders industrial deployment. To address these issues, this paper proposes a novel Partial Convolution Attention-enhanced CNN-LSTM Network (PALC-Net). We introduce a partial convolution mechanism that applies convolution to only half of the input channels while preserving identity mappings for the remainder. This design retains representational power while substantially lowering computational overhead. A dual-branch feature extraction architecture is developed: the temporal branch employs a PConv-CNN-LSTM architecture to capture spatio-temporal dependencies, while the statistical branch utilizes multi-scale sliding windows to extract physical degradation indicators—such as mean, standard deviation, and trend. Additionally, an adaptive fusion module based on cross-attention is designed, where heterogeneous features are projected into a unified semantic space via Query-Key-Value mappings. A sigmoid gating mechanism is incorporated to enable dynamic weight allocation, effectively mitigating inter-modal conflicts. Extensive experiments on the NASA C-MAPSS dataset demonstrate that PALC-Net achieves state-of-the-art performance across all four subsets. Notably, on the FD003 subset, it attains an MAE of 7.70 and an R2 of 0.9147, significantly outperforming existing baselines. 
Ablation studies validate the effectiveness and synergistic contributions of the partial convolution, attention mechanism, and multimodal fusion modules. This work offers an accurate and efficient solution for aeroengine RUL prediction, achieving an effective balance between engineering practicality and algorithmic sophistication. Full article
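The partial convolution idea described above, convolving only half of the input channels and passing the rest through as identity mappings, can be sketched in 1-D NumPy. The real PALC-Net operates on learned 2-D feature maps inside a CNN-LSTM, so this is only a structural illustration:

```python
import numpy as np

def partial_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply a shared 1-D convolution to the first half of the channels
    and pass the remaining channels through unchanged (identity mapping)."""
    c = x.shape[0] // 2
    convolved = np.stack(
        [np.convolve(ch, kernel, mode="same") for ch in x[:c]]
    )
    return np.concatenate([convolved, x[c:]], axis=0)

x = np.arange(24, dtype=float).reshape(4, 6)       # 4 channels, length 6
out = partial_conv1d(x, np.array([0.25, 0.5, 0.25]))
print(out.shape)                        # (4, 6)
print(np.array_equal(out[2:], x[2:]))   # True: identity half is untouched
```

Because only half the channels are convolved, the operation costs roughly half the multiply-adds of a full convolution while the identity half preserves representational content, which is the efficiency argument the paper makes.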
(This article belongs to the Section Artificial Intelligence)

21 pages, 5125 KB  
Article
Estimating Soil Moisture Using Multimodal Remote Sensing and Transfer Optimization Techniques
by Jingke Liu, Lin Liu, Weidong Yu and Xingbin Wang
Remote Sens. 2026, 18(1), 84; https://doi.org/10.3390/rs18010084 - 26 Dec 2025
Abstract
Surface soil moisture (SSM) is essential for crop growth, irrigation management, and drought monitoring. However, conventional field-based measurements offer limited spatial and temporal coverage, making it difficult to capture environmental variability at scale. This study introduces a multimodal soil moisture estimation framework that combines synthetic aperture radar (SAR), optical imagery, vegetation indices, digital elevation models (DEM), meteorological data, and spatio-temporal metadata. To strengthen model performance and adaptability, an intermediate fine-tuning strategy is applied to two datasets comprising 10,571 images and 3772 samples. This approach improves generalization and transferability across regions. The framework is evaluated across diverse agro-ecological zones, including farmlands, alpine grasslands, and environmentally fragile areas, and benchmarked against single-modality methods. Results with RMSE 4.5834% and R2 0.8956 show consistently high accuracy and stability, enabling the production of reliable field-scale soil moisture maps. By addressing the spatial and temporal challenges of soil monitoring, this framework provides essential information for precision irrigation. It supports site-specific water management, promotes efficient water use, and enhances drought resilience at both farm and regional scales. Full article
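The RMSE and R² figures quoted above are the standard regression metrics; for reference, a self-contained implementation (the sample observation and estimate values are made up):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

obs = np.array([12.0, 18.0, 25.0, 31.0, 40.0])   # e.g. volumetric SSM, %
est = np.array([13.5, 17.0, 26.0, 29.5, 41.0])
print(round(rmse(obs, est), 3), round(r2(obs, est), 3))  # 1.225 0.984
```

Note that an RMSE expressed in percent (as in the abstract) inherits the units of the soil-moisture values themselves, so it is directly comparable across sites only when moisture is reported on the same scale.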

23 pages, 5039 KB  
Article
A3DSimVP: Enhancing SimVP-v2 with Audio and 3D Convolution
by Junfeng Yang, Mingrui Long, Hongjia Zhu, Limei Liu, Wenzhi Cao, Qin Li and Han Peng
Electronics 2026, 15(1), 112; https://doi.org/10.3390/electronics15010112 - 25 Dec 2025
Abstract
In modern high-demand applications, such as real-time video communication, cloud gaming, and high-definition live streaming, achieving both superior transmission speed and high visual fidelity is paramount. However, unstable networks and packet loss remain major bottlenecks, making accurate and low-latency video error concealment a critical challenge. Traditional error control strategies, such as Forward Error Correction (FEC) and Automatic Repeat Request (ARQ), often introduce excessive latency or bandwidth overhead. Meanwhile, receiver-side concealment methods struggle under high motion or significant packet loss, motivating the exploration of predictive models. SimVP-v2, with its efficient convolutional architecture and Gated Spatiotemporal Attention (GSTA) mechanism, provides a strong baseline by reducing complexity and achieving competitive prediction performance. Despite its merits, SimVP-v2’s reliance on 2D convolutions for implicit temporal aggregation limits its capacity to capture complex motion trajectories and long-term dependencies. This often results in artifacts such as motion blur, detail loss, and accumulated errors. Furthermore, its single-modality design ignores the complementary contextual cues embedded in the audio stream. To overcome these issues, we propose A3DSimVP (Audio- and 3D-Enhanced SimVP-v2), which integrates explicit spatio-temporal modeling with multimodal feature fusion. Architecturally, we replace the 2D depthwise separable convolutions within the GSTA module with their 3D counterparts, introducing a redesigned GSTA-3D module that significantly improves motion coherence across frames. Additionally, an efficient audio–visual fusion strategy supplements visual features with contextual audio guidance, thereby enhancing the model’s robustness and perceptual realism. We validate the effectiveness of A3DSimVP’s improvements through extensive experiments on the KTH dataset. 
Our model achieves a PSNR of 27.35 dB, surpassing the 27.04 dB of the SimVP-v2 baseline. Concurrently, our improved A3DSimVP model reduces the loss metrics on the KTH dataset, achieving an MSE of 43.82 and an MAE of 385.73, both lower than the baseline. Crucially, our LPIPS metric is substantially lowered to 0.22. These data confirm that A3DSimVP significantly enhances both structural fidelity and perceptual quality while maintaining high predictive accuracy. Notably, A3DSimVP attains faster inference speeds than the baseline with only a marginal increase in computational overhead. These results establish A3DSimVP as an efficient and robust solution for latency-critical video applications. Full article
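PSNR, the fidelity metric quoted above, is a simple function of the mean squared error. A small helper shows the relationship, including the familiar rule of thumb that halving the MSE gains about 3 dB (for 8-bit imagery with peak value 255; the example MSE values are chosen for round numbers, not taken from the paper):

```python
import numpy as np

def psnr(mse: float, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB from a mean squared error."""
    return float(10.0 * np.log10(peak ** 2 / mse))

print(round(psnr(650.25), 2))                    # 20.0 (255^2 / 650.25 == 100)
print(round(psnr(325.125) - psnr(650.25), 2))    # 3.01 dB gain from halving MSE
```

This also explains why seemingly small PSNR gaps (27.35 dB vs. 27.04 dB) correspond to measurable MSE reductions on the same dataset.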
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)

28 pages, 789 KB  
Review
An Overview of Spatiotemporal Network Forecasting: Current Research Status and Methodological Evolution
by Chenchen Yang, Wenbing Zhang and Yingjiang Zhou
Mathematics 2026, 14(1), 18; https://doi.org/10.3390/math14010018 - 21 Dec 2025
Abstract
Time series and spatio-temporal forecasting are fundamental tasks for complex system modeling and intelligent decision-making, with broad applications in transportation, meteorology, finance, healthcare, and public safety. Compared with simple univariate time series, real-world spatio-temporal data exhibit rich temporal dynamics and intricate spatial interactions, leading to heterogeneity, non-stationarity, and evolving topologies. Addressing these challenges requires modeling frameworks that can simultaneously capture temporal evolution, spatial correlations, and cross-domain regularities. This survey provides a comprehensive synthesis of forecasting methods, spanning statistical algorithms, traditional machine learning approaches, neural architectures, and recent generative and causal paradigms. We review the methodological evolution from classical linear models to deep learning–based temporal modules and emphasize the role of attention-based Transformers as general-purpose sequence architectures. In parallel, we distinguish these architectural advances from pre-trained foundation models for time series and spatio-temporal data (e.g., large models trained across diverse domains), which leverage self-supervised objectives and exhibit strong zero-/few-shot transfer capabilities. We organize the review along both data-type and architectural dimensions—single long-term time series, Euclidean-structured spatio-temporal data, and graph-structured spatio-temporal data—while also examining advanced paradigms such as diffusion models, causal modeling, multimodal-driven frameworks, and pre-trained foundation models. Through this taxonomy, we highlight common strengths and limitations across approaches, including issues of scalability, robustness, real-time efficiency, and interpretability. 
Finally, we summarize open challenges and future directions, with a particular focus on the joint evolution of graph-based, causal, diffusion, and foundation-model paradigms for next-generation spatio-temporal forecasting. Full article
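One building block this survey covers, the graph convolution underlying graph-structured spatio-temporal models, reduces to a normalized adjacency product. A single-layer sketch in the generic GCN form, not tied to any particular surveyed model:

```python
import numpy as np

def graph_conv_step(adj: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One symmetric-normalized graph convolution:
    relu(D^{-1/2} (A + I) D^{-1/2} X W), with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])                     # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ w, 0.0)

# 4 sensors on a line graph, 3 input features each, 2 output channels.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
h = graph_conv_step(adj, rng.standard_normal((4, 3)), rng.standard_normal((3, 2)))
print(h.shape)  # (4, 2)
```

Spatio-temporal architectures typically interleave such spatial layers with temporal modules (recurrent units, temporal convolutions, or attention) applied per node.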
(This article belongs to the Special Issue Advanced Machine Learning Research in Complex System)

27 pages, 6190 KB  
Article
Multimodal Temporal Fusion for Next POI Recommendation
by Fang Liu, Jiangtao Li and Tianrui Li
Algorithms 2026, 19(1), 3; https://doi.org/10.3390/a19010003 - 20 Dec 2025
Abstract
The objective of next POI recommendation is to learn users' preferences and habits from their historical check-in sequences and to provide a list of POIs they are likely to visit next. However, existing POI recommendation algorithms have several limitations. On the one hand, when deriving a user's preferences for the current period, considering the entire historical check-in sequence, including future check-in information, makes the model susceptible to noisy data and thereby reduces recommendation accuracy. On the other hand, current methods generally model long- and short-term preferences within a fixed time window, which can fail to capture users' behavioral characteristics at different time scales. To address these issues, we propose Multimodal Temporal Fusion for Next POI Recommendation (MTFNR). First, to capture users' preferences and habits in different periods, multiple hypergraph neural networks are constructed to analyze user behavior patterns at different stages; to avoid introducing interference, only the check-in sequences visited in the current period are considered, reducing the impact of noise on the model's recommendation performance. Second, the next POI recommendation task is modeled by fusing temporal information with long- and short-term preferences, yielding a more comprehensive understanding of users' preferences and habits and improving both the timeliness and accuracy of recommendations. Third, spatio-temporal interval information is introduced into the GRU model to capture dependencies in the sequences and improve overall performance. Extensive experiments on real LBSN datasets demonstrate the superior performance of the MTFNR model: Top-10 recall improves by 2.81% to 15.97% over current methods. Full article
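The interval-aware recurrence described in the abstract can be sketched in a few lines; the following is a minimal NumPy illustration, assuming a simple exponential decay of the previous hidden state by the time and distance gaps between consecutive check-ins (the class name, decay form, and constants are hypothetical, not the MTFNR formulation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class IntervalGRUCell:
    """Minimal GRU cell whose recurrence is modulated by the time (dt)
    and distance (dd) intervals between consecutive check-ins.
    Illustrative sketch only."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        d = input_dim + hidden_dim
        self.Wz = rng.normal(0.0, 0.1, (d, hidden_dim))
        self.Wr = rng.normal(0.0, 0.1, (d, hidden_dim))
        self.Wh = rng.normal(0.0, 0.1, (d, hidden_dim))

    def step(self, x, h, dt, dd):
        # Exponential decay: large spatio-temporal gaps weaken the
        # influence of the previous hidden state (rates are assumed).
        decay = np.exp(-0.1 * dt) * np.exp(-0.05 * dd)
        h = h * decay
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ self.Wz)                                  # update gate
        r = sigmoid(xh @ self.Wr)                                  # reset gate
        h_tilde = np.tanh(np.concatenate([x, r * h]) @ self.Wh)    # candidate state
        return (1.0 - z) * h + z * h_tilde
```

A trainable model would learn the decay rates and run in an autodiff framework; the point here is only where the spatio-temporal intervals enter the recurrence.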
(This article belongs to the Special Issue Graph and Hypergraph Algorithms and Applications)

22 pages, 8263 KB  
Article
Research on Propeller Defect Diagnosis of Rotor UAVs Based on MDI-STFFNet
by Beining Cui, Dezhi Jiang, Xinyu Wang, Lv Xiao, Peisen Tan, Yanxia Li and Zhaobin Tan
Symmetry 2026, 18(1), 3; https://doi.org/10.3390/sym18010003 - 19 Dec 2025
Abstract
To address flight safety risks from rotor defects in rotorcraft drones operating in complex low-altitude environments, this study proposes a high-precision diagnostic model based on the Multimodal Data Input and Spatio-Temporal Feature Fusion Network (MDI-STFFNet). The model uses a dual-modality coupling mechanism that integrates vibration and air pressure signals, forming a “single-path temporal, dual-path representational” framework. The one-dimensional vibration signal and the five-channel pressure array are mapped into a texture space via phase space reconstruction and color-coded recurrence plots, followed by extraction of transient spatial features using a pre-trained ResNet-18 model. Parallel LSTM networks capture long-term temporal dependencies, while a parameter-free 1D max-pooling layer compresses redundant pressure data, reducing LSTM parameter growth. The CSW-FM module enables adaptive fusion across modal scales via shared-weight mapping and learnable query vectors that dynamically assign spatiotemporal weights. Experiments on a self-built dataset with seven defect types show that the model achieves 99.01% accuracy, improving by 4.46% and 1.98% over single-modality vibration and pressure inputs. Ablation studies confirm the benefits of spatiotemporal fusion and soft weighting in accuracy and robustness. The model provides a scalable, lightweight solution for UAV power system fault diagnosis under high-noise and varying conditions. Full article
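The phase space reconstruction and recurrence-plot step that feeds the ResNet-18 branch can be illustrated with a short NumPy sketch; the embedding dimension, delay, and binary threshold below are illustrative choices, and the paper uses color-coded recurrence plots rather than this binary variant:

```python
import numpy as np

def phase_space_embed(signal, dim=3, delay=2):
    """Time-delay embedding of a 1-D signal into a dim-dimensional
    phase space (Takens-style reconstruction)."""
    n = len(signal) - (dim - 1) * delay
    return np.stack(
        [signal[i * delay : i * delay + n] for i in range(dim)], axis=1
    )

def recurrence_plot(points, eps=None):
    """Binary recurrence matrix: R[i, j] = 1 when phase-space states
    i and j lie within eps of each other; eps defaults to 20% of the
    maximum pairwise distance (an assumed heuristic)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    if eps is None:
        eps = 0.2 * d.max()
    return (d <= eps).astype(np.uint8)
```

The resulting matrix can be rendered as an image and passed to a pre-trained CNN, which is the role the color-coded recurrence plots play in MDI-STFFNet.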
(This article belongs to the Section Engineering and Materials)

14 pages, 6479 KB  
Article
Automating Air Pollution Map Analysis with Multi-Modal AI and Visual Context Engineering
by Szymon Cogiel, Mateusz Zareba, Tomasz Danek and Filip Arnaut
Atmosphere 2026, 17(1), 2; https://doi.org/10.3390/atmos17010002 - 19 Dec 2025
Abstract
The increasing volume of data from IoT sensors has made manual inspection time-consuming and prone to bias, particularly for spatiotemporal air pollution maps. While rule-based methods are adequate for simple datasets or individual maps, they are insufficient for interpreting multi-year time series data with 1 h timestamps, which require both domain-specific expertise and significant time investment. This limitation is especially critical in environmental monitoring, where analyzing long-term spatiotemporal PM2.5 maps derived from 52 low-cost sensors remains labor-intensive and susceptible to human error. This study investigates the potential of generative artificial intelligence, specifically multi-modal large language models (MLLMs), for interpreting spatiotemporal PM2.5 maps. Both open-source models (Janus-Pro and LLaVA-1.5) and commercial large language models (GPT-4o and Gemini 2.5 Pro) were evaluated. The initial results showed a limited performance, highlighting the difficulty of extracting meaningful information directly from raw sensor-derived maps. To address this, a visual context engineering framework was introduced, comprising systematic optimization of colormaps, normalization of intensity ranges, and refinement of map layers and legends to improve clarity and interpretability for AI models. Evaluation using the GEval metric demonstrated that visual context engineering increased interpretation accuracy (defined as the detection of PM2.5 spatial extrema) by over 32.3% (relative improvement). These findings provide strong evidence that tailored visual preprocessing enables MLLMs to effectively interpret complex environmental time series data, representing a novel approach that bridges data-driven modeling with ecological monitoring and offers a scalable solution for automated, reliable, and reproducible analysis of high-resolution air quality datasets. Full article
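Two of the preprocessing ideas in the abstract, fixing a shared intensity scale across all maps in a series and defining the spatial extrema the models are scored on, can be sketched as follows (the 0-150 µg/m³ clipping range and the function names are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def normalize_pm25_map(grid, vmin=0.0, vmax=150.0):
    """Clip a raw PM2.5 grid to a fixed concentration range and scale it
    to [0, 1], so every map in a multi-year series shares one intensity
    scale before a colormap is applied."""
    clipped = np.clip(grid, vmin, vmax)
    return (clipped - vmin) / (vmax - vmin)

def spatial_extrema(norm_grid):
    """Row/column indices of the spatial maximum and minimum, i.e. the
    quantity whose detection accuracy was evaluated with GEval."""
    hi = np.unravel_index(np.argmax(norm_grid), norm_grid.shape)
    lo = np.unravel_index(np.argmin(norm_grid), norm_grid.shape)
    return hi, lo
```

Fixing the normalization range across the whole series is what lets an MLLM compare colors between maps; per-map min-max scaling would make the same color mean different concentrations on different days.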
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)
