Search Results (180)

Search Parameters:
Keywords = single head attention

18 pages, 11608 KB  
Article
YOLO-MSPM: A Precise and Lightweight Cotton Verticillium Wilt Detection Network
by Xinbo Zhao, Jianan Chi, Fei Wang, Xuan Li, Xingcan Yuwen, Tong Li, Yi Shi and Liujun Xiao
Agriculture 2025, 15(19), 2013; https://doi.org/10.3390/agriculture15192013 - 26 Sep 2025
Abstract
Cotton is one of the world’s most important economic crops, and its yield and quality have a significant impact on the agricultural economy. However, Verticillium wilt, a widespread disease, severely affects cotton growth and yield. Because its symptoms are typically small and densely distributed, identification poses considerable challenges. In this study, we introduce YOLO-MSPM, a lightweight and accurate detection framework built on the YOLOv11 architecture to efficiently identify cotton Verticillium wilt. To keep the model lightweight, MobileNetV4 is introduced into the backbone network. Moreover, a single-head self-attention (SHSA) mechanism is integrated into the C2PSA block, allowing the network to emphasize critical areas of the feature maps and thus represent features more effectively. Furthermore, the PC3k2 module combines pinwheel-shaped convolution (PConv) with C3k2, and the mobile inverted bottleneck convolution (MBConv) module is incorporated into the detection head of YOLOv11. These adjustments improve multi-scale information integration, enhance small-target recognition, and reduce computation costs. In evaluation, YOLO-MSPM achieves precision of 0.933, recall of 0.920, mAP50 of 0.970, and mAP50-95 of 0.797, each exceeding the corresponding figure for YOLOv11n. On the lightweighting side, YOLO-MSPM has 1.773 M parameters, a 31.332% reduction compared to YOLOv11n; its GFLOPs and model size are 5.4 and 4.0 MB, respectively, reductions of 14.286% and 27.273%. The study delivers a lightweight yet accurate solution for identifying and monitoring cotton Verticillium wilt in resource-limited environments.
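As a rough illustration of the single-head self-attention this abstract describes, the following is a generic NumPy sketch of scaled dot-product attention over flattened feature-map positions; it is not the paper's SHSA module, and the projection matrices here are illustrative placeholders.

```python
import numpy as np

def single_head_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (N, d) flattened feature-map positions; w_q/w_k/w_v: (d, d_k) projections.
    Returns (N, d_k) attended features.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (N, N) position similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over key positions
    return weights @ v                             # weighted sum of values

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))                   # 16 positions, 8 channels
w = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
out = single_head_self_attention(x, *w)
print(out.shape)  # (16, 8)
```

A single head keeps the parameter and memory cost low (one attention map instead of several), which is why it appears in lightweight detectors like the one described.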

18 pages, 1617 KB  
Article
GNN-MFF: A Multi-View Graph-Based Model for RTL Hardware Trojan Detection
by Senjie Zhang, Shan Zhou, Panpan Xue, Lu Kong and Jinbo Wang
Appl. Sci. 2025, 15(19), 10324; https://doi.org/10.3390/app151910324 - 23 Sep 2025
Abstract
The globalization of hardware design flows has increased the risk of Hardware Trojan (HT) insertion during the design phase. Graph-based learning methods have shown promise for HT detection at the Register Transfer Level (RTL). However, most existing approaches rely on representing RTL designs through a single graph structure. This single-view modeling paradigm inherently constrains the model’s ability to perceive complex behavioral patterns, consequently limiting detection performance. To address these limitations, we propose GNN-MFF, an innovative multi-view feature fusion model based on Graph Neural Networks (GNNs). Our approach centers on joint multi-view modeling of RTL designs to achieve a more comprehensive representation. Specifically, we construct complementary graph-structural views: the Abstract Syntax Tree (AST) capturing structure information, and the Data Flow Graph (DFG) modeling logical dependency relationships. For each graph structure, customized GNN architectures are designed to effectively extract its features. Furthermore, we develop a feature fusion framework that leverages a multi-head attention mechanism to deeply explore and integrate heterogeneous features from distinct views, thereby enhancing the model’s capacity to structurally perceive anomalous logic patterns. Evaluated on an extended Trust-Hub-based HT benchmark dataset, our model achieves an average F1-score of 97.08% in automated detection of unseen HTs, surpassing current state-of-the-art methods. Full article
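To make the multi-view fusion idea concrete, here is a minimal NumPy sketch in which each view embedding (e.g., one from the AST branch, one from the DFG branch) is a token and multiple attention heads mix information across views; the identity projections and dimensions are simplifying assumptions, not the GNN-MFF architecture itself.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_fusion(views, n_heads=4):
    """Fuse per-view embeddings (V, d) with multi-head attention across views.

    Each view is one token; the attended tokens are mean-pooled into a single
    fused design embedding of size d.
    """
    v_cnt, d = views.shape
    assert d % n_heads == 0
    hd = d // n_heads
    heads = []
    for h in range(n_heads):
        sl = slice(h * hd, (h + 1) * hd)
        q = k = v = views[:, sl]                  # identity projections for brevity
        att = softmax(q @ k.T / np.sqrt(hd))      # (V, V) cross-view weights
        heads.append(att @ v)
    fused = np.concatenate(heads, axis=-1)        # (V, d)
    return fused.mean(axis=0)                     # pooled design embedding (d,)

rng = np.random.default_rng(1)
ast_emb, dfg_emb = rng.standard_normal((2, 16))   # one embedding per view
fused = multi_head_fusion(np.stack([ast_emb, dfg_emb]))
print(fused.shape)  # (16,)
```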

16 pages, 1008 KB  
Article
Mother–Preterm Infant Contingent Interactions During Supported Infant-Directed Singing in the NICU—A Feasibility Study
by Shulamit Epstein, Shmuel Arnon, Gabriela Markova, Trinh Nguyen, Stefanie Hoehl, Liat Eitan, Sofia Bauer-Rusek, Dana Yakobson and Christian Gold
Children 2025, 12(9), 1273; https://doi.org/10.3390/children12091273 - 22 Sep 2025
Abstract
Background: Supported infant-directed singing (IDS) for parents and their preterm infants has proven beneficial for parents’ and preterm infants’ health and relationship building. Studying parent–infant contingent interactions through behavioral observations is an established method for assessing the quality of interactions, but very few studies have measured contingency between parents and preterm infants in the neonatal period during supported IDS. Methods: We conducted a feasibility study to assess the possibility of analyzing the contingency of parent–very preterm infant dyads during supported IDS in the NICU. We recruited four mother–infant dyads and video-recorded a single music therapy (MT) session before their discharge from the hospital. Two independent researchers coded three selected segments (beginning, middle, and end) from each video according to adapted behavioral scales, with inter-rater agreement analysis. Contingency between infant and maternal behaviors was analyzed. Results: Twelve video segments were coded. High inter-rater agreement (Cohen’s kappa) was found for infant eye-opening (0.93), hand positions (0.79), and head orientation (0.94), as well as maternal head orientation (0.95) and vocalizations (0.95). Compared with periods without IDS, supported IDS showed increased infant head orientation toward the mother, infant eye closure, and maternal head orientation toward the infant (all p < 0.001). Maternal head orientation toward the infant was contingent on the infant’s closed eyes, extended hands, and head turned away from the mother. Conclusions: This feasibility study demonstrates contingency between specific behaviors of mothers and their preterm infants during IDS, and shows that these interactions can be analyzed from video segments with high inter-rater agreement. The method might help in evaluating other modalities related to contingency, and recent advances in AI could make it easier to apply; further studies should evaluate the importance of contingency for child development. The findings suggest that supported IDS influences infant attention and regulation.
(This article belongs to the Section Pediatric Neonatology)

21 pages, 3009 KB  
Article
A Synergistic Fault Diagnosis Method for Rolling Bearings: Variational Mode Decomposition Coupled with Deep Learning
by Shuzhen Wang, Xintian Su, Jinghan Li, Fei Li, Mingwei Li, Yafei Ren, Guoqiang Wang, Nianfeng Shi and Huafei Qian
Electronics 2025, 14(18), 3714; https://doi.org/10.3390/electronics14183714 - 19 Sep 2025
Abstract
To address the limitations of the traditional methods that are used to extract features from non-stationary signals and capture temporal dependency relationships, a rolling bearing fault diagnosis method combining variational mode decomposition (VMD) and deep learning is proposed. A hybrid VMD-CNN-Transformer model is constructed, where VMD is used to adaptively decompose bearing vibration signals into multiple intrinsic mode functions (IMFs). The convolutional neural network (CNN) captures the local features of each modal time series, while the multi-head self-attention mechanism of the Transformer captures the global dependencies of each mode, enabling the global analysis and fusion of features from each mode. Finally, a fully connected layer is used to classify the 10 fault types. The experimental results on the Case Western Reserve University bearing dataset demonstrate that the model achieves a fault diagnosis accuracy of 99.48%, which is significantly higher than that of single or traditional combined methods, providing a new technical path for the intelligent diagnosis of rolling bearing faults. Full article

33 pages, 13243 KB  
Article
Maize Yield Prediction via Multi-Branch Feature Extraction and Cross-Attention Enhanced Multimodal Data Fusion
by Suning She, Zhiyun Xiao and Yulong Zhou
Agronomy 2025, 15(9), 2199; https://doi.org/10.3390/agronomy15092199 - 16 Sep 2025
Abstract
This study conducted field experiments in 2024 in Meidaizhao Town, Tumed Right Banner, Baotou City, Inner Mongolia Autonomous Region, adopting a plant-level sampling design with 10 maize plots selected as sampling areas (20 plants per plot). At four critical growth stages (jointing, heading, filling, and maturity), multimodal data, including leaf spectra, root-zone soil spectra, and leaf chlorophyll and nitrogen content, were synchronously collected from each plant. In response to the prevalent limitations of existing yield prediction methods, such as insufficient accuracy and limited generalization ability due to reliance on single-modal data, this study takes the acquired multimodal maize data as its research object and proposes a multimodal fusion prediction network. First, to handle the heterogeneous nature of multimodal data, a parallel feature extraction architecture is designed, with independent branches (leaf spectral, soil spectral, and biochemical parameter) preserving the distinct characteristics of each modality. Subsequently, a dual-path feature fusion method, enhanced by a cross-attention mechanism, enables dynamic interaction and adaptive weight allocation between cross-modal features, specifically between leaf spectra and soil spectra and between leaf spectra and biochemical parameters, thereby significantly improving maize yield prediction accuracy. The experimental results demonstrate that the proposed model outperforms single-modal approaches by effectively leveraging complementary information from multimodal data, achieving an R2 of 0.951, an RMSE of 8.68, an RPD of 4.50, and an MAE of 5.28. Furthermore, the study reveals that deep fusion of soil spectra, leaf biochemical parameters, and leaf spectral data substantially enhances prediction accuracy. This work not only validates the effectiveness of multimodal data fusion in maize yield prediction but also provides valuable insights for accurate, non-destructive yield prediction.
(This article belongs to the Section Precision and Digital Agriculture)
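The cross-attention step the abstract describes can be sketched as follows: tokens from one modality act as queries and tokens from another modality supply keys and values, so the attention weights realize the "adaptive weight allocation" between modalities. This is a generic NumPy sketch under assumed token counts and dimensions, not the paper's network.

```python
import numpy as np

def cross_attention(query_feats, context_feats):
    """Query tokens (e.g., leaf spectra) attend to context tokens (e.g., soil
    spectra or biochemical parameters): one path of a dual-path fusion."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (Nq, Nc)
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)        # adaptive weights over context
    return query_feats + w @ context_feats    # residual fusion of modalities

rng = np.random.default_rng(2)
leaf = rng.standard_normal((10, 32))   # 10 leaf-spectrum tokens (assumed)
soil = rng.standard_normal((6, 32))    # 6 soil-spectrum tokens (assumed)
fused = cross_attention(leaf, soil)
print(fused.shape)  # (10, 32)
```

Running the same function with biochemical-parameter tokens as context would give the second fusion path.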

21 pages, 588 KB  
Article
Research on an MOOC Recommendation Method Based on the Fusion of Behavioral Sequences and Textual Semantics
by Wenxin Zhao, Lei Zhao and Zhenbin Liu
Appl. Sci. 2025, 15(18), 10024; https://doi.org/10.3390/app151810024 - 13 Sep 2025
Abstract
To address the challenges of user behavior sparsity and insufficient utilization of course semantics on MOOC platforms, this paper proposes a personalized recommendation method that integrates user behavioral sequences with course textual semantic features. First, shallow word-level features from course titles are extracted using FastText, and deep contextual semantic representations from course descriptions are obtained via a fine-tuned BERT model. The two sets of semantic features are concatenated to form a multi-level semantic representation of course content. Next, the fused semantic features are mapped into the same vector space as course ID embeddings through a linear projection layer and combined with the original course ID embeddings via an additive fusion strategy, enhancing the model’s semantic perception of course content. Finally, the fused features are fed into an improved SASRec model, where a multi-head self-attention mechanism is employed to model the evolution of user interests, enabling collaborative recommendations across behavioral and semantic modalities. Experiments conducted on the MOOCCubeX dataset (1.26 million users, 632 courses) demonstrated that the proposed method achieved NDCG@10 and HR@10 scores of 0.524 and 0.818, respectively, outperforming SASRec and semantic single-modality baselines. This study offers an efficient yet semantically rich recommendation solution for MOOC scenarios. Full article
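The projection-plus-additive-fusion step can be sketched in a few lines: the concatenated semantic features are mapped into the ID-embedding space by a linear layer and added to the course ID embedding. The dimensions below (300 for FastText, 768 for BERT, 64 for the ID embedding) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fuse_course_embedding(id_emb, semantic_feats, w_proj):
    """Project concatenated FastText+BERT features into the ID-embedding
    space (linear projection layer), then fuse additively."""
    projected = semantic_feats @ w_proj    # (d_sem,) @ (d_sem, d_id) -> (d_id,)
    return id_emb + projected              # additive fusion strategy

rng = np.random.default_rng(3)
id_emb = rng.standard_normal(64)              # course ID embedding (assumed d=64)
semantic = rng.standard_normal(300 + 768)     # FastText (300) ++ BERT (768), assumed
w = rng.standard_normal((1068, 64)) * 0.01    # projection layer weights
fused = fuse_course_embedding(id_emb, semantic, w)
print(fused.shape)  # (64,)
```

The fused vector then replaces the plain ID embedding at the input of the sequential (SASRec-style) model.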

39 pages, 9593 KB  
Article
An Integrated AI Framework for Occupational Health: Predicting Burnout, Long COVID, and Extended Sick Leave in Healthcare Workers
by Maria Valentina Popa, Călin Gheorghe Buzea, Irina Luciana Gurzu, Camer Salim, Bogdan Gurzu, Dragoș Ioan Rusu, Lăcrămioara Ochiuz and Letiția Doina Duceac
Healthcare 2025, 13(18), 2266; https://doi.org/10.3390/healthcare13182266 - 10 Sep 2025
Abstract
Background: Healthcare workers face multiple, interlinked occupational health risks: burnout, post-COVID-19 sequelae (Long COVID), and extended medical leave. These outcomes often share predictors, contribute to each other, and together impact workforce capacity, yet existing tools typically address them in isolation. Objective: The objective of this study was to develop and deploy an integrated, explainable artificial intelligence (AI) framework that predicts these three outcomes from the same structured occupational health dataset, enabling unified workforce risk monitoring. Methods: We analyzed data from 1244 Romanian healthcare professionals with 14 demographic, occupational, lifestyle, and comorbidity features. For each outcome, we trained a separate predictive model within a common framework: (1) a lightweight transformer neural network with hyperparameter optimization, (2) a transformer with multi-head attention, and (3) a stacked ensemble combining transformer, XGBoost, and logistic regression. The data were SMOTE-balanced and evaluated on held-out test sets using Accuracy, ROC-AUC, and F1-score, with 10,000-iteration bootstrap testing for statistical significance. Results: The stacked ensemble achieved the highest performance: ROC-AUC = 0.70 (burnout), 0.93 (Long COVID), and 0.93 (extended leave). The F1-scores were >0.89 for Long COVID and extended leave, whereas the gains for burnout were comparatively modest, reflecting the multidimensional and heterogeneous nature of burnout as a binary construct. The gains over logistic regression were statistically significant (p < 0.0001 for Long COVID and extended leave; p = 0.0355 for burnout). The SHAP analysis identified overlapping top predictors (tenure, age, job role, cancer history, pulmonary disease, and obesity), supporting the value of a unified framework. Conclusions: We trained separate models for each occupational health risk but deployed them in a single, real-time web application. This integrated approach improves efficiency, enables multi-outcome workforce surveillance, and supports proactive interventions in healthcare settings.

15 pages, 3118 KB  
Communication
Two-Stage Marker Detection–Localization Network for Bridge-Erecting Machine Hoisting Alignment
by Lei Li, Zelong Xiao and Taiyang Hu
Sensors 2025, 25(17), 5604; https://doi.org/10.3390/s25175604 - 8 Sep 2025
Abstract
To tackle the challenges of complex construction environment interference (e.g., lighting variations, occlusion, and marker contamination) and the demand for high-precision alignment during the hoisting process of bridge-erecting machines, this paper presents a two-stage marker detection–localization network tailored to hoisting alignment. The proposed network adopts a “coarse detection–fine estimation” phased framework: the first stage employs a lightweight detection module, which integrates a dynamic hybrid backbone (DHB) and dynamic switching mechanism to efficiently filter background noise and generate coarse localization boxes of marker regions. Specifically, the DHB dynamically switches between convolutional and Transformer branches to handle features of varying complexity (using depthwise separable convolutions from MobileNetV3 for low-level geometric features and lightweight Transformer blocks for high-level semantic features). The second stage constructs a Transformer-based homography estimation module, which leverages multi-head self-attention to capture long-range dependencies between marker keypoints and the scene context. By integrating enhanced multi-scale feature interaction and position encoding (combining the absolute position and marker geometric priors), this module achieves the end-to-end learning of precise homography matrices between markers and hoisting equipment from the coarse localization boxes. To address data scarcity in construction scenes, a multi-dimensional data augmentation strategy is developed, including random homography transformation (simulating viewpoint changes), photometric augmentation (adjusting brightness, saturation, and contrast), and background blending with bounding box extraction. Experiments on a real bridge-erecting machine dataset demonstrate that the network achieves detection accuracy (mAP) of 97.8%, a homography estimation reprojection error of less than 1.2 mm, and a processing frame rate of 32 FPS. Compared with traditional single-stage CNN-based methods, it significantly improves the alignment precision and robustness in complex environments, offering reliable technical support for the precise control of automated hoisting in bridge-erecting machines.
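For readers unfamiliar with the reprojection-error metric quoted above (<1.2 mm), here is a minimal NumPy sketch of applying a 3x3 homography in homogeneous coordinates and measuring the mean point-wise error; the matrix and points are made-up values, not from the paper.

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2-D points through a 3x3 homography (homogeneous coordinates)."""
    hom = np.hstack([pts, np.ones((len(pts), 1))])   # lift to (N, 3)
    mapped = hom @ H.T
    return mapped[:, :2] / mapped[:, 2:3]            # de-homogenize

def reprojection_error(H, src, dst):
    """Mean Euclidean distance between H(src) and measured dst keypoints."""
    return float(np.linalg.norm(apply_homography(H, src) - dst, axis=1).mean())

H = np.array([[1.0, 0.02, 5.0],     # illustrative homography
              [0.01, 0.98, -3.0],
              [0.0, 0.0, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
dst = apply_homography(H, src)       # perfect correspondences by construction
err = reprojection_error(H, src, dst)
print(err)  # 0.0 by construction
```

In practice `dst` comes from detected keypoints, so the error is nonzero and reported in millimeters after metric calibration.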

22 pages, 11395 KB  
Article
A SHDAViT-MCA Block-Based Network for Remote-Sensing Semantic Change Detection
by Weiqi Ren, Zhigang Zhang, Shaowen Liu, Haoran Xu, Zheng Ma, Rui Gao, Qingming Kong, Shoutian Dong and Zhongbin Su
Remote Sens. 2025, 17(17), 3026; https://doi.org/10.3390/rs17173026 - 1 Sep 2025
Abstract
This study addresses the challenge of accurately detecting agricultural land-use changes in bi-temporal remote sensing imagery, which is hindered by cross-temporal interference, multi-scale feature modeling limitations, and poor large-area scalability. The study proposes the Semantic Change Detection (SCD) with Single-Head Dual-Attention Vision Transformer (SHDAViT) and Multidimensional Collaborative Attention (MCA) Block-Based Network (SMBNet). The SHDAViT module enhances local-global feature aggregation through a single-head self-attention mechanism combined with channel–spatial dual attention. The MCA module mitigates cross-temporal style discrepancies by modeling cross-dimensional feature interactions, fusing bi-temporal information to accentuate true change regions. SHDAViT extracts discriminative features from each phase image, while MCA aligns and fuses these features to suppress noise and amplify effective change signals. Evaluated on the newly developed AgriCD dataset and the JL1 benchmark, SMBNet outperforms five mainstream methods (BiSRNet, Bi-SRUNet++, HRSCD.str3, HRSCD.str4, and CDSC), achieving state-of-the-art performance, with F1 scores of 91.18% (AgriCD) and 86.44% (JL1), demonstrating superior accuracy in detecting subtle farmland transitions. Experimental results confirm the framework’s robustness against label imbalance and environmental variations, offering a practical solution for agricultural monitoring.

32 pages, 5623 KB  
Article
Motion Planning for Autonomous Driving in Unsignalized Intersections Using Combined Multi-Modal GNN Predictor and MPC Planner
by Ajitesh Gautam, Yuping He and Xianke Lin
Machines 2025, 13(9), 760; https://doi.org/10.3390/machines13090760 - 25 Aug 2025
Abstract
This article presents an interaction-aware motion planning framework that integrates a graph neural network (GNN) based multi-modal trajectory predictor with a model predictive control (MPC) based planner. Unlike past studies that predict a single future trajectory per agent, our algorithm outputs three distinct trajectories for each surrounding road user, capturing different interaction scenarios (e.g., yielding, non-yielding, and aggressive driving behaviors). We design a GNN-based predictor with bi-directional gated recurrent unit (Bi-GRU) encoders for agent histories, VectorNet-based lane encoding for map context, an interaction-aware attention mechanism, and multi-head decoders to predict trajectories for each mode. The MPC-based planner employs a bicycle model and solves a constrained optimal control problem using CasADi and IPOPT (Interior Point OPTimizer). All three predicted trajectories per agent are fed to the planner; the primary prediction is enforced as a hard safety constraint, while the alternative trajectories are treated as soft constraints via penalty slack variables. The designed motion planning algorithm is examined in real-world intersection scenarios from the INTERACTION dataset. Results show that the multi-modal trajectory predictor covers possible interaction outcomes, and the planner produces smoother and safer trajectories compared to a single-trajectory baseline. In high-conflict situations, the multi-modal trajectory predictor anticipates potential aggressive behaviors of other drivers, reducing harsh braking and maintaining safe distances. Integrating the GNN-based multi-modal trajectory predictor with the MPC-based planner thus forms the backbone of an effective motion planning algorithm for robust, safe, and comfortable autonomous driving in complex intersections.
(This article belongs to the Special Issue Design and Application of Underwater Vehicles and Robots)
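The hard-versus-soft constraint handling described above can be illustrated with a toy cost function: the primary predicted trajectory imposes a hard minimum-distance constraint, while alternative modes contribute quadratic slack penalties. This NumPy sketch uses made-up distances (`d_safe`, `rho`) and a smoothness term as stand-ins for the full MPC objective; it is not the paper's CasADi/IPOPT formulation.

```python
import numpy as np

def planning_cost(ego_traj, primary_pred, alt_preds, d_safe=2.0, rho=100.0):
    """Toy planner cost: hard constraint on the primary prediction, soft
    slack penalties on alternative predicted trajectories."""
    def min_dist(a, b):
        return float(np.linalg.norm(a - b, axis=1).min())

    if min_dist(ego_traj, primary_pred) < d_safe:
        return np.inf                                     # hard constraint violated
    cost = float(np.sum(np.diff(ego_traj, axis=0) ** 2))  # smoothness term
    for alt in alt_preds:                                 # soft constraints via slack
        slack = max(0.0, d_safe - min_dist(ego_traj, alt))
        cost += rho * slack ** 2
    return cost

t = np.linspace(0.0, 1.0, 11)[:, None]
ego = np.hstack([t * 10.0, np.zeros_like(t)])    # ego drives straight
yielding = ego + np.array([0.0, 5.0])            # primary mode: stays 5 m away
aggressive = ego + np.array([0.0, 1.0])          # alternative mode: cuts within 1 m
cost_soft = planning_cost(ego, yielding, [aggressive])   # finite but penalized
cost_hard = planning_cost(ego, aggressive, [])           # infeasible
```

The split lets the planner stay feasible even when an unlikely aggressive mode would otherwise make every plan violate the safety margin.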

23 pages, 1657 KB  
Article
High-Precision Pest Management Based on Multimodal Fusion and Attention-Guided Lightweight Networks
by Ziye Liu, Siqi Li, Yingqiu Yang, Xinlu Jiang, Mingtian Wang, Dongjiao Chen, Tianming Jiang and Min Dong
Insects 2025, 16(8), 850; https://doi.org/10.3390/insects16080850 - 16 Aug 2025
Abstract
In the context of global food security and sustainable agricultural development, the efficient recognition and precise management of agricultural insect pests and their predators have become critical challenges in the domain of smart agriculture. To address the limitations of traditional models that overly rely on single-modal inputs and suffer from poor recognition stability under complex field conditions, a multimodal recognition framework has been proposed. This framework integrates RGB imagery, thermal infrared imaging, and environmental sensor data. A cross-modal attention mechanism, environment-guided modality weighting strategy, and decoupled recognition heads are incorporated to enhance the model’s robustness against small targets, intermodal variations, and environmental disturbances. Evaluated on a high-complexity multimodal field dataset, the proposed model significantly outperforms mainstream methods across four key metrics, precision, recall, F1-score, and mAP@50, achieving 91.5% precision, 89.2% recall, 90.3% F1-score, and 88.0% mAP@50. These results represent an improvement of over 6% compared to representative models such as YOLOv8 and DETR. Additional ablation studies confirm the critical contributions of key modules, particularly under challenging scenarios such as low light, strong reflections, and sensor data noise. Moreover, deployment tests conducted on the Jetson Xavier edge device demonstrate the feasibility of real-world application, with the model achieving a 25.7 FPS inference speed and a compact size of 48.3 MB, thus balancing accuracy and lightweight design. This study provides an efficient, intelligent, and scalable AI solution for pest surveillance and biological control, contributing to precision pest management in agricultural ecosystems. Full article

21 pages, 2639 KB  
Article
A Hybrid Model of Multi-Head Attention Enhanced BiLSTM, ARIMA, and XGBoost for Stock Price Forecasting Based on Wavelet Denoising
by Qingliang Zhao, Hongding Li, Xiao Liu and Yiduo Wang
Mathematics 2025, 13(16), 2622; https://doi.org/10.3390/math13162622 - 15 Aug 2025
Abstract
The stock market plays a crucial role in the financial system, with its price movements reflecting macroeconomic trends. Due to the influence of multifaceted factors such as policy shifts and corporate performance, stock prices exhibit nonlinearity, high noise, and non-stationarity, making them difficult to model accurately using a single approach. To enhance forecasting accuracy, this study proposes a hybrid forecasting framework that integrates wavelet denoising, multi-head attention-based BiLSTM, ARIMA, and XGBoost. Wavelet transform is first employed to enhance data quality. The multi-head attention BiLSTM captures nonlinear temporal dependencies, ARIMA models linear trends in residuals, and XGBoost improves the recognition of complex patterns. The final prediction is obtained by combining the outputs of all models through an inverse-error weighted ensemble strategy. Using the CSI 300 Index as an empirical case, we construct a multidimensional feature set including both market and technical indicators. Experimental results show that the proposed model clearly outperforms individual models in terms of RMSE, MAE, MAPE, and R2. Ablation studies confirm the importance of each module in performance enhancement. The model also performs well on individual stock data (e.g., Fuyao Glass), demonstrating promising generalization ability. This research provides an effective solution for improving stock price forecasting accuracy and offers valuable insights for investment decision-making and market regulation. Full article
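The inverse-error weighted ensemble mentioned above is simple to state: each model's weight is proportional to the reciprocal of its error, normalized to sum to one. A minimal NumPy sketch (the error values are illustrative, e.g. validation RMSEs, not figures from the paper):

```python
import numpy as np

def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalized to sum to 1."""
    inv = 1.0 / np.asarray(errors, dtype=float)
    return inv / inv.sum()

def ensemble_forecast(preds, errors):
    """Combine per-model predictions (steps x models) with inverse-error weights."""
    return preds @ inverse_error_weights(errors)

preds = np.array([[10.2, 9.8, 10.0],    # two forecast steps; columns are, say,
                  [10.4, 9.9, 10.1]])   # BiLSTM, ARIMA, XGBoost outputs (assumed)
errors = [0.5, 1.0, 1.0]                # assumed validation errors per model
weights = inverse_error_weights(errors)
print(weights)  # [0.5 0.25 0.25]
combined = ensemble_forecast(preds, errors)
```

The more accurate a component model was on held-out data, the more it dominates the combined forecast.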

23 pages, 6938 KB  
Article
Intelligent Detection of Cognitive Stress in Subway Train Operators Using Multimodal Electrophysiological and Behavioral Signals
by Xinyi Yang and Lu Yu
Symmetry 2025, 17(8), 1298; https://doi.org/10.3390/sym17081298 - 11 Aug 2025
Abstract
Subway train operators face the risk of cumulative cognitive stress due to factors such as visual fatigue from prolonged high-speed tunnel driving, irregular shift patterns, and the monotony of automated operations, which can lead to cognitive decline and human-error accidents. Current monitoring of cognitive stress risk predominantly relies on single-modal methods, which are susceptible to environmental interference and offer limited accuracy. This study proposes an intelligent multimodal framework for cognitive stress monitoring that leverages symmetry principles in physiological and behavioral manifestations: the symmetry of photoplethysmography (PPG) waveforms and the bilateral symmetry of head movements serve as critical indicators of autonomic nervous system homeostasis and cognitive load. By integrating these symmetry-based features, the study constructs a spatiotemporal dynamic feature set that fuses physiological signals such as PPG and galvanic skin response (GSR) with head and facial behavioral features. Furthermore, a hybrid PSO-CNN-GRU-Attention deep learning model is developed, in which the Particle Swarm Optimization (PSO) algorithm dynamically adjusts hyperparameters and an attention mechanism weights the multimodal features, enabling precise assessment of cognitive stress states. Experiments were conducted using a full-scale subway driving simulator, collecting data from 50 operators to validate the model’s feasibility. Results demonstrate that the complementary nature of multimodal physiological signals and behavioral features effectively overcomes the limitations of single-modal data, yielding significantly superior model performance. The PSO-CNN-GRU-Attention model achieved a predictive coefficient of determination (R2) of 0.89029 and a mean squared error (MSE) of 0.00461, outperforming the traditional BiLSTM model by approximately 22%. This research provides a high-accuracy, non-invasive solution for detecting cognitive stress in subway operators, offering a scientific basis for occupational health management and the formulation of safe driving intervention strategies.
(This article belongs to the Section Engineering and Materials)
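The attention-weighted fusion of multimodal features described in the abstract can be sketched in plain Python. The feature vectors, relevance scores, and two-dimensional layout below are illustrative assumptions, not the paper's learned parameters; in the actual model the scores would be produced by trained attention layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fuse(features, scores):
    """Weight each modality's feature vector by a softmax over its
    relevance score, then sum into a single fused representation."""
    weights = softmax(scores)
    dim = len(features[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, features):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return weights, fused

# Hypothetical per-modality feature vectors (PPG, GSR, head pose)
ppg  = [0.8, 0.1]
gsr  = [0.2, 0.9]
head = [0.5, 0.5]
weights, fused = attention_fuse([ppg, gsr, head], scores=[2.0, 1.0, 0.5])
```

Because the weights form a convex combination, a modality with a higher attention score dominates the fused vector while noisier modalities are down-weighted, which is the complementarity the abstract credits for outperforming single-modal input.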
21 pages, 2965 KB  
Article
Inspection Method Enabled by Lightweight Self-Attention for Multi-Fault Detection in Photovoltaic Modules
by Shufeng Meng and Tianxu Xu
Electronics 2025, 14(15), 3019; https://doi.org/10.3390/electronics14153019 - 29 Jul 2025
Viewed by 592
Abstract
Bird-dropping fouling and hotspot anomalies remain the most prevalent and detrimental defects in utility-scale photovoltaic (PV) plants; their co-occurrence on a single module markedly curbs energy yield and accelerates irreversible cell degradation. However, markedly disparate visual–thermal signatures of the two phenomena impede high-fidelity concurrent detection in existing robotic inspection systems, while stringent onboard compute budgets also preclude the adoption of bulky detectors. To resolve this accuracy–efficiency trade-off for dual-defect detection, we present YOLOv8-SG, a lightweight yet powerful framework engineered for mobile PV inspectors. First, a rigorously curated multi-modal dataset—RGB for stains and long-wave infrared for hotspots—is assembled to enforce robust cross-domain representation learning. Second, the HSV color space is leveraged to disentangle chromatic and luminance cues, thereby stabilizing appearance variations across sensors. Third, a single-head self-attention (SHSA) block is embedded in the backbone to harvest long-range dependencies at negligible parameter cost, while a global context (GC) module is grafted onto the detection head to amplify fine-grained semantic cues. Finally, an auxiliary bounding box refinement term is appended to the loss to hasten convergence and tighten localization. Extensive field experiments demonstrate that YOLOv8-SG attains 86.8% mAP@0.5, surpassing the vanilla YOLOv8 by 2.7 pp while trimming 12.6% of parameters (18.8 MB). Grad-CAM saliency maps corroborate that the model’s attention consistently coincides with defect regions, underscoring its interpretability. The proposed method, therefore, furnishes PV operators with a practical low-latency solution for concurrent bird-dropping and hotspot surveillance. Full article
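A minimal sketch of the single-head self-attention (SHSA) idea can be written in plain Python. This assumes identity query/key/value projections and toy two-dimensional features purely for illustration; the actual SHSA block in YOLOv8-SG learns its projections and operates on backbone feature maps.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def single_head_self_attention(tokens):
    """Scaled dot-product self-attention with a single head.
    Each output token is a softmax-weighted mixture of all tokens,
    so every position can draw on long-range context."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        out.append([sum(wj * tokens[j][i] for j, wj in enumerate(w))
                    for i in range(d)])
    return out

# Toy "tokens" standing in for spatial positions of a feature map
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = single_head_self_attention(feats)
```

Using one head instead of many is what keeps the parameter cost negligible: there is a single score matrix per layer, yet every position still attends to every other position.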
18 pages, 2028 KB  
Article
Research on Single-Tree Segmentation Method for Forest 3D Reconstruction Point Cloud Based on Attention Mechanism
by Lishuo Huo, Zhao Chen, Lingnan Dai, Dianchang Wang and Xinrong Zhao
Forests 2025, 16(7), 1192; https://doi.org/10.3390/f16071192 - 19 Jul 2025
Viewed by 540
Abstract
The segmentation of individual trees holds considerable significance in the investigation and management of forest resources. Utilizing smartphone-captured imagery combined with image-based 3D reconstruction techniques to generate corresponding point cloud data can serve as a more accessible and potentially cost-efficient alternative for data acquisition compared to conventional LiDAR methods. In this study, we present a Sparse 3D U-Net framework for single-tree segmentation built on a multi-head attention mechanism. The mechanism projects the input data into multiple subspaces—referred to as "heads"—and computes attention independently within each subspace; the outputs are then aggregated into a comprehensive representation. As a result, multi-head attention facilitates the model's ability to capture diverse contextual information, thereby enhancing performance across a wide range of applications. This framework enables efficient, intelligent, and end-to-end instance segmentation of forest point cloud data through the integration of multi-scale features and global contextual information. The introduction of an iterative mechanism at the attention layer allows the model to learn more compact feature representations, thereby significantly enhancing its convergence speed. In this study, Dongsheng Bajia Country Park and Jiufeng National Forest Park, situated in Haidian District, Beijing, China, were selected as the designated test sites. Eight representative sample plots within these areas were systematically sampled. Forest stand sequential photographs were captured using an iPhone, and these images were processed to generate corresponding point cloud data for the respective sample plots. This methodology was employed to comprehensively assess the model's capability for single-tree segmentation. Furthermore, the generalization performance of the proposed model was validated using the publicly available dataset TreeLearn. The model's advantages were demonstrated across multiple aspects, including data processing efficiency, training robustness, and single-tree segmentation speed. The proposed method achieved an F1 score of 91.58% on the customized dataset. On the TreeLearn dataset, the method attained an F1 score of 97.12%. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
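The split-attend-concatenate pattern of multi-head attention described in the abstract can be sketched in plain Python. The four-dimensional toy points and two heads below are illustrative assumptions; the actual Sparse 3D U-Net learns per-head projections over sparse voxel features.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(tokens):
    """Scaled dot-product self-attention within one subspace (head)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        out.append([sum(wj * tokens[j][i] for j, wj in enumerate(w))
                    for i in range(d)])
    return out

def multi_head_attention(tokens, num_heads):
    """Project each feature vector into `num_heads` subspaces (here, by
    slicing), run attention independently per head, then concatenate."""
    d = len(tokens[0])
    assert d % num_heads == 0
    hd = d // num_heads
    heads = []
    for h in range(num_heads):
        sub = [t[h * hd:(h + 1) * hd] for t in tokens]
        heads.append(attend(sub))
    # Concatenate the per-head outputs back into full-width vectors
    return [sum((heads[h][i] for h in range(num_heads)), [])
            for i in range(len(tokens))]

# Toy point features standing in for two points of a forest point cloud
pts = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.2, 0.8]]
fused = multi_head_attention(pts, num_heads=2)
```

Because each head attends within its own subspace, the heads can specialize in different contextual cues before the concatenation recombines them, which is the diversity benefit the abstract attributes to the mechanism.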
