Search Results (545)

Search Parameters:
Keywords = multi-headed self-attention

20 pages, 6122 KB  
Article
Automated Detection and Classification of Lunar Linear Tectonic Features Using a Deep Learning Method
by Xiaoyang Liu, Yang Luo, Jianhui Wang, Denggao Qiu, Jianguo Yan, Wensong Zhang and Yaowen Luo
Remote Sens. 2026, 18(9), 1330; https://doi.org/10.3390/rs18091330 - 26 Apr 2026
Viewed by 143
Abstract
On the lunar surface, wrinkle ridges, grabens, and lobate scarps represent key tectonic landforms that reflect the evolution of the Moon’s stress field and its tectonic processes. However, these linear structures often exhibit weak textures, low contrast, and large scale variations, making manual interpretation inefficient and subjective. To address this issue, this study introduces an improved YOLOv8 model, termed HL-YOLOv8, for the automated detection of lunar linear features. The model incorporates a multiscale lightweight channel attention (C2f_MLCA) module into the backbone network to enhance the extraction of fine-grained and weak-texture features and integrates a multihead self-attention (C2f_MHSA) module in the feature fusion stage to improve the modelling of long-range spatial dependencies. In addition, the combination of a dual focal loss and a diversified data augmentation strategy effectively mitigates the detection difficulties caused by class imbalance and weak-feature samples. The experimental results obtained using the global LROC-WAC image dataset demonstrate that HL-YOLOv8 significantly outperforms the baseline YOLOv8 and other comparative models in terms of precision, recall, and mAP@0.5. Specifically, the proposed model achieved an average precision of 73.5%, an average recall of 73.1%, and an average mAP@0.5 of 74.6% on the evaluation dataset, showing particularly strong performance in detecting elongated grabens and boundary-blurred lobate scarps. The global distribution maps derived from the model predictions indicate that HL-YOLOv8 can be applied to comprehensively reconstruct the spatial patterns of the three types of linear structures and identify potential new features in high-latitude and geologically complex regions, demonstrating excellent generalizability and robustness. 
This study provides an efficient and reliable framework for the automated identification and global mapping of lunar linear features and offers a transferable methodological reference for the tectonic interpretation of terrestrial planets.
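The abstract above credits a dual focal loss with easing class imbalance but does not give its formulation. As a rough illustration only, the sketch below shows the standard binary focal loss that such variants typically build on; it is not the paper's dual focal loss, and the `alpha` and `gamma` values are commonly used defaults assumed here, not values from the paper.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard binary focal loss for a single prediction.

    p: predicted probability of the positive class (0 < p < 1)
    y: ground-truth label, 1 or 0
    alpha, gamma: assumed defaults, not taken from the paper
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)  # confident, correct prediction: heavily down-weighted
hard = focal_loss(0.10, 1)  # badly missed positive: keeps most of its loss
```

With `gamma=0` and `alpha=1` for the positive class, the expression reduces to ordinary cross-entropy, which is one way to sanity-check an implementation.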
29 pages, 4546 KB  
Article
Beyond Scale Variability: Dynamic Cross-Scale Modeling and Efficient Sparse Heads for Wind Turbine Blade Defect Detection
by Xingxing Fan, Manxiang Gao, Yong Wang, Haining Tang, Fengyong Sun and Changpo Song
Processes 2026, 14(9), 1367; https://doi.org/10.3390/pr14091367 - 24 Apr 2026
Viewed by 114
Abstract
Images of wind turbine blades captured by drones often feature complex backgrounds, and small targets such as minor defects, together with low image resolution, lead to reduced recognition rates. To address environments with complex feature backgrounds, this paper proposes the PPS-MSDeim model. Based on the lightweight end-to-end detection framework DEIM-N, it introduces three core innovations to tackle the challenge of detecting small, irregular defects on wind turbine blades against complex backgrounds. First, we design an inverted multi-scale depthwise separable convolution module (MDSC). After compressing channels via a bottleneck layer, it applies 3 × 3, 5 × 5, and 7 × 7 inverted depthwise separable convolutions in parallel. By first fusing channel information and then extracting spatial features over multiple receptive fields, this approach enhances the ability to characterize morphologically variable defects while reducing computational overhead. The MDSC is then embedded into the backbone network HGNetv2. Second, we construct a Multi-Scale Feature Aggregation and Diffusion Pyramid Network (MFADPN). Through a Multi-Scale Feature Aggregation Module (MSFAM), it directly fuses features from layers P2 to P5, achieving deep integration of high-level semantics and low-level details. Dilated convolutions with dilation rates of 1, 3, and 5 capture multi-level context, and a Sobel edge branch is introduced to enhance defect contours. A feature diffusion operation then distributes the enhanced features back to each level, shortening information paths and preventing signal decay. In addition, a high-resolution detection head is added to P2 and the P5 head is removed to improve sensitivity for small object detection. Finally, we propose the PPSformer module to replace the original Transformer encoding layer. 
It uses patch embedding to convert images into sequences and introduces a multi-head probabilistic sparse self-attention mechanism that focuses only on key-value pairs during attention computation. This design efficiently captures irregularly varying feature information and globally detects data anomalies induced by external defects. Evaluated on real engineering datasets, PPS-MSDeim, which builds on DEIM, raised mAP@0.5 by 6.7% to 95.1% and mAP@0.5–0.95 by 12.0% to 70.1%. This indicates that the proposed method has a significant advantage in detecting defects in wind turbine blades.
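To make the role of the dilated convolutions with rates 1, 3, and 5 more concrete, the sketch below computes how many pixels a single dilated kernel spans along one axis, using the standard relation k + (k - 1)(d - 1). The 3 × 3 kernel size is an assumption for illustration; the abstract states only the rates.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Pixels spanned along one axis by a single dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# Kernel size 3 is assumed for illustration; only the rates come from the abstract.
spans = {d: effective_kernel_size(3, d) for d in (1, 3, 5)}
print(spans)  # {1: 3, 3: 7, 5: 11}
```

Under that assumption the three branches see 3-, 7-, and 11-pixel neighbourhoods at the same parameter count, which is the usual motivation for combining several dilation rates.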
19 pages, 2502 KB  
Article
Automatic Sleep Staging with Long-Term Temporal Modeling Using Single-Channel EEG
by Qiyu Yang, Dejun Zhang and Yi Huang
Appl. Sci. 2026, 16(9), 4092; https://doi.org/10.3390/app16094092 - 22 Apr 2026
Viewed by 295
Abstract
With the increasing demand for sleep health monitoring, automatic sleep staging using single-channel electroencephalogram (EEG) signals has become increasingly prominent due to its clinical practicality. Existing methods have achieved notable progress, but they often fail to adequately capture long-term temporal dependencies and struggle to characterize transition phases. We propose SleepLT, an automated sleep staging framework that integrates multi-scale wavelet decomposition (MWD) and multi-head latent Fourier attention (MLFA). The MLFA module incorporates Fourier analysis into self-attention mechanisms and employs a partially weight-sharing bottleneck to optimize Key/Value generation, effectively capturing sleep rhythms. Extensive experiments on the SleepEDF-78 and SHHS datasets demonstrate strong and consistent performance, with Macro F1 improvements of 2.1–3.2% over the compared baselines. Visualizations confirm that SleepLT enhances inter-class discriminability between sleep stages, robustly detects salient waveforms, and effectively captures transitions through long-sequence modeling. These results indicate that SleepLT is effective for automatic sleep staging from single-channel EEG, particularly in improving the recognition of ambiguous transitional stages such as N1 and REM.
(This article belongs to the Special Issue Applied Multimodal AI: Methods and Applications Across Domains)
18 pages, 39608 KB  
Article
Denoising Domain Adversarial Network Based on Attention Mechanism for Motor Fault Diagnosis in Real Industrial Environment
by Linjie Jin, Zhengqing Liu, Dawei Gu, Baisong Pan, Qiucheng Wang and Mohammad Fard
Machines 2026, 14(5), 462; https://doi.org/10.3390/machines14050462 - 22 Apr 2026
Viewed by 216
Abstract
Acoustic signal-based fault diagnosis offers a promising non-contact approach for rotating machinery. However, its practical application is usually affected by environmental noise. This paper presents a Denoising Attention Domain Adversarial Network (DDAN) for the robust fault diagnosis of wheel hub motors under severe noise interference. The proposed framework consists of the following two core modules: a DenseNet-based denoising module that adaptively suppresses background noise while retaining critical fault features, and a Stacked Autoencoder Domain Adversarial Network (SADAN) that integrates channel attention, spatial attention, and multi-head self-attention (MHSA) for refined feature extraction and classification. Such a hierarchical attention mechanism facilitates effective local noise suppression and global dependency capture. Validation on a hub motor fault dataset and a publicly available online dataset demonstrates that, compared to existing methods, DDAN achieves superior diagnostic accuracy across various noise levels and signal-to-noise ratios, improving SNR from -15.97 dB to 1.24 dB, achieving 82.71% accuracy under low-SNR conditions, and reaching 84.93% and 83.75% accuracy in cross-domain generalization tests. Furthermore, a comparison of diagnostic accuracy on audio signals from different acoustic acquisition devices verifies the practicality and potential of the system in low-cost industrial deployment.
(This article belongs to the Section Electrical Machines and Drives)
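For readers less used to decibel figures, the short sketch below converts the SNR values quoted in the abstract above (-15.97 dB before denoising, 1.24 dB after) into a linear power ratio. It uses only the standard definition SNR(dB) = 10·log10(P_signal / P_noise); the function names are illustrative and nothing here reproduces the paper's denoising method.

```python
import math

def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from linear powers."""
    return 10.0 * math.log10(signal_power / noise_power)

def db_to_power_ratio(db: float) -> float:
    """Linear power ratio corresponding to a decibel value."""
    return 10.0 ** (db / 10.0)

snr_before, snr_after = -15.97, 1.24        # values quoted in the abstract
gain_db = snr_after - snr_before            # improvement in dB
gain_linear = db_to_power_ratio(gain_db)    # same improvement as a power ratio
```

The 17.21 dB gain corresponds to the signal-to-noise power ratio improving by a factor of roughly 50; a negative starting SNR means the noise power initially exceeded the signal power.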
18 pages, 9261 KB  
Article
MSResBiMamba: A Deep Cascaded Architecture for EEG Signal Decoding
by Ruiwen Jiang, Yi Zhou and Jingxiang Zhang
Mathematics 2026, 14(8), 1348; https://doi.org/10.3390/math14081348 - 17 Apr 2026
Viewed by 176
Abstract
Electroencephalogram (EEG) signals serve as the core information carrier for brain–computer interfaces (BCIs); however, their highly non-stationary nature, extremely low signal-to-noise ratio, and significant inter-individual variability pose considerable challenges for signal decoding. Existing deep learning methods struggle to strike a balance between multi-scale, fine-grained feature extraction and efficient long-range temporal modeling. To overcome this limitation, this study proposes a novel deep cascaded architecture, MSResBiMamba, which deeply integrates multi-scale spatiotemporal feature learning with cutting-edge long-sequence modeling techniques. The model first utilizes an enhanced multi-scale spatiotemporal convolutional network (MS-CNN) combined with a SE-channel attention mechanism to adaptively extract local multi-band features and dynamically suppress redundant artefacts. Subsequently, it innovatively introduces an enhanced bidirectional Mamba (Bi-Mamba) module to efficiently capture non-causal long-range temporal dependencies with linear computational complexity, whilst cascading multi-head self-attention mechanisms to establish global higher-order feature interactions. Extensive experiments on the BCI Competition IV-2a dataset demonstrate that MSResBiMamba achieves outstanding classification performance in multi-class motor imagery tasks, significantly outperforming traditional methods and existing state-of-the-art neural networks. Ablation studies and t-SNE visualisations further confirm the model’s robustness in feature decoupling and cross-subject applications, providing a high-precision, high-efficiency decoding solution for BCI systems.
(This article belongs to the Section E1: Mathematics and Computer Science)
24 pages, 10870 KB  
Article
MV-HAGCN: Prediction of miRNA-Disease Association Based on Multi-View Hybrid Attention Graph Convolutional Network
by Konglin Xing, Yujing Zhang and Wen Zhu
Int. J. Mol. Sci. 2026, 27(8), 3533; https://doi.org/10.3390/ijms27083533 - 15 Apr 2026
Viewed by 248
Abstract
Accurate identification of disease-associated microRNAs (miRNAs) is crucial for elucidating pathogenic mechanisms and advancing therapeutic discovery. Although computational methods, particularly those based on biological networks, have become essential tools for predicting miRNA-disease associations, existing approaches often struggle to comprehensively learn from heterogeneous data and optimize feature representations. To overcome these limitations, we propose the Multi-view Hybrid Attention Graph Convolutional Network (MV-HAGCN). This framework constructs a comprehensive heterogeneous network by integrating multi-source biological information, simultaneously capturing miRNA similarity and disease similarity. We design a hierarchical attention mechanism to enable refined feature learning: first, the Efficient Channel Attention (ECA) module prioritizes information-rich input features, ensuring the model focuses on high-value biological characteristics. Subsequently, the Multi-Head Self-Attention Graph Convolutional Network operates on these refined features. Through iterative message passing and multi-head self-attention, it captures not only direct first-order relationships between nodes but also explicitly models and infers complex, indirect higher-order relationships within the network. This hierarchical design progressively refines feature representations, from channel-level recalibration to global structural dependency modeling, enabling the model to capture both local and high-order relational patterns. Furthermore, a dynamic weight learning strategy adaptively integrates multi-perspective similarity matrices, achieving superior feature complementarity and synergy. Finally, the high-order node representations learned through multi-layer graph convolutions are fed into a multi-layer perceptron for integration and nonlinear transformation, enabling precise prediction of potential miRNA-disease associations. 
Comprehensive evaluation through five-fold cross-validation on HMDD v2.0 and v3.2 benchmark datasets demonstrates that MV-HAGCN consistently outperforms existing state-of-the-art methods in predictive performance. Case studies targeting key diseases such as breast cancer, lung tumors, and pancreatic disorders revealed that the top 50 miRNAs associated with each of these three conditions were all validated in databases, confirming the practical value of this model in screening candidate miRNAs with high biological relevance.
(This article belongs to the Collection Feature Papers in Molecular Informatics)
23 pages, 4649 KB  
Article
A Mechanism-Disentangled Two-Stage Forecasting Framework with Multi-Source Signal Fusion for Respiratory Hospitalizations
by Zhengze Li, Fanyu Meng, Haoxiang Liu and Jing Bian
Electronics 2026, 15(8), 1656; https://doi.org/10.3390/electronics15081656 - 15 Apr 2026
Viewed by 166
Abstract
Accurate forecasting of respiratory virus-associated hospitalization rates per 100,000 population is essential for healthcare capacity planning, yet remains challenging during the COVID-19 era due to abrupt distribution shifts and symptom overlap among influenza-like illnesses caused by multiple pathogens. We propose a two-stage deep learning framework that disentangles stable pre-pandemic seasonal dynamics from COVID-19-induced excess hospitalizations. A lightweight GRU is first trained on pre-pandemic surveillance data to model baseline influenza/RSV-driven seasonality, after which an excess model learns from the residual series and integrates multiple online search trends (flu, COVID-19, and fever) using a standard multi-head self-attention mechanism. While we use COVID-19-era data as a case study, the proposed baseline–excess decomposition is not disease-specific and is intended to generalize to future large-scale respiratory outbreaks or pandemics that induce abrupt regime shifts. Experiments on U.S. weekly respiratory hospitalization rate data curated from CDC surveillance networks (AME) show that the proposed approach achieves strong accuracy on a chronological COVID-era split (2020–2025), reaching R² = 0.907 with MAPE = 19.22%. Beyond point forecasts, we further evaluate an expanding-window rolling-origin protocol and report calibrated prediction intervals via split conformal prediction, supporting deployment-oriented uncertainty quantification. By decoupling baseline and excess components and fusing behavioral trend signals in a disciplined manner, this framework improves predictive performance under regime shift while providing interpretable excess estimates for timely situational awareness and healthcare resource planning.
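The abstract above reports prediction intervals obtained through split conformal prediction. A minimal sketch of the generic technique follows, assuming absolute residuals as the nonconformity score and exchangeable calibration data (an assumption that time series can violate). The function name and the calibration numbers are made up for illustration and are not taken from the paper.

```python
import math

def split_conformal_halfwidth(cal_true, cal_pred, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] targets 1 - alpha coverage."""
    # Nonconformity scores: absolute errors on a held-out calibration set
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf                   # too few calibration points for this alpha
    return scores[k - 1]

# Made-up calibration data (weekly rates per 100,000), for illustration only.
cal_true = [4.0, 5.5, 7.0, 6.0, 3.5, 8.0, 9.5, 5.0, 4.5]
cal_pred = [4.2, 5.0, 7.9, 6.1, 3.0, 7.0, 9.0, 5.3, 4.4]
q = split_conformal_halfwidth(cal_true, cal_pred, alpha=0.2)
lower, upper = 6.5 - q, 6.5 + q  # interval around a hypothetical forecast of 6.5
```

The same half-width is applied to every new point forecast, so the interval width reflects calibration error rather than per-week uncertainty; adaptive variants exist but are outside what the abstract describes.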
27 pages, 4774 KB  
Article
Hybrid Temporal Convolutional Networks and Long Short-Term Memory Model for Accurate and Sustainable Wind–Solar Power Forecasting Leveraging Time-Frequency Joint Analysis and Multi-Head Self-Attention
by Yue Liu, Qinglin Cheng, Haiying Sun, Yaming Qi and Lingli Meng
Sustainability 2026, 18(8), 3904; https://doi.org/10.3390/su18083904 - 15 Apr 2026
Viewed by 303
Abstract
Accurate forecasting of wind and photovoltaic power remains challenging due to the strong nonlinearity, nonstationarity, and seasonal heterogeneity of renewable generation series. To address this issue, this study proposes a hybrid forecasting framework integrating time–frequency joint analysis (TFAA), temporal convolutional networks (TCN), long short-term memory (LSTM), and multi-head self-attention (MHSA). Wavelet transform is used to extract frequency-domain representations, which are jointly encoded with the original time-domain sequence through a dual-branch architecture and adaptively fused. The fused features are then processed by a TCN-LSTM backbone to capture both long-range dependencies and short-term dynamics, while MHSA is introduced to enhance global contextual modeling. Experiments on wind-farm and photovoltaic datasets from China, together with external validation on the NREL WIND Toolkit and the GEFCom2014 Solar benchmark, show that the proposed model achieves the best overall seasonal performance and maintains competitive improvements on public benchmarks. Additional ablation studies, repeated-run statistical validation, persistence-based skill-score analysis, prediction-interval evaluation, ramp-event assessment, meteorological-driver enrichment, permutation-based driver attribution, regime-conditioned error diagnostics, and transferability evidence analysis further confirm the effectiveness, robustness, physical consistency, and practical applicability of the proposed framework. The results indicate that the proposed model provides a reliable and operationally relevant solution for short-term wind and photovoltaic power forecasting. These findings further support sustainable renewable-energy integration, smart-grid dispatch, and low-carbon power-system operation.
15 pages, 1420 KB  
Article
DC-MEPV: Dual-Channel Assisted Music Emotion Perception and Visualization in Acousto-Optic Synergistic Intelligent Cockpits
by Wei Shen, Xingang Mou, Songqing Le, Zhixing Zong and Jiaji Li
Appl. Sci. 2026, 16(8), 3800; https://doi.org/10.3390/app16083800 - 13 Apr 2026
Viewed by 300
Abstract
We propose a Dual-Channel assisted Music Emotion Perception and Visualization (DC-MEPV) framework designed for ambient lighting in intelligent vehicle cockpits, addressing the increasing demand for advanced human–machine interaction in the automotive industry. This framework consists of three main components: the Multi-Scale Feature Extraction Block (MSFEB), the Global Sequence Modeling Block (GSMB), and the Emotional Color Visualization Algorithm (ECV-Algo). The MSFEB extracts valence and arousal (V-A) features from dual channels at multiple temporal scales, with each channel employing a hybrid neural network architecture to capture multi-scale emotional representations. The GSMB integrates positional encoding, bidirectional long short-term memory (BiLSTM) networks, and multi-head self-attention mechanisms to dynamically model global emotional sequences. The ECV algorithm utilizes personalized emotion–color association rules to achieve expressive emotion-driven lighting visualization based on a continuous mapping from emotion space to color space. We conducted comprehensive comparison and ablation experiments to evaluate the model’s emotion perception performance, and designed three metrics to evaluate the quality of the generated visualizations. The model outperformed other networks in both comparative and ablation experiments. Additionally, the generated lights demonstrated strong performance in terms of CIEDE2000 variation rates, unique color ratios, and joint histogram entropy. DC-MEPV achieved excellent performance in emotion perception and visualizations on the DEAM and PMEmo datasets.
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
26 pages, 7110 KB  
Article
Research on an Automatic Detection Method for Response Keypoints of Three-Dimensional Targets in Directional Borehole Radar Profiles
by Xiaosong Tang, Maoxuan Xu, Feng Yang, Jialin Liu, Suping Peng and Xu Qiao
Remote Sens. 2026, 18(7), 1102; https://doi.org/10.3390/rs18071102 - 7 Apr 2026
Viewed by 436
Abstract
During the interpretation of Borehole Radar (BHR) B-scan profiles, the accurate determination of the azimuth of geological targets in three-dimensional space is a critical issue for achieving precise anomaly localization and spatial structure inversion. However, existing directional BHR anomaly localization methods exhibit limited intelligence, insufficient adaptability to multi-site data, and weak generalization capability, rendering them inadequate for engineering applications under complex geological conditions. To address these challenges, a robust deep learning model, termed BSS-Pose-BHR, is developed based on YOLOv11n-pose for keypoint detection in directional BHR profiles. The model incorporates three key optimizations: Bi-Level Routing Attention (BRA) replaces Multi-Head Self-Attention (MHSA) in the backbone to improve computational efficiency; Conv_SAMWS enhances keypoint-related feature weighting in the backbone and neck; and Spatial and Channel Reconstruction Convolution (SCConv) is integrated into the detection head to reduce redundancy and strengthen local feature extraction, thereby improving suitability for keypoint detection tasks. In addition, a three-dimensional electromagnetic model of limestone containing a certain density of clay particles is established to construct a simulation dataset. On the simulated test set, compared with current mainstream deep learning approaches and conventional directional borehole radar anomaly localization algorithms, BSS-Pose-BHR achieves superior performance, with an mAP50(B) of 0.9686, an mAP50–95(B) of 0.7712, an mAP50(P) of 0.9951, and an mAP50–95(P) of 0.9952. Ablation experiments demonstrate that each proposed module contributes significantly to performance improvement. Compared with the baseline, BSS-Pose-BHR improves mAP50(B) by 5.39% and mAP50(P) by 0.86%, while increasing model weight by only 1.05 MB, thereby achieving a reasonable trade-off between detection accuracy and complexity. 
Furthermore, indoor physical model experiments validate the effectiveness of the method on measured data. Robustness experiments under different Peak Signal-to-Noise Ratio (PSNR) conditions and varying missing-trace rates indicate that BSS-Pose-BHR maintains high detection accuracy under moderate noise and data loss, demonstrating strong engineering applicability and practical value.
31 pages, 6459 KB  
Article
Cooperative Hybrid Domain Network for Salient Object Detection in Optical Remote Sensing Images
by Yi Gu, Jianhang Zhou and Lelei Yan
Remote Sens. 2026, 18(7), 1087; https://doi.org/10.3390/rs18071087 - 4 Apr 2026
Viewed by 354
Abstract
Salient Object Detection (SOD) in Optical Remote Sensing Images (ORSIs) aims to localize and segment visually prominent objects amidst complex backgrounds and extreme scale variations. However, we observe that current frequency-aware methods typically rely on a naive feature aggregation paradigm, merging frequency and spatial features via simple concatenation, addition, or direct combination. This shallow interaction overlooks the inherent semantic misalignment between the two domains, resulting in feature redundancy and poor boundary delineation. To address this limitation, we propose the Cooperative Hybrid Domain Network (CHDNet), a framework designed to facilitate synergistic cooperation between heterogeneous domains. Specifically, we propose the Cross-Domain Multi-Head Self-Attention (CD-MHSA) mechanism as a semantic bridge following the encoder. It employs a dimension expansion strategy to construct a Unified Interaction Manifold and utilizes a Frequency Anchor Interaction mechanism to achieve precise modulation of spatial textures using global spectral cues. Furthermore, to address the dual challenges of lacking explicit interpretation mechanisms for semantic co-occurrence and the susceptibility of topological structures to fracture in complex scenes during the decoding phase, we design a Multi-Branch Cooperative Decoder (MBCD) comprising three parallel paths: edge semantics, global relations, and reverse correction. This module dynamically integrates these heterogeneous clues through a Cooperative Fusion Strategy, combining explicit global dependency modeling with dual-domain reverse mining. Extensive experiments on multiple benchmark datasets demonstrate that the proposed CHDNet achieves performance superior to state-of-the-art (SOTA) methods.
29 pages, 1303 KB  
Article
An Enhanced Traffic Classifier Based on Self-Supervised Feature Learning
by Shaoqing Jiang, Xin Luo, Hongyi Wang, Gang Chen and Hongwei Zhao
Appl. Sci. 2026, 16(7), 3493; https://doi.org/10.3390/app16073493 - 3 Apr 2026
Viewed by 365
Abstract
Encrypted network traffic classification is an important research topic in the field of network security. Although deep learning-based methods have made progress, they still face three main challenges: first, the semantic information in encrypted traffic is inadequately represented, making it difficult for existing methods to effectively capture the hierarchical interaction relationships between packet-level and flow-level features; second, models rely on large amounts of labeled data for supervised training, resulting in high training costs and limited generalization ability in new scenarios; third, in existing self-supervised methods, the functions of the encoder and decoder are coupled, which restricts the full potential of the encoder’s representation learning. To address these issues, this paper proposes an Enhanced Traffic Classifier (ETC) based on self-supervised feature learning. The model first constructs a multi-level interactive traffic representation matrix, converting raw traffic into structured grayscale images that fuse packet-level and flow-level temporal features, thereby addressing the problem of missing semantic information. On this basis, an improved Masked Image Modeling Vision Transformer architecture is adopted. Through a three-stage decoupled design of encoder–regressor–decoder, the encoder focuses solely on feature extraction, the regressor performs masked representation prediction, and the decoder is only responsible for image reconstruction, thereby fully unleashing the encoder’s feature learning capability. Furthermore, during the fine-tuning stage, an Attentive Probing classification mechanism is introduced to replace the traditional linear classification head. By using learnable class query vectors to dynamically focus on semantic regions relevant to the classification target, the model’s recognition accuracy and robustness are further improved. 
Experiments are conducted on five public datasets, including USTC-TFC2016 and CICIoT2022, as well as a self-built Human-Internet dataset. The results show that ETC significantly outperforms mainstream methods such as YaTC and ET-BERT in core metrics including accuracy and F1-score, while also demonstrating strong generalization in few-shot scenarios.
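The attentive probing idea described in this abstract (learnable class query vectors that attend over encoder tokens instead of a linear head) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `attentive_pool` and `attentive_probe`, the single-head scaled dot-product form, and the choice to score each class by the dot product of its query with its own pooled feature are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(query, tokens):
    """Pool token vectors with attention weights from one query (hypothetical helper)."""
    dim = len(query)
    scores = [
        sum(q * t for q, t in zip(query, token)) / math.sqrt(dim)
        for token in tokens
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * token[i] for w, token in zip(weights, tokens))
        for i in range(dim)
    ]
    return pooled, weights


def attentive_probe(class_queries, tokens):
    """Return one logit per class query (assumed scoring rule: query . pooled feature)."""
    logits = []
    for query in class_queries:
        pooled, _ = attentive_pool(query, tokens)
        logits.append(sum(q * p for q, p in zip(query, pooled)))
    return logits


# Toy input: three 2-d "patch tokens" and two class queries.
tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
class_queries = [[4.0, 0.0], [0.0, 4.0]]
logits = attentive_probe(class_queries, tokens)
```

In a trained model the class queries would be learned parameters and the tokens would come from the frozen or fine-tuned encoder; here both are fixed toy values.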

13 pages, 3260 KB  
Article
Efficient Deep Image Prior with Spatial-Channel Attention Transformer
by Weiwei Lin, Zeqing Zhang, Jin Lin and Ying You
Mathematics 2026, 14(7), 1185; https://doi.org/10.3390/math14071185 - 1 Apr 2026
Viewed by 431
Abstract
The deep image prior (DIP) suggests that a randomly initialized network with a suitable architecture can solve inverse imaging problems simply by optimizing its parameters to reconstruct a single degraded image. However, the prior exploited by vanilla DIP relies on basic local convolutions, which limits the performance of inverse imaging tasks to the generative capacity of the model. Furthermore, image information depends not only on neighboring pixels but also on global color features and spatial distribution, so the simple local convolutions used in inverse imaging cannot capture precise fine-grained details. Moreover, although DIP is unsupervised, it requires iterative optimization to learn inverse imaging, which consumes computational power and limits the adaptation of global attention. To solve these problems, this article explores an efficient global prior module, a tri-directional multi-head self-attention mechanism, which learns pixel-wise correlations along three directions: horizontal, vertical, and channel-wise. We observe that global learning effectively enhances the detail of edge pixels, making images more vivid and textures clearer. In addition, tri-directional multi-head self-attention can efficiently replace the global perception ability of pixel-level self-attention. Finally, we demonstrate that global learning improves results on inverse imaging problems and enhances the information of texture edge pixels, while tri-directional multi-head self-attention alleviates the computational redundancy of pixel-level self-attention, thus achieving efficient and high-quality inverse imaging. The method rests on global feature capture and efficient attention modeling, striking a balance between detail fidelity and computational practicality.
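To make the efficiency claim concrete, the rough count below compares the number of attention score entries for pixel-level self-attention with a tri-directional scheme. The abstract does not give the exact formulation, so this sketch assumes that each row attends only within itself (horizontal), each column only within itself (vertical), and channels attend to one another as tokens (channel-wise). The function name `attention_pair_counts` and the 64 x 64 x 32 feature-map size are hypothetical.

```python
def attention_pair_counts(height, width, channels):
    """Count query-key score entries per attention layout (single head, assumed layouts)."""
    pixels = height * width
    return {
        # Every pixel attends to every other pixel.
        "pixel_level": pixels * pixels,
        # Each of the `height` rows computes a width x width score matrix.
        "horizontal": height * width * width,
        # Each of the `width` columns computes a height x height score matrix.
        "vertical": width * height * height,
        # Channels treated as tokens: one channels x channels score matrix.
        "channel": channels * channels,
    }


counts = attention_pair_counts(height=64, width=64, channels=32)
axis_total = counts["horizontal"] + counts["vertical"] + counts["channel"]
saving = counts["pixel_level"] / axis_total
```

Under these assumptions the three axis-wise passes together need far fewer score entries than full pixel-level attention, and the gap widens as the spatial resolution grows.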

28 pages, 4366 KB  
Article
Temporal Transformer with Conditional Tabular GAN for Credit Card Fraud Detection: A Sequential Deep Learning Approach
by Jiaying Chen, Yiwen Liang, Jingyi Liu and Mengjie Zhou
Mathematics 2026, 14(7), 1183; https://doi.org/10.3390/math14071183 - 1 Apr 2026
Viewed by 739
Abstract
Credit card fraud detection remains a critical challenge in financial security, characterized by severe class imbalance and the need to capture complex temporal patterns in transaction sequences. Traditional machine learning approaches treat transactions as independent events, failing to model the sequential nature of user behavior and suffering from inadequate handling of minority class samples. In this paper, we propose an integrated framework that combines generative modeling and time-aware sequential learning for credit card fraud detection. Our approach addresses two fundamental limitations: (1) we model transaction histories as temporal sequences using a Transformer-based architecture that captures both long-term dependencies and abrupt behavioral changes through multi-head self-attention mechanisms, and (2) we employ CTGAN to generate high-quality synthetic fraudulent samples, providing more effective oversampling than conventional techniques like SMOTE. The Time-Aware Transformer incorporates temporal encoding and position-aware attention to preserve transaction order and time intervals, while CTGAN learns the complex conditional distributions of fraudulent transactions to produce realistic synthetic samples. We evaluate our framework on the IEEE-CIS Fraud Detection dataset, demonstrating significant improvements over representative classical and sequential deep-learning baselines. Experimental results show that our method achieves superior performance with an AUC-ROC of 0.982, precision of 0.891, recall of 0.876, and F1-score of 0.883, outperforming the representative baselines considered in this study, including traditional machine learning models, standalone deep learning architectures, and supervised sequential neural models. Ablation studies confirm the individual contributions of both the sequential modeling component and the generative oversampling strategy. 
Our work demonstrates that combining temporal sequence modeling with generative synthesis provides a robust solution for imbalanced fraud detection, with potential applications extending to other domains requiring sequential pattern recognition under extreme class imbalance.
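As a quick consistency check on the metrics reported in this abstract, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the stated precision (0.891) and recall (0.876). The helper name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.891, 0.876)
print(round(f1, 3))  # prints 0.883, matching the reported F1-score
```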

24 pages, 3985 KB  
Article
A Transformer-Based Variational Autoencoder for Training Data Generation in Spindle Motor Vibration-Based Anomaly Detection
by Jaeyoung Kim and Youngbae Hwang
Sensors 2026, 26(7), 2176; https://doi.org/10.3390/s26072176 - 31 Mar 2026
Viewed by 407
Abstract
In high-speed spindle motors operating above 10,000 rpm, vibration analysis is essential for detecting mechanical anomalies. However, data scarcity and imbalance, especially for rare fault conditions, limit the performance of deep learning-based anomaly detection models. In this study, we define sample scarcity as the limited availability of real labeled vibration sequences for model training, i.e., only 5000 normal and 5000 faulty samples collected from three spindle motors (10,000 real samples in total). We propose a Transformer-based Variational Autoencoder (T-VAE) to generate realistic triaxial acceleration sequences for spindle motor health monitoring. The model integrates positional encoding and multi-head self-attention to capture long-range temporal dependencies in multivariate time-series data, and applies a KL annealing strategy to improve training stability. Using 5000 normal and 5000 faulty vibration samples collected from three spindle motors, the model generates 100,000 synthetic samples per class, which are used to augment training for a downstream CNN–LSTM classifier. Without augmentation, the classifier achieved 95.73% pass detection on normal samples and 81.40% fail detection on faulty samples. After augmentation with Transformer-VAE, performance increased to 98.07% pass detection for normal data and 97.99% fail detection for faulty data. For prediction, we evaluate on an independent dataset of 25,000 normal and 25,000 faulty sequences obtained from eleven different spindle motors not used in training (cross-spindle). The results demonstrate that the T-VAE effectively alleviates the data scarcity problem and significantly improves anomaly detection accuracy for high-speed spindle motor vibration signals. This approach can be directly applied to predictive maintenance systems in real-world manufacturing environments.
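The KL annealing strategy mentioned in this abstract can be sketched as a weight on the KL term that grows during early training, so the decoder learns to reconstruct before the latent space is strongly regularized. The abstract does not state which schedule the authors use; the linear warm-up below, along with the names `kl_weight`, `vae_loss`, `warmup_steps`, and `max_weight`, is an assumption for illustration only.

```python
def kl_weight(step, warmup_steps, max_weight=1.0):
    """Linear KL warm-up (assumed schedule): 0 at step 0, max_weight after warmup_steps."""
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)


def vae_loss(reconstruction_loss, kl_divergence, step, warmup_steps):
    """Annealed VAE objective: reconstruction term plus weighted KL term."""
    return reconstruction_loss + kl_weight(step, warmup_steps) * kl_divergence


# Example: halfway through a 100-step warm-up, the KL term counts at half weight.
loss_midway = vae_loss(reconstruction_loss=2.0, kl_divergence=4.0, step=50, warmup_steps=100)
```

Other schedules, such as cyclical or sigmoid warm-ups, follow the same pattern with a different `kl_weight` function.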