Search Results (11,490)

Search Parameters:
Keywords = feature fusion

24 pages, 1594 KB  
Article
RMP-YOLO: Robust Multi-Scale Pedestrian Detection for Dense Scenarios
by Chenyang Gui, Zhangyu Fan, Taibin Duan and Junhao Wen
Sensors 2026, 26(9), 2621; https://doi.org/10.3390/s26092621 - 23 Apr 2026
Abstract
With the rapid advancement of autonomous driving in modern society, dense pedestrian detection technology has encountered performance bottlenecks. To address this, we propose a robust and lightweight pedestrian detection algorithm, RMP-YOLO, designed to efficiently detect small, occluded, and low-light objects. Firstly, RFAConv is utilized as the core component of the backbone network, combining standard convolution with attention mechanisms and using group convolution to extract features from the spatial receptive field. Secondly, MobileViTv3 is introduced into the backbone to combine CNNs with Transformers. The model is further enhanced by adjusting feature fusion, introducing residual connections, and optimizing local representation with deep convolutional layers. Finally, the PIoUv2 loss function is employed for bounding-box regression, significantly reducing detection errors for small-scale pedestrians in crowded environments. Experimental results demonstrate that RMP-YOLO improves mAP@0.5 by 1.3% on a custom dataset and 0.91% on the WiderPerson dataset. Crucially, it maintains high efficiency with only 3.71 million parameters and 6.29 GFLOPs, meeting the deployment requirements for low computational power and high precision. Full article
(This article belongs to the Section Sensing and Imaging)
28 pages, 10821 KB  
Article
RadarsBEV: A Joint Multi-Radar Fusion and Target Detection Network via Gaussian Attention in Arbitrary Configurations
by Zuyuan Guo, Wujun Li, Guoxin Zhang, Hongfu Li, Jiesong He, Kah Chan Teh and Wei Yi
Remote Sens. 2026, 18(9), 1290; https://doi.org/10.3390/rs18091290 - 23 Apr 2026
Abstract
Multi-radar fusion is fundamental for robust, all-weather perception across diverse applications. However, current fusion paradigms face structural and computational bottlenecks. Traditional statistical frameworks suffer from a combinatorial explosion in computation, with complexity scaling with the number of active sensor nodes. Concurrently, existing statistical and deep learning fusion models exhibit systemic brittleness: their rigid topological binding to predefined sensor counts causes performance to drop during sensor dropouts. Furthermore, generic attention mechanisms are phenomenologically mismatched with radar signals, neglecting the spatial features of radar targets and producing false alarms. To overcome these limitations, we propose RadarsBEV, a scalable end-to-end multi-radar detection framework. By decoupling per-sensor feature extraction from the central spatial fusion process, RadarsBEV achieves permutation invariance. This design breaks the scalability limit and enables graceful degradation via residual nodes without system downtime. Crucially, we introduce a physics-aware Gaussian cross-attention mechanism. By guiding sparse feature sampling with a predicted two-dimensional Gaussian target geometry, this mechanism decouples attention weights from clutter signals. Extensive experiments on high-fidelity simulations and real-world datasets demonstrate that RadarsBEV achieves superior detection performance. Notably, the framework exhibits robust zero-shot generalization across configurations, adapting to entirely unseen spatial layouts and degraded operational environments without fine-tuning. Full article
10 pages, 418 KB  
Article
Empirical Analysis of Internal Hallucination Detection in Quantized LLMs: Layer Dynamics and White-Box Benchmarks
by Haohua Liu and Jinli Xu
Electronics 2026, 15(9), 1802; https://doi.org/10.3390/electronics15091802 - 23 Apr 2026
Abstract
As large language models (LLMs) move onto resource-constrained devices, maintaining factual reliability without adding another expensive decoding pass becomes a practical inference problem. Instead of introducing another complex hallucination detector, this paper presents an empirical study of which low-cost white-box features remain useful under a controlled single-pass benchmark. Across repeated candidate-answer reruns on Qwen2.5-1.5B-Instruct and Llama-3.2-1B-Instruct, truthful and incorrect internal states are most separable in the middle-to-late layers, with the peak consistently falling at 50–70% of total network depth across both model families. The depth-relative pattern is more stable than any single detector ranking: simple residual-space baselines, including Mahalanobis scoring, remain competitive with more elaborate residual-plus-spectral fusion features under the same protocol, although detector ranking still changes by task. A separate preliminary two-seed Qwen2.5-7B-Instruct BF16 probe under that same white-box benchmark reproduces the same middle-to-late peak, and auxiliary Int8 checks on Qwen2.5-1.5B and Qwen2.5-7B remain consistent with that same localization under moderate quantization. Taken together, the results point away from detector complexity and toward a more reproducible question of where hallucination cues emerge, which internal statistics remain reliable, and how cautiously such conclusions should be transferred to deployment settings. Full article
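The Mahalanobis-scoring baseline the abstract mentions has a standard generic form: fit the mean and covariance of residual-stream states gathered from answers known to be truthful, then score a new state by its distance from that distribution. The sketch below is illustrative only — the synthetic data, the 8-dimensional state size, and the diagonal regularization term are assumptions, not the paper's setup.

```python
import numpy as np

def mahalanobis_score(h, mu, cov_inv):
    """Distance of a hidden state h from the 'truthful' distribution;
    larger scores suggest a less typical (possibly hallucinated) state."""
    d = h - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Illustrative fit on synthetic 'truthful' residual states
rng = np.random.default_rng(0)
truthful = rng.normal(size=(200, 8))                     # 200 states, 8 dims
mu = truthful.mean(axis=0)
cov = np.cov(truthful, rowvar=False) + 1e-6 * np.eye(8)  # regularized covariance
cov_inv = np.linalg.inv(cov)
score = mahalanobis_score(rng.normal(size=8), mu, cov_inv)
```

Thresholding such a score (calibrated on held-out data) turns it into a single-pass hallucination flag with negligible inference overhead.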
23 pages, 2440 KB  
Article
Detection of Small Debonding Defects in Metal–Rubber Bonded Structures Using an Enhanced EMAT and Multi-Feature Fusion Imaging
by Yang Fang, Xiaokai Wang, Yinqiang Qu, Hongen Chen and Zhenmao Chen
Sensors 2026, 26(9), 2617; https://doi.org/10.3390/s26092617 - 23 Apr 2026
Abstract
To overcome the low sensitivity of electromagnetic acoustic testing (EMAT) to micro-debonding defects in metal–rubber bonded structures, this study proposes a detection framework combining a magnetic-field-enhanced focusing EMAT with entropy-weighted multi-feature fusion imaging. First, a Halbach-type focusing magnet was designed and evaluated through finite element simulations, showing a substantial enhancement of the effective bias magnetic field in the working region. Then, three complementary echo features, namely amplitude (AMP), time-domain integral (TDI), and power spectral density (PSD), were extracted from the acquired resonance signals and integrated using an adaptive entropy-weighted fusion strategy. Comparative and ablation analyses were further conducted to distinguish the respective contributions of probe enhancement and feature fusion, and to compare entropy-weighted fusion with single-feature imaging and equal-weight fusion. The results indicate that the focused probe mainly improves the defect-response strength at the hardware level, whereas feature fusion mainly improves image contrast, background suppression, and segmentation consistency at the image level. Among the compared methods, entropy-weighted fusion provides the best overall imaging performance under the present experimental conditions, enabling reliable detection of 5 mm debonding defects in aluminum-alloy–rubber bonded specimens and 10 mm debonding defects in titanium-alloy–rubber bonded specimens. These results suggest that the combined use of magnetic-field focusing and adaptive multi-feature fusion is a promising approach for the detection and quantitative characterization of micro-debonding defects in metal–rubber bonded structures. Full article
(This article belongs to the Special Issue Electromagnetic Non-Destructive Testing and Evaluation: 2nd Edition)
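Entropy-weighted feature fusion of the kind described here admits a simple generic sketch: min–max-normalize each feature image, derive a weight from each image's Shannon entropy, and form the weighted sum. The inverse-entropy weighting rule, the epsilon guards, and the toy 2×2 feature maps below are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def entropy_weighted_fusion(feature_maps):
    """Fuse normalized feature images with weights derived from each
    map's Shannon entropy (lower entropy -> higher weight here)."""
    weights, normed = [], []
    for f in feature_maps:
        f = (f - f.min()) / (f.max() - f.min() + 1e-12)  # min-max normalize
        p = f / (f.sum() + 1e-12)                        # treat as a distribution
        h = -np.sum(p * np.log(p + 1e-12))               # Shannon entropy
        weights.append(1.0 / (h + 1e-12))
        normed.append(f)
    w = np.array(weights) / np.sum(weights)              # weights sum to 1
    return sum(wi * fi for wi, fi in zip(w, normed))

# Toy stand-ins for the AMP, TDI, and PSD feature images
amp = np.array([[0.0, 1.0], [0.2, 0.8]])
tdi = np.array([[0.1, 0.9], [0.3, 0.7]])
psd = np.array([[0.0, 0.5], [0.6, 1.0]])
fused = entropy_weighted_fusion([amp, tdi, psd])
```

Because each normalized map lies in [0, 1] and the weights form a convex combination, the fused image stays in [0, 1] as well.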
21 pages, 4522 KB  
Article
An Adaptive Multi-Sensor Fusion Method with Skip Fusion and Dual Convolution for Bearing Fault Diagnosis
by Guoyong Wang, Qilin Zhang and Zhihang Ji
Electronics 2026, 15(9), 1799; https://doi.org/10.3390/electronics15091799 - 23 Apr 2026
Abstract
To improve the feature representation and cross-condition generalization of bearing fault diagnosis, this paper proposes an adaptive multi-sensor fusion network with a skip fusion module and a parameter-efficient dual-convolution diagnosis block. The vibration and current signals are first augmented by overlapping segmentation and transformed into the frequency domain using FFT. Multi-scale depthwise convolutions are then employed in parallel branches to capture fault patterns at different receptive fields, and an attention-based skip fusion mechanism selectively aggregates cross-sensor features for complementary enhancement. After fusion, self-calibrated convolution and dilated convolution are alternately applied to strengthen discriminative representation without increasing model complexity. Experiments on multiple bearing datasets under both constant and variable operating conditions demonstrate that the proposed method achieves consistently higher accuracy and robustness than representative CNN-based baselines, verifying its effectiveness for practical bearing fault diagnosis. Full article
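The preprocessing step described above — overlapping segmentation followed by a transform to the frequency domain via FFT — can be sketched generically as follows; the window length and overlap ratio are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def segment_fft(signal, win=1024, overlap=0.5):
    """Slice a 1-D signal into overlapping windows (data augmentation by
    overlap) and return the FFT magnitude spectrum of each window."""
    step = int(win * (1 - overlap))
    segments = [signal[i : i + win]
                for i in range(0, len(signal) - win + 1, step)]
    return np.abs(np.fft.rfft(np.stack(segments), axis=1))

# Example: a 4096-sample signal yields overlapping spectra
spectra = segment_fft(np.sin(np.arange(4096) / 10.0))
```

With `rfft`, each 1024-sample window produces 513 frequency bins; the resulting 2-D array is what a downstream convolutional branch would consume.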
19 pages, 1577 KB  
Article
End-to-End Learnable Recurrence Plot for Sleep Stage Classification Using Non-Contact Ballistocardiography
by Jiseong Jeong and Sunyong Yoo
Electronics 2026, 15(9), 1798; https://doi.org/10.3390/electronics15091798 - 23 Apr 2026
Abstract
Accurate sleep stage classification is essential for evaluating sleep quality, yet clinical polysomnography is impractical for continuous home-based monitoring. Ballistocardiography (BCG) enables unobtrusive sleep monitoring through sensors embedded in sleep furniture; however, existing BCG-based approaches either rely on complex physiological feature extraction or employ fixed-parameter signal-to-image transformations that cannot adapt to inter-subject variability. This study proposes a learnable recurrence plot (RP) framework for three-stage sleep classification (Wake, NREM, REM) from single-channel BCG signals. The Learnable RP introduces three innovations: multi-scale phase-space reconstruction at physiologically motivated time delays (τ = 5, 10, 20), differentiable per-scale thresholds optimized end-to-end, and attention-based spatial fusion of multi-scale recurrence maps. The framework was evaluated through 10-fold stratified cross-validation across six backbone architectures using 50 overnight recordings. The Learnable RP consistently outperformed four baseline transformation methods (GAF, MTF, Classical RP, Modified RP), achieving an aggregate mean accuracy of 73.60%, with EfficientNet-B5 reaching 78.91%. Statistical validation across all 24 pairwise comparisons (4 baselines × 6 backbones) confirmed consistent superiority (all p < 0.001). The proposed framework achieves competitive performance without explicit physiological feature engineering, offering a viable path toward end-to-end unobtrusive sleep monitoring. Full article
(This article belongs to the Section Bioelectronics)
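A recurrence plot with a differentiable threshold — the core idea behind a "learnable RP" — can be sketched by replacing the usual hard step with a sigmoid. Only the delays τ = 5, 10, 20 come from the abstract; the embedding dimension, threshold, sharpness `beta`, and demo signal are illustrative assumptions.

```python
import numpy as np

def soft_recurrence_plot(x, tau, dim=2, threshold=0.1, beta=50.0):
    """Soft-thresholded recurrence plot: entries near 1 where two
    delay-embedded states are closer than `threshold`."""
    n = len(x) - (dim - 1) * tau                 # number of embedded states
    states = np.stack([x[i : i + n] for i in range(0, dim * tau, tau)], axis=1)
    d = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    z = np.clip(beta * (d - threshold), -60.0, 60.0)  # guard against exp overflow
    return 1.0 / (1.0 + np.exp(z))               # sigmoid in place of a hard step

# One recurrence map per delay scale, as in the multi-scale setup
signal = np.sin(np.linspace(0.0, 20.0 * np.pi, 300))
maps = [soft_recurrence_plot(signal, tau) for tau in (5, 10, 20)]
```

Because the sigmoid is smooth in `threshold`, gradients can flow through it, which is what allows per-scale thresholds to be optimized end-to-end alongside the backbone.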
23 pages, 8014 KB  
Article
MSW-Mamba-Det: Multi-Scale Windowed State-Space Modeling for End-to-End Defect Detection in Photovoltaic Module Electroluminescence Images
by Xiaofeng Wang, Haojie Hu, Xiao Hao and Weiguang Ma
Sensors 2026, 26(9), 2616; https://doi.org/10.3390/s26092616 - 23 Apr 2026
Abstract
Electroluminescence (EL) imaging is widely used for photovoltaic (PV) module inspection, yet EL defect detection remains challenging due to the need for high-resolution inputs, low-contrast defects, and strong structured background patterns. To address these issues, we propose MSW-Mamba-Det, an end-to-end defect detection framework built on RT-DETR, comprising three components. (1) MSW-Mamba, a multi-scale windowed state-space module, adopts a Local/Stripe/Grid architecture to jointly model fine details and long-range dependencies; the Stripe branch strengthens directional continuity for elongated defects, while the Grid branch introduces coarse global context to improve cross-region consistency. Saliency- and gradient-guided gating is further used to suppress background-induced false responses. (2) DetailAware compensates for detail attenuation by restoring high-frequency textures and edges through multi-scale local enhancement, and applies pixel-wise adaptive gating to integrate global semantics and mitigate smoothing effects in deep representations. (3) PAFB (Pyramid Attention Fusion Block) aligns adjacent-scale features and improves multi-scale fusion, enhancing localization stability across defect sizes. Experiments on two public EL datasets show that MSW-Mamba-Det achieves AP50:95 of 60.4% on PV-Multi-Defect-main and 68.0% on PVEL-AD, improving over RT-DETR by 2.5 points (from 57.9% to 60.4%) and 2.2 points (from 65.8% to 68.0%), respectively. MSW-Mamba-Det also outperforms 12 representative baselines, including CNN-, Transformer-, and recent YOLO-based models, in AP50:95 on both datasets, with particularly strong performance on medium and large defects. These results demonstrate the effectiveness of the proposed modules for robust PV EL defect inspection under low-contrast and structured-background conditions. Full article
(This article belongs to the Section Sensing and Imaging)
25 pages, 1701 KB  
Article
Concrete Crack Detection in Extremely Dark Environments Based on Infrared-Visible Multi-Level Registration Fusion and Frequency Decoupling
by Zixiang Li, Weishuai Xie and Bingquan Xiang
Sensors 2026, 26(9), 2612; https://doi.org/10.3390/s26092612 - 23 Apr 2026
Abstract
To address the difficulty of heterogeneous image registration and the low segmentation accuracy caused by severe lack of illumination and significant modal differences for concrete cracks in extremely dark environments, this paper proposes a two-stage processing framework: registration and fusion first, then decoupling and segmentation. In the registration and fusion stage, a registration algorithm based on morphological priors and multi-level quadtree spatial constraints is designed. This approach transforms the problem from pixel grayscale matching to spatial topological matching, achieving a feature fusion of high infrared saliency and high visible light sharpness. In the segmentation stage, a Latent Frequency-Decoupled Topological Network (LFDT-Net) is proposed. It utilizes the Discrete Wavelet Transform (DWT) to achieve high-fidelity frequency decoupling of the low-frequency infrared backbone and the high-frequency visible light edges. Furthermore, a Cross-Frequency Guidance Module is utilized to eliminate double-edged artifacts, and a skeleton-aware topological loss function is introduced to constrain the topological integrity of the cracks. Experimental results on a self-built heterogeneous multi-modal crack dataset demonstrate that the proposed method significantly outperforms existing mainstream methods in registration accuracy, fusion quality, and segmentation accuracy. Achieving a mean Intersection over Union (mIoU) of 81.7%, the method effectively suppresses background noise in dark environments and precisely restores the microscopic edges and continuous topological structures of faint cracks. Full article
(This article belongs to the Special Issue AI-Based Visual Sensing for Object Detection)
16 pages, 2889 KB  
Article
Uncertainty-Aware Probabilistic Fusion Post-Processing for Continuous Wrist Motion Estimation in Myoelectric Control
by Sheng Feng, Guangyong Xu and Yinglin Li
Sensors 2026, 26(9), 2614; https://doi.org/10.3390/s26092614 - 23 Apr 2026
Abstract
Continuous wrist angle estimation based on surface electromyography (sEMG) is often affected by signal variability and prediction instability. Although regression models provide instantaneous outputs, their predictions may exhibit temporal fluctuations and limited robustness due to the non-stationary nature of sEMG signals. To address this issue, we propose an uncertainty-aware probabilistic fusion post-processing framework for continuous wrist motion estimation. The proposed approach decouples regression and uncertainty modeling, enabling plug-in compatibility with feature-based regression models. A local Gaussian process regression (LGPR) model is employed to estimate predictive uncertainty from a sliding feature window. The instantaneous regression output is then fused with the LGPR prediction through a Bayesian-inspired Gaussian formulation, resulting in a closed-form adaptive gain that dynamically adjusts smoothing strength according to predictive variance. Experimental results from both open-loop wrist joint motion estimation and closed-loop myoelectric control tasks demonstrate that our method outperforms existing methods in key performance indicators, including task completion time, trajectory smoothness, and trajectory tracking error. Full article
(This article belongs to the Section Sensors and Robotics)
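The "Bayesian-inspired Gaussian formulation" with a closed-form adaptive gain matches the standard precision-weighted fusion of two Gaussian estimates. A minimal sketch follows; the variable names are ours, and the paper's exact formulation may differ.

```python
def gaussian_fusion(x_reg, var_reg, x_gpr, var_gpr):
    """Precision-weighted fusion of a raw regression output and a GPR
    prediction, each modeled as a Gaussian (mean, variance)."""
    k = var_reg / (var_reg + var_gpr)    # adaptive gain in [0, 1]
    fused = x_reg + k * (x_gpr - x_reg)  # posterior mean
    fused_var = (1 - k) * var_reg        # posterior variance
    return fused, fused_var

# Equal variances -> the fused estimate is the midpoint
fused, fused_var = gaussian_fusion(1.0, 1.0, 3.0, 1.0)
```

When the GPR variance is large (uncertain), `k` shrinks toward 0 and the raw regression output passes through nearly unchanged; when it is small, the output is pulled toward the smoother GPR prediction, which is exactly the variance-dependent smoothing behavior the abstract describes.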
25 pages, 7920 KB  
Article
MBA-Former: A Boundary-Aware Transformer for Synergistic Multi-Modal Representation in Pine Wilt Disease Detection from High-Resolution Satellite Imagery
by Rui Hou, Yantao Zhou, Ying Wang, Zhiquan Huang, Jing Yao, Quanjun Jiao, Wenjiang Huang and Biyao Zhang
Forests 2026, 17(5), 517; https://doi.org/10.3390/f17050517 - 23 Apr 2026
Abstract
Pine wilt disease (PWD) is a devastating biological forest disturbance, making its large-scale and high-precision remote sensing monitoring crucial for epidemic prevention and control. However, the performance of existing deep learning methods in high-resolution imagery is often limited by the confusion of spectral features among disparate ground objects and the complexity of forest boundaries. To address these challenges, this study proposes an innovative, end-to-end deep learning architecture termed MBA-Former. Built upon the robust Swin Transformer V2 backbone, the model systematically integrates two highly adaptable functional modules: (1) a front-end intelligent fusion module designed to adaptively fuse heterogeneous features, and (2) a back-end boundary refinement module that refines segmentation contours via dual-task learning. To train and evaluate the model, fine-grained manual annotations were first performed on Gaofen-2 satellite imagery acquired from multiple typical epidemic areas across northern and southern China. Information-enhanced datasets were constructed by fusing the original spectral bands, typical vegetation indices, and texture features. A comprehensive performance evaluation was then conducted, specifically targeting typical challenging scenarios characterized by complex ground object boundaries. The experimental results demonstrate that the Multi-modal Boundary-Aware Transformer (MBA-Former) significantly outperforms current state-of-the-art models. It achieved a mean Intersection over Union (mIoU) of 81.74%, an IoU of 77.58% for the most critical infected tree category, and a Boundary F1-Score of 78.62%. Compared to the best-performing baseline model, Swin-Unet, these three metrics exhibited notable improvements of 2.88%, 3.55%, and 4.46%, respectively. These findings convincingly demonstrate that MBA-Former provides a highly accurate and robust solution for the large-scale, automated remote sensing monitoring of forest diseases, offering immense value in preventing significant economic losses and preserving forest ecosystem integrity. Full article
33 pages, 31971 KB  
Article
A Feature-Optimized Deep Learning Framework for Mapping and Spatial Characterization of Tea Plantations in Complex Mountain Landscapes
by Ruyi Wang, Jixian Zhang, Xiaoping Lu, Qi Kang, Bowen Chi, Junfeng Li, Yahang Li and Zhengfang Lou
Remote Sens. 2026, 18(9), 1281; https://doi.org/10.3390/rs18091281 - 23 Apr 2026
Abstract
The unchecked expansion of tea plantations onto steep, forest-adjacent slopes in subtropical mountains engenders a conflict between agricultural productivity and ecosystem integrity, particularly by exacerbating habitat fragmentation and soil erosion. While precise monitoring is essential to navigate this trade-off for sustainable management, accurate inventorying remains a challenge due to the plantations’ strong phenological variability, heterogeneous canopy structures, and high spectral confusion with surrounding vegetation. This study proposes a feature-optimized deep learning framework for mapping and characterizing tea plantations in complex landscapes, using Xinyang City, China, as a study area. The framework integrates multi-temporal Sentinel-1/2 observations with a sequential Jeffries-Matusita (JM)-Pearson feature filtering strategy. This approach effectively condenses a 132-variable high-dimensional pool (including optical spectra, vegetation indices, textures, and SAR polarimetry) into a compact 28-feature subset (a 78.8% reduction), preserving critical phenological and structural cues while minimizing redundancy. These optimized predictors drive a hybrid VGG16–UNet++ segmentation network, which couples transfer-learning-based semantic encoding with detail-preserving dense skip fusion. Extensive experiments across 18 model–feature configurations demonstrate that the optimal setting achieves an Overall Accuracy of 97.82%, an F1-score of 0.9093, and a mean IoU of 0.7968. Notably, the method significantly reduces misclassification in rugged, cloud-prone terrain, yielding a User’s Accuracy of 91.14% for tea. Based on the generated wall-to-wall map, we derived two decision-support indicators: multi-threshold steep-slope exposure and a normalized tea–forest interface density. This framework provides actionable, high-precision spatial products to support slope-based zoning, ecological restoration, and sustainable management in fragile mountain agroforestry systems. Full article
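The Jeffries-Matusita (JM) criterion used in the feature-filtering step has a standard closed form for Gaussian class distributions: JM = 2(1 − e^(−B)), where B is the Bhattacharyya distance. The univariate sketch below is illustrative (the multi-band case is analogous, and the demo class parameters are invented).

```python
import numpy as np

def jeffries_matusita(mu1, var1, mu2, var2):
    """JM distance between two univariate Gaussian class distributions.
    Saturates at 2 for perfectly separable classes, 0 for identical ones."""
    b = (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)          # mean-separation term
         + 0.5 * np.log((var1 + var2)
                        / (2.0 * np.sqrt(var1 * var2))))  # variance-mismatch term
    return 2.0 * (1.0 - np.exp(-b))

same = jeffries_matusita(0.0, 1.0, 0.0, 1.0)    # identical classes
far = jeffries_matusita(0.0, 1.0, 100.0, 1.0)   # well-separated classes
```

Ranking features by JM (keeping high-separability ones) and then pruning highly Pearson-correlated pairs is one plausible reading of the sequential "JM-Pearson" strategy the abstract names.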
33 pages, 17932 KB  
Article
Early Detection of Aggressive Human Behavior in Video Streams Using Deep Spatiotemporal Models
by Aida Issembayeva, Anargul Shaushenova, Ardak Nurpeisova, Aidar Ispussinov, Buldyryk Suleimenova, Anargul Bekenova, Aliya Satybaldieva, Aigul Zholmukhanova and Galiya Mauina
Computers 2026, 15(5), 267; https://doi.org/10.3390/computers15050267 - 23 Apr 2026
Abstract
In this paper, we propose a spatiotemporal approach for binary classification of violent and non-violent behavior in real-world settings. The experimental pipeline includes video preprocessing, stratified data splitting, generation of temporally structured clips, and comparative evaluation of baseline models, including a convolutional neural network. We also developed a Residual Adaptive Motion Temporal Binary Heat Network model that combines frame color characteristics, residual motion descriptions, temporal feature fusion, an early risk assessment mechanism, and interpretable localization maps. Experiments were conducted on a balanced dataset of 2000 video clips. The proposed model demonstrated the best early warning performance: a supervision rate of 0.6, an F1 score of 0.9527, and a balanced accuracy of 0.9533. With full supervision, the F1 score was 0.9342, and the area under the receiver operating characteristic curve (AUC) was 0.9871. The practical significance of the work is that the proposed approach can be used as a decision support tool for the preliminary identification of potentially dangerous video fragments with subsequent manual verification, without the assumption of autonomous use in high-risk scenarios. Full article
(This article belongs to the Special Issue Deep Learning and Explainable Artificial Intelligence (2nd Edition))
26 pages, 8883 KB  
Article
Strip Steel Defect Detection Algorithm Integrating Dynamic Convolution and Attention
by Changchun Shao, Zhijie Chen and Jianjun Meng
Electronics 2026, 15(9), 1796; https://doi.org/10.3390/electronics15091796 - 23 Apr 2026
Abstract
To address the issues of low accuracy, high false positives, and missed detections in hot-rolled strip steel surface defect inspection, this paper proposes an improved detection model named DFEM-NET based on YOLOv8n. First, an efficient feature extraction module (DSC2f) based on Dynamic Snake Convolution is designed to enhance the model’s capability in capturing features of irregular and elongated defects. Second, a Feature Pyramid Shared Convolution module (FPSC) is constructed to expand the model’s receptive field and effectively suppress interference from complex backgrounds. Third, an Enhanced Feature Correction (EFC) strategy is adopted during the feature fusion stage to help the model better learn the detailed features of small defect targets. Finally, a Multi-Scale Attention Aggregation module (MSAA) is introduced before the detection head, enabling the network to focus on critical feature information and thereby comprehensively improve detection accuracy for target defects. Experimental results demonstrate that, compared to the baseline model YOLOv8n, DFEM-NET achieves a detection accuracy (mAP@0.5) of 83.5%, representing an increase of 4.8%; a recall rate of 76.4%, an increase of 3.3%; and a precision of 84.7%, an increase of 3.1%, without a significant increase in model complexity. Furthermore, generalization experiments conducted on the GC10-DET dataset confirm that the proposed algorithm exhibits exceptional generalization capability. Full article
25 pages, 1763 KB  
Article
Self-Supervision-Enabled Compounded Multi-Modal Feature-Learning Network for Classifying Depressive States with Fine-Grained Emotions Using Wearable Sensors
by Bhavani Ravi, Ibrahim Aljubayri, Usharani Thirunavukkarasu and Mohammad Zubair Khan
Biosensors 2026, 16(5), 233; https://doi.org/10.3390/bios16050233 - 23 Apr 2026
Abstract
Depression is a prevalent mental health disorder characterized by persistent sadness, loss of interest, and impaired daily functioning. Wearable monitoring systems have emerged as promising tools for continuous mental health assessment; however, they face challenges such as data privacy concerns, misclassification risks, and limited ability to capture complex emotional states. To address these limitations, this study proposes a Self-Supervision-Enabled Compounded Multi-Modal Feature-Learning Network (S2-CFL) for depressive state classification using wearable sensor data and psychological self-reports. The framework integrates a Twin-Path Encoder–Decoder Network (TP-EDN) for extracting temporal features from raw signals and a Densely Connected Convolution Pyramidal Transformer Network (DC2-PTN) for learning spatial representations from signal-to-image transformations. A fusion mechanism combines multi-modal features to predict depressive states, valence, and arousal levels, while a Fine-Grained Emotion Classification Network (FGECN) is employed to categorize emotional states into multiple classes using supervised learning models. Experimental results demonstrate that the proposed multi-modal approach improves classification performance and provides interpretable insights into emotional and depressive patterns. Full article
(This article belongs to the Section Wearable Biosensors)
22 pages, 5140 KB  
Article
Application of Deep Multi-Scale Representation Learning Based on Eye-Tracking and Facial Expression Data in Cognitive Decline Assessment
by Yanfeng Xue, Xianpeng Luo, Shuai Guo and Tao Song
Sensors 2026, 26(9), 2600; https://doi.org/10.3390/s26092600 - 23 Apr 2026
Abstract
Digital biomarkers derived from eye-tracking and facial expression hold significant potential for the non-invasive screening of cognitive decline (CD). However, existing approaches predominantly rely on single-task or feature engineering-based unimodal methods, which struggle to capture complex temporal behavioral patterns. While deep learning (DL) excels at extracting hierarchical features and intricate temporal dynamics from behavioral sequences, its application in this specific multimodal sensing domain remains exploratory. Addressing this gap, this study designed an assessment system integrating five multi-dimensional cognitive paradigms and collected eye-tracking and facial expression data from 20 healthy controls (HC) and 20 individuals with CD. For these multimodal sequences, we propose a deep neural network capable of multi-scale representation learning. By utilizing subspace exploration and multi-scale convolutions, this architecture extracts deep representations directly from data and incorporates a decision fusion mechanism to enhance diagnostic robustness. Experimental results demonstrate that our method achieves a 90% classification accuracy, outperforming machine learning models. Furthermore, statistical analyses conducted in this study validated several features associated with CD and also explored some novel potential behavioral patterns. This study confirms the feasibility of a DL framework based on eye-tracking and facial expression signals for identifying CD, providing a reference for developing objective and efficient digital screening tools. Full article
(This article belongs to the Section Biomedical Sensors)