Search Results (42)

Search Parameters:
Keywords = split-attention block

30 pages, 3115 KB  
Article
HST–MB–CREH: A Hybrid Spatio-Temporal Transformer with Multi-Branch CNN/RNN for Rare-Event-Aware PV Power Forecasting
by Guldana Taganova, Jamalbek Tussupov, Assel Abdildayeva, Mira Kaldarova, Alfiya Kazi, Ronald Cowie Simpson, Alma Zakirova and Bakhyt Nurbekov
Algorithms 2026, 19(2), 94; https://doi.org/10.3390/a19020094 - 23 Jan 2026
Viewed by 169
Abstract
We propose the Hybrid Spatio-Temporal Transformer with Multi-Branch CNN/RNN and Extreme-Event Head (HST–MB–CREH), a hybrid spatio-temporal deep learning architecture for joint short-term photovoltaic (PV) power forecasting and the detection of rare extreme events, to support the reliable operation of renewable-rich power systems. The model combines a spatio-temporal transformer encoder with three convolutional neural network (CNN)/recurrent neural network (RNN) branches (CNN → long short-term memory (LSTM), LSTM → gated recurrent unit (GRU), CNN → GRU) and a dense pathway for tabular meteorological and calendar features. A multitask output head simultaneously performs the regression of PV power and binary classification of extremes defined above the 95th percentile. We evaluate HST–MB–CREH on the publicly available Renewable Power Generation and Weather Conditions dataset with hourly resolution from 2017 to 2022, using a 5-fold TimeSeriesSplit protocol to avoid temporal leakage and to cover multiple seasons. Compared with tree ensembles (RandomForest, XGBoost), recurrent baselines (Stacked GRU, LSTM), and advanced hybrid/transformer models (Hybrid Multi-Branch CNN–LSTM/GRU with Dense Path and Extreme-Event Head (HMB–CLED) and Spatio-Temporal Multitask Transformer with Extreme-Event Head (STM–EEH)), the proposed architecture achieves the best overall trade-off between accuracy and rare-event sensitivity, with normalized performance of RMSE_z = 0.2159 ± 0.0167, MAE_z = 0.1100 ± 0.0085, mean absolute percentage error (MAPE) = 9.17 ± 0.45%, R² = 0.9534 ± 0.0072, and AUC_ext = 0.9851 ± 0.0051 across folds. Knowledge extraction is supported via attention-based analysis and permutation feature importance, which highlight the dominant role of global horizontal irradiance, diurnal harmonics, and solar geometry features. The results indicate that hybrid spatio-temporal multitask architectures can substantially improve both the forecast accuracy and robustness to extremes, making HST–MB–CREH a promising building block for intelligent decision-support tools in smart grids with a high share of PV generation.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
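
A minimal sketch of the 5-fold TimeSeriesSplit evaluation protocol described above, assuming scikit-learn; the toy hourly data, the 95th-percentile extreme label, and all variable names are illustrative rather than the authors' code.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(8760, 12))        # one year of hourly features (toy)
y = rng.random(8760)                   # PV power target (toy)
extreme = y > np.quantile(y, 0.95)     # binary extreme-event label, 95th percentile

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Every training index precedes every test index, so folds have no temporal leakage.
    assert train_idx.max() < test_idx.min()
    print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test, "
          f"{extreme[test_idx].mean():.1%} extreme in test")
```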

22 pages, 3809 KB  
Article
Research on Remote Sensing Image Object Segmentation Using a Hybrid Multi-Attention Mechanism
by Lei Chen, Changliang Li, Yixuan Gao, Yujie Chang, Siming Jin, Zhipeng Wang, Xiaoping Ma and Limin Jia
Appl. Sci. 2026, 16(2), 695; https://doi.org/10.3390/app16020695 - 9 Jan 2026
Viewed by 240
Abstract
High-resolution remote sensing images are gradually playing an important role in land cover mapping, urban planning, and environmental monitoring tasks. However, current segmentation approaches frequently encounter challenges such as loss of detail and blurred boundaries when processing high-resolution remote sensing imagery, owing to their complex backgrounds and dense semantic content. In response to the aforementioned limitations, this study introduces HMA-UNet, a novel segmentation network built upon the UNet framework and enhanced through a hybrid attention strategy. The architecture’s innovation centers on a composite attention block, where a lightweight split fusion attention (LSFA) mechanism and a lightweight channel-spatial attention (LCSA) mechanism are synergistically integrated within a residual learning structure to replace the stacked convolutional structure in UNet, which can improve the utilization of important shallow features and eliminate redundant information interference. Comprehensive experiments on the WHDLD dataset and the DeepGlobe road extraction dataset show that our proposed method achieves effective segmentation in remote sensing images by fully utilizing shallow features and eliminating redundant information interference. The quantitative evaluation results demonstrate the performance of the proposed method across two benchmark datasets. On the WHDLD dataset, the model attains a mean accuracy, IoU, precision, and recall of 72.40%, 60.71%, 75.46%, and 72.41%, respectively. Correspondingly, on the DeepGlobe road extraction dataset, it achieves a mean accuracy of 57.87%, an mIoU of 49.82%, a mean precision of 78.18%, and a mean recall of 57.87%.
(This article belongs to the Section Computing and Artificial Intelligence)
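
The abstract does not specify the LSFA/LCSA internals, so the sketch below only illustrates the stated residual arrangement, with identity modules standing in for the two attention mechanisms; all layer choices are assumptions.

```python
import torch
import torch.nn as nn

class CompositeAttentionBlock(nn.Module):
    """Assumed layout: projection, two attention modules, residual skip."""
    def __init__(self, channels, lsfa: nn.Module, lcsa: nn.Module):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.lsfa = lsfa  # lightweight split fusion attention (placeholder)
        self.lcsa = lcsa  # lightweight channel-spatial attention (placeholder)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.norm(self.proj(x)))
        out = self.lcsa(self.lsfa(out))  # attention refines the projected features
        return x + out                   # residual learning structure

block = CompositeAttentionBlock(64, nn.Identity(), nn.Identity())
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```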

27 pages, 4932 KB  
Article
Automated Facial Pain Assessment Using Dual-Attention CNN with Clinically Calibrated High-Reliability and Reproducibility Framework
by Albert Patrick Sankoh, Ali Raza, Khadija Parwez, Wesam Shishah, Ayman Alharbi, Mubeen Javed and Muhammad Bilal
Biomimetics 2026, 11(1), 51; https://doi.org/10.3390/biomimetics11010051 - 8 Jan 2026
Viewed by 414
Abstract
Accurate and quantitative pain assessment remains a major challenge in clinical medicine, especially for patients unable to verbalize discomfort. Conventional methods based on self-reports or clinician observation are subjective and inconsistent. This study introduces a novel automated facial pain assessment framework built on a dual-attention convolutional neural network (CNN) that achieves clinically calibrated, high-reliability performance and interpretability. The architecture combines multi-head spatial attention to localize pain-relevant facial regions with an enhanced channel attention block employing triple-pooling (average, max, and standard deviation) to capture discriminative intensity features. Regularization through label smoothing (α = 0.1) and AdamW optimization ensures calibrated, stable convergence. Evaluated on a clinically annotated dataset using subject-wise stratified sampling, the proposed model achieved a test accuracy of 90.19% ± 0.94%, with an average 5-fold cross-validation accuracy of 83.60% ± 1.55%. The model further attained an F1-score of 0.90 and Cohen’s κ = 0.876, with macro- and micro-AUCs of 0.991 and 0.992, respectively. The evaluation covers five pain classes (No Pain, Mid Pain, Moderate Pain, Severe Pain, and Very Pain) using subject-wise splits comprising 5840 total images and 1160 test samples. Comparative benchmarking and ablation experiments confirm each module’s contribution, while Grad-CAM visualizations highlight physiologically relevant facial regions. The results demonstrate a robust, explainable, and reproducible framework suitable for integration into real-world automated pain-monitoring systems. Inspired by biological pain perception mechanisms and human facial muscle responses, the proposed framework aligns with biomimetic sensing principles by emulating how localized facial cues contribute to pain interpretation.
(This article belongs to the Special Issue Artificial Intelligence (AI) in Biomedical Engineering: 2nd Edition)
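
The triple-pooling channel attention is concrete enough to sketch: only the three pooling statistics (average, max, standard deviation) come from the abstract, while the shared MLP, reduction ratio, and fusion by summation below are assumptions.

```python
import torch
import torch.nn as nn

class TriplePoolChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(          # shared bottleneck MLP (assumed design)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                  # x: (B, C, H, W)
        b, c = x.shape[:2]
        flat = x.flatten(2)                # (B, C, H*W)
        avg, mx, std = flat.mean(2), flat.amax(2), flat.std(2)
        # One descriptor per pooling statistic, fused by summation (assumed).
        w = torch.sigmoid(self.mlp(avg) + self.mlp(mx) + self.mlp(std))
        return x * w.view(b, c, 1, 1)

att = TriplePoolChannelAttention(32)
print(att(torch.randn(2, 32, 16, 16)).shape)  # torch.Size([2, 32, 16, 16])
```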

18 pages, 2327 KB  
Article
Preliminary Study on Synergistic Effects of Humic Acid and Seaweed Extract on Cereal Crop Yield and Competitiveness with Wild Weed Beets (Beta vulgaris L.)
by Zainulabdeen Kh. Al-Musawi, Husam S. M. Khalaf, Ali A. Hassouni, Rusul R. Shakir, Viktória Vona and István Mihály Kulmány
Plants 2025, 14(24), 3770; https://doi.org/10.3390/plants14243770 - 11 Dec 2025
Viewed by 554
Abstract
Crop–weed competition markedly reduces cereal yield. Integrative weed management approaches, involving the use of humic acid (HA) and seaweed extract (SWE), have gained attention as herbicide efficacy declines and environmental concerns grow. However, potential synergistic effects between HA and SWE have not yet been investigated. We evaluated the effects of HA, SWE, and their combination (HA+SWE) on the growth, yield, and competitive ability of cereals against wild weed beets (Beta vulgaris L.). A single-season field experiment was conducted using a split-plot design within a randomised complete block to assess the effects of treatment amendments on wheat, barley, and oats. The results showed that HA and HA+SWE organic amendments consistently improved grain yield and biomass across crop species. SWE responses varied across species, indicating species-dependent sensitivity. In addition, HA enhanced barley weed suppression, highlighting its dual roles in improving crop vigour and reducing weed proliferation. In contrast, SWE modestly increased spike length in oats, emphasising its effect on crop growth characteristics. Overall, these preliminary findings support targeted biostimulant use to enhance cereal yield and integrate weed management into sustainable cropping systems.

20 pages, 6447 KB  
Article
ASPCCNet: A Lightweight Pavement Crack Classification Network Based on Augmented ShuffleNet
by Gui Yu, Xuan Zuo, Xinyi Wang, Shiyu Chen and Shuangxi Gao
Symmetry 2025, 17(12), 2095; https://doi.org/10.3390/sym17122095 - 6 Dec 2025
Viewed by 269
Abstract
Pavement cracks are a critical indicator for assessing structural health and forecasting deterioration trends. Accurate and automated crack classification is of paramount importance for the intelligent maintenance of road structures. Inspired by the principles of symmetry—which often lead to robust and efficient structures in both nature and engineering—this paper proposes ASPCCNet, a lightweight network that embeds these principles into its core design. The network centers on a novel building block, AugShuffleBlock, which embodies a symmetry-informed design through the integration of Partial Convolution (PConv), a tunable channel splitting mechanism (AugShuffle), and the Channel Prior Convolutional Attention (CPCA). This design achieves efficient feature extraction and fusion with minimal computational overhead. Experimental results on the public RCCD dataset demonstrate that ASPCCNet significantly outperforms mainstream lightweight models, achieving an F1-score of 0.816, which is 6.4% to 10.9% higher than other mainstream models, with only 0.294 M parameters and 48.68 MFLOPs. This work showcases how a symmetry-guided design philosophy can be leveraged to achieve a superior balance between accuracy and efficiency for real-time edge deployment.
(This article belongs to the Section Engineering and Materials)
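
Partial Convolution (PConv) applies a dense convolution to only a slice of the channels and forwards the remainder untouched, which is where the FLOP savings come from; the 1/4 split ratio in this sketch is an assumption, not a figure from the paper.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    def __init__(self, channels, ratio=0.25):  # fraction of channels convolved (assumed)
        super().__init__()
        self.conv_ch = int(channels * ratio)
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, padding=1)

    def forward(self, x):
        head, tail = x[:, :self.conv_ch], x[:, self.conv_ch:]
        return torch.cat([self.conv(head), tail], dim=1)  # the rest passes through

print(PConv(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```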

31 pages, 7049 KB  
Article
Objective Emotion Assessment Using a Triple Attention Network for an EEG-Based Brain–Computer Interface
by Lihua Zhang, Xin Zhang, Xiu Zhang, Changyi Yu and Xuguang Liu
Brain Sci. 2025, 15(11), 1167; https://doi.org/10.3390/brainsci15111167 - 29 Oct 2025
Cited by 1 | Viewed by 831
Abstract
Background: The assessment of emotion recognition holds growing significance in research on the brain–computer interface and human–computer interaction. Among diverse physiological signals, electroencephalography (EEG) occupies a pivotal position in affective computing due to its exceptional temporal resolution and non-invasive acquisition. However, EEG signals are inherently complex, characterized by substantial noise contamination and high variability, posing considerable challenges to accurate assessment. Methods: To tackle these challenges, we propose a Triple Attention Network (TANet), a triple-attention EEG emotion recognition framework that integrates Conformer, Convolutional Block Attention Module (CBAM), and Mutual Cross-Modal Attention (MCA). The Conformer component captures temporal feature dependencies, CBAM refines spatial channel representations, and MCA performs cross-modal fusion of differential entropy and power spectral density features. Results: We evaluated TANet on two benchmark EEG emotion datasets, DEAP and SEED. On SEED, using a subject-specific cross-validation protocol, the model reached an average accuracy of 98.51 ± 1.40%. On DEAP, we deliberately adopted a segment-level splitting paradigm—in line with influential state-of-the-art methods—to ensure a direct and fair comparison of model architecture under an identical evaluation protocol. This approach, designed specifically to assess fine-grained within-trial pattern discrimination rather than cross-subject generalization, yielded accuracies of 99.69 ± 0.15% and 99.67 ± 0.13% for the valence and arousal dimensions, respectively. Compared with existing benchmark approaches under similar evaluation protocols, TANet delivers substantially better results, underscoring the strong complementary effects of its attention mechanisms in improving EEG-based emotion recognition performance. Conclusions: This work provides both theoretical insights into multi-dimensional attention for physiological signal processing and practical guidance for developing high-performance, robust EEG emotion assessment systems.
(This article belongs to the Section Neurotechnology and Neuroimaging)
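
CBAM itself is a published module (Woo et al., 2018): channel attention built from average- and max-pooled descriptors, followed by a 7 × 7 spatial attention map. Below is a compact sketch with the common default hyperparameters, independent of TANet's exact configuration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c = x.shape[:2]
        flat = x.flatten(2)
        # Channel attention: shared MLP over average- and max-pooled descriptors.
        ca = torch.sigmoid(self.mlp(flat.mean(2)) + self.mlp(flat.amax(2)))
        x = x * ca.view(b, c, 1, 1)
        # Spatial attention: convolution over channel-wise mean and max maps.
        sa = torch.sigmoid(self.spatial(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

print(CBAM(64)(torch.randn(1, 64, 8, 8)).shape)  # torch.Size([1, 64, 8, 8])
```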

26 pages, 435 KB  
Review
Pest Detection in Edible Crops at the Edge: An Implementation-Focused Review of Vision, Spectroscopy, and Sensors
by Dennys Jhon Báez-Sánchez, Julio Montesdeoca, Brayan Saldarriaga-Mesa, Gaston Gaspoz, Santiago Tosetti and Flavio Capraro
Sensors 2025, 25(21), 6620; https://doi.org/10.3390/s25216620 - 28 Oct 2025
Viewed by 1154
Abstract
Early pest detection in edible crops demands sensing solutions that can run at the edge under tight power, budget, and maintenance constraints. This review synthesizes peer-reviewed work (2015–2025) on three modality families—vision/AI, spectroscopy/imaging spectroscopy, and indirect sensors—restricted to edible crops and studies reporting some implementation or testing (n = 178; IEEE Xplore and Scopus). Each article was scored with a modality-aware performance–cost–implementability (PCI) rubric using category-specific weights, and the inter-reviewer reliability was quantified with weighted Cohen’s κ. We translated the evidence into compact decision maps for common deployment profiles (low-power rapid rollout; high-accuracy cost-flexible; and block-scale scouting). Across the corpus, vision/AI and well-engineered sensor systems more often reached deployment-leaning PCI (≥3.5: 32.0% and 33.3%, respectively) than spectroscopy (18.2%); the median PCI was 3.20 (AI), 3.17 (sensors), and 2.60 (spectroscopy). A Pareto analysis highlighted detector/attention models near (P,C,I) ≈ (4,5,4); sensor nodes spanning balanced (4,4,4) and ultra-lean (2,5,4) trade-offs; and the spectroscopy split between the early-warning strength (5,4,3) and portability (4,3,4). The inter-rater agreement was substantial for sensors and spectroscopy (pooled quadratic κ = 0.73–0.83; up to 0.93 by dimension) and modest for imaging/AI (PA vs. Author 2: κ_quadratic = 0.30–0.44), supporting rubric stability with adjacency-dominated disagreements. The decision maps operationalize these findings, helping practitioners select a fit-for-purpose modality and encouraging a minimum PCI metadata set to enable reproducible, deployment-oriented comparisons.
(This article belongs to the Section Smart Agriculture)
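
The quadratic-weighted Cohen's κ used to quantify inter-reviewer reliability can be computed directly with scikit-learn; the two rating vectors below are invented for illustration and are not the review's data.

```python
from sklearn.metrics import cohen_kappa_score

reviewer_a = [4, 3, 5, 2, 4, 3, 3, 5]   # hypothetical PCI scores (1-5 scale)
reviewer_b = [4, 3, 4, 2, 5, 3, 2, 5]
kappa = cohen_kappa_score(reviewer_a, reviewer_b, weights="quadratic")
print(f"quadratic-weighted kappa = {kappa:.2f}")
```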

26 pages, 1882 KB  
Article
TAT-SARNet: A Transformer-Attentive Two-Stream Soccer Action Recognition Network with Multi-Dimensional Feature Fusion and Hierarchical Temporal Classification
by Abdulrahman Alqarafi and Bassam Almogadwy
Mathematics 2025, 13(18), 3011; https://doi.org/10.3390/math13183011 - 17 Sep 2025
Viewed by 1127
Abstract
(1) Background: Soccer action recognition (SAR) is essential in modern sports analytics, supporting automated performance evaluation, tactical strategy analysis, and detailed player behavior modeling. Although recent advances in deep learning and computer vision have enhanced SAR capabilities, many existing methods remain limited to coarse-grained classifications, grouping actions into broad categories such as attacking, defending, or goalkeeping. These models often fall short in capturing fine-grained distinctions, contextual nuances, and long-range temporal dependencies. Transformer-based approaches offer potential improvements but are typically constrained by the need for large-scale datasets and high computational demands, limiting their practical applicability. Moreover, current SAR systems frequently encounter difficulties in handling occlusions, background clutter, and variable camera angles, which contribute to misclassifications and reduced accuracy. (2) Methods: To overcome these challenges, we propose TAT-SARNet, a structured framework designed for accurate and fine-grained SAR. The model begins by applying Sparse Dilated Attention (SDA) to emphasize relevant spatial dependencies while mitigating background noise. Refined spatial features are then processed through the Split-Stream Feature Processing Module (SSFPM), which separately extracts appearance-based (RGB) and motion-based (optical flow) features using ResNet and 3D CNNs. These features are temporally refined by the Multi-Granular Temporal Processing (MGTP) module, which integrates ResIncept Patch Consolidation (RIPC) and Progressive Scale Construction Module (PSCM) to capture both short- and long-range temporal patterns. The output is then fused via the Context-Guided Dual Transformer (CGDT), which models spatiotemporal interactions through a Bi-Transformer Connector (BTC) and Channel–Spatial Attention Block (CSAB); (3) Results: Finally, the Cascaded Temporal Classification (CTC) module maps these features to fine-grained action categories, enabling robust recognition even under challenging conditions such as occlusions and rapid movements. (4) Conclusions: This end-to-end architecture ensures high precision in complex real-world soccer scenarios.
(This article belongs to the Special Issue Artificial Intelligence: Deep Learning and Computer Vision)
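
A bare-bones sketch of the two-stream idea behind the SSFPM: an appearance (RGB) branch and a motion (optical-flow) branch encoded separately and then fused. The tiny encoders below are placeholders for the ResNet and 3D-CNN backbones named in the abstract.

```python
import torch
import torch.nn as nn

rgb_branch = nn.Sequential(   # stand-in for the ResNet appearance stream
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
flow_branch = nn.Sequential(  # stand-in for the 3D-CNN motion stream
    nn.Conv3d(2, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool3d(1))

rgb = torch.randn(1, 3, 64, 64)       # one RGB frame
flow = torch.randn(1, 2, 8, 64, 64)   # 8 stacked optical-flow fields (u, v)
fused = torch.cat([rgb_branch(rgb).flatten(1), flow_branch(flow).flatten(1)], dim=1)
print(fused.shape)                    # torch.Size([1, 32])
```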

26 pages, 23082 KB  
Article
SPyramidLightNet: A Lightweight Shared Pyramid Network for Efficient Underwater Debris Detection
by Yi Luo and Osama Eljamal
Appl. Sci. 2025, 15(17), 9404; https://doi.org/10.3390/app15179404 - 27 Aug 2025
Cited by 1 | Viewed by 871
Abstract
Underwater debris detection plays a crucial role in marine environmental protection. However, existing object detection algorithms generally suffer from excessive model complexity and insufficient detection accuracy, making it difficult to meet the real-time detection requirements in resource-constrained underwater environments. To address this challenge, this paper proposes a novel lightweight object detection network named the Shared Pyramid Lightweight Network (SPyramidLightNet). The network adopts an improved architecture based on YOLOv11 and achieves an optimal balance between detection performance and computational efficiency by integrating three core innovative modules. First, the Split–Merge Attention Block (SMAB) employs a dynamic kernel selection mechanism and split–merge strategy, significantly enhancing feature representation capability through adaptive multi-scale feature fusion. Second, the C3 GroupNorm Detection Head (C3GNHead) introduces a shared convolution mechanism and GroupNorm normalization strategy, substantially reducing the computational complexity of the detection head while maintaining detection accuracy. Finally, the Shared Pyramid Convolution (SPyramidConv) replaces traditional pooling operations with a parameter-sharing multi-dilation-rate convolution architecture, achieving more refined and efficient multi-scale feature aggregation. Extensive experiments on underwater debris datasets demonstrate that SPyramidLightNet achieves 0.416 on the mAP@0.5:0.95 metric, significantly outperforming mainstream algorithms including Faster-RCNN, SSD, RT-DETR, and the YOLO series. Meanwhile, compared to the baseline YOLOv11, the proposed algorithm achieves an 11.8% parameter compression and a 17.5% computational complexity reduction, with an inference speed reaching 384 FPS, meeting the stringent requirements for real-time detection. Ablation experiments and visualization analyses further validate the effectiveness and synergistic effects of each core module. This research provides important theoretical guidance for the design of lightweight object detection algorithms and lays a solid foundation for the development of automated underwater debris recognition and removal technologies.
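
The parameter-sharing idea behind SPyramidConv can be sketched directly: a single kernel is reused at several dilation rates and the resulting maps are aggregated. The dilation set and the averaging below are assumptions about that aggregation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPyramidConv(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 3)):  # dilation set assumed
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.02)
        self.dilations = dilations

    def forward(self, x):
        # The same 3x3 kernel at every rate; padding=d keeps the spatial size fixed.
        outs = [F.conv2d(x, self.weight, padding=d, dilation=d) for d in self.dilations]
        return torch.stack(outs).mean(0)  # aggregation by averaging (assumed)

print(SharedPyramidConv(16)(torch.randn(1, 16, 32, 32)).shape)  # (1, 16, 32, 32)
```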

22 pages, 6194 KB  
Article
KidneyNeXt: A Lightweight Convolutional Neural Network for Multi-Class Renal Tumor Classification in Computed Tomography Imaging
by Gulay Maçin, Fatih Genç, Burak Taşcı, Sengul Dogan and Turker Tuncer
J. Clin. Med. 2025, 14(14), 4929; https://doi.org/10.3390/jcm14144929 - 11 Jul 2025
Cited by 5 | Viewed by 2570
Abstract
Background: Renal tumors, encompassing benign, malignant, and normal variants, represent a significant diagnostic challenge in radiology due to their overlapping visual characteristics on computed tomography (CT) scans. Manual interpretation is time-consuming and susceptible to inter-observer variability, emphasizing the need for automated, reliable classification systems to support early and accurate diagnosis. Methods and Materials: We propose KidneyNeXt, a custom convolutional neural network (CNN) architecture designed for the multi-class classification of renal tumors using CT imaging. The model integrates multi-branch convolutional pathways, grouped convolutions, and hierarchical feature extraction blocks to enhance representational capacity. Transfer learning with ImageNet-1K pretraining and fine-tuning was employed to improve generalization across diverse datasets. Performance was evaluated on three CT datasets: a clinically curated retrospective dataset (3199 images), the Kaggle CT KIDNEY dataset (12,446 images), and the KAUH: Jordan dataset (7770 images). All images were preprocessed to 224 × 224 resolution without data augmentation and split into training, validation, and test subsets. Results: Across all datasets, KidneyNeXt demonstrated outstanding classification performance. On the clinical dataset, the model achieved 99.76% accuracy and a macro-averaged F1 score of 99.71%. On the Kaggle CT KIDNEY dataset, it reached 99.96% accuracy and a 99.94% F1 score. Finally, evaluation on the KAUH dataset yielded 99.74% accuracy and a 99.72% F1 score. The model showed strong robustness against class imbalance and inter-class similarity, with minimal misclassification rates and stable learning dynamics throughout training. Conclusions: The KidneyNeXt architecture offers a lightweight yet highly effective solution for the classification of renal tumors from CT images. Its consistently high performance across multiple datasets highlights its potential for real-world clinical deployment as a reliable decision support tool. Future work may explore the integration of clinical metadata and multimodal imaging to further enhance diagnostic precision and interpretability. Additionally, interpretability was addressed using Grad-CAM visualizations, which provided class-specific attention maps to highlight the regions contributing to the model’s predictions.
(This article belongs to the Special Issue Artificial Intelligence and Deep Learning in Medical Imaging)
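
Grouped convolution, one of the ingredients named above, reduces parameters roughly by the group count because each group of filters sees only its own slice of the input channels; the widths here are illustrative, not KidneyNeXt's.

```python
import torch
import torch.nn as nn

dense = nn.Conv2d(64, 64, 3, padding=1)
grouped = nn.Conv2d(64, 64, 3, padding=1, groups=4)  # 4 independent filter groups
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(grouped))  # 36928 vs. 9280: roughly a 4x weight reduction
print(grouped(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```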

21 pages, 9172 KB  
Article
Spike-Driven Channel-Temporal Attention Network with Multi-Scale Convolution for Energy-Efficient Bearing Fault Detection
by JinGyo Lim and Seong-Eun Kim
Appl. Sci. 2025, 15(13), 7622; https://doi.org/10.3390/app15137622 - 7 Jul 2025
Viewed by 1034
Abstract
Real-time bearing fault diagnosis necessitates highly accurate, computationally efficient, and energy-conserving models suitable for deployment on resource-constrained edge devices. To address these demanding requirements, we propose the Spike Convolutional Attention Network (SpikeCAN), a novel spike-driven neural architecture tailored explicitly for real-time industrial diagnostics. SpikeCAN utilizes the inherent sparsity and event-driven processing capabilities of spiking neural networks (SNNs), significantly minimizing both computational load and power consumption. The SpikeCAN integrates a multi-dilated receptive field (MDRF) block and a convolution-based spike attention module. The MDRF module effectively captures extensive temporal dependencies from signals across various scales. Simultaneously, the spike-based attention mechanism dynamically extracts spatial-temporal patterns, substantially improving diagnostic accuracy and reliability. We validate SpikeCAN on two public bearing fault datasets: the Case Western Reserve University (CWRU) and the Society for Machinery Failure Prevention Technology (MFPT). The proposed model achieves 99.86% accuracy on the four-class CWRU dataset through five-fold cross-validation and 99.88% accuracy with a conventional 70:30 train–test random split. For the more challenging ten-class classification task on the same dataset, it achieves 97.80% accuracy under five-fold cross-validation. Furthermore, SpikeCAN attains a state-of-the-art accuracy of 96.31% on the fifteen-class MFPT dataset, surpassing existing benchmarks. These findings underscore a significant advancement in fault diagnosis technology, demonstrating the considerable practical potential of spike-driven neural networks in real-time, energy-efficient industrial diagnostic applications.
(This article belongs to the Section Computing and Artificial Intelligence)
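
A toy sketch of the event-driven ingredient in SpikeCAN: a leaky integrate-and-fire (LIF) neuron emitting sparse binary spikes. The decay constant, threshold, and soft reset are illustrative choices; the paper's neuron model may differ.

```python
import torch

def lif_forward(inputs, beta=0.9, threshold=1.0):
    """inputs: (T, B, N) time-major input currents -> (T, B, N) binary spikes."""
    mem = torch.zeros_like(inputs[0])
    spikes = []
    for x_t in inputs:
        mem = beta * mem + x_t            # leaky integration of input current
        spk = (mem >= threshold).float()  # fire when the membrane crosses threshold
        mem = mem - spk * threshold       # soft reset by subtraction
        spikes.append(spk)
    return torch.stack(spikes)

out = lif_forward(torch.rand(20, 4, 8))
print(f"spike rate: {out.mean().item():.2f}")  # sparsity is what saves energy
```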

35 pages, 16759 KB  
Article
A Commodity Recognition Model Under Multi-Size Lifting and Lowering Sampling
by Mengyuan Chen, Song Chen, Kai Xie, Bisheng Wu, Ziyu Qiu, Haofei Xu and Jianbiao He
Electronics 2025, 14(11), 2274; https://doi.org/10.3390/electronics14112274 - 2 Jun 2025
Cited by 1 | Viewed by 1014
Abstract
Object detection algorithms have evolved from two-stage to single-stage architectures, with foundation models achieving sustained improvements in accuracy. However, in intelligent retail scenarios, small object detection and occlusion issues still lead to significant performance degradation. To address these challenges, this paper proposes an improved model based on YOLOv11, focusing on resolving insufficient multi-scale feature coupling and occlusion sensitivity. First, a multi-scale feature extraction network (MFENet) is designed. It splits input feature maps into dual branches along the channel dimension: the upper branch performs local detail extraction and global semantic enhancement through secondary partitioning, while the lower branch integrates CARAFE (content-aware reassembly of features) upsampling and SENet (squeeze-and-excitation network) channel weight matrices to achieve adaptive feature enhancement. The three feature streams are fused to output multi-scale feature maps, significantly improving small object detail retention. Second, a convolutional block attention module (CBAM) is introduced during feature fusion, dynamically focusing on critical regions through channel–spatial dual attention mechanisms. A fuseModule is designed to aggregate multi-level features, enhancing contextual modeling for occluded objects. Additionally, the extreme-IoU (XIoU) loss function replaces the traditional complete-IoU (CIoU), combined with XIoU-NMS (extreme-IoU non-maximum suppression) to suppress redundant detections, optimizing convergence speed and localization accuracy. Experiments demonstrate that the improved model achieves a mean average precision (mAP50) of 0.997 (0.2% improvement) and mAP50-95 of 0.895 (3.5% improvement) on the RPC product dataset and the 6th Product Recognition Challenge dataset. The recall rate increases to 0.996 (0.6% improvement over baseline). Although frames per second (FPS) decreased compared to the original model, the improved model still meets real-time requirements for retail scenarios. The model exhibits stable noise resistance in challenging environments and achieves 84% mAP in cross-dataset testing, validating its generalization capability and engineering applicability. Video streams were captured using a Zhongweiaoke camera operating at 60 fps, satisfying real-time detection requirements for intelligent retail applications.
(This article belongs to the Special Issue Emerging Technologies in Computational Intelligence)
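
The SENet channel weighting used in the lower branch is a published block (Hu et al., 2018): global average pooling followed by a bottleneck MLP that rescales each channel. A sketch with the usual default reduction ratio, which the paper may well tune differently:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                  # x: (B, C, H, W)
        b, c = x.shape[:2]
        w = self.fc(x.flatten(2).mean(2))  # squeeze: global average pool per channel
        return x * w.view(b, c, 1, 1)      # excite: per-channel rescaling

print(SEBlock(64)(torch.randn(1, 64, 14, 14)).shape)  # torch.Size([1, 64, 14, 14])
```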

17 pages, 11008 KB  
Article
Retinex-Based Low-Light Image Enhancement via Spatial-Channel Redundancy Compression and Joint Attention
by Jinlong Chen, Zhigang Xiao, Xingguo Qin and Deming Luo
Electronics 2025, 14(11), 2212; https://doi.org/10.3390/electronics14112212 - 29 May 2025
Viewed by 2039
Abstract
Low-light image enhancement (LLIE) methods based on Retinex theory often involve complex, multi-stage training and are commonly built on convolutional neural networks (CNNs). However, CNNs suffer from limitations in capturing long-range dependencies and often introduce redundant computations, leading to high computational costs. To address these issues, we propose a lightweight and efficient LLIE framework that incorporates an optimized CNN compression strategy and a novel attention mechanism. Specifically, we design a Spatial-Channel Feature Reconstruction Module (SCFRM) to suppress spatial and channel redundancy via split-reconstruction and separation-fusion strategies. SCFRM is composed of two parts, a Spatial Feature Enhancement Unit (SFEU) and a Channel Refinement Block (CRB), which together enhance feature representation while reducing computational load. Additionally, we introduce a Joint Attention (JOA) mechanism that captures long-range dependencies across spatial dimensions while preserving positional accuracy. Our Retinex-based framework separates the processing of illumination and reflectance components using a Denoising Network (DNNet) and a Light Enhancement Network (LINet). SCFRM is embedded into DNNet for improved denoising, while JOA is applied in LINet for precise brightness adjustment. Extensive experiments on multiple benchmark datasets demonstrate that our method achieves superior or comparable performance to state-of-the-art LLIE approaches, while significantly reducing computational complexity. On the LOL and VE-LOL datasets, our approach achieves the best or second-best scores in terms of PSNR and SSIM metrics, validating its effectiveness and efficiency.
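
A worked sketch of the Retinex decomposition the framework builds on: an image S is modeled as reflectance R times illumination L, so the two factors separate additively in the log domain. The Gaussian-blur illumination estimate below is a classical stand-in for the learned DNNet/LINet split, not the paper's method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

s = np.random.rand(64, 64).astype(np.float32) + 1e-3  # toy low-light image, S > 0
log_s = np.log(s)
log_l = gaussian_filter(log_s, sigma=8)  # smooth illumination estimate (classical)
log_r = log_s - log_l                    # reflectance falls out by subtraction
r, l = np.exp(log_r), np.exp(log_l)
print(np.allclose(r * l, s, atol=1e-5))  # True: S = R * L by construction
```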

16 pages, 1659 KB  
Article
DualPose: Dual-Block Transformer Decoder with Contrastive Denoising for Multi-Person Pose Estimation
by Matteo Fincato and Roberto Vezzani
Sensors 2025, 25(10), 2997; https://doi.org/10.3390/s25102997 - 9 May 2025
Viewed by 1370
Abstract
Multi-person pose estimation is the task of detecting and regressing the keypoint coordinates of multiple people in a single image. Significant progress has been achieved in recent years, especially with the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation by leveraging a dual-block transformer decoding architecture. Class prediction and keypoint estimation are split into parallel blocks so each sub-task can be separately improved and the risk of interference is reduced. This architecture improves the precision of keypoint localization and the model’s capacity to accurately classify individuals. To improve model performance, the Keypoint-Block uses parallel processing of self-attentions, providing a novel strategy that improves keypoint localization accuracy and precision. Additionally, DualPose incorporates a contrastive denoising (CDN) mechanism, leveraging positive and negative samples to stabilize training and improve robustness. Thanks to CDN, a variety of training samples are created by introducing controlled noise into the ground truth, improving the model’s ability to discern between valid and incorrect keypoints. DualPose achieves state-of-the-art results outperforming recent end-to-end methods, as shown by extensive experiments on the MS COCO and CrowdPose datasets. The code and pretrained models are publicly available.
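
The contrastive denoising (CDN) idea can be sketched as query generation: ground-truth keypoints jittered with small noise become positive queries, while heavily jittered copies become negatives the decoder must learn to reject. The noise scales here are illustrative, not the paper's hyperparameters.

```python
import torch

def make_cdn_queries(gt_kpts, pos_scale=0.02, neg_scale=0.2):
    """gt_kpts: (N, K, 2) normalized keypoints -> (positive, negative) queries."""
    positives = gt_kpts + pos_scale * torch.randn_like(gt_kpts)  # mild jitter
    negatives = gt_kpts + neg_scale * torch.randn_like(gt_kpts)  # heavy jitter
    return positives.clamp(0, 1), negatives.clamp(0, 1)

pos, neg = make_cdn_queries(torch.rand(3, 17, 2))  # 3 people, 17 COCO keypoints
print(pos.shape, neg.shape)  # torch.Size([3, 17, 2]) twice
```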

23 pages, 3884 KB  
Article
Cascaded Feature Fusion Grasping Network for Real-Time Robotic Systems
by Hao Li and Lixin Zheng
Sensors 2024, 24(24), 7958; https://doi.org/10.3390/s24247958 - 13 Dec 2024
Cited by 2 | Viewed by 1675
Abstract
Grasping objects of irregular shapes and various sizes remains a key challenge in the field of robotic grasping. This paper proposes a novel RGB-D data-based grasping pose prediction network, termed Cascaded Feature Fusion Grasping Network (CFFGN), designed for high-efficiency, lightweight, and rapid grasping pose estimation. The network employs innovative structural designs, including depth-wise separable convolutions to reduce parameters and enhance computational efficiency; convolutional block attention modules to augment the model’s ability to focus on key features; multi-scale dilated convolution to expand the receptive field and capture multi-scale information; and bidirectional feature pyramid modules to achieve effective fusion and information flow of features at different levels. In tests on the Cornell dataset, our network achieved grasping pose prediction at a speed of 66.7 frames per second, with accuracy rates of 98.6% and 96.9% for image-wise and object-wise splits, respectively. The experimental results show that our method achieves high-speed processing while maintaining high accuracy. In real-world robotic grasping experiments, our method also proved to be effective, achieving an average grasping success rate of 95.6% on a robot equipped with parallel grippers.
(This article belongs to the Section Sensors and Robotics)
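
Depth-wise separable convolution, the first parameter-saving device listed, factors a dense convolution into a per-channel spatial filter and a 1 × 1 channel mixer; the widths below are illustrative, not CFFGN's.

```python
import torch
import torch.nn as nn

dense = nn.Conv2d(64, 128, 3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1, groups=64),  # depthwise: one filter per channel
    nn.Conv2d(64, 128, 1),                       # pointwise: 1x1 channel mixing
)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(separable))  # 73856 vs. 8960 parameters
print(separable(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```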