Search Results (126)

Search Parameters:
Keywords = top-down neural attention

20 pages, 1205 KB  
Article
A Hybrid CNN–LSTM–Attention Mechanism Model for Anomaly Detection in Lithium-Ion Batteries of Electric Bicycles
by Zhaoyang Sun, Weiming Ye, Yuxin Mao and Yuan Sui
Batteries 2025, 11(10), 384; https://doi.org/10.3390/batteries11100384 - 20 Oct 2025
Abstract
To improve the accuracy and stability of anomaly detection in lithium-ion batteries for electric bicycles, in this study, we propose a hybrid deep learning model that integrates a convolutional neural network (CNN), long short-term memory (LSTM) network, and attention mechanism to extract local temporal features, capture long-term dependencies, and adaptively focus on key time segments around anomaly occurrences, respectively, thereby achieving a balance between local and global feature modeling. In terms of data preprocessing, separate feature sets are constructed for charging and discharging conditions, and sliding windows combined with min–max normalization are applied to generate model inputs. The model was trained and validated on large-scale real-world battery operation data. The experimental results demonstrate that the proposed method achieves high detection accuracy and robustness in terms of reconstruction error distribution, alarm rate stability, and Top-K anomaly consistency. The method can effectively identify various types of abnormal operating conditions in unlabeled datasets based on unsupervised learning. This study provides a transferable deep learning solution for enhancing the safety monitoring of electric bicycle batteries.
(This article belongs to the Special Issue State-of-Health Estimation of Batteries)
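
The abstract names the pipeline stages (sliding windows with min–max normalization, a CNN for local temporal features, an LSTM for long-term dependencies, and attention over key time segments, scored by reconstruction error) but not their implementation. The PyTorch sketch below illustrates that pattern; the layer sizes, window parameters, and the way attention reweights the per-step reconstruction error are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def make_windows(x, win=64, step=16):
    """Min-max normalize a (T, F) series, then cut (N, win, F) windows."""
    x = (x - x.min(0).values) / (x.max(0).values - x.min(0).values + 1e-8)
    return x.unfold(0, win, step).permute(0, 2, 1)

class CNNLSTMAttention(nn.Module):
    def __init__(self, n_feat, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_feat, hidden, 5, padding=2),
                                 nn.ReLU())                    # local temporal features
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)  # long-term dependencies
        self.score = nn.Linear(hidden, 1)                      # per-step attention logit
        self.head = nn.Linear(hidden, n_feat)                  # per-step reconstruction

    def forward(self, x):                                 # x: (B, win, F)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)   # (B, win, hidden)
        h, _ = self.lstm(h)
        w = torch.softmax(self.score(h), dim=1)           # key-segment weights
        recon = self.head(h)
        err = ((recon - x) ** 2).mean(-1, keepdim=True)   # (B, win, 1)
        return recon, (w * err).sum(1).squeeze(1)         # attention-weighted score

windows = make_windows(torch.randn(2000, 6))              # 6 mock charging features
recon, score = CNNLSTMAttention(n_feat=6)(windows)
```

Windows whose weighted reconstruction error exceeds a threshold fit on normal data (for instance, a high quantile of training scores) would then be flagged as anomalous; the threshold choice here is ours.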

14 pages, 3320 KB  
Article
SFD-YOLO: A Multi-Angle Scattered Field-Based Optical Surface Defect Recognition Method
by Xuan Liu, Hao Sun, Jian Zhang and Chunyan Wang
Photonics 2025, 12(9), 929; https://doi.org/10.3390/photonics12090929 - 18 Sep 2025
Viewed by 612
Abstract
The surface quality of optical components plays a decisive role in advanced imaging, precision manufacturing, and high-power laser systems, where even minor defects can induce abnormal scattering and degrade system performance. Addressing the limitations of conventional single-view inspection methods, this study presents a panoramic multi-angle scattered light field acquisition approach integrated with deep learning-based recognition. A hemispherical synchronous imaging system is designed to capture complete scattered distributions from surface defects in a single exposure, ensuring both structural consistency and angular completeness of the measured data. To enhance the interpretation of complex scattering patterns, we develop a tailored lightweight network, SFD-YOLO, which incorporates the PSimam attention module for improved salient feature extraction and the Efficient_Mamba_CSP module for robust global semantic modeling. Using a simulated dataset of multi-width scratch defects, the proposed method achieves high classification accuracy with strong generalization and computational efficiency. Compared to the baseline YOLOv11-cls, SFD-YOLO improves Top-1 accuracy from 92.5% to 95.6%, while reducing the parameter count from 1.54 M to 1.25 M and maintaining low computational cost (FLOPs: 4.0 G). These results confirm that panoramic multi-angle scattered imaging, coupled with advanced neural architectures, provides a powerful and practical framework for optical surface defect detection, offering valuable prospects for high-precision quality evaluation and intelligent defect inversion in optical inspection.
(This article belongs to the Section Lasers, Light Sources and Sensors)
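
The PSimam module is described only by name here; if, as the name suggests, it builds on SimAM-style parameter-free attention, the underlying operation weights each activation by an energy term computed from its deviation from the channel mean. Below is a sketch of plain SimAM; the paper's PSimam variant may well differ.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: per-activation energy -> sigmoid gate."""
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):                          # x: (B, C, H, W)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n    # per-channel variance estimate
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)            # salient activations pass through

feat = torch.randn(1, 32, 40, 40)                  # mock scattered-field feature map
out = SimAM()(feat)                                # same shape, reweighted
```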

39 pages, 9593 KB  
Article
An Integrated AI Framework for Occupational Health: Predicting Burnout, Long COVID, and Extended Sick Leave in Healthcare Workers
by Maria Valentina Popa, Călin Gheorghe Buzea, Irina Luciana Gurzu, Camer Salim, Bogdan Gurzu, Dragoș Ioan Rusu, Lăcrămioara Ochiuz and Letiția Doina Duceac
Healthcare 2025, 13(18), 2266; https://doi.org/10.3390/healthcare13182266 - 10 Sep 2025
Viewed by 695
Abstract
Background: Healthcare workers face multiple, interlinked occupational health risks—burnout, post-COVID-19 sequelae (Long COVID), and extended medical leave. These outcomes often share predictors, contribute to each other, and, together, impact workforce capacity. Yet, existing tools typically address them in isolation. Objective: The objective of this study was to develop and deploy an integrated, explainable artificial intelligence (AI) framework that predicts these three outcomes using the same structured occupational health dataset, enabling unified workforce risk monitoring. Methods: We analyzed data from 1244 Romanian healthcare professionals with 14 demographic, occupational, lifestyle, and comorbidity features. For each outcome, we trained a separate predictive model within a common framework: (1) a lightweight transformer neural network with hyperparameter optimization, (2) a transformer with multi-head attention, and (3) a stacked ensemble combining transformer, XGBoost, and logistic regression. The data were SMOTE-balanced and evaluated on held-out test sets using Accuracy, ROC-AUC, and F1-score, with 10,000-iteration bootstrap testing for statistical significance. Results: The stacked ensemble achieved the highest performance: ROC AUC = 0.70 (burnout), 0.93 (Long COVID), and 0.93 (extended leave). The F1 scores were >0.89 for Long COVID and extended leave, whereas the performance gains for burnout were comparatively modest, reflecting the multidimensional and heterogeneous nature of burnout as a binary construct. The gains over logistic regression were statistically significant (p < 0.0001 for Long COVID and extended leave; p = 0.0355 for burnout). The SHAP analysis identified overlapping top predictors—tenure, age, job role, cancer history, pulmonary disease, and obesity—supporting the value of a unified framework. Conclusions: We trained separate models for each occupational health risk but deployed them in a single, real-time web application. This integrated approach improves efficiency, enables multi-outcome workforce surveillance, and supports proactive interventions in healthcare settings.
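
As a rough illustration of the third configuration (SMOTE balancing plus a stacked ensemble with a logistic-regression meta-learner), the scikit-learn/imbalanced-learn/XGBoost sketch below substitutes an MLP for the transformer base learner and uses placeholder data; none of it reproduces the authors' exact setup.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1244, 14))            # placeholder: 14 structured features
y = (rng.random(1244) < 0.2).astype(int)   # placeholder imbalanced outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance train fold only

stack = StackingClassifier(
    estimators=[("mlp", MLPClassifier(max_iter=500)),          # transformer stand-in
                ("xgb", XGBClassifier(eval_metric="logloss"))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_bal, y_bal)
print("held-out ROC-AUC:", roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]))
```

Balancing only the training fold keeps synthetic SMOTE samples out of the evaluation split.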

31 pages, 3554 KB  
Article
FFFNet: A Food Feature Fusion Model with Self-Supervised Clustering for Food Image Recognition
by Zhejun Kuang, Haobo Gao, Jian Zhao, Liu Wang and Lei Sun
Appl. Sci. 2025, 15(17), 9542; https://doi.org/10.3390/app15179542 - 29 Aug 2025
Viewed by 623
Abstract
With the growing emphasis on healthy eating and nutrition management in modern society, food image recognition has become increasingly important. However, it faces challenges such as large intra-class differences and high inter-class similarities. To tackle these issues, we present a Food Feature Fusion Network (FFFNet), which leverages a multi-head cross-attention mechanism to integrate the local detail-capturing capability of Convolutional Neural Networks with the global modeling capacity of Vision Transformers. This enables the model to capture key discriminative features when addressing such challenging food recognition tasks. FFFNet also introduces self-supervised clustering, generating pseudo-labels from the feature space distribution and employing a clustering objective derived from Kullback–Leibler divergence to optimize the feature space distribution. By maximizing similarity between features and their corresponding cluster centers, and minimizing similarity with non-corresponding centers, it promotes intra-class compactness and inter-class separability, thereby addressing the core challenges. We evaluated FFFNet across the ISIA Food-500, ETHZ Food-101, and UEC Food256 datasets, attaining Top-1/Top-5 accuracies of 65.31%/88.94%, 89.98%/98.37%, and 80.91%/94.92%, respectively, outperforming existing approaches.
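
The clustering objective described here (pseudo-labels from the feature-space distribution, a KL-divergence loss pulling features toward their cluster centers) matches the deep embedded clustering (DEC) formulation, so the sketch below follows DEC; the feature dimension and cluster count are placeholders, not FFFNet's values.

```python
import torch
import torch.nn.functional as F

def soft_assign(z, centers, alpha=1.0):
    """Student's-t similarity q_ij between features z (N, D) and centers (K, D)."""
    d2 = torch.cdist(z, centers) ** 2
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    return q / q.sum(dim=1, keepdim=True)

def target_dist(q):
    """Sharpened pseudo-labels p_ij proportional to q_ij^2 / f_j (DEC target)."""
    w = q ** 2 / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

z = torch.randn(256, 128, requires_grad=True)        # features from fusion backbone
centers = torch.randn(10, 128, requires_grad=True)   # cluster centers (count assumed)
q = soft_assign(z, centers)
loss = F.kl_div(q.log(), target_dist(q).detach(), reduction="batchmean")
loss.backward()   # pulls features toward assigned centers, away from the rest
```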

20 pages, 2798 KB  
Article
LSTMConvSR: Joint Long–Short-Range Modeling via LSTM-First–CNN-Next Architecture for Remote Sensing Image Super-Resolution
by Qiwei Zhu, Guojing Zhang, Xiaoying Wang and Jianqiang Huang
Remote Sens. 2025, 17(15), 2745; https://doi.org/10.3390/rs17152745 - 7 Aug 2025
Viewed by 761
Abstract
The inability of existing super-resolution methods to jointly model short-range and long-range spatial dependencies in remote sensing imagery limits reconstruction efficacy. To address this, we propose LSTMConvSR, a novel framework inspired by top-down neural attention mechanisms. Our approach pioneers an LSTM-first–CNN-next architecture. First, an LSTM-based global modeling stage efficiently captures long-range dependencies via downsampling and spatial attention, achieving 80.3% lower FLOPs and 11× faster speed. Second, a CNN-based local refinement stage, guided by the LSTM’s attention maps, enhances details in critical regions. Third, a top-down fusion stage dynamically integrates global context and local features to generate the output. Extensive experiments on Potsdam, UAVid, and RSSCN7 benchmarks demonstrate state-of-the-art performance, achieving 33.94 dB PSNR on Potsdam with 2.4× faster inference than MambaIRv2.
(This article belongs to the Special Issue Neural Networks and Deep Learning for Satellite Image Processing)
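
The three stages named in the abstract (LSTM-based global modeling on a downsampled grid, attention-guided CNN refinement, top-down fusion) map onto the toy block below; the scan order, channel widths, and residual fusion are our guesses, not the published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMFirstCNNNext(nn.Module):
    """Toy LSTM-first / CNN-next block: a global pass on a downsampled grid
    yields an attention map that gates a local CNN refinement stage."""
    def __init__(self, ch=32, scale=4):
        super().__init__()
        self.scale = scale
        self.lstm = nn.LSTM(ch, ch, batch_first=True)
        self.to_attn = nn.Conv2d(ch, 1, 1)
        self.local = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(ch, ch, 3, padding=1))
        self.fuse = nn.Conv2d(2 * ch, ch, 1)            # top-down fusion

    def forward(self, x):                               # x: (B, C, H, W)
        B, C, H, W = x.shape
        g = F.avg_pool2d(x, self.scale)                 # cheap global stage
        seq = g.flatten(2).transpose(1, 2)              # (B, h*w, C) raster scan
        seq, _ = self.lstm(seq)
        g = seq.transpose(1, 2).view(B, C, H // self.scale, W // self.scale)
        attn = torch.sigmoid(self.to_attn(g))
        attn = F.interpolate(attn, size=(H, W), mode="bilinear",
                             align_corners=False)
        local = self.local(x) * attn                    # attention-guided refinement
        glob = F.interpolate(g, size=(H, W), mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([local, glob], 1)) + x

out = LSTMFirstCNNNext(32)(torch.randn(1, 32, 64, 64))  # -> (1, 32, 64, 64)
```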

19 pages, 1339 KB  
Article
Convolutional Graph Network-Based Feature Extraction to Detect Phishing Attacks
by Saif Safaa Shakir, Leyli Mohammad Khanli and Hojjat Emami
Future Internet 2025, 17(8), 331; https://doi.org/10.3390/fi17080331 - 25 Jul 2025
Viewed by 982
Abstract
Phishing attacks pose significant risks to security, drawing considerable attention from both security professionals and customers. Despite extensive research, the current phishing website detection mechanisms often fail to efficiently diagnose unknown attacks due to their poor performance in the feature selection stage. Many techniques suffer from overfitting when working with huge datasets. To address this issue, we propose a feature selection strategy based on a convolutional graph network, which utilizes a dataset containing both labels and features, along with hyperparameters for a Support Vector Machine (SVM) and a graph neural network (GNN). Our technique consists of three main stages: (1) preprocessing the data by dividing them into testing and training sets, (2) constructing a graph from pairwise feature distances using the Manhattan distance and adding self-loops to nodes, and (3) implementing a GraphSAGE model with node embeddings and training the GNN by updating the node embeddings through message passing from neighbors, calculating the hinge loss, applying the softmax function, and updating weights via backpropagation. Additionally, we compute the neighborhood random walk (NRW) distance using a random walk with restart to create an adjacency matrix that captures the node relationships. The node features are ranked based on gradient significance to select the top k features, and the SVM is trained using the selected features, with the hyperparameters tuned through cross-validation. We evaluated our model on a test set, calculating the performance metrics and validating the effectiveness of the PhishGNN dataset. Our model achieved a precision of 90.78%, an F1-score of 93.79%, a recall of 97%, and an accuracy of 93.53%, outperforming the existing techniques.
(This article belongs to the Section Cybersecurity)
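
Parts of this pipeline can be sketched with standard tooling: a Manhattan-distance neighbor graph with self-loops (stage 2), then an SVM tuned by cross-validation on the top-k features. The per-feature gradient significance that the paper derives from the trained GNN is summarized here by a placeholder score vector, and the data are synthetic.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import kneighbors_graph
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))         # placeholder phishing feature matrix
y = rng.integers(0, 2, size=200)       # placeholder labels

A = kneighbors_graph(X, n_neighbors=10, metric="manhattan")  # pairwise-distance graph
A = A + sp.identity(X.shape[0], format="csr")                # self-loops on every node

scores = rng.random(50)                # stand-in for GNN gradient-significance ranking
top_k = np.argsort(scores)[::-1][:20]  # keep the 20 highest-ranked features

svm = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=5)
svm.fit(X[:, top_k], y)                # hyperparameters tuned by cross-validation
print("CV accuracy:", svm.best_score_)
```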

24 pages, 3937 KB  
Article
HyperTransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Hyperspectral Image Classification
by Xin Dai, Zexi Li, Lin Li, Shuihua Xue, Xiaohui Huang and Xiaofei Yang
Remote Sens. 2025, 17(14), 2361; https://doi.org/10.3390/rs17142361 - 9 Jul 2025
Cited by 1 | Viewed by 684
Abstract
Recent advances in hyperspectral image (HSI) classification have demonstrated the effectiveness of hybrid architectures that integrate convolutional neural networks (CNNs) and Transformers, leveraging CNNs for local feature extraction and Transformers for global dependency modeling. However, existing fusion approaches face three critical challenges: (1) insufficient synergy between spectral and spatial feature learning due to rigid coupling mechanisms; (2) high computational complexity resulting from redundant attention calculations; and (3) limited adaptability to spectral redundancy and noise in small-sample scenarios. To address these limitations, we propose HyperTransXNet, a novel CNN-Transformer hybrid architecture that incorporates adaptive spectral-spatial fusion. Specifically, the proposed HyperTransXNet comprises three key modules: (1) a Hybrid Spatial-Spectral Module (HSSM) that captures the refined local spectral-spatial features and models global spectral correlations by combining depth-wise dynamic convolution with frequency-domain attention; (2) a Mixture-of-Experts Routing (MoE-R) module that adaptively fuses multi-scale features by dynamically selecting optimal experts via Top-K sparse weights; and (3) a Spatial-Spectral Tokens Enhancer (SSTE) module that ensures causality-preserving interactions between spectral bands and spatial contexts. Extensive experiments on the Indian Pines, Houston 2013, and WHU-Hi-LongKou datasets demonstrate the superiority of HyperTransXNet.
(This article belongs to the Special Issue AI-Driven Hyperspectral Remote Sensing of Atmosphere and Land)
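
The "Top-K sparse weights" in the MoE-R module describe a standard sparse-gating pattern: keep the k largest gate logits per token, softmax over just those, and mix only the selected experts. A generic PyTorch sketch follows; the expert shape and k are assumptions, and the load-balancing loss used by production MoE layers is omitted.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Route each token to its top-k experts with sparse softmax weights."""
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x):                       # x: (N, dim) tokens
        logits = self.gate(x)                   # (N, E)
        w, idx = logits.topk(self.k, dim=-1)    # keep the k largest gate logits
        w = torch.softmax(w, dim=-1)            # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):              # dispatch loop, kept for clarity
            for e in range(len(self.experts)):
                m = idx[:, slot] == e
                if m.any():
                    out[m] += w[m, slot, None] * self.experts[e](x[m])
        return out

tokens = torch.randn(64, 128)                   # mock spectral-spatial tokens
print(TopKMoE(128)(tokens).shape)               # torch.Size([64, 128])
```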

16 pages, 2882 KB  
Article
Empathic Traits Modulate Oscillatory Dynamics Revealed by Time–Frequency Analysis During Body Language Reading
by Alice Mado Proverbio and Pasquale Scognamiglio
Brain Sci. 2025, 15(7), 673; https://doi.org/10.3390/brainsci15070673 - 23 Jun 2025
Viewed by 915
Abstract
Empathy has been linked to enhanced processing of social information, yet the neurophysiological correlates of such individual differences remain underexplored. Objectives: The aim of this study was to investigate how individual differences in trait empathy are reflected in oscillatory brain activity during the perception of non-verbal social cues. Methods: In this EEG study involving 30 participants, we examined spectral and time–frequency dynamics associated with trait empathy during a visual task requiring the interpretation of others’ body gestures. Results: FFT power spectral analyses (applied to alpha/mu, beta, high beta, and gamma bands) revealed that individuals with high empathy quotients (High-EQ) exhibited a tendency for increased beta-band activity over frontal regions and markedly decreased alpha-band activity over occipito-parietal areas compared to their low-empathy counterparts (Low-EQ), suggesting heightened attentional engagement and reduced cortical inhibition during social information processing. Similarly, time–frequency analysis using Morlet wavelets showed higher alpha power in Low-EQ than High-EQ people over occipital sites, with no group differences in mu suppression or desynchronization (ERD) over central sites, challenging prior claims linking mu ERD to mirror neuron activity in empathic processing. These findings align with recent literature associating frontal beta oscillations with top-down attentional control and emotional regulation, and posterior alpha with vigilance and sensory disengagement. Conclusions: Our results indicate that empathic traits are differentially reflected in anterior and posterior oscillatory dynamics, supporting the notion that individuals high in empathy deploy greater cognitive and attentional resources when decoding non-verbal social cues. These neural patterns may underlie their superior ability to interpret body language and mental states from visual input.
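
Time-frequency analysis with Morlet wavelets, as applied here to the alpha and beta bands, amounts to convolving the EEG signal with complex wavelets and taking the squared magnitude. A NumPy sketch on synthetic data follows; the sampling rate, cycle count, and band edges are assumptions, not the study's parameters.

```python
import numpy as np

def morlet_power(sig, sfreq, freqs, n_cycles=7):
    """Time-frequency power: convolve with complex Morlet wavelets, take |.|^2."""
    power = np.empty((len(freqs), sig.size))
    for i, f in enumerate(freqs):
        sigma = n_cycles / (2 * np.pi * f)                # Gaussian width in seconds
        t = np.arange(-5 * sigma, 5 * sigma, 1 / sfreq)
        wavelet = np.exp(2j * np.pi * f * t - t**2 / (2 * sigma**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy wavelet
        power[i] = np.abs(np.convolve(sig, wavelet, mode="same")) ** 2
    return power

sfreq = 500.0                                             # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / sfreq)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)  # mock alpha rhythm
freqs = np.arange(4, 40, 2.0)
tfr = morlet_power(eeg, sfreq, freqs)
alpha = tfr[(freqs >= 8) & (freqs <= 13)].mean()          # band-averaged alpha power
```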

24 pages, 27167 KB  
Article
ICT-Net: A Framework for Multi-Domain Cross-View Geo-Localization with Multi-Source Remote Sensing Fusion
by Min Wu, Sirui Xu, Ziwei Wang, Jin Dong, Gong Cheng, Xinlong Yu and Yang Liu
Remote Sens. 2025, 17(12), 1988; https://doi.org/10.3390/rs17121988 - 9 Jun 2025
Viewed by 759
Abstract
Traditional single neural network-based geo-localization methods for cross-view imagery primarily rely on polar coordinate transformations while suffering from limited global correlation modeling capabilities. To address these fundamental challenges of weak feature correlation and poor scene adaptation, we present a novel framework termed ICT-Net (Integrated CNN-Transformer Network) that synergistically combines convolutional neural networks with Transformer architectures. Our approach harnesses the complementary strengths of CNNs in capturing local geometric details and Transformers in establishing long-range dependencies, enabling comprehensive joint perception of both local and global visual patterns. Furthermore, capitalizing on the Transformer’s flexible input processing mechanism, we develop an attention-guided non-uniform cropping strategy that dynamically eliminates redundant image patches with minimal impact on localization accuracy, thereby achieving enhanced computational efficiency. To facilitate practical deployment, we propose a deep embedding clustering algorithm optimized for rapid parsing of geo-localization information. Extensive experiments demonstrate that ICT-Net establishes new state-of-the-art localization accuracy on the CVUSA benchmark, achieving a top-1 recall rate improvement of 8.6% over previous methods. Additional validation on a challenging real-world dataset collected at Beihang University (BUAA) further confirms the framework’s effectiveness and practical applicability in complex urban environments, particularly showing 23% higher robustness to vegetation variations.
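
The attention-guided non-uniform cropping strategy exploits the fact that a Transformer accepts a variable-length token set: patches with low attention scores can simply be dropped. A generic token-pruning sketch follows; the [CLS]-attention scoring and keep ratio are our assumptions, not the ICT-Net mechanism.

```python
import torch

def prune_patches(tokens, attn, keep_ratio=0.7):
    """Drop the least-attended patch tokens; the leading [CLS] token is kept.
    tokens: (B, 1+N, D); attn: (B, N) per-patch attention scores."""
    B, N1, D = tokens.shape
    k = max(1, int((N1 - 1) * keep_ratio))
    idx = attn.topk(k, dim=1).indices + 1            # +1 skips the [CLS] slot
    idx = idx.sort(dim=1).values                     # preserve spatial order
    cls = tokens[:, :1]
    kept = tokens.gather(1, idx.unsqueeze(-1).expand(B, k, D))
    return torch.cat([cls, kept], dim=1)             # (B, 1+k, D)

x = torch.randn(2, 1 + 196, 768)                     # 14x14 patch grid (assumed)
scores = torch.rand(2, 196)                          # e.g. a CLS attention row
print(prune_patches(x, scores).shape)                # torch.Size([2, 138, 768])
```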

23 pages, 8979 KB  
Article
Beef Carcass Grading with EfficientViT: A Lightweight Vision Transformer Approach
by Hyunwoo Lim and Eungyeol Song
Appl. Sci. 2025, 15(11), 6302; https://doi.org/10.3390/app15116302 - 4 Jun 2025
Cited by 1 | Viewed by 1809
Abstract
Beef carcass grading plays a pivotal role in determining market value and consumer preferences. While traditional visual inspection by experts remains the industry standard, it suffers from subjectivity and inconsistencies, particularly in high-throughput slaughterhouse environments. To address these limitations, we propose a one-stage automated grading model based on EfficientViT, a lightweight vision transformer architecture. Unlike conventional two-stage methods that require prior segmentation of the loin region, our model directly predicts beef quality grades from raw RGB images, significantly simplifying the pipeline and reducing computational overhead. We evaluate the proposed model against representative convolutional neural networks (VGG-16, ResNeXt-50, DenseNet-121) as well as two-stage combinations of segmentation and classification models. Experiments were conducted on a publicly available beef carcass dataset consisting of over 77,000 labeled images. EfficientViT achieves the highest accuracy (98.46%) and F1-score (0.9867) among all evaluated models while maintaining low inference latency (3.92 ms) and compact parameter size (36.4 MB). In particular, it outperforms CNNs in predicting the top grade (1++), where global visual patterns such as marbling distribution are crucial. Furthermore, we employ Grad-CAM and attention map visualizations to analyze the model’s focus regions and demonstrate that EfficientViT captures holistic contextual features better than CNNs. The model also exhibits robustness across varying loin area proportions. Our findings suggest that EfficientViT is not only accurate but also efficient and interpretable, making it a strong candidate for real-time industrial applications in beef quality grading.
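
Grad-CAM, used here to visualize focus regions, weights a late feature map by the spatial mean of the class-score gradients. A sketch on a torchvision ResNet stand-in follows (the paper applies it to EfficientViT; the class count, target layer, and input are placeholders).

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=5).eval()         # stand-in backbone, 5 mock grades
acts, grads = {}, {}
layer = model.layer4                           # last conv stage as CAM target
layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)                # placeholder carcass image
score = model(x)[0].max()                      # top predicted grade logit
score.backward()

w = grads["v"].mean(dim=(2, 3), keepdim=True)  # per-channel importance
cam = F.relu((w * acts["v"]).sum(1))           # coarse (1, H', W') heatmap
cam = F.interpolate(cam[None], size=(224, 224), mode="bilinear")[0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```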

34 pages, 6971 KB  
Article
Mathematical and Machine Learning Innovations for Power Systems: Predicting Transformer Oil Temperature with Beluga Whale Optimization-Based Hybrid Neural Networks
by Jingrui Liu, Zhiwen Hou, Bowei Liu and Xinhui Zhou
Mathematics 2025, 13(11), 1785; https://doi.org/10.3390/math13111785 - 27 May 2025
Cited by 2 | Viewed by 896
Abstract
Power transformers are vital in power systems, where oil temperature is a key operational indicator. This study proposes an advanced hybrid neural network model, BWO-TCN-BiGRU-Attention, to predict the top-oil temperature of transformers. The model was validated using temperature data from power transformers in two Chinese regions. It achieved MAEs of 0.5258 and 0.9995, MAPEs of 2.75% and 2.73%, and RMSEs of 0.6353 and 1.2158, significantly outperforming mainstream methods like ELM, PSO-SVR, Informer, CNN-BiLSTM-Attention, and CNN-GRU-Attention. In tests conducted in spring, summer, autumn, and winter, the model’s MAPE was 2.75%, 3.44%, 3.93%, and 2.46% for Transformer 1, and 2.73%, 2.78%, 3.07%, and 2.05% for Transformer 2, respectively. These results indicate that the model can maintain low prediction errors even with significant seasonal temperature variations. In terms of time granularity, the model performed well at both 1 h and 15 min intervals: for Transformer 1, MAPE was 2.75% at 1 h granularity and 2.98% at 15 min granularity; for Transformer 2, MAPE was 2.73% at 1 h granularity and further reduced to 2.16% at 15 min granularity. This shows that the model can adapt to different seasons and maintain good prediction performance with high-frequency data, providing reliable technical support for the safe and stable operation of power systems.
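
The reported MAE, MAPE, and RMSE follow their standard definitions; for reference, a NumPy sketch with made-up oil-temperature values:

```python
import numpy as np

def mae(y, p):  return np.mean(np.abs(y - p))
def rmse(y, p): return np.sqrt(np.mean((y - p) ** 2))
def mape(y, p): return 100 * np.mean(np.abs((y - p) / y))   # y must be nonzero

y = np.array([41.2, 43.5, 45.1, 44.0])   # made-up top-oil temperatures
p = np.array([40.8, 44.1, 44.6, 44.5])   # made-up predictions
print(f"MAE={mae(y, p):.4f}  MAPE={mape(y, p):.2f}%  RMSE={rmse(y, p):.4f}")
```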

26 pages, 3350 KB  
Article
Optimizing Backbone Networks Through Hybrid–Modal Fusion: A New Strategy for Waste Classification
by Houkui Zhou, Qifeng Ding, Chang Chen, Qinqin Liao, Qun Wang, Huimin Yu, Haoji Hu, Guangqun Zhang, Junguo Hu and Tao He
Sensors 2025, 25(10), 3241; https://doi.org/10.3390/s25103241 - 21 May 2025
Cited by 2 | Viewed by 872
Abstract
With rapid urbanization, effective waste classification is a critical challenge. Traditional manual methods are time-consuming, labor-intensive, costly, and error-prone, resulting in reduced accuracy. Deep learning has revolutionized this field. Convolutional neural networks such as VGG and ResNet have dramatically improved automated sorting efficiency, and Transformer architectures like the Swin Transformer have further enhanced performance and adaptability in complex sorting scenarios. However, these approaches still struggle in complex environments and with diverse waste types, often suffering from limited recognition accuracy, poor generalization, or prohibitive computational demands. To overcome these challenges, we propose an efficient hybrid-modal fusion method, the Hybrid-modal Fusion Waste Classification Network (HFWC-Net), for precise waste image classification. HFWC-Net leverages a Transformer-based hierarchical architecture that integrates CNNs and Transformers, enhancing feature capture and fusion across varied image types for superior scalability and flexibility. By incorporating advanced techniques such as the Agent Attention mechanism and the LionBatch optimization strategy, HFWC-Net not only improves classification accuracy but also significantly reduces classification time. Comparative experimental results on the public datasets Garbage Classification, TrashNet, and our self-built MixTrash dataset demonstrate that HFWC-Net achieves Top-1 accuracy rates of 98.89%, 96.88%, and 94.35%, respectively. These findings indicate that HFWC-Net attains the highest accuracy among current methods, offering significant advantages in accelerating classification efficiency and supporting automated waste management applications.

20 pages, 2777 KB  
Article
Video Human Action Recognition Based on Motion-Tempo Learning and Feedback Attention
by Yalong Liu, Chengwu Liang, Songqi Jiang and Peiwang Zhu
Appl. Sci. 2025, 15(8), 4186; https://doi.org/10.3390/app15084186 - 10 Apr 2025
Viewed by 910
Abstract
In video human action-recognition tasks, motion tempo describes the dynamic patterns and temporal scales of human motion. Different categories of actions are typically composed of sub-actions with varying motion tempos. Effectively capturing sub-actions with different motion tempos and distinguishing category-specific sub-actions are crucial for improving action-recognition performance. Convolutional Neural Network (CNN)-based methods attempted to address this challenge by embedding feedforward attention modules to enhance the action’s dynamic representation learning. However, feedforward attention modules rely only on local information from low-level features, lacking contextual information to generate attention weights. Therefore, we propose a Sub-action Motion information Enhancement Network (SMEN) based on motion-tempo learning and feedback attention, which consists of the Multi-Granularity Adaptive Fusion Module (MgAFM) and Feedback Attention-Guided Module (FAGM). MgAFM enhances the model’s ability to capture crucial sub-action intrinsic information by extracting and adaptively fusing motion dynamic features at different granularities. FAGM leverages high-level features that contain contextual information in a feedback manner to guide low-level features in generating attention weights, enhancing the model’s ability to extract more discriminative spatio-temporal and channel-wise features. Experiments are conducted on three datasets, and the proposed SMEN achieves top-1 accuracies of 52.4% and 63.3% on the Something-Something V1 and V2 datasets, and 76.9% on the Kinetics-400 dataset. Ablation studies, evaluations, and visualizations demonstrate that the proposed SMEN is effective for sub-action motion tempo and representation learning, and outperforms compared methods for video action recognition.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
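
The core idea of the FAGM, per the abstract, is that attention weights for low-level features are generated in a feedback manner from high-level, context-rich features rather than from the low-level features themselves. A minimal sketch of that gating on video (3D) features follows; the channel sizes and sigmoid gate are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackAttention(nn.Module):
    """Gate low-level features with weights generated from high-level context,
    the reverse of feedforward attention that scores low-level maps directly."""
    def __init__(self, c_low, c_high):
        super().__init__()
        self.proj = nn.Conv3d(c_high, c_low, kernel_size=1)  # align channel counts

    def forward(self, low, high):        # low: (B,Cl,T,H,W), high: (B,Ch,t,h,w)
        g = self.proj(high)
        g = F.interpolate(g, size=low.shape[2:], mode="trilinear",
                          align_corners=False)
        return low * torch.sigmoid(g)    # context-driven spatio-temporal gating

low = torch.randn(2, 64, 8, 56, 56)      # early-stage video features
high = torch.randn(2, 256, 4, 14, 14)    # late-stage, context-rich features
out = FeedbackAttention(64, 256)(low, high)   # -> (2, 64, 8, 56, 56)
```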

19 pages, 3572 KB  
Article
MOSSNet: A Lightweight Dual-Branch Multiscale Attention Neural Network for Bryophyte Identification
by Haixia Luo, Xiangfen Zhang, Feiniu Yuan, Jing Yu, Hao Ding, Haoyu Xu and Shitao Hong
Symmetry 2025, 17(3), 347; https://doi.org/10.3390/sym17030347 - 25 Feb 2025
Cited by 4 | Viewed by 690
Abstract
Bryophytes, including liverworts, mosses, and hornworts, play an irreplaceable role in soil moisture retention, erosion prevention, and pollution monitoring. The precise identification of bryophyte species enhances our understanding and utilization of their ecological functions. However, their complex morphology and structural symmetry make identification difficult. Although deep learning improves classification efficiency, challenges remain due to limited datasets and the inadequate adaptation of existing methods to multi-scale features, causing poor performance in fine-grained multi-classification. Thus, we propose MOSSNet, a lightweight neural network for bryophyte feature detection. It has a four-stage architecture that efficiently extracts multi-scale features using a modular design with symmetry consideration in feature representation. At the input stage, the Convolutional Patch Embedding (CPE) module captures representative features through a two-layer convolutional structure. In each subsequent stage, Dual-Branch Multi-scale (DBMS) modules are employed, with one branch utilizing convolutional operations and the other utilizing the Dilated Convolution Enhanced Attention (DCEA) module for multi-scale feature fusion. The DBMS module extracts fine-grained and coarse-grained features by a weighted fusion of the outputs from two branches. Evaluating MOSSNet on the self-constructed dataset BryophyteFine reveals a Top-1 accuracy of 99.02% in classifying 26 bryophyte species, 7.13% higher than the best existing model, while using only 1.58 M parameters and 0.07 G FLOPs.
(This article belongs to the Section Computer)
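
The DBMS module pairs a plain convolutional branch with a dilated-convolution attention branch and fuses their outputs by a weighted sum. A toy version of that pattern follows; the dilation rates and the learned scalar fusion weight are our choices, not MOSSNet's.

```python
import torch
import torch.nn as nn

class DualBranchMS(nn.Module):
    """Toy dual-branch block: local conv branch plus dilated-conv attention
    branch, combined by a learnable fusion weight (DBMS/DCEA-style sketch)."""
    def __init__(self, ch):
        super().__init__()
        self.local = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.dilated = nn.Sequential(                   # enlarged receptive field
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=4, dilation=4))
        self.alpha = nn.Parameter(torch.tensor(0.5))    # learned fusion weight

    def forward(self, x):
        attn = torch.sigmoid(self.dilated(x))           # multi-scale attention map
        a = torch.sigmoid(self.alpha)                   # keep weight in (0, 1)
        return a * self.local(x) + (1 - a) * (x * attn)

x = torch.randn(1, 32, 64, 64)
print(DualBranchMS(32)(x).shape)                        # torch.Size([1, 32, 64, 64])
```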

38 pages, 13077 KB  
Article
Accentuation as a Mechanism of Visual Illusions: Insights from Adaptive Resonance Theory (ART)
by Baingio Pinna, Jurģis Šķilters and Daniele Porcheddu
Information 2025, 16(3), 172; https://doi.org/10.3390/info16030172 - 25 Feb 2025
Cited by 1 | Viewed by 1381
Abstract
This study introduces and examines the principle of accentuation as a novel mechanism in perceptual organization, analyzing its effects through the framework of Grossberg’s Adaptive Resonance Theory (ART). We demonstrate that localized accentuators, manifesting as minimal dissimilarities or discontinuities, can significantly modulate global perceptions, inducing illusions of geometric distortion, orientation shifts, and apparent motion. Through a series of phenomenological experiments, we establish that accentuation can supersede classical Gestalt principles, influencing figure-ground segregation, shape perception, and lexical processing. Our findings suggest that accentuation functions as an autonomous organizing principle, leveraging salience-driven attentional capture to generate perceptual effects. We then apply the ART model to elucidate these phenomena, focusing on its core constructs of complementary computing, boundary–surface interactions, and resonant states. Specifically, we show how accentuation-induced asymmetries in boundary signals within the boundary contour system (BCS) can propagate through laminar cortical circuits, biasing figure-ground assignments and shape representations. The interaction between these biased signals and top-down expectations, as modeled by ART’s resonance mechanisms, provides a neurally plausible account for the observed illusions. This integration of accentuation effects with ART offers novel insights into the neural substrates of visual perception and presents a unifying theoretical framework for a diverse array of perceptual phenomena, bridging low-level feature processing with high-level cognitive representations.
