Search Results (2,557)

Search Parameters:
Keywords = 3D CNN

21 pages, 2441 KB  
Article
Automatic Modulation Recognition for Radio Mixed Proximity Sensor Signals Based on a Time-Frequency Image Enhancement Network
by Jinyu Zhang, Xiaopeng Yan, Xinhong Hao, Tai An, Erwa Dong and Jian Dai
Sensors 2026, 26(5), 1677; https://doi.org/10.3390/s26051677 - 6 Mar 2026
Abstract
The automatic modulation recognition (AMR) of low probability of intercept (LPI) signals has attracted considerable interest in electronic reconnaissance research. This recognition technology aims to design a classifier that can identify signals with different modulation types. Deep learning models such as convolutional neural networks (CNNs) can take the time-frequency images (TFIs) of a signal as input and extract features for classification. To improve recognition accuracy, especially under low signal-to-noise ratios (SNRs), we propose an AMR method for radio frequency proximity sensor signals based on a TFI enhancement network. The TFIs are denoised with a per-pixel kernel prediction network (KPN), which improves TFI quality and achieves denoising performance comparable to traditional TFI reconstruction methods (e.g., sparse representation-based and low-rank approximation methods) at significantly lower computational cost. The denoised TFIs, with enhanced signal quality and reduced noise, are then fed into the RetinalNet-based classifier as high-quality input features. This enhancement is crucial for the subsequent recognition stage, as it significantly improves modulation recognition accuracy, particularly under challenging low-SNR conditions. Simulation results show that the proposed method can accurately identify the modulation types of different radio frequency proximity sensors that are aliased in the time-frequency domain, with average recognition accuracy remaining above 97% for SNRs above −10 dB.
(This article belongs to the Section Sensing and Imaging)
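The filtering step behind a kernel prediction network, in which the network outputs one small denoising kernel per pixel that is then applied to the noisy time-frequency image, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernels are assumed to be already predicted and normalized, and the function name is hypothetical.

```python
import numpy as np

def apply_per_pixel_kernels(image, kernels):
    """Apply a predicted k-by-k kernel at each pixel (the filtering step
    of a kernel prediction network). `image` is (H, W); `kernels` is
    (H, W, k, k), each kernel assumed normalized to sum to 1."""
    h, w = image.shape
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(image, pad, mode="reflect")
    out = np.empty_like(image, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k]   # local neighborhood
            out[i, j] = np.sum(patch * kernels[i, j])
    return out

# With uniform averaging kernels this reduces to a 3x3 box blur.
img = np.arange(16, dtype=float).reshape(4, 4)
uniform = np.full((4, 4, 3, 3), 1.0 / 9.0)
denoised = apply_per_pixel_kernels(img, uniform)
```

In a real KPN the kernels are the output of a CNN conditioned on the noisy input, so each pixel gets a filter adapted to its local noise structure.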
27 pages, 6376 KB  
Article
A GAN-CNN Fusion Framework for Deep Learning-Based DOA Estimation in Low-SNR Environments
by Zhenshan Zhang, Wenjie Xu, Haitao Zou and Shichao Yi
Sensors 2026, 26(5), 1676; https://doi.org/10.3390/s26051676 - 6 Mar 2026
Abstract
Direction of Arrival (DOA) estimation faces significant performance degradation under low Signal-to-Noise Ratio (SNR) conditions, where traditional algorithms and deep learning models struggle due to corrupted spatial information and limited training data. To address these challenges, this paper introduces a novel two-stage framework that integrates a Generative Adversarial Network (GAN) for signal enhancement with a complex-valued Convolutional Neural Network (CNN) for DOA estimation. The proposed GAN incorporates an attention mechanism and a dedicated phase-consistent loss function to suppress noise while preserving the spatial phase information critical for accurate direction finding. Enhanced signals are transformed into covariance matrices and processed by a complex-valued CNN designed to extract robust spatial features. Extensive experiments demonstrate that the proposed method achieves a DOA accuracy of 72.2% and a Root Mean Square Error (RMSE) of 3.9° at −10 dB SNR with 500 snapshots, substantially outperforming conventional and deep learning baselines. The framework also shows strong robustness to limited data, maintaining 93.8% accuracy with only 50 snapshots, and offers a practical solution for reliable DOA estimation in low-SNR, data-scarce environments.
(This article belongs to the Section Remote Sensors)
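The covariance matrices fed to the complex-valued CNN are typically the standard sample estimate R = XX^H / N over the array snapshots. A minimal sketch (the function name is illustrative; the paper's exact preprocessing is not specified here):

```python
import numpy as np

def sample_covariance(snapshots):
    """Sample spatial covariance R = X X^H / N from array snapshots.
    `snapshots` is (num_sensors, num_snapshots), complex-valued; R is
    the usual input feature for covariance-based DOA networks."""
    n = snapshots.shape[1]
    return snapshots @ snapshots.conj().T / n

rng = np.random.default_rng(0)
# 4-element array, 500 complex noise snapshots
x = rng.standard_normal((4, 500)) + 1j * rng.standard_normal((4, 500))
r = sample_covariance(x)
```

R is Hermitian with real, non-negative diagonal (per-sensor power), which is why it can be split into real/imaginary channels or handled natively by a complex-valued network.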
21 pages, 6304 KB  
Article
Enhancing Gravitational Lens Study with Deep Learning: A Study on Effects of Dropout Regularization
by Juan Jordi Ancona-Flores, Alberto Hernández-Almada and Verónica Motta
Galaxies 2026, 14(2), 18; https://doi.org/10.3390/galaxies14020018 - 6 Mar 2026
Abstract
Strong gravitational lensing provides valuable insights into the mass distribution of galaxies and the nature of dark matter. However, its modeling is computationally demanding due to the large volume of strong lensing observations. In this work, we explore the application of Convolutional Neural Networks to infer physical parameters from simulated galaxy–galaxy lens systems, with the galaxy lens described by the Singular Isothermal Ellipsoid (SIE) profile. We construct a dataset of 76,396 synthetic lensing images derived from the China Space Station Telescope catalog and employ it to train a modified CNN model, based on the AlexNet architecture, to predict four key SIE parameters: the Einstein radius, the axis ratio, and the ellipticity components. We analyze the network performance under three distinct dropout configurations to quantify their influence on generalization and parameter inference accuracy. The results indicate that incorporating dropout is critical for enhancing the precision and robustness of the estimated parameters, as demonstrated using a 4-fold cross-validation procedure. When dropout is included, we obtain coefficients of determination up to R² ≈ 0.96 for most SIE parameters and mean peak signal-to-noise ratios of up to ∼37 dB. Relative to the configuration without dropout, dropout reduces the relative errors in the inferred SIE parameters by approximately 60–76%, yielding errors of at most ∼9% at the 90% confidence level for the majority of parameters. These findings highlight the potential of deep learning approaches to enable scalable, computationally efficient, and high-precision modeling of strong gravitational lensing systems.
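The regularizer under study is standard (inverted) dropout: during training, each activation is zeroed with probability p and the survivors are rescaled by 1/(1−p) so the expected activation is unchanged; at inference the layer is a no-op. A minimal sketch, not tied to the paper's specific configurations:

```python
import numpy as np

def dropout(x, p, rng, train=True):
    """Inverted dropout: zero each activation with probability p and
    rescale survivors by 1/(1-p), so E[output] equals the input."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p      # keep with probability 1-p
    return x * mask / (1.0 - p)
```

Because the rescaling happens at train time, no correction is needed at test time, which is the convention modern frameworks follow.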
24 pages, 3661 KB  
Article
A CNN-Based Model of Cross-Immunity to Influenza A(H3N2) Virus: Testing Under “Real-World” Conditions
by Marina N. Asatryan, Vaagn G. Agasaryan, Boris I. Timofeev, Ilya S. Shmyr, Dmitrii N. Shcherbinin, Elita R. Gerasimuk, Tatiana A. Timofeeva, Ivan F. Ershov, Tatiana A. Semenenko, Denis Yu. Logunov and Alexander L. Gintsburg
Viruses 2026, 18(3), 327; https://doi.org/10.3390/v18030327 - 6 Mar 2026
Abstract
A cross-immunity model for influenza A(H3N2) based on convolutional neural networks (CNNs) was developed and validated under temporally structured conditions that mimic real-world forecasting. Antigenic distance was derived from hemagglutination inhibition (HI) titers. The model was trained on WHO data (2011–2023) and tested in a time-split fashion on independent recent data (2022–2024). Hemagglutinin sequences (HA/HA1) were encoded into 3D tensors using five physicochemical indices from AAindex. Two- and three-layer CNN architectures were tested. Performance was evaluated using Accuracy, Sensitivity, Specificity, and Matthews Correlation Coefficient (MCC) with 95% confidence intervals. Validation on the classic Smith’s dataset showed high accuracy (Accuracy = 0.9996, MCC = 0.9964), serving as a necessary sanity check. Testing on current data yielded lower but robust results (Accuracy: 0.73–0.81, MCC: 0.48–0.60), reflecting real-world forecasting complexity. ROC analysis confirmed the strong discriminative ability (AUC ≥ 0.805) and good calibration (Brier scores ≤ 0.192). The three-layer CNN demonstrated greater robustness on challenging data. This CNN model is an effective tool for assessing influenza A(H3N2) antigenic distances and holds promise for integration into epidemiological models to aid vaccine strain selection. Further accuracy improvements may arise from modeling the structural impact of amino acid substitutions and polyclonal immune responses.
(This article belongs to the Section General Virology)
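The encoding step, mapping each amino acid to a vector of physicochemical scale values so a sequence becomes a (length × indices) slice of the CNN's 3D input tensor, can be sketched as follows. The lookup table below is a toy stand-in with illustrative numbers, not the actual five AAindex entries used in the paper.

```python
import numpy as np

# Toy stand-in for five physicochemical scales (values illustrative,
# NOT the AAindex entries the authors used); four residues shown.
INDICES = {
    "A": [1.8, 0.0, 88.6, 6.0, 0.046],
    "C": [2.5, 0.0, 108.5, 5.1, 0.128],
    "D": [-3.5, -1.0, 111.1, 2.8, 0.105],
    "G": [-0.4, 0.0, 60.1, 6.0, 0.0],
}

def encode(sequence):
    """Encode an amino-acid sequence as a (length, 5) feature array,
    one physicochemical vector per residue."""
    return np.array([INDICES[aa] for aa in sequence])

x = encode("ACDG")
```

Stacking such arrays for a pair of HA sequences (or their difference) yields the 3D tensor a CNN can convolve over.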
24 pages, 1727 KB  
Article
Symmetry-Guided Deep Generative Model for Multi-Step Evolution of Complex Dynamical Systems
by Ying Xu, Chengbo Zhu, Nannan Su, Yingying Wang and Ziqi Fan
Symmetry 2026, 18(3), 450; https://doi.org/10.3390/sym18030450 - 6 Mar 2026
Abstract
Complex dynamical systems are characterized by inherent nonlinearity, high dimensionality, spatiotemporal uncertainty, and implicit symmetry, posing fundamental challenges for their mathematical modeling and multi-step evolution prediction. For example, wind power exhibits strong randomness, intermittency, and latent temporal symmetry. To address these challenges, this paper proposes a symmetry-guided deep generative model, the bi-directional recurrent generative adversarial network (BDR-GAN), for the multi-step rolling prediction of such systems. The BDR-GAN formalizes multi-step evolution as a conditional probability distribution learning problem. It systematically integrates three forms of symmetry to enhance modeling validity: bi-directional temporal symmetry captured by a BiLSTM-based generator, structural symmetry within the adversarial learning framework between the generator and a 1D-CNN discriminator, and rolling symmetry enabled by a recursive prediction strategy that supports cyclic state updates. Theoretical analysis demonstrates that this symmetry-embedded adversarial mechanism enables BDR-GAN to effectively approximate the underlying dynamic operators and the conditional distribution of future states, improving the learned model’s generalization. Experimental validation on wind power datasets confirms the framework’s superiority. Compared to benchmark models, BDR-GAN achieves superior prediction accuracy (e.g., RMSE 0.236, MAPE 5.12%), provides reliable uncertainty quantification (PICP 95.5%), and exhibits enhanced robustness against noise and variability. This work provides a generalizable, symmetry-guided modeling framework for the multi-step evolution of complex dynamical systems, offering theoretical and technical support for high-precision prediction in critical applications such as wind power integration and smart grid operation.
(This article belongs to the Special Issue Application of Symmetry/Asymmetry and Machine Learning)
35 pages, 10654 KB  
Article
An Empirical Measurement of Lighting Technology Changeover in New York City with Deep Learning
by Lan Yu, Mary Manz, Mohit S. Sharma, Andreas Karpf, Federica B. Bianco and Gregory Dobler
Remote Sens. 2026, 18(5), 799; https://doi.org/10.3390/rs18050799 - 5 Mar 2026
Abstract
Replacing inefficient lighting with energy-efficient alternatives is a proven way to reduce urban energy use, yet evaluating such policies remains challenging. For example, in 2013, New York City (NYC) initiated a program to replace 250,000 high-pressure sodium (HPS) streetlights with light-emitting diodes (LEDs) by 2017, but no subsequent evaluation was published. Here, we employ ground-based hyperspectral imaging (HSI; 0.4–1.0 microns, ∼850 bands) observations from the “Urban Observatory” (UO), obtained in 2013 and 2018, to quantitatively characterize this technological transition. Following co-registration, artifact removal, and source identification, we classified individual light source technologies using both a maximum correlation approach with spectral templates of known lighting types and a one-dimensional Convolutional Neural Network (1D-CNN) trained on 1321 manually labeled spectra, achieving an average precision of ∼92% for the 2013 data and ∼94% for the 2018 data across technology classes. Scene-level mixture modeling indicates a reduction in the HPS-to-LED brightness ratio from 1.15 (2013) to 0.27 (2018), demonstrating the capability of longitudinal HSI for evaluating urban lighting policy outcomes.
23 pages, 10789 KB  
Article
Statistical Feature Engineering for Robot Failure Detection: A Comparative Study of Machine Learning and Deep Learning Classifiers
by Sertaç Savaş
Sensors 2026, 26(5), 1649; https://doi.org/10.3390/s26051649 - 5 Mar 2026
Abstract
Industrial robots are widely used in critical tasks such as assembly, welding, and material handling as core components of modern manufacturing systems. For the reliable operation of these systems, early and accurate detection of execution failures is crucial. In this study, a comprehensive comparison of machine learning and deep learning methods is conducted for the classification of robot execution failures using data acquired from force–torque sensors. Three feature engineering approaches are proposed: a Baseline approach with 90 raw time-series features; a Domain-6 approach with 6 basic statistical features per sensor (36 in total); and a Domain-12 approach with 12 comprehensive statistical features per sensor (72 in total). The domain features comprise the mean, standard deviation, minimum, maximum, range, slope, median, skewness, kurtosis, RMS, energy, and IQR. In total, ten classification algorithms are evaluated: eight machine learning methods, namely Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), Artificial Neural Network (ANN), Naive Bayes (NB), Decision Trees (DT), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), plus two deep learning models, a One-Dimensional Convolutional Neural Network (1D-CNN) and Long Short-Term Memory (LSTM). For the traditional machine learning algorithms, 5 × 5 nested cross-validation is used, whereas the deep learning models employ 5-fold cross-validation with a 20% validation split. To ensure statistical reliability, all experiments are repeated over 30 independent runs. The experimental results demonstrate that feature engineering has a decisive impact on classification performance. Across all feature sets, the highest accuracy (93.85% ± 0.90) is achieved by the Naive Bayes classifier using the Baseline features. The Domain-12 feature set provides consistent improvements across many algorithms, with substantial performance gains. Results are reported using accuracy, precision, recall, and F1-score and are supported by confusion matrices. Finally, permutation feature importance analysis indicates that the skewness features of the Fx and Fy sensors are the most critical variables for failure detection. Overall, these findings show that time-domain statistical features offer an effective approach to robot failure classification.
(This article belongs to the Section Fault Diagnosis & Sensors)
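The twelve Domain-12 statistics named in the abstract can be computed directly with NumPy. A sketch under stated assumptions: the function name is illustrative, and skewness and kurtosis use population moment formulas (excess kurtosis), which may differ from the authors' exact estimators.

```python
import numpy as np

def domain12(x):
    """The 12 per-sensor statistics from the abstract, computed on a
    single 1D force/torque time series."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]                 # linear trend
    q75, q25 = np.percentile(x, [75, 25])
    return {
        "mean": mu,
        "std": sigma,
        "min": x.min(),
        "max": x.max(),
        "range": x.max() - x.min(),
        "slope": slope,
        "median": np.median(x),
        "skewness": np.mean((x - mu) ** 3) / sigma ** 3,
        "kurtosis": np.mean((x - mu) ** 4) / sigma ** 4 - 3.0,  # excess
        "rms": np.sqrt(np.mean(x ** 2)),
        "energy": np.sum(x ** 2),
        "iqr": q75 - q25,
    }

feats = domain12([1.0, 2.0, 3.0, 4.0, 5.0])
```

Applied to each of the six force/torque channels, this yields the 72-dimensional Domain-12 feature vector.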
21 pages, 5786 KB  
Article
Uncertainty3D: A Lightweight Tri-Dimensional Uncertainty Framework for CNN-Based Active Learning in Object Detection
by Qing Li, Chunhe Xia, Zhipeng Zhang and Wenting Ma
Appl. Sci. 2026, 16(5), 2503; https://doi.org/10.3390/app16052503 - 5 Mar 2026
Abstract
In object detection, annotation cost and computational efficiency are important factors in iterative model improvement under standard benchmark settings. Active learning (AL) addresses this challenge by selecting informative samples for labeling; however, many detection-oriented AL methods incur substantial overhead due to repeated inference (e.g., augmentation-based consistency). This paper introduces Uncertainty3D, a lightweight uncertainty proxy designed for standard CNN-based object detectors. It leverages native pre-NMS predictions to estimate sample informativeness using a single forward pass. We propose a tri-dimensional formulation that captures inconsistencies in position, scale, and category across proposal-consistent predictions. Experiments on PASCAL VOC and MS COCO using representative CNN-based detectors (Faster R-CNN and RetinaNet) show competitive mAP versus representative baselines and about 3–4× faster uncertainty estimation than augmentation-based baselines.
27 pages, 5957 KB  
Article
A Study of the Three-Dimensional Localization of an Underwater Glider Hull Using a Hierarchical Convolutional Neural Network Vision Encoder and a Variable Mixture-of-Experts Transformer
by Jungwoo Lee, Ji-Hyun Park, Jeong-Hwan Hwang, Kyoungseok Noh and Jinho Suh
Remote Sens. 2026, 18(5), 793; https://doi.org/10.3390/rs18050793 - 5 Mar 2026
Abstract
Although underwater gliders are highly energy-efficient platforms capable of long-duration and large-scale ocean observation, their lack of self-propulsion requires external assistance for recovery upon mission completion. In harsh and dynamic marine environments, reliably detecting the glider and accurately estimating its three-dimensional position are critical to ensuring the recovery operations are safe and efficient. This paper proposes a perception framework based on deep learning to detect underwater glider hulls and estimate their three-dimensional relative positions using camera–sonar multi-sensor fusion. This approach integrates a hierarchical convolutional neural network (CNN) vision encoder and a transformer-based architecture to estimate the glider’s spatial location and heading direction simultaneously. The hierarchical CNN encoder extracts multi-level, semantically rich visual features, thereby improving robustness to visual degradation and environmental disturbances common in underwater settings. Additionally, the transformer incorporates a variable mixture-of-experts (vMoE) mechanism that adaptively allocates expert networks across layers, enhancing representational capacity while maintaining computational efficiency. The resulting pose estimates enable precise, collision-free ROV navigation for automated recovery and onboard sensor inspection tasks. Experimental results, including ablation studies, validate the effectiveness of the proposed components and demonstrate their contributions to accurate glider hull detection and three-dimensional localization. Overall, the proposed framework provides a scalable, reliable perception solution that allows for the safe, autonomous recovery of underwater gliders with an ROV in realistic ocean environments.
33 pages, 4786 KB  
Article
A Hierarchical Multi-View Deep Learning Framework for Autism Classification Using Structural and Functional MRI
by Nayif Mohammed Hammash and Mohammed Chachan Younis
J. Imaging 2026, 12(3), 109; https://doi.org/10.3390/jimaging12030109 - 4 Mar 2026
Abstract
Autism classification is challenging due to the subtle, heterogeneous, and overlapping neural activation profiles that occur in individuals with autism. Deep learning approaches such as Convolutional Neural Networks (CNNs) and their variants, as well as Transformers, have shown moderate performance in discriminating between autism and control cohorts; yet, they often struggle to jointly capture the spatial–structural and temporal–functional variations present in autistic brains. To overcome these shortcomings, we propose a novel hierarchical deep learning framework that extracts the inherent spatial dependencies from dual-modal MRI scans. For sMRI, we develop a 3D Hierarchical Convolutional Neural Network that captures both fine and coarse anatomical structures via multi-view projections along the axial, sagittal, and coronal planes. For fMRI, we introduce a bidirectional LSTM-based temporal encoder to examine regional brain dynamics and functional connectivity. The sequential embeddings and correlations are combined into a unified spatiotemporal representation of functional imaging, which is then classified using a multilayer perceptron to ensure continuity in diagnostic predictions across the examined modalities. Finally, a cross-modality fusion scheme integrates the feature representations of both modalities. Extensive evaluations on the ABIDE I dataset (NYU repository) demonstrate that the proposed framework outperforms existing baselines, including Vision/Swin Transformers and several recent CNN variants. The sMRI branch achieves 90.19 ± 0.12% accuracy (precision: 90.85 ± 0.16%, recall: 89.27 ± 0.19%, F1-score: 90.05 ± 0.14%, focal loss: 0.3982), and the fMRI branch achieves 88.93 ± 0.15% accuracy (precision: 89.78 ± 0.18%, recall: 88.29 ± 0.20%, F1-score: 89.03 ± 0.17%, focal loss: 0.4437). These outcomes affirm the generalization and robustness of the proposed framework in integrating structural and functional brain representations for accurate autism classification.
(This article belongs to the Section Medical Imaging)
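Turning a 3D sMRI volume into three 2D views along the axial, sagittal, and coronal planes can be done with simple intensity projections. A sketch under stated assumptions: mean-intensity projection is one common choice, not necessarily the paper's exact scheme, and the function name is illustrative.

```python
import numpy as np

def multiview_projections(volume):
    """Mean-intensity projections of a 3D volume along its three axes,
    yielding the axial, coronal, and sagittal 2D views a multi-view
    CNN can consume (axis naming assumes a standard RAS-like layout)."""
    return {
        "axial": volume.mean(axis=0),
        "coronal": volume.mean(axis=1),
        "sagittal": volume.mean(axis=2),
    }

vol = np.arange(24, dtype=float).reshape(2, 3, 4)  # toy 2x3x4 volume
views = multiview_projections(vol)
```

Each view can then be processed by a 2D CNN branch, with the three branches' features merged downstream.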
13 pages, 1465 KB  
Article
Data Augmentation via Auxiliary Classifier GAN for Enhanced Modeling of Gallium Nitride HEMT Devices
by Yifei Liu, Yihan Qian, Yefeng Hu and Ye Wu
Electronics 2026, 15(5), 1067; https://doi.org/10.3390/electronics15051067 - 4 Mar 2026
Abstract
Accurate and efficient modeling of AlGaN/GaN HEMTs is essential for the design of next-generation power electronics. This study introduces a hybrid Auxiliary Classifier Generative Adversarial Network (ACGAN)–mixup data augmentation framework to enhance deep neural network application in AlGaN/GaN high-electron-mobility transistor modeling with limited data. Based on only 20 distinctive devices, ACGAN uses technology computer-aided design (TCAD)-calibrated data to generate high-quality synthetic drain current (Ids) under various electronic bias conditions. The quality of the generated data is validated via Jensen–Shannon divergence with an average of 0.0341. A one-dimensional convolutional neural network (1D-CNN) predictive model is trained on augmented data and achieves stable convergence, with a mean absolute error of 0.002 A/mm for the off-state Ids and 0.052 A/mm for the linear region. It also shows improved robustness over the model trained on original non-augmented data. The proposed approach offers a low-cost alternative to resource-intensive TCAD simulations, enabling accurate device modeling with limited data.
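The mixup half of the ACGAN–mixup framework is a standard augmentation: form a convex combination of two samples and their labels with a weight drawn from a Beta(α, α) distribution. A minimal sketch (names and the default α are illustrative, not the authors' settings):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """mixup augmentation: lam * sample1 + (1 - lam) * sample2, with
    lam ~ Beta(alpha, alpha); labels are mixed with the same weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Mixing ACGAN-generated samples with measured ones in this way smooths the training distribution, which is one reason the combination helps in the small-data regime.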
27 pages, 7237 KB  
Article
Multiperiod EV Charging Demand Projections: Multistage 1D-CNN Adoption Forecasting and Agent-Based Simulation
by Bunga Kharissa Laras Kemala, Isti Surjandari and Zulkarnain Zulkarnain
World Electr. Veh. J. 2026, 17(3), 125; https://doi.org/10.3390/wevj17030125 - 2 Mar 2026
Abstract
As a promising alternative for cleaner vehicles, the growth of Battery Electric Vehicle (BEV) adoption should be supported by a reliable charging infrastructure. Therefore, projecting the charging load is required to ensure that the electricity supply is adequate as BEV adoption increases. This study proposes a multistage approach for projecting BEV charging load demand, linking a One-dimensional Convolutional Neural Network (1D-CNN) forecasting model with BEV users’ travel behavior analysis to perform spatiotemporal agent-based trip and charging simulations, which model various types of BEVs traveling across multiple regions. The 1D-CNN model achieves high performance with an RMSE of 0.073 and an R2 of 0.881, providing a 10-year BEV adoption outlook. The empirical study in nine regions of Greater Jakarta, Indonesia, shows the one-week temporal charging load demand for three milestone periods—2025, 2030, and 2035—exploring weekday and weekend demand, as well as home and public charging demand at points of interest (POIs). This study identifies a difference between aggregate charging load demand and per-vehicle load intensity: the aggregate demand concentration occurs in South Jakarta (21% for public charging and 22% for home charging), while the highest per-vehicle spatial concentration ratio occurs in Depok (36% for public charging and 16% for home charging) due to long-distance travel patterns. The distribution of charging demand at the subdistrict level provides a basis for charging infrastructure placement, transformer sizing, and charging tariff design.
(This article belongs to the Section Charging Infrastructure and Grid Integration)
28 pages, 2976 KB  
Article
DeepHits: A Multimodal CNN Approach to Hit Song Prediction
by Michael Nofer, Valdrin Nimani and Oliver Hinz
Mach. Learn. Knowl. Extr. 2026, 8(3), 58; https://doi.org/10.3390/make8030058 - 2 Mar 2026
Abstract
Hit Song Science aims to forecast a song’s success before release and benefits from integrating signals beyond audio content alone. We present DeepHits, an end-to-end multimodal network that combines (i) log-Mel spectrogram embeddings from a compact residual 2D-CNN, (ii) frozen multilingual BERT lyric embeddings, and (iii) structured numeric features including high-level Spotify audio descriptors and contextual metadata (artist popularity, release year). Evaluated on 92,517 tracks from the SpotGenTrack dataset, DeepHits achieves a macro-F1 of 52.20% (accuracy 82.63%) in the established three-class setting and a macro-F1 of 23.15% (accuracy 37.00%) in a ten-class decile benchmark. To contextualize fine-grained performance, we report capacity-controlled shallow baselines, including metadata-only and early/late fusion variants, and show that the deep multimodal model provides a clear gain over these references (e.g., metadata-only: macro-F1 20.92%; accuracy 34.22%). Ablation results indicate that removing metadata yields the largest degradation in class-balanced performance, highlighting the strong predictive value of artist popularity and release year. Overall, DeepHits provides a reproducible benchmark and modality analysis for fine-grained popularity prediction under class imbalance.
17 pages, 2985 KB  
Article
Automated BRDF Measurement for Aerospace Materials and 1D-CNN-Based Estimation of Mixed-Material Composition
by Depu Yao, Yulai Sun, Limin He, Heng Wu, Guanyu Lin, Jianing Wang and Zihui Zhang
Sensors 2026, 26(5), 1560; https://doi.org/10.3390/s26051560 - 2 Mar 2026
Abstract
With the growing global emphasis on space resources, the significance of space detection and surveillance technologies has escalated. Currently, space-based optical surveillance stands as the primary means for acquiring information on space objects. However, constrained by the diffraction limits of space telescopes, distant space objects are typically imaged as point sources. The resulting lack of sufficient spatial resolution renders traditional image-based recognition algorithms ineffective. In contrast, the Bidirectional Reflectance Distribution Function (BRDF) fully characterizes surface light scattering properties through four-dimensional features, significantly outperforming traditional two-dimensional spectral techniques in material identification. Consequently, leveraging BRDF signatures at varying phase angles has emerged as an effective approach for Space Object Identification. In this study, we developed an automated BRDF measurement system to characterize various typical aerospace materials and investigated the BRDF properties of mixed-material surfaces. A material composition ratio prediction model was constructed based on a One-Dimensional Convolutional Neural Network (1D-CNN). This model effectively extracts key features, including local slope variations and global waveform characteristics, from the BRDF curves. Experimental results demonstrate that the model achieves a maximum relative percentage error of 6.21%, implying a prediction accuracy for mixed-material composition ratios consistently exceeding 93.79%. Compared to image classification methods based on remote sensing imagery, the proposed approach offers higher computational efficiency, significantly reduced model complexity and computational cost, and enhanced robustness. This work provides essential data support for material identification by space-based telescopes and establishes an algorithmic and experimental foundation for intelligent space situational awareness systems.
(This article belongs to the Section Optical Sensors)
17 pages, 1732 KB  
Article
Lightweight Visual Dynamic Gesture Recognition System Based on CNN-LSTM-DSA
by Zhenxing Wang, Ziyan Wu, Ruidi Qi and Xuan Dou
Sensors 2026, 26(5), 1558; https://doi.org/10.3390/s26051558 - 2 Mar 2026
Abstract
Addressing the challenges of large-scale gesture recognition models, high computational complexity, and inefficient deployment on embedded devices, this study designs and implements a visual dynamic gesture recognition system based on a lightweight CNN-LSTM-DSA model. The system captures user hand images via a camera, extracts 21 keypoint 3D coordinates using MediaPipe, and employs a lightweight hybrid model to perform spatial and temporal feature modeling on keypoint sequences, achieving high-precision recognition of complex dynamic gestures. In static gesture recognition, the system determines the gesture state through joint angle calculation and a sliding window smoothing algorithm, ensuring smooth mapping of the servo motor angles and stability of the robotic hand’s movements. In dynamic gesture recognition, the system models the key point time series based on the CNN-LSTM-DSA hybrid model, enabling accurate classification and reproduction of gesture actions. Experimental results show that the proposed system demonstrates good robustness under various lighting and background conditions, with a static gesture recognition accuracy of up to 96%, dynamic gesture recognition accuracy of 90.19%, and an overall response delay of less than 300 ms. Full article
(This article belongs to the Section Sensing and Imaging)
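The joint angle calculation used for static gestures reduces to the angle between two bone vectors meeting at a landmark. A minimal sketch, assuming three MediaPipe-style 3D keypoints; the function name is illustrative:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b (degrees) formed by segments b->a and b->c,
    e.g. a finger joint computed from three 3D hand landmarks."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # clip guards against floating-point values just outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

angle = joint_angle([1, 0, 0], [0, 0, 0], [0, 1, 0])
```

Thresholding such angles per finger (e.g. extended vs. curled) and smoothing them over a sliding window is one straightforward way to derive a stable static-gesture state.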