Search Results (121)

Search Parameters:
Keywords = human motion state recognition

25 pages, 16941 KiB  
Article
KAN-Sense: Keypad Input Recognition via CSI Feature Clustering and KAN-Based Classifier
by Minseok Koo and Jaesung Park
Electronics 2025, 14(15), 2965; https://doi.org/10.3390/electronics14152965 - 24 Jul 2025
Abstract
Wi-Fi sensing leverages variations in CSI (channel state information) to infer human activities in a contactless and low-cost manner, with growing applications in smart homes, healthcare, and security. While deep learning has advanced macro-motion sensing tasks, micro-motion sensing such as keypad stroke recognition remains underexplored due to subtle inter-class CSI variations and significant intra-class variance. These challenges make it difficult for existing deep learning models, which typically rely on fully connected MLPs, to accurately recognize keypad inputs. To address these issues, we propose a novel approach that combines a discriminative feature extractor with a Kolmogorov–Arnold Network (KAN)-based classifier. The combined model is trained to reduce intra-class variability by clustering features around class-specific centers. The KAN classifier learns nonlinear spline functions to efficiently delineate the complex decision boundaries between different keypad inputs with fewer parameters. To validate our method, we collect a CSI dataset with low-cost Wi-Fi devices (ESP8266 and Raspberry Pi 4) in a real-world keypad sensing environment. Experimental results verify the effectiveness and practicality of our method for keypad input sensing applications: it outperforms existing approaches in sensing accuracy while requiring fewer parameters. Full article
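
The "clustering features around class-specific centers" objective resembles a center loss; a minimal PyTorch sketch of that idea, assuming learnable per-class centers (all names are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Pull each feature vector toward a learnable center for its class,
    shrinking intra-class variance (a sketch of the clustering objective)."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Mean squared distance between each feature and its class center.
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

# Typical use: total = cross_entropy + lambda_c * center_loss(feats, labels)
```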

18 pages, 9571 KiB  
Article
TCN-MAML: A TCN-Based Model with Model-Agnostic Meta-Learning for Cross-Subject Human Activity Recognition
by Chih-Yang Lin, Chia-Yu Lin, Yu-Tso Liu, Yi-Wei Chen, Hui-Fuang Ng and Timothy K. Shih
Sensors 2025, 25(13), 4216; https://doi.org/10.3390/s25134216 - 6 Jul 2025
Viewed by 269
Abstract
Human activity recognition (HAR) using Wi-Fi-based sensing has emerged as a powerful, non-intrusive solution for monitoring human behavior in smart environments. Unlike wearable sensor systems that require user compliance, Wi-Fi channel state information (CSI) enables device-free recognition by capturing variations in signal propagation caused by human motion. This makes Wi-Fi sensing highly attractive for ambient healthcare, security, and elderly care applications. However, real-world deployment faces two major challenges: (1) significant cross-subject signal variability due to physical and behavioral differences among individuals, and (2) limited labeled data, which restricts model generalization. To address these sensor-related challenges, we propose TCN-MAML, a novel framework that integrates temporal convolutional networks (TCN) with model-agnostic meta-learning (MAML) for efficient cross-subject adaptation in data-scarce conditions. We evaluate our approach on a public Wi-Fi CSI dataset using a strict cross-subject protocol, where training and testing subjects do not overlap. The proposed TCN-MAML achieves 99.6% accuracy, demonstrating superior generalization and efficiency over baseline methods. Experimental results confirm the framework’s suitability for low-power, real-time HAR systems embedded in IoT sensor networks. Full article
(This article belongs to the Special Issue Sensors and Sensing Technologies for Object Detection and Recognition)
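
For readers unfamiliar with MAML, a first-order sketch of one meta-update, assuming a PyTorch model and torch.func.functional_call (function and variable names are ours, not the authors'):

```python
import torch

def maml_step(model, loss_fn, support, query, inner_lr=0.01):
    """One (first-order) MAML meta-update sketch: adapt on a subject's
    support set, then score the adapted weights on its query set."""
    (xs, ys), (xq, yq) = support, query
    # Inner loop: take one gradient step on a copy of the parameters.
    fast = {n: p.clone() for n, p in model.named_parameters()}
    inner_loss = loss_fn(torch.func.functional_call(model, fast, (xs,)), ys)
    grads = torch.autograd.grad(inner_loss, list(fast.values()))
    fast = {n: p - inner_lr * g for (n, p), g in zip(fast.items(), grads)}
    # Outer objective: backprop this through the original parameters
    # (pass create_graph=True above for the full second-order variant).
    return loss_fn(torch.func.functional_call(model, fast, (xq,)), yq)
```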

25 pages, 2723 KiB  
Article
A Human-Centric, Uncertainty-Aware Event-Fused AI Network for Robust Face Recognition in Adverse Conditions
by Akmalbek Abdusalomov, Sabina Umirzakova, Elbek Boymatov, Dilnoza Zaripova, Shukhrat Kamalov, Zavqiddin Temirov, Wonjun Jeong, Hyoungsun Choi and Taeg Keun Whangbo
Appl. Sci. 2025, 15(13), 7381; https://doi.org/10.3390/app15137381 - 30 Jun 2025
Cited by 1 | Viewed by 271
Abstract
Face recognition systems often falter when deployed in uncontrolled settings, grappling with low light, unexpected occlusions, motion blur, and the degradation of sensor signals. Most contemporary algorithms chase raw accuracy yet overlook the pragmatic need for uncertainty estimation and multispectral reasoning rolled into a single framework. This study introduces HUE-Net—a Human-centric, Uncertainty-aware, Event-fused Network—designed specifically to thrive under severe environmental stress. HUE-Net marries the visible RGB band with near-infrared (NIR) imagery and high-temporal-event data through an early-fusion pipeline, proven more responsive than serial approaches. A custom hybrid backbone that couples convolutional networks with transformers keeps the model nimble enough for edge devices. Central to the architecture is the perturbed multi-branch variational module, which distills probabilistic identity embeddings while delivering calibrated confidence scores. Complementing this, an Adaptive Spectral Attention mechanism dynamically reweights each stream to amplify the most reliable facial features in real time. Unlike previous efforts that compartmentalize uncertainty handling, spectral blending, or computational thrift, HUE-Net unites all three in a lightweight package. Benchmarks on the IJB-C and N-SpectralFace datasets illustrate that the system not only secures state-of-the-art accuracy but also exhibits unmatched spectral robustness and reliable probability calibration. The results indicate that HUE-Net is well-positioned for forensic missions and humanitarian scenarios where trustworthy identification cannot be deferred. Full article
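
The Adaptive Spectral Attention idea — reweighting the RGB, NIR, and event streams by estimated reliability — might look like the following gating sketch (our guess at the general mechanism, not the paper's code):

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """Softmax gate over per-stream embeddings so the most reliable
    spectral band dominates the fused feature (illustrative only)."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, streams):                   # list of (B, dim) tensors
        x = torch.stack(streams, dim=1)           # (B, S, dim)
        w = torch.softmax(self.score(x), dim=1)   # (B, S, 1) stream weights
        return (w * x).sum(dim=1)                 # fused (B, dim)

# fused = SpectralAttention(256)([rgb_feat, nir_feat, event_feat])
```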

16 pages, 6543 KiB  
Article
IoT-Edge Hybrid Architecture with Cross-Modal Transformer and Federated Manifold Learning for Safety-Critical Gesture Control in Adaptive Mobility Platforms
by Xinmin Jin, Jian Teng and Jiaji Chen
Future Internet 2025, 17(7), 271; https://doi.org/10.3390/fi17070271 - 20 Jun 2025
Viewed by 641
Abstract
This research presents an IoT-empowered adaptive mobility framework that integrates high-dimensional gesture recognition with edge-cloud orchestration for safety-critical human–machine interaction. The system architecture establishes a three-tier IoT network: a perception layer with 60 GHz FMCW radar and TOF infrared arrays (12-node mesh topology, 15 cm baseline spacing) for real-time motion tracking; an edge intelligence layer deploying a time-aware neural network via NVIDIA Jetson Nano to achieve up to 99.1% recognition accuracy with latency as low as 48 ms under optimal conditions (typical performance: 97.8% ± 1.4% accuracy, 68.7 ms ± 15.3 ms latency); and a federated cloud layer enabling distributed model synchronization across 32 edge nodes via LoRaWAN-optimized protocols (κ = 0.912 consensus). A reconfigurable chassis with three operational modes (standing, seated, balance) employs IoT-driven kinematic optimization for enhanced adaptability and user safety. Using both radar and infrared sensors together reduces false detections to 0.08% even under high-vibration conditions (80 km/h), while distributed learning across multiple devices maintains consistent accuracy (variance < 5%) in different environments. Experimental results demonstrate 93% reliability improvement over HMM baselines and 3.8% accuracy gain over state-of-the-art LSTM models, while achieving 33% faster inference (48.3 ms vs. 72.1 ms). The system maintains industrial-grade safety certification with energy-efficient computation. Bridging adaptive mechanics with edge intelligence, this research pioneers a sustainable IoT-edge paradigm for smart mobility, harmonizing real-time responsiveness, ecological sustainability, and scalable deployment in complex urban ecosystems. Full article
(This article belongs to the Special Issue Convergence of IoT, Edge and Cloud Systems)
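
The reported drop in false detections from pairing radar with infrared is consistent with a simple agreement rule; a minimal sketch, assuming each modality outputs a detection confidence (thresholds are invented):

```python
def fused_detection(radar_conf: float, ir_conf: float,
                    t_radar: float = 0.6, t_ir: float = 0.6) -> bool:
    """Fire only when both modalities agree, trading a little recall
    for far fewer false positives under vibration or clutter."""
    return radar_conf >= t_radar and ir_conf >= t_ir
```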

19 pages, 30474 KiB  
Article
Multi-Head Attention-Based Framework with Residual Network for Human Action Recognition
by Basheer Al-Tawil, Magnus Jung, Thorsten Hempel and Ayoub Al-Hamadi
Sensors 2025, 25(9), 2930; https://doi.org/10.3390/s25092930 - 6 May 2025
Viewed by 710
Abstract
Human action recognition (HAR) is essential for understanding and classifying human movements. It is widely used in real-life applications such as human–computer interaction and assistive robotics. However, recognizing patterns across different temporal scales remains challenging. Traditional methods struggle with complex timing patterns, intra-class variability, and inter-class similarities, leading to misclassifications. In this paper, we propose a deep learning framework for efficient and robust HAR. It integrates residual networks (ResNet-18) for spatial feature extraction and Bi-LSTM for temporal feature extraction. A multi-head attention mechanism enhances the prioritization of crucial motion details. Additionally, we introduce a motion-based frame selection strategy utilizing optical flow to reduce redundancy and enhance efficiency. This ensures accurate, real-time recognition of both simple and complex actions. We evaluate the framework on the UCF-101 dataset, achieving a 96.60% accuracy, demonstrating competitive performance against state-of-the-art approaches. Moreover, the framework operates at 222 frames per second (FPS), achieving an optimal balance between recognition performance and computational efficiency. The proposed framework was also deployed and tested on a mobile service robot, TIAGo, validating its real-time applicability in real-world scenarios. It effectively models human actions while minimizing frame dependency, making it well-suited for real-time applications. Full article
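
The motion-based frame selection could be approximated with dense optical flow, keeping only frames with the largest mean flow magnitude; a sketch with OpenCV (parameter values are illustrative, not the paper's):

```python
import cv2
import numpy as np

def select_moving_frames(frames, keep_ratio=0.5):
    """Score frames by mean Farneback flow magnitude and keep the
    most dynamic ones, in original temporal order."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    scores = [0.0]  # first frame has no predecessor
    for prev, cur in zip(grays, grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        scores.append(float(np.linalg.norm(flow, axis=2).mean()))
    keep = max(1, int(len(frames) * keep_ratio))
    idx = np.argsort(scores)[::-1][:keep]
    return [frames[i] for i in sorted(idx)]
```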

18 pages, 3238 KiB  
Article
Multi-Grained Temporal Clip Transformer for Skeleton-Based Human Activity Recognition
by Peiwang Zhu, Chengwu Liang, Yalong Liu and Songqi Jiang
Appl. Sci. 2025, 15(9), 4768; https://doi.org/10.3390/app15094768 - 25 Apr 2025
Cited by 1 | Viewed by 587
Abstract
Skeleton-based human activity recognition is a key research topic in the fields of deep learning and computer vision. However, existing approaches are less effective at capturing short-term sub-action information at different granularity levels and long-term motion correlations, which affect recognition accuracy. To overcome these challenges, an innovative multi-grained temporal clip transformer (MTC-Former) is proposed. Firstly, based on the transformer backbone, a multi-grained temporal clip attention (MTCA) module with multi-branch architecture is proposed to capture the characteristics of short-term sub-action features. Secondly, an innovative multi-scale spatial–temporal feature interaction module is proposed to jointly learn sub-action dependencies and facilitate skeletal motion interactions, where long-range motion patterns are embedded to enhance correlation modeling. Experiments were conducted on three datasets, including NTU RGB+D, NTU RGB+D 120, and InHARD, and achieved state-of-the-art Top-1 recognition accuracy, demonstrating the superiority of the proposed MTC-Former. Full article
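
The multi-grained clip idea can be pictured as viewing one skeleton sequence at several clip lengths at once; a shape-level sketch (clip lengths are arbitrary, not the paper's settings):

```python
import torch

def clip_views(x: torch.Tensor, clip_lens=(4, 8, 16)):
    """Split a (B, T, C) skeleton sequence into non-overlapping clips at
    several granularities; each view would feed one attention branch."""
    views = []
    for L in clip_lens:
        T = x.size(1) - x.size(1) % L            # trim so L divides T
        views.append(x[:, :T].reshape(x.size(0), T // L, L, x.size(2)))
    return views  # one (B, num_clips, L, C) tensor per granularity
```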

22 pages, 8938 KiB  
Article
Enhancing Hand Gesture Image Recognition by Integrating Various Feature Groups
by Ismail Taha Ahmed, Wisam Hazim Gwad, Baraa Tareq Hammad and Entisar Alkayal
Technologies 2025, 13(4), 164; https://doi.org/10.3390/technologies13040164 - 19 Apr 2025
Cited by 1 | Viewed by 1007
Abstract
Human gesture image recognition is the process of identifying, deciphering, and classifying human gestures in images or video frames using computer vision algorithms. These gestures can vary from the simplest hand motions, body positions, and facial emotions to complicated gestures. Two significant problems affecting the performance of human gesture image recognition methods are ambiguity and invariance. Ambiguity occurs when gestures have the same shape but different orientations, while invariance guarantees that gestures are correctly classified even when scale, lighting, or orientation varies. To overcome these issues, hand-crafted features can be combined with deep learning to greatly improve the performance of hand gesture image recognition models. This combination improves the model’s overall accuracy and dependability in identifying a variety of hand movements by enhancing its capacity to record both shape and texture properties. Thus, in this study, we propose a hand gesture recognition method that combines ResNet50 feature extraction with the Tamura texture descriptor and uses the adaptability of GAM to represent intricate interactions between the features. Experiments were carried out on publicly available datasets containing images of American Sign Language (ASL) gestures. As Tamura-ResNet50-OptimizedGAM achieved the highest accuracy rate in the ASL datasets, it is believed to be the best option for human gesture image recognition. According to the experimental results, the accuracy rate was 96%, which is higher than the total accuracy of the state-of-the-art techniques currently in use. Full article
(This article belongs to the Section Information and Communication Technologies)
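
The fusion step amounts to concatenating deep and hand-crafted descriptors before the classifier; a sketch with torchvision (the Tamura vector is assumed to come from a separate routine not shown here):

```python
import numpy as np
import torch
from torchvision import models

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop fc head
backbone.eval()

def fused_features(img: torch.Tensor, tamura_vec: np.ndarray) -> np.ndarray:
    """Concatenate 2048-D ResNet50 features with a hand-crafted Tamura
    texture vector (computed elsewhere) for the downstream classifier."""
    with torch.no_grad():
        deep = backbone(img.unsqueeze(0)).flatten().numpy()  # (2048,)
    return np.concatenate([deep, tamura_vec])
```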

28 pages, 6367 KiB  
Article
Human Action Recognition from Videos Using Motion History Mapping and Orientation Based Three-Dimensional Convolutional Neural Network Approach
by Ishita Arora and M. Gangadharappa
Modelling 2025, 6(2), 33; https://doi.org/10.3390/modelling6020033 - 18 Apr 2025
Viewed by 1427
Abstract
Human Activity Recognition (HAR) has recently attracted the attention of researchers. Human behavior and human intention are driving the intensification of HAR research rapidly. This paper proposes a novel Motion History Mapping (MHI) and Orientation-based Convolutional Neural Network (CNN) framework for action recognition and classification using Machine Learning. The proposed method extracts oriented rectangular patches over the entire human body to represent the human pose in an action sequence. This distribution is represented by a spatially oriented histogram. The frames were trained with a 3D Convolutional Neural Network model, thus saving time and increasing the Classification Correction Rate (CCR). The K-Nearest Neighbor (KNN) algorithm is used for the classification of human actions. The uniqueness of our model lies in the combination of the Motion History Mapping approach with an Orientation-based 3D CNN, thereby enhancing precision. The proposed method is demonstrated to be effective using four widely used and challenging datasets. A comparison of the proposed method’s performance with current state-of-the-art methods finds that its Classification Correction Rate is higher than that of the existing methods. Our model’s CCRs are 92.91%, 98.88%, 87.97%, and 87.77%, which are remarkably higher than the existing techniques for the KTH, Weizmann, UT-Tower and YouTube datasets, respectively. Thus, our model significantly outperforms the existing models in the literature. Full article
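
A motion history image in its classic form sets just-moved pixels to a maximum timestamp and lets older motion decay, so one grayscale map summarizes a clip. A NumPy sketch over 2-D grayscale frames (threshold and decay values are illustrative):

```python
import numpy as np

def motion_history(frames, tau=30, thresh=25):
    """Build a motion history image from a list of grayscale uint8 frames:
    bright where motion is recent, fading linearly where it is older."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur.astype(np.int16) - prev.astype(np.int16)) >= thresh
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return (mhi / tau * 255).astype(np.uint8)
```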

10 pages, 1379 KiB  
Proceeding Paper
Recognizing Human Emotions Through Body Posture Dynamics Using Deep Neural Networks
by Arunnehru Jawaharlalnehru, Thalapathiraj Sambandham and Dhanasekar Ravikumar
Eng. Proc. 2025, 87(1), 49; https://doi.org/10.3390/engproc2025087049 - 16 Apr 2025
Viewed by 857
Abstract
Body posture dynamics have garnered significant attention in recent years due to their critical role in understanding the emotional states conveyed through human movements during social interactions. Emotions are typically expressed through facial expressions, voice, gait, posture, and overall body dynamics. Among these, body posture provides subtle yet essential cues about emotional states. However, predicting an individual’s gait and posture dynamics poses challenges, given the complexity of human body movement, which involves numerous degrees of freedom compared to facial expressions. Moreover, unlike static facial expressions, body dynamics are inherently fluid and continuously evolving. This paper presents an effective method for recognizing 17 micro-emotions by analyzing kinematic features from the GEMEP dataset using video-based motion capture. We specifically focus on upper body posture dynamics (skeleton points and angle), capturing movement patterns and their dynamic range over time. Our approach addresses the complexity of recognizing emotions from posture and gait by focusing on key elements of kinematic gesture analysis. The experimental results demonstrate the effectiveness of the proposed model, achieving a high accuracy rate of 91.48% for angle metric + DNN and 93.89% for distance + DNN on the GEMEP dataset using a deep neural network (DNN). These findings highlight the potential for our model to advance posture-based emotion recognition, particularly in applications where human body dynamics distance and angle are key indicators of emotional states. Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
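
The distance and angle features referenced above reduce to simple geometry over skeleton keypoints; a sketch (keypoint layout assumed, not GEMEP-specific):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b (degrees) formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pairwise_distances(kps):
    """All unique inter-keypoint distances from a (J, 2) array,
    flattened into one feature vector for the DNN."""
    d = np.linalg.norm(kps[:, None] - kps[None, :], axis=-1)
    return d[np.triu_indices(len(kps), k=1)]
```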

12 pages, 8262 KiB  
Article
High-Sensitivity and Wide-Range Flexible Pressure Sensor Based on Gradient-Wrinkle Structures and AgNW-Coated PDMS
by Xiaoran Liu, Xinyi Wang, Tao Xue, Yingying Zhao and Qiang Zou
Micromachines 2025, 16(4), 468; https://doi.org/10.3390/mi16040468 - 15 Apr 2025
Cited by 1 | Viewed by 763
Abstract
Flexible pressure sensors have garnered significant attention due to their wide range of applications in human motion monitoring and smart wearable devices. However, the fabrication of pressure sensors that offer both high sensitivity and a wide detection range remains a challenging task. In this paper, we propose an AgNW-coated PDMS flexible piezoresistive sensor based on a gradient-wrinkle structure. By modifying the microstructure of PDMS, the sensor demonstrates varying sensitivities and pressure responses across different pressure ranges. The wrinkle microstructure contributes to high sensitivity (0.947 kPa⁻¹) at low pressures, while the PDMS film with a gradient contact height ensures a continuous change in the contact area through the gradual activation of the contact wrinkles, resulting in a wide detection range (10–50 kPa). This paper also investigates the contact state of gradient-wrinkle films under different pressures to further elaborate on the sensor’s sensing mechanism. The sensor’s excellent performance in real-time response to touch behavior, joint motion, swallowing behavior recognition, and grasping behavior detection highlights its broad application prospects in human–computer interaction, human motion monitoring, and intelligent robotics. Full article
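
For reference, the sensitivity figure quoted above is conventionally defined for piezoresistive sensors as the slope of relative current change versus applied pressure — a standard definition, not taken from this paper:

```latex
S = \frac{\partial\,(\Delta I / I_0)}{\partial P}
```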

17 pages, 7673 KiB  
Article
Motion Pattern Recognition via CNN-LSTM-Attention Model Using Array-Based Wi-Fi CSI Sensors in GNSS-Denied Areas
by Ming Xia, Shengmao Que, Nanzhu Liu, Qu Wang and Tuan Li
Electronics 2025, 14(8), 1594; https://doi.org/10.3390/electronics14081594 - 15 Apr 2025
Viewed by 883
Abstract
Human activity recognition (HAR) is vital for applications in fields such as smart homes, health monitoring, and navigation, particularly in GNSS-denied environments where satellite signals are obstructed. Wi-Fi channel state information (CSI) has emerged as a key technology for HAR due to its wide coverage, low cost, and non-reliance on wearable devices. However, existing methods face challenges including significant data fluctuations, limited feature extraction capabilities, and difficulties in recognizing complex movements. This study presents a novel solution by integrating a multi-sensor array of Wi-Fi CSI with deep learning techniques to overcome these challenges. We propose a 2 × 2 array of Wi-Fi CSI sensors, which collects synchronized data from all channels within the CSI receivable range, improving data stability and providing reliable positioning in GNSS-denied environments. Using the CNN-LSTM-attention (C-L-A) framework, this method combines short- and long-term motion features, enhancing recognition accuracy. Experimental results show 98.2% accuracy, demonstrating superior recognition performance compared to single Wi-Fi receivers and traditional deep learning models. Our multi-sensor Wi-Fi CSI and deep learning approach significantly improves HAR accuracy, generalization, and adaptability, making it an ideal solution for GNSS-denied environments in applications such as autonomous navigation and smart cities. Full article
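
A generic CNN-LSTM-attention stack for CSI sequences, to make the C-L-A pipeline concrete (layer sizes are ours; the paper's exact architecture may differ):

```python
import torch
import torch.nn as nn

class CLA(nn.Module):
    """Conv1d for short-term features -> LSTM for long-term dynamics
    -> attention pooling over time -> class logits."""
    def __init__(self, in_ch: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(in_ch, 64, 5, padding=2),
                                 nn.ReLU(), nn.MaxPool1d(2))
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                    # x: (B, subcarriers, time)
        h = self.cnn(x).transpose(1, 2)      # (B, T', 64)
        h, _ = self.lstm(h)                  # (B, T', hidden)
        w = torch.softmax(self.attn(h), dim=1)   # temporal attention weights
        return self.fc((w * h).sum(dim=1))   # (B, num_classes)
```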

14 pages, 567 KiB  
Article
Efficient Human Activity Recognition Using Machine Learning and Wearable Sensor Data
by Ziwei Zhong and Bin Liu
Appl. Sci. 2025, 15(8), 4075; https://doi.org/10.3390/app15084075 - 8 Apr 2025
Viewed by 788
Abstract
With the rapid advancement of global development, there is an increasing demand for health monitoring technologies. Human activity recognition and monitoring systems offer a powerful means of identifying daily movement patterns, which helps in understanding human behaviors and provides valuable insights for life management. This paper explores the issue of human motion state recognition using accelerometers and gyroscopes, proposing a human activity recognition system based on a majority decision model that integrates multiple machine learning algorithms. In this study, the majority decision model was compared with an integer programming model, and the accuracy was assessed through a confusion matrix and cross-validation based on a dataset generated from 10 volunteers performing 12 different human activities. The average activity recognition accuracy of the majority decision model can be as high as 91.92%. The results underscore the superior accuracy and efficiency of the majority decision model in human activity state recognition, highlighting its potential for practical applications in health monitoring systems. Full article
(This article belongs to the Special Issue Human Activity Recognition (HAR) in Healthcare, 2nd Edition)
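
Majority voting over several base learners is directly available in scikit-learn; a sketch with illustrative classifiers (the ensemble members here are our choice, not necessarily the paper's):

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hard voting: each base model casts one vote per window of
# accelerometer/gyroscope features; the plurality label wins.
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier()),
                ("knn", KNeighborsClassifier()),
                ("svm", SVC())],
    voting="hard",
)
# ensemble.fit(X_train, y_train); ensemble.score(X_test, y_test)
```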

57 pages, 8107 KiB  
Review
Machine Learning for Human Activity Recognition: State-of-the-Art Techniques and Emerging Trends
by Md Amran Hossen and Pg Emeroylariffion Abas
J. Imaging 2025, 11(3), 91; https://doi.org/10.3390/jimaging11030091 - 20 Mar 2025
Cited by 1 | Viewed by 3612
Abstract
Human activity recognition (HAR) has emerged as a transformative field with widespread applications, leveraging diverse sensor modalities to accurately identify and classify human activities. This paper provides a comprehensive review of HAR techniques, focusing on the integration of sensor-based, vision-based, and hybrid methodologies. It explores the strengths and limitations of commonly used modalities, such as RGB images/videos, depth sensors, motion capture systems, wearable devices, and emerging technologies like radar and Wi-Fi channel state information. The review also discusses traditional machine learning approaches, including supervised and unsupervised learning, alongside cutting-edge advancements in deep learning, such as convolutional and recurrent neural networks, attention mechanisms, and reinforcement learning frameworks. Despite significant progress, HAR still faces critical challenges, including handling environmental variability, ensuring model interpretability, and achieving high recognition accuracy in complex, real-world scenarios. Future research directions emphasise the need for improved multimodal sensor fusion, adaptive and personalised models, and the integration of edge computing for real-time analysis. Additionally, addressing ethical considerations, such as privacy and algorithmic fairness, remains a priority as HAR systems become more pervasive. This study highlights the evolving landscape of HAR and outlines strategies for future advancements that can enhance the reliability and applicability of HAR technologies in diverse domains. Full article

24 pages, 3877 KiB  
Article
A Hybrid Approach for Sports Activity Recognition Using Key Body Descriptors and Hybrid Deep Learning Classifier
by Muhammad Tayyab, Sulaiman Abdullah Alateyah, Mohammed Alnusayri, Mohammed Alatiyyah, Dina Abdulaziz AlHammadi, Ahmad Jalal and Hui Liu
Sensors 2025, 25(2), 441; https://doi.org/10.3390/s25020441 - 13 Jan 2025
Cited by 8 | Viewed by 1151
Abstract
This paper presents an approach for event recognition in sequential images using human body part features and their surrounding context. Key body points were approximated to track and monitor their presence in complex scenarios. Various feature descriptors, including MSER (Maximally Stable Extremal Regions), SURF (Speeded-Up Robust Features), distance transform, and DOF (Degrees of Freedom), were applied to skeleton points, while BRIEF (Binary Robust Independent Elementary Features), HOG (Histogram of Oriented Gradients), FAST (Features from Accelerated Segment Test), and Optical Flow were used on silhouettes or full-body points to capture both geometric and motion-based features. Feature fusion was employed to enhance the discriminative power of the extracted data and the physical parameters calculated by different feature extraction techniques. The system utilized a hybrid CNN (Convolutional Neural Network) + RNN (Recurrent Neural Network) classifier for event recognition, with Grey Wolf Optimization (GWO) for feature selection. Experimental results showed significant accuracy, achieving 98.5% on the UCF-101 dataset and 99.2% on the YouTube dataset. Compared to state-of-the-art methods, our approach achieved better performance in event recognition. Full article
(This article belongs to the Section Intelligent Sensors)
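
Several of the descriptors named above are one-liners in common libraries; for instance, HOG on a silhouette crop via scikit-image (parameters are typical defaults, not the paper's settings):

```python
from skimage.feature import hog

def hog_descriptor(silhouette):
    """9-bin HOG over 8x8-pixel cells with 2x2-cell blocks on a
    2-D grayscale silhouette crop; returns a flat feature vector."""
    return hog(silhouette, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
```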

17 pages, 11589 KiB  
Article
Deep Fusion of Skeleton Spatial–Temporal and Dynamic Information for Action Recognition
by Song Gao, Dingzhuo Zhang, Zhaoming Tang and Hongyan Wang
Sensors 2024, 24(23), 7609; https://doi.org/10.3390/s24237609 - 28 Nov 2024
Viewed by 1144
Abstract
Focusing on the issue of the low recognition rates achieved by traditional deep-information-based action recognition algorithms, an action recognition approach was developed based on skeleton spatial–temporal and dynamic features combined with a two-stream convolutional neural network (TS-CNN). Firstly, the skeleton’s three-dimensional coordinate system was transformed to obtain coordinate information related to relative joint positions. Subsequently, this relevant joint information was encoded as a color texture map to construct the spatial–temporal feature descriptor of the skeleton. Furthermore, physical structure constraints of the human body were considered to enhance class differences. Additionally, the speed information for each joint was estimated and encoded as a color texture map to achieve the skeleton motion feature descriptor. The resulting spatial–temporal and dynamic features were further enhanced using motion saliency and morphology operators to improve their expression ability. Finally, these enhanced skeleton spatial–temporal and dynamic features were deeply fused via TS-CNN for implementing action recognition. Numerous results from experiments conducted on the publicly available datasets NTU RGB-D, Northwestern-UCLA, and UTD-MHAD demonstrate that the recognition rates achieved via the developed approach are 86.25%, 87.37%, and 93.75%, respectively, indicating that the approach can effectively improve the accuracy of action recognition in complex environments compared to state-of-the-art algorithms. Full article
(This article belongs to the Section Intelligent Sensors)
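
Encoding joint coordinates as a color texture map typically means normalizing x/y/z to [0, 255] and laying out joints versus frames as image rows and columns; a sketch of that encoding (our reading of the common recipe, not the authors' exact mapping):

```python
import numpy as np

def skeleton_to_texture(seq: np.ndarray) -> np.ndarray:
    """Map a (T, J, 3) joint-coordinate sequence to a (J, T, 3) RGB image:
    rows = joints, columns = frames, channels = normalized x/y/z."""
    lo = seq.min(axis=(0, 1))
    hi = seq.max(axis=(0, 1))
    norm = (seq - lo) / (hi - lo + 1e-8)            # per-axis to [0, 1]
    return (norm.transpose(1, 0, 2) * 255).astype(np.uint8)
```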