MDPI - Publisher of Open Access Journals

18 pages, 3407 KiB

Open Access Article

Graph Convolutional Network with Multi-View Topology for Lightweight Skeleton-Based Action Recognition

by Liangliang Wang, Xu Zhang and Chuang Zhang

Symmetry 2025, 17(8), 1235; https://doi.org/10.3390/sym17081235 (registering DOI) - 4 Aug 2025

Skeleton-based action recognition is an important subject in deep learning. Graph Convolutional Networks (GCNs) have demonstrated strong performance by modeling the human skeleton as a natural topological graph, representing the connections between joints. However, most existing methods rely on non-adaptive topologies or insufficiently expressive representations. To address these limitations, we propose a Multi-view Topology Refinement Graph Convolutional Network (MTR-GCN), which is efficient, lightweight, and delivers high performance. Specifically: (1) We propose a new spatial topology modeling approach that incorporates two views. A dynamic view fuses joint information from dual streams in a pairwise manner, while a static view encodes the shortest static paths between joints, preserving the original connectivity relationships. (2) We propose a new MultiScale Temporal Convolutional Network (MSTC), which is efficient and lightweight. (3) Furthermore, we introduce a new temporal topology strategy by modeling temporal frames as a graph, which strengthens the extraction of temporal features. By modeling the human skeleton as both a spatial and a temporal graph, we reveal a topological symmetry between space and time within the unified spatio-temporal framework. The proposed model achieves state-of-the-art performance on several benchmark datasets, including NTU RGB + D (XSub: 92.8%, XView: 96.8%), NTU RGB + D 120 (XSub: 89.6%, XSet: 90.8%), and NW-UCLA (95.7%), demonstrating the effectiveness of our GCN module, TCN module, and overall architecture. Full article
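The static view described above encodes shortest paths between joints. A minimal sketch of that idea, assuming a hypothetical five-joint toy skeleton and plain breadth-first search (this is an illustration, not the MTR-GCN implementation; the names `hop_distance_matrix` and `TOY_BONES` are made up here):

```python
from collections import deque


def hop_distance_matrix(num_joints, bones):
    """Return shortest-path hop counts between all joint pairs via BFS."""
    # Build an undirected adjacency list from the list of bones.
    neighbors = [[] for _ in range(num_joints)]
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # -1 marks pairs that are not connected by any path.
    dist = [[-1] * num_joints for _ in range(num_joints)]
    for start in range(num_joints):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[start][v] == -1:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


# Hypothetical toy skeleton: joint 1 is a hub connected to joints 0, 2, 3, 4.
TOY_BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]
D = hop_distance_matrix(5, TOY_BONES)
```

A matrix like `D` is fixed for a given skeleton, which is why it can serve as a static complement to a learned, data-dependent topology.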

(This article belongs to the Section Computer)

20 pages, 4569 KiB

Open Access Article

Lightweight Vision Transformer for Frame-Level Ergonomic Posture Classification in Industrial Workflows

by Luca Cruciata, Salvatore Contino, Marianna Ciccarelli, Roberto Pirrone, Leonardo Mostarda, Alessandra Papetti and Marco Piangerelli

Sensors 2025, 25(15), 4750; https://doi.org/10.3390/s25154750 (registering DOI) - 1 Aug 2025

Viewed by 178

Abstract

Work-related musculoskeletal disorders (WMSDs) are a leading concern in industrial ergonomics, often stemming from sustained non-neutral postures and repetitive tasks. This paper presents a vision-based framework for real-time, frame-level ergonomic risk classification using a lightweight Vision Transformer (ViT). The proposed system operates directly on raw RGB images without requiring skeleton reconstruction, joint angle estimation, or image segmentation. A single ViT model simultaneously classifies eight anatomical regions, enabling efficient multi-label posture assessment. Training is supervised using a multimodal dataset acquired from synchronized RGB video and full-body inertial motion capture, with ergonomic risk labels derived from RULA scores computed on joint kinematics. The system is validated on realistic, simulated industrial tasks that include common challenges such as occlusion and posture variability. Experimental results show that the ViT model achieves state-of-the-art performance, with F1-scores exceeding 0.99 and AUC values above 0.996 across all regions. Compared to a previous CNN-based system, the proposed model improves classification accuracy and generalizability while reducing complexity and enabling real-time inference on edge devices. These findings demonstrate the model’s potential for unobtrusive, scalable ergonomic risk monitoring in real-world manufacturing environments. Full article

(This article belongs to the Special Issue Secure and Decentralised IoT Systems)

22 pages, 2525 KiB

Open Access Article

mmHSE: A Two-Stage Framework for Human Skeleton Estimation Using mmWave FMCW Radar Signals

by Jiake Tian, Yi Zou and Jiale Lai

Appl. Sci. 2025, 15(15), 8410; https://doi.org/10.3390/app15158410 - 29 Jul 2025

Viewed by 130

Abstract

We present mmHSE, a two-stage framework for human skeleton estimation using dual millimeter-Wave (mmWave) Frequency-Modulated Continuous-Wave (FMCW) radar signals. To enable data-driven model design and evaluation, we collect and process over 30,000 range–angle maps from 12 users across three representative indoor environments using a dual-node radar acquisition platform. Leveraging the collected data, we develop a two-stage neural architecture for human skeleton estimation. The first stage employs a dual-branch network with depthwise separable convolutions and self-attention to extract multi-scale spatiotemporal features from dual-view radar inputs. A cross-modal attention fusion module is then used to generate initial estimates of 21 skeletal keypoints. The second stage refines these estimates using a skeletal topology module based on graph convolutional networks, which captures spatial dependencies among joints to enhance localization accuracy. Experiments show that mmHSE achieves a Mean Absolute Error (MAE) of 2.78 cm. In cross-domain evaluations, the MAE remains at 3.14 cm, demonstrating the method’s generalization ability and robustness for non-intrusive human pose estimation from mmWave FMCW radar signals. Full article

23 pages, 5594 KiB

Open Access Article

Dynamic Properties of Steel-Wrapped RC Column–Beam Joints Connected by Embedded Horizontal Steel Plate: Experimental Study

by Jian Wu, Mingwei Ma, Changhao Wei, Jian Zhou, Yuxi Wang, Jianhui Wang and Weigao Ding

Buildings 2025, 15(15), 2657; https://doi.org/10.3390/buildings15152657 - 28 Jul 2025

Viewed by 287

Abstract

The performance of reinforced concrete (RC) frame structures gradually decreases over time, posing a threat to the safety of buildings. Although some buildings may still meet safety requirements, they cannot meet new usage requirements. In response, this paper proposes a new type of joint to advance research on the reinforcement and renovation of RC frame structures. The RC beams and columns of the joints are connected by an embedded horizontal steel plate (a single plate with dimensions of 150 mm × 200 mm × 5 mm), and the beams and columns are individually wrapped in steel. Through low cyclic loading tests, this paper analyzes the influence of steel wrapping and of the thickness of the wrapped steel of the beam and connector on mechanical performance indicators such as the hysteresis curve, skeleton curve, stiffness, ductility, and energy dissipation. The experimental results indicate that reinforcement using steel plate can significantly improve the dynamic performance of the joint. Changing the thickness of the connector has no significant effect on the dynamic performance of the specimen, while increasing the thickness of the wrapped steel of the beam can effectively improve the overall strength of the joint. The research results of this paper will help promote the application of reinforcement and renovation technology for existing buildings and improve quality of life. Full article

(This article belongs to the Special Issue Advanced Concrete Structures: Structural Behaviors and Design Methods—2nd Edition)

19 pages, 709 KiB

Open Access Article

Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks

by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng

Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025

Viewed by 348

Abstract

Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs, making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and expedient sign language recognition (SLR) from raw videos, this study introduces a novel deep learning approach by devising a multimodal framework for SLR. Specifically, feature extraction models are built based on two modalities: skeleton and RGB images. In this paper, we first propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Second, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the original raw images. Finally, a gating mechanism-based Multi-Stream Fusion Module (MFM) is employed to merge the results of the two modalities. Extensive experiments are conducted on the public datasets AUTSL and WLASL, achieving competitive results compared to state-of-the-art systems. Full article

(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)

13 pages, 1207 KiB

Open Access Article

Subaxial Subluxation (SAS) and Cervical Deformity in Patients with Rheumatoid Arthritis in Relation to Selected Sagittal Balance Parameters

by Robert Wróblewski, Małgorzata Mańczak and Robert Gasik

J. Clin. Med. 2025, 14(14), 4954; https://doi.org/10.3390/jcm14144954 - 13 Jul 2025

Viewed by 330

Abstract

Introduction: Synovitis and damage to natural stabilizers of many axial and peripheral joints make patients with rheumatoid arthritis particularly susceptible to sagittal balance disorders of the axial skeleton. This may determine the high individual variability of cervical spine deformities as well as differences in the rate of development of disease symptoms in these patients, such as radiculopathy and myelopathy. Methods: In the scientific literature, in addition to systemic factors, more and more attention is paid to work on biomechanical factors in the development of cervical spine instability. One of the methods for assessing the influence of biomechanical factors, which can also be used in everyday practice, is the analysis of radiological parameters of sagittal balance. Results: Among the selected sagittal balance parameters studied, a statistical relationship between C4 and C5 distance and the OI parameter has been found, indicating a relationship to a parameter that remains constant throughout an individual’s life in the group of patients with disease duration over 20 years. Conclusions: The development of instability and deformity in the subaxial segment of the cervical spine in patients with rheumatoid arthritis may be the result of insufficiently understood components of biomechanical factors; hence, further research in this field is necessary. Full article

(This article belongs to the Special Issue Rheumatoid Arthritis: Challenges, Innovations and Outcomes)

20 pages, 4620 KiB

Open Access Article

An Interactive Human-in-the-Loop Framework for Skeleton-Based Posture Recognition in Model Education

by Jing Shen, Ling Chen, Xiaotong He, Chuanlin Zuo, Xiangjun Li and Lin Dong

Biomimetics 2025, 10(7), 431; https://doi.org/10.3390/biomimetics10070431 - 1 Jul 2025

Viewed by 447

Abstract

This paper presents a human-in-the-loop interactive framework for skeleton-based posture recognition, designed to support model training and artistic education. A total of 4870 labeled images are used for training and validation, and 500 images are reserved for testing across five core posture categories: standing, sitting, jumping, crouching, and lying. From each image, comprehensive skeletal features are extracted, including joint coordinates, angles, limb lengths, and symmetry metrics. Multiple classification algorithms—traditional (KNN, SVM, Random Forest) and deep learning-based (LSTM, Transformer)—are compared to identify effective combinations of features and models. Experimental results show that deep learning models achieve superior accuracy on complex postures, while traditional models remain competitive with low-dimensional features. Beyond classification, the system integrates posture recognition with a visual recommendation module. Recognized poses are used to retrieve matched examples from a reference library, allowing instructors to browse and select posture suggestions for learners. This semi-automated feedback loop enhances teaching interactivity and efficiency. Among all evaluated methods, the Transformer model achieved the best accuracy of 92.7% on the dataset, demonstrating the effectiveness of our closed-loop framework in supporting pose classification and model training. The proposed framework contributes both algorithmic insights and a novel application design for posture-driven educational support systems. Full article
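The skeletal features listed above (joint angles and limb lengths) follow from basic geometry on keypoint coordinates. A minimal sketch, assuming 2D keypoints given as (x, y) tuples; the function names are hypothetical and the paper's exact feature definitions are not stated in the abstract:

```python
import math


def joint_angle_deg(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("angle is undefined for coincident keypoints")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def limb_length(p, q):
    """Euclidean distance between two keypoints."""
    return math.dist(p, q)


# Example: a right angle at the elbow, and a 3-4-5 forearm.
elbow_angle = joint_angle_deg((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
forearm = limb_length((0.0, 0.0), (3.0, 4.0))
```

Angles and length ratios of this kind are invariant to translation, which is one reason low-dimensional features can remain competitive for traditional classifiers.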

(This article belongs to the Special Issue Biomimetic Innovations for Human–Machine Interaction)

12 pages, 6032 KiB

Open Access Review

Imaging Evaluation of Periarticular Soft Tissue Masses in the Appendicular Skeleton: A Pictorial Review

by Francesco Pucciarelli, Maria Carla Faugno, Daniela Valanzuolo, Edoardo Massaro, Lorenzo Maria De Sanctis, Elisa Zaccaria, Marta Zerunian, Domenico De Santis, Michela Polici, Tiziano Polidori, Andrea Laghi and Damiano Caruso

J. Imaging 2025, 11(7), 217; https://doi.org/10.3390/jimaging11070217 - 30 Jun 2025

Viewed by 312

Abstract

Soft tissue masses are predominantly benign, with a benign-to-malignant ratio exceeding 100:1, often located around joints. They may be contiguous or adjacent to joints or reflect systemic diseases or distant organ involvement. Clinically, they typically present as palpable swellings. Evaluation should consider duration, size, depth, and mobility. Also assess consistency, growth rate, symptoms, and history of trauma, infection, or malignancy. Laboratory tests are generally of limited diagnostic value. The primary clinical goal is to avoid unnecessary investigations or procedures for benign lesions while ensuring timely diagnosis and treatment of malignant ones. Imaging plays a central role: it confirms the presence of the lesion, assesses its location, size, and composition, differentiates between cystic and solid or benign and malignant features, and can sometimes provide a definitive diagnosis. Imaging is also crucial for biopsy planning, treatment strategy, identification of involved structures, and follow-up. Ultrasound (US) is the first-line imaging modality for palpable soft tissue masses due to its low cost, wide availability, and lack of ionizing radiation. If findings are inconclusive, magnetic resonance imaging (MRI) or computed tomography (CT) is recommended. This review aims to discuss the most common causes of periarticular soft tissue masses in the appendicular skeleton, focusing on clinical presentation and radiologic features. Full article

(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives—2nd Edition)

29 pages, 2452 KiB

Open Access Article

A Novel Deep Learning Model for Human Skeleton Estimation Using FMCW Radar

by Parma Hadi Rantelinggi, Xintong Shi, Mondher Bouazizi and Tomoaki Ohtsuki

Sensors 2025, 25(13), 3909; https://doi.org/10.3390/s25133909 - 23 Jun 2025

Viewed by 519

Abstract

Human skeleton estimation using Frequency-Modulated Continuous Wave (FMCW) radar is a promising approach for privacy-preserving motion analysis. However, the existing methods struggle with sparse radar point cloud data, leading to inaccuracies in joint localization. To address this challenge, we propose a novel deep learning framework integrating convolutional neural networks (CNNs), multi-head transformers, and Bi-LSTM networks to enhance spatiotemporal feature representations. Our approach introduces a frame concatenation strategy that improves data quality before processing through the neural network pipeline. Experimental evaluations on the MARS dataset demonstrate that our model outperforms conventional methods by significantly reducing estimation errors, achieving a mean absolute error (MAE) of 1.77 cm and a root mean squared error (RMSE) of 2.92 cm while maintaining computational efficiency. Full article
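For readers comparing the MAE and RMSE figures reported above, the two metrics can be sketched as follows. This assumes errors are averaged over every coordinate of every joint, which is one common convention; the abstract does not state the paper's exact averaging scheme, and `mae_rmse` is a hypothetical helper name:

```python
import math


def mae_rmse(predicted, ground_truth):
    """Return (MAE, RMSE) over all coordinates of all joints."""
    # Flatten per-joint coordinate tuples into a single list of signed errors.
    errors = [
        p - t
        for pred_joint, true_joint in zip(predicted, ground_truth)
        for p, t in zip(pred_joint, true_joint)
    ]
    if not errors:
        raise ValueError("no coordinates to compare")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Two joints in 3D; only one coordinate is off, by 3 units.
pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
truth = [(0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
mae, rmse = mae_rmse(pred, truth)
```

RMSE is never smaller than MAE and grows faster when a few joints are badly mislocalized, so reporting both gives a rough sense of how uneven the errors are.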

(This article belongs to the Special Issue Convolutional Neural Network Technology for 3D Imaging and Sensing)

34 pages, 9431 KiB

Open Access Article

Gait Recognition via Enhanced Visual–Audio Ensemble Learning with Decision Support Methods

by Ruixiang Kan, Mei Wang, Tian Luo and Hongbing Qiu

Sensors 2025, 25(12), 3794; https://doi.org/10.3390/s25123794 - 18 Jun 2025

Viewed by 440

Abstract

Gait is considered a valuable biometric feature, and it is essential for uncovering the latent information embedded within gait patterns. Gait recognition methods are expected to serve as significant components in numerous applications. However, existing gait recognition methods exhibit limitations in complex scenarios. To address these, we construct a dual-Kinect V2 system that focuses more on gait skeleton joint data and related acoustic signals. This setup lays a solid foundation for subsequent methods and updating strategies. The core framework consists of enhanced ensemble learning methods and Dempster–Shafer Evidence Theory (D-SET). Our recognition methods serve as the foundation, and the decision support mechanism is used to evaluate the compatibility of various modules within our system. On this basis, our main contributions are as follows: (1) an improved gait skeleton joint AdaBoost recognition method based on Circle Chaotic Mapping and Gramian Angular Field (GAF) representations; (2) a data-adaptive gait-related acoustic signal AdaBoost recognition method based on GAF and a Parallel Convolutional Neural Network (PCNN); and (3) an amalgamation of the Triangulation Topology Aggregation Optimizer (TTAO) and D-SET, providing a robust and innovative decision support mechanism. These collaborations improve the overall recognition accuracy and demonstrate their considerable application values. Full article

(This article belongs to the Section Intelligent Sensors)

33 pages, 4127 KiB

Open Access Article

Kinematic Skeleton Extraction from 3D Model Based on Hierarchical Segmentation

by Nitinan Mata and Sakchai Tangwannawit

Symmetry 2025, 17(6), 879; https://doi.org/10.3390/sym17060879 - 4 Jun 2025

Viewed by 684

Abstract

A new approach for skeleton extraction has been designed to work directly with 3D point cloud data. It blends hierarchical segmentation with a multi-scale ensemble built on top of modified PointNet models. Outputs from three network variants trained at different spatial resolutions are aggregated using majority voting, unweighted averaging, and adaptive weighting, with the latter yielding the best performance. Each joint is set at the center of its part. A radius-based filter is used to remove any outliers, specifically, points that fall too far from where the joints are expected to be. When evaluated on benchmark datasets such as DFaust, CMU, Kids, and EHF, the model demonstrated strong segmentation accuracy (mIoU = 0.8938) and low joint localization error (MPJPE = 22.82 mm). The method generalizes well to an unseen dataset (DanceDB), maintaining strong performance across diverse body types and poses. Compared to benchmark methods such as L1-Medial, Pinocchio, and MediaPipe, our approach offers greater anatomical symmetry, joint completeness, and robustness in occluded or overlapping regions. Structural integrity is maintained by working directly with 3D data, without the need for 2D projections or medial-axis approximations. The visual assessment of DanceDB results indicates improved anatomical accuracy, even in the absence of quantitative comparison. The outcome supports practical applications in animation, motion tracking, and biomechanics. Full article
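The step of placing each joint at the center of its part and discarding far-away points can be sketched roughly as below. This is an assumption-laden illustration: it measures distance from the initial centroid and recomputes the centroid once, whereas the abstract does not specify how the expected joint position or the radius is chosen; `joint_from_part` is a hypothetical name.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def joint_from_part(points, radius):
    """Estimate a joint as the center of a segmented part, ignoring outliers."""
    center = centroid(points)
    # Keep only points within `radius` of the first centroid estimate.
    kept = [p for p in points if math.dist(p, center) <= radius]
    # Fall back to the unfiltered centroid if the filter removes everything.
    return centroid(kept) if kept else center


# Four points around (0.5, 0.5, 0) plus one mislabeled point far away.
part = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (10, 10, 0)]
joint = joint_from_part(part, radius=4.0)
```

Without the filter, the single stray point would pull the estimate to (2.4, 2.4, 0), which shows why some outlier handling is needed when segmentation labels are noisy.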

(This article belongs to the Special Issue Asymmetric and Symmetric in Deep Computer Vision and Generative Modeling)

25 pages, 9742 KiB

Open Access Article

Autism Spectrum Disorder Detection Using Skeleton-Based Body Movement Analysis via Dual-Stream Deep Learning

by Jungpil Shin, Abu Saleh Musa Miah, Manato Kakizaki, Najmul Hassan and Yoichi Tomioka

Electronics 2025, 14(11), 2231; https://doi.org/10.3390/electronics14112231 - 30 May 2025

Viewed by 628

Abstract

Autism Spectrum Disorder (ASD) poses significant challenges in diagnosis due to its diverse symptomatology and the complexity of early detection. Atypical gait and gesture patterns, prominent behavioural markers of ASD, hold immense potential for facilitating early intervention and optimising treatment outcomes. These patterns can be efficiently and non-intrusively captured using modern computational techniques, making them valuable for ASD recognition. Various types of research have been conducted to detect ASD through deep learning, including facial feature analysis, eye gaze analysis, and movement and gesture analysis. In this study, we optimise a dual-stream architecture that combines image classification and skeleton recognition models to analyse video data for body motion analysis. The first stream processes Skepxels—spatial representations derived from skeleton data—using ConvNeXt-Base, a robust image recognition model that efficiently captures aggregated spatial embeddings. The second stream encodes angular features, embedding relative joint angles into the skeleton sequence and extracting spatiotemporal dynamics using Multi-Scale Graph 3D Convolutional Network (MSG3D), a combination of Graph Convolutional Networks (GCNs) and Temporal Convolutional Networks (TCNs). We replace the ViT model from the original architecture with ConvNeXt-Base to evaluate the efficacy of CNN-based models in capturing gesture-related features for ASD detection. Additionally, we experimented with a Stack Transformer in the second stream instead of MSG3D but found it to result in lower performance accuracy, thus highlighting the importance of GCN-based models for motion analysis. The integration of these two streams ensures comprehensive feature extraction, capturing both global and detailed motion patterns. A pairwise Euclidean distance loss is employed during training to enhance the consistency and robustness of feature representations.
The results from our experiments demonstrate that the two-stream approach, combining ConvNeXt-Base and MSG3D, offers a promising method for effective autism detection. This approach not only enhances accuracy but also contributes valuable insights into optimising deep learning models for gesture-based recognition. By integrating image classification and skeleton recognition, we can better capture both global and detailed motion patterns, which are crucial for improving early ASD diagnosis and intervention strategies. Full article

(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, 4th Edition)

34 pages, 20595 KiB

Open Access Article

Collision-Free Path Planning in Dynamic Environment Using High-Speed Skeleton Tracking and Geometry-Informed Potential Field Method

by Yuki Kawawaki, Kenichi Murakami and Yuji Yamakawa

Robotics 2025, 14(5), 65; https://doi.org/10.3390/robotics14050065 - 17 May 2025

Viewed by 892

Abstract

In recent years, the realization of a society in which humans and robots coexist has become highly anticipated. As a result, robots are expected to exhibit versatility regardless of their operating environments, along with high responsiveness, to ensure safety and enable dynamic task execution. To meet these demands, we design a comprehensive system composed of two primary components: high-speed skeleton tracking and path planning. For tracking, we implement a high-speed skeleton tracking method that combines deep learning-based detection with optical flow-based motion extraction. In addition, we introduce a dynamic search area adjustment technique that focuses on the target joint to extract the desired motion more accurately. For path planning, we propose a high-speed, geometry-informed potential field model that addresses four key challenges: (P1) avoiding local minima, (P2) suppressing oscillations, (P3) ensuring adaptability to dynamic environments, and (P4) handling obstacles with arbitrary 3D shapes. We validated the effectiveness of our high-frequency feedback control and the proposed system through a series of simulations and real-world collision-free path planning experiments. Our high-speed skeleton tracking operates at 250 Hz, which is eight times faster than conventional deep learning-based methods, and our path planning method runs at over 10,000 Hz. The proposed system offers both versatility across different working environments and low latencies. Therefore, we hope that it will contribute to a foundational motion generation framework for human–robot collaboration (HRC), applicable to a wide range of downstream tasks while ensuring safety in dynamic environments. Full article
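As background for the potential field model mentioned above, the classic textbook artificial potential field update can be sketched as follows. This is the standard point-obstacle formulation in 2D, not the geometry-informed variant the paper proposes; the gains `k_att`, `k_rep`, the influence distance `d0`, and the step size are all hypothetical values chosen for illustration.

```python
import math


def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=1.0, step=0.05):
    """Move one step along the combined attractive and repulsive force."""
    # Attractive force pulls linearly toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Repulsion only acts inside the influence distance d0.
        if 0 < d < d0:
            magnitude = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += magnitude * dx / d
            fy += magnitude * dy / d
    return (pos[0] + step * fx, pos[1] + step * fy)


# Free space: the point moves toward the goal.
free_move = apf_step((0.0, 0.0), (1.0, 0.0), [])
# Obstacle directly on the line to the goal: repulsion dominates and pushes back.
blocked_move = apf_step((0.0, 0.0), (1.0, 0.0), [(0.5, 0.0)])
```

The blocked case hints at the local-minimum and oscillation problems listed as (P1) and (P2): with an obstacle on the straight line to the goal, this basic form can push the point back and forth instead of around the obstacle.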

(This article belongs to the Special Issue Visual Servoing-Based Robotic Manipulation)

15 pages, 1463 KiB

Open Access Article

Spatial–Temporal Heatmap Masked Autoencoder for Skeleton-Based Action Recognition

by Cunling Bian, Yang Yang, Tao Wang and Weigang Lu

Sensors 2025, 25(10), 3146; https://doi.org/10.3390/s25103146 - 16 May 2025

Viewed by 650

Abstract

Skeleton representation learning offers substantial advantages for action recognition by encoding intricate motion details and spatial–temporal dependencies among joints. However, fully supervised approaches necessitate large amounts of annotated data, which are often labor-intensive and costly to acquire. In this work, we propose the Spatial–Temporal Heatmap Masked Autoencoder (STH-MAE), a novel self-supervised framework tailored for skeleton-based action recognition. Unlike coordinate-based methods, STH-MAE adopts heatmap volumes as its primary representation, mitigating noise inherent in pose estimation while capitalizing on advances in Vision Transformers. The framework constructs a spatial–temporal heatmap (STH) by aggregating 2D joint heatmaps across both spatial and temporal axes. This STH is partitioned into non-overlapping patches to facilitate local feature learning, with a masking strategy applied to randomly conceal portions of the input. During pre-training, a Vision Transformer-based autoencoder equipped with a lightweight prediction head reconstructs the masked regions, fostering the extraction of robust and transferable skeletal representations. Comprehensive experiments on the NTU RGB+D 60 and NTU RGB+D 120 benchmarks demonstrate the superiority of STH-MAE, achieving state-of-the-art performance under multiple evaluation protocols. Full article
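The partition-and-mask step described above can be sketched in a simplified form. The example assumes a single 2D map rather than the full spatial-temporal heatmap volume, and the patch size and the 0.75 mask ratio are hypothetical values, not figures from the abstract:

```python
import random


def patch_grid(height, width, patch):
    """Top-left corners of non-overlapping patch x patch tiles."""
    if height % patch or width % patch:
        raise ValueError("map size must be divisible by the patch size")
    return [(r, c) for r in range(0, height, patch) for c in range(0, width, patch)]


def random_mask(num_patches, mask_ratio, seed=0):
    """Return a list of booleans; True means the patch is hidden from the encoder."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


# An 8x8 map with 4x4 patches gives four patches; three of them are masked.
patches = patch_grid(8, 8, 4)
mask = random_mask(len(patches), mask_ratio=0.75)
visible = [p for p, hidden in zip(patches, mask) if not hidden]
```

During pre-training, only the visible patches would be encoded, and the reconstruction loss would be computed on the hidden ones.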

(This article belongs to the Section Intelligent Sensors)

16 pages, 5532 KiB

Open Access Article

Intelligent System Study for Asymmetric Positioning of Personnel, Transport, and Equipment Monitoring in Coal Mines

by Diana Novak, Yuriy Kozhubaev, Hengbo Kang, Haodong Cheng and Roman Ershov

Symmetry 2025, 17(5), 755; https://doi.org/10.3390/sym17050755 - 14 May 2025

Viewed by 445

Abstract

The paper presents a study of an intelligent system for personnel positioning, transport, and equipment monitoring in the mining industry using a convolutional neural network (CNN) and OpenPose. The proposed framework operates through a three-stage pipeline: OpenPose-based skeleton extraction from surveillance video streams, capturing 18 key body joints at 30 fps; multimodal feature fusion, combining skeletal key points and proximity sensor data to achieve environmental context awareness and obtain relevant feature values; and hierarchical pose alert, using an attention-enhanced bidirectional LSTM (trained on 5000 annotated fall instances) for fall warning. The experiment demonstrated that the combined use of these technologies allows the system to determine the location and behavior of personnel, calculate the distance to hazardous areas in real time, and analyze personnel postures to identify possible risks such as falls or immobility. The system’s capacity to track the location of vehicles and equipment enhances operational efficiency, thereby mitigating the risk of accidents. Additionally, the system provides real-time alerts, identifying abnormal behavior, equipment malfunctions, and safety hazards, thus promoting enhanced mine management efficiency, improved safe working conditions, and a reduction in accidents. Full article

(This article belongs to the Special Issue Symmetry and Asymmetry in Computer Vision and Graphics)

Search Results (248)
