MDPI - Publisher of Open Access Journals

15 pages, 1806 KB

Open Access Article

Deep Topology-Preserving Network for Skeleton Extraction and Node Identification of Tight Junctions in Retinal Pigment Epithelium Images

by Shuo Yuan and Lei Zhang

Appl. Sci. 2026, 16(11), 5667; https://doi.org/10.3390/app16115667 - 4 Jun 2026

Viewed by 157

Abstract

The structural integrity of tight junction (TJ) networks in the retinal pigment epithelium is a key indicator of the function of the outer blood-retinal barrier (oBRB). However, traditional automatic segmentation methods often suffer from topological discontinuities, resulting in fragmented predictions that fail to accurately reflect the barrier’s state. In this study, we propose a topology-preserving deep learning framework specifically designed for TJ skeleton extraction and node identification. Our method employs a multi-task bidirectional architecture that simultaneously models both the midline structure and connecting nodes, and incorporates a composite loss function (clDice) constrained by soft skeleton similarity to explicitly enforce global structural connectivity. Quantitative evaluations indicate that the proposed method significantly improves topological consistency, with a Betti error of 3.3182, a Graph Connectivity Ratio (GCR) of 0.6858, and a Mean Node Degree Error (MNDE) of 1.6977. Although the F1 score for connectivity is 0.7830, the predicted network outperforms the standard model in terms of morphological fidelity and connectivity. These findings underscore the necessity of topology-aware modeling in biological network extraction, providing a solid computational foundation for the objective quantitative analysis of the morphological stability of tight junction networks in both clinical and experimental research.

(This article belongs to the Section Applied Biosciences and Bioengineering)
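The Betti error quoted in the abstract above is a topological metric. As a rough illustration only, the sketch below counts connected components (the zeroth Betti number) in two binary masks and takes the absolute difference. The function names `count_components` and `betti0_error`, the 8-connectivity choice, and the toy masks are assumptions for illustration; the paper's exact definition (for example, patch-wise averaging or inclusion of loops) is not stated in the abstract.

```python
from collections import deque


def count_components(mask):
    """Count 8-connected foreground components (Betti-0) in a 2D 0/1 grid."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New component found: flood-fill it with a breadth-first search.
            components += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components


def betti0_error(pred, truth):
    """Absolute difference in component counts (hypothetical simplification)."""
    return abs(count_components(pred) - count_components(truth))


# Toy example: a T-shaped reference skeleton and a fragmented prediction.
truth = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
pred = [
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(betti0_error(pred, truth))
```

A fragmented prediction can score well on pixel overlap while still breaking the network into several pieces, which is the failure mode a component-count metric exposes.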

20 pages, 6641 KB

Open Access Article

Topology-Aware Road Extraction from Remote Sensing Images Using Deep Learning and Graph-Based Connectivity Refinement

by Zixuan Teng, Zezhong Zheng, Xiangyang Sun and Hao Xue

ISPRS Int. J. Geo-Inf. 2026, 15(5), 208; https://doi.org/10.3390/ijgi15050208 - 9 May 2026

Viewed by 680

Abstract

Road networks are fundamental components of transportation infrastructure and play a crucial role in various geospatial applications. Although deep learning-based semantic segmentation models have achieved promising results in extracting roads from high-resolution remote sensing imagery, the resulting networks often suffer from topological fragmentation due to occlusions and shadows. To address this issue, we propose a topology-aware road extraction method that integrates deep learning-based segmentation with a graph-based connectivity refinement strategy. Specifically, a Pyramid Scene Parsing Network (PSPNet) is first employed to generate initial road probability maps. Subsequently, a connectivity-oriented post-processing pipeline is introduced, which incorporates a multi-source cost function strategy and a direction-aware Dijkstra search algorithm. By utilizing endpoint tangent vectors as inertial weights, the algorithm effectively reconstructs fragmented segments while ensuring geometric smoothness and topological consistency. Furthermore, a dynamic road width restoration strategy is applied to transform refined skeletons into physically consistent road entities. Experiments conducted on two publicly available datasets, CHN6-CUG and DeepGlobe, demonstrate the effectiveness of the proposed method. Quantitative results show that the refinement process significantly enhances road connectivity with a minimal trade-off in pixel-level accuracy. Specifically, the Conn metric increases by 0.1989 on the CHN6-CUG dataset and 0.3055 on the DeepGlobe dataset, while MIoU remains high with only marginal decreases of 1.07% and 0.45%, respectively. These findings indicate that the method effectively restores structural continuity, supporting reliable road network generation and subsequent integration into Geographic Information System (GIS)-based applications such as urban planning and autonomous navigation.

(This article belongs to the Topic Digital and Intelligent Technologies and Application in Urban Construction, Operation, Maintenance, and Renewal)
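The direction-aware Dijkstra search mentioned in the abstract above can be pictured with a small grid example. This is a minimal sketch under stated assumptions, not the authors' algorithm: the search state is a (cell, heading) pair, and a fixed `turn_penalty` stands in for the paper's tangent-vector inertial weights and multi-source cost function, neither of which is specified in the abstract. The function name and the 4-connected grid are hypothetical.

```python
import heapq

# 4-connected moves as (row delta, column delta).
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def direction_aware_dijkstra(cost, start, goal, start_dir, turn_penalty=2.0):
    """Cheapest path cost from start to goal when changing heading is penalised.

    cost[r][c] is the non-negative price of entering a cell (for instance,
    one minus a road probability). start_dir is the heading at the start
    endpoint, playing the role of an endpoint tangent.
    """
    rows, cols = len(cost), len(cost[0])
    best = {(start, start_dir): 0.0}
    heap = [(0.0, start, start_dir)]
    while heap:
        dist, cell, direction = heapq.heappop(heap)
        if cell == goal:
            return dist
        if dist > best.get((cell, direction), float("inf")):
            continue  # stale heap entry
        for move in MOVES:
            r, c = cell[0] + move[0], cell[1] + move[1]
            if not (0 <= r < rows and 0 <= c < cols):
                continue
            # Continuing straight is free; any change of heading costs extra.
            step = cost[r][c] + (0.0 if move == direction else turn_penalty)
            new_dist = dist + step
            if new_dist < best.get(((r, c), move), float("inf")):
                best[((r, c), move)] = new_dist
                heapq.heappush(heap, (new_dist, (r, c), move))
    return float("inf")


# Uniform 3x5 cost grid; bridge a gap from the left edge to the right edge.
grid = [[1.0] * 5 for _ in range(3)]
aligned = direction_aware_dijkstra(grid, (1, 0), (1, 4), start_dir=(0, 1))
misaligned = direction_aware_dijkstra(grid, (1, 0), (1, 4), start_dir=(1, 0))
print(aligned, misaligned)
```

With the start heading already pointing at the goal, the straight bridge is cheapest; with a perpendicular heading, the same path pays one turn penalty, so gaps are preferentially closed along the direction the road fragment was already travelling.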

20 pages, 5431 KB

Open Access Article

An Algorithm for Identifying Unsafe Behaviors of Miners Based on the Improved AlphaPose

by Xiaopei Liu, Cong Song and Feng Tian

Sensors 2026, 26(4), 1107; https://doi.org/10.3390/s26041107 - 8 Feb 2026

Cited by 1 | Viewed by 662

Abstract

Using video surveillance in mines to identify unsafe behaviors of miners is an important technical means for preventing coal mine accidents and achieving safety control. However, the complex underground environment (such as chaotic backgrounds and personnel occlusion) severely affects human posture estimation and feature extraction, resulting in low accuracy of unsafe behavior identification. To address this issue, this paper proposes a miner unsafe behavior recognition algorithm based on improved AlphaPose (RS-AlphaPose). First, the improved real-time detection Transformer (RTDETR) is adopted to replace the original target detection network. Through the deformable attention mechanism and the addition of small target detection layers, the target detection ability in complex scenes is enhanced. Second, the sliding window attention and channel attention mechanisms are integrated in the posture estimation network to strengthen multi-scale semantics and global context correlation, thereby improving the accuracy of skeleton extraction in the presence of occlusion. Finally, the spatio-temporal graph convolution network is introduced to construct the spatio-temporal dependency of the skeleton sequence, capturing the temporal features of dynamic behaviors. On the COCO2017 posture dataset, the average accuracy of posture estimation of this algorithm reaches 72.5%, which is 2.2% higher than the basic AlphaPose model. On the self-built miner dynamic behavior dataset, the average recognition accuracy for typical unsafe behaviors such as climbing and crossing reaches 94.5%, which is 4.5% higher than the basic model. The experiments show that the proposed algorithm can effectively mitigate interference in complex underground environments, significantly improve the accuracy of dynamic unsafe behavior recognition of miners, and provide a reliable technical solution for coal mine safety production.

(This article belongs to the Section Optical Sensors)

21 pages, 4342 KB

Open Access Article

Auto3DPheno: Automated 3D Maize Seedling Phenotyping via Topologically-Constrained Laplacian Contraction with NeRF

by Yi Gou, Xin Tan, Mingyu Yang, Xin Zhang, Liang Xu, Qingbin Jiao, Sijia Jiang, Ding Ma and Junbo Zang

Agronomy 2026, 16(4), 401; https://doi.org/10.3390/agronomy16040401 - 7 Feb 2026

Viewed by 547

Abstract

Analyzing three-dimensional (3D) phenotypic parameters of maize seedlings is of significant importance for maize cultivation and selection. However, existing methods often struggle to balance cost, efficiency, and accuracy, particularly when capturing the complex morphology of seedlings characterized by slender stems. To address these issues, this study proposes a novel end-to-end automated framework for extracting phenotypes using only consumer-grade RGB cameras. The pipeline begins with Instant-NGP to rapidly reconstruct dense point clouds, establishing the 3D data foundation for phenotypic extraction. Subsequently, we formulate a directed topological graph-based mechanism. By mathematically defining bifurcation constraints via vector analysis, this mechanism guides a depth-first traversal strategy to explicitly disentangle stem and leaf skeletons. Building upon these decoupled skeletons, organ-level point cloud segmentation is achieved through constraint-based expansion, followed by density-based spatial clustering (DBSCAN) to detect individual leaves. Algorithms combining point cloud geometry with 3D Euclidean distance are also implemented to calculate key phenotypes including plant height and stem width. Finally, single-leaf skeleton fitting is used to estimate leaf length, and principal component analysis (PCA) is adopted to determine the stem–leaf angle, realizing the comprehensive automatic extraction of maize seedling phenotypes. Experiments show that the proposed method achieves high accuracy in extracting key phenotypic parameters. The mean relative errors for plant height, stem width, leaf length, stem–leaf angle, and leaf area are 0.76%, 2.93%, 1.26%, 2.13%, and 3.33%, respectively. Compared with the existing methods known to us, the proposed method significantly improves extraction efficiency by reducing the processing time per plant to within 5 min while maintaining this level of accuracy.

(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)

20 pages, 49658 KB

Open Access Article

Dead Chicken Identification Method Based on a Spatial-Temporal Graph Convolution Network

by Jikang Yang, Chuang Ma, Haikun Zheng, Zhenlong Wu, Xiaohuan Chao, Cheng Fang and Boyi Xiao

Animals 2026, 16(3), 368; https://doi.org/10.3390/ani16030368 - 23 Jan 2026

Cited by 1 | Viewed by 699

Abstract

In intensive cage rearing systems, accurate dead hen detection remains difficult due to complex environments, severe occlusion, and the high visual similarity between dead hens and live hens in a prone posture. To address these issues, this study proposes a dead hen identification method based on a Spatial-Temporal Graph Convolutional Network (STGCN). Unlike conventional static image-based approaches, the proposed method introduces temporal information to enable dynamic spatial-temporal modeling of hen health states. First, a multimodal fusion algorithm is applied to visible light and thermal infrared images to strengthen multimodal feature representation. Then, an improved YOLOv7-Pose algorithm is used to extract the skeletal keypoints of individual hens, and the ByteTrack algorithm is employed for multi-object tracking. Based on these results, spatial-temporal graph-structured data of hens are constructed by integrating spatial and temporal dimensions. Finally, a spatial-temporal graph convolution model is used to identify dead hens by learning spatial-temporal dependency features from skeleton sequences. Experimental results show that the improved YOLOv7-Pose model achieves an average precision (AP) of 92.8% in keypoint detection. Based on the constructed spatial-temporal graph data, the dead hen identification model reaches an overall classification accuracy of 99.0%, with an accuracy of 98.9% for the dead hen category. These results demonstrate that the proposed method effectively reduces interference caused by feeder occlusion and ambiguous visual features. By using dynamic spatial-temporal information, the method substantially improves robustness and accuracy of dead hen detection in complex cage rearing environments, providing a new technical route for intelligent monitoring of poultry health status.

(This article belongs to the Special Issue Welfare and Behavior of Laying Hens)

29 pages, 3921 KB

Open Access Article

A Semantic Priors-Based Non-Euclidean Topological Enhancement Method for 3D Human Pose Estimation in Multi-Class Complex Human Actions

by Xiaowei Han, Chaolong Fei, Yibo Feng, Wenbao Si and Guilin Yao

Electronics 2026, 15(1), 155; https://doi.org/10.3390/electronics15010155 - 29 Dec 2025

Viewed by 536

Abstract

Three-dimensional human pose estimation (3D HPE) aims to recover the three-dimensional coordinates of human joints from 2D images or videos to achieve precise quantification of human movement. In 3D HPE tasks based on multi-class complex human action datasets, the performance of existing Graph Convolutional Network (GCN) and Transformer fusion models is constrained by the fixed physical connections of the skeleton, which impedes the modeling of cross-joint long-range semantic dependencies and hinders further performance gains. To address this issue, this study proposes a semantic prior-based non-Euclidean topology enhancement method for multi-class complex human actions, built upon a GCN–Transformer fusion model. The proposed method retains the original physical connections while introducing semantic prior edges; by constructing a hybrid topology structure, it explicitly models long-range semantic dependencies between non-adjacent joints, thereby facilitating the extraction of cross-joint semantic information. Experimental results on the Human3.6M and HumanEva-I datasets surpass those of SOTA baseline models. On the Human3.6M dataset, MPJPE and P-MPJPE are reduced by 1.25% and 0.63%, respectively. For the Walk and Jog actions on the HumanEva-I dataset, MPJPE is reduced by approximately 6.5%. These results demonstrate that the proposed method offers significant advantages for 3D HPE tasks based on multi-class complex human action data.

(This article belongs to the Section Artificial Intelligence)
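The abstract above reports results in MPJPE (mean per-joint position error), which is commonly computed as the average Euclidean distance between predicted and reference joint positions. The sketch below shows that common form for a single pose. The function name `mpjpe` and the two-joint toy pose are illustrative assumptions; P-MPJPE additionally applies a rigid Procrustes alignment before measuring the error, which is not shown here.

```python
import math


def mpjpe(pred, truth):
    """Mean Euclidean distance between corresponding 3D joints of one pose."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


# Toy pose with two joints; the units are whatever the coordinates use.
truth_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_pose = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(mpjpe(pred_pose, truth_pose))
```

Benchmark figures are typically obtained by averaging this per-pose value over all frames and subjects of the evaluation split.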

25 pages, 2619 KB

Open Access Article

A Lightweight Spatiotemporal Skeleton Network for Abnormal Train Driver Action Detection

by Kaijia Tao, Fen Wang, Zhigang Liu and Yuanchun Huang

Appl. Sci. 2025, 15(24), 13152; https://doi.org/10.3390/app152413152 - 15 Dec 2025

Cited by 1 | Viewed by 634

Abstract

Abnormal behaviors of train drivers are a critical factor affecting the operational safety of urban rail transit. To achieve automated and efficient detection while meeting practical deployment requirements, this study proposes an end-to-end Temporal Action Detection network based on skeleton data. The network directly uses skeleton sequences as input, integrates a skeleton topology graph tailored to train driver actions for spatiotemporal feature extraction, and employs a non-shared feature propagation design to enhance classification and regression performance. Evaluated on a custom dataset of driver operations (including both standard and abnormal behaviors), the experimental results demonstrate favorable performance with high mean Average Precision (mAP) and strong accuracy. The findings show that the proposed network can accurately localize and classify driver operational behaviors, enabling precise detection of abnormal actions. Furthermore, its low parameter count and minimal storage requirements highlight strong potential for practical deployment in urban rail transit systems.

(This article belongs to the Section Transportation and Future Mobility)

27 pages, 2900 KB

Open Access Article

Graph-SENet: An Unsupervised Learning-Based Graph Neural Network for Skeleton Extraction from Point Cloud

by Jie Li, Wei Guo and Wenli Zhang

Future Internet 2025, 17(12), 558; https://doi.org/10.3390/fi17120558 - 3 Dec 2025

Viewed by 1109

Abstract

Extracting 3D skeletons from point clouds is a challenging task in computer vision. Most existing deep learning methods rely heavily on supervised data requiring extensive manual annotation. Consequently, re-labeling is often necessary for cross-category applications, while the process of 3D point cloud annotation is inherently time-consuming and expensive. Simultaneously, existing unsupervised methods often suffer from significant skeleton point deviations due to limited capabilities in modeling local structures. To address these limitations, we propose Graph-SENet, an unsupervised learning-based graph neural network method for skeleton extraction. This method integrates dynamic graph convolution with a multi-level feature fusion mechanism to more comprehensively capture local geometric relationships. Through a multi-dimensional unsupervised feature loss, it learns the structural representation of skeleton points, significantly improving the precision and stability of skeleton point localization under annotation-free conditions. Furthermore, we propose a graph autoencoder structure optimized by cosine similarity to predict topological connections between skeleton points, thereby recovering semantically consistent and structurally complete 3D skeleton representations in an end-to-end manner. Experimental results on multiple datasets, including ShapeNet, ITOP, and Soybean-MVS, demonstrate that Graph-SENet outperforms existing mainstream unsupervised methods in terms of Chamfer Distance and F1-score. It exhibits superior accuracy, robustness, and cross-category generalization capabilities, effectively reducing manual annotation costs while enhancing the completeness and semantic consistency of skeleton recovery. These results validate the application potential and practical value of Graph-SENet in 3D structure understanding and downstream 3D analysis tasks.

(This article belongs to the Special Issue Algorithms and Models for Next-Generation Vision Systems)
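Chamfer Distance, one of the two metrics named in the abstract above, compares two point sets by nearest-neighbour distances in both directions. The sketch below uses one common convention (mean unsquared distance per direction, then summed); conventions differ between papers (squared distances, sums instead of means), and the abstract does not say which one Graph-SENet uses. The function name and the toy points are illustrative assumptions.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Brute-force O(len(a) * len(b)) search, suitable only for small sets.
    """
    def one_way(src, dst):
        # Average distance from each source point to its nearest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Toy example: a predicted skeleton with one point displaced along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(chamfer_distance(reference, predicted))
```

For real point clouds, a spatial index such as a k-d tree would normally replace the brute-force nearest-neighbour search.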

23 pages, 2403 KB

Open Access Article

LI-AGCN: A Lightweight Initialization-Enhanced Adaptive Graph Convolutional Network for Effective Skeleton-Based Action Recognition

by Qingsheng Xie and Hongmin Deng

Sensors 2025, 25(23), 7282; https://doi.org/10.3390/s25237282 - 29 Nov 2025

Viewed by 1131

Abstract

The graph convolutional network (GCN) has become a mainstream technology in skeleton-based action recognition since it was first applied to this field. However, previous studies often overlooked the pivotal role of heuristic model initialization in the extraction of spatial features, preventing the model from achieving its optimal performance. To address this issue, a lightweight initialization-enhanced adaptive graph convolutional network (LI-AGCN) is proposed, which effectively captures spatiotemporal features while maintaining low computational complexity. LI-AGCN employs three coordinate-based input branches (CIB) to dynamically adjust graph structures, which facilitates the extraction of informative spatial features. In addition, the model incorporates a lightweight, multi-scale temporal module to extract temporal features, and employs an attention module that considers the temporal, spatial, and channel dimensions simultaneously to enhance key features. Finally, the performance of the proposed model is evaluated on three large-scale public datasets: NTU RGB+D, NTU RGB+D 120, and UAV-Human. The experimental results demonstrate that LI-AGCN achieves strong overall performance on these datasets, notably obtaining 90.03% accuracy on the cross-subject benchmark of the NTU RGB+D dataset with only 0.18 million parameters, showcasing the effectiveness of the model.

(This article belongs to the Special Issue Computer Vision Sensing and Pattern Recognition)

14 pages, 2635 KB

Open Access Feature Paper Article

Clustered Federated Spatio-Temporal Graph Attention Networks for Skeleton-Based Action Recognition

by Tao Yu, Sandro Pinto, Tiago Gomes, Adriano Tavares and Hao Xu

Sensors 2025, 25(23), 7277; https://doi.org/10.3390/s25237277 - 29 Nov 2025

Cited by 1 | Viewed by 1246

Abstract

Federated learning (FL) for skeleton-based action recognition remains underexplored, particularly under strong client heterogeneity where regular FedAvg tends to cause client drift and unstable convergence. We introduce Clustered Federated Spatio-Temporal Graph Attention Networks (CF-STGAT), a clustered FL framework that leverages attention-derived spatio-temporal statistics from local STGAT models to dynamically group clients and perform attention-weighted inter-cluster fusion that gently aligns cluster models. Concretely, the server periodically extracts multi-head parameter-based attention descriptors, normalizes and projects them via PCA, and applies K-means to form clusters; a global reference is then computed by attention–similarity weighting and used to regularize each cluster model with a lightweight fusion step. On NTU RGB+D 60/120 (NTU 60/120), CF-STGAT consistently outperforms strong FL baselines with the STGAT backbone, yielding absolute top-1 gains of +0.84/+4.09 (NTU 60, X-Sub/X-Setup) and +7.98/+4.18 (NTU 120, X-Sub/X-Setup) over FedAvg, alongside smoother per-client trajectories and lower terminal test loss. Ablations indicate that attention-guided clustering and inter-cluster fusion are complementary: clustering reduces within-group variance whereas fusion limits cross-cluster divergence. The approach keeps local training unchanged and adds only server-side statistics and clustering.

(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)
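The abstract above measures its gains against FedAvg. For orientation, FedAvg in its standard form replaces the server model with the average of the client models, weighted by each client's number of training samples. The sketch below shows only that baseline aggregation step on flat parameter lists; the function name `fed_avg` and the toy values are illustrative assumptions, and none of the clustering or attention-weighted fusion from CF-STGAT is reproduced.

```python
def fed_avg(client_weights, client_sizes):
    """Sample-count-weighted average of client parameter vectors."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two clients; the second holds three times as much data as the first.
weights = [[1.0, 2.0], [3.0, 6.0]]
sizes = [1, 3]
print(fed_avg(weights, sizes))
```

Because every client is pulled toward one shared average, clients with very different data can drift apart between rounds, which is the heterogeneity problem the abstract describes and the motivation for grouping similar clients before averaging.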

23 pages, 27054 KB

Open Access Article

ActionMamba: Action Spatial–Temporal Aggregation Network Based on Mamba and GCN for Skeleton-Based Action Recognition

by Jinglong Wen, Dan Liu and Bin Zheng

Electronics 2025, 14(18), 3610; https://doi.org/10.3390/electronics14183610 - 11 Sep 2025

Cited by 2 | Viewed by 2858

Abstract

Skeleton-based action recognition networks have widely adopted Graph Convolutional Networks (GCNs) due to their superior capabilities in modeling data topology, but several key issues still require further investigation. First, the graph convolutional network extracts action features by applying temporal convolution to each key point, which causes the model to ignore the temporal connections between different key points. Second, the local receptive field of graph convolutional networks limits their ability to capture correlations between non-adjacent joints. Motivated by the State Space Model (SSM), we propose an Action Spatio-temporal Aggregation Network, named ActionMamba. Specifically, we introduce a novel embedding module called the Action Characteristic Encoder (ACE), which enhances the coupling of temporal and spatial information in skeletal features by combining intrinsic spatio-temporal encoding with extrinsic space encoding. Additionally, we design an Action Perception Model (APM) based on Mamba and GCN. By combining the feature processing capabilities of GCN with the global information modeling capabilities of Mamba, APM is able to capture the hidden features between different joints and selectively filter information from various joints. Extensive experimental results demonstrate that ActionMamba achieves highly competitive performance on three challenging benchmark datasets: NTU-RGB+D 60, NTU-RGB+D 120, and UAV–Human.

(This article belongs to the Special Issue Advances in Image Recognition, Image Segmentation, Image Fusion, and Singal Processing)

24 pages, 5612 KB

Open Access Article

Center-of-Gravity-Aware Graph Convolution for Unsafe Behavior Recognition of Construction Workers

by Peijian Jin, Shihao Guo and Chaoqun Li

Sensors 2025, 25(17), 5493; https://doi.org/10.3390/s25175493 - 4 Sep 2025

Cited by 1 | Viewed by 1664

Abstract

Falls from height are a critical safety concern in the construction industry, underscoring the need for effective identification of high-risk worker behaviors near hazardous edges for proactive accident prevention. This study aimed to address this challenge by developing an improved action recognition model. We propose a novel dynamic spatio-temporal graph convolutional network (CoG-STGCN) that incorporates a center of gravity (CoG)-aware mechanism. The method computes global and local CoG using anthropometric priors and extracts four key dynamic CoG features, which a Multi-Layer Perceptron (MLP) then uses to generate modulation weights that dynamically adjust the skeleton graph’s adjacency matrix, enhancing sensitivity to stability changes. On a self-constructed dataset of eight typical edge-related hazardous behaviors, CoG-STGCN achieved a Top-1 accuracy of 95.83% (baseline ST-GCN: 93.75%) and an average accuracy of 94.17% in fivefold cross-validation (baseline ST-GCN: 92.91%), with significant improvements in recognizing actions involving rapid CoG shifts. The CoG-STGCN provides a more effective and physically informed approach for intelligent unsafe behavior recognition and early warning in built environments.

(This article belongs to the Section Sensing and Imaging)
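The abstract above computes a global centre of gravity (CoG) from the skeleton using anthropometric priors. In its simplest form, a CoG is the mass-weighted mean of body-part positions, which the sketch below illustrates in 2D. The function name, the three part positions, and the mass fractions are hypothetical placeholders chosen for easy arithmetic; they are not anthropometric values and not the features used by CoG-STGCN.

```python
def center_of_gravity(points, masses):
    """Mass-weighted mean position of a set of body-part centres."""
    if not points or len(points) != len(masses):
        raise ValueError("need one mass per point")
    total = sum(masses)
    dims = len(points[0])
    return tuple(
        sum(m * p[d] for p, m in zip(points, masses)) / total
        for d in range(dims)
    )


# Hypothetical part centres (x, y) and relative masses, for illustration only.
part_centres = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
part_masses = [1.0, 1.0, 2.0]
print(center_of_gravity(part_centres, part_masses))
```

Tracking how this point moves between frames yields velocity-like features, which is one plausible way dynamic CoG descriptors of the kind mentioned in the abstract could be derived.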

18 pages, 3407 KB

Open Access Article

Graph Convolutional Network with Multi-View Topology for Lightweight Skeleton-Based Action Recognition

by Liangliang Wang, Xu Zhang and Chuang Zhang

Symmetry 2025, 17(8), 1235; https://doi.org/10.3390/sym17081235 - 4 Aug 2025

Cited by 2 | Viewed by 3145

Abstract

Skeleton-based action recognition is an important subject in deep learning. Graph Convolutional Networks (GCNs) have demonstrated strong performance by modeling the human skeleton as a natural topological graph, representing the connections between joints. However, most existing methods rely on non-adaptive topologies or insufficiently expressive representations. To address these limitations, we propose a Multi-view Topology Refinement Graph Convolutional Network (MTR-GCN), which is efficient, lightweight, and delivers high performance. Specifically: (1) We propose a new spatial topology modeling approach that incorporates two views. A dynamic view fuses joint information from dual streams in a pairwise manner, while a static view encodes the shortest static paths between joints, preserving the original connectivity relationships. (2) We propose a new MultiScale Temporal Convolutional Network (MSTC), which is efficient and lightweight. (3) Furthermore, we introduce a new temporal topology strategy by modeling temporal frames as a graph, which strengthens the extraction of temporal features. By modeling the human skeleton as both a spatial and a temporal graph, we reveal a topological symmetry between space and time within the unified spatio-temporal framework. The proposed model achieves state-of-the-art performance on several benchmark datasets, including NTU RGB + D (XSub: 92.8%, XView: 96.8%), NTU RGB + D 120 (XSub: 89.6%, XSet: 90.8%), and NW-UCLA (95.7%), demonstrating the effectiveness of our GCN module, TCN module, and overall architecture.

(This article belongs to the Section Computer)

22 pages, 2525 KB

Open Access Article

mmHSE: A Two-Stage Framework for Human Skeleton Estimation Using mmWave FMCW Radar Signals

by Jiake Tian, Yi Zou and Jiale Lai

Appl. Sci. 2025, 15(15), 8410; https://doi.org/10.3390/app15158410 - 29 Jul 2025

Cited by 2 | Viewed by 2249

Abstract

We present mmHSE, a two-stage framework for human skeleton estimation using dual millimeter-Wave (mmWave) Frequency-Modulated Continuous-Wave (FMCW) radar signals. To enable data-driven model design and evaluation, we collect and process over 30,000 range–angle maps from 12 users across three representative indoor environments using a dual-node radar acquisition platform. Leveraging the collected data, we develop a two-stage neural architecture for human skeleton estimation. The first stage employs a dual-branch network with depthwise separable convolutions and self-attention to extract multi-scale spatiotemporal features from dual-view radar inputs. A cross-modal attention fusion module is then used to generate initial estimates of 21 skeletal keypoints. The second stage refines these estimates using a skeletal topology module based on graph convolutional networks, which captures spatial dependencies among joints to enhance localization accuracy. Experiments show that mmHSE achieves a Mean Absolute Error (MAE) of 2.78 cm. In cross-domain evaluations, the MAE remains at 3.14 cm, demonstrating the method’s generalization ability and robustness for non-intrusive human pose estimation from mmWave FMCW radar signals.

19 pages, 709 KB

Open Access Article

Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks

by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng

Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025

Cited by 4 | Viewed by 2033

Abstract

Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs, making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and fast sign language recognition (SLR) from raw videos, this study introduces a novel deep learning approach by devising a multimodal framework for SLR. Specifically, feature extraction models are built based on two modalities: skeleton and RGB images. We first propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Second, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the raw images. Finally, a gating mechanism-based Multi-Stream Fusion Module (MFM) is employed to merge the results of the two modalities. Extensive experiments are conducted on the public datasets AUTSL and WLASL, achieving competitive results compared to state-of-the-art systems.

(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)

Search Results (75)
