Dynamic Hypergraph Convolutional Networks for Hand Motion Gesture Sequence Recognition
Abstract
1. Introduction
- Hypergraph Integration: Building on graph convolutional networks (GCNs), DHGCN incorporates hypergraph structures to model higher-order relational features among multiple nodes. Unlike traditional graphs with pairwise edges, hyperedges in hypergraphs can connect multiple nodes simultaneously, enabling the network to capture complex, multinode dependencies.
- Dynamic Hypergraph Structure: The hypergraph in DHGCN is data-driven and non-predefined, learned directly from input data during training. This dynamic scheme ensures objective and adaptive feature extraction, avoiding the biases of manually predefined structures.
- Multistream Fusion Strategy: By fusing the prediction vectors of hypergraph convolutional networks configured with different numbers of hyperedges, the model achieves higher recognition accuracy. This strategy exploits complementary information from the different hypergraph configurations and input streams; a score-fusion sketch follows this list.
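As one concrete, deliberately simplified reading of this strategy, score-level fusion of per-stream prediction vectors can be sketched as follows. The function name and the fusion weights are illustrative assumptions, not the tuned coefficients reported in the Results.

```python
import torch


def fuse_scores(score_list: list[torch.Tensor], coefficients: list[float]) -> torch.Tensor:
    """Weighted score-level fusion of per-stream class-probability vectors.

    score_list:   one (num_classes,) score vector per stream.
    coefficients: the fusion weight assigned to each stream.
    """
    fused = torch.zeros_like(score_list[0])
    for scores, coeff in zip(score_list, coefficients):
        fused += coeff * scores
    return fused


# Toy usage: three 14-class prediction vectors fused with illustrative weights.
streams = [torch.softmax(torch.randn(14), dim=0) for _ in range(3)]
prediction = fuse_scores(streams, coefficients=[0.3, 0.5, 0.4]).argmax().item()
```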
2. Materials and Methods
2.1. GCNs
- $G = (V, E)$ represents a graph, where $V$ is the set of vertices corresponding to hand joints, and $E$ is the set of edges representing relationships between joints.
- $X$ represents the matrix of node features, where each row $x_i$ is the feature vector of vertex $v_i$.
- $A$ represents the adjacency matrix, with entries $A_{ij}$ indicating the presence and weight of the edge between vertices $v_i$ and $v_j$.
- $H^{(l)}$ is the matrix of node features at layer $l$, with $H^{(0)} = X$.
- $\tilde{A} = A + I$ is the adjacency matrix with added self-loops.
- $\tilde{D}$ is the degree matrix of $\tilde{A}$.
- $W^{(l)}$ is the trainable weight matrix at layer $l$.
- $\sigma(\cdot)$ is an activation function, typically ReLU.

With these definitions, the layer-wise propagation rule is $H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\big)$; a code sketch of this rule is given below.
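A minimal PyTorch sketch of this propagation rule over the hand-joint graph. It is illustrative only: the class name, feature sizes, and joint count are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = sigma(D~^-1/2 (A + I) D~^-1/2 H W)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)  # implements W^(l)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, C) node features, adj: (N, N) adjacency over the hand joints
        a_tilde = adj + torch.eye(adj.size(0), device=adj.device)   # add self-loops
        deg = a_tilde.sum(dim=1)                                    # degrees of A~
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-6).pow(-0.5))      # D~^-1/2
        a_norm = d_inv_sqrt @ a_tilde @ d_inv_sqrt                  # symmetric normalization
        return torch.relu(a_norm @ self.linear(h))                  # sigma(...)


# Toy usage: 22 hand joints with 3-D coordinates as input features.
joints = torch.randn(22, 3)
adjacency = torch.zeros(22, 22)  # fill with the skeleton's bone connections in practice
layer = GCNLayer(3, 64)
out = layer(joints, adjacency)   # (22, 64)
```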
2.2. Hypergraph Convolution
- $\mathcal{V}$ is the set of nodes (vertices);
- $\mathcal{E}$ is the set of hyperedges, where each hyperedge $e \in \mathcal{E}$ connects a subset of nodes, i.e., $e \subseteq \mathcal{V}$.
- $X$: input node features (each row $x_i$ is the feature of node $v_i$);
- $X'$: output node features after convolution.
- $D_v$: diagonal matrix of vertex degrees, with entries $D_v(i, i) = \sum_{e \in \mathcal{E}} w(e) H(i, e)$, where $H$ is the vertex–hyperedge incidence matrix;
- $D_e$: diagonal matrix of hyperedge degrees, with $D_e(e, e) = \sum_{v \in \mathcal{V}} H(v, e)$;
- $W$: diagonal matrix of hyperedge weights $w(e)$.

The hypergraph convolution then updates the node features as $X' = \sigma\big(D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} X \Theta\big)$, where $\Theta$ is a trainable weight matrix; a code sketch is given after this list.
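The same update can be sketched in PyTorch as follows. The learnable hyperedge weights, the ReLU nonlinearity, and all names and sizes are illustrative assumptions consistent with the formulation above, not the authors' code.

```python
import torch
import torch.nn as nn


class HypergraphConv(nn.Module):
    """Hypergraph convolution: X' = sigma(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta)."""

    def __init__(self, in_features: int, out_features: int, num_edges: int):
        super().__init__()
        self.theta = nn.Linear(in_features, out_features, bias=False)  # Theta
        self.edge_weight = nn.Parameter(torch.ones(num_edges))         # diagonal of W

    def forward(self, x: torch.Tensor, incidence: torch.Tensor) -> torch.Tensor:
        # x: (N, C) node features, incidence: (N, E) vertex-hyperedge incidence matrix H
        w = torch.diag(self.edge_weight)
        # Vertex degrees d(v) = sum_e w(e) H(v, e); hyperedge degrees d(e) = sum_v H(v, e).
        dv = torch.diag((incidence * self.edge_weight).sum(dim=1).clamp(min=1e-6).pow(-0.5))
        de = torch.diag(incidence.sum(dim=0).clamp(min=1e-6).pow(-1.0))
        propagation = dv @ incidence @ w @ de @ incidence.t() @ dv     # (N, N)
        return torch.relu(propagation @ self.theta(x))


# Toy usage: 22 joints, 8 hyperedges, membership given by a binary incidence matrix.
x = torch.randn(22, 16)
H = (torch.rand(22, 8) > 0.7).float()
conv = HypergraphConv(16, 32, num_edges=8)
out = conv(x, H)  # (22, 32)
```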
2.3. Architecture Design
2.4. DHGC Block
- $\mathrm{AvgPool}(\cdot)$ represents the average pooling operation across the temporal dimension.
- $\mathrm{Conv}(\cdot)$ is the convolution operation, with input channels $C_{\mathrm{in}}$ and output channels $C_{\mathrm{out}}$.
- $W$ is the weight matrix that adjusts the feature channels.
- The incidence matrix $H$ encodes the relationships between vertices and hyperedges, with each element initialized randomly from a standard normal distribution and learnable during training.
- The degree matrices $D_v$ and $D_e$ represent the degrees of the vertices and hyperedges, respectively, computed from the learned incidence matrix.

A sketch of such a block is given after this list.
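The text above names the ingredients of a DHGC block but not their exact composition, so the PyTorch sketch below makes several explicit assumptions: the learnable incidence matrix is passed through a softmax to keep the degrees positive (not specified above), hyperedge weights are omitted, and the temporally pooled features are propagated over the learned hypergraph and added back as a residual. It is a sketch of the idea, not the authors' block.

```python
import torch
import torch.nn as nn


class DHGCBlock(nn.Module):
    """Sketch of a dynamic hypergraph convolution block.

    The incidence matrix is a learnable parameter (standard-normal init), so the
    hypergraph topology is learned from data rather than predefined.
    """

    def __init__(self, c_in: int, c_out: int, num_joints: int, num_hyperedges: int):
        super().__init__()
        # Learnable incidence matrix H: (num_joints, num_hyperedges).
        self.incidence = nn.Parameter(torch.randn(num_joints, num_hyperedges))
        self.channel_adjust = nn.Conv2d(c_in, c_out, kernel_size=1)  # 1x1 conv adjusts channels
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C_in, T, V) -- batch, channels, frames, joints.
        h = torch.softmax(self.incidence, dim=1)                     # assumption: keep memberships positive
        dv = torch.diag(h.sum(dim=1).clamp(min=1e-6).pow(-0.5))      # Dv^-1/2
        de = torch.diag(h.sum(dim=0).clamp(min=1e-6).pow(-1.0))      # De^-1
        prop = dv @ h @ de @ h.t() @ dv                              # (V, V) propagation matrix

        x = self.channel_adjust(x)                                   # adjust feature channels
        pooled = x.mean(dim=2, keepdim=True)                         # average pool over time
        # Propagate pooled joint features over the learned hypergraph and add back.
        agg = torch.einsum('bctv,vu->bctu', pooled, prop)
        return self.relu(x + agg)


# Toy usage: batch of 2 sequences, 3 input channels, 64 frames, 22 joints.
block = DHGCBlock(c_in=3, c_out=64, num_joints=22, num_hyperedges=8)
out = block(torch.randn(2, 3, 64, 22))  # (2, 64, 64, 22)
```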
2.5. Multistream Strategy
- Joint Stream:
- Processes the raw joint positions.
- Extracts spatial configuration features of the hand.
- Bone Stream:
- Processes bone vectors, i.e., the directional vectors between connected joints.
- Captures structural and relational features between joints.
- Motion Stream:
- Processes temporal differences (e.g., joint velocities).
- Captures motion dynamics, helpful for distinguishing gestures with similar poses but different movements.
- The input is a hand-skeleton sequence.
- A data preparation module extracts three types of node features:
- $j$: joint features (raw joint positions);
- $m$: motion features (temporal differences of joint positions);
- $b$: bone features (directional vectors between connected joints).

A sketch of this data preparation step follows the list.
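A possible sketch of this step is shown below. The symbol m for the motion stream, the function name, and the parent list of the hand skeleton are assumptions for illustration only.

```python
import torch


def prepare_streams(joints: torch.Tensor, parents: list[int]):
    """Derive the three node-feature streams from a hand-skeleton sequence.

    joints:  (T, V, 3) tensor of 3-D joint positions over T frames.
    parents: parent index of each joint in the hand skeleton (-1 for the wrist/root).
    Returns joint (j), motion (m), and bone (b) features, each of shape (T, V, 3).
    """
    j = joints
    # Motion: temporal difference of joint positions (zero for the first frame).
    m = torch.zeros_like(joints)
    m[1:] = joints[1:] - joints[:-1]
    # Bone: vector from each joint's parent to the joint (zero for the root).
    b = torch.zeros_like(joints)
    for v, p in enumerate(parents):
        if p >= 0:
            b[:, v] = joints[:, v] - joints[:, p]
    return j, m, b


# Toy usage with a made-up 5-joint chain (wrist -> 4 finger joints).
seq = torch.randn(64, 5, 3)
j, m, b = prepare_streams(seq, parents=[-1, 0, 1, 2, 3])
```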
3. Results
3.1. Implementation Details
3.1.1. Datasets
3.1.2. Model Evaluation
3.1.3. Comparison of Results
3.1.4. Ablation Study
3.2. Fusion Performance of Different Streams
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Full Term |
---|---|
CNN | Convolutional Neural Network |
CTR-GCN | Channel-wise Topology Refinement Graph Convolutional Network |
DG-GCN | Dynamic Graph Convolutional Network |
DHGCN | Dynamic Hypergraph Convolutional Network |
FC | Fully Connected |
GCN | Graph Convolutional Network |
GCNs | Graph Convolutional Networks |
HD-GCN | Hierarchically Decomposed Graph Convolutional Network |
HGCN | Hypergraph Convolutional Network |
HGCNs | Hypergraph Convolutional Networks |
HCI | Human–Computer Interaction |
MSTC | Multiscale Temporal Convolution |
RGB | Red, Green, Blue |
RNN | Recurrent Neural Network |
SGD | Stochastic Gradient Descent |
ST-GCN | Spatial–Temporal Graph Convolutional Network |
TD-GCN | Temporal Difference Graph Convolutional Network |
References
1. Gu, Y.; Xu, Y.; Shen, Y.; Huang, H.; Liu, T.; Jin, L.; Ren, H.; Wang, J. A review of hand function rehabilitation systems based on hand motion recognition devices and artificial intelligence. Brain Sci. 2022, 12, 1079.
2. Ohn-Bar, E.; Trivedi, M.M. Hand gesture recognition in real time for automotive interfaces: A multimodal vision-based approach and evaluations. IEEE Trans. Intell. Transp. Syst. 2014, 15, 2368–2377.
3. Qi, J.; Ma, L.; Cui, Z.; Yu, Y. Computer vision-based hand gesture recognition for human-robot interaction: A review. Complex Intell. Syst. 2024, 10, 1581–1606.
4. Peng, S.H.; Tsai, P.H. An efficient graph convolution network for skeleton-based dynamic hand gesture recognition. IEEE Trans. Cogn. Dev. Syst. 2023, 15, 2179–2189.
5. Okano, M.; Liu, J.Q.; Tateyama, T.; Chen, Y.W. DHGD: Dynamic Hand Gesture Dataset for Skeleton-Based Gesture Recognition and Baseline Evaluations. In Proceedings of the 2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 6–8 January 2024; pp. 1–4.
6. Jacob, M.G.; Wachs, J.P.; Packer, R.A. Hand-gesture-based sterile interface for the operating room using contextual cues for the navigation of radiological images. J. Am. Med. Inform. Assoc. 2013, 20, e183–e186.
7. Bulugu, I. Adaptive shift graph convolutional neural network for hand gesture recognition based on 3D skeletal similarity. SIViP 2024, 18, 7583–7595.
8. Köpüklü, O.; Gunduz, A.; Kose, N.; Rigoll, G. Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; pp. 1–8.
9. Strezoski, G.; Stojanovski, D.; Dimitrovski, I.; Madjarov, G. Hand Gesture Recognition Using Deep Convolutional Neural Networks. In ICT Innovations 2016, Advances in Intelligent Systems and Computing; Stojanov, G., Kulakov, A., Eds.; Springer: Cham, Switzerland, 2016; Volume 665, pp. 51–61.
10. Chen, X.; Guo, H.; Wang, G.; Zhang, L. Motion Feature Augmented Recurrent Neural Network for Skeleton-Based Dynamic Hand Gesture Recognition. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 2881–2885.
11. Lai, K.; Yanushkevich, S.N. CNN+RNN depth and skeleton based dynamic hand gesture recognition. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3451–3456.
12. Devineau, G.; Moutarde, F.; Xi, W.; Yang, J. Deep Learning for Hand Gesture Recognition on Skeletal Data. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 106–113.
13. Lee, J.; Lee, M.; Lee, D.; Lee, S. Hierarchically decomposed graph convolutional networks for skeleton-based action recognition. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 10410–10419.
14. Niepert, M.; Ahmed, M.; Kutzkov, K. Learning convolutional neural networks for graphs. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Balcan, M.F., Weinberger, K.Q., Eds.; PMLR: New York, NY, USA; Volume 48, pp. 2014–2023.
15. Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
16. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
17. Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12018–12027.
18. Chen, Y.; Zhang, Z.; Yuan, C.; Li, B.; Deng, Y.; Hu, W. Channel-wise topology refinement graph convolution for skeleton-based action recognition. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 13339–13348.
19. Liu, J.; Wang, X.; Wang, C.; Gao, Y.; Liu, M. Temporal decoupling graph convolutional network for skeleton-based gesture recognition. IEEE Trans. Multimed. 2024, 26, 811–823.
20. Duan, H.; Wang, J.; Chen, K.; Lin, D. DG-STGCN: Dynamic spatial-temporal modeling for skeleton-based action recognition. arXiv 2022, arXiv:2210.05895.
21. Zhang, C.; Hu, S.; Tang, Z.G.; Chan, T.H.H. Re-revisiting learning on hypergraphs: Confidence interval and subgradient method. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; Volume 70, pp. 4026–4034.
22. Feng, Y.; You, H.; Zhang, Z.; Ji, R.; Gao, Y. Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 3558–3565.
23. Bai, S.; Zhang, F.; Torr, P.H.S. Hypergraph convolution and hypergraph attention. Pattern Recognit. 2021, 110, 107637.
24. De Smedt, Q.; Wannous, H.; Vandeborre, J.P. Skeleton-based dynamic hand gesture recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1206–1214.
25. De Smedt, Q.; Wannous, H.; Vandeborre, J.-P.; Guerry, J.; Le Saux, B.; Filliat, D. 3D hand gesture recognition using a depth and skeletal dataset. In Eurographics Workshop on 3D Object Retrieval; Pratikakis, I., Dupont, F., Ovsjanikov, M., Eds.; The Eurographics Association: Lyon, France, 2017; pp. 33–38.
26. Núñez, J.C.; Cabido, R.; Pantrigo, J.J.; Montemayor, A.S.; Vélez, J.F. Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recognit. 2018, 76, 80–94.
27. Nguyen, X.S.; Brun, L.; Lézoray, O.; Bougleux, S. A neural network based on SPD manifold learning for skeleton-based hand gesture recognition. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12028–12037.
28. Liu, J.; Liu, Y.; Wang, Y.; Prinet, V.; Xiang, S.; Pan, C. Decoupled representation learning for skeleton-based gesture recognition. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5750–5759.
29. Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Decoupled spatial-temporal attention network for skeleton-based action-gesture recognition. In Computer Vision—ACCV 2020: 15th Asian Conference on Computer Vision, Revised Selected Papers, Part V, 1st ed.; Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2021; Volume 12626, pp. 38–53.
30. Guo, F.; He, Z.; Zhang, S.; Zhao, X.; Fang, J.; Tan, J. Normalized edge convolutional networks for skeleton-based hand gesture recognition. Pattern Recognit. 2021, 118, 108044.
31. Song, J.H.; Kong, K.; Kang, S.J. Dynamic hand gesture recognition using improved spatio-temporal graph convolutional network. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6227–6239.
32. Huang, X.; Zhou, H.; Wang, J.; Feng, H.; Han, J.; Ding, E.; Wang, J.; Wang, X.; Liu, W.; Feng, B. Graph contrastive learning for skeleton-based action recognition. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
Aspect | Classical Convolution | Graph Convolution | Hypergraph Convolution |
---|---|---|---|
Domain | Time/Spatial | Graph | Hypergraph |
Transform | Fourier Transform | Graph Fourier transform | Hypergraph Fourier transform |
Basis | Sine/Cosine Functions | Graph Laplacian eigenvectors | Hypergraph Laplacian eigenvectors |
Convolution | Multiplication in Frequency Domain | Pointwise multiplication in spectral domain | Pointwise multiplication in hypergraph frequency domain |
Relationships Modeled | Local, translation invariant | Pairwise node relationships | Higher-order relationships among multiple nodes |
Methods | Source | Year of Publication |
---|---|---|
CNN + LSTM [26] | PR | 2018 |
ST-GCN [15] | AAAI | 2018 |
ST-TS-HGR-Net [27] | CVPR | 2019 |
HPEV [28] | CVPR | 2020 |
DSTA [29] | ACCV | 2020 |
NormalizedEdgeCN [30] | PR | 2021 |
MS-ISTGCN [31] | TCSVT | 2022 |
SkeletonGCN [32] | ICLR | 2023 |
TD-GCN [19] | TMM | 2024 |
Comparison with state-of-the-art methods on the DHG-14/28 dataset:

Methods | Streams | 14-Class Accuracy (%) | 28-Class Accuracy (%) |
---|---|---|---|
CNN + LSTM [26] | 1 | 85.6 | 81.1 |
ST-GCN [15] | 1 | 85.6 | 81.2 |
ST-TS-HGR-Net [27] | 1 | 87.3 | 83.4 |
HPEV [28] | 2 | 92.5 | 88.9 |
DSTA [29] | 4 | 93.8 | 90.9 |
NormalizedEdgeCN [30] | 2 | 92.9 | 91.1 |
MS-ISTGCN [31] | 3 | 93.7 | 91.2 |
SkeletonGCN [32] | 3 | 95.7 | 94.3 |
TD-GCN [19] | 3 | 93.9 | 91.4 |
Ours (j + m + b) | 3 | 94.3 | 94.3 |
Ours (j + m + b + o) | 4 | 96.4 | 95.0 |
Comparison with state-of-the-art methods on the SHREC’17 dataset:

Methods | Streams | 14-Class Accuracy (%) | 28-Class Accuracy (%) |
---|---|---|---|
ST-GCN [15] | 1 | 92.7 | 87.7 |
ST-TS-HGR-Net [27] | 1 | 94.3 | 89.4 |
HPEV [28] | 1 | 94.9 | 92.3 |
DSTA [29] | 2 | 97.0 | 93.9 |
NormalizedEdgeCN [30] | 4 | 94.8 | 92.9 |
MS-ISTGCN [31] | 2 | 96.7 | 94.9 |
SkeletonGCN [32] | 3 | 96.9 | 94.8 |
TD-GCN [19] | 3 | 97.0 | 95.0 |
Ours (j + m + b) | 3 | 97.5 | 95.2 |
Ours (j + m + b + o) | 4 | 97.6 | 95.6 |
Methods | Hypergraph | Non-Predefined | Dynamic |
---|---|---|---|
ST-GCN | × (no hypergraph) | × | × |
HGCN | √ | × | × |
DHGCN (Ours) | Combined dynamic and predefined elements | √ | √ |
Fusion performance of different stream combinations (score-fusion coefficients in brackets):

Stream | SHREC’17 14-Class Acc. (%) | Coefficients | SHREC’17 28-Class Acc. (%) | Coefficients | DHG 14-Class Acc. (%) | Coefficients | DHG 28-Class Acc. (%) | Coefficients |
---|---|---|---|---|---|---|---|---|
j | 95.8 | 1.0 | 94.4 | 1.0 | 92.9 | 1.0 | 90.7 | 1.0 |
m | 95.2 | 1.0 | 92.3 | 1.0 | 92.1 | 1.0 | 87.1 | 1.0 |
b | 87.4 | 1.0 | 87.5 | 1.0 | 79.3 | 1.0 | 73.6 | 1.0 |
o | 94.4 | 1.0 | 92.3 | 1.0 | 92.1 | 1.0 | 86.4 | 1.0 |
j + b | 96.0 | [0.8, 0.2] | 94.4 | [0.1, 0.0] | 94.2 | [0.3, 0.1] | 92.9 | [0.1, 0.1] |
j + m | 96.5 | [0.0, 1.0] | 94.6 | [0.3, 0.2] | 92.9 | [0.4, 0.1] | 90.0 | [0.1, 0.0] |
j + o | 96.0 | [0.4, 0.1] | 94.4 | [0.1, 0.1] | 95.0 | [0.4, 0.1] | 90.7 | [0.4, 0.1] |
m + b | 96.0 | [0.0, 1.0] | 93.8 | [0.0, 1.0] | 90.0 | [0.0, 1.0] | 87.0 | [0.0, 1.0] |
m + o | 95.6 | [0.4, 0.5] | 93.6 | [0.5, 0.4] | 92.1 | [0.0, 1.0] | 90.7 | [0.3, 0.1] |
o + b | 96.3 | [0.3, 0.2] | 94.8 | [0.5, 0.7] | 92.6 | [0.9, 1.1] | 87.1 | [0.4, 0.3] |
j + m + b | 97.5 | [0.3, 0.5, 0.4] | 95.2 | [0.8, 0.7, 0.4] | 94.2 | [0.3, 0.1, 0.0] | 94.3 | [0.3, 0.6, 0.2] |
m + b + o | 96.9 | [0.2, 0.2, 0.1] | 95.2 | [0.2, 0.5, 0.3] | 93.4 | [0.7, 0.3, 0.5] | 92.9 | [0.3, 0.5, 0.4] |
j + b + o | 97.1 | [1.0, 0.6, 0.7] | 95.6 | [0.3, 0.9, 0.7] | 95.7 | [0.8, 0.1, 0.4] | 90.7 | [0.4, 0.0, 0.1] |
j + m + o | 96.3 | [0.3, 0.4, 0.2] | 94.6 | [0.6, 0.4, 0.5] | 96.4 | [1.0, 0.4, 0.3] | 94.3 | [0.2, 0.3, 0.2] |
j + m + b + o | 97.6 | [0.9, 0.5, 0.7, 0.3] | 96.6 | [0.2, 0.1, 0.7, 0.4] | 96.4 | [1.0, 0.4, 0.0, 0.3] | 95.0 | [0.1, 0.3, 0.1, 0.1] |