A Review of State-of-the-Art Methodologies and Applications in Action Recognition
Abstract
1. Introduction
- This manuscript delivers a thorough overview of action recognition from various structural and hierarchical perspectives, encompassing traditional methods, RGB-based neural networks, and skeleton-based neural networks. It highlights cutting-edge algorithms, providing an introduction and comparative analysis of pivotal techniques.
- A thorough review of action recognition applications across various domains, including video surveillance, motion analysis, medical monitoring, and other societal sectors, is presented from diverse perspectives and functions.
- This comprehensive review investigates cutting-edge, high-performance pose estimation techniques designed to optimize the extraction of skeletal data and better serve skeletal neural network-based methods.
- The performance of both established and emerging action recognition technologies is assessed, with incisive insights into future technological and application trends.
2. Traditional Action Recognition Methods and Applications
2.1. Traditional Methods
2.2. Datasets and Comparisons of Traditional Methods
- (1)
- Weizmann dataset
- (2)
- KTH dataset
- (3)
- IXMAS dataset
- (4)
- Hollywood dataset
2.3. Application of Traditional Methods
3. RGB-Based Neural Network Methods and Applications
3.1. RGB-Based Methods
3.1.1. Two-Flow Networks Methods
3.1.2. Three-Dimensional Convolution Methods
3.1.3. Hybrid Methods
3.2. Datasets and Comparisons of RGB-Based Methods
- (1)
- HMDB51 dataset
- (2)
- UCF101 dataset
- (3)
- Kinetics-400 dataset
- (4)
- Something–Something V1 dataset
- (5)
- Something–Something V2 dataset
3.3. Application of RGB Neural Network Methods

4. Skeleton-Based Neural Network Action Recognition Methods and Applications
4.1. Pose Estimation
4.1.1. Two-Dimensional Pose Estimation
4.1.2. Three-Dimensional Pose Estimation
4.2. Skeleton-Based Neural Network Methods
4.2.1. RNN-Based Methods
4.2.2. CNN-Based Methods
4.2.3. GCN-Based Methods

4.3. Datasets and Comparisons of Skeleton-Based Methods
- (1)
- NTU-RGB+D dataset
- (2)
- SBU Kinect dataset
- (3)
- UT-Kinect dataset
- (4)
- PKU-MMD dataset
4.4. Application of Skeleton-Based Neural Network Methods
5. Challenges
6. Conclusions and Outlook
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Huszar, V.D.; Adhikarla, V.K.; Negyesi, I.; Krasznay, C. Toward Fast and Accurate Violence Detection for Automated Video Surveillance Applications. IEEE Access 2023, 11, 18772–18793. [Google Scholar] [CrossRef]
- Liu, Z.; Yan, D.; Cai, Y.; Song, Y. Spatio-temporal human action localization in indoor surveillances. Pattern Recognit. 2024, 147, 110087. [Google Scholar] [CrossRef]
- Ruiz-Santaquiteria, J.; Munoz, J.D.; Maigler, F.J.; Deniz, O.; Bueno, G. Firearm-related action recognition and object detection dataset for video surveillance systems. Data Brief 2024, 52, 110030. [Google Scholar] [CrossRef] [PubMed]
- Wu, Y.; Sheng, H.; Zhang, Y.; Wang, S.; Xiong, Z.; Ke, W. Hybrid Motion Model for Multiple Object Tracking in Mobile Devices. IEEE Internet Things J. 2023, 10, 4735–4748. [Google Scholar] [CrossRef]
- Da Lio, M.; Dona, R.; Papini, G.P.R.; Plebe, A. The Biasing of Action Selection Produces Emergent Human-Robot Interactions in Autonomous Driving. IEEE Rob. Autom. Lett. 2022, 7, 1254–1261. [Google Scholar] [CrossRef]
- Hu, X.; Liu, Y.; Tang, B.; Yan, J.; Chen, L. Learning Dynamic Graph for Overtaking Strategy in Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11921–11933. [Google Scholar] [CrossRef]
- Cao, A.; Xie, X.; Zhou, M.; Zhang, H.; Xu, M.; Wu, Y. Action-Evaluator: A Visualization Approach for Player Action Evaluation in Soccer. IEEE Trans. Vis. Comput. Graph. 2024, 30, 880–890. [Google Scholar] [CrossRef]
- Xiao, L.; Cao, Y.; Gai, Y.; Khezri, E.; Liu, J.; Yang, M. Recognizing sports activities from video frames using deformable convolution and adaptive multiscale features. J. Cloud Comput. Adv. Syst. Appl. 2023, 12, 167. [Google Scholar] [CrossRef]
- Guo, Y.; Ju, R.; Li, K.; Lan, Z.; Niu, L.; Hou, X.; Qian, S.; Chen, W.; Liu, X.; Li, G.; et al. A Smart Ski Pole for Skiing Pattern Recognition and Quantification Application. Sensors 2024, 24, 5291. [Google Scholar] [CrossRef]
- Zhou, K.; Ma, Y.; Shum, H.P.H.; Liang, X. Hierarchical Graph Convolutional Networks for Action Quality Assessment. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7749–7763. [Google Scholar] [CrossRef]
- Akinyemi, T.O.; Omisore, O.M.; Du, W.; Duan, W.; Chen, X.-Y.; Yi, G.; Wang, L. Interventionalist Hand Motion Recognition With Convolutional Neural Network in Robot-Assisted Coronary Interventions. IEEE Sens. J. 2023, 23, 17725–17736. [Google Scholar] [CrossRef]
- Gilanie, G.; ul Hassan, M.; Asghar, M.; Qamar, A.M.; Ullah, H.; Khan, R.U.; Aslam, N.; Khan, I.U. An Automated and Real-time Approach of Depression Detection from Facial Micro-expressions. CMC-Comput. Mater. Contin. 2022, 73, 2513–2528. [Google Scholar] [CrossRef]
- Meli, D.; Fiorini, P. Unsupervised Identification of Surgical Robotic Actions From Small Homogeneous Datasets. IEEE Rob. Autom. Lett. 2021, 6, 8205–8212. [Google Scholar] [CrossRef]
- Pan, M.-Z.; Deng, Y.-W.; Li, Z.; Chen, Y.; Liao, X.-L.; Bian, G.-B. Dynamic Multiaction Recognition and Expert Movement Mapping for Closed Pelvic Reduction. IEEE Trans. Ind. Inf. 2023, 19, 8667–8678. [Google Scholar] [CrossRef]
- Tong, Y.; Zhang, Z.; Chen, G.; Li, X.; Yan, H.; Xu, M.; Lin, B. Testing the Feasibility of a Multi-Model Fusion Method for Monitoring the Action of Rehabilitating Stroke Patients in Care Management. IEEE Access 2021, 9, 78174–78187. [Google Scholar] [CrossRef]
- Ding, X.; Peng, W.; Yi, X. Evaluation of Physical Education Teaching Effect Based on Action Skill Recognition. Comput. Intell. Neurosci. 2022, 2022, 9489704. [Google Scholar] [CrossRef]
- Li, Y.; Qi, X.; Saudagar, A.K.J.; Badshah, A.M.; Muhammad, K.; Liu, S. Student behavior recognition for interaction detection in the classroom environment. Image Vis. Comput. 2023, 136, 104726. [Google Scholar] [CrossRef]
- Zhang, Y.; Hou, X. Application of video image processing in sports action recognition based on particle swarm optimization algorithm. Prev. Med. 2023, 173, 107592. [Google Scholar] [CrossRef]
- Lin, F.; Wang, Z.; Zhao, H.; Qiu, S.; Shi, X.; Wu, L.; Gravina, R.; Fortino, G. Adaptive Multi-Modal Fusion Framework for Activity Monitoring of People With Mobility Disability. IEEE J. Biomed. Health. Inf. 2022, 26, 4314–4324. [Google Scholar] [CrossRef]
- Cui, J.; Yan, B.; Du, H.; Shang, Y.; Tong, L. Application of Foot Hallux Contact Force Signal for Assistive Hand Fine Control. Sensors 2023, 23, 5277. [Google Scholar] [CrossRef]
- Bobick, A.; Davis, J. The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 257–267. [Google Scholar] [CrossRef]
- Laptev, I. On space-time interest points. Int. J. Comput. Vis. 2005, 64, 107–123. [Google Scholar] [CrossRef]
- Gorelick, L.; Blank, M.; Shechtman, E.; Irani, M.; Basri, R. Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 2247–2253. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Elman, J.L. Finding structure in time. Cognit. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
- Jin, C.B.; Li, S.Z.; Do, T.D.; Kim, H. Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras. In Proceedings of the 16th Pacific-Rim Conference on Multimedia (PCM), Gwangju, Republic of Korea, 16–18 September 2015; pp. 330–339. [Google Scholar]
- Yao, G.L.; Lei, T.; Zhong, J.D. A review of Convolutional-Neural-Network-based action recognition. Pattern Recognit. Lett. 2019, 118, 14–22. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Ji, S.W.; Xu, W.; Yang, M.; Yu, K. 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 221–231. [Google Scholar] [CrossRef] [PubMed]
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- Li, Y.; He, Z.; Ye, X.; He, Z.; Han, K. Spatial temporal graph convolutional networks for skeleton-based dynamic hand gesture recognition. EURASIP J. Image Video Process. 2019, 2019, 78. [Google Scholar] [CrossRef]
- Song, L.; Yu, G.; Yuan, J.; Liu, Z. Human pose estimation and its application to action recognition: A survey*. J. Vis. Commun. Image Represent. 2021, 76, 103055. [Google Scholar] [CrossRef]
- Feng, L.Q.; Zhao, Y.Q.; Zhao, W.X.; Tang, J.X. A comparative review of graph convolutional networks for human skeleton-based action recognition. Artif. Intell. Rev. 2022, 55, 4275–4305. [Google Scholar] [CrossRef]
- Dang, L.M.; Min, K.; Wang, H.; Piran, M.J.; Lee, C.H.; Moon, H. Sensor-based and vision-based human activity recognition: A comprehensive survey. Pattern Recognit. 2020, 108, 107561. [Google Scholar] [CrossRef]
- Wang, L.; Huynh, D.; Koniusz, P. A Comparative Review of Recent Kinect-Based Action Recognition Algorithms. IEEE Trans. Image Process. 2020, 29, 15–28. [Google Scholar] [CrossRef] [PubMed]
- Zhang, H.-B.; Zhang, Y.-X.; Zhong, B.; Lei, Q.; Yang, L.; Du, J.-X.; Chen, D.-S. A Comprehensive Survey of Vision-Based Human Action Recognition Methods. Sensors 2019, 19, 1005. [Google Scholar] [CrossRef] [PubMed]
- Beddiar, D.R.; Nini, B.; Sabokrou, M.; Hadid, A. Vision-based human activity recognition: A survey. Multimed. Tools Appl. 2020, 79, 30509–30555. [Google Scholar] [CrossRef]
- Yue, R.J.; Tian, Z.Q.; Du, S.Y. Action recognition based on RGB and skeleton data sets: A survey. Neurocomputing 2022, 512, 287–306. [Google Scholar] [CrossRef]
- Wang, C.L.; Yan, J.J. A Comprehensive Survey of RGB-Based and Skeleton-Based Human Action Recognition. IEEE Access 2023, 11, 53880–53898. [Google Scholar] [CrossRef]
- Karim, M.; Khalid, S.; Aleryani, A.; Khan, J.; Ullah, I.; Ali, Z. Human Action Recognition Systems: A Review of the Trends and State-of-the-Art. IEEE Access 2024, 12, 36372–36390. [Google Scholar] [CrossRef]
- Kong, Y.; Fu, Y. Human Action Recognition and Prediction: A Survey. Int. J. Comput. Vis. 2022, 130, 1366–1401. [Google Scholar] [CrossRef]
- Wu, F.; Wang, Q.Z.; Bian, J.; Ding, N.; Lu, F.X.; Cheng, J.; Dou, D.J.; Xiong, H.Y. A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications. IEEE Trans. Multimed. 2023, 25, 7943–7966. [Google Scholar] [CrossRef]
- Chaquet, J.M.; Carmona, E.J.; Fernández-Caballero, A. A survey of video datasets for human action and activity recognition. Comput. Vis. Image Underst. 2013, 117, 633–659. [Google Scholar] [CrossRef]
- Sun, Z.H.; Ke, Q.H.; Rahmani, H.; Bennamoun, M.; Wang, G.; Liu, J. Human Action Recognition From Various Data Modalities: A Review. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 3200–3225. [Google Scholar] [CrossRef] [PubMed]
- Moghaddam, Z.; Piccardi, M. Deterministic Initialization of Hidden Markov Models for Human Action Recognition. In Proceedings of the 11th Conference on Digital Image Computing: Techniques and Applications, Melbourne, Australia, 1–3 December 2009; pp. 188–195. [Google Scholar]
- Moghaddam, Z.; Piccardi, M. Training Initialization of Hidden Markov Models in Human Action Recognition. IEEE Trans. Autom. Sci. Eng. 2014, 11, 394–408. [Google Scholar] [CrossRef]
- Shi, Q.; Cheng, L.; Wang, L.; Smola, A. Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models. Int. J. Comput. Vis. 2011, 93, 22–32. [Google Scholar] [CrossRef]
- Zhou, W.; Zhang, Z. Human Action Recognition With Multiple-Instance Markov Model. IEEE Trans. Inf. Forensics Secur. 2014, 9, 1581–1591. [Google Scholar] [CrossRef]
- Nie, S.; Ji, Q. Capturing Global and Local Dynamics for Human Action Recognition. In Proceedings of the 22nd International Conference on Pattern Recognition (ICPR), Swedish Soc Automated Image Anal, Stockholm, Sweden, 24–28 August 2014; pp. 1946–1951. [Google Scholar]
- Liu, L.; Shao, L.; Rockett, P. Human action recognition based on boosted feature selection and naive Bayes nearest-neighbor classification. Signal Process. 2013, 93, 1521–1530. [Google Scholar] [CrossRef]
- Liu, L.; Shao, L.; Zhen, X.; Li, X. Learning Discriminative Key Poses for Action Recognition. IEEE Trans. Cybern. 2013, 43, 1860–1870. [Google Scholar] [CrossRef]
- Mahapatra, A.; Mishra, T.K.; Sa, P.K.; Majhi, B. Human recognition system for outdoor videos using Hidden Markov model. AEU-Int. J. Electron. Commun. 2014, 68, 227–236. [Google Scholar] [CrossRef]
- Lin, C.-H.; Hsu, F.-S.; Lin, W.-Y. Recognizing Human Actions Using NWFE-Based Histogram Vectors. EURASIP J. Adv. Signal Process. 2010, 2010, 453064. [Google Scholar] [CrossRef]
- Zhu, F.; Shao, L.; Lin, M. Multi-view action recognition using local similarity random forests and sensor fusion. Pattern Recognit. Lett. 2013, 34, 20–24. [Google Scholar] [CrossRef]
- Iosifidis, A.; Tefas, A.; Pitas, I. Multi-view action recognition based on action volumes, fuzzy distances and cluster discriminant analysis. Signal Process. 2013, 93, 1445–1457. [Google Scholar] [CrossRef]
- Iosifidis, A.; Tefas, A.; Pitas, I. Discriminant Bag of Words based representation for human action recognition. Pattern Recognit. Lett. 2014, 49, 185–192. [Google Scholar] [CrossRef]
- Marin-Jimenez, M.J.; Munoz-Salinas, R.; Yeguas-Bolivar, E.; Perez de la Blanca, N. Human interaction categorization by using audio-visual cues. Mach. Vis. Appl. 2014, 25, 71–84. [Google Scholar] [CrossRef]
- Souvenir, R.; Parrigan, K. Viewpoint Manifolds for Action Recognition. EURASIP J. Image Video Process. 2009, 2009, 738702. [Google Scholar] [CrossRef][Green Version]
- DeMenthon, D.; Doermann, D. Video retrieval of near-duplicates using κ-nearest neighbor retrieval of spatio-temporal descriptors. Multimed. Tools Appl. 2006, 30, 229–253. [Google Scholar] [CrossRef][Green Version]
- Nga, D.H.; Yanai, K. Automatic extraction of relevant video shots of specific actions exploiting Web data. Comput. Vis. Image Underst. 2014, 118, 2–15. [Google Scholar] [CrossRef]
- Nagarajan, R.; Hariharan, M.; Satiyan, M. Luminance Sticker Based Facial Expression Recognition Using Discrete Wavelet Transform for Physically Disabled Persons. J. Med. Syst. 2012, 36, 2225–2234. [Google Scholar] [CrossRef] [PubMed]
- Ren, Z.; Yuan, J.; Meng, J.; Zhang, Z. Robust Part-Based Hand Gesture Recognition Using Kinect Sensor. IEEE Trans. Multimed. 2013, 15, 1110–1120. [Google Scholar] [CrossRef]
- Wang, H.; Kläser, A.; Schmid, C.; Liu, C.L. Dense Trajectories and Motion Boundary Descriptors for Action Recognition. Int. J. Comput. Vis. 2013, 103, 60–79. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Two-Stream Convolutional Networks for Action Recognition in Videos. In Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Ren, Z.L.; Zhang, Q.S.; Qiao, P.Y.; Niu, M.L.; Gao, X.Y.; Cheng, J. Joint learning of convolution neural networks for RGB-D-based human action recognition. Electron. Lett. 2020, 56, 1112–1114. [Google Scholar] [CrossRef]
- Xiong, Q.Q.; Zhang, J.J.; Wang, P.; Liu, D.D.; Gao, R.X. Transferable two-stream convolutional neural network for human action recognition. J. Manuf. Syst. 2020, 56, 605–614. [Google Scholar] [CrossRef]
- Liu, J.; Xu, D. GeometryMotion-Net: A Strong Two-Stream Baseline for 3D Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4711–4721. [Google Scholar] [CrossRef]
- Yadav, S.K.; Agarwal, A.; Kumar, A.; Tiwari, K.; Pandey, H.M.; Akbar, S.A. YogNet: A two-stream network for realtime multiperson yoga action recognition and posture correction. Knowl. Based Syst. 2022, 250, 109097. [Google Scholar] [CrossRef]
- Xu, J.; Song, R.; Wei, H.L.; Guo, J.H.; Zhou, Y.F.; Huang, X.W. A fast human action recognition network based on spatio-temporal features. Neurocomputing 2021, 441, 350–358. [Google Scholar] [CrossRef]
- Gao, Z.; Guo, L.; Ren, T.; Liu, A.-A.; Cheng, Z.-Y.; Chen, S. Pairwise Two-Stream ConvNets for Cross-Domain Action Recognition With Small Data. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 1147–1161. [Google Scholar] [CrossRef]
- Zhao, Y.L.; Lee, H.J. Tgsnet: A Fractal Neural Network For Action Recognition. Fractals-Complex Geom. Patterns Scaling Nat. Soc. 2023, 31, 2340152. [Google Scholar] [CrossRef]
- Zhou, A.; Ma, Y.; Ji, W.; Zong, M.; Yang, P.; Wu, M.; Liu, M. Multi-head attention-based two-stream EfficientNet for action recognition. Multimed. Syst. 2023, 29, 487–498. [Google Scholar] [CrossRef]
- Ting-Long, L. Short-Term Action Learning for Video Action Recognition. IEEE Access 2024, 12, 30867–30875. [Google Scholar] [CrossRef]
- Yang, H.; Yuan, C.F.; Li, B.; Du, Y.; Xing, J.L.; Hu, W.M.; Maybank, S.J. Asymmetric 3D Convolutional Neural Networks for action recognition. Pattern Recognit. 2019, 85, 1–12. [Google Scholar] [CrossRef]
- Lu, X.S.; Yao, H.X.; Zhao, S.C.; Sun, X.S.; Zhang, S.P. Action recognition with multi-scale trajectory-pooled 3D convolutional descriptors. Multimed. Tools Appl. 2019, 78, 507–523. [Google Scholar] [CrossRef]
- Yang, H.; Liu, L.; Min, W.; Yang, X.; Xiong, X. Driver Yawning Detection Based on Subtle Facial Action Recognition. IEEE Trans. Multimed. 2021, 23, 572–583. [Google Scholar] [CrossRef]
- Jiang, S.; Qi, Y.; Zhang, H.; Bai, Z.; Lu, X.; Wang, P. D3D: Dual 3-D Convolutional Network for Real-Time Action Recognition. IEEE Trans. Ind. Inf. 2021, 17, 4584–4593. [Google Scholar] [CrossRef]
- Zhang, H.; Hu, Z.; Yu, D.; Guan, L.; Liu, X.; Ma, C. Multipath Attention and Adaptive Gating Network for Video Action Recognition. Neural Process. Lett. 2024, 56, 124. [Google Scholar] [CrossRef]
- Ju, Y. Study of Human Motion Recognition Algorithm Based on Multichannel 3D Convolutional Neural Network. Complexity 2021, 2021, 7646813. [Google Scholar] [CrossRef]
- Zhang, Z.; Peng, Y.; Gan, C.; Abate, A.F.; Zhu, L. Separable 3D residual attention network for human action recognition. Multimed. Tools Appl. 2023, 82, 5435–5453. [Google Scholar] [CrossRef]
- Kumawat, S.; Verma, M.; Nakashima, Y.; Raman, S. Depthwise Spatio-Temporal STFT Convolutiona Neural Networks for Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4839–4851. [Google Scholar] [CrossRef]
- Chang, Y.L.; Chan, C.S.; Remagnino, P. Action recognition on continuous video. Neural Comput. Appl. 2021, 33, 1233–1243. [Google Scholar] [CrossRef]
- Deng, L.; Fu, R.; Sun, Q.; Jiang, M.; Li, Z.; Chen, H.; Yu, Z.; Bu, X. Abnormal behavior recognition based on feature fusion C3D network. J. Electron. Imaging 2023, 32, 021605. [Google Scholar] [CrossRef]
- Sanchez-Caballero, A.; de Lopez-Diz, S.; Fuentes-Jimenez, D.; Losada-Gutierrez, C.; Marron-Romera, M.; Casillas-Perez, D.; Sarker, M.I. 3DFCNN: Real-time action recognition using 3D deep neural networks with raw depth information. Multimed. Tools Appl. 2022, 81, 24119–24143. [Google Scholar] [CrossRef]
- He, J.Y.; Wu, X.; Cheng, Z.Q.; Yuan, Z.Q.; Jiang, Y.G. DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition. Neurocomputing 2021, 444, 319–331. [Google Scholar] [CrossRef]
- Munsif, M.; Khan, S.U.; Khan, N.; Baik, S.W. Attention-Based Deep Learning Framework for Action Recognition in a Dark Environment. Hum.-Centric Comput. Inf. Sci. 2024, 14, 4. [Google Scholar] [CrossRef]
- Dai, C.; Liu, X.G.; Lai, J.F. Human action recognition using two-stream attention based LSTM networks. Appl. Soft Comput. 2020, 86, 105820. [Google Scholar] [CrossRef]
- Li, X.Y.; Hou, Y.H.; Wang, P.C.; Gao, Z.M.; Xu, M.L.; Li, W.Q. Trear: Transformer-Based RGB-D Egocentric Action Recognition. IEEE Trans. Cognit. Dev. Syst. 2022, 14, 246–252. [Google Scholar] [CrossRef]
- Srihari, D.; Kishore, P.V.V.; Kumar, E.K.; Kumar, D.A.; Kumar, M.T.K.; Prase, M.V.D.; Prasd, C.R. A four-stream ConvNet based on spatial and depth flow for human action classification using RGB-D data. Multimed. Tools Appl. 2020, 79, 11723–11746. [Google Scholar] [CrossRef]
- Ullah, A.; Muhammad, K.; Ding, W.P.; Palade, V.; Ul Haq, I.; Baik, S.W. Efficient activity recognition using lightweight CNN and DS-GRU network for surveillance applications. Appl. Soft Comput. 2021, 103, 107102. [Google Scholar] [CrossRef]
- Liu, B.; Luo, J.X.; Huang, H. Toward automatic quantification of knee osteoarthritis severity using improved Faster R-CNN. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 457–466. [Google Scholar] [CrossRef] [PubMed]
- Jain, D.K.; Zhang, Z.; Huang, K. Multi angle optimal pattern-based deep learning for automatic facial expression recognition. Pattern Recognit. Lett. 2020, 139, 157–165. [Google Scholar] [CrossRef]
- Ding, I., Jr.; Zheng, N.-W.; Hsieh, M.-C. Hand gesture intention-based identity recognition using various recognition strategies incorporated with VGG convolution neural network-extracted deep learning features. J. Intell. Fuzzy Syst. 2021, 40, 7775–7788. [Google Scholar] [CrossRef]
- Liu, Y.; Zhang, X.; Lin, Y.; Wang, H. Facial Expression Recognition via Deep Action Units Graph Network Based on Psychological Mechanism. IEEE Trans. Cognit. Dev. Syst. 2020, 12, 311–322. [Google Scholar] [CrossRef]
- Bellamkonda, S.; Gopalan, N.P.; Mala, C.; Settipalli, L. Facial expression recognition on partially occluded faces using component based ensemble stacked CNN. Cognit. Neurodyn. 2023, 17, 985–1008. [Google Scholar] [CrossRef] [PubMed]
- Onal Ertugrul, I.; Ahn, Y.A.; Bilalpur, M.; Messinger, D.S.; Speltz, M.L.; Cohn, J.F. Infant AFAR: Automated facial action recognition in infants. Behav. Res. Methods 2023, 55, 1024–1035. [Google Scholar] [CrossRef]
- Hu, J.; Liu, W.; Kang, J.; Yang, W.; Zhao, H. Semi-cascade network for driver’s distraction recognition. Proc. Inst. Mech. Eng. Part D-J. Automob. Eng. 2019, 233, 2323–2332. [Google Scholar] [CrossRef]
- Su, L.; Sun, C.; Cao, D.; Khajepour, A. Efficient Driver Anomaly Detection via Conditional Temporal Proposal and Classification Network. IEEE Trans. Comput. Soc. Syst. 2023, 10, 736–745. [Google Scholar] [CrossRef]
- Alotaibi, M.; Alotaibi, B. Distracted driver classification using deep learning. Signal Image Video Process. 2020, 14, 617–624. [Google Scholar] [CrossRef]
- Yin, Z.; Yang, Z.; Van de Panne, M.; Yin, K. Discovering Diverse Athletic Jumping Strategies. ACM Trans. Graph. 2021, 40, 1–17. [Google Scholar] [CrossRef]
- Liu, Y.; Dong, H.; Wang, L. Trampoline Motion Decomposition Method Based on Deep Learning Image Recognition. Sci. Program. 2021, 2021, 1215065. [Google Scholar] [CrossRef]
- Alanazi, T.; Muhammad, G. Human Fall Detection Using 3D Multi-Stream Convolutional Neural Networks with Fusion. Diagnostics 2022, 12, 3060. [Google Scholar] [CrossRef] [PubMed]
- Jain, S.; Rustagi, A.; Saurav, S.; Saini, R.; Singh, S. Three-dimensional CNN-inspired deep learning architecture for Yoga pose recognition in the real-world environment. Neural Comput. Appl. 2021, 33, 6427–6441. [Google Scholar] [CrossRef]
- Wang, Y. Research on Dance Movement Recognition Based on Multi-Source Information. Math. Probl. Eng. 2022, 2022, 5257165. [Google Scholar] [CrossRef]
- Zhu, D. Real-Time Recognition Method of Video Basketball Technical Action Based on Target Detection Algorithm. Math. Probl. Eng. 2022, 2022, 4209020. [Google Scholar] [CrossRef]
- Liu, R.; Liu, Z.Q.; Liu, S.Y. Recognition of Basketball Player’s Shooting Action Based on the Convolutional Neural Network. Sci. Program. 2021, 2021, 3045418. [Google Scholar] [CrossRef]
- Jiang, F.; Chen, X. An Action Recognition Algorithm for Sprinters Using Machine Learning. Mob. Inf. Syst. 2021, 2021, 9919992. [Google Scholar] [CrossRef]
- He, S.; Gong, R. Recognition and Prediction of Badminton Attitude Based on Video Image Analysis. Mob. Inf. Syst. 2022, 2022, 6960343. [Google Scholar] [CrossRef]
- Sun, X.; Wang, Y.; Khan, J. Hybrid LSTM and GAN model for action recognition and prediction of lawn tennis sport activities. Soft Comput. 2023, 27, 18093–18112. [Google Scholar] [CrossRef]
- Xu, C.; Fu, Y.; Zhang, B.; Chen, Z.; Jiang, Y.-G.; Xue, X. Learning to Score Figure Skating Sport Videos. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4578–4590. [Google Scholar] [CrossRef]
- Ben Mabrouk, A.; Zagrouba, E. Abnormal behavior recognition for intelligent video surveillance systems: A review. Expert Syst. Appl. 2018, 91, 480–491. [Google Scholar] [CrossRef]
- Srivastava, A.; Badal, T.; Garg, A.; Vidyarthi, A.; Singh, R. Recognizing human violent action using drone surveillance within real-time proximity. J. Real-Time Image Process. 2021, 18, 1851–1863. [Google Scholar] [CrossRef]
- Zhu, Y.; Gao, Q.; Shi, H.; Liu, J. Gestures recognition based on multimodal fusion by using 3D CNNs. J. Intell. Fuzzy Syst. 2024, 46, 1647–1661. [Google Scholar] [CrossRef]
- Zhong, C.; Reibman, A.R.; Mina, H.A.; Deering, A.J. Multi-View Hand-Hygiene Recognition for Food Safety. J. Imaging 2020, 6, 120. [Google Scholar] [CrossRef]
- Al-Amin, M.; Qin, R.; Tao, W.; Doell, D.; Lingard, R.; Yin, Z.; Leu, M.C. Fusing and refining convolutional neural network models for assembly action recognition in smart manufacturing. Proc. Inst. Mech. Eng. Part C-J. Mech. Eng. Sci. 2022, 236, 2046–2059. [Google Scholar] [CrossRef]
- Ding, I., Jr.; Zheng, N.-W. RGB-D Depth-sensor-based Hand Gesture Recognition Using Deep Learning of Depth Images with Shadow Effect Removal for Smart Gesture Communication. Sens. Mater. 2022, 34, 203–216. [Google Scholar] [CrossRef]
- Munea, T.L.; Yang, C.; Huang, C.; Elhassan, M.A.M.; Zhen, Q. SimpleCut: A simple and strong 2D model for multi-person pose estimation? Comput. Vis. Image Underst. 2022, 222, 103509. [Google Scholar] [CrossRef]
- Xiao, B.; Wu, H.P.; Wei, Y.C. Simple Baselines for Human Pose Estimation and Tracking. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 472–487. [Google Scholar]
- Cheng, Y.; Ai, Y.; Wang, B.; Wang, X.; Tan, R.T. Bottom-up 2D pose estimation via dual anatomical centers for small-scale persons. Pattern Recognit. 2023, 139, 109403. [Google Scholar] [CrossRef]
- Zhang, Z.; Luo, Y.; Gou, J. Double anchor embedding for accurate multi-person 2D pose estimation. Image Vis. Comput. 2021, 111, 104198. [Google Scholar] [CrossRef]
- Jin, L.; Wang, X.; Nie, X.; Liu, L.; Guo, Y.; Zhao, J. Grouping by Center: Predicting Centripetal Offsets for the Bottom-up Human Pose Estimation. IEEE Trans. Multimed. 2023, 25, 3364–3374. [Google Scholar] [CrossRef]
- Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.-E.; Sheikh, Y. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 172–186. [Google Scholar] [CrossRef] [PubMed]
- Li, S.; Fang, Z.; Song, W.-F.; Hao, A.-M.; Qin, H. Bidirectional Optimization Coupled Lightweight Networks for Efficient and Robust Multi-Person 2D Pose Estimation. J. Comput. Sci. Technol. 2019, 34, 522–536. [Google Scholar] [CrossRef]
- Li, M.; Hu, H.; Xiong, J.; Zhao, X.; Yan, H. TSwinPose: Enhanced monocular 3D human pose estimation with JointFlow. Expert Syst. Appl. 2024, 249, 123545. [Google Scholar] [CrossRef]
- Chen, T.; Fang, C.; Shen, X.; Zhu, Y.; Chen, Z.; Luo, J. Anatomy-Aware 3D Human Pose Estimation With Bone-Based Pose Decomposition. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 198–209. [Google Scholar] [CrossRef]
- Cheng, Y.; Wang, B.; Tan, R.T.T. Dual Networks Based 3D Multi-Person Pose Estimation From Monocular Video. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 1636–1651. [Google Scholar] [CrossRef]
- Wang, R.; Ying, X.; Xing, B. Exploiting Temporal Correlations for 3D Human Pose Estimation. IEEE Trans. Multimed. 2024, 26, 4527–4539. [Google Scholar] [CrossRef]
- Li, S.; Li, W.; Cook, C.; Zhu, C.; Gao, Y. Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN. Proceedings of 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5457–5466. [Google Scholar]
- Liu, J.; Wang, G.; Duan, L.-Y.; Abdiyeva, K.; Kot, A.C. Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks. IEEE Trans. Image Process. 2018, 27, 1586–1599. [Google Scholar] [CrossRef]
- Zhang, S.; Yang, Y.; Xiao, J.; Liu, X.; Yang, Y.; Xie, D.; Zhuang, Y. Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks. IEEE Trans. Multimed. 2018, 20, 2330–2343. [Google Scholar] [CrossRef]
- Feng, J.; Zhang, S.; Xiao, J. Explorations of skeleton features for LSTM-based action recognition. Multimed. Tools Appl. 2019, 78, 591–603. [Google Scholar] [CrossRef]
- Cui, R.; Zhu, A.C.; Wu, J.R.; Hua, G. Skeleton-based attention-aware spatial-temporal model for action detection and recognition. IET Comput. Vis. 2020, 14, 177–184. [Google Scholar] [CrossRef]
- Yang, A.; Lu, W.; Naeem, W.; Chen, L.; Fei, M. A sequence models-based real-time multi-person action recognition method with monocular vision. J. Ambient Intell. Hum. Comput. 2021, 14, 1877–1887. [Google Scholar] [CrossRef]
- Gao, Y.; Li, C.; Li, S.; Cai, X.; Ye, M.; Yuan, H. A Deep Attention Model for Action Recognition from Skeleton Data. Appl. Sci. 2022, 12, 2006. [Google Scholar] [CrossRef]
- Yu, J.; Gao, H.; Chen, Y.; Zhou, D.; Liu, J.; Ju, Z. Adaptive Spatiotemporal Representation Learning for Skeleton-Based Human Action Recognition. IEEE Trans. Cognit. Dev. Syst. 2022, 14, 1654–1665. [Google Scholar] [CrossRef]
- She, Q.; Mu, G.; Gan, H.; Fan, Y. Spatio-temporal SRU with global context-aware attention for 3D human action recognition. Multimed. Tools Appl. 2020, 79, 12349–12371. [Google Scholar] [CrossRef]
- Wang, H.; Wang, L. Learning content and style: Joint action recognition and person identification from human skeletons. Pattern Recognit. 2018, 81, 23–35. [Google Scholar] [CrossRef]
- Zhang, P.F.; Xue, J.R.; Lan, C.L.; Zeng, W.J.; Gao, Z.N.; Zheng, N.N. EleAtt-RNN: Adding Attentiveness to Neurons in Recurrent Neural Networks. IEEE Trans. Image Process. 2020, 29, 1061–1073. [Google Scholar] [CrossRef]
- Wei, S.; Zhao, J.; Li, J.; Yuan, M. Seq2seq model for human action recognition based on skeleton and two-layer bidirectional LSTM. J. Ambient Intell. Smart Environ. 2023, 15, 315–331. [Google Scholar] [CrossRef]
- Du, Y.; Fu, Y.; Wang, L. Skeleton Based Action Recognition with Convolutional Neural Network. In Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 3–6 November 2015; pp. 579–583. [Google Scholar]
- Li, B.; He, M.; Dai, Y.; Cheng, X.; Chen, Y. 3D skeleton based action recognition by video-domain translation-scale invariant mapping and multi-scale dilated CNN. Multimed. Tools Appl. 2018, 77, 22901–22921. [Google Scholar] [CrossRef]
- Nie, W.; Wang, W.; Huang, X. SRNet: Structured Relevance Feature Learning Network From Skeleton Data for Human Action Recognition. IEEE Access 2019, 7, 132161–132172. [Google Scholar] [CrossRef]
- Dang, Y.; Yang, F.; Yin, J. DWnet: Deep-wide network for 3D action recognition. Rob. Auton. Syst. 2020, 126, 103441. [Google Scholar] [CrossRef]
- Guan, S.; Lu, H.; Zhu, L.; Fang, G. AFE-CNN: 3D Skeleton-based Action Recognition with Action Feature Enhancement. Neurocomputing 2022, 514, 256–267. [Google Scholar] [CrossRef]
- De Boissiere, A.M.; Noumeir, R. Infrared and 3D Skeleton Feature Fusion for RGB-D Action Recognition. IEEE Access 2020, 8, 168297–168308. [Google Scholar] [CrossRef]
- Banerjee, A.; Singh, P.K.; Sarkar, R. Fuzzy Integral-Based CNN Classifier Fusion for 3D Skeleton Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 2206–2216. [Google Scholar] [CrossRef]
- Dhiman, C.; Vishwakarma, D.K.; Agarwal, P. Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition. ACM Trans. Multimed. Comput. Commun. Appl. 2021, 17, 1–24. [Google Scholar] [CrossRef]
- Su, B.; Zhang, P.; Sun, M.; Sheng, M. Direction-guided two-stream convolutional neural networks for skeleton-based action recognition. Soft Comput. 2023, 27, 11833–11842. [Google Scholar] [CrossRef]
- Li, C.; Zhong, Q.Y.; Xie, D.; Pu, S.L. Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation. Proceedings of 27th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 786–792. [Google Scholar]
- Huang, H.e.; Su, H.; Chang, Z.; Yu, M.; Gao, J.; Li, X.; Zheng, S. Convolutional neural network with adaptive inferential framework for skeleton-based action recognition. J. Vis. Commun. Image Represent. 2020, 73, 102925. [Google Scholar] [CrossRef]
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef]
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef]
- Yan, S.; Xiong, Y.; Lin, D. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence/30th Innovative Applications of Artificial Intelligence Conference/8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 7444–7452. [Google Scholar]
- Chan, W.; Tian, Z.; Wu, Y. GAS-GCN: Gated Action-Specific Graph Convolutional Networks for Skeleton-Based Action Recognition. Sensors 2020, 20, 3499. [Google Scholar] [CrossRef]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Skeleton-Based Action Recognition With Multi-Stream Adaptive Graph Convolutional Networks. IEEE Trans. Image Process. 2020, 29, 9532–9545. [Google Scholar] [CrossRef]
- Plizzari, C.; Cannici, M.; Matteucci, M. Skeleton-based action recognition via spatial and temporal transformer networks. Comput. Vis. Image Underst. 2021, 208, 103219. [Google Scholar] [CrossRef]
- Song, Y.-F.; Zhang, Z.; Shan, C.; Wang, L. Richly Activated Graph Convolutional Network for Robust Skeleton-Based Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 1915–1925. [Google Scholar] [CrossRef]
- Shi, W.Z.; Li, D.; Wen, Y.; Yang, W. Occlusion-Aware Graph Neural Networks for Skeleton Action Recognition. IEEE Trans. Ind. Inf. 2023, 19, 10288–10298. [Google Scholar] [CrossRef]
- Bai, Z.Y.; Ding, Q.C.; Xu, H.L.; Chi, J.N.; Zhang, X.Y.; Sun, T.S. Skeleton-based similar action recognition through integrating the salient image feature into a center-connected graph convolutional network. Neurocomputing 2022, 507, 40–53. [Google Scholar] [CrossRef]
- Li, M.S.; Chen, S.H.; Chen, X.; Zhang, Y.; Wang, Y.F.; Tian, Q. Symbiotic Graph Neural Networks for 3D Skeleton-Based Human Action Recognition and Motion Prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3316–3333. [Google Scholar] [CrossRef]
- Zhu, L.; Wan, B.; Li, C.; Tian, G.; Hou, Y.; Yuan, K. Dyadic relational graph convolutional networks for skeleton-based human interaction recognition. Pattern Recognit. 2021, 115, 107920. [Google Scholar] [CrossRef]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12026–12035. [Google Scholar]
- Shahroudy, A.; Liu, J.; Ng, T.-T.; Wang, G. NTU RGB plus D: A Large Scale Dataset for 3D Human Activity Analysis. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 1010–1019. [Google Scholar]
- Zhu, Y.; Xiao, M.; Xie, Y.; Xiao, Z.; Jin, G.; Shuai, L. In-bed human pose estimation using multi-source information fusion for health monitoring in real-world scenarios. Inf. Fusion 2024, 105, 102209. [Google Scholar] [CrossRef]
- Yin, Y.; Robinson, J.P.; Fu, Y. Multimodal in-bed pose and shape estimation under the blankets. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 2411–2419. [Google Scholar]
- Liu, S.; Huang, X.; Fu, N.; Li, C.; Su, Z.; Ostadabbas, S.; Intelligence, M. Simultaneously-collected multimodal lying pose dataset: Enabling in-bed human pose monitoring. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1106–1118. [Google Scholar] [CrossRef] [PubMed]
- Karácsony, T.; Jeni, L.A.; de la Torre, F.; Cunha, J.P.S. Deep learning methods for single camera based clinical in-bed movement action recognition. Image Vis. Comput. 2024, 143, 104928. [Google Scholar] [CrossRef]
- Li, J.; Wang, Z.; Wang, C.; Su, W. GaitFormer: Leveraging dual-stream spatial-temporal Vision Transformer via a single low-cost RGB camera for clinical gait analysis. Knowl. Based Syst. 2024, 295, 111810. [Google Scholar] [CrossRef]
- Wang, Z.; Deligianni, F.; Voiculescu, I.; Yang, G.-Z. A Single RGB Camera Based Gait Analysis With A Mobile Tele-Robot For Healthcare. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Eng. Med. Biol. Soc. Annu. Int. Conf. 2021, 2021, 6933–6936. [Google Scholar] [CrossRef]
- Zhang, D.; Zhang, Y.; Zhou, M. Skeleton-Guided Action Recognition with Multistream 3D Convolutional Neural Network for Elderly-Care Robot. Adv. Intell. Syst. 2023, 5, 2300326. [Google Scholar] [CrossRef]
- Lin, C.-B.; Dong, Z.; Kuan, W.-K.; Huang, Y.-F. A Framework for Fall Detection Based on OpenPose Skeleton and LSTM/GRU Models. Appl. Sci. 2021, 11, 329. [Google Scholar] [CrossRef]
- Zahan, S.; Hassan, G.M.; Mian, A. SDFA: Structure-Aware Discriminative Feature Aggregation for Efficient Human Fall Detection in Video. IEEE Trans. Ind. Inf. 2023, 19, 8713–8721. [Google Scholar] [CrossRef]
- Liu, Y.; Zhou, N. Jumping Action Recognition for Figure Skating Video in IoT Using Improved Deep Reinforcement Learning. Inf. Technol. Control 2023, 52, 309–321. [Google Scholar] [CrossRef]
- Luo, C.; Kim, S.-W.; Park, H.-Y.; Lim, K.; Jung, H. Viewpoint-Agnostic Taekwondo Action Recognition Using Synthesized Two-Dimensional Skeletal Datasets. Sensors 2023, 23, 8049. [Google Scholar] [CrossRef] [PubMed]
- Peng, F.; Zhang, H. Research on Action Recognition Method of Dance Video Image Based on Human-Computer Interaction. Sci. Program. 2021, 2021, 8763133. [Google Scholar] [CrossRef]
- Wei, G.; Zhou, H.; Zhang, L.; Wang, J. Spatial-Temporal Self-Attention Enhanced Graph Convolutional Networks for Fitness Yoga Action Recognition. Sensors 2023, 23, 4741. [Google Scholar] [CrossRef]
- Roggio, F.; Ravalli, S.; Maugeri, G.; Bianco, A.; Palma, A.; Di Rosa, M.; Musumeci, G. Technological advancements in the analysis of human motion and posture management through digital devices. World J. Orthop. 2021, 12, 467–484. [Google Scholar] [CrossRef] [PubMed]
- Liu, L. Objects detection toward complicated high remote basketball sports by leveraging deep CNN architecture. Futur. Gener. Comp. Syst. 2021, 119, 31–36. [Google Scholar] [CrossRef]
- Tang, J. An Action Recognition Method for Volleyball Players Using Deep Learning. Sci. Program. 2021, 2021, 3934443. [Google Scholar] [CrossRef]
- Li, X.; Ullah, R. An image classification algorithm for football players’ activities using deep neural network. Soft Comput. 2023, 27, 19317–19337. [Google Scholar] [CrossRef]
- Ren, W. A novel approach for automatic detection and identification of inappropriate postures and movements of table tennis players. Soft Comput. 2024, 28, 2245–2269. [Google Scholar] [CrossRef]
- Chen, G. An interpretable composite CNN and GRU for fine-grained martial arts motion modeling using big data analytics and machine learning. Soft Comput. 2024, 28, 2223–2243. [Google Scholar] [CrossRef]
- Chang, Z.; Zhao, Y. Algorithm for Swimmers’ Starting Posture Correction Based on Kinect. Math. Probl. Eng. 2022, 2022, 1101002. [Google Scholar] [CrossRef]
- Rastgoo, R.; Kiani, K.; Escalera, S.; Sabokrou, M. Multi-modal zero-shot dynamic hand gesture recognition. Expert Syst. Appl. 2024, 247, 123349. [Google Scholar] [CrossRef]
- Balaji, P.; Prusty, M.R. Multimodal fusion hierarchical self-attention network for dynamic hand gesture recognition. J. Vis. Commun. Image Represent. 2024, 98, 104019. [Google Scholar] [CrossRef]
- Li, R.; Wang, H. Graph convolutional networks and LSTM for first-person multimodal hand action recognition. Mach. Vis. Appl. 2022, 33, 84. [Google Scholar] [CrossRef]
- Lin, K.; Wang, X.; Zhu, L.; Zhang, B.; Yang, Y. SKIM: Skeleton-Based Isolated Sign Language Recognition With Part Mixing. IEEE Trans. Multimed. 2024, 26, 4271–4280. [Google Scholar] [CrossRef]
- Arkushin, R.S.; Moryossef, A.; Fried, O. Ham2pose: Animating Sign Language Notation into Pose Sequences. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 21046–21056. [Google Scholar]

















| Methodology | Network Framework Model | Backbone | Highlights | 
|---|---|---|---|
| Xiong [66] | Transferable two-stream network (space flow + time flow + classifier) | CNN | Substantial improvement in feature separation, along with effective feature extraction and learning. | 
| Liu [67] | GeometryMotion-Net (geometry stream + motion stream) | PointNet++ | Facilitating the generation of highly discriminative representations for 3D motion recognition. | 
| Xu [69] | MotionNet, CNN with OFF + Spatial Stream CNN | CNN | Enabling the rapid capture of video frames without the need for precomputing or storing optical flow. | 
| Zhao [71] | TGSNet (teacher network + student network) | CNN | RGB input for fast action classification. | 
| Zhou [72] | MAT-EffNet (two-streams with EfficientNet + multi-head attention) | EfficientNet-B0 | Focus on keyframes to differentiate similar actions. | 
| Liu [73] | STASTA (ST-AFS-FF + DM-TA) | I3D | Stronger discrimination of short-term actions. | 
| Methodology | Network Framework Model | Backbone | Highlights | 
|---|---|---|---|
| Yang [74] | Asymmetric one-directional 3D convolutions (multi-scale convolutional branches + multivariate enhanced input) | 3D CNN | Minimizing algorithmic complexity while simultaneously enhancing feature extraction capabilities. | 
| Lu [75] | Multi-scale trajectory-pooled 3D convolutional (C3D + trajectory-constrained pooling) | 3D CNN | Extracting high-level video features with strategic pooling of key data. | 
| Yang [76] | 3D deep learning network with Low Time Sampling rate (3D-LTS) | 3D CNN | Efficient spatiotemporal feature extraction utilizing histograms and mean similarity measures. | 
| Jiang [77] | Dual 3D Convolutional Network (down sampling + fine branch + coarse branch + lateral connections) | C3D-like net | Optimal inference speed coupled with robust recognition capabilities. | 
| Zhang [78] | Multipath Attention and Adaptive Gating Network (SDM + MTAM + AGM) | ResNet | Feature extraction adaptable to diverse data types. | 
| Ju [79] | Motion recognition method based on random projection (Lucas–Kanade algorithm + compressed sensing + multichannel 3D convolutional) | AlexNet | Suitable for complex scenes and low power consumption. | 
| Deng [83] | Multi-scale feature fusion model based on 3D convolutional networks | PANet | Capable of extracting more complete semantic information with fewer parameters. | 
| Methodology | Network Framework Model | Backbone | Highlights | 
|---|---|---|---|
| He [85] | Densely connected Bidirectional LSTM (sampling stack + spatial SRL+ temporal SRL) | DenseNet-161 | Effectively handling temporal and spatial relationships in long-tail videos. | 
| Dai [87] | Two-Stream Attention-Based LSTM Networks (optical flow + spatial attention module) | LSTM | Capable of efficiently distinguishing different features. | 
| Li [88] | Transformer-based RGB-D egocentric action recognition framework (inter-frame attention encoder + mutual-attentional fusion block) | ResNet-34 | Capable of recognizing different modality features and distributing temporal information to each modality. | 
| D. Srihari [89] | Four-stream CNN architecture (two spatial RGB-D streams + two motion streams) | CNN | Deeper exploration of depth information combined with cross-fusion with RGB data to improve overall performance. | 
| Ullah [90] | Lightweight deep learning-assisted framework (CNN+MOSSE+DS-GRU) | Darknet-53 | Lightweight network capable of addressing issues during real-time monitoring. | 
| Methodology | Network Framework Model | Backbone | Highlights | 
|---|---|---|---|
| Zhang [130] | Multi-stream LSTM architecture (new smoothed score fusion + different geometric features) | LSTM | Achievable with fewer samples, addressing the issue of limited data. | 
| Feng [131] | Three-layer LSTM | LSTM | Extraction of geometric features beneficial for action recognition | 
| Cui [132] | Skeleton-based attention-aware spatial–temporal model (multi-layer bidirectional LSTM + CRF) | BiLSTM | Capable of distinguishing similar actions effectively. | 
| Yu [135] | Adaptive skeleton-based neural networks (ASRT + C3D-LSTM) | LSTM | Improving the ability to model multiple skeleton representations simultaneously | 
| She [136] | Global context-aware attention spatiotemporal SRU (two layers ST-SRU + global context-aware attention) | ST-LSTM | Achieving a balance between classification speed and accuracy | 
| Wei [139] | LSTM-based Seq2Seq model | LSTM | Network lightweighting, significantly reducing training time | 
| Methodology | Network Framework Model | Backbone | Highlights | 
|---|---|---|---|
| Dang [143] | Deep-wide network (DWnet) | HCN | Effectively capturing representative and discriminative spatiotemporal features. | 
| Guan [144] | Action feature enhance model (AFE-CNN) | CNN | Enhanced key node and critical sequence features. | 
| Banerjee [146] | Four CNNs + Choquet integral fusion | CNN | Multi-stream feature fusion enhances complementarity and overcomes limitations. | 
| Su [148] | Direction-guided two-stream CNN | CNN | Incorporating directional information with translation and rotation to better represent human motion. | 
| Huang [150] | Adaptive inferential framework (AIF-CNN) | CNN | Using joint dependencies to construct pseudo-images and incorporate core information for efficient feature extraction. | 
| Methodology | Network Framework Model | Backbone | Highlights | 
|---|---|---|---|
| Chan [154] | Gated action-specific graph convolutional networks | GCN | Extracting implicit connections and coordinating structural and implicit edges for skeletal recognition. | 
| Shi [155] | Multi-stream attention-enhanced adaptive graph convolutional network | AGCN | Enhancing model flexibility and generalization through data augmentation and adaptive graph topology. | 
| Song [157] | Richly activated GCN | ST-GCN | Using multi-stream mechanism and known joint activation to handle occlusion and jitter. | 
| Shi [158] | Multi-stream fusion graph convolutional network | GCN | Using scale differences as new features to restore intrinsic characteristics more effectively. | 
| Bai [159] | Center-connected graph convolutional network | GCN | Focusing on subtle changes to distinguish similar skeletal data effectively. | 
| Li [160] | Symbiotic model (action-recognition head + motion-prediction head) | Multi-branch multiscale GCN | Multi-scale feature extraction and dual-bone graph complementarity, mutually enhanced by a dual-head mechanism. | 
| Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. | 
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, L.; Lin, Z.; Sun, R.; Wang, A. A Review of State-of-the-Art Methodologies and Applications in Action Recognition. Electronics 2024, 13, 4733. https://doi.org/10.3390/electronics13234733
Zhao L, Lin Z, Sun R, Wang A. A Review of State-of-the-Art Methodologies and Applications in Action Recognition. Electronics. 2024; 13(23):4733. https://doi.org/10.3390/electronics13234733
Chicago/Turabian StyleZhao, Lanfei, Zixiang Lin, Ruiyang Sun, and Aili Wang. 2024. "A Review of State-of-the-Art Methodologies and Applications in Action Recognition" Electronics 13, no. 23: 4733. https://doi.org/10.3390/electronics13234733
APA StyleZhao, L., Lin, Z., Sun, R., & Wang, A. (2024). A Review of State-of-the-Art Methodologies and Applications in Action Recognition. Electronics, 13(23), 4733. https://doi.org/10.3390/electronics13234733
 
        




 
       
       
       
       