Basketball Action Recognition Method of Deep Neural Network Based on Dynamic Residual Attention Mechanism
Abstract
1. Introduction
- Basketball motion images extracted from video contain a large amount of noise. To suppress this noise, this paper pre-processes each image with a median filter, so that the subsequent extraction of basketball motion from the video frames suffers minimal interference.
- To fully extract effective feature information about basketball actions from the video images, this paper replaces the convolution layers of the original C3D network with dynamic residual convolutions.
- To recognize basketball actions in video images efficiently, an improved attention mechanism focuses the extracted feature information on the important features and discards the unimportant ones, so as to retain the feature information that is beneficial to recognizing the basketball player’s pose and to improve the accuracy of basketball pose recognition.
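The median-filter pre-processing named in the first contribution can be illustrated with a minimal NumPy sketch. The 3×3 window, the edge-replication padding, and the toy frame are assumptions made for illustration; the source does not state the window size or border handling used.

```python
import numpy as np

def median_filter(frame: np.ndarray, size: int = 3) -> np.ndarray:
    """Apply a size x size median filter to a 2-D grayscale frame."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    # Replicate border pixels so the output keeps the input shape (assumed choice).
    padded = np.pad(frame, pad, mode="edge")
    h, w = frame.shape
    # Gather every shifted view of the neighbourhood, then take the per-pixel median.
    windows = [
        padded[dy:dy + h, dx:dx + w]
        for dy in range(size)
        for dx in range(size)
    ]
    return np.median(np.stack(windows, axis=0), axis=0).astype(frame.dtype)

# Toy frame (hypothetical): a flat grey image with two salt-and-pepper pixels.
frame = np.full((5, 5), 100, dtype=np.uint8)
frame[1, 1] = 255
frame[3, 2] = 0
clean = median_filter(frame)
```

Because each 3×3 neighbourhood in this toy frame contains at most two outliers among nine values, the median restores the flat background while an averaging filter would smear the outliers into their neighbours.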
2. Materials and Methods
2.1. C3D Neural Network
2.2. Residual Network
2.3. Attention Mechanism
- (1) The spatial attention mechanism acts on the two-dimensional spatial plane. It uses a learned attention mask to assign an attention weight to each element of the feature plane, which measures how strongly each spatial position relates to the target object and highlights the salient regions. This helps the model locate the regions of the input feature plane where the target object is concentrated and, to a certain extent, avoids interference from cluttered background information, which is of great significance for the human action recognition (HAR) task.
- (2) The channel attention mechanism acts on the different convolutional channels of the input features and uses a learned attention mask to adaptively adjust the feature response of each channel. Its purpose is to select, during feature extraction, the features that contribute most to the recognition result: the channel attention weights are applied to the different channels of the input features, which helps the model learn more meaningful features.
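The two mechanisms described above can be sketched in NumPy as follows. This is a generic illustration, not the improved attention module of Section 3.3: the channel branch follows the common squeeze-and-excitation pattern, and the spatial branch replaces the usual learned convolution with a parameter-free sum of channel-wise average and maximum. The weight matrices `w1` and `w2`, the reduction ratio of 16, and the random input are hypothetical. Tensors use the channels-last layout `[frames, height, width, channels]`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Reweight channels of x with shape (T, H, W, C).

    Returns the reweighted features and the per-channel weights of shape (C,).
    """
    squeezed = x.mean(axis=(0, 1, 2))        # global average per channel, (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)  # bottleneck with ReLU, (C // r,)
    weights = sigmoid(hidden @ w2)           # one weight in (0, 1) per channel
    return x * weights, weights

def spatial_attention(x):
    """Reweight positions of x with shape (T, H, W, C).

    Returns the reweighted features and a mask of shape (T, H, W, 1).
    Simplification: no learned convolution is applied to the pooled maps.
    """
    avg = x.mean(axis=-1, keepdims=True)
    mx = x.max(axis=-1, keepdims=True)
    mask = sigmoid(avg + mx)
    return x * mask, mask

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 14, 14, 256))    # hypothetical feature map
w1 = rng.standard_normal((256, 16)) * 0.1    # hypothetical weights, reduction ratio 16
w2 = rng.standard_normal((16, 256)) * 0.1

y, channel_weights = channel_attention(x, w1, w2)
z, spatial_mask = spatial_attention(y)
```

Applying the channel weights first and the spatial mask second is one common ordering; the source does not specify which order, if any, the authors use.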
3. Approach
3.1. Overview
3.2. Dynamic Residual Network
3.3. Improved Attention Mechanism
4. Experiments
4.1. Experimental Data
4.2. Dynamic Residual Network Impact
4.3. Analysis of Experimental Results
4.4. Ablation Study
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Layer | Number of Kernels | Kernel Size | Stride | Output Size |
---|---|---|---|---|
Input | - | - | - | [16,112,112,1] |
Conv1 | 64 | [3,3,3] | [2,2,2] | [8,56,56,64] |
Attention1 | 128 | [1,1,1] | [1,1,1] | [8,56,56,128] |
Conv2 | 128 | [3,3,3] | [2,2,2] | [4,28,28,128] |
Pool1 | - | [2,2,2] | [2,2,2] | [4,14,14,128] |
Residual1 | 256 | [3,3,3] | [1,1,1] | [4,14,14,256] |
Attention2 | 256 | [1,1,1] | [1,1,1] | [4,14,14,256] |
Residual2 | 512 | [3,3,3] | [2,2,2] | [2,7,7,512] |
Attention3 | 512 | [1,1,1] | [1,1,1] | [2,7,7,512] |
Residual3 | 512 | [3,3,3] | [1,1,1] | [1,7,7,512] |
Attention4 | 512 | [1,1,1] | [1,1,1] | [1,7,7,512] |
Pool2 | - | [2,2,2] | [2,2,2] | [1,4,4,512] |
FC | 4096 | - | - | 4096 |
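The output sizes in the table can be checked against the listed strides with a short calculation. The sketch below assumes "same" padding, so each axis shrinks to ceil(n / stride); the padding scheme is an assumption, since the table does not state it. Each row is checked starting from the size listed for the previous row, and only the `[frames, height, width]` part of the output size is compared.

```python
import math

def same_padding_output(shape, stride):
    """Output size per axis for a 'same'-padded conv or pool: ceil(n / stride)."""
    return [math.ceil(n / s) for n, s in zip(shape, stride)]

# (layer, stride, listed [frames, height, width]) copied from the table above.
rows = [
    ("Conv1",      [2, 2, 2], [8, 56, 56]),
    ("Attention1", [1, 1, 1], [8, 56, 56]),
    ("Conv2",      [2, 2, 2], [4, 28, 28]),
    ("Pool1",      [2, 2, 2], [4, 14, 14]),
    ("Residual1",  [1, 1, 1], [4, 14, 14]),
    ("Attention2", [1, 1, 1], [4, 14, 14]),
    ("Residual2",  [2, 2, 2], [2, 7, 7]),
    ("Attention3", [1, 1, 1], [2, 7, 7]),
    ("Residual3",  [1, 1, 1], [1, 7, 7]),
    ("Attention4", [1, 1, 1], [1, 7, 7]),
    ("Pool2",      [2, 2, 2], [1, 4, 4]),
]

previous = [16, 112, 112]  # input clip: 16 frames of 112 x 112
mismatches = []
for name, stride, listed in rows:
    computed = same_padding_output(previous, stride)
    if computed != listed:
        mismatches.append((name, computed, listed))
    previous = listed  # continue from the listed size so each row is checked on its own

for name, computed, listed in mismatches:
    print(f"{name}: computed {computed}, table lists {listed}")
```

Under this assumption every spatial size matches, while the frame dimension of Pool1 and Residual3 does not follow from the listed strides. This may reflect a different temporal stride or padding in those layers than the table shows; the source does not say which.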
Parameter Type | Parameter Values |
---|---|
Batch size | 64 |
Learning rate | 0.001 |
Decay | 0.9 |
Dropout | 0.5 |
Epochs | 50 |
Classifier | Softmax |
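The settings above can be gathered into a small configuration object, shown here together with the two operations the table names directly (dropout and a Softmax classifier). This is a sketch under stated assumptions: the class name `TrainConfig`, the 5-class output, and the random features are hypothetical, and the table does not say what the decay value applies to, so it is stored without being used.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 64
    learning_rate: float = 0.001
    decay: float = 0.9    # meaning not specified in the table; kept as an opaque value
    dropout: float = 0.5
    epochs: int = 50

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def dropout(x, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

cfg = TrainConfig()
rng = np.random.default_rng(0)
features = rng.standard_normal((cfg.batch_size, 4096))  # stand-in for the FC output
w = rng.standard_normal((4096, 5)) * 0.01               # 5 classes is hypothetical
probs = softmax(dropout(features, cfg.dropout, rng) @ w)
```

Each row of `probs` is a probability distribution over the action classes for one clip in the batch.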
Methods | Accuracy |
---|---|
Traditional C3D Network | 80.3% |
ResC3D Network | 86.5% |
Only dynamic residual networks | 87.6% |
EfficientNet-B0 | 88.3% |
ShuffleNetV2 | 88.9% |
Only the attention mechanism network | 89.4% |
Our proposed method | 97.82% |
Method | Training Time (min) | Accuracy |
---|---|---|
Residual block | 39.4 | 92.45% |
Improved residual block | 36.1 | 97.82% |
Method | Training Time (min) | Accuracy |
---|---|---|
Channel attention | 38.6 | 91.69% |
Spatial attention | 39.1 | 93.58% |
Channel attention + Spatial attention | 36.1 | 97.82% |
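The differences implied by the two ablation tables above can be read off with a few lines of arithmetic. The sketch assumes that the rows reporting 36.1 min and 97.82% in both tables describe the same full model; all numbers are copied from the tables.

```python
full = {"time_min": 36.1, "acc": 97.82}

variants = {
    "Residual block":    {"time_min": 39.4, "acc": 92.45},
    "Channel attention": {"time_min": 38.6, "acc": 91.69},
    "Spatial attention": {"time_min": 39.1, "acc": 93.58},
}

# For each ablated variant: accuracy gained and training minutes saved by the full model.
gains = {
    name: {
        "acc_gain_pts": round(full["acc"] - v["acc"], 2),
        "time_saved_min": round(v["time_min"] - full["time_min"], 1),
    }
    for name, v in variants.items()
}

for name, g in gains.items():
    print(f"{name}: +{g['acc_gain_pts']} points, {g['time_saved_min']} min faster")
```

The full model is between 4.24 and 6.13 percentage points more accurate than each single-component variant and trains 2.5 to 3.3 minutes faster.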
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiao, J.; Tian, W.; Ding, L. Basketball Action Recognition Method of Deep Neural Network Based on Dynamic Residual Attention Mechanism. Information 2023, 14, 13. https://doi.org/10.3390/info14010013