Video Human Action Recognition Based on Motion-Tempo Learning and Feedback Attention
Abstract
1. Introduction
- To capture the sub-action motion tempos that characterize motion dynamics at different temporal granularities, we propose a Multi-Granularity Adaptive Fusion Module (MgAFM). MgAFM extracts sub-action temporal information at multiple motion tempos and adaptively fuses it with the original RGB features for subsequent representation learning.
- We propose a Feedback Attention-Guided Module (FAGM). It uses high-level features, which carry action semantics and category-specific contextual information, to guide low-level features in a feedback manner, so that the network focuses on the temporal, channel-wise, and spatial information that distinguishes category-specific sub-actions (an illustrative sketch of both modules follows this list).
- Extensive experiments on the Something-Something V1, Something-Something V2, and Kinetics-400 datasets show that the proposed model achieves top-1 accuracies of 52.4%, 63.3%, and 76.9%, respectively, demonstrating its effectiveness.
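To make the two module-level contributions above more concrete, the following is a minimal, illustrative PyTorch sketch of the ideas behind MgAFM and FAGM. It is not the authors' implementation: the class names, tensor shapes, the use of strided temporal differences as stand-ins for different motion tempos, and the fixed high-level feature inside the feedback loop are assumptions made for exposition; the actual modules are defined in Sections 3.1 and 3.2.

```python
# Illustrative sketch only -- not the authors' released code. Class names, tensor
# shapes, the use of strided temporal differences as "motion tempos", and the
# fixed high-level feature in the feedback loop are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiGranularityFusion(nn.Module):
    """Toy MgAFM-style block: extract motion cues at several temporal strides
    (coarse-to-fine tempos) and fuse them with the RGB feature using softmax
    weights predicted from the input (the "adaptive" part of the fusion)."""

    def __init__(self, channels: int, strides=(1, 2, 3)):
        super().__init__()
        self.strides = strides
        self.reduce = nn.Conv3d(channels, channels, kernel_size=1)
        # One fusion weight per motion-tempo branch plus the identity (RGB) branch.
        self.gate = nn.Linear(channels, len(strides) + 1)

    def forward(self, x):  # x: (B, C, T, H, W)
        branches = [x]
        for s in self.strides:
            # Temporal difference at stride s approximates motion at one tempo.
            diff = x[:, :, s:] - x[:, :, :-s]
            diff = F.pad(diff, (0, 0, 0, 0, 0, s))  # pad back to T frames
            branches.append(self.reduce(diff))
        weights = torch.softmax(self.gate(x.mean(dim=(2, 3, 4))), dim=-1)  # (B, n)
        return sum(w.view(-1, 1, 1, 1, 1) * b
                   for w, b in zip(weights.unbind(dim=-1), branches))


class FeedbackAttention(nn.Module):
    """Toy FAGM-style step: channel attention computed from a high-level
    (semantic) feature re-weights a low-level feature; repeating the step is
    what the "feedback iteration times" ablation in Section 4.2.3 varies.
    The high-level feature is held fixed here for brevity; in the full model
    it would be recomputed from the refined low-level feature on each pass."""

    def __init__(self, low_channels: int, high_channels: int):
        super().__init__()
        self.fc = nn.Linear(high_channels, low_channels)

    def forward(self, low, high, iterations: int = 3):
        # low: (B, C_low, T, H, W); high: (B, C_high, T', H', W')
        for _ in range(iterations):
            ctx = high.mean(dim=(2, 3, 4))          # pooled high-level context
            attn = torch.sigmoid(self.fc(ctx))      # (B, C_low) channel gates
            low = low * attn.view(attn.size(0), -1, 1, 1, 1)
        return low


if __name__ == "__main__":
    x_low = torch.randn(2, 64, 8, 56, 56)   # low-level feature of an 8-frame clip
    x_high = torch.randn(2, 512, 8, 7, 7)   # high-level feature with action semantics
    fused = MultiGranularityFusion(64)(x_low)
    guided = FeedbackAttention(64, 512)(fused, x_high, iterations=3)
    print(fused.shape, guided.shape)        # both: torch.Size([2, 64, 8, 56, 56])
```

The real FAGM also contains temporal and spatial attention branches (Sections 3.2.1–3.2.3); only a channel-gating path is sketched here so that the feedback data flow stays visible.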
2. Related Work on Video Action Recognition Methods
2.1. Methods Based on CNNs
2.2. Methods for Capturing Motion Tempos of Sub-Actions
2.3. Video Action Recognition Methods Based on Attention Mechanisms
3. The Proposed Critical Action Information Extraction Network
3.1. Multi-Granularity Adaptive Fusion Module (MgAFM)
3.1.1. Motion-Extraction Module (MEM)
3.1.2. Adaptive Fusion Module (AFM)
3.2. Feedback Attention-Guided Module (FAGM)
3.2.1. Temporal Attention Module (TAM)
3.2.2. Channel Attention-Guided Module (CAGM)
3.2.3. Spatial Attention-Guided Module (SAGM)
4. Experiments
4.1. Datasets and Experiment Setup
4.1.1. Datasets
4.1.2. Experiment Setup
4.2. Ablation Study
4.2.1. The Ablation of FAGM and MgAFM
4.2.2. The Impact of Motion Tempos at Different Temporal Granularities
4.2.3. The Impact of Feedback Iteration Times
4.3. Comparison with the State of the Art
4.3.1. Something-Something V1
4.3.2. Something-Something V2
4.3.3. Kinetics-400
4.4. Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Karim, M.; Khalid, S.; Aleryani, A.; Khan, J.; Ullah, I.; Ali, Z. Human Action Recognition Systems: A Review of the Trends and State-of-the-Art. IEEE Access 2024, 12, 36372–36390. [Google Scholar] [CrossRef]
- Ji, Y.; Yang, Y.; Shen, F.; Shen, H.T.; Li, X. A Survey of Human Action Analysis in HRI Applications. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 2114–2128. [Google Scholar] [CrossRef]
- Sun, Z.; Ke, Q.; Rahmani, H.; Bennamoun, M.; Wang, G.; Liu, J. Human Action Recognition From Various Data Modalities: A Review. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3200–3225. [Google Scholar] [CrossRef]
- Wang, L.; Huynh, D.Q.; Koniusz, P. A Comparative Review of Recent Kinect-Based Action Recognition Algorithms. IEEE Trans. Image Process. 2019, 29, 15–28. [Google Scholar] [CrossRef]
- Liu, S.; Ma, X. Attention-Driven Appearance-Motion Fusion Network for Action Recognition. IEEE Trans. Multimed. 2022, 25, 2573–2584. [Google Scholar] [CrossRef]
- Jaiswal, S.; Fernando, B.; Tan, C. TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 259–276. [Google Scholar]
- Lin, J.; Gan, C.; Han, S. TSM: Temporal Shift Module for Efficient Video Understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7083–7093. [Google Scholar]
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Santiago, Chile, 11–18 December 2015; pp. 4489–4497. [Google Scholar]
- Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6299–6308. [Google Scholar]
- Zhou, Y.; Sun, X.; Luo, C.; Zha, Z.-J.; Zeng, W. Spatiotemporal Fusion in 3D CNNs: A Probabilistic View. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 9826–9835. [Google Scholar]
- Liu, Y.; Yuan, J.; Tu, Z. Motion-Driven Visual Tempo Learning for Video-Based Action Recognition. IEEE Trans. Image Process. 2022, 31, 4104–4116. [Google Scholar] [CrossRef] [PubMed]
- Jiang, B.; Wang, M.; Gan, W.; Wu, W.; Yan, J. STM: SpatioTemporal and Motion Encoding for Action Recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2000–2009. [Google Scholar]
- Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The Kinetics human action video dataset. arXiv 2017, arXiv:1705.06950. [Google Scholar]
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. Adv. Neural Inf. Process. Syst. 2014, 1, 568–576. [Google Scholar]
- Wang, L.; Xiong, Y.; Wang, Z.; Qiao, Y.; Lin, D.; Tang, X.; Van Gool, L. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 September 2016; pp. 20–36. [Google Scholar]
- Kwon, H.; Kim, M.; Kwak, S.; Cho, M. MotionSqueeze: Neural Motion Feature Learning for Video Understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 345–362. [Google Scholar]
- Li, Y.; Ji, B.; Shi, X.; Zhang, J.; Kang, B.; Wang, L. TEA: Temporal Excitation and Aggregation for Action Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 909–918. [Google Scholar]
- Wang, Z.; She, Q.; Smolic, A. ACTION-Net: Multipath Excitation for Action Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13214–13223. [Google Scholar]
- Ramachandran, P.; Parmar, N.; Vaswani, A.; Bello, I.; Levskaya, A.; Shlens, J. Stand-alone self-attention in vision models. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
- Wang, Z.; Liu, Z.; Li, G.; Wang, Y.; Zhang, T.; Xu, L.; Wang, J. Spatio-Temporal Self-Attention Network for Video Saliency Prediction. IEEE Trans. Multimed. 2021, 25, 1161–1174. [Google Scholar] [CrossRef]
- Nasaoui, H.; Bellamine, I.; Silkan, H. Improving Human Action Recognition in Videos with Two-Stream and Self-Attention Module. In Proceedings of the 7th IEEE Congress on Information Science and Technology, Agadir, Morocco, 16–22 December 2023; pp. 215–220. [Google Scholar]
- Gilbert, C.D.; Li, W. Top-down influences on visual processing. Nat. Rev. Neurosci. 2013, 14, 350–363. [Google Scholar] [CrossRef] [PubMed]
- Kok, P.; Bains, L.J.; van Mourik, T.; Norris, D.G.; de Lange, F.P. Selective Activation of the Deep Layers of the Human Primary Visual Cortex by Top-Down Feedback. Curr. Biol. 2016, 26, 371–376. [Google Scholar] [CrossRef] [PubMed]
- Kreiman, G.; Serre, T. Beyond the feedforward sweep: Feedback computations in the visual cortex. Ann. N. Y. Acad. Sci. 2020, 1464, 222–241. [Google Scholar] [CrossRef] [PubMed]
- Hochstein, S.; Ahissar, M. View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron 2002, 36, 791–804. [Google Scholar] [CrossRef] [PubMed]
- Qiao, S.; Chen, L.-C.; Yuille, A. DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 10213–10224. [Google Scholar]
- Qiu, Z.; Yao, T.; Mei, T. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5534–5542. [Google Scholar]
- Feichtenhofer, C.; Fan, H.; Malik, J.; He, K. SlowFast Networks for Video Recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6202–6211. [Google Scholar]
- Chen, Y.; Ge, H.; Liu, Y.; Cai, X.; Sun, L. AGPN: Action Granularity Pyramid Network for Video Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 3912–3923. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Lee, J.; Kim, D.; Ponce, J.; Ham, B. SFNet: Learning Object-aware Semantic Correspondence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2278–2287. [Google Scholar]
- Goyal, R.; Kahou, S.E.; Michalski, V.; Materzynska, J.; Westphal, S.; Kim, H.; Haenel, V.; Fruend, I.; Yianilos, P.; Mueller-Freitag, M.; et al. The “something something” video database for learning and evaluating visual common sense. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5842–5850. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Wang, B.; Chang, F.; Liu, C.; Wang, W.; Ma, R. An efficient motion visual learning method for video action recognition. Expert Syst. Appl. 2024, 255, 124596. [Google Scholar] [CrossRef]
- He, K.; Girshick, R.; Dollár, P. Rethinking ImageNet pre-training. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4918–4927. [Google Scholar]
- Zolfaghari, M.; Singh, K.; Brox, T. ECO: Efficient Convolutional Network for Online Video Understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 695–712. [Google Scholar]
- Sheng, X.; Li, K.; Shen, Z.; Xiao, G. A Progressive Difference Method for Capturing Visual Tempos on Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 977–987. [Google Scholar] [CrossRef]
- Liu, Z.; Wang, L.; Wu, W.; Qian, C.; Lu, T. TAM: Temporal Adaptive Module for Video Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 13688–13698. [Google Scholar]
- Liu, Z.; Luo, D.; Wang, Y.; Wang, L.; Tai, Y.; Wang, C.; Li, J.; Huang, F.; Lu, T. TEINet: Towards an Efficient Architecture for Video Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11669–11676. [Google Scholar]
- Li, X.; Shuai, B.; Tighe, J. Directional Temporal Modeling for Action Recognition. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 275–291. [Google Scholar]
- Fan, L.; Buch, S.; Wang, G.; Cao, R.; Zhu, Y.; Niebles, J.C.; Li, F.-F. RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 505–521. [Google Scholar]
- Sudhakaran, S.; Escalera, S.; Lanz, O. Gate-Shift-Fuse for Video Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10913–10928. [Google Scholar] [CrossRef] [PubMed]
- Wu, W.; He, D.; Lin, T.; Li, F.; Gan, C.; Ding, E. MVFNet: Multi-view fusion network for efficient video recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 2943–2951. [Google Scholar]
- Xie, Z.; Chen, J.; Wu, K.; Guo, D.; Hong, R. Global Temporal Difference Network for Action Recognition. IEEE Trans. Multimed. 2022, 25, 7594–7606. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Duan, H.; Zhao, Y.; Chen, K.; Lin, D.; Dai, B. Revisiting skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2969–2978. [Google Scholar]
FAGM | MgAFM | Top-1 (%) | Δ Top-1 (%)
---|---|---|---
– | – | 45.6 | Baseline (TSM)
✓ | – | 49.6 | +4.0
– | ✓ | 51.3 | +5.7
✓ | ✓ | 52.4 | +6.8
Branch | Temporal Granularities | Top-1 (%) |
---|---|---|
1 | 0 | 51.5 |
2 | 0, 1 | 51.7 |
3 | 0, 1, 2 | 52.3 |
4 | 0, 1, 2, 3 | 52.4 |
5 | 0, 1, 2, 3, 4 | 52.3 |
6 | 0, 1, 2, 3, 4, 5 | 52.1 |
7 | 0, 1, 2, 3, 4, 5, 6 | 51.9 |
Iteration Times | Top-1 (%) |
---|---|
0 | 51.3 |
1 | 51.8 |
2 | 52.1 |
3 | 52.4 |
4 | 52.2 |
5 | 51.6 |
6 | 49.9 |
7 | 49.0 |
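To make the "Iteration Times" column above concrete, the snippet below sketches one plausible way the feedback refinement could be repeated k times; the gating form, tensor shapes, and the helper name feedback_refine are illustrative assumptions rather than the paper's FAGM implementation. In the reported ablation each setting corresponds to a full training run: with 0 iterations (feedback disabled) top-1 is 51.3%, accuracy peaks at 3 iterations (52.4%), and more iterations degrade it.

```python
# Illustrative only: what "feedback iteration times" could mean operationally.
# The gating form, shapes, and helper name are assumptions, not the paper's FAGM.
import torch


def feedback_refine(low, high, weight, iterations):
    """Repeatedly re-weight a low-level feature with channel gates derived
    from a pooled high-level feature; iterations=0 returns `low` unchanged."""
    for _ in range(iterations):
        ctx = high.mean(dim=(2, 3, 4))                      # (B, C_high) pooled semantics
        gates = torch.sigmoid(ctx @ weight)                 # (B, C_low) channel gates
        low = low * gates.view(gates.size(0), -1, 1, 1, 1)  # feed gates back to low level
    return low


low = torch.randn(1, 64, 8, 56, 56)    # low-level feature of an 8-frame clip
high = torch.randn(1, 512, 8, 7, 7)    # high-level feature with action semantics
w = 0.01 * torch.randn(512, 64)        # stand-in for a learned projection
for k in range(8):                     # the ablation sweeps 0-7 feedback passes
    refined = feedback_refine(low, high, w, iterations=k)
```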
Method | Backbone | Pre-Trained | Frames × Clips × Crops | Params | Top-1 (%) | Top-5 (%) |
---|---|---|---|---|---|---|
3DCNN: | ||||||
ECOEnLite [37] | BNIncep+3D ResNet18 | Kinetics-400 | 92 × 1 × 1 | 150 M | 46.4 | - |
I3D from [9] | ResNet50 | ImageNet+ Kinetics-400 | 32 × 2 × 1 | 28.0 M | 41.6 | 72.2 |
NL-I3D from [9] | ResNet50 | ImageNet+ Kinetics-400 | 32 × 2 × 1 | 35.3 M | 44.4 | 76.0 |
NL-I3D + GCN [9] | ResNet50 | ImageNet+ Kinetics-400 | 32 × 2 × 1 | 62.2 M | 46.1 | 76.8 |
3D DenseNet-121 [10] | 3D DenseNet-121 | ImageNet | 16 × 1 × 1 | 21.4 M | 50.2 | 78.9 |
2DCNN: | ||||||
TSN [15] | BNInception | ImageNet | 8 × 1 × 1 | 10.7 M | 19.5 | - |
TSM [7] | ResNet50 | ImageNet | 8 × 1 × 1 | 24.3 M | 45.6 | 74.2 |
TSM [7] | ResNet50 | ImageNet | 16 × 1 × 1 | 24.3 M | 47.2 | 77.1 |
TANet [39] | ResNet50 | ImageNet | 16 × 1 × 1 | 25.6 M | 47.6 | 77.7 |
TCM [11] | ResNet50 | ImageNet | 8 × 1 × 1 | 24.5 M | 52.2 | 80.4 |
MSNet [16] | ResNet50 | ImageNet | 8 × 1 × 1 | 24.6 M | 50.9 | 80.3 |
AGPN [30] | ResNet50 | ImageNet | 8 × 1 × 1 | 27.6 M | 51.6 | 80.9 |
TEA [17] | ResNet50 | ImageNet | 16 × 1 × 1 | 24.5 M | 51.9 | 80.3 |
MDAF [35] | ResNet50 | ImageNet | 8 × 1 × 1 | 24.5 M | 49.1 | 78.0 |
MT-Net [38] | ResNet50 | ImageNet | 16 × 1 × 1 | - | 53.4 | 81.3 |
SMEN (ours) | ResNet50 | ImageNet | 8 × 1 × 1 | 26.1 M | 52.4 | 80.6 |
SMEN (ours) | ResNet50 | ImageNet | 16 × 1 × 1 | 26.1 M | 54.4 | 82.7 |
Method | Backbone | Pre-Trained | Frames × Clips × Crops | Params | Top-1 (%) | Top-5 (%) |
---|---|---|---|---|---|---|
3DCNN: | ||||||
3D DenseNet-121 [10] | 3D DenseNet-121 | ImageNet | 16 × 1 × 1 | 21.4 M | 62.9 | 88.0
CIDC [41] | ResNet50 | ImageNet | 32 × 1 × 1 | 87 M | 56.3 | 83.7 |
RubiksNet [42] | ResNet50 | ImageNet | 8 × 1 × 1 | - | 58.8 | 85.6 |
2DCNN: | ||||||
TSN [15] | BNInception | ImageNet | 8 × 1 × 1 | 10.7 M | 33.4 | - |
TSM [7] | ResNet50 | ImageNet | 8 × 1 × 1 | 24.3 M | 58.8 | 85.4 |
TSM [7] | ResNet50 | ImageNet | 16 × 1 × 1 | 24.3 M | 63.4 | 88.5 |
STM [12] | ResNet50 | ImageNet | 16 × 1 × 1 | 35.3 M | 64.2 | 89.8 |
MSNet [16] | ResNet50 | ImageNet | 8 × 1 × 1 | 24.6 M | 63.0 | 88.4 |
AGPN [30] | ResNet50 | ImageNet | 8 × 1 × 1 | 27.6 M | 63.1 | 88.6 |
ACTION-Net [18] | ResNet50 | ImageNet | 16 × 1 × 1 | 28.1 M | 64.0 | 89.3 |
GSFNet [43] | ResNet50 | ImageNet | 8 × 1 × 1 | 24.0 M | 62.1 | - |
MDAF [35] | ResNet50 | ImageNet | 8 × 1 × 1 | 24.5 M | 62.3 | 87.8 |
TEINet [40] | ResNet50 | ImageNet | 16 × 1 × 1 | 30.4 M | 61.3 | - |
SMEN (ours) | ResNet50 | ImageNet | 8 × 1 × 1 | 26.1 M | 63.3 | 88.8 |
SMEN (ours) | ResNet50 | ImageNet | 16 × 1 × 1 | 26.1 M | 65.0 | 89.5 |
Method | Backbone | Pre-Trained | Frames × Clips × Crops | Params | Top-1 (%) | Top-5 (%) |
---|---|---|---|---|---|---|
3DCNN: | ||||||
ECO [37] | BNIncep+3D-ResNet-18 | Scratch | 92 × 1 × 1 | 150 M | 70.0 | 89.4 |
I3D [9] | Inception | ImageNet | 64 × N/A × N/A | 28.0 M | 72.1 | 90.3 |
SlowFast [29] | ResNet50 | Scratch | (4 + 32) × 1 × 3 | - | 75.6 | 92.1 |
2DCNN: | ||||||
TSN [15] | BNInception | ImageNet | 25 × 1 × 10 | 10.7 M | 72.5 | 90.2 |
TSM [7] | ResNet50 | ImageNet | 8 × 10 × 3 | 24.3 M | 74.1 | 91.2 |
TCM [11] | ResNet50 | ImageNet | 8 × 10 × 3 | 24.5 M | 76.1 | 92.3
STM [12] | ResNet50 | ImageNet | 16 × 10 × 3 | 24.0 M | 73.7 | 91.6 |
MSNet [16] | ResNet50 | ImageNet | 8 × 10 × 3 | 24.5 M | 75.0 | - |
GSFNet [43] | ResNet50 | ImageNet | 8 × 10 × 3 | 24.0 M | 74.8 | -
MDAF [35] | ResNet50 | ImageNet | 8 × 10 × 3 | 24.5 M | 76.2 | 92.0 |
TEINet [40] | ResNet50 | ImageNet | 8 × 10 × 3 | 30.4 M | 74.9 | 91.9 |
MVFNet [44] | ResNet50 | ImageNet | 8 × 10 × 3 | 24.3 M | 76.0 | 92.4 |
GTDNet [45] | ResNet50 | ImageNet | 8 × 10 × 3 | 33.9 M | 75.5 | 92.5 |
GTDNet [45] | ResNet50 | ImageNet | 16 × 10 × 3 | 33.9 M | 76.8 | 92.3 |
SMEN (ours) | ResNet50 | ImageNet | 8 × 10 × 3 | 26.1 M | 76.9 | 92.8 |
SMEN (ours) | ResNet50 | ImageNet | 16 × 10 × 3 | 26.1 M | 77.9 | 93.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).