Intelligent Dance Motion Evaluation: An Evaluation Method Based on Keyframe Acquisition According to Musical Beat Features
Abstract
1. Introduction
- A beat extraction method incorporating prior perceptual experience is designed. The method first preprocesses the input audio signal and unifies the sampling rate. It then extracts features by computing the short-time Fourier transform (STFT) and mapping the result onto the Mel frequency scale. Next, it computes the energy difference within each Mel band to generate an initial onset strength envelope, which is filtered and smoothed to reduce noise. Peaks in the envelope are then detected as beat points and normalized. Finally, exploiting the prior that human tempo perception centers near 120 BPM, beat detection is refined with a perceptually weighted window, improving accuracy (a code sketch of this pipeline follows this list).
- A beat-driven key action frame extraction and feature quantization method is designed. Leveraging the strong correlation between musical beats and dance movements, the manual keyframe extraction and labeling problem is recast as a musical beat extraction problem. The image frame at each beat's timestamp is extracted from the dance video and treated as a key action frame. This turns the continuous action recognition and evaluation problem into a discrete action evaluation problem, substantially reducing computational complexity. Key action features are then extracted and standardized using the OpenPose-based pose recognition method (see the keyframe extraction sketch after this list).
- An intelligent action evaluation method is designed. First, an Action Sequence Evaluation Method (ASCS) is constructed from all action features within a single action (frame) to accurately evaluate each individual action. Then, incorporating contextual features and drawing inspiration from the Rouge-L metric, a Scoring Method based on Action Contextual Relationships (SMACR) is constructed, focusing on the evaluation of action coherence. Combining ASCS and SMACR, dancers are evaluated comprehensively from both static and dynamic perspectives (an LCS-based scoring sketch follows this list).
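A minimal sketch of the beat-extraction pipeline above, assuming an implementation on top of librosa. The function name `extract_beats`, the Hann smoothing window, and the hop size are illustrative choices rather than the authors' exact code; librosa's `beat_track` exposes a `start_bpm` tempo prior, which stands in for the 120 BPM perception-weighted window described in the first contribution.

```python
# Hedged sketch: resample -> Mel-scaled onset strength -> smooth/normalize ->
# beat tracking with a ~120 BPM tempo prior. Parameters are illustrative.
import numpy as np
import librosa

def extract_beats(audio_path, sr=22050, hop_length=512):
    y, _ = librosa.load(audio_path, sr=sr)  # unify the sampling rate
    # STFT mapped to the Mel scale; per-band energy differences are
    # aggregated into an onset strength envelope
    onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    # Filter/smooth the envelope to reduce noise, then normalize
    win = np.hanning(9)
    onset_env = np.convolve(onset_env, win / win.sum(), mode="same")
    onset_env /= onset_env.max() + 1e-9
    # Peak picking with a perceptual prior centered near 120 BPM
    tempo, beat_frames = librosa.beat.beat_track(
        onset_envelope=onset_env, sr=sr, hop_length=hop_length, start_bpm=120.0)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)
    return tempo, beat_times
```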
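Given the beat times, the keyframe step reduces to seeking the video frame nearest each beat. The OpenCV sketch below is an assumption about how this could be done (index-based seeking with nearest-frame rounding), not the paper's exact procedure; each extracted frame would then be passed to OpenPose for joint extraction and normalization.

```python
# Hedged sketch: treat the frame nearest each beat timestamp as a key action frame.
import cv2

def extract_keyframes(video_path, beat_times):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    keyframes = []
    for t in beat_times:
        # Seek to the frame index closest to the beat time
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(round(t * fps)))
        ok, frame = cap.read()
        if ok:
            keyframes.append(frame)
    cap.release()
    return keyframes
```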
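Since SMACR is described as Rouge-L-inspired, its core is plausibly a longest-common-subsequence F-measure over the sequence of key actions. In the sketch below, each keyframe's quantized pose is treated as a token; the `match` predicate (e.g., a thresholded distance between normalized pose vectors) and the `beta` recall weighting are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a Rouge-L-style coherence score over action sequences.
def lcs_length(ref, hyp, match):
    """Length of the longest common subsequence under a fuzzy match predicate."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if match(ref[i - 1], hyp[j - 1]):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def smacr_score(ref_actions, performed_actions, match, beta=1.2):
    """Rouge-L-style F-measure: rewards in-order matches, penalizes broken flow."""
    lcs = lcs_length(ref_actions, performed_actions, match)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref_actions)
    precision = lcs / len(performed_actions)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```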
2. Related Work
3. Methodology
3.1. Introduction and Motivation
3.2. Overall Framework
3.3. Beat Detection Model
3.4. Onset Strength Envelope Calculation
3.5. Global Tempo Estimation
3.6. Keyframe Extraction
4. Intelligent Dance Motion Evaluation
4.1. Dance Pose Recognition
4.2. Basic Scoring
4.3. Scoring Method Based on Action Contextual Relationships
5. Experimental Results and Discussion
5.1. Dataset
5.2. Music Beat Recognition
5.3. Dance Keyframe Evaluation Based on Musical Beats
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
ASCS | Action Sequence Evaluation Method
BLSTM | Bidirectional Long Short-Term Memory
BPM | Beats Per Minute
DNN | Deep Neural Network
FFT | Fast Fourier Transform
HAR | Human Action Recognition
HMM | Hidden Markov Model
HM-RNN | Hierarchical Multi-Scale RNN
MFCCs | Mel-Frequency Cepstral Coefficients
MIMUs | Magnetic Inertial Measurement Units
MIR | Music Information Retrieval
NLP | Natural Language Processing
OE | Onset Envelope
OMCSs | Optical Motion Capture Systems
RNNs | Recurrent Neural Networks
Rouge-L | Rouge-Longest Common Subsequence
SMACR | Scoring Method based on Action Contextual Relationships
STFT | Short-Time Fourier Transform
TPS | Tempo Period Strength
Dataset: the 12 dance routines used in the experiments (Section 5.1).

Sample No. | Title | Artist | Sweat Difficulty | Difficulty
---|---|---|---|---
1 | AnythingIDo | CLiQ | 3 | 4 |
2 | BoyWithLuvNextALT | BTS | 3 | 4 |
3 | MORE | K/DA | 3 | 4 |
4 | RatherBe | Clean Bandit | 3 | 3
5 | BringMeToLife | Evanescence | 3 | 3 |
6 | LoveMeLand | Zara Larsson | 2 | 3 |
7 | Telephone | Lady Gaga | 2 | 2 |
8 | SweetButPsycho | Ava Max | 2 | 2 |
9 | AsItWas | Harry Styles | 2 | 2 |
10 | SissyThatWalk | RuPaul | 2 | 1 |
11 | Magic | Kylie Minogue | 1 | 1 |
12 | BeNice | The Sunlight Shakers | 2 | 1 |
Evaluation scores per routine using all frames versus beat-selected keyframes (Section 5.3).

Sample No. | ST-AMCNN (All Frames) | ST-AMCNN* (All Frames) | ASCS (All Frames) | SMACR (All Frames) | ST-AMCNN (Keyframes) | ST-AMCNN* (Keyframes) | ASCS (Keyframes) | SMACR (Keyframes)
---|---|---|---|---|---|---|---|---
1 | 96.99 | 68.87 | 93.25 | 63.75 | 98.77 | 71.5 | 94.73 | 59.16 |
2 | 89.63 | 55.97 | 90.11 | 56.3 | 92.14 | 59.13 | 92.69 | 57.65 |
3 | 95.39 | 56.86 | 93.39 | 54.01 | 97.33 | 59.71 | 96.17 | 57.03 |
4 | 88.19 | 55.16 | 87.45 | 55.07 | 90.16 | 58.17 | 89.83 | 73.36 |
5 | 91.84 | 71.87 | 91.75 | 70.33 | 94.63 | 75.47 | 93.62 | 76.49 |
6 | 91.44 | 73.87 | 90.65 | 73.9 | 93.7 | 76.98 | 92.77 | 59.13 |
7 | 95.63 | 61.01 | 93.44 | 56.84 | 97.41 | 64.43 | 94.96 | 67.78 |
8 | 89.69 | 68.08 | 90 | 64.43 | 92.63 | 70.08 | 92.6 | 69.43 |
9 | 94.9 | 69.53 | 95.08 | 66.17 | 97.52 | 73.03 | 96.62 | 72.57 |
10 | 95.83 | 73.26 | 92.09 | 69.04 | 97.36 | 76.6 | 93.97 | 62.51 |
11 | 93.98 | 63.85 | 90.96 | 60.14 | 96.89 | 66.59 | 92.52 | 71.29 |
12 | 93.12 | 67.87 | 91.08 | 69.35 | 94.62 | 71.36 | 92.98 | 66.3 |
AVG. | 93.05 | 65.52 | 91.6 | 63.28 | 95.26 | 68.59 | 93.62 | 66.06 |
Per-sample comparison of ST-AMCNN and ASCS (columns indexed by sample number).

Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | AVG
---|---|---|---|---|---|---|---|---|---|---|---|---|---
ST-AMCNN | 10.2 | 11.5 | 10 | 10.4 | 12.7 | 12.9 | 11.4 | 9.36 | 13.4 | 10.3 | 12.9 | 11.2 | 11.3549 |
ASCS | 11.3 | 11.7 | 10.8 | 11.8 | 14 | 13.9 | 12.5 | 11.1 | 15.2 | 12 | 13.7 | 12.8 | 12.5735 |