Implementation of Sequence-Based Classification Methods for Motion Assessment and Recognition in a Traditional Chinese Sport (Baduanjin)

This study aimed to assess the motion accuracy of Baduanjin and recognise the motions of Baduanjin based on sequence-based methods. Motion data of Baduanjin were measured by the inertial sensor measurement system (IMU). Fifty-four participants were recruited to capture motion data. Based on the motion data, various sequence-based methods, namely dynamic time warping (DTW) combined with classifiers, hidden Markov model (HMM), and recurrent neural networks (RNNs), were applied to assess motion accuracy and recognise the motions of Baduanjin. To assess motion accuracy, the scores for motion accuracies from teachers were used as the standard to train the models on the different sequence-based methods. The effectiveness of Baduanjin motion recognition with different sequence-based methods was verified. Among the methods, DTW + k-NN had the highest average accuracy (83.03%) and shortest average processing time (3.810 s) during assessing. In terms of motion reorganisation, three methods (DTW + k-NN, DTW + SVM, and HMM) had the highest accuracies (over 99%), which were not significantly different from each other. However, the processing time of DTW + k-NN was the shortest (3.823 s) compared to the other two methods. The results show that the motions of Baduanjin could be recognised, and the accuracy can be assessed through an appropriate sequence-based method with the motion data captured by IMU.


Introduction
Traditional Chinese sports are an essential part of physical education (PE) in universities in China. The official document issued by the Ministry of Education requires that the PE curriculum of all universities in China include traditional Chinese sports [1]. An official report in 2012 showed that 76.7% of universities in China had chosen Chinese martial arts as part of their PE curriculum. Although traditional Chinese sports have been incorporated into university PE curriculums, there have been some problems in implementation. The most serious problem is the high student-teacher ratio in Chinese universities. In 1999, the government in China decided to increase the number of students in universities, which led to a rapid rise in the number of students [2]. According to the latest official report [3], the number of students in universities in China hit a record high of 40.02 million in 2019. Of this number, 30.31 million were undergraduates, and the average number of students per university was 15,176. The insufficient number of PE teachers in universities in China and the number of students in PE courses exceeding the recommended numbers have resulted in a high student-teacher ratio [4]. Due to the large number of students in Chinese martial arts classes, teachers can neither provide individual guidance to students nor can they correct the errors in the motions of students [5]. Moreover, the Ministry of Education requires that the assessments in PE focus on formative assessment, instead of only using summative assessment [1]. However, PE teachers are unable to conduct formative assessment due to the high student-teacher ratio.
The application of motion capture (MoCap) in sports provides a way to solve the above problems in teaching traditional Chinese sports in PE in Chinese universities. In the past 10 years, researchers have developed various systems for different sports on motion data captured by MoCap [6,7]. These systems measure the motions of users in real-time on MoCap, analyse the captured motion data, detect errors in the motions, and give feedback to the users to help them recognise and correct their motions. In the study by Yamaoka et al. [7], a system-applied MoCap (Kinect device) for frisbee learners was developed. The 3D motion data of learners were captured using the Kinect device while throwing the flying disc. The system checked the positions in pre/exercise/post-exercise and gave information to learners. The results show that this system can effectively improve the motions of learners.
When considering the actual requirements of PE, it was found that not all MoCaps are suitable for PE courses. The photoelectric system (OMS) shows the highest measurement accuracy in MoCap, but the high cost is a barrier for using OMS in PE [8,9]. Electromagnetic systems (EMSs) are susceptible to interference in the electromagnetic environment [10]. The image processing system (IMS) has better accuracy than the EMS and an improved range compared to the OMS [11]. Many studies used a low-cost IMS (e.g., Kinect device) to capture the motion data to analyse motion in PE6 [12]. However, a low-cost IMS has some disadvantages: low accuracy, limited environment adaptability, and limited motion ranges led us to choose the inertial sensor measurement system (IMU) for this study.
Therefore, we aimed to develop a formative assessment system using IMU to assess the motion accuracy of Baduanjin to assist teachers and students identify errors in motions. Baduanjin is a popular traditional sport in China that consists of eight decomposition motions ( Figure 1). The formative assessment system needs to recognise and assess the motions of Baduanjin effectively. The motion sequences in Baduanjin are fixed (from Motion-1 to Motion-8) and cannot be changed. However, some students have problems practising the correct motions sequence while learning it. Therefore, the system needs to have the ability to recognise Baduanjin motions to evaluate whether the student is following the correct motion sequence. Recognising the motions of Baduanjin can be interpreted as a classification problem. During the classroom teaching session, teachers use the traditional manual grading method (vision) to assess the motion accuracy of Baduanjin. Therefore, the assessment of the motion accuracies of the Baduanjin motions is converted into a classification problem [14]. Commonly used classification methods can be divided into sample-based methods and sequence-based methods [14]. Few studies have applied sequence-based methods to assess and recognise motions of Baduanjin and other traditional Chinese sports, such as tai chi. Chen et al. [12] used DTW to assess the motions of tai chi. Sequence-based methods are used in this study, including dynamic time warping (DTW) combined with classifiers, hidden Markov model (HMM) [15], and recurrent neural networks (RNNs) [16,17].

Materials and Methods
This study is comprised of three steps. The first step was to recruit volunteers and use IMU to capture the motion data of Baduanjin. Teachers and students were recruited from a university in Southwest China to participate in the study. The motion data of Baduanjin of all participants were captured using IMU. The second step involved extracting motion data and keyframes. The raw motion data needed to be converted into quaternion format for data analysis and the extracted keyframes, to prevent data redundancy and reduce processing time. The last step was to verify the effectiveness of sequence-based methods for assessing motion accuracy and recognising the motions of Baduanjin. In order to assess the motion accuracy of Baduanjin, Baduanjin experts were invited to score the motion accuracies of the captured motions and then use the scores to train the models for the different sequence-based methods ( Figure 2).

Recruiting Volunteers and Capturing Motion Data
Undergraduate students or teachers with no clinical/mental illnesses or physical disabilities from a university in Southwest China were invited to participate in this study. All participants were informed and understood the purpose and scope of the data in this study and any associated privacy risks, and they accepted the researcher's commitment to protecting their privacy and the security of data. The study was conducted following the Declaration of Helsinki, and the protocol was approved by the University of Malaya Research Ethics Committee (UM.TNC2/UMREC-558).
The Baduanjin motions of the volunteers were captured with the commercial IMU "Perception Neuron 2.0", developed by Noitom [18]. Perception Neuron 2.0 has 17 inertial sensing units with a 3-axis gyroscope, 3-axis accelerometer, and 3-axis magnetometer [19]. Sers et al. verified the effectiveness the commercial IMU in measuring accuracy [20]. The captured motion data were the "output" in the Biovision Hierarchical Structure [21] motion file through the Axis Neuron software developed by Noitom.

Motion Data Conversion and Keyframes Extraction
Fifty-six participants were recruited and divided into two groups. The first group comprised 20 students and a martial arts teacher. The teacher has bachelor and master's degrees in traditional Chinese sports (martial arts) and more than 10 years of experience teaching Baduanjin. The IMU was used to capture the motions of these participants three times. The second group comprised 35 students, and IMU was used to measure the motions of these participants. Students were undergraduates without disabilities and no clinical or mental illnesses. Therefore, for each motion of Baduanjin, 98 "motion data" were captured. The entire dataset of motions included 98 × 8 = 784 motion data (760 motions of students and 24 motions of a teacher).
The BVH file output from Perception Neuron 2.0 used the rotation data on skeleton points represented by Euler angles. The data on Euler angles were converted to quaternions to avoid universal joint locking and singularity in the rotation data represented by Euler angles [22]. The quaternion is a four-dimensional super-complex number representing the three-dimensional vector space on the real number [23]. The symbol used to represent a quaternion is as follows: In Equation (1), w is the scalar component, and x, y, z are the vectors. Set the rotation order of Euler angles as z, y, x, and set the rotation angles around the x, y, and z axes as α, β, γ, then the Euler angles can be converted to quaternions, as follows: cos(γ/2) cos(β/2) cos(α/2) + sin(γ/2) sin(β/2) sin(α/2) cos(γ/2) cos(β/2) sin(α/2) − sin(γ/2) sin(β/2) cos(α/2) cos(γ/2) sin(β/2) cos(α/2) + sin(γ/2) cos(β/2) sin(α/2) sin(γ/2) cos(β/2) sin(α/2) − cos(γ/2) sin(β/2) sin(α/2) The motion data obtained from MoCap for a single motion is several thousand frames. Table 1 shows the mean duration and the mean number of frames of Baduanjin captured in the study. Due to the limited storage space and bandwidth capacity available to users in actual applications, the large amount of collected motion data may limit its application [24]. The keyframe extraction technique, to extract a small number of representative keyframes from long motion sequences, has been widely used in motion analysis. Therefore, this technique was applied to extract keyframes to reduce the amount of motion data, to improve data storage and subsequent data analysis [24,25]. In order to extract the keyframes on the desired compression rate, a method that extracts the keyframes by a preset compression rate on k-means clustering was used in the study [26]. In this algorithm, Step 1 is to preset the value of k, which is the value of the preset keyframes. In this study, the value of k is determined by the preset compression rate of the extracted keyframes, as follows [27]: In Equation (3), k is the value of the preset keyframes, C_rate is the compression rate of the keyframes to be extracted, and N is the number of original frames. On the preset value of k, Step 2 uses k-means to extract k cluster centroids from the dataset of 3D coordinates ([x, y, z]) of the skeleton points of the original frame. In k-means, the distance is the applied Euclidean distance in this study. The skeleton model has 17 skeleton points. Therefore, a cluster centroid is composed of 51 (17 × 3) vectors. Based on these extracted cluster centroids, the keyframes were extracted by calculating the Euclidean distance of skeleton points between the cluster centroids and the original frames. The algorithm to extract the keyframe is shown in Figure 3.  The reconstruction error between the reconstructed frames and the original frames was calculated to evaluate the effects on extracting keyframes in this study [28]. Motion reconstruction is based on reconstructing non-keyframes by interpolating adjacent keyframes, thereby rebuilding the same number of frames as the original frames. The interpolation method is: p 1 and p 2 are the positions of the skeleton points of adjacent keyframes at time t 1 and t 2 , calculation of p t (representing the position of the non-key frame point at time t) as follows [29]: After reconstructing the motion by interpolation, the reconstruction error is calculated as follows: In Equation (5), n represents the number of skeleton points, in the study m = 17; p i 1,j is the coordinate of j skeleton point for i frame in original frames, and p i 2,j is the coordinate of the j skeleton point for the coordinate frames in reconstructed frames. Dis(p i 1 − p i 2 ) represents the calculated distance of the human posture in the i frame between the original frames and reconstructed frames. m 1 is the original frames, m 2 is the reconstruction frames. N represents the number of frames for reconstructed frames or original frames. Error (m 1 , m 2 ) is the reconstructed errors calculated by the distance of human posture.
Five different compression ratios (5%, 10%, 15%, 20%, 25%) are chosen to extract keyframes and calculate the reconstruction errors of the corresponding keyframes under the different compression ratios ( Figure 4). This figure shows the average reconstruction error between the extracted keyframes on the corresponding compression rate and the original frames. The Y-axis represents the reconstruction error, and the X-axis represents the five compression ratios (5%, 10%, 15%, 20%, 25%) to the extracted keyframes. The reconstruction errors corresponding to the eight motions of Baduanjin under different compression ratios are represented by different symbols and line segments (as shown in the legend in the figure).
It can be seen from Figure 4 that as the compression rate decreases, the error of motion reconstruction increases. When the compression ratio increases from 15% to 25%, the reconstruction error does not change much. However, the reconstruction error increases significantly when the compression ratio is between 5% and 15%. Therefore, keyframes on the compression ratio (15%) are extracted to ensure that the compression ratio and reconstruction error are reasonable.

Traditional Manual Assessment of Baduanjin
Motion recognition can be regarded as a classification issue of time-varying data, where the test motion is matched with a pre-calibrated motion representing typical motion [30]. Therefore, assessing motion accuracy is similar to an issue for classifying motions. In this study, two martial arts teachers who have more than 10 years of experience teaching Baduanjin from a university in Southwest China were invited to assess the motion accuracy for each student's motions. According to the grading scale of Baduanjin, the two teachers assess the motion accuracy of the students' motions into three grades: fail, pass, and good. The Kendall correlation coefficient test is used to calculate the correlation between the scores results from two teachers to ensure the consistency of the evaluation. Table 2 shows that the captured motions of students have three different grades (fail, pass, and good) except for Motion-8, with only two grades (good and pass). The two teachers explained that Motion-8 is a simple motion in Baduanjin, which the students easily pass ( Figure 5). The Kendall coefficient values between the scores of the two teachers are higher than 0.7, indicating that the two teachers have a high degree of consistency in assessing motion accuracy in Baduanjin. Based on the motion data and the scores of the two teachers, the three sequence-based methods were applied to assess motion accuracy: dynamic time warping (DTW) combined with classifiers, hidden Markov model (HMM) and recurrent neural network (RNN).

Dynamic Time Warping (DTW) Combined with Classifiers
In previous studies [12,31], one method assessed the motion accuracy on the differences between the motions of the students and the teacher. The duration of the captured motions was different. DTW, the method used for calculating differences for different time series was applied [32]. Although the difference between motions calculated by DTW can be used to recognise motions, it is difficult to classify the motions based on the difference to assess the motion accuracy without classifiers. Therefore, classifiers were used to classify the motions after calculating the difference between the students' motions and the teacher's motions through DTW. The specific steps are as follows:

•
Step 1: to calculate the minimum cumulative distance between corresponding skeleton points between two motions of students and teacher with DTW [13,32]. In order to prevent DTW from mismatching by excessive time warping, the global warping window was limited in this study, which was set to 10% of the entire window range: 0.1 × max(n, m). n and m are, respectively, the frame length of the two motions. Since the warping path of each corresponding skeleton point of the two motions may be different, the average value of the cumulative distance of the warping path was used to indicate the differences between the points: dis(q i stu , q i tea ) = DTW(q i stu , q i tea ) length(path i ) , i = 1, 2, . . . , n In Equation (6), stu represents the students' motions; tea represents the teacher's motions; qi is the vectors of the quaternion of the i skeleton point in the two motions, and the total number of points is n. length(path i ) is the length of the warping path on the i skeleton point.

•
Step 2: because the human posture is composed of multiple skeleton points, the overall difference between the two motions from students and the teacher was calculated as distance: • Step 3: after calculating the difference between the teacher's motion and the student's motion, the classifier is trained with the scores of the two teachers to classify motion-formotion accuracy. A variety of classifiers were selected in the study, namely k-nearest neighbour (k-NN) [33], support vector machine (SVM) [34], naive Bayes (NB) [35], logistic regression [36], decision tree (DT) [37], and artificial neural network (ANN). Due to the diversity of ANNs, two commonly used ANNs: back propagation neural network (BPNN) and radial basis function neural network (RBFNN) [38], were chosen as classifiers ( Figure 6). BPNN is a multi-layer feedforward network trained according to the error backpropagation algorithm. In this study, BPNN was constructed with three layers. The first layer was the input layer. The number of neurons was equivalent to the dimension of the feature vectors; the second layer was the hidden layer, the tangent sigmoid equation was applied as the activation equation: The third layer is the output layer. The RBFNN is a kind of feedforward network trained using a supervised training algorithm, but the calculation and processing time is lower than that of BPNN. The main advantage of the RBFNN is that it has only one hidden layer and uses the radial basis equation as the activation equation [38]. Unlike BPNN, the Gaussian equation is used as the basic equation in RBFNN, as follows:

Hidden Markov Model (HMM)
HMM is a sequence-based method that has been successfully applied in recognising human motion [39,40]. HMM is a double random process consisting of a hidden Markov chain and a set of explicit discrete probabilities or probability density functions [40]. HMM can be expressed as follows: In Equation (8), N presents the set of states in the HMM model, which is the set of possible states of the event. M is the set of possible observation events corresponding to each state, which is the observation set. π is the a priori probability of state occupancy. A is the state transition probability matrix. B is associated with the observations from the various states, which describes the probability of each observation event in each state. HMM can solve three kinds of problems: evaluation, decoding, and learning. In this study, recognising or classifying motions is an evaluation problem [41]. The method supposes the output probability in one observation sequence: O = o 1 , o 2 , ..., o r of a given HMM(λ) is: P(O|λ) . The state with the highest probability in P(O|λ) , the result of recognition or classification. P(O|λ) can be calculated by the forward algorithm, a dynamic programming algorithm that uses a table to store intermediate values when establishing the probability of an observation sequence. α t (j) in the forward trellis represents the forward probability of the given HMM(λ), which the observed sequence o 1 , o 2 , ..., o r at time t and the state is j: α t (j) is calculated by superimposing all possible paths: Initialisation: In Equations (10) and (11), N represents the number of paths from t − 1 to t when the state at time t is j. Then: Corresponding to the forward algorithm is the backward algorithm, which is used to train the model. β t (i) represents the backward probability of the given HMM(λ), which the observed sequence o t+1 , o t+2 , ..., o T at time t and the state is i: Initialisation: Recursion:

Recurrent Neural Network (RNN)
Motion data of humans is a type of sequence data on time. A RNN is a recursive neural network that takes sequence data as input, recurses in the evolution direction of the sequence, and all nodes (recurrent units) are connected in a chain network [42]. Trained by the end-to-end training method, RNN can solve sequence labelling problems with unknown input-output alignment [43]. However, traditional RNNs have the problem of vanishing gradients or exploding gradients. Vanishing gradients cause the parameters to be affected by short-term information, and the earlier information decays with the increase of the time steps. Exploding gradients cause long-term dependence, which refers to the state of the current system that may be affected by the previous state of the system that can lead to information overload [42]. Long short-term memory (LSTM), a type of RNN, is a neural network especially designed to reduce the problem of vanishing or exploding gradients by using gates to obtain relevant information and forget irrelevant information [44]. Therefore, LSTM was used as a method of RNN to assess the motion accuracy in this study. Figure 7 shows the architecture of LSTM applied in the study. In the figure, X t is input; h t is hidden states; Y t is output; U t is the cell state, and S is the activation function. There are three gates in the hidden state of LSTM: forget, input, and output. The equations in Figure 7 are: LSTM remembers and forgets the long sequence input data using the gates. However, LSTM also has problems: whether the gates included in the architecture of LSTM already provide good predictions and whether additional data training is needed to further improve the predictions [45]. Bidirectional LSTM (BiLSTM) is an improvement to this problem of LSTM, which enables additional training by training the input data twice. Based on the traditional LSTM for positive training on a time series from front to back, BiLSTM adds negative training on a time series from back to front [45]. The architecture of BiLSTM is shown in Figure 8.  [46]. Similar to LSTM, GRU is a positive training model based on time series [47]. However, the architecture of the hidden state in GRU is different from that of LSTM. The architecture of the hidden state in GRU applied in this study is shown in Figure 9.
In Figure 9, X t is input; h t is hidden states; Y t is output, and δ is activation function. There are two gates in the hidden state of LSTM. The equations in Figure 9 are: From Figure 7 to Figure 9, the hidden state of LSTM and GRU is similar, in that the output at time t is calculated using the hidden state at time t − 1 and the value of input at time t. In addition, the equation of the forget gate for LSTM is similar to the equation of the reset gate for GRU. The difference is that LSTM has three gates, whereas GRU has two gates. There is no output gate in GRU as in LSTM, which means GRU directly transmitted the memory to the next cell, but LSTM selects whether to transmit memory by the output gate.

Evaluating the Effectiveness of Methods
In this study, cross-validation is applied to evaluate the effectiveness of different sequence-based methods for assessing motion accuracy. In cross-validation, the dataset is divided into a training set and a test set: (1) the training set is used to train the model and (2) the test set is used to test the trained model. The 10-fold cross-validation was used in the study. The confusion matrix and the assessment accuracy rate were used to express the effectiveness of different sequence-based methods. The confusion matrix shows the degree of confusion in the classification between different motions. The assessment accuracy is calculated as: Assessment Accuracy = Number of correctly classified Overall sample size (20)

Results
The assessment of motion accuracy used captured motion data and the traditional manual scoring results from the two teachers. With 10-fold cross-validation, the models on the sequence-based methods were trained. Using the scoring results from Teacher A as a standard, the accuracies of different sequence-based methods (on assessing motion accuracy) are shown in Table 3. Figure 10 shows an example of the confusion matrices for the sequence-based methods for Motion-1.
From Table 3, the results show that none of the methods had the highest accuracy for all eight motions of Baduanjin. The DTW + k-NN method had the highest accuracy in assessing five motions (Motion-1, Motion-2, Motion-5, Motion-7, and Motion-8). The highest accuracy for Motion-3 was the LSTM method, reaching 80.00%, the highest accuracy for Motion-4 was HMM, and BiLSTM with 90.53%. Lastly, the highest accuracy of Motion-6 was LSTM, with 84.21%. The highest average accuracy was the DTW + k-NN with 84.21%. From Figure 10, it can be seen from the confusion matrix of Motion-1 that, for most methods, the errors in classifying occur mainly in pass and fail motions. DTW + SVM and DTW + logistic regression classified all fail motions as pass motions. Table 3. The accuracies of assessing the motion accuracy of different sequence-based methods using the scoring results from Teacher A. When using the scoring results from Teacher B as a standard, the accuracy of different sequence-based methods on assessing motion accuracy is shown in Table 4. The confusion matrices for the sequence-based methods with Motion-1 is used as an example in Figure 11. Table 4. The accuracies of assessing the motion accuracy on different sequence-based methods using the scoring result from Teacher B. From Table 4, the results show that, in assessing motion accuracy, not one method had the highest accuracy for all eight motions of Baduanjin. However, DTW + k-NN had the highest accuracy in assessing five motions (Motion-1, Motion-2, Motion-4, Motion-6, and Motion-7), with its mean accuracy reaching 83.03%, which is the highest compared to other methods. The highest accuracy for Motion-3 was HMM, at 80.00%, and the highest accuracy for motion-5 was DTW + RBFNN at 86.32%. The highest accuracy of Motion-8 was the GRU at 88.42%. On the confusion matrix of Motion-1 (Figure 11), the result is similar to Figure 10 when using Teacher A as the standard. The classification errors occur mainly in pass and fail, especially DTW + SVM and DTW + logistic regression.  The processing time (training and classifying) of each method is presented in Table 5. The results show that the processing time of DTW + k-NN was the shortest (3.810 s). The processing times of the three recurrent neural network methods (LSTM, BiLSTM, and GRU) were much longer, all exceeding 15 s.

Recognising Motions of Baduanjin
For recognising motion with 10-fold cross-validation, the accuracy and processing time of the sequence-based methods are shown in Table 6. The confusion matrices of different sequence-based methods for motion recognition are shown in Figure 12.       The results show that, by using sequence-based methods to recognise the motions of Baduanjin, the DTW method combined with the classifier will improve their accuracy (except for DTW + RBFNN). It can be seen for the classifier k-NN and SVM, the accuracy of both methods exceeded 99.00% (DTW + k-NN reached 99.47% and DTW + SVM reached 99.61%). It was the highest accuracy among all verification methods tested in this paper. However, the accuracy of DTW + RBFNN was not satisfactory, only reaching 75.79%. Besides DTW + BFNN, the processing time of DTW combined with several classifiers was between 3.823 and 10.163 s. The processing time for DTW + k-NN was the minimum of all methods at 3.823 s. The accuracies of HMM and RNN (LSTM, BiLSTM, and GRU) also exceeded 96%. However, the processing times of these four methods were relatively high (between 61.144 and 239.190 s), especially LSTM, BiLSTM, and GRU. These three methods had much higher processing times compared to DTW combined with classifiers. BiLSTM, with the maximum processing time, was many times longer than DTW + k-NN with the minimum processing time.
The Chi-Square Test of the High Accuracy Methods on Recognising Motions From the confusion matrices, it was found that the accuracy of DTW + k-NN, DTW + SVM, and HMM were all higher than 99%. The chi-square test on the number of correct and incorrect judged motions by the three methods were obtained, and the results are shown in Tables 7 and 8 (calculation on SPSS 23.0).  From the results of the chi-square test, it is found that the three methods have no significant difference in the accuracy of recognising motions. Therefore, the effectiveness of the three methods on accuracy can be considered the same.
Second, the incorrect motions recognised by the three methods are not the same. For example, for DTW + SVM, 3 of the 706 motions were misclassified, including Motion-3, Motion-5, and Motion-7. All three motions were incorrectly recognised as Motion-4. However, for HMM, the problem was in recognising the motions in Motion-7 and Motion-8 as other motions.

Discussion
This research used IMU to capture motion data to assess motion accuracy and recognise Baduanjin motions that can be used to assist students in knowing their errors during practice. To assess motion accuracy, previous studies have suggested and verified that motion accuracy could be assessed by comparing the motions to be assessed with the standard motions [12,31]. For example, in a study involving assessing the motion accuracy of tai chi, DTW was used to calculate the differences between the motions of teachers and students to assess the motion accuracies of the students' motions [12]. The correlation between the assessment results of this method and the assessment results between teachers and students reached 80%. This method assesses the motion accuracy by calculating the difference or similarity between the motions, in which this method needs a quantitative assessment for the accuracy of motions. However, in teaching and learning, the traditional assessment for the accuracy of motions is usually a manually graded assessment, a qualitative assessment, such as the existing method in the course of Baduanjin in universities. Therefore, in this study, the motion accuracy of Baduanjin was assessed as a classification problem.
In this study, three types of sequence-based methods were applied, and the traditional manual assessment results of teachers were used as the classification criteria to train the classifiers. It is proven that the results of some classifiers have the ability to approach teacher assessment scoring results. However, none of the classifiers have the highest accuracies in assessing the motion accuracies of all eight motions of Baduanjin. Among the classifiers, with Teacher A as the classification criteria, DTW + k-NN reached the highest accuracies in five motions (Motion-1, Motion-2, Motion-5, Motion-7, and Motion-8). However, there were some differences in the results using Teacher B as the classification criteria. Among the classifiers, DTW + k-NN reached the highest accuracies in five motions (Motion-1, Motion-2, Motion-4, Motion-6, and Motion-7). Although no selected classifier had the best accuracy for all eight motions of Baduanjin, the highest average accuracy was DTW + k-NN, which also had the lowest average processing time among the selected classifiers.
In addition, the motion recognition of Baduanjin was also studied. In actual applications, the motion can be recognised, and the motion accuracy assessed using the corresponding optimal classifier. The results of motion recognition show that the selected classifiers reached high accuracy (all above 90%), except for DTW + RBFNN (75.79%). The accuracy of the three classifiers (DTW + k-NN, DTW + SVM, and HMM) exceeded 99%. From the confusion matrix of the three classifiers and the chi-square test performed thereon, there was no significant difference in the accuracy of the three types of classifiers, indicating that the accuracies of the three classifiers for motion recognition under the existing experimental data were the same. Therefore, as the shortest processing time among the three classifiers, DTW + k-NN is concluded as the best choice for motion recognition of Baduanjin.
There is a limitation in this study: the captured motions used for assessing motion accuracies were relatively small, which could be why the accuracies of the three types of RNN (LSTM, BiLSTM, and GRU) were not high in assessing the motion accuracy. It is expected that by increasing the motion dataset in further research, the accuracy of RNN will be increased [48].

Conclusions
This study shows that, on the motion data captured by IMU, the sequence-based method to classify the Baduanjin motions can assess the motion accuracies of the Baduanjin motions and recognise the motions with high accuracy and short processing times. The methods verified in this study could be used to assess the motions of Baduanjin in PE. Based on the verified sequence-based method in this study, a formative assessment system for Baduanjin in PE can be developed.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the data involves the human body three-dimensional data of the participants.