Design of Ensemble Stacked Auto-Encoder for Classification of Horse Gaits with MEMS Inertial Sensor Technology

This paper discusses the classification of horse gaits for self-coaching using an ensemble stacked auto-encoder (ESAE) based on wavelet packets from the motion data of the horse rider. For this purpose, we built an ESAE and used probability values at the end of the softmax classifier. First, we initialized variables such as hidden nodes, weight, and max epoch using the options of the auto-encoder (AE). Second, the ESAE model is trained by feedforward, back propagation, and gradient calculation. Next, the parameters are updated by a gradient descent mechanism as new parameters. Finally, once the error value is satisfied, the algorithm terminates. The experiments were performed to classify horse gaits for self-coaching. We constructed the motion data of a horse rider. For the experiment, an expert horse rider of the national team wore a suit containing 16 inertial sensors based on a wireless network. To improve and quantify the performance of the classification, we used three methods (wavelet packet, statistical value, and ensemble model), as well as cross entropy with mean squared error. The experimental results revealed that the proposed method showed good performance when compared with conventional algorithms such as the support vector machine (SVM).


Introduction
Riding is an action that includes horse riding or modern equestrian dressage. There are various kinds of horse riding, such as show jumping and horse therapy. Normally, horse riding requires skills taught by a coach. However, motion capture technology has advanced and might take over the coach's role. Motion capture technology is largely divided into acoustical, mechanical, magnetic, and optical types. Each has disadvantages: it is difficult to collect precise motion using an acoustical sensor, and the mechanical type restricts movement because the subject has to wear heavy equipment. Optical systems are expensive and are strongly affected by ambient lighting. Finally, magnetic sensors are sensitive to iron, but iron is rarely involved in horse riding. For that reason, magnetic sensors were used.
Normally, horse riding is not taught one-on-one, because such teaching is very costly compared with other popular sports (swimming, badminton, tennis, health). Therefore, horse riding is regarded as an aristocratic sport. To reduce the cost of horse-riding education, a self-coaching system has been designed. For self-coaching research, the gaits of a horse are classified into four types: walk, sitting trot, rising trot, and canter. Posture coaching then detects incorrect motion for each horse gait. This paper also aims to provide real-time horse-riding coaching by giving the user feedback about posture. The rider's posture is different for each of the four types of horse gaits. For this reason, the motion of an expert can serve as an example for a beginner. This paper is organized as follows. Section 2 describes related research on AEs and SAEs, with the algorithms presented in terms of mathematical theorems and concepts. Section 3 describes the proposed method: a feature extraction method using a wavelet packet and a method applying five statistical values (maximum value, minimum value, average, variance, and standard deviation). Section 4 describes the construction of the horse-riding DB and the outline of the experiment, and presents experiments using the proposed method and the comparison algorithms. Section 5 summarizes the conclusions and future challenges.

Auto-Encoder
The AE is an unsupervised learning algorithm. It is a neural network that aims to produce an output X' similar to the input data X. In other words, an AE is composed of encoding (compression) and decoding (recovery), so its purpose is data reconstruction. Figure 1 shows the simple structure of the AE.
Micromachines 2018, 9, 411
Generally, the encoding process is designed with fewer nodes than the previous step. For example, if the number of hidden nodes of the first AE is set to 40, that of the second AE is set to a smaller value such as 30. Through the decoding process, the number of nodes returns to 40, and the sizes of the input data and output data become equal. Figure 2 shows the structure of the AE with a softmax classifier. Table 1 shows the symbols used in defining the learning operation.

Symbol | Definition
[symbol missing] | Size of input layer (= size of output layer)
[symbol missing] | Size of hidden layer
v_ij | The j-th weight of the i-th output layer neuron
z_nj | The output value of the j-th hidden layer neuron for the n-th learning vector
η | Learning rate
x'_ni | The output value of the i-th output layer neuron for the n-th learning vector
x_ni | The i-th element of the n-th learning vector
θ_j | Bias of the j-th hidden layer neuron
w_ji | The i-th weight of the j-th hidden layer neuron
b_i | Bias of the i-th output layer neuron

The AE activates the unit z of the hidden layer in the same way as the multi-layer perceptron (weighted sum). z is described by Equation (1).
σ is an activation function, typically a sigmoid or ReLU function. Next, the decoding step of the auto-encoder projects z obtained in Equation (1) with a new weight and adds the bias. X' is defined by Equation (2).
The difference between the input data X and the output data X' is minimized through the objective function L.
The objective function L is a cross entropy, and it is defined as shown in Equation (4).
The partial derivative of L with respect to the weights is obtained with the chain rule, as shown in Equation (5). Figure 3 shows the process of training the AE.
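The encoding and decoding steps referred to as Equations (1) and (2) can be sketched as follows. This is a form consistent with the symbols in Table 1, not necessarily the exact typesetting of the original equations. The index ranges (i over input and output units, j over hidden units), the single bias θ_j per hidden neuron, and the use of the same activation σ in both steps are assumptions.

```latex
\begin{align}
z_{nj}  &= \sigma\Big(\sum_{i} w_{ji}\, x_{ni} + \theta_{j}\Big) \\
x'_{ni} &= \sigma\Big(\sum_{j} v_{ij}\, z_{nj} + b_{i}\Big)
\end{align}
```

With a sigmoid σ, the derivative of each output is x'_ni (1 − x'_ni), which is the factor that appears in the weight update of the output layer.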
The output layer weight is updated as shown in Equation (6).
$$v_{ij}(t+1) = v_{ij}(t) + \eta\,(x_{ni} - x'_{ni})\,x'_{ni}\,(1 - x'_{ni})\,z_{nj} \qquad (6)$$
Equation (7) is the learning operation of the hidden layer weight; the hidden layer weight represents the connection strength between the input and the hidden layer.
The bias learning operation of the output layer neuron is shown in Equation (9).
Finally, the bias learning operation of the hidden layer neuron is shown in Equation (10).

Pseudocode of Auto-Encoder
Procedure
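The training procedure can be sketched in Python as follows, following the steps listed in the abstract (initialization, feedforward and back propagation, gradient-descent update, and termination on an error threshold). This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `train_autoencoder`, the learning rate, the tolerance `tol`, the random initialization, and the squared-error reconstruction signal with sigmoid units are all illustrative choices. With a cross-entropy objective and sigmoid outputs, only the output delta would change.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, hidden_size=40, lr=0.5, max_epoch=3000, tol=1e-3, seed=0):
    """Train a one-hidden-layer auto-encoder with batch gradient descent.

    X: array of shape (n_samples, n_features), scaled to [0, 1].
    Returns the learned parameters and the last measured mean squared error.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Step 1: initialize parameters (assumed: small random weights, zero biases).
    W = rng.normal(0.0, 0.1, size=(d, hidden_size))   # input -> hidden weights
    theta = np.zeros(hidden_size)                     # hidden biases
    V = rng.normal(0.0, 0.1, size=(hidden_size, d))   # hidden -> output weights
    b = np.zeros(d)                                   # output biases
    err = np.inf
    for epoch in range(max_epoch):
        # Step 2a: feedforward (encode, then decode).
        Z = sigmoid(X @ W + theta)
        X_rec = sigmoid(Z @ V + b)
        # Step 4: terminate once the reconstruction error is small enough.
        err = np.mean((X - X_rec) ** 2)
        if err < tol:
            break
        # Step 2b: back propagation of the squared-error signal.
        delta_out = (X - X_rec) * X_rec * (1.0 - X_rec)
        delta_hid = (delta_out @ V.T) * Z * (1.0 - Z)
        # Step 3: gradient-descent update, averaged over the batch.
        V += lr * (Z.T @ delta_out) / n
        b += lr * delta_out.mean(axis=0)
        W += lr * (X.T @ delta_hid) / n
        theta += lr * delta_hid.mean(axis=0)
    return {"W": W, "theta": theta, "V": V, "b": b}, err
```

The hidden deltas are computed before `V` is updated, so both layers use gradients from the same forward pass.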

Stacked Auto-Encoder
The stacked auto-encoder (SAE) is a neural network consisting of multiple layers of AEs, in which the outputs of each layer are connected to the inputs of the successive layer. As the number of layers and nodes increases, an SAE requires an enormous amount of computation and risks falling into a local minimum when learning the weights. There is also the problem of the vanishing gradient (VG), in which the weight updates become progressively smaller as small values are multiplied repeatedly. Therefore, a designer can build a network by stacking AEs according to performance.
The generated network can extract important features from the input data. The parameters of each layer are determined by comparing, at the output layer, the data reconstructed through the hidden layer with the input data. Once the parameters of a hidden layer are determined, the output layer is removed, and the output of the trained hidden layer is used as input data to design another AE that has a hidden layer and an output layer. An SAE is not a deep generative model. The reason is that, unlike the restricted Boltzmann machine (RBM), which depends on probability and anticipates test data from input data, an SAE trains the model in a deterministic manner. It trains h = s(Wx + b), not p(h = 0, 1) = s(Wx + b). The advantage of the SAE is that learning is fast and the properties of the deep neural network can be adjusted. An SAE is built by stacking blocks. First, the hidden layer is activated by the input data, and then the active hidden layer reconstructs the input data.
The error between the input data and the reconstructed data is reduced by using the objective function. The parameters are updated according to the activation of the hidden layer. In the SAE network, the labels of the samples are also fed to the softmax classifier, and the network parameters are tuned using the back propagation (BP) algorithm. Figure 4 shows the structure of an SAE. The network parameters are fine-tuned by minimizing a target function.
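The layer-wise construction described above (train an AE, discard its output layer, feed the hidden activations to the next AE) can be sketched as follows. This is an illustrative sketch rather than the authors' code: `fit_ae_layer` and `pretrain_stack` are hypothetical names, the sigmoid units, squared-error signal, learning rate, and epoch count are assumptions, and the supervised fine-tuning with the softmax classifier is left out.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_ae_layer(X, hidden_size, lr=0.5, epochs=200, seed=0):
    """Train one auto-encoder on X and return only its encoder (W, theta)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, hidden_size))
    theta = np.zeros(hidden_size)
    V = rng.normal(0.0, 0.1, (hidden_size, d))
    b = np.zeros(d)
    for _ in range(epochs):
        Z = sigmoid(X @ W + theta)            # encode
        R = sigmoid(Z @ V + b)                # decode (reconstruction)
        d_out = (X - R) * R * (1.0 - R)       # output-layer delta
        d_hid = (d_out @ V.T) * Z * (1.0 - Z) # hidden-layer delta
        V += lr * (Z.T @ d_out) / n
        b += lr * d_out.mean(axis=0)
        W += lr * (X.T @ d_hid) / n
        theta += lr * d_hid.mean(axis=0)
    return W, theta

def pretrain_stack(X, hidden_sizes):
    """Greedy layer-wise pretraining: each AE is trained on the previous code."""
    encoders, H = [], X
    for k, size in enumerate(hidden_sizes):
        W, theta = fit_ae_layer(H, size, seed=k)
        encoders.append((W, theta))
        # The decoder is discarded; the hidden activations feed the next AE.
        H = sigmoid(H @ W + theta)
    return encoders, H
```

For example, `pretrain_stack(X, [40, 30])` would mirror the 40-then-30 node sizes used as an example earlier in the text; the returned `H` is the feature matrix that a softmax classifier would receive.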

Compression Method Using Wavelet Packet
Motion data of a horse rider consists of eight features that are compressed from 49,000 to 12,250 data samples using wavelet packets. The data that pass through the low-frequency filter are extracted as features, whereas the data that pass through the high-frequency filter are difficult to classify because their characteristics are sparse. High performance was obtained by selecting the two-layer wavelet feature. The computation for the generation of wavelet packets is simple when using an orthogonal wavelet. The wavelet packets form a sequence of functions W_n(x), in which W_0(x) = ϕ(x) is the scaling function and W_1(x) = ψ(x) is the wavelet function. In this paper, W_{3,0}, W_{2,0}, and W_{1,0} are used for input. Figure 5 shows the method of decomposition based on wavelet packets.
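The size reduction from 49,000 to 12,250 samples corresponds to two halvings, which is what two levels of low-pass filtering with downsampling produce. The sketch below illustrates this with the Haar wavelet, which is an assumption made for simplicity; the text does not name the wavelet that was used, and `haar_approximation` is a hypothetical helper.

```python
import numpy as np

def haar_approximation(signal, levels=2):
    """Return the low-frequency (approximation) branch after `levels` Haar steps."""
    a = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if a.size % 2:
            a = a[:-1]  # drop a trailing sample so that the pairs line up
        # Haar low-pass filter followed by downsampling by two.
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a
```

Applied to one feature channel of 49,000 samples with `levels=2`, the function returns 12,250 coefficients. Only the low-frequency branch is followed at each level, in line with the statement that the low-pass output is used as the feature.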

Feature Extraction Based on Statistical Methods
In the previous study [26], we experimented with the elbow angle and the y-axis coordinate data of the hip, whereas in this study, we experimented with forty additional features. Feature values are extracted from 12,000 frames. The feature extraction method moves over the data like a mask filter and extracts five feature values (average, maximum, minimum, variance, and standard deviation) from 20 frames at a time. The data are characterized by eight feature values: y-axis coordinates of the hip, backbone angle, right elbow angle, left elbow angle, right knee angle, left knee angle, elbow distance, and knee distance. We experimented sequentially with 10-100 frames; the best performance was achieved at 20 frames, and many features were obtained to take advantage of the ability of AEs to handle large data. Statistical values are a powerful tool for analyzing time series data. Eight features are generated from the sensor data, and five statistical values are extracted from each of the eight features, so that 40 features are built in total. As a result, applying all five feature values is better than using only a minimum, a maximum, and an average value. Figure 6 shows a method of constructing statistical data.
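The construction of the 40 statistical features can be sketched as follows. The sketch assumes non-overlapping windows of 20 frames, which is one reading of the mask-filter description; `window_statistics` is a hypothetical helper, and the column order (all means, then maxima, minima, variances, and standard deviations) is an arbitrary choice.

```python
import numpy as np

def window_statistics(data, window=20):
    """Summarize each non-overlapping window of frames with five statistics.

    data: array of shape (n_frames, n_signals), for example (12000, 8).
    Returns an array of shape (n_frames // window, 5 * n_signals).
    """
    data = np.asarray(data, dtype=float)
    n_windows = data.shape[0] // window
    # Drop leftover frames, then reshape to (windows, frames_per_window, signals).
    blocks = data[: n_windows * window].reshape(n_windows, window, -1)
    stats = [
        blocks.mean(axis=1),
        blocks.max(axis=1),
        blocks.min(axis=1),
        blocks.var(axis=1),
        blocks.std(axis=1),
    ]
    return np.concatenate(stats, axis=1)
```

Under this assumption, 12,000 frames with eight signals yield a 600 × 40 feature matrix, since 12,000 / 20 = 600 windows and 8 × 5 = 40 columns.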


Ensemble Stacked Auto-Encoder
The softmax classifier provides the probabilities for each class label. Probabilities are easier for humans to interpret than the margin scores of an SVM. The ESAE is constructed by building multiple SAEs and combining the probability values of their classifiers in an ensemble form. Thus, the classification performance can be improved by changing the structure. The softmax classifier is defined by Equation (16).
The ensemble (sum) in the softmax classifier can be denoted as Equation (17). N is the number of SAE.
The ensemble (product) in the softmax classifier can be denoted as Equation (18).
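One common way to write the softmax output and the two combination rules is sketched below. The symbols are assumptions introduced for illustration: s_{k,c}(x) is the score of class c from the k-th SAE, C is the number of classes (four gaits here), and p_k(c | x) is the resulting probability. The sum rule is written as an average; the 1/N factor does not change which class has the largest value.

```latex
\begin{align}
p_k(c \mid x) &= \frac{\exp\big(s_{k,c}(x)\big)}{\sum_{c'=1}^{C} \exp\big(s_{k,c'}(x)\big)} \\
P_{\mathrm{sum}}(c \mid x) &= \frac{1}{N} \sum_{k=1}^{N} p_k(c \mid x) \\
P_{\mathrm{prod}}(c \mid x) &= \prod_{k=1}^{N} p_k(c \mid x)
\end{align}
```

In both cases the predicted gait is the class c with the largest combined value.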
In machine learning (ML), ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, an ML ensemble consists of only a concrete finite set of alternative models but typically allows a much more flexible structure to exist among those alternatives. For this purpose, this paper proposes an ESAE. An ESAE consists of two or more SAEs as feature extractors and improves the classification performance by averaging and multiplying the probability values extracted from the softmax classifiers, using the ensemble (sum) and ensemble (product) values. Performance can be improved owing to the synergy among the SAEs. In this work, we changed the size of hidden nodes in (1) and (2) and experimented in ensemble form. Figure 7 shows the structure of the ESAE. The training data are input to SAE (1), SAE (2), and SAE (3), respectively. Learning is performed in the ensemble form according to the change in the hidden layer. Finally, the probability values are obtained from the softmax, and performance is improved by applying the sum and product methods to these probability values.

Pseudocode of Stacked Auto-Encoder
Procedure Ensemble Stacked Auto-Encoder
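The combination step of the ensemble can be sketched as follows, assuming that each SAE has already produced a matrix of class scores for the same samples. The names `softmax` and `ensemble_predict` are hypothetical, and the renormalization of the product is an optional convenience that does not affect the predicted label.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict(prob_list, rule="sum"):
    """Combine per-model class probabilities; return (combined, labels)."""
    probs = np.stack(prob_list)  # shape: (n_models, n_samples, n_classes)
    if rule == "sum":
        combined = probs.mean(axis=0)  # average of the probabilities
    elif rule == "product":
        combined = probs.prod(axis=0)
        combined = combined / combined.sum(axis=1, keepdims=True)  # renormalize
    else:
        raise ValueError("rule must be 'sum' or 'product'")
    return combined, combined.argmax(axis=1)
```

A typical call would be `ensemble_predict([softmax(s1), softmax(s2), softmax(s3)], rule="product")`, where `s1`, `s2`, and `s3` stand for the score matrices of three SAEs with different hidden sizes.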

Sensors of MVN Based Upon Miniature MEMS Inertial Sensor Technology
Sensors of Xsens are a camera-less 3D human motion measurement system. They are based on state-of-the-art MEMS inertial sensors, biomechanical models, and sensor fusion algorithms.

Sensors of Xsens are ambulatory and can be used indoors and outdoors regardless of lighting conditions.
The results of the sensor trials require minimal post-processing, as there is no occlusion or lost markers. Results can easily be exported to other software applications. Figure 8 shows a horse rider with a motion capture suit.


Database
Acceleration sensors were used for data acquisition [26]. To improve the reliability of the data, additional data were acquired. The accuracy can be trusted because the data were acquired over many days. There are four types of horse gaits in the database: walk, sitting trot, rising trot, and canter. The database consists of 2400 samples, each with 40 feature values. The 40 feature values are derived from eight features: elbow angles (2), knee angles (2), elbow and knee distances (2), a backbone angle (1), and the hip y-axis coordinate (1). The 40 features were extracted from 20 frames by applying the mean, maximum, minimum, variance, and standard deviation to the eight features. The size of the database for classification is 2400 × 40. The motion data of the horse rider are sized 49,500 × 1. They are projected to obtain the angle data; because there are eight angle values, data of size 49,500 × 8 can be constructed. The dimension is reduced through a wavelet packet algorithm: A2 is selected to reduce the dimension, and statistical feature values are extracted from the reduced-size data.
Several AEs are modeled by using the statistical feature values as input to the SAE and by changing the number of hidden nodes. The average is obtained in an ensemble form from the probability values produced by the respective softmax classifiers. When all the results are collected, the final probability value can be obtained. To summarize, there are four datasets. The first is the data acquired from the sensor and applied to the AE without preprocessing, to take advantage of the strength of deep learning; its performance was relatively low, so it was excluded. The second is the hip y data obtained from the sensor, with the size 2800 × 70. The third is the data obtained using the wavelet packet, with the size 2800 × 18. The parameter settings are as follows: the weights of the SAE are set to 0.0001, the max epoch is set to 3000, and the hidden size is 40. Finally, the fourth consists of the statistical features extracted after the wavelet packet, with the size 2400 × 40. Training data and testing data are divided 50:50. Figure 9 shows the process of horse-riding coaching. Figure 10 shows the hip y data for the four kinds of horse gaits [26]. Hip y data with wavelet packet were used in the experiments of this paper.

Environment
Motion data were acquired from an expert horse rider who, while wearing a motion capture suit, made one or two revolutions per gait (walk, sitting trot, rising trot, and canter) of an oval horse-riding course 20 m in length and 10 m in breadth. Using the 3D motion capture suit based on Xsens inertial sensors, data were extracted in the order of Jeju (137 cm or less), Thoroughbred (160 cm), and Warm Blood (150-173 cm). It took 1 to 2 min to measure a file. Figure 11 shows the environment for data acquisition.
Figure 10. Hip y data with four kinds of horse gaits. (a) Hip y data for walk, (b) hip y data for rising trot, (c) hip y data for sitting trot, and (d) hip y data for canter.


Coaching for Horse Riding
There are various approaches to coaching horse riding, such as muscle utilization, posture, and tacit understanding with a horse. We developed a system that allows users to visualize their data and compare professional posture with amateur posture. Numerically, the range between the maximum value and the minimum value of each feature value is expressed visually. Self-coaching can be achieved by comparing the feature values for each frame with those of an expert. Figure 12 shows a coaching system for horse riding, and Table 2 shows a numerical comparison of walk and canter.
Characteristics of walk and canter are analyzed for the sake of coaching. Generally, the data values of horse riding are cyclic. By checking the frame-by-frame period, the user's motion can be recognized. Figure 13 shows a comparison of two feature values (hip y, angle of backbone) by gaits. In Figure 13, the change in the value reflects the range of the rider's hip height, which moves more significantly in the canter than in the walk. The canter shows greater movement than the walk in elbow angle, knee angle, and backbone angle. Also, the cycle of the canter is shorter than that of the walk. Figure 14 shows a comparison of two feature values (angle of right elbow, angle of left elbow) by gaits. Figure 15 shows a comparison of two feature values (angle of right knee, angle of left knee) by gaits. Figure 16 shows a comparison of two feature values (distance of elbow, distance of knee) by gaits.
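The frame-by-frame comparison with an expert can be sketched as a range check. This is only an illustration of the idea of comparing each feature value with the expert's minimum-to-maximum range; `out_of_range_frames` is a hypothetical helper, and the actual coaching system may use a different criterion.

```python
import numpy as np

def out_of_range_frames(user, expert):
    """Flag user frames whose feature value leaves the expert's min-max range.

    user:   array (n_frames, n_features) recorded from the learner.
    expert: array (m_frames, n_features) recorded from the expert, same gait.
    Returns a boolean array (n_frames, n_features); True marks a deviation.
    """
    user = np.asarray(user, dtype=float)
    expert = np.asarray(expert, dtype=float)
    lo = expert.min(axis=0)  # per-feature minimum of the expert
    hi = expert.max(axis=0)  # per-feature maximum of the expert
    return (user < lo) | (user > hi)
```

Each column would correspond to one of the eight features (for example hip y or the backbone angle), so a True entry tells the rider which posture feature left the expert's range in which frame.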

Figure 16. Comparison of two feature values (distance of elbow, distance of knee) by gaits.

Experiment and Result
This study focused on employing an ESAE to classify horse-riding gaits to facilitate real-time coaching. Classifying actual horse-riding gaits for real-time posture coaching calls for a classifier with a high classification rate, and the ESAE exhibited the best classification performance. According to the classification results and motion information such as the hip value, which is the main parameter for motion analysis and coaching, we can apply the proposed method to the coaching system for each horse gait and for a rider under real or simulated environments. When three SAEs were used, the hidden sizes of the ESAE were set to 30, 20, and 10, respectively, and when two SAEs were used, they were set to 46 and 15, respectively. When a single AE was applied, the average performance was 96.1%, and when a single SAE was applied, the performance was 96.8%. Two kinds of data were used: the hip value and eight characteristic values. The statistical data exhibited good performance in all algorithms, because it could separate all data characteristics well. Among the algorithms, the ESAE showed the best performance. Table 3 compares the performance using hip y data with wavelet packet. Figure 17 shows the performance of the ESAE for 40 feature data with wavelet packet. Figure 18 compares the performance using 40 feature data with wavelet packet.


Classification Performance
The accuracy of SVM, TREE, KNN, and Ensemble Bagging in Table 3 is measured as the ratio of correctly classified samples to the total number of classified samples. The accuracy can be formulated as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (18)$$
TP is the number of correct predictions for positive samples, TN is the number of correct predictions for negative samples, FN is the number of incorrect predictions for positive samples, and FP is the number of incorrect predictions for negative samples. The performance of the ESAE models presented in this study is obtained by the softmax method described in Section 3.3.
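The accuracy formula translates directly into code. The helper below is a small illustration with a hypothetical name, and the counts in the usage note are made-up numbers, not results from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly: (TP + TN) / all samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total
```

For instance, `accuracy(45, 50, 3, 2)` returns 0.95, because 95 of the 100 samples are classified correctly.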

Conclusions
This paper proposed the use of an ESAE to classify the gaits of horse riding to facilitate real-time coaching. With the ESAE, the classification rate is 98.5%. According to the classification results and motion information such as the hip value, which is the main parameter for motion analysis and coaching, we can apply the proposed method to the coaching system for each horse gait and for a rider under real or simulated environments. Moreover, the adaptive neuro-fuzzy inference system (ANFIS) faced limitations in performance and time, and we could solve this problem by applying deep learning. An AE can be converted into an ensemble form, which can have a synergistic effect on performance enhancement. As future work in the field of horse-riding data analysis, we will study aspects of coaching in detail. Further, we plan to employ other body signals in the ESAE algorithm.