A Unified Deep-Learning Model for Classifying the Cross-Country Skiing Techniques Using Wearable Gyroscope Sensors

The automatic classification of cross-country (XC) skiing techniques using data from wearable sensors has the potential to provide insights for optimizing the performance of professional skiers. In this paper, we propose a unified deep learning model for classifying eight techniques used in classical and skating styles XC-skiing and optimize this model for the number of gyroscope sensors by analyzing the results for five different configurations of sensors. We collected data of four professional skiers on outdoor flat and natural courses. The model is first trained over the flat course data of two skiers and tested over the flat and natural course data of a third skier in a leave-one-out fashion, resulting in a mean accuracy of ~80% over three combinations. Secondly, the model is trained over the flat course data of three skiers and tested over flat course and natural course data of one new skier, resulting in a mean accuracy of 87.2% and 95.1% respectively, using the optimal sensor configuration (five gyroscope sensors: both hands, both feet, and the pelvis). High classification accuracy obtained using both approaches indicates that this deep learning model has the potential to be deployed for real-time classification of skiing techniques by professional skiers and coaches.


Problem Definition
Cross-country (XC) skiing is a whole-body endurance sport that requires prolonged, complex cyclical motions performed using skis and poles on snow [1]. There are two main styles in XC-skiing: the classical and the skating style. The classical style can be performed either on prepared trails with pairs of parallel grooves cut into the snow or on natural undisturbed snow, whereas the skating style is generally performed on firm and smooth snow surfaces. Each of the classical and skating styles has four techniques, or gears. These are diagonal stride (DS), double poling (DP), push-off (P-Off), and kick-double poling (KDP) for the classical style, and V2 skate (V2), V2A skate (V2A), V1 skate (V1), and free skate (FS) for the skating style. For ease of reference, we will henceforth refer to these techniques by these abbreviations.
In XC-skiing, the performance of the skiers depends on the biomechanical and the physiological aspects of the motions of the body parts and the sequence in which the skiing techniques are performed on the uphill and downhill tracks (commonly known as a natural course) and flat tracks (flat course). As the results of the skiing races can be determined by time steps as small as a few milliseconds, it becomes imperative for the professional coaches to understand both these aspects of XC-skiing to recommend an improved set of techniques for optimizing the performance of the skiers. Traditionally, these analyses have been performed using video-based systems [2][3][4][5][6] and/or force measurement systems [7][8][9]. However, the utilization of equipment in both these systems interferes with the natural movements of the body parts and the heavy cost involved limits their practical usage to only a few researchers across the world [10].
Body worn sensors, particularly the inertial sensors have recently emerged as a convenient substitute for such systems due to their small size, light weight, and low cost. Inertial sensors are sensors based on inertia and relevant measuring principles. In general, inertial sensors include gyroscopes used for measurements of the sensor's angular velocity and accelerometers for measurements of linear acceleration. These sensors can sample at high frequencies and are easily attached to the skier's body without interfering with the natural motion during skiing. This ease of use has made it possible to carry out experiments that require sensor data outside the controlled environment of the laboratory and provide a more realistic analysis of the task at hand. Marshland et al. [11] were the first to demonstrate this potential of body worn microsensors in the identification of XC-skiing techniques by plotting acceleration and angular velocity curves for eight athletes for both the classical and skating techniques. By visual inspection of the cyclical patterns in these plots, they concluded that all the classical and skating techniques can be clearly identified for each skier, with certain variations unique to each skier.

Literature Review and Proposed Work
Traditionally, studies of XC-skiing techniques have been limited to the kinematical and biomechanical analysis of various techniques [12,13]. These studies aim to determine numerous hard rules, such as cycle time, poling/pushing time, recovery time [5,14,15], number of recovery motions, the sign of forearm angular velocity [16,17], correlation of the angular velocity of arms and legs [18], and figures showing identifiable cyclic patterns in the gyroscope and accelerometer data for the classification of techniques. These approaches, however, are extremely time-consuming, as the derivation of the classification rules requires manual analysis of the gyroscope and accelerometer data from multiple sensors. Recently, many researchers have analyzed and classified techniques of XC-skiing using algorithms such as Markov chains of multivariate distributions and more advanced machine learning techniques. Stoggl et al. [19] utilized an accelerometer attached to the chest of professional skiers to classify skating techniques. They collected data of 11 skiers on a treadmill and developed a classification model based on Markov chains of multivariate distributions. Their model achieved an accuracy of 86% ± 8.9% on the test set when the training data included data from all the skiers, which rose to 90.3% ± 4.1% when separate classification models were developed for each skier. Rindal et al. [1] utilized a neural network for the classification of skating techniques using two sensors: a gyroscope on the arm for cycle identification and an accelerometer on the chest for technique classification. They achieved an accuracy of 93.9% on the test set. In both studies, the raw data was passed through a Gaussian filter for the removal of ringing effects and undesirable time shifts at different frequencies.
Ristner [20] implemented a Markov model and a k-nearest neighbors (KNN) algorithm for classifying XC skiing techniques using a 3D accelerometer attached on the chest of the skiers. The comparison showed that the KNN algorithm showed much lower error rates (0.19%) than the Markov model (7.22%). All these studies are impressive and show promise for the automatic and reliable classification of XC-skiing techniques using inertial sensors. However, these studies suffer from many limitations. In the study performed by [19], the data is collected in the controlled environment of the laboratory, which will be different from the actual on-field data. The neural network model used by [1] takes in data of each cycle after flattening it into a single vector. This leads to information loss as the spatial and temporal patterns in the data are lost. Both studies develop models that classify either only the classical or skating techniques and do not obtain a single model, which could be employed for both the styles. Table 1 summarizes the relevant details of the aforementioned studies.
In this paper, we propose a unified convolutional neural network (CNN) and long-short term memory (LSTM) based deep learning classification model, which can be used to classify both the classical and skating style techniques of the skiers simultaneously. The first novelty of our approach lies in using convolutional layers for merging the local interactions among the time-series data obtained from each sensor and recurrent layers (long-short term memory layers) for extracting the temporal patterns. In this way, the model is able to extract important features for the classification of various techniques automatically from the raw data, thus eliminating the need for manually designed features required by machine learning algorithms. To prove this point, we present a comparison of the results obtained from our model with a KNN model developed by manually extracting features on the same training, validation, and test datasets. We collected the flat and natural course data of four professional skiers in total and pose a working hypothesis that the generalization accuracy of the proposed deep learning model increases as the amount of training data is increased. To prove this point, the model is first trained on the flat course data of two professional skiers and tested on the flat and natural course data of a third skier in a leave-one-out fashion. Secondly, the model is trained over the flat course data of three skiers and tested on the flat and natural course data of the fourth skier. An increase in the accuracy of classification when the size of the training data is increased confirms this hypothesis.
The second novelty of our approach lies in developing a unified model, which can be used for the classification of both classical and skating techniques simultaneously. We present strong evidence in favor of using only the flat course data for training the model and using it to classify XC-skiing techniques both on flat and natural courses, thus eliminating the need for collecting natural course data for training, which is extremely difficult to procure. Finally, the comparison of accuracies among five different combinations of sensors, which establishes the sports biomechanics configuration (both hands, both feet, and the pelvis sensors) as the optimal set of sensors, provides empirical evidence to researchers to base their future studies on this optimal configuration.

Inertial Sensors, Synchronization and Calibration
XSens MVN motion capture system (Xsens Technologies B.V., Enschede, The Netherlands) consisting of 17 body-wired inertial motion trackers (Figure 1) was used to record the participants' kinematical data at a frequency of 240 Hz. Each motion tracker is 36 × 24.5 × 10 mm in dimension, weighs 10 g, and is mounted at a specific body location with the help of a wearable lycra suit. The wired motion trackers are connected to an on-body data hub (known as bodypack), which is responsible for synchronization and gathering data on its internal memory (Xsens MVN Technical Report, March 2018). Calibrations were performed by the system software, Xsens MVN Analyze, prior to data collection, which requires the subject's height and foot length to estimate the dimensions and proportions of the person being tracked. After subject calibration, he/she is asked to stand still in a T-pose and walk a few meters back and forth for the purpose of the sensor to segment calibration and development of a biomechanical human model. This is followed by a slight forward movement for defining the positive X-axis. The Y-axis is perpendicular to the X-axis in the horizontal plane while the Z-axis is perpendicular to the horizontal surface. Each motion tracker has 3 sensors: accelerometer, gyroscope, and magnetometer, which provide raw recorded data of linear acceleration, angular velocity, and magnetic field intensity along the x, y, and z axis local to each sensor, respectively.
The on-body recording function was used for data collection, which allows recording of the subject's motions without the need for a laptop or PC by storing the motion trackers' data on the body pack. After finishing the data recording, the body pack was connected with a laptop to import the recordings for further processing and analysis.

Training and Validation Data Acquisition
For the purpose of training and validating the trained model, XC-skiing data from 3 professional skiers (Table 2) from the Korea National Sport University was collected. They performed the classical and skating XC-skiing techniques on outdoor flat and natural courses in Pyeongchang, South Korea, where the 2018 Winter Olympic Games took place. All of them were informed of the purpose of this study and they participated voluntarily in the experiment after reading the research guidelines and signing consent forms. The study was ethically approved by the Korea National Sport University Institutional Review Board (IRB Number 20170424-004). In the training dataset, each skier performs only one technique of either the classical or skating styles on the flat course repeatedly. Each subject performs 5-6 laps of each technique on a 500 m long track. As there are 8 techniques (4 classical and 4 skating) and 3 skiers, a total of 24 such files are obtained. In the validation dataset, each skier is allowed to perform either all the 4 classical or skating techniques on either a 2.5 km long flat or natural course, similar to what he/she would perform under competitive conditions. The skiers are free to make transitions from one technique to the other; however, the skiing style remains the same. A total of 11 files are obtained in this manner (1 file could not be obtained due to unavailability of the tracks). For both the datasets, a video recording of the skiers while performing the skiing techniques is also shot. Table 3 lists the type of data collected for each subject. In order to examine whether our developed model could classify skiing techniques for skiers with different skill levels, we allowed the skiers to freely choose their own preferred skiing speeds and exercise intensities during the data collection.
Table 3. Training and validation data collected for three professional skiers characterized by the type of course (flat/natural) and the number and type of skiing techniques (classical/skating) that the subject is allowed to perform simultaneously. The classical style data on the natural course for skier 1 is not available.
Each file in the training data exclusively contains the data of one of the techniques of one of the skiing styles, and hence does not require any labelling. However, the 4 techniques of each XC-skiing style in the validation data are performed in a combined way and hence have to be labelled after data collection. The ground truth labels for the validation data were developed by professional cross-country skiers from the Korea National Sport University, who simultaneously watched the recorded video from a digital camera and the human model video from the XSens MVN Analyze and marked the frames corresponding to each technique in the raw data files. Labelling follows a 0-9 convention for each file (0: start/end of the recording, 1: DS, 2: P-Off, 3: KDP, 4: DP, 5: V2, 6: V2A, 7: V1, 8: FS, 9: descending). The labels have been double checked by a professional XC skiing coach at the Korea National Sport University.
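The labelling convention above can be written down as a small lookup table. The following is only a minimal sketch: the names `LABELS`, `NOISE_LABELS`, and `drop_noise_frames` are hypothetical and not part of the original pipeline, and the unlabelled transition frames, which were identified manually from video, are not handled here.

```python
import numpy as np

# Label convention as listed in the text (0 and 9 are not skiing techniques).
LABELS = {
    0: "start/end", 1: "DS", 2: "P-Off", 3: "KDP", 4: "DP",
    5: "V2", 6: "V2A", 7: "V1", 8: "FS", 9: "descending",
}
NOISE_LABELS = (0, 9)  # hypothetical name for the non-technique labels


def drop_noise_frames(frames, labels):
    """Keep only the frames labelled with one of the eight techniques (1-8)."""
    labels = np.asarray(labels)
    keep = ~np.isin(labels, NOISE_LABELS)
    return np.asarray(frames)[keep], labels[keep]
```

Applied to a recording with per-frame labels, this returns the frames and labels restricted to the eight techniques.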

Test Data Acquisition
For the purpose of testing the generalization of the trained model, two test datasets of a new skier (skier 4) were collected. The test subject (gender: Female, age: 24 years, weight: 55 kg, height: 156 cm) is also a professional XC-skiing player from the Korea National Sport University and prior information about the purpose of the study was provided to her followed by signing of consent forms (IRB Number 20170424-004).
The 2 types of test data are as follows: (i) Test set 1: In the first type of test data, the subject is allowed to perform only one of the techniques of one of the XC-skiing styles on the flat course. As there are 4 classical and 4 skating techniques, this test set consists of 8 files. (ii) Test set 2: In the second type of test data, the subject performs all the skating style techniques on a natural course simultaneously. The subject is allowed to make transitions between the various skating techniques similar to what would be performed during a competition. One data file is obtained in this manner.

Data Selection and Preprocessing
In this study, each training instance represents one cycle of one of the techniques of either the classical or skating styles. Thus, it becomes extremely important to select the data that can represent these cyclic patterns most clearly. Figure 2 represents the typical data patterns in linear acceleration and angular velocity data for each of the 4 classical and 4 skating techniques. These figures were plotted after filtering the raw data using a fourth-order low-pass Butterworth filter with a cutoff frequency of 0.007 Hz. As is clear from the figure, angular velocity data, which is obtained via the gyroscope, shows more easily identifiable cyclic patterns than linear acceleration data, which comes from the accelerometer. Because the cycles are easier to identify and using fewer sensor channels lowers the computational cost, only the gyroscope data is used in this study for developing the classification models.

Training Dataset
Each of the 24 files in the training dataset, in which each skier performs only 1 technique of either the classical or skating style on the flat course repeatedly, has turning points at the end of a lap each time the track is traversed. During the turning points, the skier is not performing any technique, and hence these points are removed from the data of each skier by manually identifying the frames corresponding to such durations using the Xsens human model videos and the recorded videos from a digital camera. This gives data files with continuous repetitions of the same technique. A fourth-order low-pass Butterworth filter with a cutoff frequency of 0.007 Hz is applied to smooth the raw data, following which the z-axis angular velocity of the gyroscope on the left leg is used for finding the locations of peaks. The distance between two consecutive peaks in the filtered data is variable and represents 1 cycle. Each input to the CNN-LSTM model must have the same dimensions. Thus, the locations of the peaks in the filtered data are used to resample the raw data (after removing the turning points) to a fixed cycle length of 333 time steps (which is the mean number of time steps for all the techniques in the training data) using an anti-aliasing finite impulse response low-pass filter, which resamples the data at (333/n) × 240 Hz, where n is the number of time steps in a given cycle before resampling and 240 Hz is the original sample rate. Resampling is performed over the raw data and not over the filtered data, because a neural network works best with raw data, from which it automatically extracts the features required for the classification task. These resampled cycles are arranged into tensors, which makes them suitable for passing to a CNN layer and later to an LSTM layer as a multivariate time series. We thus obtain a total of 24 tensors.
The number of cycles of each technique of each XC-skiing style performed by each skier on the flat course is shown in Table 4. The dimensions of each matrix in a tensor are 333 × 51 (17 gyroscopes, 3 axes each, hence 51 columns).
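The cycle extraction described above (smooth a reference channel, locate peaks, cut the raw data between consecutive peaks, resample each cut to 333 time steps) could be sketched with SciPy as follows. This is an illustration under stated assumptions rather than the authors' code: the quoted cutoff of 0.007 is passed as a fraction of the Nyquist frequency, the filter is applied forward and backward (zero phase), `ref_col` and `min_gap` are hypothetical parameters, and `resample_poly` stands in for the anti-aliasing FIR resampler.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, find_peaks, resample_poly

FS = 240          # original sample rate in Hz (from the text)
CYCLE_LEN = 333   # fixed cycle length in time steps (from the text)


def segment_cycles(raw, ref_col, cutoff=0.007, min_gap=120):
    """Split a (T, C) recording into cycles of shape (n_cycles, CYCLE_LEN, C).

    ref_col : column used for peak detection (hypothetical index of the
              left-leg z-axis angular velocity).
    cutoff  : ASSUMPTION: interpreted as a fraction of the Nyquist frequency.
    min_gap : ASSUMPTION: minimum number of samples between two peaks.
    """
    # Fourth-order low-pass Butterworth, used only to locate the peaks.
    sos = butter(4, cutoff, btype="low", output="sos")
    smooth = sosfiltfilt(sos, raw[:, ref_col])
    peaks, _ = find_peaks(smooth, distance=min_gap)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to form one cycle")

    cycles = []
    for start, end in zip(peaks[:-1], peaks[1:]):
        segment = raw[start:end]  # raw data, not the filtered data
        n = segment.shape[0]
        # Polyphase FIR resampling by the ratio 333/n gives exactly 333 rows.
        cycles.append(resample_poly(segment, CYCLE_LEN, n, axis=0))
    return np.stack(cycles)
```

With all 17 gyroscopes (51 columns), each returned cycle is a 333 × 51 matrix, matching the tensor dimensions stated above.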

Validation Dataset
In the validation dataset, each skier performs all 4 techniques of either the classical or the skating style within a single recording on either a flat course or a natural course. Each of the 11 files in the validation dataset has starting and end points of the recording (labelled as 0), descending points (labelled as 9), and transition points (not labelled), which are classified as noise. The start/end and descending points are deleted from the data and the transition points are removed by manually identifying the frames corresponding to such time durations using the Xsens human model videos. The filtering of this data for cycle detection followed by resampling and arrangement into tensors is in accordance with what was performed for the training dataset. A total of 11 tensors are obtained in this manner and the dimensions of each matrix in a tensor are 333 × 51 (17 gyroscopes, 3 axes each, hence 51 columns). Table 5 provides further information about this dataset. As can be observed from Table 5, different skiers have different preferences in terms of the techniques they use on a particular course. For example, skier 1 does not perform double poling (DP) on the flat course at all whereas skier 2 performs it 26 times and skier 3 performs it 46 times (highest among all classical techniques) on the same course. However, the skating techniques are used somewhat more evenly than the classical techniques.
On the natural course, none of the skiers uses free skate (FS), and they seldom use push-off (P-Off) and kick-double poling (KDP). Thus, it is clear that the skiers have certain preferences regarding the techniques they utilize on different courses (Table 5). These preferences make this dataset highly imbalanced, and therefore a demanding validation set for the trained deep learning model.

Test Dataset
In test set-1, the test subject (skier 4) performs only one of the techniques of one of the XC-skiing styles repeatedly on the flat course. It has 8 files (one file for each technique of the classical and skating styles) and resembles the training data. Thus, its preprocessing is performed analogously to the training data. Similarly, the test set-2 resembles the validation data and its preprocessing is performed analogously to the validation data. Thus, 8 tensors are obtained for test set-1 and 1 for the test set-2. Table 6 contains more information about the 2 types of test sets.

Architecture of the Deep Network
The deep learning model developed to classify the XC-skiing techniques is motivated by DeepSense, proposed by Yao et al. [22], which in the authors' words "provides a general signal estimation and classification framework [for regression and classification problems] that accommodate a wide range of applications." The training, validation, and testing data of our problem are arranged into 3D tensors, where each matrix corresponds to one training (or testing) example and has 333 rows and 51 columns. Each column contains data along one of the axes as recorded by one sensor and represents a time series with 333 time steps. Thus, the classification problem is posed as a multivariate time-series sequence classification task. To capture the interactions among these time series, they are passed through convolutional layers. DeepSense first performs a Fourier transformation on the raw data of each sensor, passes the frequency data of each sensor through convolutional layers individually, and then combines the data of all the sensors to pass it through another convolutional layer. This approach requires a greater number of convolutional layers and requires selecting the number of frequencies that must be passed to the network, which introduces an element of human decision-making. We solve this problem by making 2 simple modifications in our model: (i) we pass the raw sensor data to the convolutional layers instead of the frequency data, and (ii) we convolve the raw data of all the sensors in a single step, which reduces the total number of convolutional layers required for training.
As the skiers may perform the same skiing techniques at varying speeds and intensities, it is necessary that the deep network layers be robust to the scale of the data and be able to capture features that may be found at different time-steps in the time-series. CNNs are very powerful in extracting local spatial coherence and dependencies in the data, and the scale invariance introduced by the max-pooling layers allows them to learn hidden features regardless of the position of the feature or its scale. We pass the raw data through two 1D convolutional layers with 64 filters each and 20 and 10 kernels, respectively, followed by a max-pooling layer with a pool-size of 4. This convolved data is then passed through 2 long-short term memory (LSTM) layers, with 300 and 200 units, respectively, to capture long-term temporal dependencies in the time series. As LSTMs are highly prone to overfitting, a dropout layer with a dropout probability of 0.2 is added after each LSTM layer. The network is trained over 12 epochs with a batch size of 40. Various steps involved in data preprocessing and the architecture of the deep network are summarized in Figure 3.
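To make the layer sizes concrete, the sketch below traces the tensor shape and trainable parameter count through the layers listed above. Several details are assumptions rather than facts from the text: "20 and 10 kernels" is read as kernel sizes of 20 and 10, convolutions use stride 1 without padding, the first LSTM returns the full sequence, and the network ends in a dense softmax layer over the 8 techniques. The function names are hypothetical.

```python
def conv_out_len(n_steps, kernel_size, stride=1):
    """Output length of an unpadded ('valid') 1D convolution."""
    return (n_steps - kernel_size) // stride + 1


def lstm_params(n_inputs, units):
    """Trainable weights of a standard LSTM layer (4 gates, with biases)."""
    return 4 * (units * (n_inputs + units) + units)


def trace_network(n_steps=333, n_channels=51, n_classes=8):
    """Return (layer name, output shape, parameter count) for each layer."""
    rows = []
    n = conv_out_len(n_steps, 20)            # ASSUMPTION: kernel size 20
    rows.append(("conv1d_1", (n, 64), n_channels * 20 * 64 + 64))
    n = conv_out_len(n, 10)                  # ASSUMPTION: kernel size 10
    rows.append(("conv1d_2", (n, 64), 64 * 10 * 64 + 64))
    n = n // 4                               # max-pooling with pool size 4
    rows.append(("max_pool", (n, 64), 0))
    rows.append(("lstm_1", (n, 300), lstm_params(64, 300)))
    rows.append(("dropout_1", (n, 300), 0))  # dropout probability 0.2
    rows.append(("lstm_2", (200,), lstm_params(300, 200)))
    rows.append(("dropout_2", (200,), 0))
    # ASSUMPTION: dense softmax output over the eight techniques.
    rows.append(("dense_softmax", (n_classes,), 200 * n_classes + n_classes))
    return rows


if __name__ == "__main__":
    for name, shape, params in trace_network():
        print(f"{name:14s} {str(shape):10s} {params}")
```

Under these assumptions, a 333-step cycle is shortened to 314 and then 305 steps by the two convolutions and to 76 steps by max-pooling before it reaches the LSTM layers.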

Results
We now present results for the training, validation, and test sets obtained by training the proposed CNN-LSTM based deep learning model. To compare the results with a traditional machine learning algorithm, we also trained a k-nearest neighbor (KNN) classifier by extracting manually designed features from the training data. Five different combinations of sensors are used for training the model as shown in Table 7. We will compute and compare the results for each of the sensor configurations and come up with the best subset of the 17 sensors, which should be used for the analysis of XC-skiing techniques in future studies.
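Choosing a sensor configuration amounts to keeping a subset of the 51 gyroscope columns. The sketch below illustrates this for the three configurations whose membership is stated in the text (whole body, sports biomechanics, pelvis only). The tracker names, their order, and the three-columns-per-tracker layout are hypothetical, since the column order of the recordings is not given here. The upper-body and lower-body configurations are omitted because their membership is not listed in this section.

```python
import numpy as np

# HYPOTHETICAL tracker order; three consecutive columns (x, y, z) per tracker.
TRACKERS = [
    "pelvis", "sternum", "head",
    "right_shoulder", "right_upper_arm", "right_forearm", "right_hand",
    "left_shoulder", "left_upper_arm", "left_forearm", "left_hand",
    "right_upper_leg", "right_lower_leg", "right_foot",
    "left_upper_leg", "left_lower_leg", "left_foot",
]

CONFIGS = {
    "whole_body": TRACKERS,
    "sports_biomechanics": [
        "left_hand", "right_hand", "left_foot", "right_foot", "pelvis",
    ],
    "pelvis": ["pelvis"],
}


def select_columns(cycles, config):
    """Keep only the gyroscope columns of the trackers in `config`."""
    idx = [
        3 * TRACKERS.index(name) + axis
        for name in CONFIGS[config]
        for axis in range(3)
    ]
    return cycles[..., idx]
```

For a tensor of shape (n_cycles, 333, 51), the sports biomechanics configuration yields (n_cycles, 333, 15).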

Training and Validation Set Results Using Deep Learning
The deep learning model, trained on the training data of the three skiers according to the architecture described in Section 2.4, resulted in a training dataset accuracy of at least 97.8% for the whole body, upper body, lower body, and sports biomechanics sensors configuration, and 79.4% when only the pelvis sensor is used (Table 8). Table 9 provides the confusion matrix for the training dataset for the sports biomechanics configuration.  As is clear from Table 9, the model is able to classify all the techniques almost perfectly except the classical push-off and double poling techniques. Twenty-eight push-off techniques have been wrongly classified as double poling techniques and nine double poling techniques as push-off techniques.
The validation dataset accuracies for the five different configurations of sensors are shown in Table 10. The mean classification accuracy with the pelvis, the upper, and the lower body sensors is 64%, 80%, and 70% respectively, which increases to approximately 87% for the 17 sensors (the whole body) and the sports biomechanics configuration. As the accuracies while using all the 17 sensors and the five sensors in sports biomechanics configuration are the same, the sports biomechanics configuration of sensors is the optimal set due to much smaller number of sensors. The confusion matrices for the validation sets for skier 3 for the sports biomechanics configuration are shown in Tables 11-14. The confusion matrices for the validation sets of skier 1 and skier 2 can be found in Appendices A.1 and A.2.
Table 11. Confusion matrix for the natural course, classical style validation set of skier 3 when using the sports biomechanics configuration of sensors.
It is interesting to note that although the model has been trained on classical and skating styles data simultaneously, it has almost perfectly learnt to differentiate between these two styles. In Table 12, three V2 techniques have been incorrectly classified as V2A, and 23 out of the 28 V2A techniques have been incorrectly classified as V1. In Table 13, five push-off techniques have been incorrectly classified as DP and 6 KDP techniques as push-off. These are certain areas of misclassification errors, which the model is not robust to. However, despite these small misclassification errors, the mean classification accuracy achieved for skier 3 is approximately 90%, which is a very high value considering that our model is trained only on the flat course data and has never observed natural course data.

Leave-One-Out Testing Results
To assess the generalization accuracy of the model, we perform a leave-one-out type of testing in which the flat course data of two out of the initial three skiers is used for training and the flat and natural course data of the remaining third skier is used for testing. As the third skier can be chosen in $\binom{3}{1} = 3$ ways, we have a total of three combinations of training and test sets. For example, combination 1 includes subject 2 and subject 3's flat course data as the training set, and subject 1's flat and natural course data as the test set. We present the results of leave-one-out testing for both the proposed deep learning model and a traditional k-nearest neighbors (KNN) machine learning algorithm. For the KNN algorithm, the feature vector corresponding to a cycle in the training data consists of the pairwise correlation values between the time series represented by each axis of each sensor in a cycle. For example, while using the whole-body configuration of sensors (17 sensors), there are a total of 51 time series in each cycle, which correspond to a feature vector of length 1276 ($= \binom{51}{2} + 1$). Table 15 presents the results of the proposed deep learning model and Table 16 those of the KNN machine learning model when a leave-one-out type of testing is performed using the sports biomechanics configuration of sensors. The results for the other four configurations of sensors are available in Appendix A.3.
Table 15. Classification accuracies for the first three skiers using the proposed deep learning model when a leave-one-out type of testing is performed using the sports biomechanics configuration of sensors.
The overall mean accuracy for the leave-one-out type of testing for the three skiers using the k-nearest neighbors algorithm is ~65%, which increases to ~80% when the proposed deep learning model is used. Also, the mean accuracy value for each skier is higher for the deep learning model than for the KNN model.
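The three train/test combinations can be enumerated mechanically. The helper below is a minimal sketch with a hypothetical name; it only lists which skiers are used for training and which skier is held out, and does not load any data.

```python
def leave_one_skier_out(skiers):
    """Return (training skiers, held-out skier) pairs, one per skier."""
    return [
        ([s for s in skiers if s != held_out], held_out)
        for held_out in skiers
    ]


if __name__ == "__main__":
    for train, test in leave_one_skier_out([1, 2, 3]):
        print(f"train on skiers {train}, test on skier {test}")
```

For skiers 1 to 3, the first pair trains on skiers 2 and 3 and tests on skier 1, which corresponds to combination 1 in the text.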

Test Set Result Using Deep Learning
To further assess the generalization performance of the model, it is trained on the flat course datasets of the first three subjects and tested on the two test datasets obtained from subject 4. The classification accuracies for test set-1 and test set-2 for all five sensor configurations are as shown in Table 17. Again, the sports biomechanics configuration with five sensors has maximum accuracy for both the test sets, reaffirming the aforementioned proposition that this configuration is the optimal set of sensors.  Tables 18 and 19 show the confusion matrices for test set-1 and test set-2 for the sports biomechanics configuration, respectively. In test set-1, all the techniques except the classical push-off (P-Off) and double poling (DP) have been classified almost perfectly. One hundred and seventy-two (out of total 241) push-off techniques have been incorrectly classified as double poling and 74 (out of total 295) double poling as push-off, leading to a low classification accuracy for the push-off and the double poling. These are the same two techniques that were confused by the model in the training set, and hence some misclassification in the test set was also expected. For test set-2, a very high overall accuracy of 95.1% is obtained for the sports biomechanics configuration of sensors. Thus, we achieve an overall mean accuracy of 91.15% on the test set of skier 4.
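The figures quoted above can be checked with a few lines of arithmetic. The sketch below recomputes the overall mean accuracy from the two test-set accuracies and derives an upper bound on the fraction of push-off and double poling cycles classified correctly in test set-1. It is only a bound because the text reports a single misclassification route per technique, and the full confusion matrix (Table 18) is not reproduced here. The function names are hypothetical.

```python
def mean_accuracy(accuracies):
    """Unweighted mean of per-test-set accuracies (in percent)."""
    return sum(accuracies) / len(accuracies)


def max_fraction_correct(total, misclassified):
    """Upper bound on the correctly classified fraction of one technique,
    given a known count of cycles assigned to one wrong class."""
    return (total - misclassified) / total


if __name__ == "__main__":
    # Test set-1 (87.2%) and test set-2 (95.1%), sports biomechanics sensors.
    print(mean_accuracy([87.2, 95.1]))
    # Push-off: 172 of 241 cycles were classified as double poling.
    print(max_fraction_correct(241, 172))
    # Double poling: 74 of 295 cycles were classified as push-off.
    print(max_fraction_correct(295, 74))
```

The bounds are at most 69 of 241 push-off cycles (about 29%) and at most 221 of 295 double poling cycles (about 75%) classified correctly.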
It should be emphasized that the overall mean accuracy when a leave-one-out type of testing is performed over the first three skiers is ~80%, which increases to ~91.1% when testing is performed over skier 4 using the same deep learning model. This is due to the fact that the deep learning model tested over skier 4 has been trained over the data of three skiers whereas the same model when tested over each of the first three skiers has been trained over the data of two skiers in a leave-one-out fashion. Thus, the generalization accuracy of the proposed model increases as the size of the training dataset is increased. These results provide strong evidence in favor of our hypothesis that the accuracy of the deep learning model increases as the training datasets become larger.

Validation and Test Set Results Using K-Nearest Neighbors Algorithm
To compare the results of the deep learning model with a traditional machine learning algorithm, we trained a k-nearest neighbors classifier on the training data of three subjects and tested it on the fourth subject. The feature vector corresponding to a cycle in the training data consists of the pairwise correlation values between the time-series represented by each axis of each sensor in a cycle. Table 20 represents the validation set and Table 21 represents the test set-1 and test set-2 accuracies for all five configurations of the sensors.
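The pairwise-correlation feature vector can be sketched with NumPy as follows. The function name is hypothetical, and Pearson correlation is an assumption, since the text does not name the correlation measure. For 51 time series the sketch returns the 1275 distinct pairwise values; the text quotes a length of 1276 for the whole-body configuration, and the nature of the one additional entry is not specified, so it is not reproduced here.

```python
import numpy as np


def correlation_features(cycle):
    """Pairwise correlations between the columns of one cycle.

    cycle : array of shape (time_steps, n_series), e.g. (333, 51).
    Returns a 1D vector with n_series * (n_series - 1) / 2 entries.
    ASSUMPTION: Pearson correlation; constant columns would give NaN.
    """
    corr = np.corrcoef(cycle, rowvar=False)          # (n_series, n_series)
    upper = np.triu_indices(corr.shape[0], k=1)      # each pair exactly once
    return corr[upper]
```

A vector of this kind, computed per cycle, could then be passed to any k-nearest neighbors implementation.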

Discussion
We developed a unified CNN-LSTM based deep learning model for classifying both the classical and skating style techniques simultaneously using gyroscope data. Even though our model was trained only on the outdoor flat course data, it achieved an accuracy of 87.2% and 95.1% on the flat and natural course test sets, respectively, leading to an overall mean accuracy of 91.15%, using the optimal gyroscope sensor configuration (five sensors: both hands, both feet, and the pelvis). This presents strong evidence in favor of training the model on flat course data only and using it to classify XC-skiing techniques on both flat and natural courses, thus eliminating the need to collect natural course data for training, which is extremely difficult to procure. To the best of our knowledge, we are the first to propose a unified deep learning model for classifying classical and skating techniques simultaneously with high accuracy. A KNN model with manually designed features, applied to the same datasets, was further used as a benchmark for evaluating the performance of our unified deep learning model. The KNN algorithm was chosen because an earlier study [20] found that it had lower error rates than a Markov model when classifying the classical and skating styles simultaneously. The comparison between the accuracies obtained from these two approaches for the validation dataset (Figure 4) and the two test sets (Table 22) clearly showed that deep learning achieves higher classification accuracy than the KNN. This result is in line with the findings of a recent study [23], which used a 3D accelerometer to classify only two free skating style techniques (gear 2, gear 3) and reported that deep learning had the highest accuracy among all investigated classification models.
Even though the developed deep learning model achieved high overall classification accuracy for eight skiing techniques simultaneously, in-depth analysis of the confusion matrices showed that most incorrect classifications occurred for the classical push-off and double poling techniques. These two techniques have identical motions of the upper body and pelvis. The only differences between them are that a classical push-off begins with a slight jump for the propulsive force and that the body movements are faster and more exaggerated than in double poling. Such exceedingly similar physiological and biological characteristics confuse the model and lead to misclassifications. In addition, some misclassifications occurred for V2A, which is a typical technique used on level terrain up to moderate uphill inclines or during transitions between V2 and V1. In V2A skate, the timing sequence for the pole push is the same as in V2 skate, but it employs one double pole with every second skate, whereas V2 skate employs one double pole with every skate. V1 skate is an uphill technique, which employs asymmetrical poling with every second skate [3,24]. Transitions between similar techniques (V2A and V2, V2A and V1) lead to high classification errors on V2A. This finding is consistent with the result of a previous study [19].
In order to provide empirical evidence on which researchers can base their choice of sensor configuration for the analysis of XC-skiing techniques, we compared classification accuracies among five different combinations of sensors on the training, validation, and test datasets. The five combinations are the whole body with 17 sensors, the upper body with 11 sensors, the lower body with 7 sensors, the sports biomechanics configuration with 5 sensors, and the pelvis configuration with 1 sensor only. Collective results (Tables 10, 20 and 22) show that the sports biomechanics configuration (both hands, both feet, and the pelvis sensors) can achieve an accuracy very similar to that of the whole body with 17 sensors. The classification accuracy of the sports biomechanics configuration is much higher than the accuracies of the pelvis, the upper body, and the lower body configurations. A low classification accuracy for the pelvis sensor indicates that this sensor alone is not sufficient to capture the complex motions of all the body segments during XC-skiing. Moderate, but not high, classification accuracies for the upper and lower body configurations are not surprising because, in both configurations, the data of only half of the body segments is available for training the model. In the sports biomechanics configuration, only five sensors are used: those on the hands, on the feet, and on the pelvis. Four of these five body segments, both hands and both feet, are at the extremes of the body, where the motions of the segments are most exaggerated and vigorous, and the pelvis is close to the centre of mass of the body, which represents the overall motion of the body segments. As the results obtained with all 17 sensors and with the five sensors of the sports biomechanics configuration are very close to each other, we infer that the other 12 sensors are almost inconsequential and provide little additional information.
Thus, the sports biomechanics configuration of sensors is the optimal set, and future studies of XC-skiing technique classification can rely on data obtained from this set with strong experimental support.
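To make the size of this reduction concrete, the sketch below selects the channels of a sensor subset from a whole-body recording. The sensor counts per configuration come from the text. Everything else is a hypothetical layout chosen for illustration: the sensor labels, their column order, and the assumption that each gyroscope contributes three consecutive columns (one per axis).

```python
import numpy as np

AXES_PER_SENSOR = 3  # assumption: each gyroscope reports three axes

# Sensor counts per configuration, as listed in the text.
CONFIG_SENSOR_COUNTS = {
    "whole body": 17,
    "upper body": 11,
    "lower body": 7,
    "sports biomechanics": 5,
    "pelvis": 1,
}

# Hypothetical labels and ordering for the 17 whole-body sensors.
BIOMECH = ["pelvis", "left_hand", "right_hand", "left_foot", "right_foot"]
ALL_SENSORS = BIOMECH + [f"other_{i}" for i in range(12)]


def select_sensors(recording, sensor_names, keep):
    """Return the columns of `recording` that belong to the sensors in `keep`.

    recording: array (n_samples, len(sensor_names) * AXES_PER_SENSOR) whose
    columns are grouped by sensor, in the order given by `sensor_names`.
    """
    cols = []
    for name in keep:
        start = sensor_names.index(name) * AXES_PER_SENSOR
        cols.extend(range(start, start + AXES_PER_SENSOR))
    return recording[:, cols]


# Placeholder recording: 4 samples x 51 channels (17 sensors x 3 axes).
recording = np.arange(4 * 51, dtype=float).reshape(4, 51)
subset = select_sensors(recording, ALL_SENSORS, BIOMECH)
print(recording.shape, "->", subset.shape)
```

Under the three-axis assumption, moving from 17 to 5 sensors reduces the model input from 51 to 15 channels per time step.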
Several previous studies have attempted to classify XC-skiing techniques using hard rules or machine learning algorithms. Seeberg et al. [18] classified the classical XC-skiing techniques by deriving hard rules based on the data of 11 skiers and achieved an overall sensitivity of 99-100%. They, however, classified only the diagonal stride, double poling, and kick double poling techniques while leaving out push-off from the classification. Among the classical XC-skiing techniques, push-off and double poling are the only two techniques that are substantially misclassified by our algorithm, as is evident from the test set-1 confusion matrix in Table 18: 172 (out of a total of 241) push-off techniques have been incorrectly classified as double poling and 74 (out of a total of 295) double poling techniques as push-off. Moreover, they utilized six IMUs for the classification of XC-skiing techniques, for a total of 18 sensors, since each IMU contains one accelerometer, one gyroscope, and one magnetometer, whereas our model performs classification with only five gyroscope sensors. In addition, our model development does not require expert domain knowledge or a tedious process to derive hard rules for classification. Rindal et al. [1] classified classical XC-skiing techniques by utilising two sensors, one accelerometer and one gyroscope, on the data of 10 participants and achieved an overall accuracy of 93.9% ± 3%. We achieved an overall mean accuracy of 91.1% by utilising the data of four subjects and five gyroscope sensors. Our results rival those of [1] in terms of accuracy, at the additional cost of three extra sensors. However, the data in [1] is a combination of data obtained from outdoor tracks and data obtained on a treadmill in the controlled environment of the laboratory, whereas our data is obtained only from natural outdoor tracks. Additionally, they classified only the classical XC-skiing techniques, whereas our model classifies both the classical and the skating techniques simultaneously. At the same time, our model shows considerable improvement in classification accuracy when the size of the training data is increased. Stoggl et al. [19] classified skating techniques by utilising a single accelerometer on the data of 11 skiers obtained on a treadmill in the controlled environment of the laboratory, and achieved an accuracy of 86% ± 9% on the test set. As the accuracy achieved by our model is higher, and our model has the additional advantage of classifying both classical and skating techniques simultaneously, we conclude that our model has a higher potential of being deployed as a real-time classification model for XC-skiing techniques.
Despite the inherent advantages in terms of automatic selection of features, high accuracy, and simultaneous classification of classical and skating XC-skiing techniques, this study suffers from certain limitations. First, due to practical constraints, such as the unavailability of the skiing tracks and the tight training schedule of professional skiers, we obtained the experimental data from a relatively small sample (four professional skiers) for training, validating, and testing our models. Further study should be carried out with a larger sample size to verify these results. Second, although the CNN-LSTM network promises high accuracy, the model is slow to train compared with a model developed using a traditional machine learning algorithm, due to the large size of the training data that is fed to it. Traditional machine learning approaches rely on manually designed features, compact the raw data into a small number of features after pre-processing, and are much faster to train. Thus, there is a trade-off between the time spent on data pre-processing in the case of traditional algorithms and the time spent on training a deep learning model. However, by utilising computer systems with good software configurations, the deep learning model can be trained in a reasonable time and remain suitable for real-time deployment. Third, we treated turning points in the flat course data and descending and transition points in the natural course data as noise and removed them manually by finding the frames corresponding to them. These points, however, could be treated as dummy techniques and passed to the model, the study and analysis of which should be taken up as future research work. Last but not least, we utilized only the angular velocity, because of its clear cyclic patterns and to achieve higher test-time efficiency. The development of classification models using linear acceleration and magnetic fields is left as future research work.

Conclusions
We utilized a novel CNN-LSTM based deep learning approach to develop a unified model for the classification of eight techniques used in the classical and skating styles of XC-skiing. Overall, we achieved an accuracy of 87.2% and 95.1% on the flat and natural course test sets, respectively, using the optimal sensor configuration (five gyroscope sensors: both hands, both feet, and the pelvis). High classification accuracy on both test sets indicates that this deep learning based approach is very promising for the automatic identification and classification of different XC-skiing techniques. The essence of our approach lies in eliminating the need for the manually designed features required by traditional machine learning approaches and in substituting for video-based and force measurement systems in the classification of XC-skiing techniques. Our model has the potential to be trained in the wild, and as data of more skiers is made available, fine-tuning of the parameters will continuously improve both the accuracy and the scope of generalization. This increases the practical value of our model and makes it suitable for real-time deployment by sports professionals. We optimized for the number of sensors and obtained the sports biomechanics configuration with five sensors as the optimal set, providing empirical evidence on which researchers can base their future studies.

Conflicts of Interest:
The authors declare no conflict of interest.
Tables A1-A4 show the confusion matrices for the test set of skier 2 when using the sports biomechanics configuration of sensors.
Table A1. Confusion matrix for the natural course, classical style validation set of skier 2 when using the sports biomechanics configuration of sensors.
Tables A5-A7 show the confusion matrices for the test set of skier 1 when using the sports biomechanics configuration of sensors.
Table A5. Confusion matrix for the natural course, skating style validation set of skier 1 when using the sports biomechanics configuration of sensors.