Classification of Alpine Skiing Styles Using GNSS and Inertial Measurement Units

In alpine skiing, four commonly used turning styles are snowplow, snowplow-steering, drifting and carving. They differ significantly in speed, directional control and difficulty of execution. While they are visually distinguishable, data-driven classification is underexplored. The aim of this work is to classify alpine skiing styles based on data from a global navigation satellite system (GNSS) and inertial measurement units (IMUs). Data from 2000 turns by 20 advanced or expert skiers were collected with two IMU sensors on the upper cuff of each ski boot and a mobile phone with GNSS. After feature extraction and feature selection, turn style classification was applied separately for parallel (drifted or carved) and non-parallel (snowplow or snowplow-steering) turns. The most important features for style classification were identified via recursive feature elimination. Three different classification methods were then tested and compared: decision trees, random forests and gradient boosted decision trees. Classification accuracies were lowest for the decision tree and similar for the random forests and gradient boosted classification trees, which both achieved accuracies of more than 93% in the parallel classification task and 88% in the non-parallel case. While the accuracy might be improved by considering slope and weather conditions, these first results suggest that IMU data can be used to classify alpine skiing styles reasonably well.


Introduction
The increasing miniaturization and efficiency of sensing hardware enable novel applications for low-friction activity monitoring. Today, a plethora of wearable consumer devices is available (see [1]). One category of sensors, the inertial measurement unit (IMU), is used in many consumer electronics to track motion and orientation in three-dimensional (3D) space [2]. They are a popular choice for health and sports applications, where they are mainly used to track equipment (e.g., [3,4]) and the human body (e.g., [5,6]).
This work classifies alpine skiing turns into four alpine skiing styles (snowplow, snowplow-steering, drifted and carved) based on global navigation satellite system (GNSS) and IMU data. These four skiing styles can be better understood as falling into one of two broad categories: parallel and non-parallel. In parallel skiing, both skis act parallel to one another, while during non-parallel skiing, the skis are placed in a wedge [7].
The IMU chips are factory-calibrated. A central requirement of the wearable system is its "plug and play" character; therefore, no further in-field sensor calibration was performed. While we understand that changes in environmental conditions could influence the measurements, we could not discern any differences between experiments and between individual systems, suggesting that external conditions have negligible impact. In addition, GNSS signals of a mobile phone (iPhone 6) were recorded at a sampling rate of 1 Hz.

Data Generation
Twenty advanced skiers were recruited to participate in this study. All participants were either ski instructors or competitive alpine skiers, including three former FIS World Cup athletes. Participants were informed of the testing procedures in detail, including possible risks and benefits of the investigation, prior to signing the consent form as approved by the local ethics committee (EK-GZ. 11/2018). This experiment was conducted in accordance with the Declaration of Helsinki.
In order to construct a dataset containing as many skiing styles as possible, participants completed a series of nine skiing runs, performing at least ten consecutive turns of a given style per run. Video footage of each run confirmed that the participants were skiing according to the instructions.
The target turn radii were defined relative to the width of a snowcat track, which is approximately four meters. During the parallel-style runs (carving and drifting), participants completed one run each of long radius turns (at least three times the snowcat track width, ≈12 m), medium radius turns (at least two times the snowcat track width, ≈8 m) and short radius turns (less than two times the snowcat track width) in each of the two styles. Participants thus skied a total of six parallel-style runs.
During the non-parallel-style runs, participants performed a pure snowplow run and a snowplow-steering run in which the turn initiation and turning phases were skied in a snowplow, and the completion phase was skied in the parallel technique [8]. Finally, participants completed a "race" run, in which they were instructed to ski at their highest intensity or maximum performance. All trials were supervised by an investigator and repeated if the trial was not completed according to instructions. All data collection occurred between January and March 2019 at three Austrian ski resorts. In order to control for slope conditions, all data collection took place on blue or red slopes. Data were collected in various snow conditions including freshly groomed, icy, soft and up to 10 cm of fresh snow. Table A5 in Appendix A gives an overview of the snow conditions during data collection. Although the snow conditions were not varied in a systematic manner, this provides a diverse dataset of skiing with which to train a robust classifier.
All participants skied on the same commercially available recreational race skis. Long and medium radius turns were performed on giant slalom skis (Atomic Redster G9, 171/177/183 cm length, 18.6 m radius). Short radius and non-parallel turns were performed on slalom skis (Atomic Redster S9, 155/165 cm length, 12.7 m radius). Participants completed 2-3 runs on each ski prior to testing to familiarize themselves with the test skis.

Data Pre-Processing
The IMU hardware assigned a relative timestamp (i.e., milliseconds since power on) to each data point. In order to synchronize the left and right IMUs, the smartphone replaced the original timestamp with the phone's timestamp at the time of data arrival minus an offset. This offset was half of the Bluetooth connection's round-trip time, which was continuously re-calculated to avoid clock drift. With both IMU and GNSS data on the same timeline, the application resampled the IMU data together with the speed values at 54 Hz with linear interpolation. If one of the two IMUs lost connection, the data of the still operational IMU were treated as both the left and right signal until the connection was automatically re-established. If both IMUs lost connection simultaneously, the application stopped recording after 60 missed values (~1.1 s). Finally, one trial produced one labeled, multivariate time series of IMU data from both ski boots, synchronized to an absolute timestamp. For further analysis, the data from both accelerometer and gyroscope were filtered with a zero-lag, 4th order, low-pass Butterworth filter with a cutoff frequency of 6 Hz. This filter cutoff was chosen in order to maintain 95% of the signal power.
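The timestamp correction and the resampling step described above can be illustrated with a minimal sketch. It assumes timestamps in seconds and one scalar signal per call; the function names `arrival_to_sample_time` and `resample_linear` are hypothetical and are not taken from the authors' application, which may differ in detail.

```python
from bisect import bisect_right

SAMPLE_RATE_HZ = 54  # resampling rate stated in the text


def arrival_to_sample_time(phone_arrival_s, round_trip_s):
    """Estimate the sampling time: arrival time minus half the round-trip time."""
    return phone_arrival_s - round_trip_s / 2.0


def resample_linear(times, values, rate_hz=SAMPLE_RATE_HZ):
    """Linearly interpolate (times, values) onto a uniform grid at rate_hz.

    `times` must be strictly increasing and given in seconds.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    step = 1.0 / rate_hz
    n_out = int((times[-1] - times[0]) * rate_hz + 1e-9) + 1
    grid, out = [], []
    for k in range(n_out):
        t = times[0] + k * step
        # Index of the right-hand neighbour, clamped so i - 1 and i are valid.
        i = min(max(bisect_right(times, t), 1), len(times) - 1)
        t0, t1 = times[i - 1], times[i]
        w = (t - t0) / (t1 - t0)
        grid.append(t)
        out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return grid, out
```

At 54 Hz, the 60 missed values after which recording stops correspond to 60 / 54 ≈ 1.1 s, which matches the duration given in the text.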
To give an idea of the signal shape for each style, Figures 2 and 3 show the mean and standard deviation of the time normalized filtered absolute accelerometer and gyroscope signal of all turns within a run (without the first two "warm-up" turns). The data for these figures were extracted from one exemplary subject.
With respect to the extended activity recognition chain [11], these time series were pre-processed and segmented into individual turns before classification. This was implemented according to Martínez et al. [3,18]. This turn detection algorithm was designed specifically for alpine skiing and is not only accurate and precise but can process live data streams, meaning that turns can be automatically detected during the actual skiing activity without any additional action from the user. The first and last turn of each run were discarded to eliminate potentially atypical acceleration or deceleration. The final enriched and pre-processed data set used for classification consisted of 2063 individual turns (851 drifting, 920 carving, 201 snowplow and 91 snowplow-steering), each with 57 features (after feature extraction; see Section 2.3.2). Figure 4 illustrates the entire classification process.

Training and Testing Data
In order to train the classifiers and evaluate their performance, the enriched turn data were split into two subsets (see Figure 5). A total of 75% of all turns were used as the training set, with which the models were generated using 5-fold cross-validation. The remaining 25% served as test data to validate the trained model, discover possible overfitting and compare the final classification models. The turns were assigned randomly to either the training or the testing data set. As participants were a relatively heterogeneous mix of experts, runs were split record-wise in order to include a high number of heterogeneous participants and to avoid underfitting of the model and a high classification error (see [26-28] for discussions on the topic of data splits).
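A record-wise random split with subsequent 5-fold cross-validation on the training part can be sketched as follows. This is only an illustration under assumptions: turns are represented by integer identifiers, the seed and the helper names `split_turns` and `k_fold_indices` are hypothetical, and the fold assignment (every k-th item) is one of several possible schemes.

```python
import random


def split_turns(turn_ids, train_fraction=0.75, seed=0):
    """Randomly assign individual turns (records) to a training and a test set."""
    ids = list(turn_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def k_fold_indices(n_items, k=5):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# With the 2063 turns of this data set, a 75/25 split gives
# 1547 training turns and 516 test turns.
train_ids, test_ids = split_turns(range(2063))
```

Because the split is made per turn rather than per skier, turns of the same participant can appear in both subsets, which is what "record-wise" refers to in the text.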

Pre-Classification into Parallel and Non-Parallel
For pre-classification of the turn into parallel and non-parallel, a single decision tree was built with three features selected based on domain knowledge and visual data analysis over all available features: the maximum symmetry of the roll axis angular velocity, the maximum symmetry of the yaw axis angular velocity and the maximum absolute roll axis angular velocity. Figure 6 shows the distribution of these three features by skiing style. The boxplots indicate that the maximum roll axis angular velocity was much higher for parallel (carved and drifted) than for non-parallel turns (snowplow-steered and snowplow). The maximum symmetry of the yaw and roll axis angular velocity was on average smaller for parallel turns than for non-parallel turns. After training the decision tree on the training set (75% of the data), the best model was determined to be a two-node tree based on two final features: the maximum absolute roll axis angular velocity (max_TD_AbsRate_Roll) and the maximum symmetry of the roll axis angular velocity (max_TD_Symmetry_Roll). Testing the tree on the remaining 25% of the data resulted in an accuracy of 95.85%. Figure 7 shows the decision tree of the pre-classification model.

Feature Extraction
As visualized in Figure 4, several features were extracted from each turn's filtered and unfiltered accelerometer and gyroscope signals. The extracted features can be broadly categorized into (i) statistics (mean, max, min, standard deviation) of the average filtered and unfiltered signal of the left and right IMU combined (e.g., the maximum gyroscope roll axis angular velocity of the turn), (ii) symmetry (i.e., absolute distance) between the left IMU and right IMU signal, (iii) features estimated from the phone's GNSS (speed and turn size) and (iv) skiing-specific features, such as inclination angle [12] and resultant acceleration. Further descriptions of the features extracted and used for classification are listed in Tables A2 and A3 in Appendix A. The features extracted for this model are limited to those which can be calculated online during the feature extraction step (step 3, Figure 4) of the activity recognition chain.
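Feature categories (i) and (ii) can be illustrated for a single signal of one turn. The sketch below is an assumption-laden example, not the authors' implementation: the feature names are hypothetical, the left and right signals are assumed to be equally long lists, and the population standard deviation is used because the text does not state which variant was applied.

```python
from statistics import mean, pstdev


def turn_features(left, right, name):
    """Summary statistics and left/right symmetry for one signal of one turn."""
    if len(left) != len(right) or not left:
        raise ValueError("left and right signals must be non-empty and equally long")
    # Category (i): statistics of the average of the left and right signal.
    combined = [(l + r) / 2.0 for l, r in zip(left, right)]
    # Category (ii): symmetry as the absolute distance between left and right.
    symmetry = [abs(l - r) for l, r in zip(left, right)]
    return {
        f"mean_{name}": mean(combined),
        f"max_{name}": max(combined),
        f"min_{name}": min(combined),
        f"sd_{name}": pstdev(combined),  # assumption: population standard deviation
        f"max_symmetry_{name}": max(symmetry),
        f"mean_symmetry_{name}": mean(symmetry),
    }
```

Calling `turn_features(left_roll, right_roll, "roll")` on the roll axis angular velocity of both boots would yield, among others, a value analogous to the maximum symmetry feature used in the pre-classification.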

Feature Selection
In order to decrease the number of features that were used to develop the following skiing style classification models, a preliminary feature selection was applied. Starting with a candidate feature set of 57 features per turn, we performed recursive feature elimination in combination with random forests, as explained by Granitto et al. [29]. In recursive feature elimination, the model is first fit with all candidate features. Then, the features are sorted by their importance in the model. At each iteration of the algorithm, only the top ranked features are kept and the model is refit with this reduced set of candidate features. In this work, models of the size 1, 2, 3, 4, 5, 10, 15 and 20 and all features as input variables were tested and compared via the accuracy metric on a cross-validated data set.
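The elimination loop described above can be sketched independently of the underlying model. In this sketch the two callbacks `rank_features` and `score` are hypothetical placeholders: in the study they would correspond to a random forest's importance ranking and the cross-validated accuracy, whereas the toy stand-ins below only serve to make the example runnable.

```python
def recursive_feature_elimination(features, sizes, rank_features, score):
    """Evaluate shrinking feature subsets, keeping the top-ranked features each time.

    rank_features(subset) -> subset sorted from most to least important (model refit)
    score(subset)         -> cross-validated accuracy of a model using subset
    """
    current = list(features)
    results = {len(current): (score(current), list(current))}
    for size in sorted((s for s in sizes if s < len(current)), reverse=True):
        current = rank_features(current)[:size]
        results[size] = (score(current), list(current))
    return results


# Toy stand-ins (hypothetical): importance simply grows with the feature index.
toy_importance = {f"f{i}": i for i in range(10)}


def toy_rank(subset):
    return sorted(subset, key=toy_importance.get, reverse=True)


def toy_score(subset):
    return sum(toy_importance[f] for f in subset) / sum(toy_importance.values())


results = recursive_feature_elimination(
    list(toy_importance), [1, 2, 3, 5], toy_rank, toy_score
)
```

The returned dictionary maps each tested subset size to its score and feature list, so the smallest subset with near-maximal accuracy can be picked afterwards.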

Classification Methods
This work compares three different classification approaches and focuses on tree-based algorithms for the classification of the alpine skiing style. In order to identify important features, we focused on algorithms that are easier to explain and interpret than deep learning models [30,31]. The proposed methods are either directly explainable or interpretable post hoc [31] and can therefore reveal the features that are important for distinguishing alpine skiing styles: (i) decision trees [32], (ii) random forests [33] and (iii) gradient boosted decision trees [34].
Gupta et al. [35] mention advantages of decision trees, such as the easy interpretability and visualization, the possibility to handle categorical as well as numerical outcomes and the little data preparation that is required.
The random forest developed by Leo Breiman [33] is a bagging method and consists of multiple independent trees. Each tree is grown randomly by a bootstrapped sample of the training set. For each node, a subset of features is selected at random. Their best split is used to split the node [33].
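The two sources of randomness in a random forest, and the way the trees are combined for classification, can be sketched in a few lines. This is a schematic illustration only; the helper names are hypothetical, and growing the trees themselves is left out.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as used to grow one tree."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, n_candidates, rng):
    """Pick the features that are considered for the split at one node."""
    return rng.sample(feature_names, n_candidates)


def majority_vote(predictions):
    """Combine the class labels predicted by the individual trees."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, `majority_vote(["carving", "drifting", "carving"])` returns `"carving"`.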
The main idea of boosting is to combine many weak classifiers and thereby increase the accuracy [36]. Boosted decision trees are a combination of many weak decision trees and are explained by Hastie et al. [34]. Random forests and gradient boosted trees are slower to construct, but they are usually more robust and perform better than single decision trees [37].
We generated three models using each of these learning algorithms and our training data set. To generate the simple decision tree, we used a recursive partitioning algorithm [38] which is mainly based on the classification and regression tree (CART) algorithm [32]. For the gradient boosted decision trees, the extreme gradient boosting (xgboost) algorithm [39] was applied.
Several parameters of the learning algorithms, such as the number of trees in a random forest, have to be chosen before model training. In order to find the best parameter setup for each model, cross-validation was used and the model parameters with the highest mean accuracy across the folds were chosen. Table A4 in Appendix A lists the parameters used for all three algorithms. The feature selection as well as the model fitting were applied separately for parallel and non-parallel turns.
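The selection rule stated above (highest mean accuracy across the folds) reduces to a one-line comparison once the per-fold accuracies are available. In this sketch, `fold_accuracies` is a hypothetical callback that trains and evaluates a model for one parameter candidate; how it does so is not specified here.

```python
from statistics import mean


def select_parameters(candidates, fold_accuracies):
    """Return the candidate whose accuracy, averaged over the folds, is highest.

    fold_accuracies(candidate) -> list with one accuracy per cross-validation fold
    """
    return max(candidates, key=lambda c: mean(fold_accuracies(c)))
```

The candidates could be, for instance, different numbers of trees in a random forest, each evaluated on the same five folds.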

Performance Measures
In order to compare the classification performance of the models, four metrics were used: accuracy, sensitivity, specificity and the geometric mean. Furthermore, confusion matrices are provided for a visual interpretation of the results. In the alpine skiing style classification, two outcome classes exist for each of the two style classification problems. An unknown turn can thus be classified, if parallel, into drifting or carving or, if non-parallel, into snowplow-steering or snowplow. As the turn may be classified either correctly or incorrectly, four possible outputs exist for each of the two classification problems. Tables 1 and 2 show the general confusion matrices for the two classification cases. Table 3 summarizes the performance metrics that were used for the numerical comparison of the classification models. Performance measures for classification problems are described in detail in [40][41][42]. In the field of sports analytics, the most commonly used performance metric is accuracy [43]. In addition to accuracy, we also calculate sensitivity and specificity and report the geometric mean, as this measure is less sensitive to imbalanced data than other metrics [42].
Table 3. Performance metrics for model comparison (for the abbreviations, please see Table 1 or Table 2). The table lists each metric with its formula, beginning with accuracy (acc).
The algorithms and methods of this paper were implemented in the statistical software R [44]. The algorithms and libraries used for this analysis are listed in Table A1 in Appendix A of this work.
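For reference, the standard definitions of the four performance metrics for a two-class problem are given below. The symbols TP, TN, FP and FN (true positives, true negatives, false positives and false negatives) are assumed here for illustration and need not match the abbreviations used in Tables 1 and 2.

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Geometric mean} &= \sqrt{\text{Sensitivity} \cdot \text{Specificity}}
\end{align}
```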

Feature Selection
The recursive feature elimination of the parallel turns showed that the accuracy on the cross-validated data set increased with the number of input features, from 0.603 for one feature to 0.928 for all 57 features. As there was no large increase in accuracy between 25 and 57 features (0.001 points; see Figure A1 in Appendix A), the final prediction model for the parallel turns was calculated with the 25 most important features, which are listed in Table A2 in Appendix A of this article.
In the case of the non-parallel turns, the recursive feature elimination process showed (see Figure A1 in Appendix A) that accuracy was highest, at 0.923, in the model containing 20 explanatory features and lowest, at 0.752, for the single-feature model. The 20 features used for the classification of non-parallel turns are listed in Table A3 in Appendix A.

Important Features for Classification of the Alpine Skiing Styles
The most important variables of the final decision tree for the parallel classification task were the maximum speed per turn, the standard deviation of the absolute roll axis angular velocity of the turn and the standard deviation of the gyroscope roll axis angular velocity of the turn. The non-parallel classification tree consisted of the mean of the maximum estimated inclination angle of the left and right foot of the turn, the mean of the maximum of the acceleration of the X-axis of left and right foot of the turn and the turn duration (see Figures A2 and A3 in Appendix A). Figures 8 and 9 show the 10 most important features for ski style classification of the gradient boosted tree and the random forest based on importance metrics. These metrics display the contribution of each variable based on the total gain of this variable's splits and the mean decrease in Gini impurity (see [39,45] for descriptions of the metrics). The most important features for the classification models of the parallel turns were speed (maximum, minimum and mean of each turn), the standard deviation of the yaw axis angular velocity and the maximum gyroscope roll axis angular velocity of the turn. For the non-parallel turn classification, the most important features in the models were the mean estimated inclination angle, the maximum symmetry of the roll axis angular velocity of the turn and the mean yaw axis angular velocity of the acceleration.

Comparison of Model Performance
Tables 4 and 5 summarize the performance metrics of the three models. In the parallel case, the random forest and the gradient boosted decision tree performed similarly well. In the non-parallel case, the accuracy for the test set ranged from 0.822 to 0.890 and the geometric mean from 0.769 to 0.807. Again, as in the parallel case, the performance of the random forest and the boosted decision tree was similar, but sensitivity was smaller than specificity. This implies that the two non-parallel skiing styles were not predicted equally well. Tables 6-11 summarize the results of the decision tree, the random forest and the gradient boosted decision tree prediction as confusion matrices. The classification of the parallel turns into carving and drifting with random forest and gradient boosted decision tree showed similar results. The random forest predicted 227 out of 242 carved and 193 out of 201 drifted turns correctly. The gradient boosted tree correctly predicted 232 out of 242 carved and 190 out of 201 drifted alpine skiing turns.
In the non-parallel classification task, the performance of the random forest and the gradient boosted tree was again similar. The random forest predicted 11 out of 16 snowplow-steering and 54 out of 57 snowplow turns correctly. The gradient boosted tree also correctly predicted 11 out of 16 snowplow-steering turns and 53 out of 57 snowplow turns.
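The turn counts reported in the text can be related to the performance metrics with a short calculation. The sketch below assumes that snowplow-steering (and, for the parallel task, carving) is treated as the positive class; this choice is an assumption, although it is consistent with the reported sensitivity being smaller than the specificity in the non-parallel case. The function name `binary_metrics` is hypothetical.

```python
from math import sqrt


def binary_metrics(pos_correct, pos_total, neg_correct, neg_total):
    """Accuracy, sensitivity, specificity and geometric mean from per-class counts."""
    sensitivity = pos_correct / pos_total
    specificity = neg_correct / neg_total
    accuracy = (pos_correct + neg_correct) / (pos_total + neg_total)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "geometric_mean": sqrt(sensitivity * specificity),
    }


# Random forest, non-parallel test set: 11 of 16 snowplow-steering (assumed
# positive class) and 54 of 57 snowplow turns were predicted correctly.
rf_non_parallel = binary_metrics(11, 16, 54, 57)
# accuracy ≈ 0.890, geometric mean ≈ 0.807

# Gradient boosted trees, parallel test set: 232 of 242 carved (assumed
# positive class) and 190 of 201 drifted turns were predicted correctly.
gbt_parallel = binary_metrics(232, 242, 190, 201)
# accuracy ≈ 0.953
```

These values agree with the upper ends of the accuracy and geometric mean ranges quoted in the text for the test sets.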

Classification Performance
The simplest method, a decision tree, is easily interpretable but performed worse than the other two classification methods applied in the current study. Both random forest and gradient boosted classification trees achieved accuracies of more than 93% in the parallel classification task (drifted versus carved) and 88% in the non-parallel case (snowplow versus snowplow-steering). This classification accuracy might be improved by considering slope and weather conditions. One further reason for the lower classification accuracy of the non-parallel turns may be the smaller number of non-parallel turns available for the training of the algorithm.
Within the class of parallel turns, the drifted and carved turns were equally well predictable. As for non-parallel turns, the snowplow and snowplow-steered turn classifier showed smaller sensitivity than specificity, indicating that snowplow turns were predicted more accurately than snowplow-steered turns. Looking at individual misclassified turns did not reveal a single reason for turn style misclassification. Some carving turns that were incorrectly classified as drifted turns were slower and smaller than the correctly classified carved turns. On the other hand, some drifted turns that were misclassified as carved showed higher mean estimated inclination angle values than the correctly classified drifted turns. Additionally, in the non-parallel case, snowplow-steered turns that were wrongly classified as snowplow turns had lower mean and maximum estimated inclination angle values than correctly classified snowplow-steered turns.

Limitations
Although the classification models achieved accuracies between 88.5% and 95.3% for parallel turns and between 82.2% and 89.0% for non-parallel turns, the compared methods have disadvantages. Decision trees based on the CART algorithm may be unstable and split on only one variable at a time [35]. Random forests, on the other hand, are more robust than decision trees but may be slow to construct [37]. Additionally, gradient boosted trees are slow to train and may suffer from overfitting [37].
Currently, the classification is a two-step process in which a turn is first classified as parallel or non-parallel and then assigned one of the detailed skiing styles. Further research may test a one-step classification that predicts the final skiing style directly, which would yield a more general model, and compare it with the two-step approach.
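The two-step structure can be sketched as a small dispatcher that routes each turn to a style-specific model. This is only an illustration under stated assumptions: the three stand-in models are simple threshold rules, and the feature names (`ski_angle_diff_deg`, `edge_angle_deg`, `speed_mps`) and thresholds are hypothetical and do not correspond to the features or trained models of this study.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Classifier = Callable[[Features], str]


def make_two_step_classifier(
    group_clf: Classifier,
    parallel_clf: Classifier,
    non_parallel_clf: Classifier,
) -> Classifier:
    """Combine a group model and two style models into one classifier."""

    def classify(turn: Features) -> str:
        # Step 1: decide whether the turn is parallel or non-parallel.
        group = group_clf(turn)
        # Step 2: let the matching model choose the detailed style.
        if group == "parallel":
            return parallel_clf(turn)
        return non_parallel_clf(turn)

    return classify


# Hypothetical stand-ins for trained models (thresholds are made up).
def group_rule(turn: Features) -> str:
    return "parallel" if turn["ski_angle_diff_deg"] < 10 else "non-parallel"


def parallel_rule(turn: Features) -> str:
    return "carving" if turn["edge_angle_deg"] > 40 else "drifting"


def non_parallel_rule(turn: Features) -> str:
    return "snowplow-steering" if turn["speed_mps"] > 5 else "snowplow"


classify_turn = make_two_step_classifier(group_rule, parallel_rule, non_parallel_rule)

example = {"ski_angle_diff_deg": 3.0, "edge_angle_deg": 55.0, "speed_mps": 15.0}
print(classify_turn(example))
```

A one-step alternative would replace the dispatcher with a single four-class model. The two-step design keeps each style model small, at the cost of propagating any error made in the first step.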
As reported in the Methods section, the data for the trained model were generated by a limited number of participants under controlled skiing conditions. Therefore, this algorithm is adequately prepared to classify skiing performed on moderate slopes with limited fresh snowfall. To validate the model under other conditions, such as powder snow or bumpy and mogul slopes, it would need to be tested on additional data sets containing well-labeled data from those conditions. Likewise, it is not yet known whether this model can accurately classify skiing styles on other slopes, such as flatter beginner slopes or extremely steep expert slopes.
As the turns included in this study were generated by intermediate and expert skiers, the classifier has only been validated for those skier abilities. Therefore, we suggest further validating the algorithm on datasets that cover additional skier abilities, including beginner skiers, which would also allow the currently trained model to be tested for possible overfitting.
To check the robustness of the suggested models, the turns were split into training and testing data using different random seeds, and new models were developed on each of the resulting training sets. These models showed very similar classification performances for the parallel classification. Accuracies were lowest for the decision trees and highest for the gradient boosted trees. The random forest and gradient boosted tree showed accuracies between 92% and 94% for the parallel classification. For the non-parallel classification task, accuracies were also lowest for the decision tree. However, the accuracies for the non-parallel turns varied between 87% and 95% depending on the seed, which is why we conclude that the model for the non-parallel classification is not as robust as the parallel classification model. Although accuracies of the non-parallel classification were always larger than 87%, the models performed worse and were less stable than the models for the parallel classification. We therefore suggest collecting additional data, especially of non-parallel skiing turns (snowplow-steering and snowplow), as additional observations of non-parallel turns may improve the model accuracies for the non-parallel classification. Furthermore, instead of classifying each turn in isolation, it might be beneficial for the prediction accuracy to also consider recent turns when classifying the current turn and to account for dependencies between the turns. Applying deep learning methods, such as a convolutional neural network or a long short-term memory network [46], may also improve classification performance.
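The seed-based robustness check described in this section can be sketched as follows. This is a minimal, standard-library illustration and not the pipeline used in the study: the data are synthetic, a one-feature threshold classifier (a decision stump) stands in for the tree-based models, and the 30% test fraction and all function names are assumptions.

```python
import random
import statistics


def make_synthetic_turns(n, rng):
    """Hypothetical one-feature data: class 1 has a higher feature mean."""
    turns = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(1.5 * label, 1.0)
        turns.append((feature, label))
    return turns


def accuracy(threshold, data):
    """Share of turns for which 'feature > threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)


def fit_stump(train):
    """Pick the observed feature value that maximizes training accuracy."""
    return max((x for x, _ in train), key=lambda t: accuracy(t, train))


def seed_robustness(turns, seeds, test_fraction=0.3):
    """Re-split the data once per seed, refit, and collect test accuracies."""
    accuracies = []
    for seed in seeds:
        shuffled = list(turns)
        random.Random(seed).shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        accuracies.append(accuracy(fit_stump(train), test))
    return accuracies


turns = make_synthetic_turns(400, random.Random(0))
accs = seed_robustness(turns, seeds=range(10))
print(f"min={min(accs):.3f} max={max(accs):.3f} sd={statistics.stdev(accs):.3f}")
```

A narrow range of test accuracies across seeds suggests that performance does not depend strongly on one particular split, whereas a wide range, as observed here for the non-parallel task, points to a less stable model or too few observations.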

Application of the Classifier
The results of the proposed classifier are suitable for implementation within a framework such as the extended activity recognition chain (eARC) proposed by Brunauer and colleagues [11]. Previous work by this group has already implemented the preprocessing, segmentation [3,18] and feature extraction [12] steps of the eARC. This work presents a concrete example of the eARC's fourth step, classification, and in doing so furthers the development of a data-driven, automated motor training and coaching system, as suggested by Kos et al. [47]. All algorithms used in the proposed processing chain are capable of analyzing live data streams and producing information about individual turns with little delay.

Sensor Setup
The sensor setup is simple, uses widely available hardware and can be replicated at low cost. It requires neither calibration routines nor any other special attention from the user. However, the proposed classification method requires two IMU sensors used simultaneously, one capturing the dynamics of the left boot and one those of the right boot. This setup was able to capture symmetry information, which was shown to be an important factor for classifying the non-parallel styles. Future work could further investigate the relationship between the left and right ski and its impact on skiing performance.

Conclusions
We have presented a classifier based on GNSS and IMU data that is able to assign turns to one of four ski turn styles: drifting and carving (i.e., parallel turns) and snowplow and snowplow-steering (i.e., non-parallel turns). Overall, the gradient boosted decision trees produced slightly better models than the random forest. The prediction accuracy of the best models is over 95% and 89% for the classification of parallel and non-parallel turns, respectively. Nevertheless, we recommend further research that validates the accuracy of the boosted classification model with respect to different snow and slope conditions and various skill levels of the skiers.

S17: hardpack; S19: hardpack; S20: hardpack; S21: hardpack groomed; S23: ice; S24: refrozen spring snow

Figure A1 displays the accuracy measure for the different subset sizes during the feature selection process for the parallel and non-parallel classification tasks.