Foot Strike Angle Prediction and Pattern Classification Using LoadsolTM Wearable Sensors: A Comparison of Machine Learning Techniques

The foot strike pattern performed during running is an important variable for runners, performance practitioners, and industry specialists. Versatile, wearable sensors may provide foot strike information while encouraging the collection of diverse information during ecological running. The purpose of the current study was to predict foot strike angle and classify foot strike pattern from LoadsolTM wearable pressure insoles using three machine learning techniques (multiple linear regression―MR, conditional inference tree―TREE, and random forest―FRST). Model performance was assessed using three-dimensional kinematics as a ground-truth measure. The prediction-model accuracy was similar for the regression, inference tree, and random forest models (RMSE: MR = 5.16°, TREE = 4.85°, FRST = 3.65°; MAPE: MR = 0.32°, TREE = 0.45°, FRST = 0.33°), though the regression and random forest models boasted lower maximum precision (13.75° and 14.3°, respectively) than the inference tree (19.02°). The classification performance was above 90% for all models (MR = 90.4%, TREE = 93.9%, and FRST = 94.1%). There was an increased tendency to misclassify mid foot strike patterns in all models, which may be improved with the inclusion of more mid foot steps during model training. Ultimately, wearable pressure insoles in combination with simple machine learning techniques can be used to predict and classify a runner’s foot strike with sufficient accuracy.


Introduction
Recreational running is a globally accessible activity due to the limited necessity of sport-essential equipment and facilities. Due to its full-body nature, the human anatomical system has many ways to affect running performance. Some factors of paramount importance are joint angles (which thus affect stride length), flight time, and the minimization of lateral force-dissipation [1,2]. The selection of the running shoe also appears to affect performance [3,4], the subjective experience of comfort [5,6], and the injury risk of runners [3,7]. Equipment-based recommendations should include the consideration of a runner's foot strike pattern (FSP) [8]. A midsole design that facilitates the repetitive and comfortable execution of the preferred FSP (i.e., rear foot (RF), mid foot (MF), or fore foot (FF)) can aid the consumer-based shoe selection and recommendation process [3]. Such a recommendation thus requires a reliable method for the discrete classification of a runner's FSP as a prerequisite.
for unbiased variable selection and do not require pruning based on resampling [34]. They are based on conditional inference procedures for testing independence between response and each input variable [34]. Alternatively, robust random forest frameworks encourage accuracy gains with the development of multiple variable-randomized trees [35,36].
Ultimately, to confirm best-practice recommendations, the prediction and classification of foot strike using kinetic sensors should be approached from distinctly different statistical techniques. Thus, the purposes of the current study were to compare the accuracy and precision of i) continuous FSA prediction and ii) FSP classification as calculated from three statistical methodologies (multiple regression, conditional inference tree, and random forest) using independent variables derived from the Loadsol TM pressure insoles.

Participants and Experimental Approach
Thirty injury-free recreational male runners (Mean ± SD; 1.79 ± 0.07 m; 80.1 ± 9.6 kg; 34.0 ± 6.9 yr) provided written informed consent approved by the institutional review board to participate in the study. Participants appeared for one testing occasion where they were asked to perform over-ground running using six types of FSPs at a comfortable speed (average velocity = 2.69 ± 0.40 m·s −1 ). The first condition investigated was their natural running pattern (NA; no constraints), followed by extreme-FF, FF, MF, RF, and extreme-RF in a randomized counterbalanced order. The extreme-FF and extreme-RF conditions were instructed by asking the participants to over-exaggerate their performance of the FF and RF conditions, respectively. Participants were not given any condition-based feedback. All trials were performed with participants running back-and-forth (i.e., shuttle-wise) in a laboratory environment; participants ran a straight distance (5 m) over a force platform located in the center of the straight phase. Participants then quickly changed direction before running the same straight phase. For each participant, 20 non-consecutive left foot-fall instants were recorded per foot strike condition (n = 120 steps). Participants were allowed 5-10 minutes for a self-selected running warm up and familiarization laps were performed before each condition until the participants expressed comfortability with the desired foot strike condition. Importantly, the measured foot falls were labelled as their true pattern or angle, regardless of the condition in which it was performed (see subsequent sections). However, the consistency of participant's performance of the FSA was assessed in a supplementary analysis which boasted generally good consistency [37].

Measurements
Insole pressure, force plate kinetics, and kinematics were recorded for 3,489 foot falls (originally, 120 steps per participant × 30 participants = 3,600 foot falls; however only 3,489 are reported due to collection error or data loss). The pressure measurements were achieved with a two-sensor (fore-aft) wireless insole (Loadsol TM ; Novel GmbH; Munich, Germany) inserted into standardized shoes worn by the participants (Adidas Duramo 6; weight = 280 g., heel drop = 11 mm). The Loadsol TM system was applied over the shoe's insole and recorded at its maximum sampling rate (100 Hz). Kinetic data from a force platform (AMTI; Watertown, MA, USA; BP6001200) and three-dimensional (3D) motion capture was recorded with a Qualysis system (13-camera setup; 2019.3, Göteborg, Sweden) and sampled at 100 Hz to match the maximum sampling rate of the Loadsol TM system. A six-marker anatomical marker set was applied to the left foot segment (over the shoe when necessary); retroreflective markers were secured on the medial and lateral malleoli, the head of the 2nd metatarsal, the heel (placed at the same height as the 2nd metatarsal), the medial side of the 1st metatarsal, and the lateral side of the 5th metatarsal ( Figure 1A,B) [38]. The kinematic and Loadsol TM data were synchronized by aligning the peak force of a stomp measured by the AMTI force platform (data logging with Qualysis) and Loadsol TM at the beginning of each trial.

Data Processing
Initial contact (IC) and toe off (TO) were identified from the Loadsol TM measurements as the frame in which the loading rate of the pressure insoles was greater than 1500 or −1500 Newtons per second, respectively [13]. Ten force and time related variables were extracted from the measurements. Two parameters were calculated from the first third (IC to 33%) and eight parameters from the entire (IC to TO; 100%) stance phase (Table 1). Finally, the FSA was identified for each IC captured. To achieve this, raw kinematic data were filtered using a low-pass 15 Hz filter. Visual 3D ×64 Professional (v6.03.06; Germantown, MD, USA) was used to model the foot segment so that the shoeelicited angulation was negated and the subsequent foot segment angle (in relation to the laboratory coordinate system) was reported [39].

Data Processing
Initial contact (IC) and toe off (TO) were identified from the Loadsol TM measurements as the frame in which the loading rate of the pressure insoles was greater than 1500 or −1500 Newtons per second, respectively [13]. Ten force and time related variables were extracted from the measurements. Two parameters were calculated from the first third (IC to 33%) and eight parameters from the entire (IC to TO; 100%) stance phase (Table 1). Finally, the FSA was identified for each IC captured. To achieve this, raw kinematic data were filtered using a low-pass 15 Hz filter. Visual 3D ×64 Professional (v6.03.06; Germantown, MD, USA) was used to model the foot segment so that the shoe-elicited angulation was negated and the subsequent foot segment angle (in relation to the laboratory coordinate system) was reported [39]. Impulse ratio between the insole sensor and total foot during the entire or first third of the stance phase Ratio of peak force measured from the insole sensor and total foot during the entire stance phase Ratio of peak RFD between the insole sensor and total foot Natural logarithm of the occurrence of the peak RFD (as a stance phase %) Fore % of Stance Ln(%RFD_Fore) Aft % of Stance Ln(%RFD_Aft) RFD = rate of force development; FF = fore foot; RF = rear foot; N = Newton; s = second.

Modeling Approaches
As a pre-requisite for model development, all variables were assessed for normality (i.e., skewness or kurtosis statistic ≤ 2.58). If the assumption of a normal distribution was not met for any of the variables, a natural logarithm transformation was performed to ensure their use was appropriate for parametric statistics (noted in Table 1). The data was then split record-wise into two sets; one was a training data set (70%; n = 2442 steps) and the other a validation ("test" or "hold out") set (30%; n = 1047 steps). This was done to avoid model under-fitting and high classification errors [40][41][42].
Three modelling techniques (multiple linear regression, conditional inference tree and random forest) were then trained using the training data set to predict FSA and to classify FSP from the pressure insole data. For the classification of FSP, all models employed the degree-based ranges defined by Altman and Davis [14] to categorize steps into either FF (FSA < −1. Steps were classified regardless of the trial condition in which they were performed (i.e., the extreme FF and FF conditions were primarily classified as FF strikes, and similarly, extreme RF and RF conditions as RF strikes).

Model Development
First, a parametric stepwise multiple linear regression (MR) to predict the FSA at IC was modelled using SPSS Statistics (SPSS Inc.; Version 26.0, Chicago, IL, USA). Seven significant (α = 0.05) regression equations were developed (F-to-enter ≤ 0.050, F-to-remove ≥ 0.0100), therefore the Akaike Information Criterion (AIC) and Schwartz-Bayesian Information Criterion (BIC) were calculated for each regression to guide model selection for the subsequent comparisons [43]. The resulting model (Equation (1)) retained the lowest AIC and BIC, and highest model fit (R 2 = 0.914, R 2 ADJUSTED = 0.914; p < 0.001; standard error of the estimate = 5.10 • ; df = 2434). The same MR model was used for classification by categorizing the predicted FSA (calculated from Equation (1)) of the validation set according to the previously mentioned FSP ranges [14].
Two conditional inference trees were modeled with the statistical software R ("ctree" function of "partykit" package) [34,44,45]. The two models differed in their outcomes: one model predicted continuous FSA (TREE PRED ), while the other classified FSP (TREE CLASS ; defined classes: RF, MF, FF). For both models, the significance level was set to α = 0.01 (minimum splitting criterion = 0.99). A maximum depth of eight was achieved for TREE PRED , and TREE CLASS achieved a depth of six.
Finally, two random forest models as developed by Breiman [35] were trained using the statistical software R ("randomForest" package) [44,46]. The first model was trained for the purpose of continuous FSA prediction (FRST PRED ) and the second for FSP classification (FRST CLASS ). A large number of trees (n = 500) was selected for the development of the FRST PRED and FRST CLASS models to decrease out-of-bag errors [47]. Variable selection was randomly initialized in order to define candidates for each split. The final models were chosen because they had the lowest root mean squared error (RMSE; FRST PRED ) and the highest mean accuracy (FRST CLASS ) in a 5-fold cross-validation comparison of the different parameter settings ("caret" package) [44,48]. The important variables for the FRST PRED and FRST CLASS can be seen in Figure 2, where high "Mean Decrease Gini" is associated with decreased node impurity, and therefore higher variable importance [49].
Sensors 2020, 20, x FOR PEER REVIEW 6 of 14 variables for the FRSTPRED and FRSTCLASS can be seen in Figure 2, where high "Mean Decrease Gini" is associated with decreased node impurity, and therefore higher variable importance [49].

Model Accuracy and Precision
The models for FSA (MR, TREEPRED, FRSTPRED) and FSP (MR, TREECLASS, and FRSTCLASS) were tested with the remaining validation set (n = 1047 steps). Accuracy and precision metrics were calculated for each of the models using the comparison of the true FSA/FSP (measured with 3D kinematics) and the estimated FSA/FSP (i.e., estimated from Loadsol TM metrics).

Model Accuracy and Precision
The models for FSA (MR, TREE PRED , FRST PRED ) and FSP (MR, TREE CLASS , and FRST CLASS ) were tested with the remaining validation set (n = 1047 steps). Accuracy and precision metrics were calculated for each of the models using the comparison of the true FSA/FSP (measured with 3D kinematics) and the estimated FSA/FSP (i.e., estimated from Loadsol TM metrics).
For model comparison of the three approaches that predicted FSA (MR, TREE PRED , and FRST PRED ), four performance metrics were calculated per recommendations of Galdi and Tagliaferri [50]. These included the mean squared error (MSE), RMSE, mean absolute error (MAE) and mean absolute percentage error (MAPE) of the true versus predicted FSA outcomes. The precision of the prediction models was quantified by calculating the limits of agreement (LoA) and bias of the predicted data set according to Bland and Altman [51]. Specifically, the 95% LoA was calculated using the mean difference (true FSA-predicted FSA) ± 1.96 standard deviations of the differences, and the maximum precision was reported as the difference between the subsequent limits.
Confusion matrices were created for the FSP classification models (MR, TREE CLASS , and FRST CLASS ) utilizing the true classes (measured by kinematic FSA) and the estimated classes (i.e., the class estimated from each model). From these confusion matrices, three metrics were computed as recommended by Galdi and Tagliaferri [50] for model comparison. These included the model accuracy (Equation (2)), classifier recall (Equation (3)), and classifier precision (Equation (4)).
where total correct = number of cases correctly classified, true class = true positives + false negatives of a classifier, estimated class = true positives + false positives of a classifier

Results
Descriptive statistics (mean ± standard deviation) are presented in Table 2 for each of the independent variables of each step according to their FSP class (FF, MF, RF).

FSA Prediction
The Bland-Altman Bias and Precision of the FSA prediction models is shown in Figure 3. Further FSA prediction model accuracy can also be seen in Table 3. In general, the FRST PRED performed with greater prediction accuracy than the MR or TREE PRED . The MR and FRST PRED had minimal biases (MR = −0.01, FRST PRED = −0.11; Figure 3) and the maximum precision of the two methods was less than 15 • (MR = 13.75 • , FRST PRED = 14.30 • ). A larger maximum precision was found for the TREE PRED (19.02 • ).

FSA Prediction
The Bland-Altman Bias and Precision of the FSA prediction models is shown in Figure 3. Further FSA prediction model accuracy can also be seen in Table 3. In general, the FRSTPRED performed with greater prediction accuracy than the MR or TREEPRED. The MR and FRSTPRED had minimal biases (MR = −0.01, FRSTPRED = −0.11; Figure 3) and the maximum precision of the two methods was less than 15° (MR = 13.75°, FRSTPRED = 14.30°). A larger maximum precision was found for the TREEPRED (19.02°).

FSP Classification
The confusion matrices developed for each FSP classification model (MR, TREE CLASS , FRST CLASS ) are displayed in Table 4A. The associated accuracy (Equation (2)), recall (Equation (3) (Equation (4)) results are presented in Table 4B. All models yielded classification accuracies larger than 90% (Table 4B). The MF condition had markedly lower recall and precision than its RF and FF counterparts for all models calculated (Table 4B). Table 4. Confusion matrices are displayed to indicate where correct (white) and incorrect (grey) classifications occurred for three types of classification methods (multiple linear regression, conditional inference tree, and random forest). Matrices are reported for the validation data set that was not included in model training. All models classified foot strikes into three classes: RF = rear foot, MF = mid foot, and FF = fore foot.

Discussion
The purposes of the current study were to compare three statistical techniques used to (i) predict FSA and (ii) classify FSP using independent variables derived from the Loadsol TM pressure insoles. Generally, clear differences in the three foot strike styles were noticeable by similarly stratified independent variables (Table 2), with the exception of the variable PF_Fore. For this variable, the differentiation between FF and MF strike types is not clear. This lack of dichotomy may be a result of speed or flight time inconsistencies during MF strike pattern performance, which is supported by the fact that the MF condition was the most difficult condition for participants to perform [37]. However, the apparent stratification of the independent variables for each strike condition thus confirms the applicability of the fore/aft Loadsol TM sensors to estimate FSA and FSP [26,27]. Supporting this, the MR and FRST PRED models developed for the prediction of FSA were both evidently good fits (MR = 91.4% and FRST PRED = 95.42% of variance explained) and the classification accuracy of FSP for all statistical techniques was greater than 90% (Table 4B).

FSA Prediction
The three models (MR, TREE PRED and FRST PRED ) assessed for FSA prediction had comparable performance when tested using the validation set ( Figure 3 and Table 3). The most important independent variable in the FRST model was RFD_Aft as evidenced by the highest mean decrease in node impurity (Gini; Figure 2). Importantly, RFD_Aft is also a predictor used in the TREE PRED and MR models. However, the specific variable importance of RFD_Aft in the MR (via beta coefficients) cannot be interpreted because the model violated the assumption of collinearity [52]. Collinearity considered, the overall prediction of the model should be unaffected [52].
The linear approach of the MR as suggested by Fritz and colleagues [27] appears to be appropriate to generally explain the variance of the FSA (R 2 = 0.914). In a similar application, a univariate linear regression to determine strike index via the onset time difference of a fore and aft pressure sensor resulted in a lower coefficient of determination (R 2 = 0.836) [26]. Although participants were asked to perform RF, MF, and FF foot strikes, Cheung and colleagues [26] did not carry out further analyses to confirm the performance of the FSPs or if there was a stratified model fit. Importantly, a strong linear relationship between the strike index and 3D FSA kinematics is supported in literature, however the relationship appears to be driven primarily by FF and RF strike types [14,16]. Upon visual inspection, those foot strikes that fell closer to the MF range of FSA had the largest standard errors [16]. A similar visual phenomenon is seen in the current study's data, however the more extreme FF and RF also appear to be indicative of greater prediction errors (Figures 3 and 4). The methodological inclusion of the extreme FF and RF conditions in the current study make it possible to see the potential that there are two linear relationships (Figure 4). Thus, greater accuracy in FSA prediction using MR may be gained from developing a model for the RF and FF FSPs independently. considered, the overall prediction of the model should be unaffected [52].
The linear approach of the MR as suggested by Fritz and colleagues [27] appears to be appropriate to generally explain the variance of the FSA (R 2 = 0.914). In a similar application, a univariate linear regression to determine strike index via the onset time difference of a fore and aft pressure sensor resulted in a lower coefficient of determination (R 2 = 0.836) [26]. Although participants were asked to perform RF, MF, and FF foot strikes, Cheung and colleagues [26] did not carry out further analyses to confirm the performance of the FSPs or if there was a stratified model fit. Importantly, a strong linear relationship between the strike index and 3D FSA kinematics is supported in literature, however the relationship appears to be driven primarily by FF and RF strike types [14,16]. Upon visual inspection, those foot strikes that fell closer to the MF range of FSA had the largest standard errors [16]. A similar visual phenomenon is seen in the current study's data, however the more extreme FF and RF also appear to be indicative of greater prediction errors ( Figures  3 and 4). The methodological inclusion of the extreme FF and RF conditions in the current study make it possible to see the potential that there are two linear relationships (Figure 4). Thus, greater accuracy in FSA prediction using MR may be gained from developing a model for the RF and FF FSPs independently.  Importantly, the midfoot and more extreme RF strikes are not as well predicted by the MR than by the FRST PRED ( Figure 3). However, both models exhibit higher numbers of residuals outside of the Bland-Altman limits of agreement at the extremes of FF foot strike pattern. Additional proportional bias may be evidenced in the extreme FF range of the MR. However, because these extreme foot strike patterns were considered "exaggerated" to the participants (as was their instruction), the bias present there may not influence the practical application of such models. Further, the stratification seen in the TREE PRED Bland-Altman makes it apparent that it's use for continuous FSA prediction is limited to the number of outcomes (i.e., maximum tree depth) included in the model (Figure 3). Ultimately, the TREE PRED appears to be better suited for discrete classification problems, whereas the FRST PRED is arguably the most appropriate model for prediction problems that include a large range of FSAs or number of MF strikes.

FSP Classification
Although the overall classification accuracy of the MR was greater than 90%, the MF strike was only properly classified with 38% recall (Table 4). Conversely, TREE CLASS and FRST CLASS classified the MF strike with approximately 73% and 75% recall (Table 4). This is similar to the findings of Delgado-Gonzalo and colleagues [53], who found that the MF condition was classified with the least recall and precision using accelerometer-based inputs. Importantly, the MF strike pattern in the current study may have been classified with the least accuracy because it had the least number of samples in the training set (MF = 197, RF = 1495, FF = 650). Supporting this theory, the RF pattern classified with the highest recall in the MR and TREE CLASS methods (98-99%) and was the greatest sample contributor in the training set. Further, the most important variable for the FRST CLASS was RFD_Aft (Figure 2), which is consistent with the first splitting node variable of TREE CLASS . The models may be best suited to distinguish between RF and FF strikes primarily due to the lack of independent variable or sensor differentiation regarding the middle region of the foot. Thus, a three-part sensor insole that highlights the central region of the foot (thus allowing a variable such as the mid-region rate of force development) may be better suited for MF classifications. However, Lieberman and colleagues [9] found that habitually shod runners primarily perform RF strike patterns, therefore the current models should serve recreational runners well.
For populations of shod runners who have consciously altered or retrained their running foot strike pattern (i.e., those investigated by Cheung and Davis [11]), the higher accuracy of the FRST CLASS may provide further confidence in the MF classifications. However, the future use of simple methods like the MR or TREE CLASS methods should not be discounted because equal class sizes in the training set may improve the recall of MF classifications and overall model accuracy.

Application
The results of the current study support that a two sensor (fore and aft) pressure insoles can be used to predict and classify foot strike with sufficient accuracy. Compared to previous works with the aim of estimating FSA using IMU sensors [15], the current results boast lower bias when compared to a reference 3D motion capture camera system (FRST PRED of current study = −0.11 • vs. IMU = 3.9 • ) and only slightly worse precision (FRST PRED of current study = 14.30 • vs. IMU = 10.6 • ). This raises the potential of an insole sensor to provide the holistic pairing of kinetic and kinematic information regarding performance and injury indicators during running. An ankle joint torque MR prediction model has already been developed with adequate accuracy (R 2 ADJUSTED = 0.831, RMSE = 6.91 Nm) using the independent input of 99-sensor pressure insoles [54]. Further, vertical GRF from pressure insoles have been used to predict the 3D GRF components using MR and Artificial Neural Networks, supporting that power and injury related variables can be considered a possibility via simple wearable sensors [55]. From an application standpoint, although many independent variables are used in the models of the current study, they all are derived from a single system. The use of a single system thus reduces the necessity of the synchronization and additional processing power of a supplementary system. A larger range of running conditions could be studied in the future, which may allow for the reduction of independent variables and further encourage the potential to transition toward "real time" foot strike pattern and angle detection.
The current study thus lays the framework for FSA and FSP detection in insoles with larger numbers of sensors (like those used by Billing et al. and Fong et al. [54,55]). This framework may be useful in the push to define and detect running power accurately. The calculation of power during running is a controversial topic due to the complexity of the human biomechanical system, and many of the current commercial systems do not have proven validity in calculating the metric [56]. Therefore, a kinetic approach may exceed current IMU-based calculation methods (i.e., Stryd TM foot pods) due to the immense information a multi-sensor pressure insole can provide.

Conclusions
The current study supports the feasibility of two-sensor pressure insoles to detect FSA and FSP, and therefore aids in the research and coaching of running movements, as well as consumer-based shoe prescription. Simple machine learning techniques can be used to predict and classify runners' foot strike patterns with accuracies greater than 90%. However, foot falls that are a true MF strike are incorrectly classified more often than RF or FF strikes by these methods. A greater accuracy can be accomplished with the application of a more complex machine learning technique like a FRST. The current study was limited in its collection of MF steps, therefore more MF steps or using over-or under-sampling techniques may improve the classification of the MF pattern in the future. Further, the machine learning techniques should be applied to running with higher ecological validity that encompasses variable metabolic intensities (i.e., speeds), and limited changes of direction. Funding: No funding was received for this work from the National Institutes of Health, the Welcome Trust, the Howard Hughes Medical Institute, or other funding agencies to PubMed Central. Funding was provided based on the Salzburg "Trans-4-Tech" project "Sport Sense" and by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry for Digital and Economic Affairs, and the federal state of Salzburg under the research program COMET-Competence Centers for Excellent Technologies-in the project Digital Motion in Sports, Fitness and Well-being (DiMo).