1. Introduction
Running is an increasingly popular sport with multiple health benefits. According to a recent estimate, fifty million Europeans engage in running [1]. Health benefits can be psychological, such as a sense of accomplishment [2], or physical, such as a decreased chance of developing chronic diseases and greater longevity [3]. However, running comes with associated risks, in particular pain and injuries [4]. To maximize health benefits and minimize the chance of injury, training load should be carefully managed. For instance, overloading and training stress are associated with increased injury risk [5,6]. Accurate, continuous detection of fatigue during a high-intensity or long-duration run can be used to provide feedback to runners, helping them avoid the excessive training stress and overloading that can lead to lower limb injuries.
Fatigue is a multi-factorial phenomenon. A model developed by Kluger et al. [7] divides fatigue into two distinct components that can influence each other: perception of fatigue, caused by homeostatic and psychological factors, and performance fatigability, which is influenced by central and peripheral factors. During running, the human body undergoes shocks due to impacts with the ground. Performance fatigability can be assessed, for example, by means of changes in biomechanical quantities that are related to coping with such shocks. However, fatigue identification and management are commonly based solely on subjective estimates that measure perception of fatigue. Such estimates are very easy to use in practice, but they do not assess performance fatigability.
Inertial measurement units (IMUs) are non-intrusive sensors widely adopted to measure biomechanical changes in human movement. IMUs can record biomechanical parameters continuously, which can show changes due to physical fatigue [8,9]. Extensive research has been performed to detect biomechanical changes due to fatigue in running. Hip flexion at initial contact was found to decrease between the start and the end of a fatiguing run [8,10]. Maximum knee flexion angle can decrease [8] or increase [11] with fatigue, depending on running settings and subject characteristics. Maas et al. [12] showed that running experience could influence knee flexion, among other biomechanical parameters. Peak tibial acceleration (PTA) and peak sacral acceleration (PSA) are also recurring parameters studied in association with fatigue. Reenalda et al. [13] and Schutte et al. [14] found an increase in PTA due to fatigue, while Ruder et al. [15] found a decrease. Reenalda et al. further investigated shock attenuation between the tibia and the pelvis, finding that it increased with fatigue even though both PTA and PSA increased [13]. An assessment of asymmetry in ankle, knee, and hip kinematics between a rested and a fatigued state in running found that internal rotation of the knee showed the largest increase in asymmetry with fatigue [16]. Although significant changes in biomechanics have been repeatedly found when measuring running mechanics with IMUs, it is not yet clear whether these changes are sufficient to reliably detect fatigue over time in real-world applications.
While fatigue detection in running has so far been based on non-automated detection of changes in biomechanical parameters, machine learning algorithms could offer rapid and easy identification of fatigue. Such algorithms could take as input well-established biomechanical variables as well as a wide range of statistical variables. Translation of biomechanical changes due to fatigue into machine learning fatigue detection algorithms has been performed in other fields. A clear example is industrial work: feeding a wide range of biomechanical parameters into a support vector machine classification algorithm led to a fatigue detection accuracy of 90% in working tasks [17]. Yet few studies have focused on the detection of fatigue in sports and running, especially in out-of-the-lab environments. Gholami et al. used machine learning techniques to detect the perceived exertion of runners on a treadmill using textile wearable sensors and assessed the importance of each sensor location, with the hip contributing more than the knee and the ankle to the final coefficient of determination of 0.96 [18]. Buckley et al. placed IMUs at the shanks and lumbar spine and compared three different locations and various machine learning classifiers to detect fatigue in outdoor running, obtaining 75% accuracy with a single IMU placed at the lumbar spine [19]. While minimal sensor setups have the clear advantage of being easy to wear, they might miss biomechanical information that could substantially improve fatigue detection accuracy.
Here we aimed to assess the optimal combination of IMU locations on the lower limbs and trunk to detect fatigue levels in an outdoor run with a machine learning classification algorithm. We segmented IMU data into gait cycles and extracted biomechanical and statistical features, labeling data points with fatigue levels identified by means of subjective assessment of fatigue and heart rate (HR). IMU combinations of interest were selected and their fatigue detection performance was compared. We hypothesized that the larger biomechanical changes reported in the literature, such as changes in peak tibial acceleration, would be reflected in the sensor combinations with higher fatigue detection accuracy. However, we also expected statistical features derived from biomechanical quantities to have a positive impact on the performance of the classifier. The findings of this study aim to assist in translating current state-of-the-art knowledge of the biomechanical changes due to fatigue in running into detection of fatigue in real-world scenarios.
3. Results
Table 5 presents the performance of our random forest classification algorithm in detecting the three fatigue levels for the sensor configuration that resulted in the highest accuracy in each of the four IMU-setup configuration categories. For the minimally intrusive category, the best configuration consists of one IMU placed on the left tibia (accuracy = 0.761 ± 0.220). For the quasi-minimally intrusive category, the best configuration consists of two IMUs placed on the left tibia and the left thigh (accuracy = 0.867 ± 0.112). For the 3+ IMUs category, the best configuration consists of four IMUs on the left and right tibias and the left and right thighs (accuracy = 0.903 ± 0.085). The whole body configuration of eight IMUs resulted in an accuracy of 0.905 ± 0.081. Confusion matrices with aggregate classification accuracy for all datapoints from all subjects are shown in Figure 5.
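As a reading aid for the "mean ± spread" accuracies reported above, the following minimal sketch shows one way such a summary could be computed from per-subject predictions. It assumes that the reported spread is a standard deviation across held-out subjects and that the sample (n − 1) estimator is used; neither detail is stated in this section. The function names and the toy labels are hypothetical.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of data points whose predicted fatigue level equals the label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize_accuracy(per_subject):
    """Return (mean, sample standard deviation) of per-subject accuracies.

    per_subject maps a subject id to a (y_true, y_pred) pair.
    Assumption: the spread is taken across subjects with the n - 1 estimator.
    """
    scores = [accuracy(t, p) for t, p in per_subject.values()]
    return mean(scores), stdev(scores)


# Toy example with two hypothetical subjects and three fatigue levels (0, 1, 2).
toy = {
    "s1": ([0, 1, 2, 2], [0, 1, 2, 1]),  # 3 of 4 correct
    "s2": ([0, 0, 1, 2], [0, 0, 1, 2]),  # 4 of 4 correct
}
avg, spread = summarize_accuracy(toy)
print(f"accuracy = {avg:.3f} ± {spread:.3f}")
```

A large spread relative to the mean, as for the single-IMU configuration, would indicate that accuracy differs strongly between held-out subjects.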
A full assessment of the sensor configurations in each category is shown by means of confusion matrices in Appendix A: Figure A1 (minimally intrusive configuration), Figure A2 (quasi-minimally intrusive configuration), Figure A3 (3+ IMUs configuration) and Figure A4 (full limbs and whole body configurations). Left limb sensors outperform right limb sensors in the minimally intrusive configurations, although the difference at the tibia (the best location) is almost negligible. Configurations including at least one knee joint perform better than configurations without a knee joint, in both the quasi-minimally intrusive and the 3+ IMUs configurations. Within our experimental paradigm, increasing the number of sensors generally increases the performance of the machine learning classification algorithm. However, adding one sensor could also slightly decrease the accuracy of the classifier, although this was not observed when the additional sensor also provided an additional joint angle.
Table 6 presents the five highest-ranked features across participants in the leave-one-subject-out cross-validation approach. The most recurring feature is the standard deviation (STD) of the tibial pitch angular velocity. Each configuration that includes joint angles presents at least one biomechanical feature derived from a joint angle. Each configuration shows at least one biomechanical and one statistical feature among the five best features. Features from the left limb are predominant in configurations with both limbs present. Spatiotemporal, symmetry and shock attenuation features do not appear among the best features in any configuration.
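To illustrate what a per-gait-cycle statistical feature such as the STD of an angular velocity signal looks like in practice, a minimal sketch follows. It assumes that initial-contact sample indices are already available from a separate gait event detection step and that the population standard deviation is used; the function names and the toy signal are hypothetical and do not reproduce the study's processing pipeline.

```python
from statistics import pstdev


def split_into_cycles(signal, contact_indices):
    """Slice a 1-D signal into gait cycles between consecutive initial contacts."""
    return [signal[a:b] for a, b in zip(contact_indices, contact_indices[1:])]


def cycle_std(signal, contact_indices):
    """Return one standard deviation value per gait cycle.

    Assumption: population standard deviation over the samples of each cycle.
    """
    return [pstdev(cycle) for cycle in split_into_cycles(signal, contact_indices)]


# Toy pitch angular velocity trace; the second cycle has twice the amplitude.
pitch_velocity = [0, 1, 0, -1, 0, 2, 0, -2, 0]
initial_contacts = [0, 4, 8]  # hypothetical sample indices of initial contact
print(cycle_std(pitch_velocity, initial_contacts))
```

Each gait cycle yields one feature value, so a stride with larger fluctuations in the signal produces a larger STD regardless of where the peak occurs.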
4. Discussion
The purpose of this study was to assess the performance of a machine learning algorithm to detect fatigue during a prolonged outdoor run using single or multiple IMUs. We assessed the detection accuracy of selected IMU configurations of interest and the trade-off between higher fatigue classification accuracy and sensor reduction.
4.1. Sensor Location Optimization
In this study we demonstrated that minimal sensor setups can detect fatigue at satisfactory levels when the sensor location is chosen wisely. We obtained an accuracy above 76% using a random forest algorithm with only one IMU sensor and 12 features. This is in line with previous studies on fatigue detection in running and work tasks. An AUC-ROC of 0.68 [25] and an accuracy of 75% [19] were previously reported in running using a single IMU sensor on the tibia, although with different fatiguing protocols and device types. While single-IMU configurations have the clear advantage of low intrusiveness, we used a structured approach to evaluate the fatigue detection performance of IMU setups with up to eight sensor locations. We observed that increasing the number of IMU locations from one to two improves accuracy to 87%, while increasing the number of IMU locations to four improves accuracy to 90%, a level that does not change appreciably when the number of IMU locations is increased to eight.
Fatigue detection accuracy was highest at the tibias, both in minimally intrusive and in more intrusive configurations. We expected the tibias to give the best fatigue detection performance because of the documented changes in peak tibial acceleration with running-induced fatigue [13,14,15]. However, the most recurring features with highest importance in the best configurations were linked to the variation of tibial pitch angular velocity in the sagittal plane and acceleration magnitude. Statistical features that are indicative of gait variability were also among the features with highest importance.
IMU configurations with an increasing number of joint angles resulted in increased accuracy. For example, the left thigh and the left foot as minimally intrusive configurations resulted in low accuracy (60% and 59%, respectively). Still, when coupled with the left tibia sensor in a three-IMU configuration, they resulted in an accuracy of 87%. Knee and ankle joints resulted in considerably higher accuracies in the quasi-minimally intrusive configurations than in minimally intrusive configurations without joint angles. However, increasing the number of joint angles from one to two, two to three and three to six resulted in minimal increases in accuracy. This indicates that knee and ankle joints are better suited than hips to detect fatigue in outdoor runs using IMUs. Gholami et al. obtained opposite results using textile wearable sensors to detect fatigue in running, with the hip being the most reliable sensor location and the knee and ankle being less reliable [18]. However, the wearable sensors used in that study measured biomechanical parameters only in the sagittal plane. The trade-off observed between number of sensors and detection accuracy in the present study suggests a sensor setup including only one joint angle, preferably the left or right knee.
IMUs placed on the left lower limb generally resulted in higher fatigue detection accuracy than those on the right lower limb (e.g., full left lower limb accuracy = 0.900, full right lower limb accuracy = 0.809). Since only recreational runners were included, it was not possible to reliably estimate their dominant leg in running. However, a change of direction was included halfway through the runs to eliminate the effect of running direction on the biomechanics of the left and right legs. Leg dominance has previously been found not to affect kinematic differences due to fatigue in running [26]. Further studies are needed to confirm whether the non-dominant leg is better suited for sensor placement when detecting fatigue.
4.2. Machine Learning and Biomechanics
Machine learning has been successfully used in many biomedical applications in recent years. While the number of publications involving machine learning has increased almost every year since the early 2000s [27], the field of biomechanics is still anchored to salient features such as peak tibial accelerations and peak joint angles. Traditional biomechanical parameters derived from IMU measurements have the advantage of being highly interpretable. However, they might not fully capture some underlying mechanisms such as gait variability due to fatigue. While statistical variables might not show significant differences, they still relate to an underlying running gait variability that is expected to increase with fatigue [28]. We observed in this study that a machine learning approach to detecting fatigue in running benefited from both statistical and biomechanical parameters, as already shown in studies performed in work scenarios [29].
Traditional biomechanics focuses predominantly on group-level averages of salient variables. However, previous studies have shown that changes in specific variables are extremely subject-dependent [8,9,14]. Subject-specific characteristics (e.g., running experience, body morphology, gender, speed) can influence running biomechanics, often preventing conclusions from being drawn at the subject level as well as at the group level. Machine learning applications in biomechanics have the potential to fill this gap. With leave-one-subject-out cross-validation, machine learning algorithms are evaluated on predictions for subjects that were never observed during training. This technique can help identify biomechanical features that best describe the predicted outcome (e.g., fatigue) across different subjects, improving generalization of the prediction to new subjects.
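The leave-one-subject-out scheme described above can be sketched in a few lines. This is an illustrative split generator only, assuming each data point (e.g., one gait cycle) carries a subject identifier; the function name and the toy identifiers are hypothetical, and the classifier training step is deliberately left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids has one entry per data point. All data points of the held-out
    subject go to the test set, so the model never sees that subject in training.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy example: five data points from three hypothetical subjects.
ids = ["a", "a", "b", "c", "c"]
for subject, train_idx, test_idx in leave_one_subject_out(ids):
    print(subject, train_idx, test_idx)
```

Splitting by subject rather than by data point avoids placing strides from the same runner in both the training and the test set, which would otherwise inflate the estimated accuracy on new runners.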
4.3. Toward Real-World Applications
The deployment of a large-scale IMU-based fatigue detection device remains a challenge. The current study aimed to add knowledge on the translation of biomechanical changes due to fatigue into a real-world application. We introduced the use of a moving average in the detection of fatigue in running using IMUs. This technique has two main implications. First, the algorithm would not give live feedback: a delay equal to the duration of the moving average (e.g., the time to complete one full lap of an athletic track) would be present. Second, the classification algorithm could be specific to a run on an athletic track. However, running on a track is a very popular option for runners. Further studies should be performed to apply this classifier to other running scenarios (i.e., different durations, intensities, surfaces), taking into account that IMU placement and skin displacement could affect the results.
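The delay introduced by a moving average can be made concrete with a small sketch. It assumes a trailing (causal) window measured in number of strides; this section does not state whether the window is applied to features or to classifier outputs, nor its length, so the function name, the window size and the toy values are all hypothetical.

```python
from collections import deque


def trailing_moving_average(values, window):
    """Yield the mean of the last `window` values once the window is full.

    No output is produced for the first window - 1 values, which is the
    feedback delay discussed in the text.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buffer = deque(maxlen=window)
    total = 0.0
    for value in values:
        if len(buffer) == window:
            total -= buffer[0]  # value about to be pushed out of the window
        buffer.append(value)
        total += value
        if len(buffer) == window:
            yield total / window


# Toy per-stride feature values and a hypothetical window of three strides.
smoothed = list(trailing_moving_average([1, 2, 3, 4, 5], window=3))
print(smoothed)  # [2.0, 3.0, 4.0]
```

With five input values and a window of three, only three smoothed values are produced: the first estimate becomes available only after the third stride, mirroring the one-lap delay described above.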
A satisfactory level of fatigue detection accuracy for real-world applications is difficult to determine. Every runner is different, and so are runners' expectations of and interactions with a fatigue detection device. While recreational runners could be satisfied with an average fatigue detection score at the end of a run, elite runners would likely require more detailed information about fatigue progression throughout a training session. While a minimal threshold for fatigue detection accuracy cannot be universally established, a balance between sensitivity and specificity should be pursued. A low sensitivity would result in a system that misses fatigue and cannot be trusted by the runner, while a low specificity would result in many false alarms that can undermine the runner's willingness to use the device.
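For a three-level classifier, sensitivity and specificity can be computed per fatigue level by treating that level as the positive class and the other two as negative (one-vs-rest). The sketch below shows this under that assumption; the one-vs-rest framing, the function name and the toy labels are illustrative choices and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) for one class treated as positive.

    sensitivity = TP / (TP + FN): share of truly positive points that are detected.
    specificity = TN / (TN + FP): share of truly negative points not flagged.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels: 0 = no fatigue, 1 = mild fatigue, 2 = heavy fatigue (hypothetical).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(sensitivity_specificity(y_true, y_pred, positive=2))  # (0.5, 1.0)
```

In the toy example, the heavy fatigue level is never falsely raised (specificity 1.0) but is detected only half of the time (sensitivity 0.5), which is the kind of imbalance the text warns against.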
4.4. Limitations
A first limitation of this study was the definition of fatigue levels. RPE scales have been widely used to measure perception of fatigue, since RPE has been found to be a reliable surrogate for exercise intensity [30], but they do not represent a gold standard for the measurement of physical fatigue. During the last, heavy fatigue condition of our experimental protocol we observed on average a decrease in RPE, although this was very subject-dependent. This was probably due to a sudden decrease in speed relative to the preceding fatiguing protocol and could have hampered the classifier's ability to distinguish between the mild and heavy fatigue conditions. While a change in running intensity could affect perception of fatigue, performance fatigability might still be increasing at the muscle level.
A second limitation of this study relates to the number of IMU combinations taken into account. It would have been impractical to analyze all 255 combinations (the 2^8 − 1 non-empty subsets) of the eight IMU locations chosen in the present study. However, we believe that our set of assessed configurations includes those of highest relevance for real-world applications. Configurations with five to seven IMU locations were excluded because they were not expected to differ significantly from the whole body configuration with eight IMUs, while less intrusive sensor setups were mostly restricted to IMU configurations including at least one joint angle.
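The figure of 255 combinations follows from counting every non-empty subset of eight locations. A short enumeration confirms the count and shows how the subsets are distributed by number of sensors; the location labels used here are placeholders, not the study's actual sensor names.

```python
from itertools import combinations

# Placeholder labels: the actual eight body locations are defined in the
# study's methods, not in this sketch.
LOCATIONS = [f"imu_{i}" for i in range(1, 9)]


def all_configurations(locations):
    """Return every non-empty subset of sensor locations as a tuple."""
    return [
        combo
        for size in range(1, len(locations) + 1)
        for combo in combinations(locations, size)
    ]


configs = all_configurations(LOCATIONS)
counts_by_size = {}
for combo in configs:
    counts_by_size[len(combo)] = counts_by_size.get(len(combo), 0) + 1

print(len(configs))    # 255
print(counts_by_size)  # {1: 8, 2: 28, 3: 56, 4: 70, 5: 56, 6: 28, 7: 8, 8: 1}
```

Configurations with five to seven locations account for 56 + 28 + 8 = 92 of the 255 subsets, so excluding them alone removes more than a third of the search space.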
A third limitation is the relatively small sample size of eight subjects. Although a similar study with a larger population would improve the generalizability of the results and would allow appropriate statistical conclusions to be drawn, the large number of strides generated per subject and the leave-one-subject-out cross-validation approach allowed our classification algorithm to generalize well to unseen data.
4.5. Future Research
Objective assessment of fatigue would have direct benefits for runners. We suggest validating IMU-based techniques against gold standards in the detection of physical fatigue such as electromyography (EMG) and maximal oxygen consumption (VO2 max). Furthermore, performing IMU-based fatigue detection studies with larger, less homogeneous populations could allow the application of deep learning techniques and help generalize a fatigue detection algorithm more widely.