Modeling Physiological Predictors of Running Velocity for Endurance Athletes

Background: Properly performed training is a matter of importance for endurance athletes (EA). It allows for achieving better results and safer participation. Recently, the development of machine learning methods has been observed in sports diagnostics. Velocity at anaerobic threshold (VAT), respiratory compensation point (VRCP), and maximal velocity (Vmax) are the variables closely corresponding to endurance performance. The primary aims of this study were to find the strongest predictors of VAT, VRCP, Vmax, to derive and internally validate prediction models for males (1) and females (2) under TRIPOD guidelines, and to assess their machine learning accuracy. Materials and Methods: A total of 4001 EA (nmales = 3300, nfemales = 671; age = 35.56 ± 8.12 years; BMI = 23.66 ± 2.58 kg·m−2; VO2max = 53.20 ± 7.17 mL·min−1·kg−1) underwent treadmill cardiopulmonary exercise testing (CPET) and bioimpedance body composition analysis. XGBoost was used to select running performance predictors. Multivariable linear regression was applied to build prediction models. Ten-fold cross-validation was incorporated for accuracy evaluation during internal validation. Results: Oxygen uptake, blood lactate, pulmonary ventilation, and somatic parameters (BMI, age, and body fat percentage) showed the highest impact on velocity. For VAT R2 = 0.57 (1) and 0.62 (2), derivation RMSE = 0.909 (1); 0.828 (2), validation RMSE = 0.913 (1); 0.838 (2), derivation MAE = 0.708 (1); 0.657 (2), and validation MAE = 0.710 (1); 0.665 (2). For VRCP R2 = 0.62 (1) and 0.67 (2), derivation RMSE = 1.066 (1) and 0.964 (2), validation RMSE = 1.070 (1) and 0.978 (2), derivation MAE = 0.832 (1) and 0.752 (2), validation MAE = 0.060 (1) and 0.763 (2). For Vmax R2 = 0.57 (1) and 0.65 (2), derivation RMSE = 1.202 (1) and 1.095 (2), validation RMSE = 1.205 (1) and 1.111 (2), derivation MAE = 0.943 (1) and 0.861 (2), and validation MAE = 0.944 (1) and 0.881 (2). Conclusions: The use of machine-learning methods allows for the precise determination of predictors of both submaximal and maximal running performance. Prediction models based on selected variables are characterized by high precision and high repeatability. The results can be used to personalize training and adjust the optimal therapeutic protocol in clinical settings, with a target population of EA.


Introduction
The benefits of regular physical exercise are widely debated and include reducing the risk of obesity [1] or cardiovascular diseases [2]. On the other hand, improperly performed training with excessive intensity may negatively affect the organism's homeostasis and increase the risk of injury [3].

Materials and Methods
The TRIPOD guidelines [18] have been applied to this research (see Supplementary Material S1, TRIPOD Checklist for Prediction Model Development and Validation). This was a retrospective data analysis from the registry of CPET performed in the years 2013-2021 at a tertiary care sports diagnostic clinic (SportsLab Clinic, Warsaw, Poland, www.sportslab.pl).

Ethical Approval
The study protocol was approved by the Institutional Review Board of the Bioethical Committee at the Medical University of Warsaw (AKBE/32/2021) and met the necessary regulations of the Declaration of Helsinki. Mandatory written consents to undergo incremental CPET were obtained from each EA before participating in the study. Written informed consent for participation was not required for this study, in accordance with the national legislation and the institutional requirements

Study Design
Participants were endurance runners who underwent CPET on the treadmill (TE).

Somatic, [La − ] b Measurements, and CPET Protocol
First, body mass (BM) stratified by body fat (BF) and fat-free mass (FFM) measurements were obtained via the bioimpedance method (BIA) using a body composition (BC) analyzer (Tanita, MC 718, Tokyo, Japan) with the multifrequency of 5 kHz/50 kHz/250 kHz. Conditions during BC and CPET were the same: 40 m 2 indoor, air-conditioned area, 40-60% humidity, temperature 20-22 • C, altitude 100 m ASL, and the subjects had their skin cleaned before testing. In our standardized laboratory practice, each EA had received recovery and dietary instructions via email a few days prior to testing to enable them to prepare appropriately for the CPET and BC tests. Our recommendations included: eating a high carbohydrate meal 2-3 h before the CPET and staying hydrated with sports drinks, and female EAs were advised to be well beyond their menstrual phase [22]. They also received information stating that the CPET would be performed on a mechanical TE and that they should be familiar with the characteristics of this type of effort, as well as the running technique involved.
Running tests were performed on a mechanical TE (h/p/Cosmos quasar, Nussdorf-Traunstein, Germany). CPET indices were measured using the breath-by-breath method during 15 s intervals [23], utilizing a Hans Rudolph V2 Mask (Hans Rudolph, Inc., Shawnee, KS, USA), a gas exchange analyzing device Cosmed Quark CPET (Rome, Italy), and specialized software (Quark PFT Suite powered by Omnia 10.0E). The gas analyzer device was calibrated prior to the testing protocol (16% O 2 ; 5% CO 2 ; ventilation accuracy ±2% or 100 mL·min −1 ). The analyzer measurement mode takes into account the manufacturer's standard settings, i.e., 3-step smoothing and removing erroneous breaths from the analysis. HR was measured through the ANT+ and torso belt as a part of the Cosmed Quark set (accuracy similar to ECG; ±1 beats·min −1 ). [La − ] b was examined using a Super GL2 analyzer (Müller Gerätebau GmbH, Freital, Germany) employing an enzymatic-amperometric electrochemical technique. The lactate analyzer was also calibrated before each round of analysis for each participant.
CPET began with a 5 min preparatory protocol (walking or slow running at a declared "conversation" pace). The primary speed was 7-12 km·h −1 at a 1% inclination (the differences in the starting pace resulted from the training level of the participants and were selected on the basis of an interview on their previous sports results). The pace was increased by 1 km·h −1 every 2 min. VO 2 or HR plateau (no increase in VO 2 or HR with an increase in CPET intensity) or volitional inability to maintain intensity was the moment when the test was terminated [23,24]. Subjects were encouraged verbally to make a maximum effort. HR was considered the highest value at CPET (not averaged). Maximal VO 2 was recorded as an average from stable VO 2 in 10 s intervals directly before the termination of the CPET [23,24]. AT and RCP were assessed via non-direct methods based on the ventilatory concept. AT was achieved if the following measures were fulfilled: (1) VE/VO 2 curve started to grow with the constant VE/VCO 2 curve and (2) end-tidal partial pressure of O 2 started to grow with the constant end-tidal partial pressure of CO 2 [25]. RCP was achieved if the following measures were fulfilled: (1) a reduction in partial pressure of end-tidal CO 2 (PetCO 2 ) after attaining a maximal intensity; (2) a fast nonlinear growth in VE (second deflection); (3) the VE/VCO 2 ratio achieved the lowest value and started to grow; and (4) a nonlinear growth in VCO 2 versus VO 2 (linearity divergence) was achieved [25]. [La − ] b was assessed by obtaining a 20 µL blood sample from a fingertip: before the test, after any speed increase, and 3 min after termination. A sample for [La − ] b analysis was taken during running without interruption or pace decrease. Each time, the sample was from the same initial puncture. The first few drops were drained onto a swab and the proper blood sample was drawn. In further analysis, the corresponding values of [La − ] b for AT, RCP, and maximal VO 2 were determined.

Data Analysis and Predictors Selection
Data were saved into an Excel file (Microsoft Corporation, Redmond, WA, USA) and Python script. Further, they were calculated according to frequency (percentage) and mean (±standard deviation; SD, or 95% confidence intervals; CI) for continuous variables and the median for categorical variables. Intergroup differences (each was a continuous variable) were calculated using the Student t-test for independent variables. If there were lacking data (only for [La − ] b ; in 1190 cases for males and 266 cases for females in total), imputation with the random forest method (RF) was applied to fill in the gaps [26,27].
The XGBoost machine learning approach was used to select variables with the highest prediction value [28]. In order to select the variables, the population was divided into 3 groups: 60% for derivation (building group), 20% for testing (testing group), and the remaining 20% for validation (validation group). After selection, 11 variables were included in the further analysis: VO 2max , VO 2RCP , VO 2AT , [La − ] bRCP , [La − ] bAT , VE max , VE RCP , age, BM, BMI, and BF. Next, selected parameters were input into multiple linear regression (MLR) modeling. As a result, only significant predictors (with p < 0.05) were included in the final models. The derived equations are characterized by the coefficient of determination (R 2 ), root mean square error (RMSE), and mean absolute error (MAE). A 10-fold crossvalidation technique [29] and the Bland-Altman plots analysis [30] were used to establish the model's precision and accuracy during internal validation. To clarify, in the 10-fold crossvalidation, the population is divided into 10 random parts. The candidate model is built on [10 − 1 = 9] training sets; then, the derived model is evaluated on the test set consisting of the remaining one part. By respectively conducting building procedures on training sets and validation on the test set 10 times, we chose the final formula with the lowest possible inaccuracy validation score (defined in this paper as the lowest RMSE and MAE) [29]. Other implemented tests to reach the complete fulfillment of MLR modeling requirements include Ramsey's RESET test (for the correctness of specificity in MLR equations), the Chow test (for stability assessment between different coefficients), and the Durbin-Watson test (for autocorrelation of residuals). Each model was examined under the above-mentioned requirements and any irregularities have not been noted.
Our comprehensive machine learning approach enables the evaluation of each formula according to preliminary variable precision (at the stage of selection), accuracy (during model building), and recall (in internal validation).

Somatic Measurements and CPET Results
The participants' anthropometric data are presented in Table 1. The full population consisted of 4001 people, of which 3330 (83.23%) were male and 671 (16.77%) were female. All data showed a normal distribution. The mean age was 35.90 ± 8.15 years for males and 33.86 ± 7.74 years for females and the overall age ranged from 18 up to 74 years. BMI was 24.07 ± 2.44 kg·m −2 for men, while women had 21.64 ± 2.38 kg·m −2 . BF percentage was relatively low, estimate at 15.49 ± 4.53 in males and 22.04 ± 5.46 in females. Significant differences between genders has been observed for height (p < 0.0001), BM (p < 0.0001), BMI (p < 0.0001), BF (p < 0.0001), and FFM (p < 0.0001).

Prediction Models for V AT , V RCP , and V max
Complete MLR prediction models for males and females are presented in Table 3 (left columns), while Figure 2 shows their performance in the derivation cohort (illustrated as an analysis of observed vs. predicted values). The importance of all CPET variables, based on XGBoost selection, included in the modeling is presented in Figure 3. The following variables showed the strongest impact in building the models: VO 2 , [La − ], VE age, and BMI. Model performance is presented as R 2 , along with RMSE and MAE. Briefly, R 2 for male equations ranged from 0.57 for V AT and V max to 0.62 for V RCP . For female formulae, R 2 ranged from 0.62 for V AT to 0.67 for V RCP . The obtained RMSE was the lowest for the female V AT equation (=0.828) and the highest for the male V max (=1.202), while the observed MAE was the lowest for the female V AT equation (=0.657) and the highest for male V max (=0.944). Model performance at the stage of derivation has been shown in the left columns. Briefly, our equations showed high accuracy and explained approximately 60-70% of the differences between participants. The results of internal validation via the 10-fold cross technique are presented in the right columns, and they showed a precise transferability, despite a limited sample size for internal validation. We are presenting one R 2 because of the 10-fold cross-validation for the same group of participants as the derived validation.

Internal Validation
The evaluation of each model is also presented in Table 3 (right columns). In summary, the performance of our prediction equations was similar to that observed in the derivation cohort. A slightly higher RMSE and MAE were noted. Overall, RMSE values are located between 0.838-1.205 km·h −1 and MAE between 0.665-0.944 km·h −1 . The best working model (defined as having the highest replicability and the lowest risk of inaccuracies in the test set) was for V AT for females (RMSE = 0.838, MAE = 0.665), and the worst was for males V max (RMSE = 1.205, MAE = 0.944). The most and least accurate models were the same in regards to the derivation and validation. Figure 4 illustrates the Bland-Altman plots, with a comparison of observed vs. predicted velocity using newly derived prediction models at the stage of validation.
at the stage of derivation has been shown in the left columns. Briefly, our equations showed high accuracy and explained approximately 60-70% of the differences between participants. The results of internal validation via the 10-fold cross technique are presented in the right columns, and they showed a precise transferability, despite a limited sample size for internal validation. We are presenting one R 2 because of the 10-fold cross-validation for the same group of participants as the derived validation.  Performance of novel prediction equations for treadmill velocity. Abbreviations: VAT, velocity at anaerobic threshold; VRCP, velocity at respiratory compensation point; Vmax, maximal velocity. Colored dotted lines illustrate a 1:1 correspondence between measured and predicted velocities, respectively green for males (left row; A-C panels) and red for females (right row; D-F panels).

Figure 3.
Heat map showing the importance variables regarding predicted velocity based on XGBoost selection. Abbreviation: VO2max, maximal oxygen uptake; VO2RCP, relative oxygen uptake at respiratory compensation point; VO2AT, relative oxygen uptake at anaerobic threshold; [La − ]bRCP, blood lactate concentration at respiratory compensation point; [La − ]bAT, blood lactate concentration at anaerobic threshold; [La − ]bmax, maximal blood lactate concentration; VEmax, maximal pulmonary ventilation; VERCP, pulmonary ventilation at respiratory compensation point; VEAT, pulmonary ventilation at anaerobic threshold; RERmax, maximal respiratory exchange ratio; RERRCP, respiratory exchange ratio at respiratory compensation point; RERAT, respiratory exchange ratio at anaerobic threshold; HRmax, maximal heart rate; HRRCP, heart rate at respiratory compensation point; HRAT, heart rate at anaerobic threshold; BF, body fat; FFM, fat-free mass; BMI, body mass index; VAT, velocity at anaerobic threshold; VRCP, velocity at respiratory compensation point; Vmax, maximal velocity. Panel (A) presents data for males, while panel (B) shows the data for females. The cross means that the variable has not fulfilled preliminary selection-stage requirements (only in HR). The maps present a variable's importance regarding the predicted velocity during the model-building stage. In the final prediction models, only the variables with significant impact (p < 0.05) were included. ] bmax , maximal blood lactate concentration; VE max , maximal pulmonary ventilation; VE RCP , pulmonary ventilation at respiratory compensation point; VE AT , pulmonary ventilation at anaerobic threshold; RER max , maximal respiratory exchange ratio; RER RCP , respiratory exchange ratio at respiratory compensation point; RER AT , respiratory exchange ratio at anaerobic threshold; HR max , maximal heart rate; HR RCP , heart rate at respiratory compensation point; HR AT , heart rate at anaerobic threshold; BF, body fat; FFM, fat-free mass; BMI, body mass index; V AT , velocity at anaerobic threshold; V RCP , velocity at respiratory compensation point; V max , maximal velocity. Panel (A) presents data for males, while panel (B) shows the data for females. The cross means that the variable has not fulfilled preliminary selection-stage requirements (only in HR). The maps present a variable's importance regarding the predicted velocity during the model-building stage. In the final prediction models, only the variables with significant impact (p < 0.05) were included. are located between 0.838-1.205 km·h −1 and MAE between 0.665-0.944 km·h −1 . The best working model (defined as having the highest replicability and the lowest risk of inaccuracies in the test set) was for VAT for females (RMSE = 0.838, MAE = 0.665), and the worst was for males Vmax (RMSE = 1.205, MAE = 0.944). The most and least accurate models were the same in regards to the derivation and validation. Figure 4 illustrates the Bland-Altman plots, with a comparison of observed vs. predicted velocity using newly derived prediction models at the stage of validation.

Discussion
In the current study, we applied advanced machine-learning properties in a comprehensive evaluation of running physiology. The obtained equations include several physiological-only measures (both anthropometric and directly measured during CPET) to provide a feasible utility for the prediction of V AT , V RCP , and V max with substantial accuracy. The availability of this type of machine-learning tool in exercise diagnostics enables better training recommendations for EA and facilitates rehabilitation prescriptions for patients suffering from cardiovascular or respiratory diseases [7,31]. The novelty and main advantage are that there are no comparable studies that first select the variables with the strongest predictive abilities, and then directly evaluate their accuracy in the derived prediction models. An additional attribute is a relatively large group of healthy adult EA (n = 4001) who have undergone the CPET under an identical protocol, by which the maximum precision and similarity of the collected data were obtained. This enabled us to better examine whether parameters such as age [32], BC and BF [33] or VO 2max [16] exerted a possible significant impact on the predictive performance of the model (as they were previously classified as relevant variables in the literature. Moreover, the inclusion criteria enable us to avoid the disturbing influence of factors such as smoking [34] or medications [35].

Model Performance and Physiological Properties
Performance measurements show precise prediction abilities which were fairly replicable between the training and test sets (see Figures 2 and 3). The obtained R 2 explained approximately 60% to 70% of the differences, while errors were moderate-to-low, under 1 km·h −1 for most cases. With additional internal validation, they were both still located in the upper sensitivity range. Thus, the model accuracy was only minorly reduced. In previous publications, such as that by Petek et al. for VO 2peak [36], similar results were observed. However, usually, previous researchers have not carried out an initial selection of the most suitable variables, and so far, studies have been based on previously established parameters, only changing their proportions. Our study showed that VO 2 at RCP and maximal VO 2 were the most important parameters responsible for the prediction of middleto long-distance running velocity (a lower impact of VO 2 at AT was noted). This confirms previous findings by Thompson et al. and Lanferdini et al. [16,37] that the VO 2 can be described as the universal and comparable performance measure, and that it is strongly related to running speed. Moreover, according to the physiological relationship between exercise performance and [La − ] b at AT, at RCP, and maximal VO 2 , they also significantly influence the predicted velocities (but in the varied order compared with VO 2 , with more impact from sub-maximal levels at AT or RCP than maximal [La − ] b values). This is confirmed in studies by Tanaka and Matsuura [12] and Schabort et al. [19], as growing [La − ] b and training intensity were positively correlated in both. Thus, of greater improtance seems to be the ability to rapidly utilize and prevent excess growth in [La − ] b by EA than working at maximal value for a prolonged time. Our study confirmed the previous findings by Farrell et al. [38] on this point. Another important variable was pulmonary ventilation. The majority of the influence was created by VE RCP , and only for V max, was there a significant impact of VE max . The higher it was, the better running velocity was observed. Thus, it can be concluded that the higher oxygen (O 2 ) supply and better carbon dioxide (CO 2 ) utilization yielded an improvement in running performance. This is a well-documented concept that was stated by Sjodin and Svedenhag in the 1980s [32]. Our insights on both VO 2 and VE also confirmed that performance at RCP is strongly correlated with other running and general exercise indices [15]. When it comes to somatic parameters, they also showed a relevant effect on velocity. Higher BMI [19] and increasing age [39] were associated with lower endurance performance. On the other hand, BC, described as a percentage of BF and FFM, showed some effect on the predicted velocity, despite their impact on males being not enough to be included in the modeling for this gender. It is worth mentioning that the influence of BF was more noticeable in females, perhaps because they naturally have a higher level of BF [40]. HR was one of the variables with the lowest impact on velocity (see Figure 3). Moreover, we emphasize that HR, which shows high inter-individual variability and is difficult to precisely estimate for EA [21], was not included in any of our equations. To summarize, the degree of the relationships between the variables is interesting. It is very promising to assess how precisely we can estimate V AT , V RCP , and V max based on the above-mentioned parameters.

Clinical Considerations
Our results also have important clinical applications for patients from the general and athletic populations. The development of sports cardiology has resulted in a higher number of EA patients, including former cardiac patients or those suspected of having exertional cardiac abnormalities. TE CPET is often performed to some level of submaximal intensity or until refused. However, those who are less experienced may quit earlier, before reaching their optimal diagnostic intensity level, because they are not mentally adapted to perform such demanding activities [31,41]. The calculation (MET × running velocity) is used by medical professionals to provide personalized recommendations for cardiac rehabilitations [31]. Selection of the most important variables and additional comparison of those directly achieved with the predicted velocity verify whether an optimal level of intensity was achieved.

Practical Applications
The characteristics of selected variables and prediction models could be used in the preparation of exercise recommendations for both professional and recreational EA as patients in clinical settings [7]. The highest accuracy of the observed repeatable values would be for EA, mainly for running activities (i.e., during long-distance running or football), due to the characteristics similar to those in the derivation cohort [36]. Thanks to the use of V AT , V RCP , and V max prediction models, there would be no need to run the full CPET protocol and measure all parameters, but only the most significant and contributing ones [19]. This is a matter of importance, as CPET is often impossible to perform according to the full protocol due to the limited availability of specialized clinics and equipment or the high cost of the procedure [42]. This model can also be used to verify/assess whether the athlete obtains sufficient running speed on the basis of the directly measured parameters. Of course, it currently would not be the gold standard or method of choice. Thus, results should be generalized carefully. However, they could be used as a valuable supplement to direct measurements in the present. We encourage other researchers to test our velocity prediction models and evaluate the proportion of the obtained variables using different populations to assess to what extent the results can be extrapolated and transferred.

Limitations
A possible limitation is that participants underwent CPET in different phases of the day (circadian rhythm), month (menstrual cycle for female athletes), or season [43,44]. Moreover, we did not evaluate the training volume of the EA. The participants received dietary and preparation tips, but we cannot be sure that they were rigorously implemented; thus, some BIA results for BC should be analyzed with caution. Some data in [La − ] b were missed (not all participants decided on the [La − ] b test because it was an optional variable in the clinic's CPET portfolio) and RF imputation was applied. RF is recognized as the best method for filling data gaps, and our imputation did not cause a significant negative effect on the [La − ] b data precision. The models still showed high prediction abilities at the building and validation (i.e., out-of-bag error) stages. A comparison of both datasets (first set only with directly measured [La − ] b and second only with imputed [La − ] b ) did not show significant differences between them (p = 0.4) [26]. Volunteers individually declared the intensity level on the Borg scale, and the evaluation could differ between participants. The above limitations result from the specifics of the study, which is population-based, and not a controlled trial. In order to minimize their importance, the above-described internal validation was applied, which revealed the high data precision and replicability of the derived equations.

Future Directions
We advise that future prediction models used to estimate running velocities should be applied in cohorts with comparable characteristics to those from which they were primarily created (similar to other prediction models used in sports diagnostics) [36]. It is especially important in narrow and specified populations, including well-trained EA or cardiology patients [36]. We underline that there is a significant necessity for more accurately adjusted contributing factors and the development of new, advanced machine-learning prediction algorithms using unified TRIPOD recommendations [18]. This will enable the subsequent choice of the appropriate protocol to use in medical diagnostic and training prescriptions, depending on the participant's disease type or fitness level [7]. We recommend assessing our methods in an external environment, such as the 3000 m distance run, to cover all evaluation sites [45,46]. It is worth mentioning that, as stated by Figueiredo et al., the critical speed showed a better predictive value for the 5 km running results regarding a steady run than the peak velocity. Although our research focuses mostly on CPET performed in the clinical settings on the mechanical treadmill, we recommend further studies which will investigate the effect of critical speed compared to peak velocity [47].

Conclusions
In summary, (1) we found the strongest predictors of running velocity, (2) we derived novel prediction models for running velocities in accordance with TRIPOD guidelines, and (3) we established their fair validation.
Currently, with the use of a machine-learning approach, we can accurately estimate V AT , V RCP , and V max based only on somatic and exertion variables (the precision and repeatability in the study subgroups were comparable to the test-retest error). VO 2 , [La − ] b , VE, and somatic characteristics were the greatest contributing factors. We anticipate that our findings will improve the personalization of training and rehabilitation programs. Models should be primarily applied in disciplines where running is the main form of activity, due to the similar characteristics to those regarding the specificity of the derivation cohort.