Validation of Aerobic Capacity (VO2max) and Lactate Threshold in Wearable Technology for Athletic Populations

Abstract: As wearable technology (WT) has evolved, devices have gained the ability to track a range of physiological variables, including maximal aerobic capacity (VO2max) and lactate threshold (LT). With WT growing quickly in popularity, independent evaluation is needed to determine the appropriate use cases for these devices. Therefore, the purpose of this study was to determine the validity of WT estimates of VO2max and LT in athletic populations. METHODS: Twenty-one participants completed laboratory LT and VO2max testing, as well as an outdoor testing session guided by the WT being tested (Garmin fēnix 6® watch and accompanying heart rate monitor). Statistical analysis comprised hypothesis testing (ANOVA, t-test), correlation analysis (Pearson’s r, Lin’s Concordance Correlation Coefficient [CCC]), error analysis (mean absolute percentage error [MAPE]), equivalence testing (TOST test), and bias assessment (Bland–Altman analysis). RESULTS: The Garmin watch showed acceptable agreement for VO2max when compared to the 1 min averaged values (MAPE = 6.85%, CCC = 0.7), and for speed at LT and at the onset of blood lactate accumulation (OBLA) (MAPE = 7.52%, CCC = 0.79; MAPE = 8.20%, CCC = 0.74, respectively). Therefore, the Garmin fēnix 6® produces accurate estimates of VO2max and LT in athletic populations and can be used to make training decisions among athletes.


Introduction
Two of the most important parameters for predicting endurance performance are maximal aerobic capacity (VO2max) and lactate threshold (LT) [1][2][3][4]. VO2max represents the highest amount of oxygen an individual is capable of bringing into the body and utilizing to produce energy [4]. The lactate threshold is the point just prior to an exponential rise in the concentration of lactate, a metabolic byproduct of anaerobic metabolism that increases in concentration during exercise, especially intense exercise [4]. Traditionally, LT has been obtained in a laboratory setting with the use of blood lactate analyzers and a graded exercise test, either on a bike or treadmill [4]. VO2max is determined by using a metabolic cart to measure oxygen consumption during a graded exercise test [4]. Field-based tests have been developed for the estimation of VO2max, but they are not as accurate as laboratory measurements [5][6][7]. Field tests are, however, more accessible: they do not require expensive equipment or trained technicians, and they allow multiple people to be tested simultaneously. As wearable technology (WT) continues to evolve, it may serve as another tool to determine VO2max and LT as predictive measures of endurance performance.
Technologies 2023, 11, 71
Wearable technology utilization continues to grow in popularity and prevalence, both in recreational and higher-level athletics [5][6][7][8][9][10][11][12][13][14]. As WT becomes more sophisticated, its usage is expected to increase in each population [15]. Wearable technology devices are usually worn around the wrist or chest but can vary in terms of sensor placement [16]. The uses of WT are varied, including medical [17], biomechanical [18], physiological [16], and tactical [19] applications, as well as training decisions [20]. Variables measured or estimated by current WT include heart rate, VO2max, LT, blood oxygen saturation, energy expenditure, sleep quantity, heart rate variability (HRV), and ground contact time. A review of technology in team sports identified four general categories of use for integrated technology: (1) quantifying movement patterns, (2) assessing the demands of training and competition, (3) measuring physiological and metabolic responses, and (4) determining velocity and sprint effort [24]. Wearable devices that can estimate LT using heart rate, muscle oxygen, and sweat sensors have been introduced [25][26][27][28][29]. Another potential benefit of WT is that it can estimate VO2max and LT in a non-invasive manner. The values produced by the technology can be used to make training decisions based on individual physiological responses. Obtaining these values via WT in the field could offer an advantage to athletes and teams in terms of cost and availability [16].
As we have established, VO 2max and LT are critically important values in determining endurance performance, as an increase in these metrics allows athletes to sustain a higher intensity of exercise for longer, which is particularly important for endurance-based performances. While WT has its advantages and disadvantages in estimating these values, up to this point, studies have primarily been conducted using the general population rather than athletic populations. It remains unknown how well this technology works with higher-level athletes compared to the general population. Trained individuals and athletes have higher VO 2max values and reach LT at higher running speeds than untrained individuals. This may present a challenge for wearable technology in determining the VO 2max and LT of highly trained individuals, compared to untrained or lightly trained individuals. While previous research has validated VO 2max [16,30,31] and LT [25,26] in WT using the general population, it is important to determine whether limitations exist in athletic populations for use in collegiate and professional athletics. Recruitment of high-level athletes makes this work unique and will allow athletes, coaches, researchers, and others to better understand the use-case of this technology and who may benefit from its use. Therefore, the purpose of this study was to determine the validity of wearable technology to estimate VO 2max and LT in athletic populations.

Study Design
Prior to data collection, the protocols were approved by the University of Nevada, Las Vegas Institutional Review Board (IRB, 1525606-12). All participants signed an informed consent form and filled out pre-assessment documents prior to completing the study. Data collection occurred over two days and included a laboratory testing day and an outdoor/field testing day. After participants consented to the study, demographic data were obtained (24.24 ± 6.30 years, 11 male, 10 female, 171.68 ± 8.01 cm, 65.14 ± 9.41 kg, BMI = 22.01 ± 1.91 kg/m², 17.04 ± 5.69% fat mass, 39.25 ± 3.26% muscle mass, 42.49 ± 22.96 km per week, all reported as mean ± SD).
Next, a treadmill-based graded exercise test utilizing speed and grade progression every two to three minutes was performed to determine both LT and VO2max. Blood lactate levels were measured via the handheld Lactate Plus analyzer (Nova Biomedical Corp, Waltham, MA, USA). Oxygen consumption was determined by the ParvoMedics TrueOne 2400 metabolic cart (ParvoMedics Inc, Salt Lake City, UT, USA). The lactate threshold was determined by graphing the data and determining the point just prior to an exponential rise in lactate concentration (>1 mmol/L rise) that also corresponded to a final concentration above 4 mmol/L. The onset of blood lactate accumulation (OBLA) was determined by solving the slope-intercept equation for speed when lactate concentration equaled 4 mmol/L. VO2max was determined by taking the highest average oxygen consumption during the graded exercise test for a set timeframe. VO2max values for 4-breath, 15 s, 30 s, and 1 min average timeframes were obtained by the metabolic cart to compare to the wearable device.
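The OBLA and time-averaged VO2max calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet or metabolic cart procedure used in the study: the function names are hypothetical, the line solved for OBLA is assumed to pass through the two test stages that bracket 4 mmol/L, and oxygen consumption is assumed to be sampled at equal time intervals.

```python
from statistics import mean


def obla_speed(speeds, lactates, target=4.0):
    """Speed at which blood lactate reaches `target` mmol/L.

    Assumption: the slope-intercept line is drawn through the two
    consecutive stages whose lactate values bracket the target.
    """
    stages = list(zip(speeds, lactates))
    for (s0, l0), (s1, l1) in zip(stages, stages[1:]):
        if l0 < target <= l1:
            slope = (l1 - l0) / (s1 - s0)   # mmol/L per unit of speed
            intercept = l0 - slope * s0     # lactate at zero speed on this line
            return (target - intercept) / slope
    return None  # lactate never crossed the target during the test


def peak_average(samples, window):
    """Highest mean of any `window` consecutive, equally spaced samples."""
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    return max(
        mean(samples[i:i + window])
        for i in range(len(samples) - window + 1)
    )


# Hypothetical example data (not from the study).
stage_speeds = [10.0, 12.0, 14.0, 16.0]        # km/h
stage_lactate = [1.2, 1.8, 3.0, 6.0]           # mmol/L
print(obla_speed(stage_speeds, stage_lactate))  # about 14.67 km/h

vo2_15s = [40, 50, 58, 60, 59, 55]             # mL/kg/min, one value per 15 s
print(peak_average(vo2_15s, 2))                # 30 s average
print(peak_average(vo2_15s, 4))                # 1 min average
```

Longer averaging windows can only lower or preserve the peak value, which is one reason the device estimate may agree with some laboratory timeframes and not others.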
After laboratory values were obtained for LT and VO2max, participants returned between two and seven days later (5.56 ± 2.53 days) to complete the outdoor testing session. The outdoor run was conducted in one of two places, the University track or a flat area of campus, depending on track availability. Ten participants ran on the track, and eleven completed the protocol on campus. The altitude was ~686 m, and the average temperature during outdoor testing was 22.01 ± 9.57 °C. The outdoor testing involved completing two separate runs while wearing the fitness tracker watch (Garmin fēnix 6®, Garmin Ltd., Olathe, KS, USA) and accompanying heart rate monitor (Garmin HRM-Run®). A factory reset of the watch was performed before each test so that previous data did not influence the estimate of VO2max or LT. The first run was a 10-15 min run at above 70% of the estimated maximum heart rate (MHR). This gave the device enough data to estimate VO2max, using a linear extrapolation of heart rate (HR) and running speed [21,22]. For this run, the average distance, time, pace, and HR were 2.52 ± 0.37 km, 12.63 ± 3.19 min, 5.1 ± 1.43 min/km, and 154.8 ± 10.28 bpm, respectively. After the 10-15 min run, participants were given up to 10 min to rest before the next run, which was a graded exercise test guided by the watch. Participants were provided with a HR range via the watch and instructed to run at a pace that kept their HR within that range. The HR window progressively increased every 3-4 min, and participants were required to speed up to match the new HR window. This continued until the watch concluded the test or the participant voluntarily stopped, which concluded the outdoor data collection. During this progressive exercise test, the watch uses HRV to identify LT [23].
The average distance, time, pace, and HR for the graded exercise test were 3.42 ± 0.98 km, 16.71 ± 5.41 min, 4.99 ± 1.45 min/km, and 164.25 ± 9.81 bpm, respectively. If the fēnix 6 was not able to produce an estimate, either because the participant had to end early or the watch failed to produce an estimate for unknown reasons, participants were asked to return on a different day to perform the outdoor test again. If the device was unable to produce an estimate after two different attempts, participants were not tested a third time. There were two participants for whom the watch was unable to generate an estimate of LT. One participant's data were lost because they were not recorded before the watch was reset. Therefore, while the total number of participants was 21, LT analysis was performed with 18 subjects, and VO2max analysis with 20.

Participants
For this study, apparently healthy individuals who exercised regularly (>3 times per week) were recruited. Of those that were tested, 21 scored in the 95th percentile or above for their VO2max values, based on their age and biological sex, and were included in the athletic population dataset for the current investigation.

Data Analysis
Data for lactate concentration and speed were input directly into Google Sheets (Alphabet Inc., Mountain View, CA, USA), and further analysis to determine LT and OBLA for each participant was completed within Google Sheets. VO2max and the associated percentile for each timeframe (4-breath, 15 s, 30 s, and 1 min) were determined by the ParvoMedics software and input into Google Sheets. All granular calculations were completed within Google Sheets. All hypothesis testing, summary statistics, validation measures, and figures were completed and generated in jamovi (jamovi project, version 2.2, https://www.jamovi.org/, accessed on 22 March 2023). These include ANOVAs with post hoc pairwise comparisons with Tukey adjustments for multiple comparisons (when appropriate), descriptive statistics, error analysis (mean absolute percentage error), correlation analysis (Pearson's r, Lin's Concordance Correlation Coefficient [CCC]), equivalence testing (TOST Paired Samples Test), and bias assessment (Bland-Altman analysis). TOST test lower and upper bounds were set at −0.5 and +0.5 Cohen's D for each test. Data analysis for VO2max was completed by comparing the fēnix 6 estimates of VO2max to each laboratory timeframe. Data analysis for LT was completed by comparing the fēnix 6 estimates of speed at LT and HR at LT to the laboratory values (speed at LT, speed at OBLA, and HR at LT). The validity criteria were set in advance: a device estimate that produced a CCC ≥ 0.7 and a MAPE < 10% was considered valid [16].
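For readers who wish to reproduce the validation measures outside jamovi, the following sketch computes MAPE, Lin's CCC, and Bland-Altman bias with limits of agreement, and then applies the pre-determined validity rule. It is an illustration under stated assumptions rather than the analysis pipeline used here: the function names are hypothetical, Lin's CCC is computed with population (1/n) variances as in Lin's original definition, and the limits of agreement are taken as bias ± 1.96 SD of the paired differences.

```python
from statistics import mean, pvariance, stdev


def mape(criterion, device):
    """Mean absolute percentage error of device values vs. criterion values."""
    return 100 * mean(abs(d - c) / c for c, d in zip(criterion, device))


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return 2 * covariance / (pvariance(x) + pvariance(y) + (mx - my) ** 2)


def bland_altman(criterion, device):
    """Return (bias, lower limit, upper limit) for device minus criterion."""
    diffs = [d - c for c, d in zip(criterion, device)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # assumed 95% limits of agreement
    return bias, bias - half_width, bias + half_width


def meets_validity_criteria(criterion, device, ccc_min=0.7, mape_max=10.0):
    """Validity rule adopted in this study: CCC >= 0.7 and MAPE < 10%."""
    return (lins_ccc(criterion, device) >= ccc_min
            and mape(criterion, device) < mape_max)


# Hypothetical VO2max values in mL/kg/min (not from the study).
lab = [50.0, 55.0, 60.0, 65.0]
watch = [52.0, 57.0, 62.0, 67.0]
print(mape(lab, watch), lins_ccc(lab, watch), bland_altman(lab, watch))
print(meets_validity_criteria(lab, watch))
```

Unlike Pearson's r, the CCC penalizes a constant offset between methods through the squared difference in means in its denominator, which is why both statistics are reported.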

VO 2max
The 21 participants used for this analysis had an average VO2max percentile of 98.24 ± 1.3%, based on the 30 s averaged VO2max values. The one-way ANOVA for VO2max at each time showed a significant difference for the global test (F = 5.59, p < 0.001, η2 = 0.19). Further post hoc pairwise comparisons with Tukey adjustments for multiple comparisons were performed. The fēnix 6 estimate was significantly different from the 4-breath average (t = −4.52, p < 0.001, Cohen's D = 1.43), but not different for any of the other time comparisons (see Table 1). Error analysis showed that the fēnix 6 VO2max estimate had a MAPE of less than 10% when compared to the 30 s and 1 min averaged time parameters (see Table 1). Correlation analysis produced a CCC ≥ 0.7 for the 1 min averaged time only (see Table 1). Equivalence testing via TOST test was violated for all four time parameters (see Table 1 and representative plots in Figure 1). Bland-Altman bias values and 95% confidence intervals can be found in Table 1, and associated plots can be found for all time parameters in Figure 2.

Figure 1.
A representative sampling of TOST test results for VO2max and LT data. Far left = Lab HR at LT to fēnix 6 HR at LT, middle left = speed at OBLA to fēnix 6 LT speed, middle right = fēnix 6 VO2max to 30 s avg VO2max, far right = fēnix 6 VO2max to 1 min avg VO2max. Upper and lower bounds set at +0.5 and −0.5 Cohen's D. All tests shown violated equivalence testing parameters except lab HR at LT to fēnix 6 HR at LT (far left).

Lactate Threshold
The one-way ANOVA for speed at LT showed no significant difference for the global test (F = 1.32, p = 0.28, η2 = 0.07). HR at LT did not differ between the laboratory measure and the fēnix 6 (t = 0.261, p = 0.797, Cohen's D = 0, see Table 2). Error analysis showed that the fēnix 6 had a MAPE below 10% for all three parameters (see Table 2). Correlation analysis produced a CCC ≥ 0.7 for both speed parameters but not HR (see Table 2). Equivalence testing via TOST test was violated for both speed parameters but was met for HR (see Table 2 and representative plots in Figure 1). Bland-Altman plots can be found for speed and HR parameters in Figure 3.
Table 2. Lactate threshold descriptive and validation statistics results.

Discussion
In this study, the validity of the VO2max and LT estimates by wearable technology in athletic populations was tested against gold-standard laboratory tests. Studies have tested these metrics amongst the general population; however, to our knowledge, the validity among athletic subjects has not been reported. The physiology of untrained and trained individuals is different, with trained individuals having higher VO2max and LT values. This introduces unique challenges to technology seeking to provide accurate estimates of these physiological metrics. In the current investigation, we observed mixed, although generally acceptable, results for the fēnix 6. The estimate for VO2max was accurate and valid compared to 1 min averaged VO2max values. The LT estimates were accurate and valid when comparing speed at LT as well as the speed at OBLA. However, HR at LT was not considered valid.
VO2max is a crucial aspect of endurance performance [1,3,4], with many arguing that it is the most important measure (though not the only one, and potentially only the second most important) [2]. Knowing VO2max allows coaches and athletes to tailor training intensity specifically for an individual. Tracking aerobic capacity over time can also give insights into the effectiveness of the training program. However, gold-standard measurements are expensive, require technicians with proper training to administer the test, and require athletes to take a day off from training to complete the testing protocol. These drawbacks make WT a desirable alternative to standard laboratory testing: it is cost-effective and can track changes in VO2max over time. Several devices estimate maximal aerobic capacity from HR measured during a submaximal running or cycling test. Thus, the estimates can be obtained during normal training without needing to take a day off for testing. This provides efficient and potentially very useful training feedback for athletes and coaches.
Previous literature studying the VO2max and LT estimates in WT has shown acceptable accuracy when compared to laboratory measurements. While no standardized statistical analyses for WT validation have been established, most investigations perform a form of error analysis and correlation analysis. Several wearable devices have been shown to have acceptable accuracy in their VO2max estimates, including the Garmin fēnix 3 HR [32], Garmin Forerunner 920XT [31], and the PulseOn wrist-based monitor [30] (PulseOn Oy, Neuchâtel, Neuchatel, Switzerland). Meanwhile, others have not had acceptable accuracy in their VO2max estimate, including the Polar V800 [31] (Polar Electro, Kempele, Oulu, Finland). Investigations into WT to determine LT have had positive results, with the Humon Hex (Humon, Cambridge, MA, USA) and the BSX Insight (BSX Athletics, Austin, TX, USA) both being found sufficiently accurate in their estimates [25,26]. Thus, the performance of the fēnix 6 surpasses that of some previous devices for VO2max. Additionally, having the ability to determine VO2max and LT in athletic populations using the same device, rather than two different devices, is a benefit to athletes and coaches. However, it seems likely that the technology used in previous investigations to determine LT via measuring muscle oxygen levels could be utilized to produce a more accurate estimate of LT when used in conjunction with HR. Based on the current findings, the VO2max estimate appears to be accurate for athletic populations (those who scored above the 95th percentile for VO2max for their age and sex). The Bland-Altman plots revealed a tendency toward greater error as VO2max increased, so future research should be directed to validating devices in individuals with very high aerobic capacity (>60 mL/kg/min).
While it has been proposed that VO2max is the most important indicator for endurance performance (or potentially the second most important), LT is widely accepted to be the next most important factor for predicting success in endurance sports [1][2][3][4]. Knowledge of an athlete's LT may be even more relevant to building a training protocol than VO2max, as it is easy to obtain a speed or HR associated with LT, which can be used to set intensity zones. Therefore, WT capable of estimating LT accurately will provide great benefits to athletes and coaches. Similarly, WT that can provide data regarding the changes in LT throughout the course of regular training allows for "fine-tuning" of an athlete's training protocol, even within a microcycle. However, the Bland-Altman plots from the data obtained in the current investigation show a tendency towards bias for LT estimates of those with very high LT values, as well as for those with a HR at LT that differs substantially from ~175 bpm. This is a limitation that athletes and coaches should be aware of, and it may also represent an area of interest for researchers to evaluate in the future.
As stated earlier, this study is unique as it tested the VO 2max and LT estimates of WT in an athletic population. As the general population had previously been studied, this provides a better resolution as to the appropriate use cases of WT. While there is no wide consensus in terms of what validity thresholds should be used [16], commonly used thresholds have been a MAPE < 10% and CCC ≥ 0.7, which have been adopted for this analysis. These thresholds seem to be appropriate for the general population; however, coaches and athletes may want to establish more stringent validity criteria for use in collegiate and professional athletics. As researchers continue to independently evaluate the validity and reliability of these devices, thresholds need to be established. A tiered threshold may be of value to help understand the appropriate use cases for different devices. While thresholds have not been established, even tests used to determine validity vary across the published literature (and requested tests by reviewers can add to the variability). Appropriate analytical techniques have been suggested, such as error analysis, correlation analysis, and equivalence tests [16,33]. We have included all in the current investigation; however, it is very rare that researchers include equivalence testing in their validation analyses. As such, it is unclear what cutoffs should be used to determine validity, which is why we have not factored in the performance on the TOST test into the validity decision. We utilized a TOST test with upper and lower bounds of +0.5 and −0.5 Cohen's D above, but as equivalence testing becomes more prevalent in the literature, those values may need to change.
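To make the role of the equivalence bounds concrete, the sketch below runs a paired TOST with bounds of ±0.5 standardized units. It is a simplified illustration rather than the jamovi procedure: the function name is hypothetical, the bounds are assumed to be standardized by the SD of the paired differences, and the one-sided critical t value must be supplied by the caller (for example, about 1.729 for 19 degrees of freedom or 1.740 for 17 at α = 0.05) because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def tost_paired(criterion, device, t_crit, bound=0.5):
    """Paired two one-sided tests with bounds of +/- `bound` SD units.

    Assumption: bounds are scaled by the SD of the paired differences.
    Requires at least two pairs and a non-zero SD of differences.
    """
    diffs = [d - c for c, d in zip(criterion, device)]
    n = len(diffs)
    mean_diff, sd_diff = mean(diffs), stdev(diffs)
    std_error = sd_diff / sqrt(n)
    lower, upper = -bound * sd_diff, bound * sd_diff   # bounds in raw units
    t_lower = (mean_diff - lower) / std_error  # tests mean_diff > lower bound
    t_upper = (mean_diff - upper) / std_error  # tests mean_diff < upper bound
    return {
        "t_lower": t_lower,
        "t_upper": t_upper,
        # Equivalence requires rejecting both one-sided null hypotheses.
        "equivalent": t_lower > t_crit and t_upper < -t_crit,
    }


# Hypothetical data (not from the study): 16 pairs, differences of +1 and -1.
lab = [float(v) for v in range(40, 56)]
watch = [c + (1.0 if i % 2 == 0 else -1.0) for i, c in enumerate(lab)]
print(tost_paired(lab, watch, t_crit=1.753))  # 1.753: one-sided 5%, 15 df
```

Under this standardization the two statistics reduce to (d + 0.5)√n and (d − 0.5)√n, where d is the standardized mean difference, so with n = 18 to 20 even a small systematic offset is enough to violate equivalence. Wider or narrower bounds change the verdict directly, which is why agreed cutoffs matter.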

Limitations
As the current investigation evaluated the validity of only a specific wearable device (the Garmin fēnix 6), any extrapolation to other devices should be avoided. While the purpose was to examine the validity of this device in athletic populations (>95th percentile in terms of aerobic capacity), readers should not use this study to determine the validity among the general population. We recommend looking towards previously published work for that population.

Conclusions
In summary, we tested the estimates of VO2max and LT in wearable technology (Garmin fēnix 6) against gold-standard laboratory values. Error and correlation were assessed to determine the overall validity. The predetermined validity criteria were a MAPE < 10% and a CCC ≥ 0.7. This device was determined to be valid for VO2max estimates in athletic populations when compared against 1 min averaged VO2max values. It was also found to be valid for determining speed at LT and OBLA. Therefore, this device may be used to determine the VO2max and lactate threshold values for athletic individuals if laboratory values cannot be obtained.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of the University of Nevada, Las Vegas (protocol number: 1525606-11, date of approval: 08-11-2022).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Not applicable.