Improved VO2max Estimation by Combining a Multiple Regression Model and Linear Extrapolation Method

Maximal oxygen consumption (VO2max) is an important health indicator that is often estimated using a multiple regression model (MRM) or linear extrapolation method (LEM) with the heart rate (HR) during a step test. Nonetheless, both methods have inherent problems. This study investigated a VO2max estimation method that mitigates the weaknesses of these two methods. A total of 128 adults completed anthropometric measurements, a physical activity questionnaire, a step test with HR measurements, and a VO2max treadmill test. The MRM included step-test HR, age, sex, body mass index, and questionnaire scores, whereas the LEM included step-test HR, predetermined constant VO2 values, and age-predicted maximal HR. Systematic differences between estimated and measured VO2max values were detected using Bland–Altman plots. The standard errors of the estimates of the MRM and LEM were 4.15 and 5.08 mL·kg−1·min−1, respectively. The range of 95% limits of agreement for the LEM was wider than that for the MRM. Fixed biases were not significant for either method, and a significant proportional bias was observed only in the MRM. The MRM bias was eliminated by applying the LEM when the MRM-estimated VO2max was ≥45 mL·kg−1·min−1. In conclusion, substantial proportional bias in the MRM may be mitigated using the LEM within a limited range.


Introduction
Cardiorespiratory fitness (CRF) strongly influences disease incidence and is therefore an important health indicator [1][2][3]. Nevertheless, CRF is not routinely assessed in clinical practice [4], primarily because of assessment challenges. The measurement of maximal oxygen consumption (VO 2max ), the gold standard metric for CRF evaluation, is time-intensive and requires skilled examiners as well as specialized equipment (e.g., a treadmill/cycling ergometer and a gas analyzer). A field-based exercise test, such as the 20 m shuttle run [5], is an alternative for facilitating efficient mass screening of CRF. This validated shuttle run test is convenient for children because it can be performed in conjunction with a sporting event or physical education class in the school gymnasium or sports field; in contrast, it is much less convenient for adults because such facilities are rarely available in the workplace.
A questionnaire-based, non-exercise VO2max estimation [6][7][8][9] has been developed for population-based CRF assessments. However, some previous studies [10,11] have shown that such non-exercise methods have insufficient accuracy, suggesting that biological data such as the heart rate (HR) should be included in VO2max estimation because the participants' subjective responses alone may not account for the variance in VO2max. Therefore, the HR during a simple exercise test such as the step test has the potential to be applied to CRF evaluation in adults, as the test can be performed without skilled examiners, expensive equipment, a large exercise space, or maximal exercise effort from the participants.
A multiple regression model (MRM) is often used in the development of estimated VO 2max (eVO 2max ) equations, with measured VO 2max (mVO 2max ) as the response variable and clinicodemographic variables and/or other measurement variables, such as HR and physical activity (PA) parameters, as the explanatory variables [6][7][8][9][12][13][14]. Two studies [15,16] showed that the addition of HR during a step test to a questionnaire-based, non-exercise MRM improved the accuracy of VO 2max estimation. On the other hand, the Chester step test [17], which is considered an appropriate evaluation tool for CRF in adults owing to its acceptable reliability [18,19], uses the linear extrapolation method (LEM). The LEM is based on the use of HR along with predetermined VO 2 values as constants during a stepping exercise. Subsequently, the best-fit line for HR and VO 2 is drawn, and the line is extrapolated up to eVO 2max , which corresponds to the age-predicted maximal HR.
The MRM and LEM play an important role in VO2max estimation; however, each of these two methods has its own inherent problems. In particular, the MRM-predicted values are distorted at regression edges, creating systematic errors [20]. Typically, the MRM either underestimates the VO2max in participants with high CRF or overestimates the VO2max in those with low CRF [7][8][9][14]. On the other hand, the LEM usually does not introduce systematic errors but yields a larger error variance than the MRM [14], which is likely due to the inaccuracy of the predicted maximal HR [21,22]. A previous study [16] reported that the addition of HR data from a step test to a questionnaire-based, non-exercise MRM could improve the estimation power of eVO2max in cross-sectional analyses. However, VO2max estimation using the developed MRM was problematic in detecting increased VO2max induced by lifestyle changes because the predicted values were distorted at the upper limit, resulting in a VO2max underestimation among participants whose actual VO2max was improved by their exercise lifestyle. Additionally, the study showed that the apparent systematic errors (underestimation of high values) seen in the MRM were absent in the LEM, that the LEM could detect the increased VO2max induced by the exercise intervention, and that the estimated error of the LEM was generally greater than that of the MRM.
Therefore, a comparison of accuracy between VO 2max estimation using the MRM and VO 2max estimation using the LEM on data from the same participants may facilitate the construction of a combined model with the greatest possible accuracy. We speculated that the most accurate estimate of VO 2max could be achieved through the use of a combined model in which the LEM corrected for VO 2max underestimation by the MRM in participants with a high level of fitness.
In the present study, we applied the eVO 2max MRM equation, including questionnaire scores and step-test HRs, which was developed in a previous study [16], to participants who were different from those in the equation development study and investigated the cross-validity of this equation. Subsequently, using this dataset, we aimed to investigate an accurate VO 2max estimation method that combines the advantages of both MRM and LEM and mitigates the weaknesses of these two methods.

Participants
Adults aged 30-60 years who were residents of the Tokyo area, were working part-time or full-time at least three days per week, did not use any drugs that influence the autonomic nervous system (e.g., β-blockers), and had no medical conditions precluding VO2max testing were included as study participants. The participants were recruited via website advertisements between June 2016 and March 2022. A total of 137 working adults (65 women and 72 men) participated in this study. The participants visited our laboratory on two occasions, with an interval of 1 week between these visits. On the first visit, the participants underwent anthropometric measurements and a step test and then responded to a questionnaire. On the second visit, the participants underwent a treadmill exercise test for actual VO2max measurement. Nine participants were excluded because of insufficient data for analyses. Consequently, 128 participants (60 women and 68 men) were included in the analysis. This study was conducted in accordance with the principles embodied in the Declaration of Helsinki. The study protocol was reviewed and approved by the Ethics Committee of the National Institute of Occupational Safety and Health, Japan (approval ID: H2744; date of approval: 31 March 2016). Written informed consent was obtained from all participants after providing a full explanation of the study aims and research protocols.

Anthropometric Measurements
Height was measured once to the nearest 0.1 cm, whereas body weight was measured once to the nearest 0.1 kg. The body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters.

Questionnaire
The questionnaire used in this study was previously validated for VO 2max estimation [7]. It consists of several questions regarding the frequency, duration, and intensity of PA, yielding a total PA score of 0-44 points. A higher PA score suggested a higher VO 2max . The questions and scores assigned to each are presented in Table S1.

Step Test
The step test used in this study, named the National Institute of Occupational Safety and Health, Japan (J-NIOSH) step test (JST), was previously validated for VO 2max estimation [14]. It comprises three 1 min stepping exercise stages followed by two 1 min recovery stages. In the exercise stages, the participants were required to step up and down a 30 cm step in time with a metronome beating at four times the step rate. The initial step rate (stage 1) was 15 steps/min (60 beats per min [bpm]) and was increased by 5 steps/min (20 bpm) for each subsequent stage (to a final step rate of 25 steps/min or 100 bpm on the metronome during stage 3). At the end of stage 3, the participants rested in the sitting position, and HR recordings were obtained at 1 and 2 min during the recovery stage. The protocol for the JST allowed the participants to skip stage 3 if the following two criteria were met at the end of stage 2: (i) HR of 80% of the age-predicted maximal HR (i.e., 220-age in years) and (ii) a rating of perceived exertion (RPE) of 17 on the Borg scale. However, none of the participants skipped stage 3 in this study. The HR index was calculated as follows: HR index = (HR at exercise stage 3 − HR at exercise stage 1) + (HR at recovery stage 1 − HR at recovery stage 2).
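The stage timing and the HR index defined above can be written out as a short sketch. The function and parameter names are hypothetical, and the heart rates in the example are made-up values chosen only to show the arithmetic.

```python
def metronome_bpm(step_rate_per_min):
    """Metronome tempo for a stage: four beats per complete step."""
    return 4 * step_rate_per_min


def hr_index(hr_exercise_1, hr_exercise_3, hr_recovery_1, hr_recovery_2):
    """HR index = (HR stage 3 - HR stage 1) + (HR recovery 1 - HR recovery 2)."""
    return (hr_exercise_3 - hr_exercise_1) + (hr_recovery_1 - hr_recovery_2)


# Step rates of the three exercise stages (steps/min) and their metronome tempos.
stage_step_rates = [15, 20, 25]
print([metronome_bpm(rate) for rate in stage_step_rates])  # [60, 80, 100]

# Hypothetical heart rates (bpm) for one participant.
print(hr_index(hr_exercise_1=100, hr_exercise_3=140,
               hr_recovery_1=110, hr_recovery_2=95))  # 55
```

Both terms of the index are signed differences, so the first term grows with the HR rise during stepping and the second with the HR drop between the two recovery readings.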

mVO2max
In order to directly measure the VO2max, the participants performed an exhaustion-limited graded exercise test on a treadmill (AR200; Minato Medical Science, Osaka, Japan) using the Bruce protocol. During the test, the ventilation and expired gases were continuously measured using an open-circuit computerized indirect calorimeter (AE-310S; Minato Medical Science, Osaka, Japan) that was calibrated prior to each trial. The HR was monitored using an electrocardiogram (Life Scope, Nihon Kohden, Tokyo, Japan). The RPE was recorded using the Borg 6-20 scale. The highest 30 s average VO2 value was defined as the VO2max value when three of the following four criteria were satisfied: (i) the respiratory exchange ratio exceeded 1.10; (ii) the maximal HR was within 10 bpm of the age-predicted maximum (i.e., 220 − age in years); (iii) the RPE exceeded 17; and (iv) the VO2 reached a plateau despite further increases in workload [23,24].
The LEM eVO2max was calculated based on a previous study [16]. Briefly, the LEM used a predetermined constant VO2 value for each stage of the step test. The constant values were 4, 13, 19, 22, 17, and 8 mL·kg−1·min−1 for females and 4, 14, 20, 23, 18, and 8 mL·kg−1·min−1 for males at rest, during exercise stages 1-3, and during recovery stages 1 and 2, respectively. A scatter plot was constructed for each participant, with the six predetermined constant VO2 values (mL·kg−1·min−1) for the stages of the JST (at rest, during exercise stages 1-3, and during recovery stages 1 and 2) on the x-axis and the corresponding HR (bpm) on the y-axis. The best-fit line was calculated for the scatter plot, and the VO2 value (x-axis) corresponding to the participant's age-predicted maximal HR (y-axis) was extrapolated as eVO2max. The relation "208 − 0.7 × age" [25] was used to calculate the age-predicted maximal HR.
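The extrapolation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the best-fit line is an ordinary least-squares regression of HR on the constant VO2 values, the function names are hypothetical, and the heart rates in the example are invented.

```python
# Predetermined constant VO2 values (mL/kg/min): rest, exercise stages 1-3,
# recovery stages 1-2, as listed in the text.
VO2_CONSTANTS = {
    "female": [4, 13, 19, 22, 17, 8],
    "male": [4, 14, 20, 23, 18, 8],
}


def age_predicted_max_hr(age_years):
    """Age-predicted maximal HR from the relation 208 - 0.7 x age."""
    return 208 - 0.7 * age_years


def lem_evo2max(hr_by_stage, sex, age_years):
    """Extrapolate the HR-VO2 line to the age-predicted maximal HR.

    Assumption: the best-fit line is an ordinary least-squares fit of
    HR (y) on the constant VO2 values (x).
    """
    x = VO2_CONSTANTS[sex]
    y = list(hr_by_stage)
    if len(y) != len(x):
        raise ValueError("expected one HR value per stage (six in total)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("HR does not increase with VO2; cannot extrapolate")
    intercept = mean_y - slope * mean_x
    # Solve HRmax = intercept + slope * VO2 for VO2.
    return (age_predicted_max_hr(age_years) - intercept) / slope


# Hypothetical 40-year-old man whose HR lies exactly on HR = 60 + 4 * VO2.
hr = [76, 116, 140, 152, 132, 92]
print(round(lem_evo2max(hr, "male", 40), 1))  # 30.0
```

Because the result is obtained by dividing by the fitted slope, a small error in the age-predicted maximal HR shifts the estimate in proportion to 1/slope, which is one way to see why the LEM error variance is sensitive to that prediction.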

Data Analysis
Systematic differences between mVO2max and eVO2max were detected using Bland-Altman plots and linear regression analyses. Pearson correlation coefficients (r) were calculated to evaluate the relationship between mVO2max and eVO2max. Error statistics, namely the constant error (CE), standard error of the estimate (SEE), and total error (TE), each computed from Y (mVO2max) and Ŷ (eVO2max), were compared among the estimation methods. All statistical analyses were performed using SAS version 9.4 (SAS Institute Japan, Tokyo, Japan) and Prism 9 (GraphPad Software, San Diego, CA, USA), with statistical significance set at a two-tailed p-value of <0.05.
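As a rough illustration of the Bland-Altman quantities used in this study, the sketch below computes the fixed bias, the 95% limits of agreement, and the correlation used to screen for proportional bias. Several points are assumptions: the difference is taken as estimated minus measured, the limits use the usual bias ± 1.96 SD form, and significance tests are omitted. The data are hypothetical, and the analyses in this study were run in SAS and Prism rather than with this code.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(measured, estimated):
    """Fixed bias, 95% limits of agreement, and proportional-bias correlation.

    Assumption: difference = estimated - measured.
    """
    diffs = [e - m for m, e in zip(measured, estimated)]
    means = [(e + m) / 2 for m, e in zip(measured, estimated)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return {
        "fixed_bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        # Correlation between the differences and the pairwise means.
        "proportional_r": pearson_r(means, diffs),
    }


# Hypothetical data in which high values are underestimated.
measured = [30, 35, 40, 45, 50]
estimated = [32, 36, 40, 44, 48]
result = bland_altman(measured, estimated)
print(round(result["fixed_bias"], 2), round(result["proportional_r"], 2))
```

In this toy dataset the mean difference is zero, yet the differences fall steadily as fitness rises, which is the pattern a proportional bias describes.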

Results
The participants' anthropometric variables, treadmill-measured mVO2max, questionnaire PA score, and JST HR index are summarized separately for women and men (Table 1). The participants of both sexes generally had normal body sizes and CRF levels.
Table 1 note: Values are presented as mean ± standard deviation. BMI, body mass index; HR, heart rate; PA, physical activity; and VO2max, maximal oxygen consumption.
The Bland-Altman plots revealed systematic differences between mVO 2max and eVO 2max . The MRM ( Figure 1A) exhibited a non-significant fixed bias (−0.47 mL·kg −1 ·min −1 , p = 0.21) and a significant proportional bias (r = −0.30, p < 0.01), which increased at higher VO 2max values. The LEM (Figure 2A) also showed both fixed bias (0.89 mL·kg −1 ·min −1 , p = 0.10) and proportional bias (r = 0.15, p = 0.10); however, neither was significant. The range of 95% limits of agreement (LoA) for the LEM was clearly wider than that for the MRM, suggesting a lower estimation accuracy. Indeed, the regression analysis showed a strong correlation between mVO 2max and eVO 2max values using both the MRM (r = 0.78, Figure 1B) and LEM (r = 0.64, Figure 2B); nonetheless, the correlation was stronger when the MRM was used. As shown in Figure 1B, the MRM also produced eVO 2max values under the ideal line (Y = X) for 9 out of 11 participants with the highest fitness (VO 2max ≥ 45.0 mL·kg −1 ·min −1 , indicated by black triangles in all figures). On the other hand, several values derived using the LEM were near the ideal line for the same 11 participants ( Figure 2B). Thus, the MRM tended to underestimate the VO 2max in fit individuals, whereas the LEM showed no proportional bias but generally yielded less accurate VO 2max estimates.
We speculated that the most accurate eVO2max could be obtained through the use of a combined model in which the LEM corrected for VO2max underestimation by the MRM in participants with high fitness. The procedure was as follows: (1) the eVO2max for each participant was calculated using the MRM, and (2) if the eVO2max was ≥45.0 mL·kg−1·min−1, then it was recalculated using the LEM. Figure 3 shows the Bland-Altman plot (A) and scatter plot (B) constructed using this procedure. With this combined method, neither the fixed bias (−0.11 mL·kg−1·min−1, p = 0.76) nor the proportional bias (r = −0.05, p = 0.55) was significant, and the correlation between mVO2max and eVO2max was still strong (r = 0.80).
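The two-step combined procedure amounts to a simple switch on the MRM estimate, sketched below. The function name is hypothetical, and both estimates are passed in as inputs because the MRM equation itself is not reproduced in this text. The sketch assumes the ≥45 mL·kg−1·min−1 form of the cut-off given in the abstract and Discussion.

```python
CUTOFF = 45.0  # mL/kg/min, the cut-off applied to the MRM estimate


def combined_evo2max(mrm_estimate, lem_estimate, cutoff=CUTOFF):
    """Return the LEM estimate when the MRM estimate reaches the cut-off.

    Step 1: the MRM estimate is computed for every participant.
    Step 2: if that estimate is >= cutoff, the LEM estimate replaces it.
    """
    if mrm_estimate >= cutoff:
        return lem_estimate
    return mrm_estimate


# Hypothetical (MRM, LEM) estimate pairs for three participants.
pairs = [(38.2, 41.0), (45.0, 50.3), (47.1, 52.6)]
print([combined_evo2max(mrm, lem) for mrm, lem in pairs])  # [38.2, 50.3, 52.6]
```

Note that the switch is driven by the MRM estimate, not by the measured value, so it can be applied in the field where no measured VO2max is available.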

Table 2 compares the other error statistics (CE, SEE, and TE) among the three estimation methods. The CE was not significant for any of the three methods; however, a larger difference between mVO2max and eVO2max was observed with the LEM than with the other two methods. The SEE decreased in the order of LEM, MRM, and combined method. This tendency was also observed for the TE. Additionally, the SEE and TE values were similar for the MRM and combined method, whereas the TE was relatively higher than the SEE for the LEM.

Discussion
The high correlation coefficient (r = 0.78) between the MRM eVO2max and mVO2max indicated that the equation derived from 173 adults in a previous study [16] could be applied to an independent participant group (i.e., good stability of the equation). The MRM yielded greater estimation accuracy; however, a substantial proportional bias was also detected, resulting in a remarkable VO2max underestimation in participants with high fitness (Figure 1). On the other hand, the LEM demonstrated lower estimation accuracy than the MRM but no proportional bias; that is, the LEM did not underestimate VO2max in participants with high fitness (Figure 2). Interestingly, the combined method (Figure 3), which replaces the MRM with the LEM for predicted values ≥45 mL·kg−1·min−1, yielded the best result among the three methods, suggesting that the inaccuracies induced by the MRM and LEM could be partly mitigated by optimizing the statistical model. Thus, VO2max could be predicted with a relatively high degree of accuracy from a simple combination of morphometric measurements, questionnaire data, and step-test HR by combining MRM and LEM calculations.
One metabolic equivalent (MET) change in CRF is a meaningful value for disease prevention [26]. Hence, it would be a reasonable policy that the target SEE of VO 2max estimation models should be ≤1.0 MET (3.5 mL·kg −1 ·min −1 ). Peterman et al. [11] reported SEEs ranging from 4.1 to 6.2 mL·kg −1 min −1 for non-exercise models, indicating substantial variation in the validity of these questionnaires for VO 2max estimation. Some studies [7,8] suggested that eVO 2max questionnaires should include items assessing the frequency, duration, and intensity of PA and that, owing to its greater influence on eVO 2max , the intensity score should be more heavily weighted than the frequency and duration scores. One of the previous studies [7] that used the same questionnaire as the present study reported an SEE of 4.29 mL·kg −1 ·min −1 for the non-exercise MRM (including age, sex, BMI, and the questionnaire's PA score), as well as excellent test-retest reliability (intraclass correlation coefficient [ICC], 0.87; 95% confidence interval [CI], 0.82-0.91) for the PA score. Meanwhile, several studies have shown that evaluating "changes" in VO 2max is difficult to perform using a questionnaire-based, non-exercise estimation method. For instance, Peterman et al. [27] found limited accuracy for 27 non-exercise prediction equations as compared with that for directly measured values in a cohort of 987 healthy adults. Similarly, Lannoy and Ross [10] reported limited VO 2peak estimation accuracy, as compared with that for mVO 2peak , in 163 adults participating in a 24 week exercise intervention. These studies suggest that biological measures such as the HR may be required for accurate VO 2max estimation.
The HR during the step test is often used for VO 2max estimation. The increase in HR during a stepping exercise will be lower and will more rapidly return to baseline in individuals with high fitness than in those with low fitness [14]. The Queen's College step test [28], the Astrand-Ryhming step test [29], and the Chester step test [17] assume that the HR during or soon after exercise is lower in individuals with high fitness. On the other hand, others, such as the Harvard step test [30] and YMCA step test [31], assume that the HR decreases more rapidly during recovery in individuals with high fitness. However, the use of recovery HR alone may not be sufficiently sensitive for precisely predicting VO 2max [14,32]. The JST's HR index in the present study captures the HR responses for VO 2max estimation both during the stepping exercise and during the recovery period. A recent study [14] reported that the SEE for the JST (4.54 mL·kg −1 ·min −1 ) was relatively lower than the SEE for the Chester step test (4.99 mL·kg −1 ·min −1 ) and that the JST's HR index demonstrated fair-to-good test-retest reliability (ICC, 0.65; 95% CI, 0.53-0.74).
Recently, Webb et al. [15] reported that the addition of both the questionnaire score and step-test HR to the MRM (including sex and body weight) improved the accuracy of eVO 2max . Similarly, Matsuo et al. [16] showed that the addition of the JST's HR index to a questionnaire-based, non-exercise MRM (including age, sex, BMI, and the questionnaire's PA score) improved the accuracy of eVO 2max . Nevertheless, the study [16] suggested that this "synergistic effect" of the questionnaire's PA score and step-test HR for MRM eVO 2max was not effective in detecting changes in mVO 2max in an intervention experiment. That is, the study showed that mVO 2max , which was measured using a cycling ergometer and gas analyzer, increased by approximately 20% during the exercise training period and decreased by approximately 10% during the subsequent detraining period. The MRM eVO 2max apparently underestimated the increase in mVO 2max owing to a substantial proportional bias; therefore, the MRM could not detect changes in the CRF along with lifestyle modifications. Additionally, the study also showed that the LEM eVO 2max presumably detected the increase in mVO 2max while yielding a larger error variance than the MRM.
The results of the present study are consistent with the findings of a previous study [16]. The MRM exhibited good VO 2max estimation accuracy in participants with average or low fitness but significantly underestimated the VO 2max in participants with high fitness (Figure 1). On the other hand, with the LEM (Figure 2), the eVO 2max values of participants with high fitness were distributed near the ideal line (i.e., no underestimation); however, the LEM yielded larger errors across the VO 2max range, likely because of greater errors in the age-predicted maximal HR [21,22], which is heavily weighted in the LEM calculation process. Therefore, in the present study, we propose a method combining the advantages of both the MRM and LEM-that is, the generally higher accuracy of the MRM when eVO 2max is below a certain cut-off (in this case, 45 mL·kg −1 ·min −1 ) and the absence of proportional bias in the LEM estimates when VO 2max is above the cut-off. Consequently, the significant proportional bias observed for the MRM ( Figure 1A) disappeared, and the wider LoA seen for the LEM (Figure 2A) was improved by the combined method ( Figure 3A). Additionally, the combined method produced an improved scatter plot as well as the highest correlation coefficient among the three methods. Based on the principles of cross-validation analyses [33,34], the SEE and TE values should be calculated because the TE reflects the actual difference between measured and estimated values, whereas the SEE reflects only the variation in regression; similar values between the SEE and TE reflect a close approximation between the regression line and the line of identity. From this viewpoint, we compared the error statistics of the three methods (Table 2). Consequently, better CE, SEE, and TE values were observed for the combined method, and the difference between the SEE and TE for the combined method was not large. 
Thus, the favorable error statistic values demonstrate the superiority of the combined method.
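The distinction between the SEE and TE drawn above can be illustrated with a small sketch. The formulas used here are the conventional cross-validation definitions and are an assumption, since the equations are not reproduced in this text: CE is the mean of (estimated − measured), SEE is the residual standard deviation from regressing measured on estimated values (n − 2 degrees of freedom), and TE is the root mean square of the raw differences. The data are hypothetical.

```python
import math


def error_statistics(measured, estimated):
    """CE, SEE, and TE under conventional (assumed) definitions."""
    n = len(measured)
    diffs = [e - m for m, e in zip(measured, estimated)]
    ce = sum(diffs) / n
    te = math.sqrt(sum(d ** 2 for d in diffs) / n)

    # Regress measured (Y) on estimated (X) to obtain regression residuals.
    mean_x = sum(estimated) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(estimated, measured)]
    see = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return {"CE": ce, "SEE": see, "TE": te}


# Hypothetical estimates that are perfectly correlated with the measured
# values but compressed toward the mean.
measured = [30, 35, 40, 45, 50]
estimated = [32, 36, 40, 44, 48]
stats = error_statistics(measured, estimated)
print({name: round(value, 2) for name, value in stats.items()})
```

In this toy case the regression fits perfectly, so the SEE is essentially zero, while the TE stays well above zero because the regression line departs from the line of identity. A TE close to the SEE therefore indicates that the two lines nearly coincide.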
This study has some limitations. First, the participants decided to participate after viewing our research advertisement, which might have introduced a selection bias toward adults seeking to monitor their own CRF value. Second, mVO2max was measured only once in each participant; hence, there was no estimation of intra-individual variability or possible measurement errors. Third, this study primarily included healthy participants and did not include any individuals who had medical conditions precluding VO2max testing or who were using medications such as β-blockers. Therefore, the estimation method proposed in this study might not apply to those patients and should be further investigated. Fourth, the favorable results obtained by the combined method may have occurred by chance because of the particular dataset used in the present study. Similarly, the cut-point (≥45 mL·kg−1·min−1) may not be applicable to other populations. Therefore, the combined method should be validated using data from different participants. Even so, we believe that the combined method presented in this study may be a feasible method for improving the accuracy of VO2max estimation.

Conclusions
The present study showed that the MRM yielded higher estimation accuracy than the LEM but demonstrated marked proportional bias (i.e., VO 2max underestimation in participants with high fitness). However, this weakness could be mitigated through the use of the LEM within a limited range because the proportional bias was less likely to occur in the LEM. Further research is needed to confirm that this approach is applicable to other populations.

Institutional Review Board Statement:
This study was conducted in accordance with the principles embodied in the Declaration of Helsinki. The study protocol was reviewed and approved by the Ethics Committee of the National Institute of Occupational Safety and Health, Japan (approval ID: H2744; date of approval: 31 March 2016).

Informed Consent Statement:
Written informed consent was obtained from all participants after explaining the aims and design of the study.

Data Availability Statement:
The derived data supporting the findings of this study are available from the corresponding author upon reasonable request following the acquisition of approval from the Ethical Committee of the National Institute of Occupational Safety and Health, Japan.