Utility of Verification Testing to Confirm Attainment of Maximal Oxygen Uptake in Unhealthy Participants: A Perspective Review

Maximal oxygen uptake (VO2max) is strongly associated with endurance performance as well as health risk. Although VO2max has been measured in exercise physiology for over a century, robust procedures to ensure that VO2max is attained at the end of graded exercise testing (GXT) do not exist. This shortcoming led to development of an additional bout referred to as a verification test (VER), completed after incremental exercise or on the following day. Workloads used during VER can be either submaximal or supramaximal depending on the population tested. Identifying a true VO2max value in unhealthy individuals at risk for or having chronic disease seems to be more important than in healthy and active persons, who face much lower risk of premature morbidity and mortality. This review summarized existing findings from 19 studies including 783 individuals to determine the efficacy and feasibility of VER in eliciting a ‘true’ VO2max in unhealthy individuals. Results demonstrated that VER is a safe and suitable approach to confirm attainment of VO2max in unhealthy adults and children, as in most studies VER-derived VO2max is similar to that obtained in GXT. However, many individuals reveal higher VO2max in response to VER, and protocols used across studies vary, which merits additional work identifying whether an optimal VER protocol exists to elicit ‘true’ VO2max in this particular population.


Introduction
Maximal oxygen uptake (VO2max) as determined by the Fick equation represents the maximal ability of the cardiovascular system to transport oxygen and the capacity of the periphery to extract oxygen to support aerobic metabolism. It is apparent that VO2max is related to endurance performance and, more importantly, premature mortality [1]. Because of this link between VO2max and health status, the American College of Sports Medicine [2] recommends 150 min/week of moderate intensity continuous exercise or 75 min/week of vigorous exercise to enhance fitness and improve overall health status, although attainment of this guideline in U.S. adults is relatively low [3].
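For reference, the Fick equation mentioned above is conventionally written as shown below. The symbols are standard textbook notation and are an assumption here, since the article does not define them: cardiac output is denoted by Q-dot, and the arterial and mixed venous oxygen contents by C_a O2 and C_v-bar O2.

```latex
\dot{V}\mathrm{O}_{2} = \dot{Q} \times \left( C_{a}\mathrm{O}_{2} - C_{\bar{v}}\mathrm{O}_{2} \right)
\qquad\Longrightarrow\qquad
\dot{V}\mathrm{O}_{2\max} = \dot{Q}_{\max} \times \left( C_{a}\mathrm{O}_{2} - C_{\bar{v}}\mathrm{O}_{2} \right)_{\max}
```

The first factor corresponds to oxygen transport by the cardiovascular system and the second to oxygen extraction by the periphery, matching the two components named in the text.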
Although VO2max has been measured in laboratory and clinical settings for a century, there is no standardized exercise testing protocol to assess it, as the specific work rate increment, stage duration, and gas exchange sampling interval vary across studies. In addition, there is no robust approach to ensure that VO2max is attained at the end of incremental exercise, which is problematic when this value is used to prescribe exercise, assess training responsiveness, or describe health status. In turn, relying on an imprecise estimate of VO2max can reduce the accuracy of these applications, which can alter decisions made by practitioners or scientists regarding client health. Various primary (oxygen plateau) and secondary criteria (maximal values of heart rate, respiratory exchange ratio, rating of perceived exertion, and blood lactate concentration) are widely used in this capacity, yet each has limitations (for additional information, please consult Schaun et al. [4]) that may make it ineffective in ensuring that VO2max is actually attained by each participant.
Implementation of a second exercise test completed after the incremental test was first described by Thoden et al. [5], who required active adults to perform an 'exhaustive test' after the incremental protocol. Later work [6,7] showed that completion of this subsequent higher intensity bout (called the verification test (VER)), performed a few minutes or up to 1 week after the incremental exercise bout, led to similar mean estimates of VO2max, thus confirming a plateau in oxygen uptake and, in turn, attainment of VO2max. For example, in 16 distance runners, data [8] showed that 26 of 32 VO2max tests performed on a treadmill revealed similar (≤2% different) estimates of VO2max between ramp and subsequent verification testing. In seven healthy men, Rossiter et al. [9] demonstrated that VER at 95 or 105% of peak power output (PPO) performed 5 min after ramp exercise elicits similar values of VO2max, leading these authors to recommend either protocol as a suitable way to confirm VO2max attainment. Overall, these data show that VER is a robust procedure to confirm attainment of VO2max in healthy active adults.
Despite these data, a valid concern regarding VER is that its supramaximal effort would be inappropriate for those who are inactive or at risk for chronic disease, who may lack the exercise capacity (due to aging or presence of comorbidities) or the desire to sustain such demanding efforts long enough to allow VO2 to attain a maximal value. However, results from inactive adults [10], older adults [11], and those with obesity [12-14] demonstrate that it is well-tolerated and feasible in these populations and leads to similar estimates of VO2max as the ramp test. In addition, data show its efficacy to confirm attainment of VO2max in adults with metabolic syndrome [15] as well as heart failure [16]. Recent data also show that, compared to graded exercise testing alone, implementing VER reveals more precise estimates of increases in VO2max in response to high intensity interval training in adults with metabolic syndrome [17]. So, similar to healthy adults, use of VER seems warranted to confirm attainment of 'true' VO2max in persons with chronic disease.
A recent systematic review [18] summarized data concerning efficacy of VER in healthy participants and concluded that this is a robust approach to confirm the value acquired from incremental exercise. However, having a more accurate estimate of 'true' VO2max in this active population may be less important, as their superior cardiorespiratory fitness confers enhanced health status versus less fit populations. In response to exercise training, an increase in VO2max as low as 1.5 mL/kg/min has been identified as being clinically significant in persons with chronic disease [19]. Thus, in persons having low VO2max and, in turn, diminished health status, even small inaccuracies in VO2max assessment may misrepresent responses to training and/or lead to inaccurate diagnoses that may modify the choice of treatment options implemented to improve individual health status. In addition, VO2max is frequently measured as a primary outcome in exercise training studies due to its strong relationship with health status [1]. Moreno-Cabanas et al. [17] concluded that ramp testing misrepresents the training-induced change in VO2max in a majority of individuals with metabolic syndrome, and they emphasized the necessity of VER to better represent the VO2max response to training. However, to our knowledge, no review article has summarized efficacy of VER to confirm VO2max attainment in unhealthy participants. Some studies show that VER leads to similar estimates of VO2max versus graded exercise testing, whereas others show significantly higher VO2max when VER is performed. These equivocal findings may cloud judgment as to whether this additional test should be performed to elicit a 'true' VO2max, and they merit development of a review article to provide a thorough overview of feasibility of VER in clinical populations.
This review summarized findings regarding efficacy of VER to confirm attainment of VO2max in unhealthy and/or inactive participants which, to our knowledge, has not been done. The main questions answered by this review include: (1) is verification testing able to confirm attainment of VO2max in this sample, (2) is it safe and well-tolerated, and (3) is there an optimal intensity or structure of VER to employ to confirm attainment of VO2max in this particular sample? Murias et al. [20] concluded from results obtained in young and older men that VER is unnecessary to confirm VO2max attainment, as mean VO2max values from this test and the preceding ramp test were not significantly different. Recent work from this laboratory [21] also revealed that VER using supramaximal workloads significantly underestimated VO2max, so these authors did not recommend these intensities for VER testing. Nevertheless, these results were acquired in active adults and do not apply to individuals with lower cardiorespiratory fitness. Moreover, no individual results were presented, which is important since attaining a 'true' VO2max is an individual phenomenon. Recent work in adults with cancer [22], hypertension [23], and obesity [13] reveals that a sizable number of individuals exhibit an underestimation in ramp-derived VO2max and a higher VO2max value when supramaximal VER is performed, which supports its efficacy in inactive individuals. However, across all studies, the participant population, testing protocol used, and criteria employed to confirm VO2max attainment vary, which does not allow identification of a standard VER protocol in clinical populations. Overall, detecting a 'true' VO2max is paramount, as this value can be used to prescribe personalized exercise training, assess efficacy of exercise training, and classify health risks.

Search Strategy
We conducted a literature search from February to April 2021 using databases including PubMed, Google Scholar, and SPORTDiscus. The key words used were 'maximal oxygen uptake,' 'VO2max,' 'maximal oxygen consumption,' AND 'verification testing,' and 'supramaximal.' Additional articles were also identified by using the reference lists of selected articles. Inclusion criteria were studies written in English using incremental exercise testing leading to VO2max followed by verification testing to confirm attainment of VO2max at submaximal up to supramaximal intensities. In addition, studies using participants who have or are at risk for chronic disease were included, which encompassed the following populations: inactive adults or children; adults with obesity; older adults >50 years; and adults or children with underlying disease including cancer, diabetes, cardiovascular disease, etc. These criteria were chosen because a recent review paper extensively summarized the efficacy of verification testing in healthy adults [18]. Studies were excluded if submaximal protocols were used to assess VO2max or if gas exchange data were not acquired.

Outcomes Identified
From each article, we extracted the following information: the traits of the participants including age, health status, physical activity status, and body mass index (BMI), which was calculated from height and mass if not presented. In addition, we denoted the exercise modality completed, the specific traits of both the incremental and verification tests, and the recovery duration between these tests. For the physiological outcomes, we identified the relative VO2max from each protocol, as well as HRmax and test duration of the incremental and verification tests.
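The BMI calculation noted above follows the standard definition (body mass in kg divided by the square of height in m). A minimal Python sketch is given below; the function name and the example values are hypothetical and are not taken from any of the included studies.

```python
def body_mass_index(mass_kg: float, height_m: float) -> float:
    """Return BMI in kg/m^2 from body mass (kg) and height (m)."""
    if mass_kg <= 0 or height_m <= 0:
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2


# Illustrative (made-up) participant: 81 kg, 1.80 m
bmi = body_mass_index(81.0, 1.80)
print(f"BMI = {bmi:.1f} kg/m^2")
```

A value computed this way could then be compared against the 24.9 kg/m² cut-off used later in this review to group studies by BMI.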

Data Analysis
Results are presented as mean ± SD where appropriate. Table 1 presents a summary of the 19 studies included in this review, consisting of 783 adult men and women and children. The populations included in these studies were children or older adults (n = 2) and individuals who were inactive (n = 2), were overweight or obese (n = 5), or had cancer (n = 1), congestive heart failure (n = 1), metabolic syndrome (n = 1), hypertension (n = 2), cystic fibrosis (n = 3), spina bifida (n = 1), or spinal cord injury (n = 2). Across participants, age ranged from preadolescent up to adults over 60 years of age. Seven studies contained participants who were inactive, and five studies had participants who were recreationally active. Eleven studies included participants with BMI values greater than 24.9 kg/m², whereas seven studies included participants with BMI below this value.
Table 2 denotes the methods used to assess VO2max from graded exercise testing and the subsequent verification test. Fourteen studies utilized primary (VO2 plateau) and secondary criteria (RERmax, HRmax, RPE, and/or blood lactate concentration) to verify attainment of VO2max from GXT, although five studies did not report that any VO2max criteria were used. Cycling was the modality used in 14 of 19 studies, with 1 study employing arm ergometry [23] and 4 studies using treadmill exercise in overweight to obese adults [12], adults with hypertension [22], athletes with spinal cord injury [24], and children with spina bifida [25]. The most widely used protocol to assess VO2max during GXT was a traditional ramp test (n = 10 studies), although in nine studies, a step incremental test was used. Studies were characterized by various intervals between protocols, with durations as brief as four minutes to as long as a few hours between tests. Two studies required VER to be performed 24-48 h after completion of GXT.
Regarding the intensity of VER, 2 studies used a submaximal protocol [16,26], 15 studies used supramaximal work rates ranging from 105-115% PPO or above maximal treadmill (TM) velocity, and 3 studies [12,26,27] used workloads equivalent to PPO. Eight studies included specific criteria to identify differences in VO2max between protocols, which were developed through reliability testing or predicted changes in VO2 for the change in work rate. Table 3 denotes VO2max values measured in response to GXT and VER for the studies included in this review. Results from 13 of 19 studies [5,9-13,23-25,28-30,33] revealed no significant difference in mean VO2max between protocols, although in 7 of these studies [9,10,12,13,25,28,29], individual participants revealed meaningfully higher VO2max (≥3% higher) with VER compared to GXT. Nevertheless, in six studies [14,22,26,27,31,32] the VER-derived VO2max was significantly higher than GXT, with participants' VO2max ranging from 19 to 40 mL/kg/min. In one study in cancer patients [21], VER-derived VO2max was significantly lower than from GXT.
Table 3 note: VO2max = maximal oxygen uptake; GXT = graded exercise test; VER = verification test; HR = heart rate; NO = normal weight; OB = obese; * = p < 0.05 between protocols.
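The individual-level comparison described above (a VER-derived VO2max that is ≥3% higher than the GXT value) can be sketched in Python as follows. This is an illustration only: the function names, the symmetric treatment of lower VER values, and the example numbers are assumptions and do not reproduce the criteria of any single included study.

```python
def percent_difference(gxt_vo2max: float, ver_vo2max: float) -> float:
    """Percent difference of the VER value relative to the GXT value."""
    if gxt_vo2max <= 0:
        raise ValueError("GXT VO2max must be positive")
    return (ver_vo2max - gxt_vo2max) / gxt_vo2max * 100.0


def classify_response(gxt_vo2max: float, ver_vo2max: float,
                      threshold_pct: float = 3.0) -> str:
    """Label one participant's result using a percent threshold.

    threshold_pct defaults to the 3% value quoted in the text; applying it
    symmetrically to lower VER values is an assumption made for this sketch.
    """
    diff = percent_difference(gxt_vo2max, ver_vo2max)
    if diff >= threshold_pct:
        return "VER higher"
    if diff <= -threshold_pct:
        return "GXT higher"
    return "similar"


# Illustrative (made-up) values in mL/kg/min: (GXT, VER)
participants = [(30.0, 31.5), (30.0, 30.3), (40.0, 38.0)]
for gxt, ver in participants:
    print(gxt, ver, classify_response(gxt, ver))
```

Reporting these labels per participant, alongside the group mean, keeps individual underestimation visible even when mean VO2max does not differ between protocols.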

Differences in HRmax between Ramp and Verification Testing
HRmax values from GXT and VER are presented in Table 3. Similar to VO2max, the majority of studies exhibit no differences in maximal HR between protocols. Results from one study in obese adults [12] revealed a higher HRmax in response to VER, although another study [9] showed lower HRmax with VER versus GXT.
Table 3 also shows durations of VER reported in the studies. The shortest duration was 1.5 min [13], whereas VER lasted up to 7 min in obese adults performing this bout at 80% PPO [26]. Twelve of nineteen studies were characterized by VER duration less than 3 min [9,12-15,21,23,26-29,32], with five studies having duration equal to or less than 2 min [12,13,15,23,28].

Discussion
Despite the widespread testing and application of VO2max in fitness, clinical, and research settings, there is no universal approach to confirm its attainment from graded exercise testing. Verification testing is a widely adopted method to perform this function, yet it has been criticized for requiring an additional intense effort that may be inappropriate in those who are not active or healthy. A prior review by Poole and Jones [34] emphasized the widespread implementation of verification testing to identify a 'true' VO2max rather than 'VO2peak' in healthy active adults. In contrast, recent work [19] in active young and older men concluded that verification testing is unnecessary due to lack of differences in mean VO2max between the incremental and verification-derived value. The current review adds to this debate by summarizing existing results from a large population of unhealthy adults and children completing verification testing following a GXT. Obtaining the most accurate VO2max value in this population is vital, as an inaccurate value may misrepresent their health status or responsiveness to training, which may in turn lead to inappropriate courses of treatment. Results reveal that most studies show no differences in aggregate VO2max between protocols. However, six studies show that VER elicits significantly higher estimates of VO2max, which supports its use when utmost accuracy is required in determining a 'true' VO2max on that day of testing.
Identifying differences in VO2max between GXT and VER requires that scientists are aware of the magnitude of error in VO2max estimation for both protocols. The error inherent in repeated VO2max testing ranges from 2-9% [7,14,28,35], with the error in acquiring gas exchange data from a metabolic cart being small (40 mL/min for the Parvo Medics system). This suggests that the remainder of the error is biological and likely related to participants' ability and motivation to tolerate near maximal exercise. We recommend that scientists perform repeated testing to develop typical error values for their lab and use these values when comparing individual VO2max values between protocols rather than only comparing aggregate values. This approach, albeit time intensive, is preferred since relying on other laboratories' criterion values is inappropriate due to differences in exercise protocol, equipment, patient population, pre-test dietary and physical activity restrictions, and time averaging intervals, which likely induce small changes in oxygen uptake.
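One way a laboratory might derive such a typical error from repeated testing is sketched below in Python. The article does not specify a formula, so this sketch assumes the commonly used definition of typical error as the standard deviation of the test-retest differences divided by the square root of 2; the function names and all data values are hypothetical.

```python
import math
import statistics


def typical_error(trial1, trial2):
    """Typical error: SD of paired test-retest differences divided by sqrt(2).

    This definition is an assumption of the sketch, not taken from the article.
    """
    if len(trial1) != len(trial2) or len(trial1) < 2:
        raise ValueError("need two equal-length trials with >= 2 participants")
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2)


def exceeds_typical_error(gxt_vo2max, ver_vo2max, te):
    """True if the GXT-VER difference is larger than the lab's typical error."""
    return abs(ver_vo2max - gxt_vo2max) > te


# Illustrative (made-up) repeated GXT results in mL/kg/min for four participants
day1 = [30.0, 25.0, 40.0, 35.0]
day2 = [31.0, 24.0, 42.0, 35.0]
te = typical_error(day1, day2)
print(f"typical error = {te:.2f} mL/kg/min")
print(exceeds_typical_error(30.0, 32.0, te))
```

With a lab-specific value in hand, each participant's GXT and VER values can be compared individually, which is the comparison recommended above in place of relying only on aggregate means.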
A primary criticism of supramaximal VER testing is that this effort is too intense for inactive, unhealthy, or deconditioned adults to tolerate, resulting in a very brief duration of exercise and greater potential to not attain VO2max due to slow O2 kinetics. However, data from multiple studies [12,15,28,29] using supramaximal VER with exercise duration <2 min exhibit no differences in VO2max between protocols, similar to studies [9,10,29,30] in which VER duration lasted 2-4 min. A recent study in hypertensive adults [22] used a multi-stage verification protocol eventually requiring a supramaximal workload. Results showed a significant underestimation of mean and individual VO2max values in response to GXT compared to VER. In nine obese adults with VO2max equal to 35 mL/kg/min [26], VER at 105% PPO elicited significantly shorter exercise duration (167 s) compared to VER at 80% PPO (418 s), although there was no difference in VO2max between these verification tests. However, VER performed at 80% PPO (+0.16 L/min, 5% higher) and 90% PPO (+0.24 L/min, 7% higher) led to a higher VO2max value versus GXT, although this latter result was a trend (p = 0.06). Bhammar et al. [29] reported that the minimum exercise duration needed to attain a plateau in VO2 in response to VER in patients with hypertension was 80 s. These results seem to indicate that the appropriate or minimum duration required to allow attainment of 'true' VO2max using VER in unhealthy adults and children is similar to that recommended for healthy and active individuals. Thus, it is possible that submaximal intensities or multi-stage protocols may optimize VO2max values compared to GXT, although additional work in larger samples is needed to confirm this result.
Our review corroborates results from healthy, fit adults [17,36] showing no difference in HRmax between GXT and VER. However, a subset of data presented in one of these studies [36], from participants with average cardiorespiratory fitness, exhibited significantly lower HRmax (−3 b/min) in response to VER compared to GXT. This is likely a result of the stepwise protocol used in this study, which is characterized by a work rate less than PPO eliciting VO2max combined with a relatively long exercise duration (~20 min) versus the traditional 8-12 min ramp protocol. In contrast, obese adults performing VER at 100% PPO expressed significantly higher HRmax (+3 b/min) versus GXT [12], which may be attributed to their unfamiliarity with vigorous exercise during the initial incremental bout. To identify a 'true' VO2max, Midgley and Carroll [37] specified a difference in HRmax < 4 b/min between GXT and VER. This value encompasses the magnitude of differences in HRmax described in the above studies, so it is likely that these discrepancies in HRmax between protocols are not clinically meaningful.
Considerations as to the exact characteristics of the recovery interval between GXT and VER include the intensity of the verification test, duration of GXT, cardiorespiratory fitness of participants, as well as a potential need to reduce the overall time of the session. Our review (Table 2) shows recovery durations ranging from as brief as 2-5 min [24,25,27,30,31], to 5-15 min [13-15,22,24,28,32,33], to as long as several days between protocols [10,27]. A recent systematic review [17] concluded that there was no effect of recovery interval on the difference in VO2max between protocols, which would suggest that any duration is appropriate. It is also apparent that some studies require an active recovery between protocols [13,15,23,31], whereas a passive recovery is completed in other investigations [11,14,22,25,29,32]. We recommend that scientists perform preliminary testing to identify an optimal recovery protocol for their specific population, and if this is implausible, then we recommend that they duplicate previously used procedures for that population.
Verification testing is only appropriate to identify 'true' VO2max if it is safe and well-tolerated by the participant completing exercise testing. This factor is especially critical in persons unfamiliar with vigorous exercise who may face enhanced risk of complications during vigorous exercise. In male and female survivors of cancer, Schneider et al. [21] reported no adverse events in their participants performing VER at 110% PPO. Furthermore, use of VER in adults with heart failure [15], hypertension [29], and metabolic syndrome [14] was described as "feasible" and "well-tolerated" in these populations at risk for or having heart disease. In children with cystic fibrosis [33], it was labeled as "safe." Although further work is needed to substantiate this, empirical results suggest that VER following GXT is a safe and well-tolerated procedure that does not induce contraindications to exercise testing in persons who are inactive, have known disease, or exhibit enhanced risk of cardiometabolic disease. This conclusion encompasses all VER protocols requiring efforts at submaximal, maximal, or supramaximal work rates. The only disadvantage to VER seems to be the extra time commitment of approximately 15-20 min, including the recovery between protocols. However, this extra time is acceptable if the primary goal of testing is to acquire the most precise estimate of VO2max, which is critical in "at-risk" individuals when VO2max testing is used to identify health status or determine the effects of exercise training.
There are a few limitations to this review. First, the marked diversity in patient populations used and the specific GXT and VER protocols completed preclude us from making universal recommendations regarding an optimal verification test. Nevertheless, it seems that submaximal or supramaximal work rates can be employed with little difference in resultant VO2max values expected versus GXT. Second, with the exception of a few studies [11,14,21,31], the sample size of individual studies is relatively small, which reduces the generalizability of these findings. Consequently, we recommend that scientists follow experimental procedures used in single studies that utilized their target population. Third, the use of VER following GXT likely elicits the highest estimate of VO2max on that day, yet it is possible that additional testing on subsequent days could elicit higher estimates of VO2max, as recently shown [38]. However, requiring multiple sessions of exercise including GXT and VER on many days may not be appropriate in unhealthy participants due to time and health related challenges.

Conclusions
In conclusion, results from this review demonstrate that verification testing typically leads to similar estimates of VO2max versus prior incremental exercise in unhealthy adults and children having a range of conditions that diminish health status and overall function. This result is informed by verification testing requiring submaximal, maximal, and supramaximal intensities, and it is apparent that each protocol is able to verify VO2max attainment in this particular sample. Moreover, VER is a safe and well-tolerated protocol that does not induce contraindications to exercise, and its only shortcoming is the additional time required of the participant. However, many participants reveal higher VO2max in response to VER compared to GXT, which merits implementation of this additional test when detecting small differences in VO2max is paramount, for example, to identify potential health risks or describe the efficacy of exercise training in specific clients to augment health status. Failure to do so may lead to inaccurate courses of treatment, which may diminish health status of patient populations.