Head-Out Water-Based Protocols to Assess Cardiorespiratory Fitness—Systematic Review

The aquatic environment offers cardiorespiratory training and testing options, particularly for individuals unable to adequately train or test on land because of weight bearing, pain or disability concerns. No systematic review exists describing cardiorespiratory fitness protocols used in an aquatic environment. This review investigated the different head-out water-based protocols used to assess cardiorespiratory fitness. Our comprehensive, systematic review included 41 studies, and the methodological quality of each included paper was assessed using the statistical review of general papers checklist. Diverse protocols arose, with three major categories identified: protocols conducted in shallow water, in deep water, and using special equipment. Thirty-seven articles presented data for peak/maximal oxygen consumption (VO2peak/VO2max). Twenty-eight of 37 studies predefined criteria for reaching a valid VO2peak/VO2max, with values of 20.6 to 57.2 mL/kg/min for shallow water exercise, 20.32 to 48.4 mL/kg/min for deep water running, and 28.64 to 62.2 mL/kg/min for underwater treadmill and cycling. No single, accepted head-out water-based protocol for evaluating cardiorespiratory fitness arose. For clinical use, three cardiorespiratory fitness testing concepts ensued: water temperature of 28–30 °C with a maximum difference of 1 °C between testing participants and/or testing sessions; water depth adapted to participant aquatic experience and abilities; and intensity increments of 10–15 metronome beats per minute.


Introduction
Cardiorespiratory fitness (CRF) is a health-related component of physical fitness defined as the ability of the circulatory, respiratory, and muscular systems to supply oxygen during sustained physical activity [1]. CRF is not only a sensitive and reliable measure of habitual physical activity [2] but also a relatively low-cost and useful client health indicator in clinical practice [3]. Therefore, it is increasingly important to monitor and systematically evaluate cardiorespiratory fitness in persons with disability or chronic disease in clinical rehabilitation and beyond [4][5][6]. Monitoring changes in cardiorespiratory fitness may indicate training and rehabilitation program effectiveness, as well as the development of a physically active lifestyle [7].
Water offers a place for cardiorespiratory training and testing, especially for populations who cannot be trained or tested on land because of problems with weight bearing, pain or disability. Head-out aquatic exercises are an important therapeutic component for individuals with physical limitations [8,9], as well as an element of a primary health prevention system [10][11][12] and of elite athlete sport performance conditioning [13][14][15][16][17]. Regular testing of the individuals participating in an aquatic therapy program with standardized cardiorespiratory protocols may give practitioners valuable information for establishing exercise guidelines, monitoring progress and making adjustments in both
Inclusion of each paper was based on the assessment of two independent reviewers (AOS and NMA) and full agreement was required. Studies included in our review were published in English and fulfilled the following criteria according to the PICOS system (Population: human; Intervention: not applicable; Comparison: other protocols assessing cardiorespiratory fitness; Outcome: data characterizing cardiorespiratory fitness; Study design: reviews were excluded).

Quality Assessment
Methodological quality was assessed using the statistical review of general papers checklist [25], which we adapted for assessing protocols. All three authors independently assessed the articles. Each of the 15 quality assessment items scored one point, and a total score was determined. The items covered the following areas: overview, purpose, literature, design, sample size reported and justified, protocol thoroughly reported, potential confounders and biases noted, outcomes reliable and valid, including statistical significance and analyses, dropouts, clinical importance and appropriate conclusions. Studies scoring 10 out of 15 (65%) or above were considered to demonstrate sufficient data quality.

Study Selection
In total, 1300 titles were found and, following removal of duplicates, 747 potential studies were included for eligibility screening. Based on title and abstract, 643 of these were excluded. The remaining 104 full-text articles were read and 62 papers were excluded. Two studies presented the same data [26,27] and were merged into one. In total, we retained 41 studies in this review's qualitative synthesis (Figure 1).
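The screening counts can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the number of duplicates removed is inferred as the difference between the two reported totals rather than stated in the text.

```python
# Counts reported in the Study Selection paragraph.
titles_found = 1300
after_duplicates = 747
excluded_on_title_abstract = 643
excluded_on_full_text = 62
merged_reports = 1  # two papers [26,27] presenting the same data count once

# Inferred value (assumption: not stated explicitly in the text).
duplicates_removed = titles_found - after_duplicates

full_text_read = after_duplicates - excluded_on_title_abstract
retained = full_text_read - excluded_on_full_text - merged_reports

print(duplicates_removed, full_text_read, retained)  # 553 104 41
```

The result agrees with the 104 full-text articles and the 41 retained studies reported above.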

Quality Assessment
The quality assessment of included studies examined biases and reporting accuracy with a standardized checklist administered by our review team [28]. Thirty-five articles met the sufficient data quality cut-off of 65% (10/15) (Table S2, Supplementary Materials). Five of the six articles below the cut-off were published 14 years or more prior to this systematic review; the six articles scored 8 (n = 1) and 9 (n = 5). We independently scored the studies meeting our 65% data quality cut-off at the following levels: 10 (n = 9), 11 (n = 12), 12 (n = 7), 13 (n = 4), and 14 (n = 3), with an average quality score of 11.4. All authors clearly stated the study purpose, described the test protocol in detail, and reported results with statistical significance and appropriate analysis methods. Almost all studies (n = 40) reviewed relevant background literature, reported dropouts and gave appropriate conclusions. Many authors described the sample in detail (n = 39), reported clinical importance (n = 37) and provided valid outcome measures of the aerobic test (n = 36). Half of the studies discussed methods for avoiding co-intervention (n = 20) and contamination (n = 19), but only a few authors justified sample size (n = 7), provided reliable outcome measures of the aerobic test (n = 6) and described the study design (n = 5).
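As a quick consistency check, the reported score distribution can be summarized as follows. This is a minimal sketch; the dictionary layout and names are ours, and only the counts and the cut-off of 10 points come from the text.

```python
# Score distribution reported for the quality assessment (score -> number of studies).
score_counts = {8: 1, 9: 5, 10: 9, 11: 12, 12: 7, 13: 4, 14: 3}
CUT_OFF = 10  # minimum score (out of 15) regarded as sufficient data quality

passing = {score: n for score, n in score_counts.items() if score >= CUT_OFF}
n_total = sum(score_counts.values())
n_passing = sum(passing.values())

# Weighted mean of the scores at or above the cut-off.
mean_passing = sum(score * n for score, n in passing.items()) / n_passing

print(n_total, n_passing, round(mean_passing, 1))  # 41 35 11.4
```

The weighted mean of the 35 studies at or above the cut-off reproduces the reported average of 11.4.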

Protocol Description
Analysis of the included papers revealed varied protocol types for measuring cardiorespiratory fitness. We divided them into three groups: protocols conducted in shallow water, protocols conducted in deep water, and others that used special equipment such as an underwater treadmill or bicycle. Shallow water exercise protocols were described in 13 studies, deep water exercise in 16 studies, and two studies described protocols in both shallow and deep water. Ten studies required special underwater equipment to conduct the protocol. Most papers did not report any pretest screening, although some studies used health questionnaires or medical screenings [10,24,26,27,29-36]. Only one study [37] reported a land treadmill stress test as a pretest screening. Familiarization sessions before testing were mentioned by most authors, but more specific details were provided in only seven studies [24,26,27,33,34,38-40]. More than half of the studies did not report any warm up. Indications for test termination were described in most studies; however, they often lacked absolute indications (e.g. the participant's request to stop). The protocol end point mainly occurred when the participant could not produce the required cadence or reached volitional exhaustion. The majority of protocols (n = 24) were feasible for active healthy individuals (including professional athletes), eight protocols enrolled healthy participants who were not necessarily active, seven protocols included individuals with specific needs (patients with coronary artery disease, older adults, individuals with spinal cord injury, overweight adults and individuals with rheumatoid arthritis) and two studies did not characterize participants other than by age and sex.

Shallow Water Exercise (SWE)
Shallow water exercise protocols are described in detail in Table 1. In six studies [31,47-51] running constituted the exercise mode, and the other studies employed different movement combinations of frontal kick, cross-country skiing, jumping jack, rocking horse, and abductor and adductor hops. In the majority of papers, metronome use ensured accurate intensity levels. Water temperature ranged between 27.5 and 32 °C and water immersion levels from waist/umbilicus to shoulders. Only six studies reported warm up [29,30,37,46,48,51]. Starting intensity and intensity increases were based upon the metronome setting or rate of perceived exertion (RPE). Metronome starting intensity ranged between 80 and 90 beats per minute and increments per stage ranged between 8 and 15 beats per minute. Time on each effort stage ranged between 1 and 8 min, and one study [31] reported duration based on pool length.

Deep Water Exercise (DWE)
Deep water exercise protocols are described in detail in Table 2. Only one study recruited individuals with spinal cord injury using modified deep water running (DWR) [58]. Twelve studies reported participant tethering to maintain upright static positioning. A buoyancy belt and/or vest kept the participant's head above water during testing. Water temperature ranged between 25 and 33 °C and the reported water immersion level was between the shoulder and nose. Nine studies reported conducting warm up before testing [24,33,35,40,51,53,54,58,59]. The intensity protocol was driven by a metronome, subjectively set by the participant, or set by the pulley system weights. Metronome starting intensity ranged between 72 and 120 beats per minute and increments per stage between 6 and 30 beats per minute. Only one study, designed to assess cardiorespiratory fitness in individuals with spinal cord injury, set the starting intensity at 40 beats per minute [58]. Time on each stage of the protocol ranged between 1 and 4 min.

Other Protocols
Protocols that used special equipment are described in detail in Table 3. These protocols used an underwater treadmill [32,38,60,61,65,66] or an underwater bicycle [62-64,67] with a water temperature of 28-30 °C, except for one study [38] in which it ranged between 20.6 and 35.6 °C. Water immersion level was reported at the xiphoid process, and intensity was set by treadmill speed or cycling cadence. In two studies [38,60] additional water jet resistance was used. Five studies reported warm up [38,61,63,65,66]. Costa et al. [63] conducted three protocols with different frontal surface areas while cycling. Time on each exercise stage ranged between 1 and 3 min.

Peak Outcomes
Authors of 37 studies provided mean VO2peak/VO2max values (Table 4) and other test outcomes considered as secondary criteria used when an oxygen uptake plateau was not evident. Varying outcomes in VO2peak/VO2max were reported in the included studies. Twenty-eight of 37 studies predefined the criteria for reaching a valid VO2peak/VO2max. Varying criteria included: RER above a certain level (>1.0-1.15) (n = 20), reaching a VO2 plateau (n = 18), attainment of age-predicted maximal heart rate (n = 9), RPE above a level of 17-18 on Borg's 6-20 RPE Scale (n = 7), highest observed VO2 value measured (n = 6), maximal respiratory rate of at least 35 breaths per minute (n = 4) and blood lactate level above 8-9 mmol/L (n = 2). Other criteria, similar to the previously described test termination criteria, included general exhaustion [24,33], an RPE of 10 on Borg's 0-10 RPE Scale [63] or inability to maintain the required cadence or load [41,52,55]. One study referred to the American College of Sports Medicine (ACSM) guidelines [61]. The number of required criteria for reaching a valid VO2peak/VO2max ranged from one specific criterion (seven studies), through 2-4 criteria (11 studies), to 1-3 criteria selected from a list of 3-5 criteria (nine studies).
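To make the "k of n criteria" logic concrete, the sketch below counts how many of the listed criteria a single test satisfies. It is a hypothetical illustration, not a protocol taken from any included study: the class and function names are ours, the default cut-offs (RER > 1.10, RPE >= 17, lactate > 8 mmol/L) are single values picked from the reported ranges, and the age-predicted maximal heart rate is supplied by the caller because the review does not state a prediction formula.

```python
from dataclasses import dataclass


@dataclass
class MaxTestResult:
    """Peak measurements from one graded test (hypothetical structure)."""
    rer: float                    # respiratory exchange ratio
    rpe_borg_6_20: int            # rating on Borg's 6-20 scale
    heart_rate: float             # peak heart rate, beats per minute
    age_predicted_max_hr: float   # supplied by the caller
    respiratory_rate: float       # breaths per minute
    lactate_mmol_l: float         # blood lactate
    vo2_plateau: bool             # plateau judged by the investigator


def criteria_met(r, rer_cutoff=1.10, rpe_cutoff=17, rr_cutoff=35.0, lactate_cutoff=8.0):
    """Return the names of the criteria satisfied under the assumed cut-offs."""
    checks = {
        "rer": r.rer > rer_cutoff,
        "vo2_plateau": r.vo2_plateau,
        "heart_rate": r.heart_rate >= r.age_predicted_max_hr,
        "rpe": r.rpe_borg_6_20 >= rpe_cutoff,
        "respiratory_rate": r.respiratory_rate >= rr_cutoff,
        "lactate": r.lactate_mmol_l > lactate_cutoff,
    }
    return [name for name, ok in checks.items() if ok]


def is_valid_max(r, required=2):
    """True when at least `required` criteria are met (required is an assumption)."""
    return len(criteria_met(r)) >= required


example = MaxTestResult(rer=1.12, rpe_borg_6_20=18, heart_rate=178,
                        age_predicted_max_hr=185, respiratory_rate=38,
                        lactate_mmol_l=7.5, vo2_plateau=False)
print(criteria_met(example))   # ['rer', 'rpe', 'respiratory_rate']
print(is_valid_max(example))   # True
```

Changing `required` or the cut-offs shows how the same test can be judged valid under one study's rules and invalid under another's.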

Discussion
This systematic literature review provided an overview of the published head-out water-based protocols used to assess cardiorespiratory fitness. The results indicated varied head-out protocols were used in the aquatic setting to assess cardiorespiratory fitness. The majority of tests were conducted in a non-laboratory setting, using clinical or performance exercise incremental step test protocols for an aquatic environment. Based on our analyses we provide the following suggestions for testing cardiorespiratory fitness in water, which may help researchers and clinicians.
The physiological responses to head-out water-based exercises are temperature dependent [18]. In the analyzed papers the water temperature ranged between 25 and 33 °C. The thermoneutral temperature during exercise is considered to be between 28 and 30 °C. Temperatures below and above this range additionally affect the physiological response, particularly through vasoconstriction and vasodilation. Maintaining the same water temperature for all participants is also crucial. Most analyzed protocols were conducted in water with a temperature difference of 1 °C. One study [38] reported a water temperature difference of 15 °C between participants. Because water temperature influences physiological responses, large temperature differences should be avoided during testing protocols. The recommended water temperature for cardiorespiratory fitness testing is 28-30 °C with a maximum difference of 1 °C between testing participants, which the analyzed studies mainly incorporated.
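The temperature recommendation amounts to two checks: every session falls within 28-30 °C, and the spread across sessions does not exceed 1 °C. A minimal sketch follows; the function name and the example readings are hypothetical, and only the limits come from the text.

```python
def temperatures_acceptable(temps_c, low=28.0, high=30.0, max_spread=1.0):
    """Check session water temperatures (in degrees Celsius) against the suggested limits."""
    if not temps_c:
        raise ValueError("at least one temperature reading is required")
    in_range = all(low <= t <= high for t in temps_c)
    spread_ok = (max(temps_c) - min(temps_c)) <= max_spread
    return in_range and spread_ok


print(temperatures_acceptable([28.5, 29.0, 29.4]))  # True
print(temperatures_acceptable([20.6, 35.6]))        # False: the 15 °C spread reported in [38]
```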
Another physiological response modulator was water depth and the resultant hydrostatic pressure [18]. The largest water depth differences were found in the shallow water protocols, where the water level was set from waist/umbilicus to shoulders. Oxygen uptake and energy expenditure are lower at breast immersion when compared with hip immersion [68]. Heart rate decreases significantly as body immersion increases [69], with physiological demand apparently lower for deep-water versus shallow-water exercises [70]. Only one study compared these two modes of exercise and provided cardiorespiratory fitness values [47], confirming higher values for shallow water running. Deep water protocols are better suited for participants who are already familiar with exercising in deep water or, in the case of DWR, with professional running. In our review, the highest mean reported VO2peak/VO2max values achieved in DWR protocols were observed for individuals trained in water running [39], active males [54] and male runners [59]. On the other hand, deep water testing is an option for people who are unable to perform high intensity cardiorespiratory movement in shallow water, such as individuals with spinal cord injury [58].
Interestingly, when the movement pattern and protocols were accurately matched in water and on land, no differences in VO2peak/VO2max values occurred between conditions [18]. This aligns with the present review's findings for underwater and land treadmills [32,38,61,66], for stationary water running and stationary running on land [50], and for water cycling and bicycle ergometer [67]. Therefore, a determining factor for the VO2peak/VO2max pattern is the mode of exercise performed rather than inherent water properties.
In our review, mean reported VO2peak/VO2max values achieved in SWE protocols differed when varied movements were performed [41-43,45]. For these studies we reviewed exercise type to understand the 10-15% lower VO2peak/VO2max achieved. No patterns arose, except that with varied exercise it may be more difficult to maintain the exercise intensity of a stage even with metronome guidance. It seemed that wider movements like jumping jacks or abductor/adductor hops might be more difficult in water at higher intensity due to water resistance. Running is a natural movement for humans and, based on the critically assessed data, we suggest that cardiorespiratory fitness testing in shallow water incorporate running. However, with these limited data no final conclusions can be drawn.
The analysis of criteria used for achieving VO2peak/VO2max during head-out water-based exercise protocols indicated a lack of uniformity in both plateau definition and secondary criteria for reaching a valid VO2peak/VO2max. As on land, it is common in water for participants to complete a maximal graded exercise test and fail to achieve a plateau in VO2 [71]. Thus, a variety of secondary criteria for reaching a valid VO2peak/VO2max exists in head-out water-based protocols. However, no general agreement on specific secondary criteria, alone or in combination, arose; moreover, criteria selection appeared arbitrary, without justification and/or scientific evidence. The terminology associated with measuring maximal oxygen uptake in the water environment was also inconsistent. The interchangeable use of the terms VO2max and VO2peak neglects their original definitions and adds confusion in the literature.
Correctly setting the intensity and increments per testing stage is one of the factors affecting cardiorespiratory fitness values. Most studies used a metronome as an objective tool to conduct the test, using small to modest individualized increments per stage, resulting in test completion between 8 and 12 min [2]. The increments per stage ranged between 6 and 30 beats per minute. The time until exhaustion in the study that used a 30 beats per minute increment was 7.3 min [24], which according to the guidelines might not be enough to reach individual VO2peak values. Based on this literature review, metronome increments of 10-15 beats per minute appear optimal. The starting intensity requires individual adjustment to the population tested.
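An incremental metronome protocol of this kind can be laid out as a simple stage table. The sketch below is illustrative only: the function names, the 80 beats per minute start and the 2 min stage length are example values chosen from the ranges reported for the reviewed protocols, not a recommended protocol.

```python
def stage_schedule(start_bpm, increment_bpm, stage_min, total_min):
    """Return (start_minute, cadence_bpm) for each stage beginning before total_min."""
    schedule = []
    t, cadence = 0.0, start_bpm
    while t < total_min:
        schedule.append((t, cadence))
        t += stage_min
        cadence += increment_bpm
    return schedule


def within_target_duration(minutes, low=8.0, high=12.0):
    """True when the time to exhaustion falls in the 8-12 min window."""
    return low <= minutes <= high


# Example values (assumptions): start at 80 bpm, add 10 bpm every 2 min, cap at 12 min.
plan = stage_schedule(start_bpm=80, increment_bpm=10, stage_min=2.0, total_min=12.0)
for start_minute, cadence in plan:
    print(f"{start_minute:4.1f} min  {cadence} bpm")

print(within_target_duration(7.3))   # False: the test with 30 bpm increments [24]
print(within_target_duration(10.0))  # True
```

With these example settings the plan has six stages and ends at 130 beats per minute, so a participant reaching exhaustion in the later stages would finish inside the 8-12 min window.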
Appraising study quality is difficult but necessary. We provided quality scores but did not create a high or low binary categorization [72]. The three consistent quality areas of concern were study design type, sample size justification and reliability of outcome measures. Future researchers and clinicians need to carefully analyze all data to determine which head-out water-based cardiorespiratory protocol best fits their needs.
Generally, in studies analyzing cardiorespiratory fitness protocols used for land exercise, testing does not always comply with exercise testing guidelines [7,71]. This parallels our findings, as only two studies [32,61] followed the commonly accepted ACSM exercise testing guidelines. To implement testing protocols in the aquatic environment the following standard steps are required: pretest screening to identify contraindications for maximal exercise, familiarization session, indications for test termination, warm up, load increments per stage (resulting in completion of the test between 8 and 12 min), and criteria used to verify achievement of VO2peak/VO2max. Additionally, these testing components require modification according to the individual exercise test purpose and participants' ability.

Limitations
This review included only articles in English found in five databases. It is possible that some relevant studies were not captured by the search strategy used. However, with the detailed search terms used, the screening performed by two independent reviewers, and only three articles excluded due to language, the risk of selection bias was limited. The adapted methodological quality assessment, with an arbitrarily set data quality cut-off, was potentially influenced by reporting and interpretation bias. The heterogeneity of the selected studies included differences in study protocols, outcome measures, and statistical and analytical methods, which limited advanced comprehensive statistical analyses and interpretation.

Conclusions
The analyzed protocols were highly diverse, and no single broadly accepted head-out water-based protocol exists to evaluate cardiorespiratory fitness and compare results. However, three groups were differentiated: protocols conducted in shallow water, protocols conducted in deep water, and others that used special equipment. Moreover, based on the analyzed protocols, three key testing properties for cardiorespiratory fitness testing can be suggested for clinical use: water temperature of 28-30 °C with a maximum difference of 1 °C between testing participants and/or testing sessions, water depth adapted to participant aquatic experience and abilities, and intensity increments of 10-15 metronome beats per minute. The available data supported that RPE, HR, and VO2peak/VO2max should all be included in aquatic environment cardiorespiratory fitness testing, similarly to land testing. Future research is needed to test methodological standardization in which protocols can be individualized to specific populations, and to determine the reliability and validity of the specific protocols. Furthermore, consensus regarding the reporting of test procedures, outcomes and properties in terms of a gold standard is necessary to enhance comparison and understanding of intervention results.

Conflicts of Interest:
The authors declare no conflict of interest.