1. Introduction
While body weight is tracked throughout the lifespan, body composition is the more important indicator of overall health. High percent body fat is associated with negative health outcomes such as hypertension, diabetes, cancer, and cardiovascular disease, whereas low percent body fat is associated with malnutrition, osteoporosis, osteopenia, and muscle wasting [1]. Knowing body composition can assist in tracking training goals and interpreting overall physique; however, every method of body composition analysis has unique advantages and disadvantages. Field methods of determining body composition include anthropometric measurements such as body mass index (BMI), waist circumference, waist-to-hip ratio, and skinfolds, as well as single-frequency and multifrequency bioelectrical impedance analysis (BIA) [2,3,4]. Laboratory methods include hydrostatic weighing (HW), air displacement plethysmography (BODPOD), isotope dilution, dual-energy X-ray absorptiometry (DEXA), computed tomography (CT), computed tomography body composition, magnetic resonance imaging, and whole-body potassium counting [2,3,4]. With this array of options, the choice of method depends on access to the equipment, one’s financial means, and the degree of accuracy desired. DEXA is the clinical gold standard; although comfortable, the procedure exposes the participant to radiation, and DEXA is also the most expensive piece of equipment. Because of these limitations, HW has been a trusted and valid method of determining body composition within laboratory settings for decades [5]. The accepted standard for HW requires full submersion of the participant at residual volume (RV), which can cause physical and psychological discomfort that deters participation because it is an unnatural and uncomfortable sensation [6]. Therefore, previous groups have studied ways to minimize these difficulties and discomforts while maintaining the validity of HW.
RV is the standard lung volume for HW because it is the volume least affected by hydrostatic pressure. Total lung capacity (TLC) is the most affected by hydrostatic pressure on the lungs; however, Weltman and Katch [7] found that HW at TLC required fewer trials than HW at RV. Limiting time in the water through fewer trials, along with using a more natural lung volume, can help participants feel more comfortable during the body composition assessment. Previous studies have also shown that full-body submersion is not necessary. Israel et al. [8] found that keeping the head above water is a valid alternative to full submersion in morbidly obese females, and Donnelly et al. [9] found that, because of increased buoyancy, 25% of obese participants could not fully submerge. Developing more comfortable ways to measure body composition accurately via HW will give access to a wider and more diverse population of participants. Recently, Tesch et al. [6] developed a head volume prediction (HV_PRED) equation based on anthropometric measurements so that head submersion is not necessary; however, this equation needs further validation. Given the discomfort of the gold standard HW procedure and the simplicity of incorporating head volume measurements, the purpose of this study was to find a more comfortable way to complete HW. To accomplish this, three comparisons were made: (1) the concordance of the HV_PRED equations [6] with head submersion, (2) the validity of TLC against RV during HW, and (3) the validity of head above water at TLC (HAW@TLC, the most comfortable technique) against head below water at RV (HBW@RV, the most unnatural and uncomfortable technique).
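For readers less familiar with HW, the roles of lung volume and head position in the calculation can be sketched as follows. This is a minimal sketch assuming the conventional body density equation with a 0.1 L allowance for gastrointestinal gas and the Siri conversion to percent body fat; the study’s exact equations are given in its methods and may differ. The `head_volume_l` parameter is a hypothetical stand-in for a predicted head volume: the HV_PRED coefficients of Tesch et al. [6] are not reproduced here, and treating the head as displacing no water when it is above the surface is a simplifying assumption.

```python
def body_density(body_mass_kg: float,
                 underwater_weight_kg: float,
                 water_density_kg_per_l: float,
                 lung_volume_l: float,
                 gi_gas_l: float = 0.1,
                 head_volume_l: float = 0.0) -> float:
    """Whole-body density (kg/L) from underwater weighing.

    lung_volume_l: air in the lungs during the weighing (RV for the classic
    method, TLC when the participant holds a maximal inhale).
    head_volume_l: hypothetical predicted head volume, added back because a
    head held above water displaces no water (simplifying assumption).
    """
    displaced_volume_l = (body_mass_kg - underwater_weight_kg) / water_density_kg_per_l
    body_volume_l = displaced_volume_l - lung_volume_l - gi_gas_l + head_volume_l
    if body_volume_l <= 0:
        raise ValueError("non-positive body volume; check the inputs")
    return body_mass_kg / body_volume_l


def siri_percent_fat(density_kg_per_l: float) -> float:
    """Siri two-compartment conversion: %BF = 495 / Db - 450."""
    return 495.0 / density_kg_per_l - 450.0


if __name__ == "__main__":
    # Illustrative inputs only, not study data.
    db = body_density(80.0, 3.0, 0.9957, lung_volume_l=1.3)
    print(round(db, 4), round(siri_percent_fat(db), 1))
```

The sketch makes the dependence on lung volume explicit: any error in the assumed lung volume feeds directly into body volume and therefore into percent body fat.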
3. Results
Physical characteristics of the 122 participants, separated by sex, are shown in Table 1. The complete data set was used for analysis except for two participants who did not follow the RV protocol; their RV data were excluded, so n = 120 for all RV analyses. Sex, preferred pronouns, age, ethnicity, and SGPALS score were self-reported. The majority of the sample was White (88.5%), aged 18–24 years (84.4%), and physically active, defined as an SGPALS score of 3 or 4 (76.2%). Height, weight, head girth (HG), and face girth (FG) were measured. Of the 122 participants, only 10 (8.2%) required a third head girth measurement and none required a fourth. For face girth, 24 participants (19.7%) required a third measurement and only one required a fourth.
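The repeat-measurement rule behind these counts can be sketched as follows. Because the methods are not reproduced in this section, we assume that girth readings are taken one at a time until any two agree within 5 mm, with at most four readings; the function name and example values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_consistent_pair(readings_mm: Sequence[float],
                          tolerance_mm: float = 5.0,
                          max_readings: int = 4) -> Optional[Tuple[int, float, float]]:
    """Return (readings_used, earlier, newest) for the first pair of girth
    readings that agree within tolerance_mm, or None if no pair agrees
    within max_readings readings."""
    taken = list(readings_mm[:max_readings])
    for n in range(2, len(taken) + 1):
        newest = taken[n - 1]
        # Compare the newest reading with every earlier one, so the smallest
        # number of readings that yields a consistent pair is found first.
        for earlier in taken[:n - 1]:
            if abs(newest - earlier) <= tolerance_mm:
                return n, earlier, newest
    return None


# A participant whose first two head girth readings differ by 8 mm needs a third:
print(first_consistent_pair([570, 578, 574]))
```

A result with `readings_used == 3` corresponds to a participant who required a third measurement.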
Table 2 depicts the means and standard deviations of the first and second consistent HG and FG measurements. The two consistent measurements (within 5 mm of each other) were not significantly different for either HG or FG (Table 2). When selecting the samples that best represented each participant’s underwater weight, the mean ± SD number of samples selected was 100.05 ± 1.13 for HAW@TLC, 100.24 ± 1.18 for HBW@TLC, and 100.26 ± 1.08 for HBW@RV. Although the minimum number of samples selected in any trial was 96 and the maximum was 109, the number selected fell within the target range of 98–102 samples (100 ± 2) for over 95% of trials in each condition.
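The summary statistics in this paragraph can be reproduced from per-trial sample counts with a short helper. This is a minimal sketch assuming the counts are available as a list of integers; the function name and the example counts are hypothetical, not study data.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def summarize_sample_counts(counts: Sequence[int],
                            target: int = 100,
                            tol: int = 2) -> Tuple[float, float, float]:
    """Return (mean, SD, share of trials within target ± tol) for the number
    of samples selected per trial. Hypothetical helper for illustration."""
    if len(counts) < 2:
        raise ValueError("need at least two trials to compute an SD")
    inside = sum(1 for c in counts if target - tol <= c <= target + tol)
    return fmean(counts), stdev(counts), inside / len(counts)


mean, sd, share = summarize_sample_counts([96, 99, 100, 101, 109])
print(f"{mean:.2f} ± {sd:.2f}, {share:.0%} within 98–102")
```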
In comparison 1, the only variable changed was head position (HBW@TLC vs. HAW@TLC), which specifically tests the accuracy of PBF calculated with the HV_PRED equation suggested by Tesch et al. [6]. Table 3 shows that HAW@TLC resulted in statistically significantly higher mean PBF than HBW@TLC, both overall and separately for males and females. The Bland-Altman plots (Figure 4A–C) showed no evidence that the difference in calculated PBF between head positions was proportional, overall or separately for males and females, as there was no significant relationship between the difference in these metrics and their average (p-value for proportional relationship > 0.05). Lin’s concordance correlation coefficient (LCCC) values were >0.8, indicating very strong concordance of the percent body fat values, both overall and separately for males and females, when using the standards of interpretation for Pearson’s correlation coefficient (Figure 5A–C).
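Lin’s concordance correlation coefficient, used throughout these comparisons, measures agreement rather than mere correlation. The sketch below implements the standard definition with population (1/n) moments, ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²); the function name is our own, and the >0.8 “very strong” threshold is the paper’s borrowed Pearson interpretation, not part of Lin’s method.

```python
from statistics import fmean
from typing import Sequence


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("both series are constant and identical; CCC undefined")
    return 2 * sxy / denom


print(lins_ccc([1, 2, 3], [1, 2, 3]))  # perfect agreement
print(lins_ccc([1, 2, 3], [2, 3, 4]))  # same ranking, constant offset
```

In the second example the Pearson correlation is exactly 1, yet the CCC drops to 4/7 because the constant offset is penalized; this is why the CCC is the appropriate statistic for comparing two PBF methods.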
In comparison 2, the only variable changed was lung volume (HBW@TLC vs. HBW@RV). HBW@TLC resulted in lower mean PBF than HBW@RV, both overall and separately for males and females (Table 3). The Bland-Altman plots (Figure 4D–F) demonstrated that the difference in calculated PBF with a change in lung volume was statistically proportional for females (p-value for proportional relationship = 0.019, Figure 4F) but not for males (p-value for proportional relationship = 0.236, Figure 4E), with females thus driving the overall proportional bias (p-value for proportional relationship = 0.012). Specifically, overall and for females, HBW@RV appeared to yield higher PBF estimates than HBW@TLC for people with less body fat, and lower estimates for people with higher body fat. Figure 5D–F further illustrates that these two measures had the greatest discordance, with LCCC values between 0.72 and 0.77, demonstrating that changing lung volume affects calculated percent body fat, even with the head below water.
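The “p-value for proportional relationship” reported above tests whether the paired difference changes with the magnitude of the measurement. We assume, as is common, that this was done by regressing the difference on the pairwise mean; the exact procedure is in the methods, which are not part of this section. A minimal stdlib sketch of the Bland-Altman quantities and that slope test follows; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from math import copysign, inf, sqrt
from statistics import fmean, stdev
from typing import Sequence


@dataclass
class BlandAltmanResult:
    bias: float       # mean of the paired differences (a - b)
    lower_loa: float  # bias - 1.96 * SD of differences
    upper_loa: float  # bias + 1.96 * SD of differences
    slope: float      # OLS slope of the difference on the pairwise mean
    slope_t: float    # t statistic for H0: slope == 0, with n - 2 df


def bland_altman(a: Sequence[float], b: Sequence[float]) -> BlandAltmanResult:
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need at least three paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    n = len(diffs)
    bias = fmean(diffs)
    sd = stdev(diffs)

    # Regress difference on mean; a non-zero slope indicates proportional bias.
    mean_of_means = fmean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("pairwise means have no spread; slope undefined")
    slope = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs)) / sxx
    intercept = bias - slope * mean_of_means
    rss = sum((d - (intercept + slope * m)) ** 2 for m, d in zip(means, diffs))
    se_slope = sqrt(rss / (n - 2) / sxx)
    if se_slope == 0:
        slope_t = 0.0 if slope == 0 else copysign(inf, slope)
    else:
        slope_t = slope / se_slope
    return BlandAltmanResult(bias, bias - 1.96 * sd, bias + 1.96 * sd, slope, slope_t)
```

To obtain a p-value, refer `slope_t` to a t distribution with n − 2 degrees of freedom; if SciPy is available, `scipy.stats.linregress(means, diffs).pvalue` gives the equivalent two-sided test of the slope. Rerunning the function with a single pair removed is one way to check outlier sensitivity, as done for the female data in comparison 3.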
In comparison 3, both head position and lung volume changed (HBW@RV vs. HAW@TLC). HAW@TLC yielded statistically significantly lower mean PBF for males (p-value = 0.003) but no statistically significant difference overall (p-value = 0.175) or for females (p-value = 0.389; Table 3). Although the difference for males was statistically significant, mean PBF from HAW@TLC was only 1.5% lower, a difference that is not clinically significant. The Bland-Altman plot for males (Figure 4H) showed no evidence that the difference in calculated PBF with a change in lung volume and head position was statistically significantly related to PBF (p-value for proportional relationship = 0.347). Figure 4I seemed to suggest proportional disagreement between females’ PBF from HBW@RV and HAW@TLC (p-value for proportional relationship = 0.017); however, this statistically significant relationship was driven by one outlying participant with low PBF. Omitting this outlier made the relationship between the difference in these estimates and the average PBF no longer significant (p-value with outlier removed = 0.055). In Figure 5G–I, LCCC values are >0.8 overall and separately for males and females, indicating very strong concordance of the percent body fat values when using the standards of interpretation for Pearson’s correlation coefficient.
4. Discussion
Comparison 1 specifically tested the accuracy of the HV_PRED equations developed by Tesch et al. [6] by examining the difference in PBF between HAW and HBW while lung volume remained constant. The results of comparison 1 show that head position does matter when measuring PBF at TLC: HAW@TLC produced higher PBF values than HBW@TLC, both overall and separately for males and females. These statistically significant (p < 0.05) differences in mean PBF contradict past studies that compared HBW and HAW weighing.
A study by Evans et al. [20] found that mean PBF with HAW was higher than mean PBF with HBW by 0.66% (p > 0.05). Similarly, Israel et al. [8] found that PBF with HAW was 0.66% higher than PBF with HBW (p > 0.05). Donnelly et al. [9] found a 0% mean difference in PBF in males and a 0.7% mean difference (HAW higher) in females. As shown by Evans et al. [20], Israel et al. [8], and Donnelly et al. [9], no statistically significant difference in PBF was found when changing head position while keeping lung volume the same. However, Heath et al. [21] found that PBF with HAW was lower than PBF with HBW by 2.8% in females (p < 0.0001) but only 0.1% lower in males; the mean difference was statistically significant in females but not in males. Demura et al. [22] found a statistically significant mean difference in PBF, approximately 5% higher with HAW. Lastly, Tesch et al. [6] found no statistically significant differences in the male experimental group, male validation group, female experimental group, or female validation group.
Of the studies listed above, only two of the six populations produced statistically significant differences in PBF, in contrast to the statistically significant differences found in the present study. However, five of the six populations were niche populations; for example, three of the six included only females who were morbidly obese. Most of these studies also used differing equations, indicating that an accepted equation for the general population has yet to be determined. In 1988, Donnelly et al. [9] developed equations for HAW weighing that produced no statistically significant differences; however, when Demura et al. [22] sought to validate these equations, a statistically significant difference in PBF of approximately 5% was found. As mentioned, the present study used the equations developed by Tesch’s group [6]. The absence of statistically significant differences in Tesch’s study [6] contrasts with the present findings; however, those equations were both produced and validated within the same study, and the validation group was less than half the size of the present sample: Tesch et al.’s [6] validation group consisted of 21 males and 24 females, whereas the present study included 64 males and 58 females. This discrepancy between two studies using the same equation parallels Demura et al.’s [22] validation of Donnelly et al.’s [9] equations, which found a 5% difference.
Comparison 2 specifically tested the measurement of PBF at different lung volumes (TLC vs. RV) with the head fully submerged. In the current study, TLC resulted in PBF approximately 5% lower than RV, and this disagreement was roughly equivalent for males and females. These findings contradict previous research such as Weltman and Katch [7], who observed mean differences in percent body fat between TLC and RV of 0.9% for females and 0.5% for males (p = 0.05), both within the ±1.5% measurement error for HW [14]. Warner et al. [23] found that TLC compares favorably to RV: although the difference was statistically significant (p = 0.001), it was not clinically significant (PBF: RV 16.34%, TLC 15.47%). Latin and Ruhling [24] observed a mean difference in PBF of 1.1% between RV and TLC, which was statistically significant (p < 0.05) but not clinically significant. Overall, the current study contradicts previous research because it found that lung volume plays a significant role when measuring PBF.
Comparison 3 evaluated the most comfortable method of HW (HAW@TLC) against the gold standard, which is the least comfortable method (HBW@RV). In this comparison, both head position and lung volume changed. In the current results, HAW produced PBF approximately 5% higher than HBW in comparison 1, whereas TLC produced PBF approximately 5% lower than RV in comparison 2. When HAW was paired with TLC in comparison 3, these opposing effects offset, and there was no statistically significant difference overall compared with the gold standard. Therefore, the current results suggest that the HV_PRED equation [6], when used with TLC, produces an accurate measurement of PBF compared with the gold standard (HBW@RV).
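The offsetting argument in this paragraph rests on a simple decomposition. For a participant measured under all three conditions, the comparison 3 difference is exactly the sum of the comparison 1 and comparison 2 differences, so the approximately +5% and −5% shifts largely cancel:

```latex
\begin{aligned}
\Delta_3 &= \mathrm{PBF}_{\mathrm{HAW@TLC}} - \mathrm{PBF}_{\mathrm{HBW@RV}} \\
&= \underbrace{\left(\mathrm{PBF}_{\mathrm{HAW@TLC}} - \mathrm{PBF}_{\mathrm{HBW@TLC}}\right)}_{\Delta_1\ (\text{comparison } 1)}
 + \underbrace{\left(\mathrm{PBF}_{\mathrm{HBW@TLC}} - \mathrm{PBF}_{\mathrm{HBW@RV}}\right)}_{\Delta_2\ (\text{comparison } 2)} \\
&\approx (+5\%) + (-5\%) \approx 0
\end{aligned}
```

The identity holds exactly for each participant’s paired differences and therefore for mean differences computed on the same participants; individual participants can still show nonzero differences.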
The findings from comparison 3 indicate that this more comfortable method of HW is an acceptable method for determining body composition. It increases participant comfort by allowing the head to remain above water and by having participants inhale rather than exhale. Furthermore, head measurements are simple and inexpensive to take, participants are easy to cue into position without adjusting the scale up or down, and technicians can communicate clearly with participants throughout the trials. Full communication between participant and technician also alleviates many of the anxieties participants often feel. These benefits of HAW@TLC allow a wider variety of individuals to participate in HW, including individuals who are obese and are not physically able to fully submerge [8,9].
Limitations of this study include the lack of participant diversity, the maximum number of trials set for each condition, and the absence of a spirometer to measure lung volumes. Despite efforts to recruit a diverse sample, participants were mostly White, 18–24 years old, and physically active; therefore, the generalizability of the current results is limited. Future research should include wider diversity in race, age, and physical activity, as well as individuals with extremely low and high PBF. The maximum number of trials was set because the main goal was participant comfort. TLC trials were capped at seven and RV trials at five, based on the study by Bonge and Donnelly [25], which found that as many as 10 trials are not needed and that the first three trials within 100 g provide accuracy. Capping the number of trials minimized participant fatigue while still allowing for a learning effect that increases consistency. However, the cap meant that some participants did not obtain three consistent values within 100 g for each condition. In comparison 1, 27.0% (n = 33) of participants achieved only two consistent measurements instead of three. In comparison 2, 24.6% (n = 30) achieved only two consistent measurements instead of three. In comparison 3, 1.7% (n = 2) did not achieve two consistent measurements, and 28.0% (n = 33) did not achieve a third. Therefore, the data could possibly have been more consistent; however, during extensive pilot testing we did not observe more consistent results with additional trials (participants became fatigued), and thus chose to cap trials at seven for TLC and five for RV. Another difference from previous research was the absence of a spirometer. Without one, there was a greater chance of human error, since we had to trust that participants followed the protocol of maximally inhaling for TLC and maximally exhaling for RV. Using a spirometer to measure exact lung volumes would further strengthen the accuracy of the data in future studies. However, spirometers are not widely used in lab-based settings, and we wanted to mimic real-life body composition testing; adding one would have been helpful but would not reflect “typical” HW testing. Further, the acceptance criterion required measurements to agree within 100 g, which indicates that lung volumes were consistent across the accepted trials.
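The trial-selection rule described above can be sketched as follows, working in whole grams to avoid floating-point rounding at the 100 g boundary. This is a minimal sketch under two assumptions: that “three consistent values within 100 g” means any three trials whose range is at most 100 g, and that two consistent trials were accepted when three could not be found before the cap. Function names and example weights are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def first_consistent_set(weights_g: Sequence[int], needed: int,
                         tolerance_g: int = 100,
                         max_trials: int = 7) -> Optional[List[int]]:
    """Return the first `needed` underwater weights (grams) whose range is
    within tolerance_g, looking only at the first max_trials trials."""
    taken = list(weights_g[:max_trials])
    for n in range(needed, len(taken) + 1):
        newest = taken[n - 1]
        # Only groups that include the newest trial are checked, so the
        # earliest trial at which a consistent set exists is found first.
        for others in combinations(taken[:n - 1], needed - 1):
            group = [*others, newest]
            if max(group) - min(group) <= tolerance_g:
                return group
    return None


def select_trials(weights_g: Sequence[int], max_trials: int) -> Optional[List[int]]:
    """Prefer three consistent trials; fall back to two if the cap is reached."""
    return (first_consistent_set(weights_g, 3, max_trials=max_trials)
            or first_consistent_set(weights_g, 2, max_trials=max_trials))


# RV condition capped at five trials (hypothetical weights in grams):
print(select_trials([3050, 3300, 3120, 3090, 3400], max_trials=5))
```

Using integer grams makes the 100 g comparison exact; with kilograms as floats, a difference such as 3.10 − 3.00 evaluates slightly above 0.1 and would be wrongly rejected.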
While there were limitations, pilot testing for this study was extensive. Every procedure was rigorously pilot tested to ensure the most accurate data possible while keeping participant comfort in mind. All researchers were instructed by an International Society for the Advancement of Kinanthropometry (ISAK) member on how to properly take HG and FG measurements. With extensive practice, a technician can become proficient in measuring HG and FG using simple and affordable equipment. Tesch et al. [6] used five individual head measurements; however, head girth and face girth showed the highest individual correlations with the mass of water displaced by the head. Therefore, these were the only two measurements the technicians learned, and they added only 2–3 min to the total data collection time for participants, making this a viable alternative to head submersion. Additionally, several lung volumes were pilot tested based on previous research, including functional residual volume (FRV) [5], TLC [26], and the gold standard RV. FRV was too difficult to standardize without a spirometer; several attempts during pilot testing contradicted the results of Thomas and Ethridge [5], who found no difference in PBF between FRV and RV, so FRV was not used in the study. In contrast, Timson and Coffman [26] demonstrated TLC to be a viable alternative to RV, and TLC was chosen in the current study because the lung volume can be standardized with a “maximal inhale” verbal cue. The pilot data demonstrated more consistent PBF results when cueing for TLC than for FRV. For TLC, two participants achieved consistent results in only three trials, 11 within four trials, 11 within five trials, and 31 within six trials, with 67 individuals needing all seven trials. For RV, 25 participants achieved three consistent measurements within the first three trials, 34 required a fourth trial, and 59 required a fifth. The pilot data showed TLC and RV to be the most consistent, which is plausible because they lie at the extreme ends of lung volume rather than somewhere in the middle. Additionally, inhaling and holding the breath before going under water is a much more comfortable and natural feeling for participants; thus, TLC was used instead of FRV.