Human Hydration Indices: Spot Urine Sample Reference Values for Urine Concentration Markers in Athletic Populations

Background: Reference values and confidence intervals for the hydration indices of a large athletic population are currently lacking. Methods: Urine indices were gathered from an athletic population (n = 189) based on spot-urine samples. Results: High urine concentration was associated with a low volume and short void duration. When stratifying the data, differences for urine volume were seen for race and ethnicity and for athletic affiliation (p < 0.05), but no differences were found for urine concentration markers or volume for time of day of collection, thirst sensation, or age (p > 0.05). When classifying urine samples for a low vs. a high urine concentration by scoring urine color (Uc), the athletic population reported a slightly lower accuracy (4–7%) compared to investigators (p < 0.02). Subjects scored samples as lighter than the investigators, with a higher misclassification of the more concentrated urine samples. Conclusions: In this convenience sample of a predominantly young athletic population, urinary indices did not differ for subgroups within a large athletic population aside from some difference for race and ethnicity on urine volume. Although well-trained investigators reported better accuracy for Uc scoring, both athletes and investigators reported the highest accuracy for correctly classifying samples with a very low or a very high urine concentration.


Introduction
It is recommended that athletic populations should monitor their hydration status [1]. However, the assessment of hydration status is complex, as individual indices show only a limited part of the dynamic and complex fluid matrix [2]. A wide range of publications has explained the value of different hydration markers [3][4][5][6], but normative data for athletes reporting urinary indices are missing.
Low water intake results in a high urine concentration [7], a useful marker of suboptimal hydration status [8]. A large number of studies have used a cut-off value of 800 mOsm/kg indicating a poor hydration status [8][9][10][11][12]. A smaller number have assessed urine osmolality and USG at the same time, confirming that a 1.020 USG value often matches urine osmolality values ranging from 700 to 830 mOsm/kg [6,8,11,13]. In addition, urine color has been suggested as a reasonable replacement for the direct use of urine concentration, especially for the self-assessment of athletes or others interested in their hydration status [14][15][16].
Other urine-based markers, such as a high urine volume and long urine void duration, have been linked to hydration status [17,18]. Although spot urine samples can be assessed for volume and void duration [19], in general, 24-h urine collection is seen as the most reliable when assessing concentration [20]. The problem is that 24-h collections are cumbersome, and often there is insufficient time to collect a 24-h urine sample [21,22]. On the one hand, it has been suggested to collect a first morning urine sample, as this allows for standardization (i.e., a urine sample collected directly after a full night of sleep, being fasted without having any exercise) [20]. On the other hand, spot morning urine samples tend to result in much higher urine concentration values in comparison to 24-h urine collections [6].
Thirst [23] as well as bodyweight change from baseline have been suggested as markers of hydration status [24]. True thirst correlates with a high urine concentration [25], but normally only occurs after a bodyweight fluid loss >2%; as such, urine concentration is able to detect much more subtle changes in fluid balance [6,26]. Urine concentration may be affected by body mass, which calls for caution when selecting urine concentration cut-off values to detect underhydration or dehydration at population level [27]. Normally, urine consists of ~4% solutes, with 60% being protein metabolites, potentially affected by muscle protein breakdown or protein intake [28]. It has been shown that muscle mass [29] or a diet high in protein [30] can augment urine protein metabolite concentration, which may then increase USG levels.
Finally, simple demographics have been associated with hydration status, such as age and sex [27,31] or race and ethnicity [10][11][12][32]. Armstrong et al. (2012) suggested that it is difficult to assign numerical values to euhydration, dehydration, or hyperhydration because normative values do not exist [33]. The ones that do exist describe healthy men [6], women [13,33], or both [11], but normative values and confidence intervals for the hydration indices of a large athletic population, taking into account the demographics as earlier described in this introduction, are currently lacking.
Therefore, the primary aim of this investigation was to establish reference values and 95% confidence limits for spot urine indices (i.e., volume, specific gravity, osmolality, and color) and to determine if they differ for age, sex, race/ethnicity, and athletic group affiliation. The secondary aim of this investigation was to assess reporting differences between athletes and investigators for urine color scoring. Although publications report on the accuracy of Uc scoring by investigators or athlete populations, no comparison has been made between the results of both.

Design
The full data set from a study approved by the Institutional Review Board at Arizona State University (STUDY00010071) was used [19]. The dataset consisted of a convenience sample of one hundred and eighty-nine university NCAA Division I athletes, student club athletes, Army Reserve Officer Training Corps (ROTC) cadets, and a group of Chinese coaches visiting the USA (52% male, 22.3 ± 1.6 years) who were asked to score the color of their urine while handing in a single urine sample without any requirements or interventions towards their hydration status. Data were reanalyzed to generate reference values for urine indices (i.e., urine osmolality, urine specific gravity (USG), and urine volume) and urine color. Additionally, urine indices were stratified for void volume (<250 vs. ≥250 mL), void duration (<16 s vs. ≥16 s), time of collection during the day (first morning urine sample vs. urine sample collected at another time during the day), thirst (no vs. yes, based on the question, "Were you thirsty during the collection of your urine sample?"), age (18-19 years, 20-21 years, and >22 years; tertiles were pragmatically formed while aiming for three relatively similar group sizes), sex (male vs. female), body mass (≤65.5 kg, 65.6-75.8 kg, and ≥75.9 kg, resulting in exact tertiles, while body weight was missing for n = 2 participants), race and ethnicity (Black or African American, White, Hispanic or Latino, and Other, consisting of Asian or Native Hawaiian or Other Pacific Islander, of whom most were Asian participants from China), and athletic affiliation (student athlete, Army ROTC cadet, and coach).

Procedures
Prior to data collection, participants gave signed informed consent, completed a personal characteristic form, and provided body weight and a single urine sample. Athletes either brought a urine sample to our lab facility within 4 h of collection or collected a sample at the facility. Urine sample containers were then weighed to estimate total urine volume. Before urine concentration was measured (i.e., urine osmolality and urine specific gravity), urine color was scored using a 30 mL urine sample as previously described [16], using a 7-color and an 8-color urine color chart. Athletes scored their urine sample once with each of the two color charts. The order of the 7- and 8-color charts was alternated based on the chart that their predecessor scored first. After the athlete was finished, a team of two investigators scored each sample independently. When sample scores matched for both investigators, this was accepted as their final Uc score, but if urine color scores differed between investigators, they discussed the sample until they reached an agreement on the urine color score. Finally, Uc scores of athletes and investigators were used to classify urine samples as low vs. high urine concentration based on a USG value of 1.020.

Measurements
Participants used a paper form to record their urine void duration, time of collection, perception of thirst, sex, age (years), race and ethnicity, and athletic status. Body mass in kg was measured using a digital scale at our lab facility (Seca 803 digital scale, Hamburg, Germany), as was body height (Seca 213 portable stadiometer, Hamburg, Germany), allowing for body mass index (BMI) calculation.
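The BMI calculation mentioned above can be sketched as follows. This is a minimal illustration using the standard definition (body mass in kg divided by height in m squared); the function name and the example values are hypothetical and are not taken from the study data.

```python
def body_mass_index(mass_kg: float, height_cm: float) -> float:
    """Return BMI in kg/m^2 from body mass (kg) and body height (cm)."""
    if mass_kg <= 0 or height_cm <= 0:
        raise ValueError("mass and height must be positive")
    height_m = height_cm / 100.0  # stadiometer readings are assumed to be in cm
    return mass_kg / (height_m ** 2)


# Hypothetical participant: 72.0 kg and 175.0 cm
example_bmi = round(body_mass_index(72.0, 175.0), 1)
```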
To estimate urine volume in mL, all urine collection containers were pre-weighed empty on a precision scale with 0.1 g accuracy (PT1400, Sartorius AG, Göttingen, Germany), and this weight was recorded on the bottom of the container. Each urine sample was weighed and the empty container weight was subtracted to obtain urine volume, based on the assumption that 1 g of urine equals 1 mL.
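As a rough sketch of the weighing procedure above, the volume estimate reduces to a subtraction under the stated assumption that 1 g of urine equals 1 mL. The function and constant names are hypothetical; rounding to 0.1 reflects the reported scale accuracy.

```python
GRAMS_PER_ML = 1.0  # assumption from the text: 1 g of urine is treated as 1 mL


def urine_volume_ml(filled_container_g: float, empty_container_g: float) -> float:
    """Estimate urine volume (mL) from filled and empty container weights (g)."""
    net_weight_g = filled_container_g - empty_container_g
    if net_weight_g < 0:
        raise ValueError("filled container cannot weigh less than the empty one")
    # Round to 0.1 to match the 0.1 g accuracy of the scale
    return round(net_weight_g / GRAMS_PER_ML, 1)


# Hypothetical weights: container weighs 62.4 g empty and 312.4 g filled
example_volume = urine_volume_ml(312.4, 62.4)
```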
Urine color was assessed using a 30 mL centrifuge tube (Evergreen centrifuge tube, Caplugs, Buffalo, NY, USA). Each sample was covered using clear Parafilm (Laboratory Film, Bemis Company Inc., Neenah, WI, USA) to seal and prevent color distortion. Participants were instructed to look into a urine color scoring box, as previously described [16]. To control lighting, a 28-watt color adjustable lamp providing an intensity of ~1650 lux at full power and light color set to white (NL480, Neewer, Shenzhen, China) was placed on the left side of the box at the height of the sample. The scoring was done in a well-lit room directly under a 3-light fluorescent parabolic troffer (420 lux) built into the ceiling [16].
Urine specific gravity was measured in fresh urine samples (stored no longer than five days in the refrigerator) using a USG refractometer pen (Pen-Urine S.G., Atago, Tokyo, Japan) at a sample temperature of 20 °C. Each measurement was performed twice. If the two measurements differed by more than 0.0005, a third measurement was added and the median was calculated. Duplicate measurements were performed to calculate mean urine osmolality (with sample CV 0.17 ± 0.18) in fresh urine samples (stored no longer than seven days at a temperature of 5 °C) using freezing point depression (A2O Osmometer, Advanced Instruments, Norwood, MA, USA) [19].
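The replicate rule for USG described above can be sketched as follows. Two points are assumptions rather than statements from the text: that the mean of the two readings is used when they agree within 0.0005, and that readings are recorded to four decimals. The function name is hypothetical.

```python
from statistics import mean, median
from typing import Optional

TOLERANCE_UNITS = 5  # 0.0005 expressed in units of 0.0001


def final_usg(first: float, second: float, third: Optional[float] = None) -> float:
    """Combine replicate USG readings into one value."""
    # Compare in integer units of 0.0001 to avoid floating-point noise
    diff_units = abs(round(first * 10000) - round(second * 10000))
    if diff_units <= TOLERANCE_UNITS:
        # Assumption: readings that agree are averaged
        return mean([first, second])
    if third is None:
        raise ValueError("readings differ by more than 0.0005; a third reading is needed")
    # Per the text: add a third measurement and take the median
    return median([first, second, third])


# Hypothetical readings
agreeing = final_usg(1.0200, 1.0204)
disagreeing = final_usg(1.0200, 1.0210, 1.0208)
```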

Statistics
All data were reported as medians and interquartile ranges (IQR) and were calculated using urine concentrations based on split percentiles ranging from being extremely well hydrated to extremely underhydrated. Additionally, a 95% confidence interval (95% CI) was calculated for each of the stratified data, including values ranging from the 2.5th to the 97.5th percentile for each subgroup. Urine color (Uc) data were reported as median (IQR), min-max score, and the percentage of urine color samples that were correctly classified for each percentile. To classify Uc, we used the earlier suggested Uc cut-off value of ≤2 for the 8-color chart and ≤1 for the 7-color chart based on previously reported validation data [16], allowing the classification of low vs. high urine concentration against a cut-off value of 1.020 urine specific gravity. The comparison of Uc with a USG-based cut-off value was preferred over urine osmolality, as practitioners are more likely to use USG as a field-based measurement, which substantiates the need for more USG-based reference data. Selecting to compare Uc with USG instead of osmolality was also supported by the suggestion of Armstrong et al. (1994) that urine osmolality and USG may be used interchangeably as a result of a very strong correlation (r ≥ 0.97) between them [14]. Differences for hydration indices, as well as for urine color scores between athlete and investigator, and stratified analysis for differences between two groups were calculated using a Mann-Whitney U test, while the difference between three or more groups was tested using a Kruskal-Wallis test, followed by pairwise comparisons adjusted by Bonferroni correction. The significance level was set at p < 0.05.
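The Uc classification against the USG cut-off can be illustrated with a short sketch. It assumes that a Uc score at or below the chart cut-off and a USG below 1.020 both indicate a low urine concentration, and that a sample counts as correctly classified when the two agree. All names and the example scores are hypothetical, not study data.

```python
from typing import List, Tuple

USG_CUTOFF = 1.020  # USG below this value is treated as low concentration
UC_LOW_MAX = {"8-color": 2, "7-color": 1}  # highest Uc score still counted as low


def is_low_by_usg(usg: float) -> bool:
    return usg < USG_CUTOFF


def is_low_by_color(uc_score: int, chart: str) -> bool:
    return uc_score <= UC_LOW_MAX[chart]


def classification_accuracy(samples: List[Tuple[int, float]], chart: str) -> float:
    """Fraction of (uc_score, usg) pairs where the color and USG classes agree."""
    if not samples:
        raise ValueError("no samples to classify")
    correct = sum(
        is_low_by_color(uc, chart) == is_low_by_usg(usg) for uc, usg in samples
    )
    return correct / len(samples)


# Hypothetical (Uc score, USG) pairs
example_samples = [(1, 1.005), (2, 1.012), (3, 1.018), (4, 1.024), (6, 1.030)]
accuracy_8 = classification_accuracy(example_samples, "8-color")
accuracy_7 = classification_accuracy(example_samples, "7-color")
```

In this made-up example the third sample is misclassified on the 8-color chart, which mirrors the pattern where samples near the cut-off are the hardest to classify.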

Results
The athletic population (n = 189) comprised n = 132 student-athletes (16% cross-country or track, 15% wrestling, 9% tennis, 5% triathlon, 5% water polo, 4% swimming, and smaller percentages for baseball, basketball, beach volleyball, cricket, CrossFit, cycling, dance, fencing, hockey, rowing, rugby, sailing, soccer, softball, ultimate frisbee, volleyball, lacrosse, and weightlifting), of whom 83% were DI student athletes and 17% were student club athletes; n = 33 (18%) Army ROTC cadets training three times a week for 1.5-2 h; and n = 24 full-time coaches representing tennis and volleyball. After classifying urine samples in seven percentile groups from low to high urine concentration, there was a weak but significant inverse relationship between urine concentration (i.e., urine osmolality and USG) and urine volume (r = −0.34 with 95% CI −0.46 to −0.21, p < 0.001). The highest median urine volume was 423 (207-595) mL for the 1-10 percentile range and the lowest volume was 142 (105-177) mL for the 91-100 percentile range. Urine volume was especially high when athletes were (extremely) well hydrated. Total group BMI was 23.5 (21.6-25.8) kg/m² and correlated well with body mass (r = 0.84 with 95% CI from 0.80 to 0.88, p < 0.0001).
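The grouping into seven percentile ranges can be sketched with a simple rank-based assignment. The band boundaries 1-10, 11-25, 61-75, 76-90, and 91-100 appear in the text; the 26-40 and 41-60 boundaries are assumptions chosen to complete seven groups, and ranking on USG is also an assumption. The function name is hypothetical.

```python
from typing import List

# Inclusive upper bounds of the seven percentile bands (26-40 and 41-60 assumed)
BAND_UPPER_BOUNDS = [10, 25, 40, 60, 75, 90, 100]
BAND_LABELS = ["1-10", "11-25", "26-40", "41-60", "61-75", "76-90", "91-100"]


def assign_percentile_bands(usg_values: List[float]) -> List[str]:
    """Label each sample with its percentile band, ranked from low to high USG."""
    n = len(usg_values)
    # Indices sorted by USG; ties keep their input order
    order = sorted(range(n), key=lambda i: usg_values[i])
    bands = [""] * n
    for rank, idx in enumerate(order, start=1):
        for upper, label in zip(BAND_UPPER_BOUNDS, BAND_LABELS):
            # Integer form of: 100 * rank / n <= upper
            if rank * 100 <= upper * n:
                bands[idx] = label
                break
    return bands


# Hypothetical USG values for 20 samples, already in ascending order
example_usg = [1.001 + 0.0015 * i for i in range(20)]
example_bands = assign_percentile_bands(example_usg)
```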
The total of correctly classified urine color samples against a < 1.020 USG cut-off value were 77% for athletes for both color charts, and 84% and 81%, respectively, for the 8-color and 7-color charts for investigators. As shown in Table 1, Uc outcomes for urine samples with the lowest and the highest urine concentrations report the highest correct percentages, as scored by athletes and investigators. The percentile ranges that are positioned around the cut-off value to identify underhydration report the lowest number of correctly classified Uc scores for both urine color charts, regardless of whether Uc is scored by athletes or investigators. Roughly half of Uc of the urine samples is scored lighter by athletes vs. investigators, starting at the percentile range 76-90 up to 91-100. Removing outliers, below the 2.5th and above the 97.5th percentile, also resulted in a significant difference between athletes and investigators scoring Uc (p = 0.02, and p < 0.001 for the 8-color and 7-color Uc chart, respectively). This suggests that, on average, urine is scored significantly lighter by athletes vs. investigators, which was especially the case for the more concentrated urine samples.
When stratifying the data into various categories, there was a significant difference in urine concentration for a low vs. high void volume when outliers were removed (percentile range 2.5-97.5, p = 0.04 for osmolality and USG), but not for separate percentiles, as shown in Table 2. A short vs. a long void duration resulted in different urine volumes for all percentiles (ranging p ≤ 0.001 to p = 0.01), except for the 91-100 percentile (p = 0.06). No significant differences were found for percentile-based groups for the time of collection during the day (ranging p = 0.17 to p = 0.84), or for thirst (ranging p = 0.20 to p = 0.89; Table 2). There were no age-related differences for urine concentration (ranging p = 0.11 to p = 0.85) and urine volume (ranging p = 0.06 to p = 0.71) (Table 3). There were only limited sex-related differences in that extremely underhydrated females (91-100th percentile) had a lower urine volume than males (p = 0.03). When categorizing for body mass in tertiles, only the 61-75 percentile showed a difference between body mass groups for urine volume (p = 0.02), driven by the difference between body mass ≤65.5 kg vs. ≥75.9 kg (p = 0.01). Extremely well-hydrated Other participants reported the highest urine volume when removing outliers in comparison to White, Hispanic or Latino, and Black or African American participants (p = 0.001; Table 4), with the volume of all groups significantly different from Other (p < 0.04). Additionally, volume was different between White and Other for the 11-25 percentile (p = 0.03). Student-athletes had a significantly lower urine volume than Army ROTC cadets and coaches when outliers were removed (p < 0.01), with student-athletes having a lower volume than Army ROTC cadets (p = 0.04) and coaches (p < 0.001). Aside from these differences, for most percentiles, no significant differences were calculated (p > 0.05).

Table 3. Spot urine sample reference values for age, sex, and body mass in relation to urine concentration and urine volume (n = 189). Data are reported as median and interquartile range. Significant differences (p ≤ 0.05) based on Mann-Whitney U tests within categories are expressed in bold.

Discussion
When classifying urine samples by urine color, the athletic population reported a slightly lower accuracy (4-7%) compared to investigators, and they scored the more concentrated samples as lighter.
The current study shows that values on the well-hydrated end of the spectrum were more often below the <1.017 USG and <545 mOsm/kg value for spot urine samples reported by Armstrong and colleagues as extremely hyperhydrated cut-off values [6], but they are similar to later 24-h reports in women [13,33]. In the current study, no differences in urine concentration were seen for time of collection during the day, although many sources have reported a difference in urine concentration between a first morning spot urine sample and urine collected at a later time of the day [21,22,34]. This might result from the fact that the samples were collected during the afternoon hours, and urine concentration could have been influenced by acute rehydration strategies after practice [35]. The current study also did not reveal any hydration status differences for age groups, and although an older age has been associated with an impaired hydration status [10], this was likely related to having a higher BMI. We did not stratify data using BMI for this population, as it is known that BMI may be under or overestimated in athletic populations due to a higher muscle mass, but we stratified for body mass revealing no differences between groups. This is in contrast with earlier reported differences between athletes with different body composition, such as rugby players vs. runners [36]. It has been reported that athletes in non-weight category sports showed a lower morning urine concentration than athletes in weight category sports [9], but the number of athletes performing in weight category sports in our study was limited. The almost equal percentage of men and women included in this study (52% men) allowed us to assess potential differences. Although there is a substantial body of literature that reports differences for urine concentration between non-athlete men and women [9], our athletic population did not express clear sex differences. 
Overall, the athletes assessed in this study reported only small, probably irrelevant sex and body mass differences for some percentiles.
There are suggestions that non-athletic Black or African American individuals report the highest urine concentration, followed by a slightly lower concentration in Hispanics or Latinos, with the lowest concentration in White individuals [10,12]. Moreover, others have compared Black or African American and White healthy non-athlete adults [11,32], reporting differences in urine concentration. In contrast, in the current study, no significant differences were seen for urine concentration between athletes from different race/ethnic groups. This was possibly due to our focusing on an athletic population. Although in the present study no differences were seen for urine concentration, there was a difference in urine volume between groups when outliers were removed. Black or African American athletes reported a lower spot urine volume than Hispanic or Latino, White, and Other athletes.
There was also a urine volume difference related to athletic affiliation, i.e., student-athletes having the lowest volume, followed by Army ROTC cadets, and coaches with the highest volume, but we feel that this was driven by racial and ethnic differences as well. Most of the coaches were part of the "Other" group (as they were Chinese exchange coaches visiting the US) producing a large urine volume, while the Army ROTC cadets were predominantly White, resulting in a student-athlete population with a larger portion being Black or African American and Hispanic or Latino. Overall, the interesting conclusion is that despite differences in urine volume, there were no differences in urine concentration between athletic groups of different ethnic and racial descent.
Organizing the Uc results in percentiles, reflecting categories from a dilute to a concentrated urine, resulted in a gradual increase in Uc from a score of 1 (indicating a low urine concentration) to a score of 5 (suggesting a high urine concentration), comparable to other reporting [6,13]. Although there are some data comparing the accuracy of self-reported Uc with investigator-based Uc scores [37], this study is the first to compare both while analyzing differences for percentile ranges. This analysis revealed that the significant difference between athletes and investigators is driven by athletes rating the more concentrated samples with a lower score than the investigators. The athletes' accuracy for the two color charts was lowest from the 41st to the 75th percentile range, similar to the investigators, whose ratings were also less accurate in this range, but no clear differences were seen between charts. Normally, self-reported accuracy values for Uc range from 0.67 to 0.78 [37,38], while earlier we reported accuracy rates from 0.74 to 0.83 for the population represented in this study [16]. However, the current breakdown in percentiles shows that there is variability in how well urine color samples are scored depending on their concentration. As a result, the accuracy across percentile ranges in this study varied from 43% to 100% correct classification, with the highest accuracy for extremely well hydrated and extremely underhydrated samples. This suggests that a traditional Uc chart is especially accurate for urine samples with a low or a high urine concentration. This includes urine concentration values for euhydration (i.e., ≤500 mmol·kg⁻¹ and USG ≤1.012) [38], represented by the first two percentile ranges 1-10 and 11-25, with in most cases 100% accuracy.
This study has particular strengths. These spot urine sample reference values add to our current knowledge of urine indices in a substantial athletic population, in addition to available experimental and observational data [6,13,33]. As said, samples were collected as spot-urine, whereas others often report the 'gold-standard' 24-h sample, but with much smaller sample sizes [6,13,33]. Although spot samples are often considered inferior to 24-h samples, they are most frequently collected in practical situations, and therefore these data are of value. As such, this study provides urinary indices for health professionals that can help athletes to improve their hydration status, while having a better insight in hydration markers reflecting and covariates influencing this hydration status. Further, this is the first study to compare investigator and self-reported Uc scores from athletes while the accuracy of correctly classified urine samples is split into percentile-groups. The difference in reporting between investigators and athletes was mainly driven by the misclassification of samples with a higher urine concentration. This suggests that athletes especially should be trained on accurately scoring darker urine samples, instruction that in the future could be provided by qualified health professionals.
Limitations of this study were that, despite its substantial sample size, the study results should be considered preliminary, especially because most of the data came from young student athletes, which makes it difficult to generalize results to older populations. Further, it was not determined if the urine sample was provided directly after practice or after consuming a large volume of fluid [16]. Additionally, no instruction was given to avoid urinating with force, which might have influenced voiding time and therefore the classification of urine samples as short vs. long duration [19]. Although the Uc scoring accuracy was deemed similar for athletes reporting the use of nutritional supplements vs. no use [19], no dietary assessment was performed looking at food-based colorants or food sources influencing diuresis, nor physiological measurements such as sweat rate were performed. The representation of White, Black or African American, Hispanic or Latino, and Other, including Asian or Native Hawaiian or Other Pacific Islander participants, was slightly skewed [16], but likely to be representative for the Phoenix metropolitan area (AZ, USA). Additionally, there are many ways thirst can occur [25], and during data collection no distinction was made between the type of thirst reported, therefore other types of thirst (i.e., contextual thirst, pharmacological thirst, and impulsive thirst [25]) may have contributed to our classification of true thirst, which is related to a high urine concentration. This may be a reason why no differences were seen for this split analysis. A final limitation for this specific analysis was that we stratified urine volume based only on USG values, and not for osmolality as well, which may have resulted in minor differences in urine volume per percentile group when compared to the osmolality results.
This analysis leads to some practical recommendations, as urine volume is one of the most accessible tools to make athletes more aware of their hydration status. The current study showed that in this athletic population, a larger urine volume was associated with a lower urine concentration. When measuring urine volume is not practical, one could consider estimating volume by timing the urine void, assuming that urine flow rate is somewhat consistent between voids [19]. Some subgroups in this study were more prone to report a low volume with a high concentration than others. Therefore, health professionals should be mindful to consider race, ethnicity, and athletic affiliation as part of their hydration strategy. Future research should focus on improving the classification of urine samples, e.g., by combining multiple field-based assessments (including the assessment of urine color), allowing athletes to accurately identify a low vs. a high urine concentration and reducing the current misclassification when scoring urine color alone. Additionally, research should examine the trainability of Uc scoring, as investigators clearly reported a higher accuracy than the athletic population.
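The idea of estimating volume from void duration can be sketched as follows, under the assumption stated above that an individual's flow rate is roughly constant between voids. The calibration approach (averaging the flow rate over a few voids with measured volume and duration) and all names and values are hypothetical illustrations, not a procedure from the study.

```python
from statistics import mean
from typing import List, Tuple


def personal_flow_rate(calibration_voids: List[Tuple[float, float]]) -> float:
    """Mean flow rate (mL/s) from (volume_ml, duration_s) pairs of measured voids."""
    rates = [volume / duration for volume, duration in calibration_voids if duration > 0]
    if not rates:
        raise ValueError("at least one void with a positive duration is required")
    return mean(rates)


def estimate_volume_ml(duration_s: float, flow_rate_ml_per_s: float) -> float:
    """Estimate void volume (mL), assuming a constant flow rate."""
    return duration_s * flow_rate_ml_per_s


# Hypothetical calibration: two measured voids from the same athlete
example_rate = personal_flow_rate([(300.0, 20.0), (240.0, 16.0)])
example_estimate = estimate_volume_ml(12.0, example_rate)
```

Such an estimate is only as good as the constant-flow assumption; voiding with force, which was not controlled in this study, would bias it.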

Conclusions
In this convenience sample of a predominantly young athletic population, a high spot urine sample concentration was associated with a low urine volume and a shorter void duration, and despite small urine volume differences for race and ethnicity, no relevant differences were seen for urine concentration between groups based on athletic affiliation. Although well-trained investigators reported slightly more accurate values than the athletic population, the Uc scores of investigators and athletes differed especially for samples with a higher urine concentration. Further, the results of this study show that the Uc method is especially accurate for classifying urine samples with a very low or a very high urine concentration.
Funding: This research received funding from the Global Sport Institute at Arizona State University.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Arizona State University (STUDY00010071).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The underlying research materials related to this paper are available from the corresponding author upon request.