Body Composition Assessment in Mexican Children and Adolescents. Part 1: Comparisons between Skinfold-Thickness, Dual X-ray Absorptiometry, Air-Displacement Plethysmography, Deuterium Oxide Dilution, and Magnetic Resonance Imaging with the 4-C Model

Body composition (BC) assessment is relevant to the evaluation of children's health and disease states. Different methods and devices are used to estimate BC, and the availability of methods and the clinical condition of the patient usually determine the most suitable approach. In this cross-sectional study, we evaluated the accuracy of different methods to estimate BC in Mexican children and adolescents, using the 4-C model as the reference. In a sample of 288 Mexican children and adolescents, 4-C body composition assessment, skinfold thickness (SF), dual-energy X-ray absorptiometry (DXA), air-displacement plethysmography (ADP), and deuterium dilution (D2O) were performed, along with MRI in a subsample (52 participants). Validity was analyzed by correlation analysis, linear regression, and the Bland–Altman method. All methods showed strong correlations with 4-C values for fat mass (FM) and with each other; however, DXA and MRI overestimated FM, whereas SF and ADP underestimated FM. Conclusion: The clinical assessment of BC by means of SF, ADP, DXA, MRI, and D2O correlated well with the 4-C model and between methods, providing evidence of their clinical validity and utility. The results from different methods are not interchangeable. Preference between methods may depend on their availability and the specific clinical setting.


Introduction
In Mexico, according to the 2018 Mexican National Health and Nutrition Survey (ENSANUT), 35.6% of children and 38.4% of adolescents have overweight or obesity (OW/OB) [1,2]. OW/OB represent a key public health problem in the country and are closely related to the top three causes of mortality (heart diseases, diabetes, and cancer) [3].
Although OW/OB are defined as the excessive accumulation of adipose tissue leading to increased risk of negative health outcomes, they are routinely categorized in all age groups using body mass index (BMI) [4,5].
BMI is almost universally adopted to assess nutritional status because of its simplicity, practicality, and its good correlation with adipose tissue [6,7]. Nevertheless, the limitations of using BMI for this purpose are increasingly recognized [8][9][10], and it should not be used as a standard in the specific assessment of adiposity [11]. BMI may be imprecise and may misclassify those of short and tall stature and those with a significant increment of their muscle mass [12]. BMI may also be insensitive to change by interventions (e.g., reductions in adipose tissue coupled with increments in muscle mass related to successful nutritional and physical activity intervention may not equate to changes in BMI) [13]. Finally, individuals with high levels of lean mass (constituted mostly of functional and useful tissues that are not necessarily unhealthy) may be misclassified as OW/OB due to high BMI [14,15].
In addition to OW/OB, several other health conditions (e.g., cancer, malnutrition, storage diseases, chronic exposure to systemic corticosteroids, etc.) are also associated with alterations in weight which may be insufficiently described by BMI [16][17][18].
The increased availability of alternative technologies has brought significant improvements in our capacity to measure human-body physical characteristics. Likewise, the assessment of the relative distribution of different tissues that contribute to body composition (BC) has gained interest and relevance in the evaluation of health and disease [19].
Clinical BC assessment recognizes four components of body weight: fat mass (FM), protein mass (PM), bone mineral content (BMC), and total body water (TBW). Currently, these components can be estimated using a multi-technique approach known as the 4-component (4-C) model. Although this is the most accurate method, its complexity, cost, time to results, and exposure to radiation challenge its use in routine clinical practice [20,21]. For these reasons, the 4-C model is used only for research purposes. Simpler, faster, safer, and less expensive techniques are more readily available and are increasingly used in the clinical assessment of BC. Each technique has its own advantages and limitations and may therefore suit different scenarios.
A key issue concerns variability in the accuracy of the different techniques. The aim of this study was, therefore, to compare several BC estimation methods with the criterion 4-C model in Mexican children and adolescents to better inform clinicians about their performance, ultimately facilitating the adoption of the most suitable method to assess BC.

Participants
Healthy children and adolescents who participated in the study "Reference values of body composition of Mexican children and adolescents" were invited to participate in this study using an age- and sex-stratified random procedure. Our approach has been described in detail previously [22]. Briefly, it was a population-based cross-sectional study of more than 1500 healthy Mexican child and adolescent volunteers residing in Mexico City. These participants were clinically, nutritionally, and biochemically assessed to confirm their health status before the corresponding measurements, with the objective of describing reference values of body composition for Mexican children and adolescents [22]. Participants for the current study were selected from this database by an iterative random process stratified by sex and by age in yearly intervals from 4 to 18 years. A sample size of 7 participants per year of age and sex was calculated as appropriate for an expected correlation coefficient of ≥0.90 with a two-tailed type 1 error rate of 5% and a type 2 error rate of 20% [23]. Participants were recruited by telephone, during which the study was explained in detail; recruitment ran from June 2018 until the sample size was completed for each age and sex group in July 2019. This study was reviewed and approved by our institutional ethics, biosafety, and research committees (registered as HIM 2015-055).
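The stated sample size of 7 per age-and-sex stratum is consistent with the standard Fisher z approximation for detecting a correlation of 0.90 against a null correlation of zero. The sketch below assumes that approximation; reference [23] may use a slightly different formula, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Minimum sample size to detect a correlation r against H0: rho = 0.

    Fisher z approximation: n = ((z_{1-alpha/2} + z_{1-beta}) / atanh(r))**2 + 3
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed type 1 error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type 2 error
    fisher_z = math.atanh(r)                       # Fisher transform of expected r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


print(n_for_correlation(0.90))  # 7 (unrounded value is about 6.6)
```

Lower expected correlations require larger strata, because the Fisher-transformed effect size in the denominator shrinks.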

Clinical Assessment
Invited children and adolescents who agreed to participate were asked to arrive at the study site after an 8-h fast. All parents or guardians signed an informed consent form, and children aged ≥7 years were also asked for their assent. Participants were clinically, nutritionally, and biochemically assessed by a pediatrician and a nutritionist to confirm their health status. Pubertal development stage was recorded according to the Tanner and Whitehouse scale [24,25]. Those with biochemical abnormalities (i.e., impaired fasting glucose, low high-density lipoprotein cholesterol, high triglycerides, or insulin resistance according to the criteria of the Expert Panel on Integrated Guidelines for Cardiovascular Health and Risk Reduction in Children and Adolescents) were not included in this study [26].

Anthropometry
Weight and height were measured with participants wearing lightweight clothing, using a SECA® 284 scale-stadiometer. Waist and hip circumferences were measured according to WHO standards using a SECA® 201 measuring tape [27]. Mid-upper arm, thigh, and calf circumferences were measured according to the International Standards for Anthropometric Assessment of the International Society for the Advancement of Kinanthropometry (ISAK) [28].
BMI was calculated as weight (kg) divided by the square of height (m) [29]. Weight, height, and BMI z-scores were calculated using the growth reference of the World Health Organization [30].
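BMI and its z-score follow directly from these definitions. The WHO growth reference expresses its curves through age- and sex-specific LMS parameters, so a z-score can be sketched with the LMS formula once L, M, and S have been looked up. The LMS values in the example are invented for illustration and are not WHO values; the WHO also applies an adjusted calculation beyond ±3 SD for some indicators, which is not reproduced here.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def lms_z(x: float, L: float, M: float, S: float) -> float:
    """LMS z-score for a measurement x.

    L (Box-Cox power), M (median), and S (coefficient of variation) must be
    read from the reference table for the child's exact age and sex.
    """
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)


# Illustrative LMS values only, NOT taken from the WHO tables.
value = bmi(40.0, 1.45)
print(round(value, 2), round(lms_z(value, L=-1.0, M=17.0, S=0.12), 2))
```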

Skinfold Thickness
SF thicknesses were measured according to Lohman's technique following ISAK recommendations [28]. They were measured at the triceps and calf, twice at each site on both body sides, with a calliper with a 0–80 mm scale and a precision of ±0.2 mm (Harpenden calliper, British Indicators Ltd., St Albans, UK). Measurements were taken to the nearest millimetre at each site, and the mean of the four values for each site was calculated. The percentage of fat was calculated according to the equations of Slaughter et al. and multiplied by each subject's total weight to obtain total fat mass [31]:

$$\text{Males: } \%\text{fat} = 0.735\,(\text{triceps} + \text{calf}) + 1.0$$
$$\text{Females: } \%\text{fat} = 0.610\,(\text{triceps} + \text{calf}) + 5.1$$
$$\text{Total FM (kg)} = \frac{\%\text{fat}}{100} \times \text{weight (kg)}$$

Dual X-ray Absorptiometry (DXA)
A whole-body scan was performed on all participants using a Lunar-iDXA densitometer (GE Healthcare®) according to the manufacturer's instructions and analyzed with ENCORE® software version 15. Measurements were performed by a nurse certified by the International Society for Clinical Densitometry (ISCD), and the densitometer was calibrated weekly according to the manufacturer's instructions. DXA total body composition assessment with regional analysis provided data for total body (with head) fat mass (FM), lean soft tissue mass (LM), and bone mineral content (BMC) [32], as well as regional data for the arms, legs, and trunk [33]. DXA FFM values were calculated as total body LM plus BMC.
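The DXA-derived quantities combine as in the sketch below, which assumes all compartments are in kilograms (DXA software often reports BMC in grams, which must be converted first). The %FM line relies on the fact that DXA total mass is the sum of FM, LM, and BMC; the function name is illustrative.

```python
def dxa_summary(fm_kg: float, lm_kg: float, bmc_kg: float) -> dict:
    """Derive FFM, total mass, and %FM from the three DXA compartments (kg)."""
    ffm_kg = lm_kg + bmc_kg            # FFM = lean soft tissue + bone mineral
    total_kg = fm_kg + ffm_kg          # DXA total mass = FM + LM + BMC
    return {
        "FFM_kg": ffm_kg,
        "total_kg": total_kg,
        "FM_percent": 100.0 * fm_kg / total_kg,
    }


# BMC entered in grams and converted to kg before the call
print(dxa_summary(fm_kg=10.0, lm_kg=28.0, bmc_kg=1500 / 1000))
```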

Air-Displacement Plethysmography (ADP)
Body volume was measured by ADP using BOD POD® instrumentation (COSMED USA Inc., Concord, CA, USA; software version 5.2.3) following standardized procedures according to the manufacturer's instructions [34]. Briefly, participants abstained from physical activity and food for 2 h before the measurement. The BOD POD was calibrated each day before use according to the manufacturer's guidelines. Participants were measured in tight-fitting bathing suits and swimming caps to minimize air trapped in clothing and hair. Body mass was measured using the BOD POD's electronic scale, and body volume was measured in the chamber twice. If the first two body-volume readings differed by more than 150 mL, a third measurement was taken, and the two closest readings that met the agreement criterion were averaged. Thoracic gas volume (TGV) was predicted by the software with a validated child-specific equation [34,35]. The fat-mass percentage (FMADP%) and fat mass by ADP (FMADP) were calculated using up-to-date child-specific conversion factors included in the paediatric software [35,36].

Deuterium Oxide Dilution (D2O)
D2O doses were considered outliers and were not used to calculate TBW when the weight of the bottle was lower after drinking the D2O with tap water than it was at the beginning, when more than 1 g of the D2O and tap water remained after drinking, or when the weighed D2O differed by more than 15% from the target dose of 0.05 g of D2O per kilogram of body weight.
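The ADP volume-agreement rule and the quantitative D2O dose criteria can be sketched as simple checks. This is an illustration under stated assumptions, not the BOD POD software logic: volumes are taken in mL, the function names are hypothetical, and the bottle-weight criterion is left out because the text does not specify exactly which weighings are compared.

```python
from itertools import combinations


def adp_body_volume(readings_ml, tolerance_ml=150.0):
    """Combine BOD POD body-volume readings using the agreement rule.

    The first two readings are averaged if they differ by no more than
    tolerance_ml. Otherwise a third reading is used and the closest pair
    that meets the tolerance is averaged. Returns None when no pair agrees
    (or when a third reading is still needed).
    """
    first, second = readings_ml[0], readings_ml[1]
    if abs(first - second) <= tolerance_ml:
        return (first + second) / 2
    if len(readings_ml) < 3:
        return None  # a third measurement has to be taken
    a, b = min(combinations(readings_ml[:3], 2), key=lambda p: abs(p[0] - p[1]))
    return (a + b) / 2 if abs(a - b) <= tolerance_ml else None


def d2o_dose_is_valid(dose_g, weight_kg, residual_g,
                      target_g_per_kg=0.05, max_rel_dev=0.15, max_residual_g=1.0):
    """Screen a D2O dose against the residual and target-dose criteria."""
    if residual_g > max_residual_g:         # more than 1 g left in the bottle
        return False
    target_g = target_g_per_kg * weight_kg  # 0.05 g D2O per kg body weight
    return abs(dose_g - target_g) / target_g <= max_rel_dev


print(adp_body_volume([30000, 30300, 30050]))  # 30025.0
print(d2o_dose_is_valid(dose_g=1.8, weight_kg=30.0, residual_g=0.2))  # False
```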

Magnetic Resonance Imaging (MRI)
As an exploratory part of this study, we measured total-body fat mass by whole-body multi-slice MRI in a subsample of participants (n = 52). Participants were placed supine, with their arms by their sides, in a 3.0 Tesla (T) scanner (Achieva 3.0T, Philips Medical Systems, Best, The Netherlands). T1-weighted (TR/TE: 72.3/2.3 ms) and T2-weighted (TR/TE: 1093.4/76 ms) coronal images (6 mm slice thickness, 1.0 mm gap) were acquired across the whole body. The intervertebral space between the fourth and fifth lumbar vertebrae (L4–L5) was set as the point of origin for abdominal T2-weighted (TR/TE: 3000/16 ms) water-suppressed transverse images (8 mm slice thickness, 1.0 mm gap) covering the abdominal area. We then calculated the voxel volume, counted the voxels classified as fat, and multiplied the resulting fat volume by adipose tissue density to obtain total body fat mass. Visualization, annotation, and quantification were performed in MATLAB R2020b (The MathWorks, Inc., Natick, MA, USA).

The 4-Compartment Model (4-C)
The 4-C model was used as the reference standard method for the estimation of fat mass and was calculated according to Fuller et al. [38]:

Statistical Analysis
Descriptive statistics were used to characterize the demographics and measurements of each method, expressing results as means and standard deviations for continuous variables and percentages for categorical variables.
The means of FM estimated by each method in comparison with the 4-C model, for the total sample and by age and sex groups, were compared using a paired t-test.
Pearson correlation coefficients and Lin's concordance correlation coefficients were computed for the FM, %FM, and FFM values estimated by each method with respect to the 4-C reference standard. Simple linear regression was performed to determine the relationship between each BC method and the 4-C model and to obtain the corresponding equation for total body FM and FFM estimation. The Bland-Altman method [39] was used to assess agreement between each method and the 4-C model as the reference standard. In this procedure, the differences between the FM values estimated by each method and those estimated by the 4-C model (method minus 4-C; y-axis) were plotted against the mean of the two measurements (x-axis). The mean difference and the limits of agreement (±2 SD of the difference) were calculated, and for each method a linear regression with the difference as the dependent variable and the mean of the two measurements as the independent variable was used to assess proportional bias (i.e., whether the magnitude of the bias varied with the level of fatness) [39].
Statistical analyses were conducted in SPSS for Windows version 21.0 (SPSS Inc., Chicago, IL, USA) and Prism 8 for Windows (GraphPad Software, Inc., San Diego, CA, USA). Statistical significance was set at p < 0.05.

Results
A total of 293 children and adolescents were measured; data from 5 participants met the outlier criteria for TBW measurement by D2O and were excluded. We report results from 288 participants (aged 4 to 18 years); 53% were female, and 173 (59%) were adolescents (11 to 18 years). Demographic and measurement data are summarized in Table 1. For clarity, all data are presented by sex and age group unless otherwise specified. The characteristics of the subjects measured with MRI are shown in Supplementary Table S1. Comparisons of mean FM values estimated by each method with those of the 4-C model, for the total sample and stratified by age and sex group, are shown in Supplementary Table S2. For the whole sample, mean FM by ADP was consistently and significantly lower than that estimated by the 4-C model (8.2 ± 6.5 kg vs. 9.5 ± 6.8 kg), whereas mean FM by DXA and MRI was consistently and significantly higher (12.5 ± 6.8 kg and 12.9 ± 5.7 kg, respectively). FM estimated by D2O was similar to 4-C values in children but significantly higher in adolescents. Compared with the 4-C means, SF estimates of FM were significantly lower in female children and adolescents, significantly higher in male adolescents, and similar in male children.
The correlations, concordances, agreements, and proportional bias assessments of FM between SF, DXA, ADP, D2O, and MRI and the 4-C model are shown in Table 2. All methods showed strong to very strong correlations with 4-C values for FM (i.e., Pearson's correlation coefficients > 0.80) across all age and sex groups. In this study, we used the Slaughter equations to estimate BC from SF measurements; raw SF data and their correlation analyses are provided as additional analyses in Supplementary Figure S1.
Lin's concordance correlation coefficients ranged from poor (<0.90) to substantial (>0.95) precision and accuracy for each method in comparison with the 4-C model across age and sex groups. The Bland-Altman agreement analyses disaggregated by sex and age groups showed the lowest mean bias for D2O (−0.17 to +0.94 kg) and SF (−1. […] Table 3.

Lin's concordance correlation coefficients for precision and accuracy ranged from poor (<0.90) for DXA to substantial for D2O (~0.95) in comparison with FFM by the 4-C model across age and sex groups. The Bland-Altman agreement analyses disaggregated by sex and age groups showed the smallest mean bias for D2O (−0.86 to +0.21 kg) and SF (−0.74 to +1.0 kg) with respect to FFM by the 4-C model, and larger values for DXA (−3.3 to −2.7 kg) and ADP (0.42 to 1.28 kg). The Bland-Altman plots for these analyses for the total sample are shown in Figure 2.

Further analyses of correlation, concordance, and agreement between the different methods in the estimation of FM are shown in Supplementary Table S3. In the correlation analysis, all techniques showed r values ≥ 0.83. The techniques with the best concordance were DXA with MRI and SF with ADP. The largest differences were between MRI and D2O. In the Bland-Altman analyses, all three techniques showed significant biases in the mean estimation of FM, as shown in Supplementary Table S3 and Supplementary Figure S3.

Discussion
This study compared five BC estimation methods with the 4-C model, used as the reference standard of BC assessment, in Mexican children and adolescents. Consistent with previous publications, our results showed that all five methods provided FM and FFM data that correlated well with the 4-C model [36,40–42]. However, Lin's concordance correlation coefficients and Bland-Altman plots provided more detailed information on significant differences between methods. According to these analyses, D2O, SF, MRI, and ADP showed the highest overall concordance and the lowest bias, although at higher FM values proportional biases became significant and the agreement between each of these methods and the 4-C model decreased. In contrast, DXA consistently overestimated FM by approximately 3 kg, but this bias was stable across FM values, indicating lower accuracy but higher precision than D2O, SF, and ADP.
Considering their availability, accessibility, and affordability, SF measurements may be the preferred choice for clinicians across different levels of healthcare. This method requires the least infrastructure investment; it is non-invasive, reproducible, and relatively comfortable for the patient; it may be repeated as frequently as required without risk; and it can be performed in critically ill patients and those with mobility restrictions [43]. Its limitations are that it requires trained and standardized personnel, is operator-dependent, entails the application of an equation or conversion to z-scores, and may perform poorly in clinical conditions where BC assessment is frequently required, such as oedema, extreme obesity, muscular or lipid dystrophies, and storage diseases, among others [44]. In our data, SF estimates of FM had limits of agreement from −5.2 to +4.5 kg relative to the 4-C model. SF showed a significant proportional bias, with increasing underestimation at higher FM values (beta-coefficient −0.1; p < 0.001). The magnitude of this proportional bias may compromise its clinical performance in patients with the highest FM values (e.g., OW/OB) but may be less relevant for those with malnutrition, cancer, or other conditions, including the nutritional assessment of healthy subjects.

BC assessment by ADP is currently possible only with a single commercially available device, the BOD POD® (Cosmed USA Inc., Concord, CA, USA). This device was specifically designed to assess BC and is very popular in weight-management programs and among high-performance athletes. BC assessment by ADP has several advantages: it is non-invasive, relatively easy to perform, reproducible, and tightly calibrated, and it can be repeated as frequently as needed without risk.
However, it requires a significant investment in the device, a dedicated room with constant temperature and pressure, and trained personnel. ADP is not easily performed in individuals with mobility restrictions or in critically ill patients. Because ADP estimates FM and FFM assuming constant tissue hydration, individuals with diseases affecting hydration status may also be inadequately assessed by this method [21]. ADP is usually contraindicated in individuals with claustrophobia and may be limited in individuals with very large body sizes. The use of skin moisturizers and even abundant hair may also compromise its precision and accuracy. In our data, ADP estimates of FM had limits of agreement from −5.3 to +2.6 kg relative to the 4-C model. ADP showed a significant proportional bias, with increasing underestimation at higher FM values (beta-coefficient −0.05; p = 0.003). Again, this proportional bias may matter most for those with the highest FM values and less for other health conditions.

DXA has gained substantial interest because of the growing range of clinical assessments it supports. DXA was initially developed to estimate bone mineral density (BMD) and is currently the standard clinical tool to diagnose osteopenia and osteoporosis. It is now increasingly used to estimate other body components, such as FM and lean mass (LM) (i.e., fat-free and bone-free mass) [45]. This method has become popular, and at some centers it is considered the clinical gold standard for BC assessment [46]. Its major advantages are that it estimates three components (BMC, FM, and LM) in a relatively simple and fast assessment (i.e., <15 min, with results immediately available) and that it is highly reproducible (as long as the same technology is used).
It also allows BC assessment by region (i.e., arms, legs, trunk), which may be of clinical relevance. The major limitations of DXA are radiation exposure, which precludes frequent repeated assessments, and the high investment required for the device, related infrastructure, and maintenance. Limited mobility, critical illness, prosthetics, actual or possible pregnancy, and inability to remain still during the scan may compromise the feasibility of this type of BC assessment [47][48][49]. In our data, DXA estimates of FM had limits of agreement from −1.1 to +7.0 kg relative to the 4-C model. DXA estimates showed no significant proportional bias across the range of FM values (beta-coefficient −0.006; p = 0.74), consistent with other reports in the literature [36,40,49,50].

D2O dilution is considered a reference method to estimate total body water. It relies on the ingestion of isotope-labelled water; FM and FFM can then be estimated by applying hydration coefficients. This method is non-invasive, has no known adverse effects, may be used in pregnant women, children, and older adults, and may be repeated without clinical consequences [51]. However, it requires mass spectrometry or attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR), which necessitates access to such technology and infrastructure as well as trained personnel, and the time from assessment to results is usually considerable. In our data, D2O estimates of FM had limits of agreement from −4.3 to +5.4 kg relative to the 4-C model. D2O showed a significant proportional bias, with increasing overestimation at higher FM values (beta-coefficient +0.09; p < 0.001).
MRI offers an interesting approach to BC given its ability to discriminate between different tissues, offering a unique perspective on body fat [52]. This method is non-invasive and may be repeated during patient follow-up. MRI also allows the assessment of adipose tissue in specific regions and organs. However, MRI may be a challenging tool for BC assessment in the clinical setting. It requires a significant investment in the device, related infrastructure, and maintenance; image acquisition usually takes considerable time and requires participant cooperation; and, as with ADP, claustrophobia may be a relative contraindication. Body size, mobility restrictions, prosthetics, and critical illness may also preclude its use. Moreover, MRI analysis may require considerable time from trained personnel, lengthening the interval from image acquisition to a clinical result. For these reasons, BC assessment by MRI presents major challenges even in a research setting such as this study. In our data, MRI estimates of FM had limits of agreement from −3.2 to +7.8 kg relative to the 4-C model. MRI also showed a significant proportional bias, with increasing underestimation at higher FM values (beta-coefficient −0.1; p = 0.03).
Our results provide clinicians with relevant data on the acceptable clinical performance of all analysed techniques compared with the 4-C model. This study also compared correlations, concordances, and agreements among the different methods. Routine clinical practice may assess BC by means of SF, DXA, ADP, or MRI, whereas D2O and the 4-C model are mostly used for research purposes. As this study has shown, the methods correlate well with the 4-C model and with each other, but their results are not interchangeable; therefore, clinical assessments of BC, especially where follow-up is relevant, should be made with the same technique to avoid improper comparisons.
Limitations of the current study include the restriction of the sample to healthy participants aged 4.5 to 18 years. We only present data from Hispanic subjects living in Mexico City and its Metropolitan Area. People from rural areas, Afro-Mexican and Indigenous populations, and residents of other regions of Mexico may not share the characteristics of our sample; therefore, we advise caution when applying these results to such groups. As the Bland-Altman plots showed, significant biases were evident, and disagreement increased at higher FM values for several methods. This finding may be influenced by the smaller number of participants with very high FM values; future studies with larger samples of participants with OW/OB may clarify whether the bias truly depends on FM. Another limitation is that this study could not evaluate the clinical performance of the assessed BC methods in specific clinical conditions of interest (e.g., malnutrition, storage diseases, and other diseases affecting bone, muscle, adipose tissue, or hydration status), which may influence the performance of the methods assessed here. MRI data were limited by sample size, so no robust conclusions can be drawn from them; these data are therefore presented as an exploratory analysis only.
We firmly believe that BC assessment should play a more important role, especially in a population such as ours, where OW/OB and other health conditions that affect BC (cancer, malnutrition, chronic diseases, chronic exposure to systemic corticosteroids, etc.) are increasingly prevalent in the paediatric population. BC assessment in the clinical setting requires precise, accurate, simple, safe, and accessible methods. As our results showed, none of the methods is ideal compared with the criterion 4-C model, but knowing the magnitude and direction of their biases should help clinicians select the appropriate tool and understand its potential impact on BC estimation. Practicality, versatility, and time to results are other valuable attributes to consider. This study performed a comprehensive comparative analysis of five BC assessment methods, providing data that support their clinical use, documenting significant differences between them, and giving consistent evidence against their interchangeability. Availability and the specific clinical context may further guide the choice between methods.

Conclusions
Clinical assessment of BC by means of SF, ADP, DXA, MRI, and D2O correlated well with the 4-C model, providing evidence of the clinical validity and usefulness of these approaches. All of the methods are appropriate for ranking individuals within a population in terms of their FFM and FM, and this information is often of great value in monitoring clinical progress. Significant differences in concordance and agreement were observed between the methods and varied across FM values, indicating that the methods cannot be used interchangeably. However, some of the bias associated with any specific technique can be resolved by providing method-specific reference data whereby raw data are converted to z-scores [20], and providing such comprehensive reference data for Mexican children and adolescents is a further aim of this project. Preference between the methods may depend on their availability and the specific clinical setting, but the importance of assessing BC in routine care of the pediatric population should remain the emphasis.