Digital Anthropometry: A Systematic Review on Precision, Reliability and Accuracy of Most Popular Existing Technologies

Digital anthropometry (DA) has been recently developed for body composition evaluation and for postural analysis. The aims of this review are to examine the current state of DA technology, as well as to verify the methods for identifying the best technology to be used in the field of DA by evaluating the reliability and accuracy of the available technologies on the market, and lay the groundwork for future technological developments. A literature search was performed and 28 studies met the inclusion criteria. The reliability and accuracy of DA was high in most studies, especially in the assessment of patients with obesity, although they varied according to the technology used; a good correlation was found between DA and conventional anthropometry (CA) and body composition estimates. DA is less time-consuming and less expensive and could be used as a screening tool before more expensive imaging techniques or as an alternative to other less affordable techniques. At present, DA could be useful in clinical practice, but the heterogeneity of the available studies (different devices used, laser technologies, population examined, etc.) necessitates caution in the interpretation of the obtained results. Furthermore, the need to develop integrated technologies for analyzing body composition according to multi-compartmental models is increasingly evident.


Introduction
Conventional human anthropometry is a simple, non-invasive, and economical methodology that is easy to perform in different epidemiological or clinical settings and aims to collect measurements of the human body at a total body and/or regional level using simple devices (stadiometer, weight scale, meter, gauges, compasses, skinfold caliper, etc.) [1]. In a broader perspective, it is correct to consider anthropometry as the measurement of each segment, area, or volume of the human body.
The etymology of the word anthropometry is based on the Greek anthropos (human) and metron (measure). Anthropometry was developed in the late 19th century by anthropologists analyzing the differences in human body shape. The role of anthropometry in evaluating nutritional status was defined at the end of the 19th century by Richer, who used the thickness of skin folds as an index of fatness. The modern era of nutritional anthropometry began with Matiegka's studies during the First World War [2]. Matiegka's interest in the physical efficiency of soldiers led him to develop methods for anthropometrically subdividing the human body into muscle, fat, and bone compartments.
Anthropometry's main limitation is that it is deeply dependent on the operator's skill and requires adequate training. The International Society for the Advancement of Kinanthropometry (ISAK) has established a methodology for reducing procedural errors [3]. Kinanthropometry, first defined by William Ross in 1978, is the study of body
• Random error can be expressed in terms of precision and reliability (relative or absolute).
• Precision expresses the variability between repeated measurements by a particular observer using a particular device to measure a particular variable. Imprecision can be caused by flawed measuring equipment, inadequately trained measurers, or poor technique. Common indices of precision are absolute intra- and inter-observer Technical Error of Measurement (TEM) and relative TEM (%TEM). According to the International Society for the Advancement of Kinanthropometry (ISAK) protocol, acceptable TEMs are 0.1 kg, 3 cm and 2 cm for weight, stature, and body circumferences, respectively. Another example is the precision error of repeated measurements (PE).
• Absolute reliability regards the consistency of scores for individuals or, in other terms, the degree to which repeated measurements vary for individuals. It can be expressed by the coefficient of variation (CV) and the standard error of measurement of a group estimate (SEM).
• Relative reliability is the degree to which individuals maintain their position in a sample over repeated measurements; it is expressed by the reliability coefficient R and the intraclass correlation coefficient (ICC).
• Systematic error or bias depends on accuracy, which defines the level of correlation or agreement between a method under validation (bedside) and a reference method when measuring the same variable. It may depend on equipment bias (lack of calibration, device complexity) or operator error. As mentioned before, it is possible to classify accuracy in terms of:
• Correlation at a mean level: paired t-tests, Pearson's correlation coefficient, concordance correlation coefficient (CCC), linear regression. The latter involves calculating the coefficient of determination (R²), standard error of the estimate (SEE), and root mean square error (RMSE). CCC appears useful to describe agreement between methods (association and identity) when more than two operators and/or repeated measurements come into play. While ICC relies on ANOVA assumptions, CCC does not; both indices concurrently involve precision and accuracy assessment.
To date, anthropometry refers to the systematic collection of physical measurements of the human body [11] and their combination to develop useful indicators for the assessment of nutritional status [12], the risk of malnutrition, sarcopenia, the decline in physical
tion. The year of publication of studies was restricted to 2000-present (i.e., September 2021). The research was conducted by applying the PICO methodology (Population: healthy population; Intervention: use of DA; Comparator: use of conventional techniques for the generation of anthropometric data; Outcome: replacement of conventional techniques with DA). The last search was performed on 15 September 2021.

Eligibility Criteria and Procedures for Article Selection
The included studies involved the use of DA to analyze lengths, circumferences, and other anthropometric measurements. Body composition results were also included because of the close connection, through the use of predictive equations, that exists in numerous technologies, including reference methods, between anthropometric measurement and body composition appraisal.
For these reasons we decided to include studies concerning body composition, considering them an indirect measure of the precision and accuracy of the technologies used to measure anthropometric measurements that are then used to estimate body composition.
Each selected study compared DA with conventional techniques of body composition assessment, such as manual measurements, bioimpedance analysis (BIA), dual-energy X-ray absorptiometry (DXA), hydrostatic weighing (HW), air displacement plethysmography (ADP) and computed tomography (CT). Every study included in this review evaluated accuracy; some studies evaluated both precision and accuracy; studies evaluating only precision but not accuracy were not included.
Meta-analyses, reviews, book chapters, case reports/series, expert opinions, articles in languages other than English, articles whose full text was unavailable, articles published before 2000, and those concerning somatotypes, body typing with statistical models, body surface, and postural analysis were discarded.
The references of included studies were also checked to identify other potentially relevant studies. The search process was carried out by two researchers (F.F. and P.C.) working independently; disagreements were solved through consensus and by discussion with the lead author (M.E.).

Data Extraction and Quality Assessment
Data were extracted by the lead author, who retrieved the following information for each study: first author, year of publication, study design, country of origin, type of 3D scanner used, comparison with other techniques, and results.
The methodological quality of the included studies was assessed by means of the Appraisal tool for Cross-Sectional Studies (AXIS) [19] and the Newcastle-Ottawa Scale (NOS) on cross-sectional studies [20]. Scores for both scales are reported at the end of this section.
The AXIS tool consists of 20 items that evaluate the quality of reporting (7 questions), study design quality (7 questions), and potential introduction of biases in a study (6 questions). Each question has three possible answers: "yes", "no", "do not know/comment", therefore implying a subjective judgement by the user. While numerical rating scales may appear to provide a more objective assessment, increasing comparability among different raters, in reality summing up individual item answers to produce a global score or a weighted summarization (as in a meta-analysis) can lead to biased estimates since quality itself can be non-additive and nonlinear [21]. On the other hand, the AXIS provides more flexibility for quality of reporting and risk of bias assessment, due to its inherent subjectivity; it appears more comprehensive than similar tools for cross-sectional studies [19]. The NOS, appropriately modified for cross-sectional studies assessment, consists of seven different items grouped into three categories; each item is given a score ranging from 0 to (maximum) 2 stars by the rater. A summary score ranging from 0 to 10 stars is computed by adding up the individual item scores. The three categories are: quality of group selection (4 questions, maximum 5 stars overall), comparability between groups (1 question, maximum 2 stars overall), and study outcomes (2 questions, maximum 3 stars overall). The NOS summary score has no universal cut-off value but some authors have suggested the following categorization: very good (9-10 stars), good (7-8 stars), satisfactory (5-6 stars), unsatisfactory (0-4 stars) [22].
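The suggested categorization of the NOS summary score can be written as a simple lookup. The following is a minimal sketch: the function name `nos_category` is hypothetical, and the cut-offs are those suggested in [22], not a universal standard.

```python
def nos_category(stars: int) -> str:
    """Map a NOS summary score (0-10 stars) to the bands suggested in [22]."""
    if not 0 <= stars <= 10:
        raise ValueError("NOS summary score must lie between 0 and 10 stars")
    if stars >= 9:
        return "very good"
    if stars >= 7:
        return "good"
    if stars >= 5:
        return "satisfactory"
    return "unsatisfactory"
```

Under this scheme, a study scoring 6 stars would be rated "satisfactory" and one scoring 4 stars "unsatisfactory".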
Finally, the quality of evidence and strength of recommendations were evaluated using the GRADE scale (Grading of Recommendations Assessment, Development, and Evaluation). The GRADE approach gives an a priori ranking based on study design (randomized controlled trial or observational study) before grading certainty of evidence and weighing cost-effectiveness, patient preference and the balance of desirable/undesirable effects [23].

Results
The search returned 4410 references: a total of 696 records were found from the search in PubMed, 232 in Embase, 3289 in Scopus, and 194 from the references of some studies. 804 records were excluded because they were duplicates. Another 3519 articles were excluded after screening the remaining citations based on title and abstract. Full-text examination was then conducted, and finally 28 papers were included in this review.
A flowchart of the paper selection process is shown in Figure 1. The detailed PRISMA checklist is available in Supplementary Materials Table S1.

Quality Assessment
The results of the quality assessment of the included studies are available in Supplementary Materials Tables S2 and S3. The quality assessment by the NOS on cross-sectional studies [20] and by the AXIS tool [19] was conducted by two authors (P.C. and F.F.) independently, and disagreements were resolved by consensus in the presence of a third author (M.E.).
The mean value obtained using the NOS on the cross-sectional studies was 5.2 ± 0.5. The highest value was 6/10 and the minimum value was 4/10. Only eight studies scored above the mean value; considering the three categories, the quality of group selection was assessed as medium risk of bias, the comparability between the groups had a minimum score (high risk of bias), and the study outcome had a maximum score (low risk of bias). Overall, the quality of the studies can be regarded as satisfactory.
Finally, the quality of evidence and strength of recommendations were evaluated using the GRADE scale [23].
The results of the studies comparing digital anthropometry (DA; 3D scanners) to classic manual anthropometry (CA) with specific reference to body circumferences, lengths, and shapes, are discussed separately on the basis of reliability and accuracy criteria, which quantify random and systematic (bias) error related to anthropometric measurements, respectively (Boxes 1 and 2). The results of studies included in this systematic review are summarized in Tables 2-5; the comparison between studies is described in Supplementary Materials Tables S4 and S5.
The precision and the intraclass correlation coefficients were better in DA than in CA, and the two methods were highly correlated, but there were significant differences between the two methods. The DA produced higher readings in waist and hip circumferences compared with CA.
The measures were reliable with both methods, but the precision was better in the CA. The agreement was good, but there was significant bias with an overestimation of height (+0.6 mm) and head circumference (+0.3 mm) and an underestimation for arm circumference (−0.2 mm).
[29] Accuracy: coefficient of
The linear regression R² of hip (0.63, p < 0.05) and waist-to-hip ratio (0.53, p < 0.05) were significant. For waist, height and weight, the same results were not found; significant differences (p < 0.01) existed between DA and CA for circumferences of hip (DA: 40 ± 4.5 cm; CA: 39 ± 4.7 cm) and waist (DA: 33 ± 4.2 cm; CA: 32 ± 4.2 cm).
The accuracy of measures of hip and waist-to-hip ratio decreased when the measure increased. DA produced an overestimation of waist and hip circumferences.
The accuracy of DA was lower than of CA, probably due to variations caused by human subjects.
The correlation was good. The waist circumference was systematically smaller in DA than in CA, and height was less in CA than in DA.
Accuracy: t-test, coefficient of determination (R²) of multiple regression, Bland-Altman plot, mean differences (between groups).
All three scanners showed significant mean differences (paired t-test, p < 0.01) with CA (∆ mean: Fit3D Proscanner ® ( Stream SS20 (Cary, NC, USA) (hip and arms +1.6-2.5 cm, thigh −3.0 cm) and Styku S100 (Styku, Los Angeles, CA, USA) (heterogeneous magnitude).
In the processed scans, mean 3DO-tape circumference differences tended to be small (~1-9%) and varied across systems; correlations and bias estimates also varied in strength across anatomic sites and systems. Overall findings differed across devices; the best results were found for the multi-camera stationary system and less so for two rotating single- or dual-camera systems.
Mean circumference values by CA and DA were comparable. Statistically significant differences were observed (absolute mean ∆ ~2 cm across digital scanners and body sites, with a few outliers). Mean systematic differences were negative for Styku S100 scanner (Los Angeles, CA, USA) and positive for Fit3D Proscanner (Redwood City, CA, USA) and Size Stream SS20 (Cary, NC, USA). Relative CA-DA differences were smaller for chest, waist, and hip measurements (∼2-3%) but larger for arms (∼5-7%) and ankles (∼8-10%). Linear regression analysis showed an RMSE of 1-3 cm, with a trend for higher error for Styku; high R² values were also seen (majority > 0.90, p < 0.001), with a few exceptions for limbs. Bland-Altman plots displayed significant systematic bias in 11/33 evaluations; correlations between CA and DA waist circumference estimates had R² values of 0.95-0.97 (p < 0.001), with measurement bias significant only for the Fit3D Proscanner (Redwood City, CA, USA) (p < 0.05).
Site location error sometimes had a significant impact on various girth measurements. The magnitude of this error varied according to the girth measurement being taken, sex, and BMI. Special care should be applied when measuring girths on females, especially waist girths on lean females.
Comparing the DA and reference method, CV values were lower for CA (between 0.2 and 0.4%) than for DA (between 0.1 and 2.6%), except for the hip circumference with Styku S100 scanner (Styku, Los Angeles, CA, USA) (0.2% in CA, 0.1% in DA), and lower with DXA (between 0.2 and 1.5%) than with DA (between 0.4 and 5.7%), except with the Styku S100 scanner (Los Angeles, CA, USA) for the trunk (0.6% with DXA, 0.3% with the scan) and left leg (1% with DXA, 0.8% with scan).
The reliability was higher in the reference methods (tape measurements and DXA). The measurements of circumferences and regional body volume obtained from 3D optical devices were well correlated with those obtained from tape measurements and DXA, but there were significant differences and an underestimation, especially in body volume for larger subjects; total body volume determined by DA was highly correlated with ADP volumes.
Mean group differences between DA and CA ranged from 1.5 cm (arms) to 3.2 cm (thighs). Only hip circumference was not significantly different between the two methods. For all sites, explained variance in linear regression by DA was high (R², 0.84-0.97; p < 0.0001). Bland-Altman plots displayed how the Naked Body Scanners (Naked Labs Inc., Redwood City, CA, USA) significantly overestimated waist circumferences by ~2.0 cm compared with CA (p < 0.0001). Significant bias was also found for left and right thighs, with a mean overestimation of ~3.0 cm (p < 0.0001). %BF: no significant difference between DA and DXA, with a linear regression R² = 0.73 (p < 0.0001). Bland-Altman plot revealed a quasi-significant systematic bias by DA to underestimate %BF (p = 0.09).
DA exhibited greater variation in test-retest reliability between the six measured anatomic locations compared with manual measurements. All six device-derived circumferences correlated with flexible tape references. The %fat estimates correlated with DXA results with no significant bias. Body shape of white American adults differs from that of their UK counterparts. Among Americans, ethnic differences in body shape closely track reported differences in prevalence of metabolic syndrome. 3D photonic scanning offers a novel approach for categorizing the risk of metabolic syndrome. Total BV: strong linear correlation was observed between DA and reference methods (R 2 : 0.98-1.0); significant overestimation by Size Stream ® (Cary, NC, USA) and underestimation by Styku S100 scanner ® (Los Angeles, CA, USA) was observed (p < 0.01) and no true equivalence from Fit3D Proscanner ® (Redwood City, CA, USA) (in contrast to all DXA-derived equations). Bland-Altman plots showed systematic proportional bias of various degrees for all four scanners. DA RMSE: 4.2-10.5 L, with LoA 2.9-5.3 L (both larger compared with DXA-related indices). Similar accuracy issues (strong linear correlation with significant overestimation and proportional bias) were reported in regional volumes.
All scanners produced precise estimates. Precision for circumferences generally decreased in the order of: hip, waist and thigh, chest, neck, and arms. Precision for volumes generally decreased in the order of: BV, torso, legs, and arms. No total or regional 3DO volume estimates exhibited equivalence with reference methods using 5% equivalence regions, and proportional bias of varying magnitudes was observed.
There was a high correlation between DA and ADP for %BF (r = 0.899, r² = 0.809, SEE 4.13%), but the mean difference (mean DA: 24 ± 6.8%; mean ADP: 21.9 ± 9.4%) and Bland-Altman (r = −0.597, LoA −6.7 to 11%) showed a significant (p < 0.001) and proportional bias with an overestimation of the lean body.
The scanner overestimated participants at the lean end of the sample and underestimated participants with the most body fat, not providing valid estimates of %BF compared with ADP.
The correlation and concordance were high with DA and there were no significant differences of means. DA produced acceptable measurements compared with DXA, and the two methods were in good agreement, especially in those with normal or high lean mass, but the LoA was wide so the agreement should be interpreted with caution.
DA does not appear to be valid against 4C models.
Total body and regional volumes measured by DXA and ADP had strong associations with corresponding estimates from the commercial 3D optical scanners coupled with the universal software.
Regional body volumes also had strong correlation between DXA and the 3DO scanners. Similarly, there were strong associations between DXA-measured total body and regional fat mass and 3D optical estimates calculated by the universal software. Absolute differences in volumes and fat mass between the reference methods and the universal software values appeared. All pre-post absolute changes in DA whole-body FM showed fair linear correlation with DXA counterparts (r > 0.5); 4 out of 6 regional DA trunk FM changes correlated with DXA measurements. As for relative changes, only TB %FM and trunk % FM correlated with their respective DA measurements. When individually used as predictor variables in simple linear regression analysis, several DA anthropometric measurements produced significant models (p < 0.05, adjusted R 2 12.0-39.9%) with no improvement when implemented in a stepwise regression analysis.
Variation in DXA-measured FM and % FM (at both the TB and trunk level) of women with obesity after exercise training showed several significant correlations, with variation in automatic digital anthropometric measurements. The measurement results below indicate the minimum value and the maximum value; the ranges are wide due to the differences in values of the measured body section and the laser technology used to measure them.
The reliability of the two methodologies (i.e., the variability observed among repeated measurements performed on the same subject by one or more operators, also known as intra- and inter-operator variability) is expressed in terms of Technical Error of Measurement (TEM) and %TEM. According to the ISAK protocol, if TEM is <2 cm and %TEM is <1.5%, the anthropometric measurements should be considered reliable.
Using an SL-IR (Structured Light-InfraRed) device, the AutoAnthro Scanner ® (Occipital, San Francisco, CA, USA), Conkle obtained comparable inter- and intra-observer TEMs, demonstrating that DA performance was operator independent. In contrast, CA produced higher inter-observer TEMs than intra-observer TEMs, as might be expected [38].
Koepke et al. employed an SL laser scanner (BS VITUS Smart XXL; Human Solutions GmbH, Kaiserslautern, Germany), which showed acceptable TEMs for hip circumference, as observed in CA; for other body sites, manual measurements appeared to be less precise [24].
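The TEM and %TEM criteria can be illustrated with a short calculation. This is a minimal sketch assuming the common two-trial formulation, TEM = sqrt(Σd² / 2n), where d is the difference between the two measurements of each subject, and %TEM = 100 × TEM / grand mean; the function names are hypothetical, and the default thresholds are those quoted above for circumferences.

```python
import math

def tem(first, second):
    """Absolute TEM for two repeated measurements on the same subjects."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty series of equal length")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))

def relative_tem(first, second):
    """Relative TEM (%TEM): TEM as a percentage of the grand mean."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100 * tem(first, second) / grand_mean

def is_reliable(first, second, tem_limit=2.0, rel_limit=1.5):
    """Apply the thresholds quoted in the text (TEM < 2 cm, %TEM < 1.5%)."""
    return tem(first, second) < tem_limit and relative_tem(first, second) < rel_limit
```

For example, with invented waist circumferences of 80, 92, 75 and 101 cm in the first trial and 81, 91, 75 and 103 cm in the second, the TEM is about 0.87 cm and the %TEM about 0.99%, so the measurements would be considered reliable under these thresholds.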
Regarding the relative reliability indices (i.e., the proportion of variance attributable to between-subject variance in a set of measurements), the ICC (two-way mixed-model, absolute agreement) and R (coefficient of reliability) values were high [24,38,47].
Tinsley, Benavides et al., in the aforementioned study, used SL-IR and reported high ICCs (0.974-0.999) for all circumferences [47]; similarly, using SL-IR, Conkle observed high R and ICC values for both methods [38], with slightly better values in CA. Wang et al. showed ICCs > 0.97 for both lengths and circumferences with an SL-laser scanner [46]. Koepke obtained ICCs > 0.993 except for the chest circumference (ICC 0.981) [24]. Pepper et al. reported ICCs ≥ 0.99 for all eight repeated measures of body circumferences, with the abdomen, waist, and hip showing the highest values (ICC = 1.00) and the chest circumference having the lowest one (ICC = 0.992) [31].
For absolute reliability (i.e., consistency within repeated measurements of the same subject), %CV and standard error of the measurement (SEM) were calculated. Wong observed CVs < 5% when using an SL device (Fit3D Proscanner ® (Redwood City, CA, USA)) with the exception of the forearm circumference (CV value = 6.09%) [37]; Ng et al. reported %CVs of 0.75-2.24% measured by an SL-IR scanner (Fit3D Proscanner ® (Redwood City, CA, USA)) [45]. The studies of Kennedy with SL-IR scanners showed lower reliability in DA than CA for body circumferences (CV 0.4-2.7% vs. 0.2-0.4%), with the most precise measurement being the hip [39]. Bourgeois, who used two SL scanners and one ToF, revealed %CVs < 2.6% for four circumferences: waist, hip, right arm, and right thigh [34]. Simenko and Busic's studies, which used SL-visible light scanners, reported %CV > 5% [49,50], with the exception of hip circumference (%CV in DA 4.243%, in CA 4.295%) [49], which performed slightly better in DA (%CV 6.62-11.29, SEM 0.13-0.46) [50]. Busic also reported similar values of SEM between the two methods, with higher values in chest and breast circumference [49]. Wong examined the %CV of body shape indices and found that the CV was 1.50% for the waist-hip ratio, 1.82% for the waist-height ratio, and 1.29% for the waist-width ratio [37].
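To make the notion of an absolute-agreement ICC concrete, the following minimal sketch computes the single-measurement, two-way, absolute-agreement coefficient from ANOVA mean squares, ICC = (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n). It is an illustration only: the included studies do not all state which ICC form they used, the function name `icc_absolute_agreement` is hypothetical, and the data layout (one row per subject, one column per repeated measurement) is an assumption.

```python
def icc_absolute_agreement(data):
    """Single-measurement, two-way, absolute-agreement ICC.

    data: list of rows, one per subject; each row holds k repeated measurements.
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need at least 2 subjects and 2 measurements per subject")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between trials
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Identical repeated measurements give an ICC of 1, whereas a constant 1-unit offset between two trials on subjects scoring 1, 2 and 3 lowers it to about 0.67, because absolute agreement penalizes systematic differences between trials even when ranking is perfectly preserved.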
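The %CV figures above can be reproduced from repeated scans in a few lines. This sketch assumes the usual definition (within-subject standard deviation divided by the within-subject mean, times 100) and pools subjects by root mean square, as in the RMS-%CV reported by some of the included studies; the function names are hypothetical.

```python
import math
import statistics

def percent_cv(repeats):
    """%CV of repeated measurements taken on one subject."""
    return 100 * statistics.stdev(repeats) / statistics.mean(repeats)

def rms_percent_cv(subjects):
    """Root-mean-square %CV pooled over subjects (one list of repeats each)."""
    cvs = [percent_cv(repeats) for repeats in subjects]
    return math.sqrt(sum(cv ** 2 for cv in cvs) / len(cvs))
```

With invented duplicate scans of 100 and 102 cm for one subject and 80 and 80 cm for another, the individual %CVs are about 1.40% and 0%, and the pooled RMS-%CV is about 0.99%.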

Accuracy
Most of the studies selected evaluated DA accuracy compared with manual measurements. The correlation was studied with Pearson's coefficient (r), and a very strong linear correlation (r > 0.8) [52] was demonstrated between the two methods [24,27,32,48,49]. There was also a strong correlation with body shape [24]. Some studies [24,27] examined the correlation with Lin's concordance correlation (CCC); other studies [25] used Spearman Rho, and the strong correlation was confirmed.
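The difference between Pearson's r and Lin's CCC can be shown with a small example. The sketch below assumes Lin's original formulation with population (1/n) variances, CCC = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²); the function names are hypothetical.

```python
import math

def _moments(x, y):
    """Means, population variances and population covariance of paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired measurements")
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov

def pearson_r(x, y):
    """Pearson's correlation coefficient (association only)."""
    _, _, vx, vy, cov = _moments(x, y)
    return cov / math.sqrt(vx * vy)

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (association and identity)."""
    mx, my, vx, vy, cov = _moments(x, y)
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

For paired values (1, 2, 3) and (2, 3, 4), Pearson's r equals 1 while the CCC is only about 0.57, which illustrates why a very strong correlation between DA and CA does not by itself exclude a systematic offset.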
Similar results were found in Sobhyeh and Kennedy's study (R² > 0.90 for most of the scanners compared with conventional anthropometry, p < 0.001, RMSE 1-3 cm), with a few exceptions for limbs; specifically, correlations between CA and DA waist circumference had R² values of 0.95-0.97 (p < 0.001) [43]. Wong, who studied a pediatric population, found high values of R² and RMSE for waist (R² 0.939, RMSE 3.783 cm) and hip (R² 0.987, RMSE 1.828 cm) circumferences [37]. In the study of Wells, ranking consistency was high (R² > 0.90 for most of the outcomes) [48].
Differences between the two methods were found. A statistically significant overestimation between means obtained with DA and CA was observed for waist circumferences (Japar' [24]). This was also confirmed by Bland-Altman plots that highlighted a systematic bias and a proportional bias in waist circumference [27]. Wells [48]. Wong found good agreement using Bland-Altman plots for hip and waist circumference [37]. Heuberger also reported significant (p < 0.01) differences in waist and hip circumference measured by DA and CA [29].
Kennedy et al. found mean group differences between DA and CA ranging from 1.5 cm (arms) to 3.2 cm (thighs), which were all statistically significant apart from that for hip circumference [39]. Sobhyeh et al. reported statistically significant differences between CA and DA means (absolute mean difference (∆) ~2 cm across digital scanners and body sites, with few outliers). Overall, Bland-Altman analyses revealed systematic bias in 11 of the 33 evaluations, with the highest observed slopes comparing CA and DA results by the Fit3D Proscanner ® (Redwood City, CA, USA) system. Relative CA-DA differences were smaller for chest, waist, and hip measurements (2-3%) and larger for arms (5-7%) and ankle measurements (8-10%). As for lower limbs, the Styku S100 scanner ® (Styku, Los Angeles, CA, USA) displayed absolute mean differences between DA and CA measurements, increasing from 1-2 cm at the thighs to 2-3 cm at the calves and then to 6 cm at the ankles; in contrast, both the Fit3D Proscanner ® (Redwood City, CA, USA) and the Size Stream SS20 (Cary, NC, USA) showed relatively constant (1-3 cm) mean differences between DA and corresponding CA measurements at those body sites, with no increasing pattern moving down along the legs. As a result, Bland-Altman slopes comparing ankle circumferences that were measured on the Size Stream SS20 (Cary, NC, USA) and Fit3D Proscanner ® (Redwood City, CA, USA) were close to zero. When comparing ankle circumferences measured on either of these scans with Styku S100 scanner ® (Styku, Los Angeles, CA, USA) scans, the Bland-Altman slopes were larger [43].
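The regression-based accuracy indices quoted above (R², RMSE and SEE) can be derived from a simple least-squares fit. The sketch below is an illustration under stated assumptions: the reference measurement is taken as the predictor and the DA measurement as the response, RMSE is computed as sqrt(SS_res / n) and SEE as sqrt(SS_res / (n − 2)); individual studies may have used different conventions, and the function name is hypothetical.

```python
import math

def regression_accuracy(reference, test):
    """Fit test = intercept + slope * reference and return fit statistics."""
    n = len(reference)
    if n != len(test) or n < 3:
        raise ValueError("need at least three paired measurements")
    mx, my = sum(reference) / n, sum(test) / n
    sxx = sum((x - mx) ** 2 for x in reference)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, test))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(reference, test))
    ss_tot = sum((y - my) ** 2 for y in test)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1 - ss_res / ss_tot,             # coefficient of determination
        "rmse": math.sqrt(ss_res / n),         # root mean square error
        "see": math.sqrt(ss_res / (n - 2)),    # standard error of the estimate
    }
```

For the invented pairs (1, 2), (2, 4), (3, 5) and (4, 7), the fit yields a slope of 1.6, an intercept of 0.5, R² of about 0.985, RMSE of about 0.22 and SEE of about 0.32.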
As for body shape, Japar and Koepke observed significant differences for Waist to Hip Ratio (WHR). Koepke reported a WHR of 0.82 in DA vs. 0.85 in CA, and Japar reported 0.82 in DA vs. 0.89 in CA; for Waist to Height Ratio (WHtR), Koepke reported a value of 0.46 in DA vs. 0.45 in CA, with p < 0.001 [24,28].
Simenko reported conflicting results, with 10 statistically different paired measurements out of 14, but clinically small differences (mean differences 0.273-0.974 cm, p < 0.05). Bland-Altman plots showed high agreement between both methods; 95% LoA were narrow for both the upper (−1.61; +2.74 cm) and lower limbs (−1.43; +1.95 cm) [50]. Similarly, Busic showed significant differences between means in 9 out of 15 circumferences (height, waist, hip, chest, upper arms, forearms, and right upper leg circumferences) with a p < 0.05, with small differences except for the breast and chest girths, probably due to the chest movements during breathing [49]. Unlike other authors, Lu used the mean absolute difference (MAD) between the DA and CA as a measure of accuracy in addition to the paired t-test, finding significant differences in 8 of the 12 body dimensions, p < 0.05 [51]. Busic showed good agreement and no significant biases between DA and CA, as analysed by Bland-Altman plots [49]. Kennedy et al. found significant mean differences between three DA devices and CA circumferences on a sample of young children (∆ mean: Fit3D Proscanner ® (Redwood City, CA, USA), 1.2-4.2 cm; Styku S100 scanner ® (Los Angeles, CA, USA), 1.0-5.5 cm; Size Stream SS20 (Cary, NC, USA), 1.6-3.4 cm; p < 0.01). The Fit3D Proscanner ® (Redwood City, CA, USA) generally overestimated waist, right arm, and left arm measurements by ~1.5 cm and hip measurements by about 4.0 cm; in contrast, thigh circumferences > 40 cm were generally underestimated. The Size Stream SS20 (Cary, NC, USA) also showed a slight positive bias for waist, hip, right arm, and left arm measurements ranging from 1.6 to 2.5 cm, whereas thigh circumferences were underestimated by ~3.0 cm. Finally, the Styku S100 scanner ® (Styku, Los Angeles, CA, USA) prediction bias was less homogeneous between different measurement locations [42].
Pepper et al. found no significant differences for waist, hip, or waist:hip ratio according to paired-samples t tests (p = 0.05) [31]. Garlie also found no significant differences between the means of height, weight, neck circumference, and waist circumference [26].
In terms of body shape, DA and CA agreed on popular indices of body shape (waist circumference, waist to hip ratio, waist to height ratio). The correlation was very high [24,25], but the Bland-Altman plot exhibited a bias and a trend towards values in the upper part of the range in DA [25].
As for anthropometric measurements, the results are analysed separately on the basis of the reliability and accuracy criteria, which quantify random and systematic (bias) errors related to anthropometric measurements, respectively (Boxes 1 and 2). The results of studies included in this systematic review are summarized in Tables 2 and 4 [41]. Another study by Tinsley et al. using the same scanners reported an average RMS-% CV of 1.9-2.3% for body volumes; the lowest value was observed for total body volume (RMS-% CV < 1% for all scanners), followed by trunk (~1.2%), legs (~2.5%), and arms (~3 to 5%) [47].
The measurement results are presented below as the minimum and maximum values; the range can be wide depending on the specific body segments and the laser used to measure them.
Cabre and Milanese examined body composition. In Cabre's study there was a strong correlation between 3D, DXA, and four compartments (4C) model for FM (3D vs. DXA r = 0.90, 3D vs. 4C r = 0.85) and for FFM (3D vs. DXA r = 0.90, 3D vs. 4C r = 0.92) [36]. Milanese et al. compared post-exercise changes in total and regional FM, as detected by DA and DXA. Whole-body FM showed a fair linear correlation with DXA counterparts (r > 0.5); 4 out of 6 regional DA trunk FM changes correlated with DXA measurements. As for relative changes, only total %FM and trunk %FM correlated with their respective DA measurements [30].
The association between DA and other methods was studied through the coefficient of determination. For total body volume, a strong correlation between DA, ADP, and DXA (R² > 0.97, RMSE 1.618-9.7) was described [34,40,45].
Despite strong prediction of total BV (R² = 0.98-1.0), Tinsley et al. observed significant overestimation (Size Stream SS20® (Cary, NC, USA)) and underestimation (Styku S100 scanner® (Styku, Los Angeles, CA, USA)) (both p < 0.01) for DA vs. the reference method (4C model). The reported RMSE for total BV ranged from 4.2 to 10.5 L depending on the scanner, whereas DXA predictive equations showed an RMSE of 0.7-1.5 L. Compared with the 4C model, Bland-Altman plots showed systematic proportional bias in total BV (with statistically significant regression coefficients) for all four scanners and wider LoA with the 4C model than with DXA (LoA 2.9-5.3 L vs. 1.1-2.0 L, DA vs. DXA). Similar accuracy issues were also reported for regional volumes, with DA significantly overestimating trunk volume and underestimating both arm and leg volumes. Furthermore, none of the 3D regional volumes exhibited equivalence with DXA-derived volumes [47].
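The accuracy metrics discussed above (R², RMSE, and proportional bias) can be sketched as follows. The sketch assumes that R² is the squared Pearson correlation between DA and reference values, and that proportional bias is assessed as the slope of a least-squares regression of the paired differences on the pairwise means, as is usual in Bland-Altman analysis. Significance testing of the slope is omitted, and all names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def rmse(da, ref):
    """Root mean square error of DA values against the reference method."""
    return sqrt(mean((d - r) ** 2 for d, r in zip(da, ref)))


def _sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy) about the means of x and y."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def r_squared(da, ref):
    """Squared Pearson correlation between DA and reference values."""
    sxx, syy, sxy = _sums_of_squares(da, ref)
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must have non-zero variance")
    return sxy ** 2 / (sxx * syy)


def proportional_bias_slope(da, ref):
    """Slope of (DA - reference) regressed on the mean of the two methods."""
    means = [(d + r) / 2 for d, r in zip(da, ref)]
    diffs = [d - r for d, r in zip(da, ref)]
    sxx, _, sxy = _sums_of_squares(means, diffs)
    if sxx == 0:
        raise ValueError("pairwise means must have non-zero variance")
    return sxy / sxx


# Hypothetical total body volumes in litres (not data from the review).
ref_bv = [55.2, 63.8, 71.4, 80.9, 92.3]
da_bv = [56.0, 65.1, 73.5, 83.6, 96.0]

print(f"R^2 = {r_squared(da_bv, ref_bv):.3f}")
print(f"RMSE = {rmse(da_bv, ref_bv):.2f} L")
print(f"proportional bias slope = {proportional_bias_slope(da_bv, ref_bv):.3f}")
```

The example makes the point that a very high R² can coexist with a sizeable RMSE and a non-zero slope: correlation measures how closely the two methods track each other, not whether they agree in absolute terms.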
Using UWW as the criterion method, Wang et al. found no significant difference in %BF between DA and UWW (p = 0.4801), although the absolute differences were higher than in volumes [46].
In 4.8% for %FM); Styku S100 scanner® (Styku, Los Angeles, CA, USA) displayed the largest RMSE (4.6 kg for FM and FFM; 6.1% for %FM). According to the Bland-Altman analysis, Fit3D Proscanner® (Redwood City, CA, USA) showed the narrowest LoA (±7% for %BF and ±~5.5 kg for FM and FFM), with other scanners showing larger values (±9.0-9.5% for %BF and ±~7.0 kg for FM and FFM). Proportional bias was largest for %FM, with regression coefficients ranging from ±0.1 to 0.3 for all scanners (all p < 0.01). For FM, only Naked 3D Fitness Trackers® (Redwood City, CA, USA) did not display proportional bias, while the other scanners displayed regression coefficients of 0.1 to 0.2 (p < 0.0001). For FFM, only Styku S100 scanner® (Styku, Los Angeles, CA, USA) did not display proportional bias, while all other scanners exhibited proportional bias that was statistically significant but small (coefficients ±0.1) [41].
In another study using DXA as the reference method, android and gynoid FM showed higher prediction values (R² = 93.2% for android and 91.4% for gynoid) than %BF (76.4% for android and 66.5% for gynoid). As for Bland-Altman plots, both FM and %FM data were randomly dispersed within the 95% LoA; the limits of agreement for FM and %FM were −0.06 ± 0.87 kg and −0.11 ± 1.97% for android and −0.04 ± 1.58 kg and −0.19 ± 4.27% for gynoid, respectively; few outliers and a systematic bias of ~0 cm for both android and gynoid FM were observed [44].
The mean differences between DA and other methods were measured. When measuring total body volume with DA, Bourgeois reported a significant (p < 0.0001) underestimation in comparison with DXA and ADP; for regional body volume, Bourgeois found significant differences: the volume of the trunk was overestimated, and the volumes of arms and legs were underestimated [34].
Garlie showed small, non-significant mean differences between DA and DXA of 0.11 ± 3.1%, with LoA ranging from −6.06 to 6.28% [26]. Pepper reported no significant differences among DA-, DXA-, and HW-measured %BF [32]. In the study by Kennedy et al., the Naked Body Scanners (Naked Labs Inc, Redwood City, CA, USA) showed a trending bias to underestimate %FM in individuals with less than ~30% body fat (p = 0.09) [39].

The quality of evidence was assessed by means of the GRADE tool [23]. The GRADE approach rates each outcome across studies, assigning a final grade of "high", "moderate", "low", or "very low" for all critically important outcomes. Clinical parameters of anthropometric measures and body composition were used. The certainty of evidence was considered very low for almost all studies: nineteen studies [24-26,28-30,33,35,37-40,43,44,47,48,50,51] had a serious risk of bias according to the AXIS tool, three studies [27,42,45] were considered imprecise because of their small sample size, and two studies [36,42] used an indirect comparison of evidence. Four studies were rated low [24,26,29,48] (Table 7).

Discussion
This work focused on how three-dimensional body scanners perform in terms of anthropometric measurements and body composition estimates. The majority of included studies reported good reliability and accuracy of DA, with laser-based scanners outperforming other technologies [24]. SL-projectors and ToF scanners produced a wide spectrum of results: some studies found lower, though still acceptable, reliability than reference methods [34,37,38,48], whereas two studies reported poor precision in both CA and SL-projector or ToF scanners [39,49], and one study demonstrated better precision with SL-projector and passive stereo scanners than the reference method [34].
With the exception of one [26], all the studies agreed on a good correlation between traditional and 3D body composition estimates, but 3D imaging showed a systematic bias. In particular, because of heterogeneity in landmark positioning and body surface partitioning algorithms [33], three of the studies included in the review found less accurate estimation of the %BF and total BV by 3D imaging than BIA, HW, and ADP among adults with increased adiposity [33-35].

Despite this observation, DA appears to be less time-consuming and more reliable than CA, especially in the clinical population with obesity. Furthermore, three-dimensional body scanners have some significant advantages: they are more affordable than DXA, which requires adequately trained personnel in the acquisition and post-processing phases, and exposes patients to ionizing radiation. They are also less expensive and invasive than other reference body composition techniques (ADP, HW) in the field of bicompartmental models (Fat Mass and Fat Free Mass) of human body composition assessment.
However, DA has a few limitations due to technical and human variability. Technical variability is influenced by the characteristics of 3D scanning hardware and the performance of data acquisition, visualization, landmarking, and measurement extraction software. Stationary SL laser scanners show sub-millimeter accuracy and resolution, although their cost and slow scanning time limit their use to experimental settings. At the other end of the spectrum, passive stereo (PS) handheld devices or ToF mini-scanners represent economical "field measurement" options, though lower resolution limits their use in collecting ground-truth data [53].
Currently, in the absence of validated reference software, users must rely on patent-protected technologies and software from different manufacturers, which allow updates and calibrations but no direct comparison between different models [54]. As a first step towards DA standardization, several studies have proposed the development of standard software (which does not require laborious manual positioning of reference landmarks), paving the way for cross-validation of body measurements across different devices [40,43]. Furthermore, incorporating principal component analysis (PCA) into regression models trained by machine-learning algorithms could improve the accuracy not only of body composition estimates but also of predictions of haematological metabolic parameters, muscle strength, and performance [55,56]. Finally, from a global rehabilitation perspective, the integration of postural analysis based on 3D imaging in a complete tool for assessing patients' nutritional status could provide useful diagnostic information to researchers or clinicians, considering that patients with over- or under-nutrition (such as obesity and/or eating disorders) may be affected by pathologies affecting the musculoskeletal system [57-60].
Participants' ability to minimize motion artifacts and to replicate a standard pose across several scans contributes to human variability [54]. Indeed, experimental studies have demonstrated better accuracy and precision when human variation is under the control of the experimenters. Lu et al. used a dummy to eliminate interference from body sway and allow for stable posture. They showed that the mean absolute differences between the scan-derived and hand-held measurements, and between repeated scan-derived measurements, were better than the mean values reported in other studies that did not use a dummy [51]. It is, therefore, recommended to normalize the rate and depth of respiration during the acquisition phase through repeated measurements; otherwise, serious accuracy problems may arise. To minimize the impact of posture on human variability, without compromising the quality of the scans, positioning aids have been developed [54].
Furthermore, if a standardized pose is not adopted by the subjects analysed, it may be possible to remove the unwanted pose variance (i.e., a random error introduced by different postures) by rigging individual 3D meshes to a standard pose. This in turn would improve the mathematical models applied to the prediction of human body composition [37,56].
Additional clinical applications of 3D body scanners include anorexia nervosa and obesity diagnosis and treatment. The ability of patients to describe other people's 3D body images and their own body images could help clarify any relationship between the mental representation of the body and body image distortion [61].

Limitations
Our systematic review suffers from a number of drawbacks. The included studies are observational (cross-sectional), lowering the overall quality of evidence compared with experimental studies. The study sample sizes were generally small, with heterogeneous ages, ethnic groups, and body mass index (BMI) classes. The heterogeneity of measurement sites further limits the comparability of studies. As mentioned above, digital scanners used patent-protected technologies and software from different manufacturers, which limits direct comparisons between devices; finally, the "reference" methods used in DA validation were not always gold standard techniques.

Conclusions
Initially designed for the textile industry, DA applications have now expanded to human nutrition, due to rapid technological advancements. The high reliability and speed of measurement detection make DA more suitable than its conventional counterparts in specific contexts, e.g., large-scale population surveys or clinical subpopulations. Furthermore, 3D body imaging could be used in place of other known methods of body composition assessment where biological costs (DXA, computed tomography (CT)) or technical/time constraints (ADP, UWW, magnetic resonance (MR)) are of concern. Finally, DA could be proposed as a screening tool before second-level imaging techniques for the assessment of human body composition as well as in postural analysis. However, hardware variability, a lack of standard validated software, the cost of more accurate and precise scanners, and small sample sizes limit the quality of the evidence in current studies.
For all of these reasons, this systematic review of the literature was able to achieve the primary objective of providing an update on the state of digital anthropometry. The secondary objective was to verify the methods for identifying the best technology to be used in the field of DA, to identify how technologies can be selected appropriately for specific applications, and to identify ways in which digital anthropometry technologies can be incorporated into daily clinical practice. Investigation of this objective highlights a series of concerns that must first be further investigated in order to address the above.
Finally, the contribution of anthropometric measurements to statistical models for the prediction of human body composition (for example, in the estimation of lean body mass) is extremely high and explains over 80% of variability [62]. To explain the residual variability, especially in different clinical settings, it is necessary to develop new tools and software that integrate the available analytical methods of human body composition in accordance with the perspective of multicompartmental models.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.