Validating Accuracy of an Internet-Based Application against USDA Computerized Nutrition Data System for Research on Essential Nutrients among Social-Ethnic Diets for the E-Health Era

Internet-based applications (apps) are developing rapidly in the e-Health era to assess the dietary intake of essential macro- and micro-nutrients for precision nutrition. We therefore validated the accuracy of an internet-based app against the Nutrition Data System for Research (NDSR) in assessing these essential nutrients across various social-ethnic diet types. Agreement between the two measures, assessed with intraclass correlation coefficients, was good for total calories (0.85) but moderate for the <1000 (0.75) and >2000 (0.57) caloric ranges, and good (>0.75) for most macro-nutrients (average: 0.85) and micro-nutrients (average: 0.83) except cobalamin (0.73) and calcium (0.51). The app underestimated nutrients associated with protein and fat (protein: −5.82%, fat: −12.78%, vitamin B12: −13.59%, methionine: −8.76%, zinc: −12.49%) and overestimated nutrients associated with carbohydrate (fiber: 6.7%, B9: 9.06%). Using artificial intelligence analytics, we confirmed the factors that could contribute to the differences between the two measures for various essential nutrients: caloric ranges; the differences between the two measures for carbohydrates, protein, and fat; and diet types. For total calories, for example, the source factors included caloric range (<1000 versus others), fat, and protein; for cobalamin: protein and the American and Japanese diets; and for folate: caloric range (<1000 versus others), carbohydrate, and Italian diet. In the e-Health era, the internet-based app has the capacity to enhance precision nutrition. By identifying the effects of potential contributing factors and integrating them into the algorithm that produces output readings, the accuracy of new app measures could be improved.


Materials and Methods
We evaluated the essential nutrients in 131 social-ethnic diets consumed by various vulnerable populations, continuing our series of studies [7,12,42]. Under-reporting of dietary intake is common in human studies [13,43,44]; thus, validating dietary measures with model social-ethnic diets, presented as recipes or menus, offers precise and controlled nutrient intake for individuals to follow and consume [12,45]. We examined four groups of diets: (1) pure liquids [27,30,46]; (2) convenient diets [25,26,34]; (3) ethnic diets of Western and Eastern origins [21,24]; and (4) smoothie-added diets [27-29]. We selected various diets to deliver nutrients in the possible forms of liquid and solid foods. To add variation to the baseline daily recipes of these diets, excess calories, proteins, vegetables, fruits, and fats were included.
We evaluated the essential nutrients of these diets based on the United States National Institutes of Health's dietary reference intakes [47], using both NDSR and the app, with 3-day 24-h dietary diaries spread throughout the week. Macro-nutrients included total calories, carbohydrates, protein, total fat, saturated fat, cholesterol, and fiber [12]. Micro-nutrients included vitamins (thiamin B1, riboflavin B2, niacin B3, pyridoxine B6, folate B9, cobalamin B12, and vitamins A, C, D, and E); amino acids (methionine and glycine) [47]; choline; and minerals (zinc, calcium, magnesium, iron, and sodium). Foods rich in carbohydrates also commonly contain fiber and vitamins A, B9, and C, whereas meat-based foods are rich in protein, fat, saturated fat, cholesterol, vitamin B12, methionine, glycine, and zinc [47].

Social-Ethnic Diets
Details of these diets were reported previously [12] and are summarized in the following sections. Liquids are necessary for human hydration and gut motility [27,28]. Pure liquid diets have been commonly recommended for frail elderly individuals and for patients with terminal illness or with gastrointestinal dysfunction following surgery [27,30]. Examples of liquid diets include various liquids loaded with minerals, fruits, and vegetables, as well as soups. Convenient diets in modern industrial societies include canned foods, café diets, and fast foods [25,26]. Canned foods comprise items such as soups, various vegetables, noodles, and fish. Café diets include foods offered at high schools, which are frequently loaded with sodium and saturated fat [34], such as chicken wings, chicken tenders, grilled cheese sandwiches, mini cheeseburgers, pizza, cheese sticks, hot dogs, and corn dogs. Fast foods also contain high levels of fat, saturated fat, sodium, sugar, and empty calories [25,26,48], as in fried meats, fries, biscuits, sandwiches, wraps, and hash browns [12,25,26].
Various ethnic diets include Western influences from American, Mexican, Italian, and Mediterranean diets, and Eastern influences from Japanese, Chinese, and Korean diets [21,24,37]. The American diet features fried foods and salads [31]. The Mexican diet contains grain- and bean-based foods with meats and chili soups. The Italian diet includes grains and vegetables. The Mediterranean diet contains plenty of fruits, vegetables, whole grains, nuts, and olive oil, with some meat [37]. The Japanese diet includes seafood, soups, and tempura [49]. The Chinese diet features meats, tofu, eggs, noodles, and rice [50]. The Korean diet includes rice, meats, mixed bowls of vegetables, and soups [21,24]. A smoothie-added diet could be used when people perform physically strenuous activities that require additional hydration or nutrients. The smoothie-added diet adds a variety of fruits and vegetables to the base diet, with ethnic-based ingredients and multigrains [27-29,51].

Dietary Measures and Nutrient Intake
We analyzed 3-day dietary intake of nutrients from the various social-ethnic diets using both NDSR and the app. We used NDSR software version 2015, which is based on published values in the USDA nutrient database (NDSR, University of Minnesota, MN, USA; http://www.ncc.umn.edu/products/nutrients-nutrient-ratios-and-other-food-components/primary-energysources/ (accessed on 1 June 2022)); NDSR was initiated in 1974 by the National Heart, Lung, and Blood Institute (NHLBI) [7]. NDSR was developed from 19,000 foods, including an array of ethnic foods and common menu items, for the analysis of 178 nutrients [12,52-54]. NDSR is widely used for nutrient analysis of 24-h diaries, recipes, and menus across populations in many countries with different diet types [12,52-54]. NDSR provides detailed reports on the quantities of macro- and micro-nutrients [55]. The app was developed by a nutrigenomics company specializing in internet-based mobile apps that assess daily dietary intake (GB HealthWatch, San Diego, CA, USA, https://healthwatch360.gbhealthwatch.com, accessed on 1 June 2021) [12,23,56]. Based on daily food logs, the app can estimate 30 essential nutrients. Team members verified data accuracy before analysis.

Data Analysis
We analyzed data with JMP version 15.0.0 statistical software (SAS Institute Inc., Cary, NC, USA) [57-60]. We evaluated the agreement and bias between the app and NDSR. Means and standard deviations (SD) of nutrients [61] were calculated for both measures. Agreement between the two measures was analyzed using intraclass correlation coefficients (ICCs) and mean % differences, and bias was reported with standard errors (SE). ICCs (r) between the two measures were interpreted as the strength of agreement (excellent: ≥0.90; good: 0.75 to <0.90; moderate: 0.50 to <0.75; poor: <0.50) [39,55,62]. Bland-Altman plots presented the differences between the two measures with limits of agreement (LoAs: mean difference ± 2 SD) [12,63,64], with good agreement defined as 95% or more of the differences falling within the LoAs [65-67]. The significance level (alpha) for all analyses was set at 0.05.
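To make the agreement statistics concrete, the following is a minimal sketch in Python (standard library only) of an ICC, the agreement categories listed above, and Bland-Altman bias, SE, and LoAs for paired app and NDSR values. The text does not state which ICC form JMP reported; this sketch assumes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def icc_2_1(app, ndsr):
    """ICC(2,1): two-way random effects, absolute agreement, single measure
    (Shrout & Fleiss). Rows are diets; the k = 2 columns are the two methods."""
    n, k = len(app), 2
    rows = list(zip(app, ndsr))
    grand = mean(list(app) + list(ndsr))
    row_means = [mean(r) for r in rows]
    col_means = [mean(app), mean(ndsr)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-diet mean square
    msc = ss_cols / (k - 1)              # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def agreement_label(icc):
    """Map an ICC to the agreement categories used in this study."""
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"


def bland_altman(app, ndsr, z=2.0):
    """Bias (mean difference), its SE, limits of agreement (bias +/- z*SD),
    and the fraction of differences inside the LoAs.
    z = 2 follows the text; 1.96 is also commonly used."""
    diffs = [a - b for a, b in zip(app, ndsr)]
    bias = mean(diffs)
    sd = stdev(diffs)
    se = sd / math.sqrt(len(diffs))
    lower, upper = bias - z * sd, bias + z * sd
    coverage = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return bias, se, (lower, upper), coverage
```

Because ICC(2,1) measures absolute agreement, a constant offset between methods lowers it: for app values 1, 2, 3 against NDSR values 2, 3, 4, `icc_2_1` returns 2/3, whereas a consistency-type ICC would return 1.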
The analytics were described in detail in earlier studies [7,12,42] and are summarized here. We employed generalized regression (GR) models with AI-based machine learning methods to identify the sources of differences between the two measures, progressively incorporating related factors into the analysis [7,12,42]. The candidate source factors were (1) the caloric range of total calories (<1000, 1000-2000, or >2000); (2) the differences in carbohydrate, protein, and fat; and (3) diet type [12]. JMP provided logistic regression (LR) as the default baseline model for the dependent variables. Following LR, we used the Elastic Net estimation method with Leave-One-Out (LOO) and validation-column approaches for confirmatory analysis, favoring models with a smaller misclassification rate to minimize prediction error and avoid over-fitting [7,12,42,68]. Elastic Net models were chosen because they handle datasets spanning multiple complex domains while balancing potential interactions [69]. Both LOO and AICc validation are effective for small sample sizes and for handling multiple domains [7,59]. LOO was used to select source factors within the domains of caloric ranges, macro-nutrients, and diet types [70,71]. We used AICc validation with validation columns, based on an 80/20 randomized split into training and validation sets, to confirm model fit and obtain unbiased best-model predictions [70]. The best model was selected on the basis of AICc (a lower score indicates a better, more precise fit), misclassification rate (lower is more accurate), and area under the Receiver Operating Characteristic (ROC) curve (AUC; higher is better) [42,60]. Interaction profilers were used to visualize significant interactions, which, if found, could be included in the final model [70,71].
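The JMP Generalized Regression Elastic Net platform used in the study is not reproduced here. As a rough illustration of the technique only, the sketch below fits an elastic-net-penalized logistic regression by proximal gradient descent and scores it with leave-one-out and with an 80/20 randomized train/validation split. It assumes a binary outcome per diet (the text does not specify how the outcome was coded) and 0/1 indicator predictors; the penalty values (`lam`, `alpha`), step size, toy data, and all function names are hypothetical and are not JMP's defaults.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v| (the lasso part of the penalty).
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.01, alpha=0.5, lr=0.5, n_iter=500):
    """Minimize mean logistic loss + lam * (alpha * ||beta||_1
    + (1 - alpha) / 2 * ||beta||_2^2) by proximal gradient descent.
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed 0/1 outcome.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
                 for xi, yi in zip(X, y)]
        b0 -= lr * sum(resid) / n
        for j in range(p):
            grad = sum(r * xi[j] for r, xi in zip(resid, X)) / n
            grad += lam * (1 - alpha) * beta[j]  # ridge (smooth) part
            beta[j] = soft_threshold(beta[j] - lr * grad, lr * lam * alpha)
    return b0, beta


def predict_proba(model, xi):
    b0, beta = model
    return sigmoid(b0 + sum(b * x for b, x in zip(beta, xi)))


def misclassification(model, X, y, cutoff=0.5):
    wrong = sum((predict_proba(model, xi) >= cutoff) != (yi == 1)
                for xi, yi in zip(X, y))
    return wrong / len(y)


def loo_misclassification(X, y, **fit_kwargs):
    """Leave-one-out: refit without row i, then classify row i."""
    errors = 0
    for i in range(len(y)):
        model = fit_elastic_net_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:],
                                         **fit_kwargs)
        errors += (predict_proba(model, X[i]) >= 0.5) != (y[i] == 1)
    return errors / len(y)


def train_validation_split(n, train_frac=0.8, seed=1):
    """Randomized split of row indices into training and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


# Hypothetical toy data: column 0 is a 0/1 source-factor indicator,
# column 1 is an uninformative indicator.
X = [[1, 0], [1, 1], [0, 0], [0, 1]] * 3
y = [1, 1, 0, 0] * 3
train, valid = train_validation_split(len(y))
model = fit_elastic_net_logistic([X[i] for i in train], [y[i] for i in train])
print(misclassification(model, [X[i] for i in valid], [y[i] for i in valid]))
print(loo_misclassification(X, y, n_iter=300))
```

The L1 part of the penalty can shrink uninformative coefficients toward zero, while the L2 part stabilizes estimates when indicators are correlated, which is the usual motivation for Elastic Net over a pure lasso or ridge fit.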

Predictive Modeling for the Difference between the Internet-Based App and NDSR: Generalized Regression Analysis
For predictive modeling, we progressively examined significant factors within the domains of caloric ranges (each of <1000, 1000-2000, and >2000 coded against the other two categories), energy-producing macro-nutrients, and diets. We included the significant factors from all domains in the final combined model (see Table S5 for progression examples for total calories, Table S6 for folate, and Table S7 for cobalamin). For total calories, caloric range (<1000 versus the other ranges), fat, and protein were significant factors contributing to the difference between the app and NDSR (misclassification 0.19, AICc 30.29, and AUC 0.89) (Table 3; baseline LR model in the left panel and GR model validation in the right panel). As an example, Figure 5 illustrates the ROC curve for total calories, with 89% for both sensitivity and specificity, reflecting the accuracy of the selected model [71]. In the progressive analysis, adding a further factor, the Chinese diet, increased the AICc and produced a less precise, and therefore less favorable, model than the selected model (Table S5). We selected folate and cobalamin as the most representative essential micro-nutrients in precision nutrition for more detailed presentation. Factors that contributed to the differences in folate between the two measures included caloric range (<1000 versus the other two categories), carbohydrate, and Italian diet (misclassification 0.30, AICc 38.5, and AUC 0.90) (Table 4). In the progressive analysis, adding the Chinese diet increased both the AICc and the AUC; because of the higher AICc, this factor was not included (Table S6).
For cobalamin, significant factors that contributed to the differences between the two measures included protein and the American and Japanese diets (misclassification 0.26, AICc 37.7, and AUC 0.81) (Table 5). In the progressive analysis, adding the Chinese and Mexican diets increased the AICc and produced less favorable models than the selected model; thus, these factors were not included in the final model (Table S7). We also assessed significant factors between the app and NDSR for other nutrients in the progressive analysis and summarized the final models in the Supplementary Tables (carbohydrate, protein, fat, saturated fat, cholesterol, and fiber in Table S8; thiamin, riboflavin, niacin, pyridoxine, choline, glycine, and zinc in Table S9; and vitamins A, C, D, and E, calcium, magnesium, iron, and sodium in Table S10).
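The selection rule applied in these progressions (a candidate factor is dropped when it raises the AICc) can be sketched as follows: compute AICc = −2 log L + 2k + 2k(k + 1)/(n − k − 1) from a model's predicted probabilities, compute the misclassification rate and the AUC (via the Mann-Whitney statistic), and prefer the candidate with the lowest AICc. This is a hedged, standard-library Python sketch, not the JMP implementation; the tie-breaking order, the 0.5 cutoff, the toy predictions, and all names (including the candidate label `base_plus_chinese_diet`) are hypothetical.

```python
import math


def log_likelihood(y, p, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p
    (probabilities are clamped away from 0 and 1 to avoid log(0))."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total


def aicc(y, p, k):
    """Small-sample corrected AIC; k = number of estimated parameters."""
    n = len(y)
    return -2 * log_likelihood(y, p) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def auc(y, p):
    """ROC AUC via the Mann-Whitney statistic (ties count one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def misclassification(y, p, cutoff=0.5):
    return sum((pi >= cutoff) != (yi == 1) for yi, pi in zip(y, p)) / len(y)


def select_model(candidates):
    """candidates: {name: (y, p, k)}. Lowest AICc wins; ties are broken by
    lower misclassification, then by higher AUC."""
    def score(name):
        y, p, k = candidates[name]
        return (aicc(y, p, k), misclassification(y, p), -auc(y, p))
    return min(candidates, key=score)


# Hypothetical example: an extra factor that does not change the predictions
# adds one parameter, which only increases AICc, so the smaller model is kept.
y = [1, 1, 0, 0, 1, 0, 1, 0]
p = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.9, 0.1]
print(select_model({"base": (y, p, 3), "base_plus_chinese_diet": (y, p, 4)}))  # prints: base
```

In real progressions an added factor usually changes the predictions, so the AICc comparison weighs the gain in likelihood against the small-sample penalty; a higher AUC alone (as with the Chinese diet for folate) does not override a higher AICc under this rule.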
Significant factors that contributed to the differences between the two measures for carbohydrate included calories, fiber, and Italian diet; for protein: caloric range (1000-2000 versus others), total calories, cholesterol, and canned foods; for fat: saturated fat, cholesterol, and fast food; for saturated fat: total calories, fat, and Korean diet; for cholesterol: fat and pure liquids; and for fiber: caloric range of 1000-2000, carbohydrate, and canned foods (misclassification 0.11-0.22, AICc 24.0-33.7, AUC 0.82-0.98). Similarly, significant factors for thiamin included total calories, fiber, high school diet, and Chinese diet; for riboflavin: carbohydrate, protein, and high school diet; for niacin: total calories and Chinese diet; for pyridoxine: total calories and canned foods; for choline: total calories and canned foods; for glycine: protein, canned foods, and Japanese diet; and for zinc: total calories, protein, canned foods, and Japanese diet (misclassification 0.04-0.22, AICc 19.8-35.7, AUC 0.88-0.97). Additionally, significant factors for vitamin A included protein and the Korean and smoothie-added diets; for vitamin C: fiber, fast food, and Mexican diet; for vitamin D: cholesterol, canned foods, and Mediterranean diet; for vitamin E: fat, fast food, and Italian diet; for calcium: cholesterol and the Mexican and Chinese diets; for magnesium: total calories, protein, fiber, canned foods, and high school diet; for iron: protein, high school diet, and Mediterranean diet; and for sodium: saturated fat, high school diet, and Mexican diet (misclassification 0.11-0.33, AICc 31.2-41.7, AUC 0.73-0.90). The canned-food diet type was a common contributing factor to the differences between the two measures for protein, fiber, pyridoxine, choline, glycine, zinc, vitamin D, and magnesium (Tables S8-S10). We did not model methionine separately, because its intake level is calculated purely from protein.
The interaction profiler plots showed no significant three-way interactions for any nutrient; therefore, no interaction terms were included in the final models.

Discussion
For precision nutrition, we validated the accuracy of an internet-based app in assessing macro- and micro-nutrients in various social-ethnic diets against 3-day NDSR records in this study, adding to previous validation against FFQ [13,14,19]. Compared with previous findings of excellent ICCs for calories (0.9-1) [13,19], we found good agreement (≥0.75) between the app and NDSR for total calories (0.85). By caloric range, agreement was good for 1000-2000 (0.76) but moderate for <1000 (0.75) and >2000 (0.57). Compared with previous findings of moderate to excellent agreement for other major nutrients [13,19], the ICCs between the two measures in this study were good (≥0.75) for most macro-nutrients (average: 0.85). ICCs also showed good agreement for most micro-nutrients (average: 0.83), but only moderate agreement for cobalamin (0.73) and calcium (0.51) (all p < 0.001).
In comparison with NDSR, the app underestimated protein- and fat-based nutrients (protein: −5.82%, fat: −12.78%, vitamin B12: −13.59%, methionine: −8.76%, zinc: −12.49%), while it overestimated carbohydrate-based nutrients (fiber: 6.7%, B9: 9.06%). This underestimation of protein and fat is consistent with a previous study comparing the app and NDSR [19]. In contrast to the validation against FFQ, which showed acceptable estimation of vitamins B1 (2.46%) and B9 (3.24%), this study found that the app overestimated vitamins B1 (6.48%) and B9 (9.06%) against NDSR. For choline, the app's estimate was acceptable against NDSR (−4.51%) under the 5% criterion, whereas the app underestimated choline against FFQ (−6.23%). The bias (SE) between the app and NDSR was small for all three caloric ranges (0.63-0.88), i.e., smaller and more precise than in a prior study of the app against FFQ (1.44-5.91) [12,19]. The smaller bias could be due to the same 3-day recording duration for both measures in this study, in contrast to the longer recall period of the FFQ in the prior study. The correlations between the two measures in this study were strong (≥0.70) for most nutrients (averages: 0.85 for macro- and 0.82 for micro-nutrients), except for calcium (0.51), similar to findings in a prior study validating the app against FFQ [12].
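A minimal sketch of the mean % difference and the ±5% acceptability criterion referred to above, assuming that the % difference is computed per diet as (app − NDSR)/NDSR × 100 and then averaged; the exact formula is not stated in the text, and the function names are hypothetical. The reported values used in the example are taken from this study.

```python
from statistics import mean


def mean_percent_difference(app, ndsr):
    """Mean of per-diet % differences, (app - NDSR) / NDSR * 100.
    Negative values mean the app reads lower than NDSR (underestimation)."""
    return mean((a - n) / n * 100 for a, n in zip(app, ndsr))


def estimation_status(pct_diff, tolerance=5.0):
    """Classify a mean % difference against a +/-5% acceptability criterion."""
    if abs(pct_diff) <= tolerance:
        return "acceptable"
    return "overestimated" if pct_diff > 0 else "underestimated"


# Reported mean % differences (app vs NDSR) from this study.
reported = {"choline": -4.51, "fat": -12.78, "fiber": 6.7, "B9": 9.06}
for nutrient, pct in reported.items():
    print(nutrient, pct, estimation_status(pct))
```

Under this criterion choline falls within the acceptable band, whereas fat is classed as underestimated and fiber and B9 as overestimated, matching the interpretation in the paragraph above.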
We identified the factors that contributed to the differences between the two measures. For total calories, the sources of difference included caloric range (<1000 versus others), fat, and protein; for folate, caloric range (<1000 versus others), carbohydrate, and Italian diet; and for cobalamin, protein and the American and Japanese diets. Therefore, consistent with a previous study [12], caloric ranges and diet types could be used to identify the sources that might contribute to the differences between two measures when validating a new measure, such as an internet-based app, against established dietary measures.
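If the identified source factors were used to adjust an app's output readings, one simple, hypothetical form is a linear calibration: regress NDSR values on the app reading plus 0/1 indicators for source factors (for example, a caloric range below 1000 kcal or a given diet type), then apply the fitted coefficients to new app readings. This standard-library Python sketch assumes such a linear form; it is not the app's actual algorithm, which the study does not describe, and all function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b for a small square system (Gaussian elimination
    with partial pivoting)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit_calibration(app, factors, ndsr):
    """Ordinary least squares of NDSR on [1, app reading, factor indicators...]
    via the normal equations. Returns [intercept, slope, factor coefficients...]."""
    X = [[1.0, float(a)] + [float(v) for v in f] for a, f in zip(app, factors)]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, ndsr)) for i in range(p)]
    return solve(XtX, Xty)


def calibrate(coef, app_value, factor_values):
    """Apply fitted coefficients to one app reading and its factor indicators."""
    x = [1.0, float(app_value)] + [float(v) for v in factor_values]
    return sum(c * xi for c, xi in zip(coef, x))
```

In practice, the indicator columns would come from the factors retained in the final GR models, and any such calibration would itself need to be validated on diets not used to fit it.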
In summary, fat and protein continued to be the major nutrient sources of differences when the app was validated against established dietary measures, both NDSR and FFQ [12]. The confirmatory predictive modeling further substantiated this result and identified specific caloric ranges and diet types as additional sources of differences. Hence, these source factors could be used to adjust the algorithm that generates the app's readings during its development. With advanced technology and AI analytics in the e-Health era, such apps have the capacity to enhance precision nutrition by integrating potential contributing factors into the development of new, accurate measures. Future studies should include various diets across populations, along with related factors, to further validate dietary measures.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the parent study.