Performance of the Digital Dietary Assessment Tool MyFoodRepo

Digital dietary assessment devices could help overcome the limitations of traditional tools to assess dietary intake in clinical and/or epidemiological studies. We evaluated the accuracy of the automated dietary app MyFoodRepo (MFR) against controlled reference values from weighted food diaries (WFD). MFR’s capability to identify, classify and analyze the content of 189 different records was assessed using Cohen and uniform kappa coefficients and linear regressions. MFR identified 98.0% ± 1.5 of all edible components and was not affected by increasing numbers of ingredients. Linear regression analysis showed wide limits of agreement between MFR and WFD methods to estimate energy, carbohydrates, fat, proteins, fiber and alcohol contents of all records and a constant overestimation of proteins, likely reflecting the overestimation of portion sizes for meat, fish and seafood. The MFR mean portion size error was 9.2% ± 48.1 with individual errors ranging between −88.5% and +242.5% compared to true values. Beverages were impacted by the app’s difficulty in correctly identifying the nature of liquids (41.9% ± 17.7 of composed beverages correctly classified). Fair estimations of portion size by MFR, along with its strong segmentation and classification capabilities, resulted in a generally good agreement between MFR and WFD which would be suited for the identification of dietary patterns, eating habits and regime types.


Introduction
Although diet is recognized as a large contributor to the onset and etiology of noncommunicable diseases, its valid and reliable measurement in clinical and epidemiological studies remains a challenge, mainly because of its reliance on self-reported information and the lack of accessible tools to collect good quality information. Conventional dietary assessment tools are either: of good scientific quality but involve high implementation costs (24-h recall) and substantial commitment from the participants (dietary records); or are easily implemented but lack accuracy and precision (food frequency questionnaires) [1].
Digital measurement devices can help overcome the limitations of conventional dietary assessment tools and provide a cost-effective way to simplify and scale up nutritional data collection. Such devices were shown to increase user acceptance while providing valuable real-time food intake data [2][3][4], and have the potential to eliminate participant burden linked to portion size estimation [5]. Digital image capture of foods is further facilitated by the distribution of mobile phones and the population's familiarity with this technology. In 2017, more than 325,000 health mobile applications were available via major app stores all over the world [6]. The majority of available mobile dieting apps are designed to support behavioral changes and either have not been validated for use in research or lack accuracy [7][8][9][10]. Although digital dietary assessment tools employed in research overcome these limitations, they often rely on more cumbersome participation from users; for example, necessitating specific experimental settings [11], requiring participants to wear impractical gear (e.g., chest-worn camera) [12][13][14] or requiring them to manually select dishes and estimate portion sizes [2,15,16,17]. The accuracy of these tools relies on the comprehensiveness and quality of their underlying nutrition databases, whose continuous update is a challenge [3,10]. Assessing the relevance of emerging digital dietary assessment tools for epidemiological research is also made difficult by the scarcity of information on the tools' development process, the large variation in intake calculations and the differences in methodologies employed amongst validation studies [10].
Validation studies rarely assess all the different stages of dietary recognition: (1) segmentation (i.e., the ability of a tool to recognize where the different edible components of an image are located); (2) classification (i.e., the ability to correctly identify the content of each segment); (3) portion size estimation; and (4) energy and macronutrient calculation. Whereas some researchers investigating digital tools mainly focus on the segmentation and classification of food components, as is the case for DietCam [18], other validation studies (e.g., for e-Ca [19] or mFR [20]) concentrate on determining weight error and related energy and macronutrient intakes. In most validation studies, including the performance analyses of the digital dietary assessment tools Keenoa [9], PIQNIQ [2], EaT [21] and Bridge2U [22], the accuracy of energy and macronutrient content is the sole endpoint investigated. One exception is Snap-n-Eat, a mobile phone recognition system able to perform automatic segmentation and classification of foods and allowing subsequent weight estimation and energy and macronutrient content calculation, but this app has not yet been validated [23].
The aforementioned tools often present low to moderate levels of accuracy [4,24], showing wide limits of agreement compared to established dietary methods [9,21,22]. Newly developed digital dietary assessment tools need to be compared against a reference method (e.g., weighted food diaries over several days) and assessed among participants with similar dietary habits as the population of interest, which is not systematically the case.
In this context, we aimed to assess the accuracy of the automated dietary assessment device MyFoodRepo, by investigating its capability to identify, classify, and estimate portion sizes and determine the macronutrient content of diet against reference values from weighted food diaries.

Materials and Methods
MyFoodRepo (MFR) is a mobile application developed by the team of Prof. Marcel Salathé (Digital Epidemiology Lab, École Polytechnique Fédérale de Lausanne), which can be used to track food consumption from pictures of meals and beverages or from scanned food products' barcodes [25]. The app does not require any fiducial marker for image recognition and its algorithm, based on thousands of images (May 2021), uses artificial intelligence for image content analysis. The system incorporates an annotation interface, which allows textual conversation between MFR app users and a human reviewer from the MFR app developers' team.
Three researchers conducted the present validation study, using the photography and barcode scanning features of the MFR mobile app to record foods and beverages. MFR was evaluated on four different criteria: (1) segmentation (the ability to accurately differentiate the distinct edible components of a record, for example discerning the presence of a beige-colored segment from a red-colored segment in a plate while leaving aside the background and cutlery); (2) classification (the ability to correctly identify the content of each detected segment, for example, identifying that the beige-colored segment is pasta and that the red-colored segment is Bolognese sauce); (3) portion size estimation (the accuracy of weight estimates for each detected and exactly classified segment); (4) overall performance (the accuracy and agreement of energy and macronutrient estimates compared to weighted food diaries).

Data Collection
Data collection extended from September to December 2019. We aimed to gather a minimum of 180 records (one record being defined as either one photograph or one scan entered into MFR), distributed as follows: 60 composite foods, made up of ≥3 segments; 60 simple foods, made up of 1-2 segment(s); 30 composite beverages, made up of ≥2 segments; 30 simple beverages, made up of 1 segment only. For foods, a segment may refer to a single ingredient (e.g., carrots) or mixed ingredients that form a unified item to be recognized by MFR (e.g., ratatouille). For beverages, a segment refers to a single ingredient (e.g., tea). Records were arbitrarily selected by the researchers and did not represent daily intake. Industrial processed foods with a barcode were directly scanned into the app.

Controlled Values Measured from the Weighted Food Diaries
To produce controlled values, we created, tested and optimized food diaries with the advice of a registered dietician. We entered each record into the food diaries. Weight and nutritional values from barcoded products were directly transcribed from their respective packaging and nutrition labels. Ingredients and complete segments from photographed records were carefully weighed and described. For cooked or mixed items, we noted the precise recipe.
Data from the weighted food diaries were analyzed by the dietician using the software PRODI 6.5 Swiss (Nutri-Science GmbH, Hausach, Germany) and food composition databases, resulting in nutritional values being retrieved from the Swiss Food Composition Database [26], the French Food Composition Database [27] and the German Nutrient Database [28] to obtain energy and macronutrient content data on all records.
We additionally classified all segments into 37 food types (Table S1). Segments made up of mixed ingredients were classified according to the ingredient with the highest calorific content (e.g., potato gratin classified into "Tubers"). Food types were further coded into 23 food groups and again into 7 food categories, corresponding to categories of the Swiss food pyramid [29].

Measurements Made by MFR
For each record entered into the weighted food diaries, a new picture or scan was saved into MFR. Records were processed by the MFR algorithm, and curated by the MFR app developers, who were able to ask for clarifications about the entered records via the built-in annotation interface of the app. To test MFR's ability to recognize and analyze food content, the researchers were instructed not to leave any spontaneous descriptions about the content of the pictured foods in the app's annotation interface. However, researchers were allowed to answer questions posed by the MFR app developers (e.g., "Did you put sugar in the tea?"; "Is it beef or veal in the picture?"). All MFR app developers performed their tasks while being blind to the study, since they did not know if they had interactions with researchers from this validation study or participants from other ongoing studies.
MFR draws the nutritional values of food and beverages from the Swiss Food Composition Database [26] and the French Food Composition Table, Ciqual [27]. Nutritional values of barcode scanned records are extracted from MFR's community-driven associated database, Open Food Repo [30], whose members can add and correct information from nutrition labels.
The researchers obtained the data extracted from MFR by the app development team on 17 February 2021 with details on date and time of collection, name, weight and/or volume, energy and macronutrient content of each detected segment contained in the records. The researchers consequently listed all segments identified by the MFR app, and classified them into food types, groups and categories.

Segmentation
Segments (i.e., different components of a record) correctly identified by MFR were coded as found segments (F); overlooked segments were coded as omissions (O); additional segments erroneously identified by MFR and not actually present on a record were coded as intrusions (I). Segmentation accuracy was then expressed as the number of found, omitted and intruded segments, respectively, divided by the total number of original segments in the weighted food diaries.
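The coding scheme above can be illustrated with a minimal sketch. The function name `segmentation_rates` and the counts are hypothetical and are not study data; the sketch only assumes that each rate uses the number of original diary segments as its denominator, as described in the text.

```python
from collections import Counter

def segmentation_rates(codes, n_true_segments):
    """Percentage of found (F), omitted (O) and intruded (I) segments.

    `codes` holds one code per segment decision; `n_true_segments` is the
    number of original segments in the weighted food diaries.
    """
    if n_true_segments <= 0:
        raise ValueError("n_true_segments must be positive")
    counts = Counter(codes)
    return {code: 100.0 * counts.get(code, 0) / n_true_segments
            for code in ("F", "O", "I")}

# Hypothetical example: 50 true segments, of which 48 were found and
# 2 omitted, plus 1 intrusion that was not present in the diary.
codes = ["F"] * 48 + ["O"] * 2 + ["I"]
rates = segmentation_rates(codes, n_true_segments=50)
print(rates)
```

Because intrusions are not part of the original segments, the three percentages need not sum to 100.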

Classification
MFR naming of each found segment was compared to the corresponding true segment designation. MFR's classification performance was assessed by describing each match as exact match, close match, far match or mismatch according to the following criteria: exact match (E): MFR segment belongs to the correct food type and/or is labeled after a product meaning the exact same thing (e.g., cherry tomatoes vs. tomatoes); close match (C): MFR segment belongs to the correct food type but its naming is either too generic, not specific enough, refers to a slightly different product (e.g., beef vs. veal) or contains an overlooked ingredient within (e.g., tea with added sugar vs. tea); far match (F): MFR segment belongs to the wrong food type but to the correct food group (e.g., pasta vs. rice); mismatch (M): MFR segment belongs to the wrong food type and the wrong food group (e.g., carrot vs. potato).
The classification accuracy was calculated as the percentage of exact, close, far and mismatches among the total number of found segments. Proportion differences between MFR and controlled values were tested using Fisher's exact test.
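As an illustration of these two steps, the sketch below computes the share of each match type and a two-sided Fisher exact p-value for a 2x2 table. The function names, labels and table counts are hypothetical; the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention and is assumed rather than taken from the study.

```python
from collections import Counter
from math import comb

def match_percentages(labels):
    """Percentage of exact (E), close (C), far (F) and mismatch (M) labels."""
    counts = Counter(labels)
    n = len(labels)
    return {k: 100.0 * counts.get(k, 0) / n for k in ("E", "C", "F", "M")}

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical labels for 20 found segments (not study data).
labels = ["E"] * 17 + ["C"] * 2 + ["M"]
shares = match_percentages(labels)

# Hypothetical 2x2 table: rows = clarification asked / not asked,
# columns = exact or close match / far match or mismatch.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(shares, round(p_value, 4))
```

In practice a statistics library would normally be used for the test; the hand-written version is shown only to make the calculation explicit.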
To evaluate MFR's categorization performance at different levels of granularity, we assessed it by food category, food group and food type, using the Cohen kappa as a reliability indicator and the uniform kappa coefficient as an agreement indicator, together with sensitivity and specificity. As inter-rater agreement between two researchers judging MFR segmentation and classification was high for 30 random records (uniform kappa Ku = 1 [95% confidence interval: 1;1] and Ku = 0.744 [0.607;0.843], respectively [31]), one researcher could proceed with coding the remaining records.
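The difference between the two coefficients can be sketched as follows. Cohen's kappa takes chance agreement from the observed marginal frequencies, whereas the uniform kappa is assumed here to take chance agreement as 1/q for q categories; this definition is our assumption and should be checked against reference [31]. The labels are hypothetical.

```python
from collections import Counter

def observed_agreement(a, b):
    """Proportion of items on which the two label lists agree."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa: chance agreement derived from the marginal frequencies."""
    n = len(a)
    po = observed_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if pe == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (po - pe) / (1 - pe)

def uniform_kappa(a, b, n_categories):
    """Uniform kappa (assumed form): chance agreement fixed at 1 / n_categories."""
    po = observed_agreement(a, b)
    pe = 1 / n_categories
    return (po - pe) / (1 - pe)

# Hypothetical food-type labels: reference (diary) versus app.
truth = ["pasta", "pasta", "pasta", "rice", "meat", "fish"]
app = ["pasta", "pasta", "rice", "rice", "meat", "fish"]
ck = cohen_kappa(truth, app)
uk = uniform_kappa(truth, app, n_categories=4)
print(round(ck, 4), round(uk, 4))
```

With unbalanced category frequencies, as in this example, the two coefficients differ slightly even though the observed agreement is the same.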

Portion Size Estimation
The weight error (difference between MFR weight estimation and true weight) was determined for each exactly classified segment. Mean weight, mean error, and mean absolute error were then calculated per food type, food group and food category, and mean errors were plotted as boxplots. The mean differences between true and estimated values were assessed with paired t-tests. Two-sided p-values ≤ 0.05 were considered significant. Finally, the accuracy of portion size estimates was assessed by dividing the mean estimated weight by the mean true weight for each food type, group or category.
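The quantities defined above can be computed as in the following sketch, which uses hypothetical weights in grams and a hypothetical function name. It returns the paired t statistic and its degrees of freedom only; obtaining the p-value requires a t-distribution routine (for example `scipy.stats.ttest_rel`), which is deliberately left out to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, stdev

def portion_size_summary(true_w, est_w):
    """Mean error, mean absolute error, paired t statistic and accuracy ratio."""
    if len(true_w) != len(est_w) or len(true_w) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    errors = [e - t for t, e in zip(true_w, est_w)]  # estimated minus true
    n = len(errors)
    mean_error = mean(errors)
    return {
        "mean_error": mean_error,
        "mean_abs_error": mean(abs(d) for d in errors),
        # Percentage error of each segment relative to its true weight.
        "pct_errors": [100.0 * d / t for d, t in zip(errors, true_w)],
        # Paired t statistic on the differences, with n - 1 degrees of freedom.
        "t_stat": mean_error / (stdev(errors) / sqrt(n)),
        "df": n - 1,
        # Accuracy as defined in the text: mean estimated / mean true weight.
        "accuracy": mean(est_w) / mean(true_w),
    }

# Hypothetical weights in grams for four exactly classified segments.
true_w = [100, 150, 80, 200]
est_w = [110, 140, 100, 190]
summary = portion_size_summary(true_w, est_w)
print(summary)
```

The example also shows why the mean error alone can be misleading: positive and negative errors cancel, so the mean absolute error is much larger than the mean error.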

Overall Performance for Energy and Macronutrient Content
To assess the agreement between the two methods, linear regression analysis was performed for energy and macronutrient estimates (fat, carbohydrates, proteins, fiber, alcohol). Linear regression was preferred (with MFR measurement as the dependent variable and weighted food diaries' measurements as the independent variable) to the commonly used Bland-Altman method, as the latter was shown to provide biased results when one of the two measurement methods has negligible measurement error [32]. Therefore, under the assumption that weighted food diaries correspond to an unbiased gold standard with negligible measurement error, we estimated the differential and proportional bias from the app, by regressing MFR measurements as a function of the controlled values. The 95% limits of agreement were then calculated by modeling the measurements' heteroscedasticity from the app [33]. To allow for comparison and give a general idea of the MFR data dispersion, we additionally calculated the coefficient of variation of MFR at different controlled values (25th percentile, median, 75th percentile) as well as the mean coefficient of variation (Cυ) for energy and macronutrients.
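The regression step can be illustrated with a minimal ordinary least squares sketch, in which the intercept is read as the differential bias (ideal value 0) and the slope as the proportional bias (ideal value 1). This is a simplification: it does not model heteroscedasticity or compute the limits of agreement of [33], and the function names and data are hypothetical.

```python
from statistics import mean

def fit_bias(wfd, mfr):
    """OLS fit of app values (y) on reference values (x); returns (intercept, slope)."""
    if len(wfd) != len(mfr) or len(wfd) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    mx, my = mean(wfd), mean(mfr)
    sxx = sum((x - mx) ** 2 for x in wfd)
    sxy = sum((x - mx) * (y - my) for x, y in zip(wfd, mfr))
    slope = sxy / sxx
    return my - slope * mx, slope

def crossover(intercept, slope):
    """Reference value at which the fitted line meets the 1:1 line (None if parallel)."""
    if slope == 1:
        return None
    return intercept / (1 - slope)

# Hypothetical energy values in kcal: reference diary versus app estimate.
wfd = [100, 200, 300, 400]
mfr = [150, 220, 290, 360]
intercept, slope = fit_bias(wfd, mfr)
x_star = crossover(intercept, slope)
print(round(intercept, 2), round(slope, 2), round(x_star, 1))
```

With a positive intercept and a slope below 1, as in this example, the fitted line lies above the 1:1 line for reference values below the crossover point and below it for larger values.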

Results
In total, 189 records were collected (63 composite foods, 63 simple foods, 30 composite beverages, 33 simple beverages). Among all records, 174 (92%) were recorded by photography, while 15 (8%) were barcode scanned. For practical reasons, only simple foods and simple beverages benefited from the scan feature in this study. Clarifications were requested by the MFR app developers via the annotation interface in 43% (n = 81) of all cases, but exclusively for dishes and beverages recorded by photography (Figure S1).
Four probable weight transcription errors resulting in unrealistic entries in the food diaries were considered as outliers and removed from the portion size and overall performance analyses.
Scanned records showed perfect classification accuracy, with 100% of segments classified as an exact match.
Records for which the MFR app developers had requested clarifications via the built-in annotation interface showed better results than records for which no clarification had been requested, with a slightly higher percentage of exact and close matches (98.4% ± 1.8 vs. 93.0% ± 4.0) and a lower percentage of far matches and mismatches (1.6% ± 1.8 vs. 7.0% ± 4.0) (p = 0.01339).
Globally, classification reliability and agreement of MFR compared to controlled values were nevertheless high in all three levels of classification granularity: food categories (0.963), food groups (0.9554) and food types (0.9559) (Table 4). Cohen's kappa, uniform kappa, sensitivity and specificity by food types and food categories can be found in the Supplementary Materials (Tables S2 and S3).
Table 4. Global classification reliability (Cohen kappa) and agreement (uniform kappa) between MyFoodRepo and controlled values from weighted food diaries.

Portion Size Estimation
The mean true weight of all exactly classified segments (n = 302) was 116.8 g ± 92.0, whereas mean estimated weight was 114.4 g ± 83.0 (p = 0.424), with a mean error of −2.4 g ± 51.8. Nevertheless, mean absolute error was 32.8% and the range of percentage error fluctuated between −88.5% and 242.5% of true weight.
Among all 23 food groups, half presented a mean absolute error between 25% and 50%. "Milk and milk-based beverages", as well as "Non-alcoholic sweetened beverages" had a mean absolute error below 10%. On the other hand, "Fats & oils", "Sweeteners" and "Condiments & sauces" showed a mean absolute error over 50%. As depicted in Figure 1, MFR significantly overestimated weight for "Meat & Poultry" (p = 0.0001), "Fish & Seafood" (p = 0.004), "Eggs & meat substitutes" (p = 0.027) and "Potatoes, legumes and Beans" (p = 0.023). No significant differences were observed between estimated and true mean weight at the food category level.
Figure 1. Colored boxplots indicate significant mean differences between estimated and true values (two-sided p-value ≤ 0.05). Four weight transcription errors resulting from unrealistic entries in the food diaries were removed from portion size analysis (not shown). * Only one observation in the "milk substitutes" food group.
MFR portion size estimation performance calculated by food types, food groups and food categories can be accessed in the Supplementary Materials (Tables S4-S6).

Overall Performance for Energy and Macronutrient Content
The overall performance analysis included 185 records (all 189 records minus the four outliers described above). The linear regression performed on MFR measurements versus controlled values from the food diaries shows an overestimation tendency by the app at small true values of energy and macronutrients and an underestimation tendency at higher true values (Figure 2). The y-intercept was 113.3 kcal for energy, 5.7 g for fat, 20.4 g for carbohydrates, 1.8 g for fibers, and 0.6 g for alcohol.
Only the linear regression line for protein content fell above the 1:1 line, indicating a systematic overestimation of proteins by MFR.
For alcohol, note that the lower confidence line crossed the zero line. This happens because the 95% limits of agreement were built based on the Wald method with no transformation, indicating that the variance of the measurement errors is very large, in turn showing that the agreement is extremely poor.
The coefficients of variation (Cυ) and mean coefficients of variation (mean Cυ) for energy and macronutrients of all records also suggest important levels of dispersion from MFR estimates (Table 5). As per the coefficients of variation (Cυ) calculated at the 25th percentile, median and 75th percentile of true values, MFR's accuracy increased with increasing true energy and macronutrient content, meaning that the dispersion was higher for small quantities. The Cυ values at the 25th percentile were particularly high for protein (1.96), alcohol (1.70), fat (1.68) and fibers (1.47). Fibers and alcohol had the highest mean Cυ, whereas carbohydrates had the lowest. The mean coefficients of variation were higher for beverages compared to food records, except for alcohol and, to a lesser extent, carbohydrates (Table 6). The higher alcohol Cυ in foods comes from the unaccounted alcohol content in some sauce recipes, showing MFR's inability to identify sauce composition. The Cυ of beverages negatively affected the Cυ of all records, especially for fat and fibers.
Linear regression figures for food and beverages separately can be found in the Supplementary Materials (Figures S2 and S3).

Discussion
The purpose of this study was to assess the accuracy of the smartphone application MFR against weighted food diaries, which are currently considered the gold standard for dietary assessment. To our knowledge, this is the first study validating an automated digital dietary assessment tool by distinctly examining its different stages of food and beverage recognition, namely segmentation, classification, portion size estimation and energy and macronutrient content calculation.
Compared to most of its digital dietary assessment counterparts used in research, MFR requires minimal user input to record diet. Some digital dietary assessment tools require preliminary groundwork, for instance specific experimental settings with a fixed background [11], while others require a fiducial marker to be placed in the image [15][16][17] or require users to delineate segments themselves in the tool [34,35]. Despite relying only on a smartphone camera, MFR showed strong segmentation capacity, identifying 98.0% of all segments present. This included not only visible items, as for similar technologies [23], but also blended or mixed segments. Additionally, segmentation accuracy did not significantly decrease for complex records, unlike what was observed with the automated dietary tool DietCam [5,18].
MFR also performed generally well to classify found segments into food types, food groups and food categories. Scanned records showed perfect segmentation and classification results, bypassing the well-reported reduction in accuracy and reliability of dietary assessment tools associated with non-exhaustive databases [3,10]. Scanned items in MFR are indeed directly associated with the Open Food Repo database [30], which currently gathers more than 370,000 barcoded products sold in Switzerland. The database is open-access and user-enriched, ensuring its continuous update and alignment with population dietary habits. Uniform kappas over 0.958 indicated a good classification agreement between methods, at all levels of granularity. Percentages of exact matches exceeded 90% for composite foods, simple foods, and simple beverages, but only reached 41.9% for composite beverages, which could partially explain the large coefficients of dispersion for energy and macronutrients among beverages. "Alcoholic beverages", "non-alcoholic sweetened beverages" and "non-sweetened non-alcoholic beverages" were classified interchangeably and additions of milk and/or sugar in tea and coffee were often overlooked.
Unlike MFR, many digital dietary assessment tools used in studies rely on user participation for portion size estimation, either via a portion size selector [2,19] or a complementary portion size booklet [22]. Portion size estimation relying on the capture of a single image was proven to reduce user burden as automated estimations are not affected by the user's lack of knowledge about quantities [36,37]. Photography can also decrease data collection time and participants' disturbance in complex settings such as school cafeterias [11], and facilitate study implementation in environments with lower health and nutrition literacy or language barriers [38,39]. MFR would nonetheless benefit from more precise estimations of portion sizes. Although the global mean portion size error was −2.4 g ± 51.8 or 9.2% ± 48.1%, the error range produced by individual errors varying between −88.5% and +242.5% of true weight was wider than observed in the existing literature. In comparison, the electronic mobile-based food record e-Ca showed a mean error of 3% with errors ranging between −38% and +130% of true weights across 20 foods and beverages displayed in a controlled setting [19], whereas the mFR app, developed by Lee et al., found a minimum error of −38% and a maximum error of 26% between automatically determined portion weights and control weights of 19 individual foods [20,24]. Nonetheless, the relatively small number of participants and items assessed in the aforementioned studies reduces the likelihood of extreme errors compared to the present study.
MFR performance was particularly challenged by small or hidden ingredients within records. The greatest mean absolute weight errors were observed in the "Fats & oils", "Sweeteners" and "Condiments & sauces" food groups. In the linear regression, the 95% limits of agreement for alcohol extended into the negatives, likely reflecting the oversight of alcohol in sauces and the misclassification of alcoholic beverages by MFR. Imperceptible elements (e.g., sugar, oils and sauces) were indeed harder to classify by the app and showed weaker classification sensitivity compared to other food groups, an inevitable limitation of dietary data collection by photography [24]. The same conclusion can be extended to segmentation, where omissions and intrusions made by MFR mainly affected subsidiary food items, such as capers, sauces, or vinegar, as well as two additional segments blended in vegetable mixes.
The segmentation, classification and portion size estimation findings all influence the overall performance of MFR. We observed higher coefficients of variation Cυ for energy and macronutrient estimates when true quantities were small, with a tendency towards overestimation. After a certain threshold, MFR underestimated all macronutrients with the exception of proteins. MFR's overestimation tendency towards proteins could be exacerbated by the significant weight overestimation of segments of "Meat & Poultry", "Fish & Seafood" as well as "Eggs & meat substitutes". While carbohydrate estimates of all records showed reasonable results (Cυ = 0.31), fiber and alcohol had the highest mean coefficient of dispersion globally (0.58 and 1.25, respectively), especially in the case of beverages. Overall, linear regression analysis showed wide limits of agreement between MFR and the weighted food diary control method for the energy, fat, carbohydrates, proteins, fiber and alcohol content of all records. Wide limits of agreement between a novel method and a control method are commonly observed in similar studies, whose digital dietary assessment tools are often validated for use at the group level [9,21,22].
Nonetheless, our methodology focused on MFR's performance as a dietary assessment device, with no consideration regarding true daily dietary intake and real-life conditions (i.e., study participants taking pictures of their food on selected days). This made the comparison with other digital tools difficult and restricted our analysis to a specific record's energy and macronutrient accuracy and precision. To avoid discriminating against MFR for erroneously classifying or forgetting segments, we assessed weight errors on exactly classified segments only, which constitutes another limitation of our work. Furthermore, the decision not to use the app's comment fields during data collection may have reduced the accuracy of MFR. Indeed, MFR users are normally able to provide spontaneous description or comments in these integrated annotation fields, but we intentionally ignored this tool in the present study, in order to test MFR's sole capability to identify and classify record content.

Conclusions and Recommendations
In light of the above, we would advise caution in the analysis of energy and macronutrient content for precise individual dietary assessment. Nonetheless, the good agreement between MFR and weighted food diaries for portion size estimation, along with the app's strong segmentation and classification capabilities, suggests that MFR is suited for the identification of dietary patterns, eating habits and regime types.
Statistical recalibrations to adjust for measurement error could potentially be used to improve MFR's current estimations. Energy adjustments could also be applied to increase the overall accuracy of MFR. This analytic method, which helps mitigate the effects of measurement errors when data are collected via a self-reported dietary assessment tool, has been assessed and applied in similar validation studies and could constitute the subject of subsequent research, provided that total energy intake is assessed [22,40].
Currently, MFR's energy and macronutrient assessment is highly affected by imprecise portion size estimation. Improving portion size estimation capabilities would therefore prove valuable in strengthening the app's general performance. Combined with MFR's user-friendly recording interface, this would distinguish the app from other digital dietary assessment tools currently available for research purposes. Supported by a significant classification improvement with annotators' intervention, we would recommend MFR developers to focus on beverage content identification, to enhance MFR classification accuracy. The presence of alcohol, milk or sugar in beverages should be of particular focus and could be flagged by systematically asking participants for the content of their beverages. This is, for instance, applied in the mobile device food record mpFR, which allows users to rectify misclassified segments before confirmation of intake [41,42]. MFR already features an optional field for remarks which is visible during record entry. It would be in the app user's best interest to benefit from systematic prompts to ensure a more accurate classification of composite beverages. The same recommendation could be made for sauces and condiments.
These adaptations could be put to the test in a subsequent study, further investigating MFR use in real-life settings with the measurement of daily dietary intake from study participants. In such conditions, and in order to fully compare MFR performance and practical implementation in epidemiological studies against traditional dietary assessment methods, researchers should assess the relevance of participants' notes, potential prompts or the use of a fiducial marker on the pictures for portion size estimation and energy and macronutrient calculation. Trade-offs in terms of time, cost-efficiency and practicability should nevertheless be considered to avoid increasing user burden.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/nu14030635/s1, Table S1: Description of food types and corresponding food groups and food categories, Figure S1: Characteristics of records collected, Table S2: Cohen kappa, uniform kappa, sensitivity and specificity of MFR classification compared to controlled values from weighted food diaries, by food types, Table S3: Cohen kappa, uniform kappa, sensitivity and specificity of MFR classification compared to controlled values from weighted food diaries, by food categories, Table S4: MFR portion size estimation performance: mean estimated weights versus true weight, mean error and mean absolute error of all exactly classified segments, displayed by food types, Table S5: MFR portion size estimation performance: mean estimated weights versus true weights, mean error and mean absolute error of all exactly classified segments, displayed by food groups, Table S6: MFR portion size estimation performance: mean estimated weights versus true weights, mean error and mean absolute error of all exactly classified segments, displayed by food categories, Figure S2

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to intellectual property reasons.

Conflicts of Interest: The authors declare no conflict of interest.