Precision Nutrient Management Using Artificial Intelligence Based on Digital Data Collection Framework

Abstract: (1) Background: Nutritional intake is fundamental to human growth and health, and the intake of different types of nutrients and micronutrients can affect health. The content of the diet affects the occurrence of disease, with the incidence of many diseases increasing each year while the age group at which they occur is gradually decreasing. (2) Methods: An artificial intelligence model for precision nutritional analysis allows the user to enter the name and serving size of a dish to assess a total of 24 nutrients. A total of two AI models, including semantic and nutritional analysis models, were integrated into the Precision Nutritional Analysis. A total of five different algorithms were used to identify the most similar recipes and to determine differences in text using cosine similarity. (3) Results: This study developed two models to form a precision nutrient analysis model. The 2013-2016 Taiwan National Nutrition Health Status Change Survey (NNHS) was used for model verification. The model's accuracy was determined by comparing the results of the model with the NNHS. The results show that the AI model has very little error and can significantly improve the efficiency of the analysis. (4) Conclusions: This study proposed an Intelligence Precision Nutrient Analysis Model based on a digital data collection framework, where the nutrient intake was analyzed by entering dietary recall data. The AI model can be used as a reference for nutrition surveys and personal nutrition analysis.


Introduction
Nutritional intake is the basis for human growth and health, and the intake of different types of nutrients and micronutrients can affect health. Most diseases are inextricably linked to diet. Diabetes, cardiovascular diseases (hypertension, hyperlipidemia), gout, peptic ulcers, and gastroenteritis are all diet-related diseases that are increasing in prevalence every year, while the age group of those suffering from these diseases is gradually decreasing. The development of the Internet has made it possible to conduct online nutrition surveys through large-scale food and nutrition databases linked to automated dietary records, and there are now a growing number of software, platforms, and applications for nutrition surveys [1].
The most common technologies used for dietary recording are web-based or online tools, mobile apps, camera-based image analysis tools, wearable sensors, etc., while traditional methods rely on the use of Food Frequency Questionnaires (FFQs) or 24 h dietary recording methods. However, past techniques have suffered from a lack of accuracy in recording: recall methods may not accurately capture the food consumed, respondents may have difficulty estimating portion sizes, and food ingredient lists may be limited [2].
The coding and translation of food records from nutrition surveys into nutrient analyses are labor-intensive and time-consuming, meaning that it is more difficult to collect detailed information regarding food intake in large scale population studies. Such studies rely on answers to food frequency questionnaires, and the accuracy of this data is dependent on the expertise of the interviewer compared to other self-reported measures [3,4].
Innovative technological tools have evolved with the development of various IT technologies, including natural language analysis of text, speech analysis, and image processing. The popularity of smartphones, tablets, and computers has increased the acceptance of using IT for nutritional intake assessments [5-8].
This study develops an artificial intelligence model for a precision nutrient analysis, which allows users to enter the name of a dish and serving size to assess a total of 24 nutrients. The recipes can be modified by the user, which allows the model to be used in all countries and all contexts, thus improving interoperability and accuracy of the analysis.

Related Works
The Food Record, the 24 h dietary recall (24HR), and the Food Frequency Questionnaire (FFQ) are three common methods of collecting nutritional data. The Food Record is a comprehensive record of all foods, beverages, and nutritional supplements consumed by the respondent over a specified period of time. Usually, 3-4 days of intake are recorded, as the quality (accuracy) of the record declines when respondents are burdened with recording too many days. Ideally, dietary intake should be weighed and measured; however, most respondents only record pre- and post-estimates of intake, which leads to differences in weight judgments [9].
The 24HR method assesses the nutritional intake of a respondent over the past 24 h. Ideally, the survey collects information on nutritional intake over multiple 24 h periods on nonconsecutive, random dates. The 24HR method is usually conducted by a dedicated interviewer by telephone or in person [9]. Some 24HR surveys can also be self-recorded or collected online (e.g., the Automated Self-Administered 24 h dietary recall, ASA24 [10]). Compared with the interviewer-led 24HR method, ASA24 primarily reduces interviewer burden and interview costs and allows respondents to answer questions at their own pace; however, this method may not be suitable for all study populations.
The use of exploratory questions in the 24HR recall method facilitates easy response and has been shown to improve the accuracy of data collection [11]. The survey includes how the food was prepared, what was added after preparation (seasonings, creams, and spices), and when the meal was served [9]. The FFQ assesses general nutritional intake over a specific period of time, usually a longer period, and asks how often a person consumes food. The FFQ method is a more cost-effective alternative to the 24HR method because respondents can complete the survey themselves, and it can be used for large sample studies [12].
There are several types of systematic errors in self-reported dietary information; for example, based on general perceptions, most respondents tend to report foods that are perceived as healthy and to underreport less healthy foods. Differences in susceptibility to this tendency between groups of respondents can lead to additional personal bias. Differences in the ability to self-assess and recall portions can also lead to individual subjective differences. This systematic error is unpredictable, but studies suggest that it may be related to factors such as age and gender [13]. While each person uses different strategies to recall portion sizes, including taking photographs and using measurement aids to estimate (e.g., food models) [14,15], research shows that training can lead to a more accurate assessment of food portions [16,17]. In addition, researchers or the methods used to collect dietary data may also be biased [18].
Finally, the accuracy of the conversion of nutrient totals from nutrition dietary records depends on the accuracy and availability of the food ingredient database for conversion to calories and nutrients. In summary, both types of errors reduce the judgement of the relationship between diet and health, as well as the accuracy of the statistical analysis. However, while there may be some slight deviations in the database of the relationships tested [19], when the results of significant analyses are properly evaluated, valid conclusions can be drawn.

Materials and Methods
This study developed an AI model based on semantic text to analyze the nutritional ingredients of a dish, and a digital data semantic analysis model was designed to determine the names and servings of the dishes consumed. The AI model is based on the ingredients of common Taiwanese recipes and automatically calculates the nutrient intake. The model structure, shown in Figure 1, consists of a digital data semantic analysis model, an AI precision nutrient analysis model, and a database of 1590 recipes and 7869 ingredients drawn from common Taiwanese recipe databases. The nutrition information of the ingredients was obtained from the public data of the Health Promotion Administration, Ministry of Health and Welfare, Taiwan (HPA, MoHW).

Artificial Intelligence Semantic Analysis Model
After data entry, the input text was captured and annotated, and a CKIP pretrained model was used to interpret the Chinese words. Lexical (part-of-speech) annotation and entity identification were then performed. Finally, the nouns (dish names) were converted into vectors using word2vec, a Natural Language Processing technique proposed by Tomas Mikolov et al. at Google in 2013 and one of the most significant advances in the field of machine learning in recent years. Word2vec learns from large amounts of textual data and transforms words into numerical vectors that capture their semantic meanings, embedding the words in a vector space so that words with similar semantic meanings lie closer together.
This study used the continuous bag-of-words (CBOW) method, which predicts a word from its surrounding context words and thereby learns the relationships between similar words. As similar words are clustered together, the direction of the vector corresponds to the relative relationship between words.
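The idea that semantically similar dish names lie close together in the vector space can be illustrated with a minimal sketch. The three vectors below are made-up four-dimensional values chosen for illustration only; they are not output of the CKIP or word2vec models used in this study, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumed values, not real model output).
TOY_VECTORS = {
    "pork rib soup": [0.9, 0.1, 0.3, 0.0],
    "bamboo shoot soup": [0.8, 0.2, 0.4, 0.1],
    "fried rice": [0.1, 0.9, 0.0, 0.3],
}


def most_similar(name, vectors):
    """Return the other entry whose vector has the highest cosine similarity to `name`."""
    query = vectors[name]
    candidates = [other for other in vectors if other != name]
    return max(candidates, key=lambda other: cosine_similarity(query, vectors[other]))
```

With these toy values, the two soups point in nearly the same direction, so `most_similar("pork rib soup", TOY_VECTORS)` selects the other soup rather than the rice dish.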

Artificial Intelligence Nutritional Analysis Model
The Nutritional Analysis Model is divided into three steps.
Step 1 conducts artificial intelligence analysis to determine the most similar recipes. Due to the multicharacter nature of Chinese, a single semantic analysis algorithm may not be precise enough. Therefore, a variety of algorithms were used for the analysis. The AI model is composed of five different algorithms: (1) Okapi BM25, (2) TF-IDF, (3) Levenshtein, (4) Jaccard, and (5) Synonyms. The model also uses cosine similarity to measure differences between texts and then compares the input with a database to obtain food information and portion sizes for recipe and ingredient determination.
Step 2 is to determine the best solution by the common voting mechanism.
Step 3 is nutritional ingredient calculation.
(1) Okapi BM25 [20-22]
As a probabilistic retrieval framework, BM25 is still widely regarded as one of the most advanced ranking algorithms. BM25 is a bag-of-words model that ranks a set of documents by their relevance to a query and yields a set of scores that can be compared with each other.
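The BM25 similarity formula is shown in Equation (1).
$$\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)} \tag{1}$$
• f(q_i, D): Frequency of the term q_i in Document D.
• |D|: Length of Document D (in words).
• avgdl: Average document length in the collection.
• k_1: The term frequency saturation parameter.
• b: The length normalization parameter.
(2) Term Frequency-Inverse Document Frequency (TF-IDF)
This algorithm is a weighting technique widely used in information retrieval and text mining, and the combination of TF and IDF was first discussed by Karen Spärck Jones [23]. The TF-IDF was used to assess the importance of a word in a document: the weight increases with the number of times the word appears in the document but decreases with the frequency of the word across the whole collection of documents. The TF-IDF formula is shown in Equation (2), while the Inverse Document Frequency (IDF) formula is shown in Equation (3).
$$\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t) \tag{2}$$
$$\mathrm{idf}(t) = \log_{10} \frac{N}{\mathrm{df}(t)} \tag{3}$$
Here, tf(t, d) is the frequency of the term t in document d, N is the total number of documents, and df(t) is the number of documents containing t. The IDF is thus obtained by taking the base-10 logarithm of the total number of documents divided by the number of documents that contain the term.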
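The two scoring schemes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the tokenized recipe names are hypothetical, term frequency in `tf_idf` is normalized by document length, the BM25 IDF uses a commonly used smoothed natural-log variant, and the defaults `k1=1.5` and `b=0.75` are typical values rather than the study's settings.

```python
import math
from collections import Counter

# Hypothetical tokenized recipe names (illustrative only, not the study's database).
RECIPES = [
    ["pork", "rib", "soup"],
    ["bamboo", "shoot", "pork", "rib", "soup"],
    ["stir", "fried", "cabbage"],
]


def document_frequency(term, docs):
    """Number of documents that contain the term at least once."""
    return sum(1 for doc in docs if term in doc)


def idf_log10(term, docs):
    """Base-10 log of (total documents / documents containing the term)."""
    df = document_frequency(term, docs)
    return math.log10(len(docs) / df) if df else 0.0


def tf_idf(term, doc, docs):
    """TF-IDF weight; TF is normalized by document length (an assumption)."""
    tf = doc.count(term) / len(doc) if doc else 0.0
    return tf * idf_log10(term, docs)


def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `doc` for `query`, using a smoothed IDF (an assumption)."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    freqs = Counter(doc)
    score = 0.0
    for term in query:
        df = document_frequency(term, docs)
        idf = math.log((len(docs) - df + 0.5) / (df + 0.5) + 1.0)
        f = freqs[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


def best_match(query, docs):
    """Index of the document with the highest BM25 score for the query."""
    scores = [bm25_score(query, doc, docs) for doc in docs]
    return max(range(len(docs)), key=scores.__getitem__)
```

For the query `["bamboo", "shoot", "soup"]`, `best_match` returns the index of the second toy recipe, which is the only one containing all three query terms.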
(3) Levenshtein
The Russian scientist Vladimir Levenshtein first proposed this algorithm in 1965 [24]. The basic form of the Levenshtein distance is computed using a recursive algorithm, where a threshold can be set as an upper limit for the number of edit steps. The Levenshtein distance formula is shown in Equation (4).

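$$\mathrm{lev}_{a,b}(i, j) = \begin{cases} \max(i, j) & \text{if } \min(i, j) = 0, \\ \min \begin{cases} \mathrm{lev}_{a,b}(i-1, j) + 1 \\ \mathrm{lev}_{a,b}(i, j-1) + 1 \\ \mathrm{lev}_{a,b}(i-1, j-1) + 1_{(a_i \neq b_j)} \end{cases} & \text{otherwise.} \end{cases} \tag{4}$$
Equation (4) Levenshtein Distance Formula. Here, a and b are the two strings being compared, i and j are the lengths of their prefixes, and 1_(a_i ≠ b_j) equals 1 when the characters a_i and b_j differ and 0 otherwise.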
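As a minimal sketch of the Levenshtein distance, the recursion can be evaluated bottom-up with dynamic programming, keeping only two rows of the table. The function name is hypothetical, and the optional step threshold mentioned above is omitted for brevity.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3, and the dish names "排骨湯" and "竹筍排骨湯" are two insertions apart.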
(4) Jaccard
The Jaccard similarity coefficient and the Jaccard distance can be derived from the intersection and union of two samples for different applications [25]. Jaccard's coefficient gives the degree of similarity as the ratio between the size of the intersection of two finite sample sets and the size of their union. The Jaccard index formula is shown in Equation (5).
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{5}$$
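A minimal sketch of the Jaccard coefficient on character sets follows. Treating each dish name as a set of characters is an assumption made for illustration (the study may tokenize differently), and returning 1.0 for two empty inputs is a convention chosen here, not something stated in the source.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union of two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention for two empty inputs (assumption)
    return len(set_a & set_b) / len(union)


def jaccard_distance(a, b):
    """Complement of the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)
```

For "排骨湯" and "竹筍排骨湯", the character sets share three of five distinct characters, giving a similarity of 0.6.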
(5) Synonyms
Synonyms is an open-source Python package for natural language tasks, maintained by Chatopera. It supports a variety of NLP tasks, such as text alignment, recommendation algorithms, similarity calculation, semantic shifting, keyword extraction, concept extraction, automatic summarization, and search engines, drawing on a multisource lexical database. For the word vector conversion task, the package uses the gensim library with a word2vec model for conversion and approximates the vector distance of words with a smooth gradient descent algorithm [26].

Step 2. Common Voting Mechanism
In this study, the same approximation task was assigned to each of the five algorithms described above. After each algorithm returned its best-matching dish, the votes were pooled and the dish with the most votes was taken as the best solution. The confidence scores are not directly comparable across algorithms (the Levenshtein distance, for example, yields a minimum number of steps rather than a confidence score), because each algorithm's scores are only meaningful for comparisons within that algorithm. For this reason, instead of averaging the similarity scores for the same item across algorithms, only the top-scoring candidate of each algorithm was used as its vote. In the final vote count, the votes of all algorithms carried equal weight, which makes the result a fair majority decision.
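The voting step can be sketched as follows. This is an illustrative sketch, not the study's code: the function name and the example votes are hypothetical, and the tie-breaking rule (the first candidate in insertion order that reaches the top count) is an assumption, since the text does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(top_choices):
    """Pick the recipe named most often among the algorithms' top choices.

    top_choices maps an algorithm name to that algorithm's best-matching recipe.
    Every algorithm contributes exactly one equally weighted vote.
    """
    counts = Counter(top_choices.values())
    best_count = max(counts.values())
    # Assumed tie-break: the first candidate (in insertion order) with the top count.
    for recipe in top_choices.values():
        if counts[recipe] == best_count:
            return recipe, best_count


# Hypothetical top choices from the five algorithms.
example_votes = {
    "bm25": "bamboo shoot and pork rib soup",
    "tfidf": "bamboo shoot and pork rib soup",
    "levenshtein": "pork rib soup",
    "jaccard": "bamboo shoot and pork rib soup",
    "synonyms": "pork rib soup",
}
```

Because only each algorithm's top candidate is counted, the incomparable score scales of the individual algorithms never enter the decision.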

Step 3. Nutritional Ingredient Analysis
The recipe data were obtained through a fuzzy analysis of the artificial intelligence model, and the nutritional ingredient analysis automatically determined all the ingredients in the dish. Finally, this study consolidated all the nutrients by means of portion calculation to complete the nutrient analysis. The dietary information conversion process is shown in Figure 2.
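The portion calculation can be illustrated with a short sketch. It assumes that nutrient values are stored per 100 g of ingredient and that intake scales linearly with ingredient weight and number of servings; the ingredient names, nutrient keys, and numbers are placeholders, not values from the HPA database.

```python
def recipe_nutrients(ingredient_grams, nutrient_table, servings=1.0):
    """Sum nutrients over all ingredients of a recipe, scaled by weight and servings.

    ingredient_grams: ingredient name -> grams per serving of the recipe.
    nutrient_table:   ingredient name -> {nutrient name: amount per 100 g} (assumed layout).
    """
    totals = {}
    for name, grams in ingredient_grams.items():
        for nutrient, per_100g in nutrient_table[name].items():
            amount = per_100g * grams / 100.0 * servings
            totals[nutrient] = totals.get(nutrient, 0.0) + amount
    return totals


# Placeholder data for illustration only.
NUTRIENT_TABLE = {
    "ingredient_a": {"energy_kcal": 100.0, "protein_g": 10.0},
    "ingredient_b": {"energy_kcal": 20.0, "protein_g": 2.0},
}
RECIPE = {"ingredient_a": 50.0, "ingredient_b": 200.0}
```

With these placeholder values, two servings of the recipe give 180 kcal and 18 g of protein; the same summation would extend to all 24 nutrients.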

Results
This study developed two models to form a precision nutrient analysis model. The first model is a Digitized Data Semantic Analysis Model for dish analysis and portion size determination. The second model is a Nutrient Analysis Model that uses five different algorithms and a common voting process to find the best-matching recipe and then analyzes the ingredients and nutrients of the dish. The final outputs from both models are used to calculate the intake of 24 common nutrients. The recipe database contains 1590 recipes and nutrient information for 7869 ingredients. The operational framework of the model is shown in Figure 3.

Operation Example
An example of a dietary recall record for precise nutritional analysis is as follows:
1. Input the dietary record to the model. Dietary Record: "Today I had a plate of cabbage with pork fat and a bowl of bamboo shoots and pork ribs soup."
2. Dish and portion size analysis by the Digitized Data Semantic Analysis Model.
3. Nutrition intake calculation by the Precision Nutrient Analysis Model. Based on the result of Step 2, 24 nutrients were calculated for each ingredient, and the precision nutrient analysis results were calculated as the sum of all nutrients.

Model Accuracy Verification
The accuracy of the model was analyzed using data from the Nutrition Survey. In this study, the 2013-2016 National Nutrition Health Status Change Survey (NNHS) was used for analysis. The NNHS was initiated by the HPA, MoHW, and is conducted in a four-year cycle that takes into account county and city distribution as well as seasonal effects. The collected data were used as a reference for the formulation of national nutrition and health-related policies in Taiwan.
The aim of the survey is to understand the nutrition, health, diet, and lifestyle of the Taiwanese people and their relevance, in order to establish a long-term, stable, and nationally representative nutrition and health surveillance mechanism. The results can be used as a basis for government policies regarding diet and nutrition and health promotion and disease prevention and can help improve the health status of the population and prevent possible future health problems.
The NNHS uses a multistage stratified cluster sampling design. The sample covers all age groups, excluding pregnant and breastfeeding women, people without self-awareness, and institutional care residents, and the overall sample is representative of the Taiwanese population. The nutrition data were stored as 24 h dietary recall records and analyzed by a professional nutritionist.

Data Resource
The "2013-2016 National Survey of Changes in Nutritional Health Status" was used to validate the accuracy of the model. The data contain a "24-h dietary recall nutrient intake sum analysis file" and a "24-h dietary recall food weight and nutrient ingredient file" with the information of 24 nutrients.

Validation Process
(1) Input the data from the "2013-2016 National Survey of Changes in Nutritional Health Status" into the digitized data semantic analysis model;
(2) Analyze the dishes, portion sizes, and the ingredients in the dishes with the model;
(3) Analyze nutrient intake using the AI Precision Nutrient Analysis Model;
(4) Compare the results against the "24-h dietary recall nutrient intake sum analysis file" and the "24-h dietary recall food weight and nutrient ingredient file";
(5) Evaluate the accuracy of the model.

Analysis Result
The results of the nutrition survey team analysis (from the 24 h dietary recall nutrient intake sum analysis file) were used as the gold standard, while the results of this study's model analysis were used as the comparison group for the nutrient difference ratio analysis. A total of 2602 data entries were analyzed for total nutrient intake, with 24 different nutrients analyzed for each data entry. The differences between the NNHS analysis and the results of this study are shown in Tables 1-3. For 13 nutrients, more than 95% of the data (2472 data entries) had an intake error of <5%; for 3 nutrients, 90-94% of the data had an intake error of <5%; and for 5 nutrients, 80-89.99% of the data had an intake error of <5%. For Vitamin E (alpha-TE), more than 95% of the data (2472 data entries) had an intake error of <10%, and for Total Sugars and Vitamin D (D2 + D3), 70% of the data had an intake error of <10%.
The results of the nutrition survey team analysis (from the 24 h dietary recall food weight and nutrient ingredient file) were used as the gold standard, while the results of this study's model analysis were used as the comparison group for the nutrient difference ratio analysis. A total of 113,824 data entries were analyzed for food weight and nutrient ingredients, with 24 different nutrients analyzed for each data entry. The differences between the NNHS analysis and the results of this study are shown in Tables 4-6. For 3 nutrients, more than 95% of the data had an intake error of <2%; for 9 nutrients, 90-94% of the data had an intake error of <2%; and for 12 nutrients, 80-89.99% of the data had an intake error of <2%.
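The kind of difference ratio analysis reported above can be sketched as follows. The exact error definition used in this study is not stated, so the sketch assumes a relative error with the survey value as the reference; the function names and the sample numbers are hypothetical.

```python
def relative_error(model_value, reference_value):
    """Absolute difference relative to the reference (gold standard) value."""
    if reference_value == 0:
        # Assumed convention: no error if both are zero, otherwise unbounded.
        return 0.0 if model_value == 0 else float("inf")
    return abs(model_value - reference_value) / abs(reference_value)


def share_within(model_values, reference_values, threshold):
    """Fraction of entries whose relative error is strictly below the threshold."""
    errors = [relative_error(m, r) for m, r in zip(model_values, reference_values)]
    return sum(1 for e in errors if e < threshold) / len(errors)


# Made-up intakes of one nutrient for four records (illustration only).
reference = [100.0, 200.0, 50.0, 80.0]
model = [101.0, 192.0, 50.0, 100.0]
```

Here three of the four records differ from the reference by less than 5%, so `share_within(model, reference, 0.05)` is 0.75; applying the same calculation per nutrient yields the kind of proportions summarized in the tables.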

Discussion
Each 24 h dietary recall nutrition survey in this study took approximately 40 min. The volume and complexity of the survey data and the variation in the ability to self-assess and recall portions can lead to individual subjective differences [13]. Similarly, the researchers or the methods used to collect dietary data may be biased [18].
Therefore, this study balanced the accuracy of nutrient intake analysis by compensating for errors through fuzzy analysis and artificial intelligence. Conventional FFQs are primarily designed to assess total nutrient intake or changes in intake over time [27-29]; however, the FFQ limits the range of foods that can be investigated because it groups foods and beverages together, so determining the exact amount of nutrients is less precise than with other, more detailed methods. It is also not possible to accurately measure absolute intakes of different food components. Moreover, FFQs require literacy and the physical ability to complete the questionnaire, and the FFQ survey can be burdensome for subjects and difficult or confusing to complete due to poor descriptions or difficult-to-understand questions. The most commonly used methods in nutrition research are the Diet Record, 24HR, and FFQ.
The Food Record is also used as the gold standard in validation studies [30]. Given the contingent nature of respondents' food choices, it is the best method for investigating a variety of food and beverage combinations [31] and nutrient supplementation [32]. In order to reduce the burden on surveyors, the artificial intelligence model in this study was shown, after data discrepancy comparisons, to be a feasible strategy for large-scale nutritional surveys.
When comparing the difference between our model and the data analyzed in the actual nutrition survey, it was found that the results for the "24-h dietary recall food weight and nutrient ingredient" file were highly accurate, with less than 2% discrepancy in analysis for almost all nutrients. This result shows that the nutrients of the ingredient data in our model are correct. In the 24 h dietary recall nutrient intake sum analysis, the model was used to conduct an artificial intelligence analysis of the dishes, meaning it conducted an automated analysis of the components and servings to estimate nutrient intake. The results show a margin of error of less than 10%, thus confirming the high accuracy of the model in this study.

Conclusions
This study proposed an Intelligence Precision Nutrient Analysis Model based on a digital data collection framework, where the nutrient intake was analyzed by entering dietary recall data. The AI Precision Nutrient Analysis Model was used to analyze the ingredients of the dishes and calculate nutrient intake automatically, and portion sizes were analyzed using a digital data semantic analysis model. The results of this study show very little difference in nutrient intake between the model and the NNHS analysis and are highly accurate; therefore, the AI model can be used as a reference for nutrition surveys and personal nutrition analysis. In terms of data access, there is not yet a complete set of publicly available data on food nutrient ingredients, so more complete data and references on micronutrients should be made available in the future. In addition, the scope of recipes should be expanded.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.