Intake Differences between Subsequent 24-h Dietary Recalls Create Significant Reporting Bias in Adults with Obesity

In-depth understanding of the dietary patterns of individuals with obesity is needed in practice and research to support dietitians and physicians in the design and implementation of nutritional management. We aimed to analyze the consistency of reported energy, macronutrient, and micronutrient intakes across four non-consecutive 24-h dietary recalls from 388 adults with obesity, using information collected in the NutriGen Study (ClinicalTrials.gov, NCT02837367). Significant decreases in reported energy and several macro- and micronutrient intakes were identified between the first and subsequent 24-h recalls. Sensitivity analyses also identified significant differences in reported intakes, suggesting that the first recall (also the only one performed on site, face-to-face) might be a point of bias. After adjustment for false discovery rate, differences in intakes between weekend days and weekdays were not statistically significant in men, in women, or in the total sample. To overcome this potential bias, studies should be carefully conducted from the design phase through the analysis and interpretation phases. Prior to averaging specific intakes across all reporting sessions, a preliminary analysis must be conducted to identify whether a certain time point differs significantly from all other time points and to review potential sources of bias: reporting bias, training bias, or behavioral changes could be responsible for such differences.


Introduction
Twenty-four-hour dietary recalls provide estimates of food intake, which are necessary for studies that link nutrient intake to diseases or other health-related outcomes. The multiple-pass 24-h recall method has been shown to be reliable for estimating nutritional intakes in individuals with obesity [1,2]. Usually, several 24-h recalls are needed to capture intraindividual variation in intake. The recommended number of 24-h recalls varies with the outcome of the study, from a minimum of two (for the comparison of protein and potassium intakes between European countries [3]) to 10-15 days when assessing comprehensive diets across a six-month period [4]. Jackson et al. [5] suggested a maximum of eight 24-h recalls in an overweight and obese population in order to reduce random errors.
Several authors have emphasized that simply averaging several 24-h dietary recalls does not account for systematic errors; it only accounts for random errors in this type of assessment. Therefore, some known sources of bias in intake measurement, such as gender, age, body mass index (BMI), day of the week, or rural or urban location, can be accounted for through a carefully planned study design [6,7].
Underreporting of food intake has been acknowledged as a source of bias and has been associated with higher BMI or body fat percentage, female gender, and social desirability [8][9][10]. Although dietary restraint has been related to lower reported energy and fat intakes, it did not modify the accuracy of the method [11]. Several studies support the idea that attentional bias toward food cues is increased in individuals with obesity [12]. Interventional studies have shown that this attentional bias in individuals with obesity can be altered in both directions [13,14].
Little is known about whether the order of subsequent recalls is a specific factor to account for when assessing bias, or whether the order of recalls could induce behavioral changes in the interviewed individuals through alterations of their attentional bias. The purpose of this study was to evaluate the consistency of reported energy, macronutrient, and micronutrient intakes across a series of four 24-h recalls in adults with obesity.

Recruitment of Subjects
Data represent the estimated food intake collected, within a broader scope, in the NutriGen Study (ClinicalTrials.gov NCT02837367), performed in Timisoara, Romania. For the current study, 197 men and 212 women were selected from the original cohort of 204 men and 217 women. The inclusion criteria for the NutriGen Study were: adults (age 18-70 years) with obesity (BMI ≥ 30 kg/m²), abdominal circumference ≥84 cm in women and ≥90 cm in men, dyslipidemia with total serum cholesterol ≥200 mg/dL, HDLc ≤50 mg/dL in women and ≤40 mg/dL in men, serum triglycerides ≥150 mg/dL or treatment for dyslipidemia (e.g., statins, fibrates, omega-3 fatty acids, cholestyramine, ezetimibe) or for type 2 diabetes. Exclusion criteria were a prior diagnosis of cancer, autoimmune disease, psychiatric or blood coagulation disorders, and a history of drug or alcohol abuse, as presented elsewhere [15]. The participants were under medical surveillance and treatment for these conditions at the time of recruitment, which took place from September 2016 to December 2018.

Dietary Intake Assessment
The interviewers involved in the dietary assessment procedures received specialized training from one experienced person to ensure accurate data collection and entry. Trained physicians and medical students collected the estimates of dietary intake using a 5-pass 24-h dietary recall method [1,2]. The assessment covered food intake on the previous day, recording the type/composition/brand, quantity (either in international units or estimated using usual household measures), and time of day for each food or drink. This information was recorded on paper using a pre-established template and standard operating procedure. A total of 1587 24-h dietary recalls were collected, representing up to four days of food intake assessment for each participant. Except for the first 24-h recall, which was performed face-to-face during the baseline visit, the 24-h recalls were performed by telephone at various intervals during the study, without prior announcement. For the purpose of this study, the 388 adults who had complete sets of four 24-h recalls were included. The median timeframe for the included recalls was 62 days, with an interquartile range (IQR) of 56.5 days. Of the four dietary recalls, three were performed on weekdays (Monday to Friday) and one on a weekend day (Saturday or Sunday). Food intake estimates obtained from the recalls were converted into nutrient intakes using Nutritioapp (https://nutritioapp.com, accessed 22 September 2021) [16], a web application using data from the USDA Food and Nutrient Database for Dietary Studies and from European and Romanian databases, as previously described [15]. Subjects were on ad libitum diets during the recording of all 24-h recalls.
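The timeframe statistic above can be reproduced with a short Python sketch. It assumes that a participant's timeframe is the number of days between their first and last recall, and it uses `statistics.quantiles(..., method="exclusive")` for the quartiles; both choices are assumptions (statistical packages differ in how they interpolate percentiles), and the dates and spans shown are hypothetical, not study data.

```python
from datetime import date
from statistics import median, quantiles


def recall_span_days(recall_dates):
    """Days between the earliest and latest recall of one participant (assumed definition)."""
    return (max(recall_dates) - min(recall_dates)).days


def median_and_iqr(values):
    """Median and interquartile range (Q3 - Q1) of a list of spans."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return median(values), q3 - q1


# Hypothetical recall dates for one participant (not study data).
participant = [date(2017, 3, 1), date(2017, 3, 15), date(2017, 4, 20), date(2017, 5, 2)]
span = recall_span_days(participant)  # 62 days

# Hypothetical spans (days) for five participants.
med, iqr = median_and_iqr([30, 50, 70, 90, 110])  # median 70, IQR 60.0
```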

Anthropometry, Biochemistry, and Diagnosis of Chronic Diseases
During the baseline visit, weight, height, and abdominal circumference were measured according to international standards, and blood samples were collected in sterile EDTA vacutainers in the morning, after an overnight fast of at least 8 h. Among other biochemical assessments, total cholesterol (TC) and triglycerides (Tg) were determined following the manufacturer's protocols, as previously described [17]. Total serum lipids (TSL) were estimated using Philips's formula, TSL (g/L) = 2.27 × TC + Tg + 0.623, with TC and Tg expressed in g/L [18]. Diagnoses of associated chronic diseases had been previously established by specialist physicians and were recorded from medical documents.
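As a minimal sketch, assuming TC and Tg are first converted to g/L (the unit in which the formula and its 0.623 constant are expressed), the estimate can be computed as follows. The helper names are hypothetical, and the example uses the inclusion cut-offs listed earlier rather than participant data.

```python
def mg_dl_to_g_l(value_mg_dl: float) -> float:
    """Convert a serum concentration from mg/dL to g/L (1 mg/dL = 0.01 g/L)."""
    return value_mg_dl / 100.0


def total_serum_lipids_g_l(tc_g_l: float, tg_g_l: float) -> float:
    """TSL (g/L) = 2.27 x TC + Tg + 0.623, with TC and Tg in g/L."""
    return 2.27 * tc_g_l + tg_g_l + 0.623


# Example at the NutriGen inclusion cut-offs: TC 200 mg/dL and Tg 150 mg/dL.
tsl = total_serum_lipids_g_l(mg_dl_to_g_l(200), mg_dl_to_g_l(150))  # about 6.663 g/L
```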

Data Analysis
Data analysis was performed using IBM SPSS software, version 25 (IBM, Armonk, NY, USA). The Kolmogorov-Smirnov test was used to test for normality. Variables were expressed as mean and standard deviation or as median and interquartile range. The Mann-Whitney test was used to compare two-level factors. Proportions of participants with associated chronic diseases were compared using the chi-square test. Proportions of weekend days and weekdays in each 24-h recall set were compared using the chi-square test, and statistical significance was adjusted using the Bonferroni method. For each participant, the three 24-h recalls performed on weekdays were also averaged and used in the analysis. The average nutrient intakes from the four ordered 24-h recalls and the average intakes from three 24-h recalls (obtained by excluding the first, second, third, or fourth ordered 24-h recall) were also used in the analyses. Repeated-measures comparisons were performed separately for each nutrient category (energy, macronutrient, and micronutrient intakes) in the following six models: (1) the four ordered 24-h recalls; (2) weekend versus mean weekday intakes; (3) the mean of four 24-h recalls versus the mean of three 24-h recalls after exclusion of the first ordered recall; (4) the same comparison after exclusion of the second ordered recall; (5) the same comparison after exclusion of the third ordered recall; and (6) the same comparison after exclusion of the fourth ordered recall. The general linear model and the Wilcoxon signed-rank test were used for comparisons.
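The averaged intakes described above can be derived per participant and per nutrient as in the following Python sketch. The function name, the dictionary keys, and the example intakes are hypothetical; the sketch only assumes that each participant has exactly four ordered recalls, one of which fell on a weekend.

```python
from statistics import mean


def derived_means(recalls, weekend_index):
    """Derive the averaged intakes used in the models for one participant and one nutrient.

    recalls: intakes from the four ordered 24-h recalls (recall 1 first).
    weekend_index: 0-based position of the single weekend recall.
    """
    recalls = list(recalls)
    if len(recalls) != 4:
        raise ValueError("expected four ordered 24-h recalls")
    # Mean of the three recalls left after excluding recall k (k = 0..3).
    leave_one_out = [mean(recalls[:k] + recalls[k + 1:]) for k in range(4)]
    weekdays = [v for i, v in enumerate(recalls) if i != weekend_index]
    return {
        "mean_all": mean(recalls),
        "leave_one_out": leave_one_out,
        "weekend": recalls[weekend_index],
        "weekday_mean": mean(weekdays),
    }


# Hypothetical energy intakes (kcal) for one participant; recall 4 fell on a weekend.
example = derived_means([2400, 1900, 2100, 2000], weekend_index=3)
```

Because each leave-one-out mean equals (4 × mean_all - x_k)/3, its difference from the four-recall mean is (mean_all - x_k)/3. Excluding a recall that lies above a participant's overall mean therefore lowers the average, and excluding one below it raises the average, so the direction of change in the exclusion models mirrors whether the excluded recall tended to be above or below the other recalls.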
For the general linear model, pairwise comparisons between the four ordered 24-h recalls were adjusted using the Sidak method. All results were also adjusted for false discovery rate (FDR) using an online tool (https://tools.carbocation.com/FDR, accessed 15 March 2020) [19], which adjusted the p-values for the number of comparisons per research hypothesis. After these adjustments, p-values < 0.05 were considered statistically significant. For effect size interpretation, r was calculated and interpreted according to Cohen's criteria: r = 0.10-0.30 indicated a small effect, r = 0.30-0.50 a medium effect, and r > 0.50 a large effect [20].
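The adjustments and effect size described above can be sketched in plain Python. We assume that the FDR correction is the Benjamini-Hochberg step-up procedure, that the Sidak adjustment uses the standard formula 1 - (1 - p)^m with m = 6 pairwise comparisons among four recalls, and that r is computed as |Z|/√N with N the number of participants; these are common conventions rather than details confirmed by the text, and the function names and p-values are illustrative.

```python
import math
from itertools import combinations


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sidak(p, m):
    """Sidak-adjusted p-value for one of m comparisons."""
    return 1.0 - (1.0 - p) ** m


def effect_size_r(z, n):
    """r = |Z| / sqrt(N) for a Wilcoxon signed-rank statistic (assumed convention)."""
    return abs(z) / math.sqrt(n)


def cohen_label(r):
    """Cohen's thresholds as cited in the text: 0.10-0.30 small, 0.30-0.50 medium, >0.50 large."""
    if r > 0.50:
        return "large"
    if r >= 0.30:
        return "medium"
    if r >= 0.10:
        return "small"
    return "below small"


n_pairs = len(list(combinations(range(1, 5), 2)))  # 6 pairwise comparisons of four recalls
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])  # hypothetical p-values
```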

Results
The baseline measurements are presented in Table 1. For each variable, differences between the ordered recalls were assessed for statistical significance using general linear models (Table 2). After FDR correction, significant differences between 24-h recalls were identified for energy, carbohydrates, vitamin C, calcium, fiber, folates, potassium, and total sugar. The estimated intakes of vitamin C, folates, fiber, and total sugar differed between recalls 1 and 4, while all other differences were between recalls 1 and 2. Figure 1 illustrates the mean energy and macronutrient intakes for each recall, and Figure 2 shows the mean values of the micronutrient intakes that differed significantly between recalls. Sex was considered as a between-subjects factor. None of the interactions between sex and time was statistically significant (p > 0.05).
Table 2. Mean intakes of energy and of 36 macro- and micronutrients from 24-h dietary recalls (n = 388 individuals with obesity). Means were calculated using a general linear model (repeated-measures design) with gender as a between-subjects factor and Sidak adjustment. Values in bold are statistically significantly lower than the first evaluation after false discovery rate adjustment.

Model 2: Weekend vs. Weekday Intakes
We tested whether weekend days were balanced across the ordered recalls, in order to identify a potential bias that could be associated with the differences identified above. Weekend days represented 12.0% of the first ordered 24-h recall, 11.3% of the second, 10.6% of the third, and 66.0% of the fourth. After Bonferroni correction (significance threshold p < 0.0083), the fourth ordered 24-h recall had a significantly higher proportion of weekend days than the previous recalls, while no significant differences were observed between the first, second, and third recalls.
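The threshold of 0.0083 corresponds to 0.05 divided by the six pairwise comparisons among four recalls. A sketch of one such pairwise comparison is shown below in Python, using a Pearson chi-square test on a 2 × 2 table without continuity correction (whether the study applied a correction is not stated, so this is an assumption). The counts are hypothetical and only loosely resemble the reported percentages.

```python
import math


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction) comparing proportions x1/n1 and x2/n2.

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    total = n1 + n2
    col_totals = [x1 + x2, total - x1 - x2]
    if 0 in col_totals:
        raise ValueError("a column of the 2x2 table is empty")
    stat = 0.0
    for row, row_total in zip(table, (n1, n2)):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


BONFERRONI_ALPHA = 0.05 / 6  # six pairwise comparisons among four recalls, about 0.0083

# Hypothetical weekend counts out of 388 recalls (illustrative, not the study's raw counts).
_, p_similar = chi_square_2x2(47, 388, 44, 388)
_, p_different = chi_square_2x2(47, 388, 256, 388)
```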
General linear models compared energy and nutrient intakes on the weekend day with the mean of the three weekday intakes. Significant interactions were observed between sex and type of day, and the analysis presented in Table 3 was therefore stratified by sex. Table 3 presents the mean energy and nutrient intakes with their 95% confidence intervals. Differences in intakes between weekend and weekdays within each sex were compared with the Wilcoxon signed-rank test and, after FDR adjustment, were not statistically significant in men, in women, or in the total sample.

Wilcoxon signed-rank test was used for weekend-weekday comparisons within each gender, with false discovery rate adjustment; ALA-alpha-linolenic acid, EPA-eicosapentaenoic acid, DHA-docosahexaenoic acid, LA-linoleic acid.

Model 3: Exclusion of the First 24-h Recall
In this model, the mean of the recalls excluding the first 24-h recall was compared with the mean of all four 24-h recalls. Table 4 presents the effect size of the differences between these two means. Exclusion of the first 24-h recall resulted in an increase in the mean of several nutrients, statistically significant after FDR correction for fiber, folates, and potassium, and in a decrease in the mean of vitamin B12, EPA (eicosapentaenoic acid), DHA (docosahexaenoic acid), and ALA (alpha-linolenic acid), also significant after FDR correction (Table 4).
Table 4. Effect size and statistical significance of the mean differences between four 24-h recalls and three 24-h recalls (after exclusion of each of the four 24-h recalls) (N = 388 individuals with obesity). Wilcoxon signed-rank test; r = effect size. Significant differences after false discovery rate correction are marked in bold; * negative differences; ** positive differences; ALA-alpha-linolenic acid, EPA-eicosapentaenoic acid, DHA-docosahexaenoic acid, LA-linoleic acid.
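The paired comparisons summarized in Table 4 can be approximated outside SPSS with the large-sample (normal) form of the Wilcoxon signed-rank test, sketched below in Python. The sketch drops zero differences, assigns average ranks to tied absolute differences with the matching variance correction, omits a continuity correction, and reports r = |Z|/√N with N taken as the number of participants; these conventions are assumptions, and SPSS's exact handling may differ slightly. Inputs would be, for example, each participant's four-recall mean and three-recall mean for one nutrient.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Normal-approximation Wilcoxon signed-rank test for paired samples x and y.

    Positive z means y tends to exceed x. Returns (z, two-sided p, r),
    with r = |z| / sqrt(number of pairs). Zero differences are dropped.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired samples of equal length")
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j + 2) / 2  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, abs(z) / math.sqrt(len(x))
```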

Model 4: Exclusion of the Second 24-h Recall
Because the second ordered 24-h recall showed differences in reported energy and carbohydrate intakes compared with the first, we evaluated whether omitting the second ordered 24-h recall would produce significant changes relative to the means from all four recalls. Table 4 gives the effect size and statistical significance of the differences between the mean intakes of the complete set of recalls and those of the recalls from which the second 24-h recall was excluded. Excluding the second ordered recall increased the mean of the 24-h recalls, significantly for energy, protein, fat, carbohydrates, vitamin C, vitamin D, iron, calcium, magnesium, total water, fiber, vitamin K, vitamin B12, folates, betaine, choline, copper, phosphorus, manganese, potassium, sodium, EPA, DHA, total sugars, total saturated fatty acids, LA (linoleic acid), and ALA (Table 4).

Model 5: Exclusion of the Third 24-h Recall
In this model, the mean of the recalls excluding the third 24-h recall was compared with the mean of all four 24-h recalls. Excluding the third 24-h recall increased the mean of several nutrients, statistically significant after FDR correction for vitamin C, vitamin K, betaine, EPA, and DHA (Table 4).

Model 6: Exclusion of the Fourth 24-h Recall
Excluding the fourth 24-h recall increased the means of several nutrients, statistically significant after FDR correction for vitamin C, vitamin D, vitamin A, iron, calcium, magnesium, fiber, vitamin K, thiamine, riboflavin, folates, vitamin B12, pantothenic acid, betaine, copper, fluoride, phosphorus, manganese, zinc, EPA, DHA, total sugars, vitamin E, and ALA (Table 4).

Misreporting in Food and Drink Consumption
To the best of our knowledge, this is the only study that has investigated the consistency of estimated energy, macronutrient, and micronutrient intakes across four non-consecutive 24-h recalls in Romanian adults with obesity, and one of the few studies worldwide to explore such consistency. Several aspects were considered in interpreting these results. Misreporting in both directions (over- and under-reporting) has previously been reported among people with obesity, with underreporting being more prevalent [21]. Underreporting in 24-h recalls, estimated using recovery biomarkers, has been estimated at 12% to 23%, much lower than in other types of dietary assessment [22][23][24][25]. Random error in dietary assessment reflects day-to-day variability in food intake patterns and can be accounted for by averaging intakes from several 24-h recalls [6,26,27]. Therefore, the decreases observed in the second and fourth 24-h recalls were tested for consistency in a sensitivity analysis (models 3-6, Table 4). Systematic error in dietary assessment comprises measurements that are shifted in the same direction away from the true value. Systematic errors have been shown to affect 24-h recalls to a lesser extent, and they can be accounted for by collecting and controlling for known potential confounders such as day of the week, gender, age, and BMI status [6,26,27].
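A simple additive measurement-error model, assumed here for illustration and not fitted in this study, makes the distinction between random and systematic error explicit. Let R_ik be the intake reported by participant i in the k-th ordered recall, T_i the participant's usual (true) intake, β_k a systematic, order-specific bias, and ε_ik an independent day-to-day random error with variance σ_w².

```latex
\begin{aligned}
R_{ik} &= T_i + \beta_k + \varepsilon_{ik}, \qquad \mathbb{E}[\varepsilon_{ik}] = 0,\ \operatorname{Var}(\varepsilon_{ik}) = \sigma_w^2,\\
\bar{R}_i &= \frac{1}{n}\sum_{k=1}^{n} R_{ik} = T_i + \frac{1}{n}\sum_{k=1}^{n}\beta_k + \frac{1}{n}\sum_{k=1}^{n}\varepsilon_{ik},\\
\mathbb{E}[\bar{R}_i - T_i] &= \frac{1}{n}\sum_{k=1}^{n}\beta_k, \qquad \operatorname{Var}(\bar{R}_i \mid T_i) = \frac{\sigma_w^2}{n}.
\end{aligned}
```

Averaging n recalls shrinks the random component by a factor of n but leaves the bias term unchanged. Under this model, a bias confined to a single recall (for example β_1 = δ and β_k = 0 otherwise) still shifts the four-recall mean by δ/4, however many participants are averaged.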

Controlling for Systematic Bias across Multiple Days
A decrease in reported intakes across multiple days of assessment has been reported by others using food diaries [28,29]; however, to our knowledge, no studies have reported the same changes in individuals with obesity using 24-h recalls. In our dataset, the energy and carbohydrate intakes (for which reporting decreased after the first recall) and the fat and protein intakes (which were similar across recalls) support the results of Arab et al. over eight days of evaluation [4,28] and of Whybrow [29], who reported that the decrease in energy over four days of food diaries was greater in individuals with obesity than in lean individuals. The decreased reporting across a series of 24-h recalls could be explained by subjects becoming aware of their intakes while declaring them, known as training bias (the "big brother" effect). However, the same findings could also be due to reporting fatigue, or could reflect genuine changes in eating habits, induced by subjects becoming more aware of their diets and of the importance of healthy eating, an influence that then subsides during subsequent recalls (third and fourth).

Controlling for Systematic Bias across Weekday versus Weekend 24-h Dietary Recalls
The origin of the differences in reported energy and carbohydrate intakes specific to the second and fourth recalls, respectively, is not clear. Such differences could reflect true variation in intake. However, the second to fourth 24-h recalls were performed for each individual on different days of the week, with no obvious reason to expect differences arising from a different distribution of weekends and weekdays. Our analysis indicated that these differences could not be explained by the distribution of weekend days in the first two recalls; however, the fourth recall had a significantly higher proportion of weekend days than the other recalls. Furthermore, we did not identify differences in nutrient intakes between weekend days and weekdays (Table 3). Similarly, studies performed in other countries reported that the day of the week had little impact on the variance of reported values [4,28]. Gibson et al. recently discussed many causes of misreporting and measurement error in self-administered 24-h recalls [6]. In this study, efforts were made to minimize such bias through specialized training delivered by one experienced tutor. Nevertheless, the first recall was performed during a face-to-face visit, while the others were performed by telephone. The validity and relative reliability of face-to-face and telephone 24-h recalls were found to be similar in several older studies [30][31][32][33], but more recently, Brassard et al. [34] observed higher reported intakes with a web-based method than with face-to-face interviews. Several publications have reported greater attentional bias toward food cues in individuals with overweight and obesity than in normal-weight individuals.
Most results are consistent with the possibility that attentional bias can be attenuated by several types of interventions, including the modified food Stroop task, visual dot-probe tasks, the attentional network task, one-back visual recognition tasks, and even passive picture presentation [12]. In our sample, a reduction in endogenous activation by food cues after the enrolment visit and the first 24-h recall, leading to reduced food intake after this time point for an unknown duration, might be another explanation for our findings.
Although weekend/weekday discrepancies have been considered in previous studies of children [35,36], young women [37], middle-aged women [38], and the general population [39], to the best of our knowledge, this assessment has not previously been conducted in people with obesity in relation to the order of recalls. Several publications agree that weekend diets are of lower quality for both men and women [40][41][42]. The trend observed in men (Table 3) was similar to previously published results, suggesting that weekend intakes were higher than weekday intakes, although not necessarily reaching statistical significance [38][39][40]. For women, no difference was observed between weekdays and weekends, in contrast to results published in other studies, possibly suggesting cultural differences [38,39].

Controlling for Systematic Bias across First Time versus Repeated and Face-to-Face versus Phone Call 24-h Dietary Recalls
The differences reported in this study (Table 2 and Figure 1) between the first and second recalls and between the first and fourth recalls suggested a possible bias that required in-depth investigation. The exclusion of the first recall led to a decrease in the reported means of several nutrients, but not of energy or macronutrients, compared with the averages across all four recalls, with small effect sizes. The separate exclusion of the second and of the fourth recall led to an increase in the mean reported energy and many nutrient intakes compared with the averages across all four recalls, with small or small-to-medium effect sizes. In this context, three issues were considered: (1) If a systematic bias exists in the reported values of one specific ordered recall, and this bias cannot be attributed to other known factors, should this recall be included in further analyses? (2) If this specific recall is eliminated from the calculations, how does this change the reported energy and nutrient intakes used in subsequent analyses? Our study suggests that excluding the recall identified with order-specific bias would significantly change the reported intakes, which could create further methodological issues when the intake data are used in further analyses (Table 4). (3) Could these differences be explained by the switch between face-to-face and telephone acquisition of intake data?
This study also identified lower intakes of several micronutrients (vitamin C, calcium, folates, and potassium) and of fiber, specific to either the second or the fourth recall (Table 2 and Figure 2). Given the design of our study, it is difficult to determine whether such differences were truly systematic and specific to the order of recalls or, within the FDR limits, spurious. It is important to point out that the first recall, which had higher estimates for several variables, was the only one performed on site, face-to-face, while the other three were performed over the phone. These differences could therefore be related to the switch from face-to-face acquisition of intake data to telephone interviews. Although others have compared these methods in the general population, in women, or in rural populations, this has not yet been done in individuals with obesity [33]. Since the first recall had higher values than recalls 2 and 4 for various nutrients, the possibility of bias induced by different means of communication cannot be excluded. This potential bias points to the need to carefully consider the means of reporting and communicating with participants in a particular cultural context.

Suggestions for Improvement of Repeated 24-h Dietary Recalls Method
With regard to improving the overall quality of ordered 24-h recalls used to capture energy and nutrient intakes, and to reducing the error observed between them, our study suggests that systematic differences in reported energy and macronutrient intakes can be specific to particular recalls, with the first recall potentially yielding higher estimates for energy and some nutrients. Whether these differences reflect truly different intakes in the context of random error or reporting bias, this study indicates that repeated 24-h recalls can inherently present systematic differences between specific recalls. In our study, these differences were specifically between session 1 and sessions 2 or 4.
To overcome this potential bias, we suggest that care be taken from the design phase through the analysis phase of the study. A preliminary analysis should be conducted to identify whether a certain time point differs significantly from all other time points and to review potential sources of bias or other reasons for this finding.
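One possible way to run such a preliminary screen is sketched below in Python. It flags a session only if it differs from every other session in pairwise comparisons, which is one literal reading of the recommendation above. For brevity it swaps in the exact sign test (simpler and less powerful than the Wilcoxon signed-rank test used in this study) and a Bonferroni threshold instead of FDR adjustment; the function names and the data are hypothetical.

```python
import math
from itertools import combinations


def sign_test_p(diffs):
    """Exact two-sided sign test p-value; zero differences are discarded."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for d in nonzero if d > 0)
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1))
    return min(1.0, 2 * tail / 2 ** n)


def sessions_differing_from_all_others(intakes, alpha=0.05):
    """Return 1-based sessions whose intakes differ from every other session.

    intakes: one list per participant with that nutrient's intake in each ordered session.
    Pairwise sign tests use a Bonferroni threshold over all session pairs.
    """
    n_sessions = len(intakes[0])
    pairs = list(combinations(range(n_sessions), 2))
    threshold = alpha / len(pairs)  # 0.05 / 6 for four sessions
    significant = {
        (a, b)
        for a, b in pairs
        if sign_test_p([row[a] - row[b] for row in intakes]) < threshold
    }
    return [
        s + 1
        for s in range(n_sessions)
        if all((min(s, t), max(s, t)) in significant for t in range(n_sessions) if t != s)
    ]


# Hypothetical data: session 1 is consistently higher than sessions 2-4.
data = [[2500 + 10 * i, 2000 + (i % 3) * 5, 2000 + ((i + 1) % 3) * 5, 2000 + ((i + 2) % 3) * 5]
        for i in range(20)]
flagged = sessions_differing_from_all_others(data)  # [1]
```

A session flagged this way is only a candidate for session-specific bias; as the text notes, the reasons (mode of interview, day of the week, behavioral change) still need to be reviewed before deciding how to handle it.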
One limitation of this study was that the true intake of nutrients was not assessed and nutrient biomarkers were not available, so the cause of the differences found at the second and fourth time points could not be identified. Another limitation was that the study did not include a control sample; therefore, one cannot ascertain whether such differences were specific to individuals with obesity and associated morbidities.

Conclusions
This study found that adults with obesity reported different energy and nutrient intakes in session 1 than in sessions 2 and 4. Significant decreases in reported intakes were identified when comparing the mean intakes of all four 24-h recalls with the mean intakes obtained after excluding each of the first, second, third, and fourth 24-h recalls in turn, suggesting that the first recall (also the only one performed on site, face-to-face) could be a significant point of bias. Therefore, before further using data obtained from repeated 24-h recalls, a preliminary analysis of potential differences between time points should determine whether session-specific bias exists, possibly related to the use of different means of reporting or communication (e.g., face-to-face versus telephone reporting).