Strategies to Address Misestimation of Energy Intake Based on Self-Report Dietary Consumption in Examining Associations Between Dietary Patterns and Cancer Risk

The objective of this study was to determine the influence of strategies for handling misestimation of energy intake (EI) on observed associations between dietary patterns and cancer risk. Data from Alberta’s Tomorrow Project participants (n = 9,847 men and 16,241 women) were linked to the Alberta Cancer Registry. The revised-Goldberg method was used to characterize EI misestimation. Four strategies assessed the influence of EI misestimation: retaining individuals with EI misestimation in the cluster analysis (Inclusion), excluding them before (ExBefore) or after cluster analysis (ExAfter), or reassigning them into ExBefore clusters using the nearest neighbor method (InclusionNN). Misestimation of EI affected approximately 50% of participants. Cluster analysis identified three patterns: Healthy, Meats/Pizza and Sweets/Dairy. Cox proportional hazard regression models assessed associations between the risk of cancer and dietary patterns. Among men, no significant associations (based on an often-used threshold of p < 0.05) between dietary patterns and cancer risk were observed. In women, significant associations were observed between the Sweets/Dairy and Meats/Pizza patterns and all cancer risk in the ExBefore (HR (95% CI): 1.28 (1.04–1.58)) and InclusionNN (HR (95% CI): 1.14 (1.00–1.30)) methods, respectively. Thus, strategies to address misestimation of EI can influence associations between dietary patterns and disease outcomes. Identifying optimal approaches for addressing EI misestimation, for example, by leveraging biomarker-based studies, could improve our ability to characterize diet-disease associations.


Introduction
Cancer continues to exert a large toll on morbidity and mortality globally [1]. Cancer prevention recommendations emphasize the importance of behaviors such as tobacco cessation, physical activity, and healthy eating [2]. With regard to characterizing healthy eating, there is a growing emphasis on moving beyond single dietary components to a more holistic approach that embraces overall eating patterns [3]. The relationship between diet and disease is complex: foods and beverages are consumed in different combinations that allow for countless interactions between nutrients and other dietary 35-69 years at enrollment, with no history of cancer except non-melanoma skin cancer, were recruited by telephone-based random digit dialing which facilitated balanced recruitment across the province. Eligible participants were mailed a consent form and a Health and Lifestyle Questionnaire (HLQ), followed by a past-year FFQ (Canadian Diet History Questionnaire-I; CDHQ-I), and the Past-Year Total Physical Activity Questionnaire (PYTPAQ). Participants had the opportunity to consent to linkage with the Alberta Cancer Registry (ACR) and provided personal health numbers. All questionnaires were sent via postal mail to participants who returned completed questionnaires in pre-paid envelopes.
Inclusion in the current study was limited to participants who consented to administrative data linkage and completed the HLQ, PYTPAQ, and CDHQ-I. Participants were excluded if they resided outside of Alberta (n = 29), had a prior cancer diagnosis, except for non-melanoma skin cancer, assessed via ACR linkage (n = 71), were recruited as the second ATP member in their household (n = 342) (due to potential intra-class correlations among members of the same household), were pregnant (n = 63), or were characterized as underweight (body mass index (BMI) <18.5) based on self-reported heights and weights (n = 18) (due to potential association between underweight and increased risk of disease [26]). Additionally, participants with missing height or weight measures (n = 70) were excluded since these values are required to calculate BMR for the purpose of the revised Goldberg method. The final sample sizes were n = 9,847 men and n = 16,241 women.

Dietary Intake Assessment
The CDHQ-I is a 257-item past-year FFQ based on the Diet History Questionnaire developed by the U.S. National Cancer Institute [27] and modified to reflect food availability, brand names, nutrition composition and food fortification in Canada [28,29]. Responses to the CDHQ-I were analyzed using Diet Calc software (version 1.4.2; National Cancer Institute, MD, USA) and a nutrient database tailored to the CDHQ-I, resulting in data on intake of energy, 66 nutrients, and 284 single foods. On the basis of similarities in macronutrient composition and culinary use, the 284 single foods were categorized into 55 food groups [30]. The percentage of daily total EI contributed by each of the 55 food groups was calculated by dividing daily EI provided by each food group by daily total EI.
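The percentage-of-energy calculation described above can be sketched as follows. This is a minimal illustration only; the function name and the three example food groups are hypothetical and stand in for the 55 food groups used in the study.

```python
def percent_energy_by_group(group_kcal):
    """Return each food group's share (%) of total daily energy intake.

    group_kcal maps a food-group name to the daily energy (kcal) it provides.
    """
    total = sum(group_kcal.values())
    if total <= 0:
        raise ValueError("total energy intake must be positive")
    return {group: 100.0 * kcal / total for group, kcal in group_kcal.items()}


# Hypothetical participant with three food groups (illustrative values only).
shares = percent_energy_by_group(
    {"fruit": 200.0, "meats": 600.0, "low_fat_dairy": 200.0}
)
```

Expressing intake as shares of total energy means the values for each participant sum to 100, regardless of how much total energy was reported.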

Physical Activity Assessment
The PYTPAQ collects domain-specific (transportation, occupational, household and recreational) information on frequency, duration, and intensity of physical activity in the past 12 months [31]. The PYTPAQ has been evaluated relative to accelerometer data, showing acceptable reliability (0.64) and validity (0.41) for measurement of past-year physical activity [31].

Energy Intake Estimation
For EI estimation, participants were classified as EI under-reporters, plausible-reporters, or over-reporters using the revised Goldberg method [15,32]. Briefly, the plausibility of total reported energy intake (rEI) was determined based on the 95% confidence limits of agreement (cut-offs) between the ratio of total rEI to BMR and the ratio of total EE to BMR (PAL). BMR was calculated based on the participant's age, sex, body weight, and standing height using the Mifflin equation [33]. EE was calculated based on BMR, physical activity (sum of all domains from the PYTPAQ), and body weight [34]. To account for skewness in the distribution of rEI, the rEI to BMR ratio was transformed to a logarithmic scale. Individuals with rEI:BMR to PAL values below the lower Goldberg cut off, above the upper Goldberg cut off, and within Goldberg cut-offs were identified as under-reporters, over-reporters, and plausible-reporters, respectively. The Goldberg cut-offs were: lower = 0.75270, upper = 2.07586 for sedentary, lower = 0.90324, upper = 2.49103 for low active, lower = 1.05378, upper = 2.90620 for active and lower = 1.32475, upper = 3.65351 for very active.
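As a rough sketch of the classification step, the code below computes BMR with the widely published Mifflin-St Jeor coefficients and compares rEI:BMR with the cut-offs listed above. It assumes the cut-offs apply directly to the untransformed ratio and that each participant's activity category is already known; how the study derived that category from the PYTPAQ is not shown here. Function and variable names are hypothetical.

```python
# Goldberg cut-offs (lower, upper) for rEI:BMR as reported in the text.
GOLDBERG_CUTOFFS = {
    "sedentary": (0.75270, 2.07586),
    "low active": (0.90324, 2.49103),
    "active": (1.05378, 2.90620),
    "very active": (1.32475, 3.65351),
}


def mifflin_bmr(weight_kg, height_cm, age_years, sex):
    """Basal metabolic rate (kcal/day) from the Mifflin-St Jeor equation."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    return base + 5.0 if sex == "male" else base - 161.0


def classify_reporter(rei_kcal, bmr_kcal, activity_level):
    """Classify reported energy intake against the activity-specific cut-offs."""
    lower, upper = GOLDBERG_CUTOFFS[activity_level]
    ratio = rei_kcal / bmr_kcal  # assumption: cut-offs are on the ratio scale
    if ratio < lower:
        return "under-reporter"
    if ratio > upper:
        return "over-reporter"
    return "plausible-reporter"


# Hypothetical sedentary man: 80 kg, 180 cm, 50 years, reporting 1200 kcal/day.
bmr = mifflin_bmr(80, 180, 50, "male")
status = classify_reporter(1200, bmr, "sedentary")
```

In this example the ratio 1200/1680 falls below the sedentary lower cut-off, so the participant would be flagged as an under-reporter.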

Cancer Incidence and Sub-Groups
Primary incident cancer cases (All-Cancers, except non-melanoma skin cancer) were obtained by linkage to ACR in July 2017. The International Classification of Diseases for Oncology 3rd edition (ICD-O-3) was used to identify individual cancers. A subgroup of 21 primary cancers was identified based on a matrix from the World Cancer Research Fund/American Institute for Cancer Research Continuous Update Project (WCRF/AICR CUP) reporting on dietary components with convincing or probable evidence for increased or decreased risk of cancer [35] (Dietary-Cancers; Table 1). Another subgroup of 11 primary cancers was chosen based on the World Health Organization (WHO) classification of digestive system cancers [36] (Digestive-Cancers; Table 1). Follow-up time was calculated from the age at enrollment to the age at cancer diagnosis or, for participants who remained cancer-free during the follow-up period, to the age at ACR linkage. All age variables were expressed with up to 2 decimal places for precision. To account for competing risk during follow-up due to death in participants who were cancer-free, vital statistics data were obtained from Alberta Health Services Data Integration, Measurement and Reporting (DIMR). In participants who remained cancer-free but died before linkage to ACR, follow-up time was calculated from age at enrollment to age at death.
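The follow-up rules in this paragraph can be sketched as a small helper. The function name, argument names and event labels are hypothetical; the logic simply mirrors the three cases described (cancer diagnosis, death while cancer-free, or cancer-free at ACR linkage).

```python
def follow_up(age_enrol, age_cancer=None, age_death=None, age_linkage=None):
    """Return (follow-up time in years, event label) from ages given to 2 decimals."""
    if age_cancer is not None:
        # Incident cancer: follow-up ends at age at diagnosis.
        return round(age_cancer - age_enrol, 2), "cancer"
    if age_death is not None:
        # Cancer-free but died before linkage: follow-up ends at age at death.
        return round(age_death - age_enrol, 2), "death"
    if age_linkage is not None:
        # Cancer-free and alive at linkage: censored at age at ACR linkage.
        return round(age_linkage - age_enrol, 2), "censored"
    raise ValueError("at least one end-of-follow-up age is required")


# Illustrative participants (values are made up).
case = follow_up(50.25, age_cancer=58.75)
died = follow_up(60.0, age_death=65.5, age_linkage=70.0)
alive = follow_up(40.0, age_linkage=52.0)
```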

Statistical Analysis
k-means cluster analyses [37] were performed to characterize dietary patterns. Individuals whose EI was determined to be affected by misestimation (EI under-reporters and over-reporters, henceforth collectively grouped as EI misreporters because over-reporters comprised only 1% of the study sample and could not be assessed separately) were accounted for in the cluster analyses using four methods: included in the cluster analysis (Inclusion); excluded prior to completing the cluster analysis (ExBefore); excluded after completing the cluster analysis (ExAfter); and finally, excluded before the cluster analysis but added to the ExBefore cluster solution using the nearest neighbour method (InclusionNN) [38] (Figure 1). The nearest neighbour method (k = 1) is a pattern classification method that measures the Euclidean distance between a test example (i.e., participant) and the data set and assigns the test example to the cluster of the nearest neighbour [38]. All analyses were stratified by sex as self-reported by participants. The percentages of total rEI contributed by each of the 55 food groups were used as input variables. The k-means method started with the researcher selecting k initial clusters (a positive integer representing the number of clusters) and initial cluster seeds (a random positive integer representing the initial number of participants to be assigned to each cluster). Subsequently, each additional participant was automatically assigned to the nearest cluster on the basis of Euclidean distance, forming temporary clusters. Seeds were then replaced by the centroid of each temporary cluster, with the "centroid" referring to the mean observation of a cluster. Each participant was then reassigned to the nearest centroid, updating the location of the centroids. The process was repeated until centroids did not significantly change location.
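The InclusionNN idea can be illustrated with a small pure-Python sketch: cluster the plausible reporters with a textbook (Lloyd-style) k-means, then give each misreporter the cluster of their single nearest plausible reporter. This is not the SAS procedure used in the study; the seeding scheme, stopping rule, function names and toy data are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Textbook k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial seeds
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: centroids no longer move
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def nearest_neighbour_labels(train_points, train_labels, new_points):
    """1-nearest-neighbour: copy the cluster of the closest training point."""
    assigned = []
    for q in new_points:
        nearest = min(
            range(len(train_points)), key=lambda i: sq_dist(q, train_points[i])
        )
        assigned.append(train_labels[nearest])
    return assigned


# Toy data: two input variables per participant (illustrative values only).
plausible = [[0.0, 0.0], [0.0, 0.1], [0.9, 0.9], [0.9, 1.0]]
misreporters = [[0.1, 0.0], [0.8, 1.0]]

labels, centroids = kmeans(plausible, k=2)
nn_labels = nearest_neighbour_labels(plausible, labels, misreporters)
```

Because the cluster solution is built only from plausible reporters, misreporters cannot shift the centroids; they are simply placed into the existing patterns.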
For these analyses, between two and seven cluster solutions were tested to balance feasibility and robustness. To reduce the impact of local optima [39], cluster analyses were run 10 times with different random starting seeds for each cluster solution. In both men and women, the cluster solution that provided the minimum total within-cluster sum of squared distances was selected. For all selected cluster solutions (2 to 7), the between- and within-cluster variances for each food group were calculated. Then, the natural log-transformed ratios of the between- versus within-cluster variances were calculated to compare heterogeneity between and within clusters. The further apart the clusters, the larger the ratio; therefore, the optimal number of clusters is given by the cluster solution that has many food groups with large ratios. Dietary patterns were established by including each food group in the cluster to which it contributed the highest rEI. As such, food groups included in each of the three dietary patterns are mutually exclusive.
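One way to compute the log variance ratio for a single food group is sketched below. The text does not give the exact variance definitions, so this assumes ANOVA-style mean squares (between-cluster sum of squares divided by k - 1, within-cluster sum of squares divided by n - k); the function name and example values are hypothetical.

```python
import math
from collections import defaultdict


def log_variance_ratio(values, labels):
    """Natural log of between- to within-cluster variance for one food group."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Assumed definitions: mean squares as in a one-way analysis of variance.
    between = sum(
        len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items()
    ) / (k - 1)
    within = sum(
        (v - means[label]) ** 2 for label, g in groups.items() for v in g
    ) / (n - k)
    return math.log(between / within)


# Illustrative intakes for one food group under two candidate labelings.
intakes = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
well_separated = log_variance_ratio(intakes, [0, 0, 0, 1, 1, 1])
poorly_separated = log_variance_ratio(intakes, [0, 1, 0, 1, 0, 1])
```

A labeling that separates low from high intakes yields a much larger ratio than one that mixes them, which is the property used to compare cluster solutions.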
Before cluster analysis, each input variable was standardized by subtracting the minimum input value and then dividing by the range. This standardization method, known as the range method, has been reported to give consistently better recovery of cluster structure in different error conditions, separation distances, clustering methods, and coverage levels when compared with other standardization methods, such as the z score [40].
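The range method amounts to min-max scaling of each input variable, as in this minimal sketch. The function name is hypothetical, and returning zeros for a constant column is an assumption, since the text does not say how such a case would be handled.

```python
def range_standardize(column):
    """Scale values to [0, 1] by subtracting the minimum and dividing by the range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant variable carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Illustrative percentages of energy from one food group.
scaled = range_standardize([2.0, 4.0, 6.0])
```

After scaling, every food group spans the same 0 to 1 interval, so no single group dominates the Euclidean distances used in clustering.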
Multivariable Cox proportional hazard regression models were used to assess the associations between observed dietary patterns and cancer risk, including All-Cancers, Dietary-Cancers, and Digestive-Cancers. Adjusted hazard ratios (AHR) were estimated in comparison to the association of a reference pattern with cancer outcomes. Competing risk analysis was performed, with the standard multivariable Cox proportional hazard regression model applied to the cause-specific hazard of interest and competing events treated as censored observations [41]. Regression models were adjusted for age (modelled on a continuous scale), BMI (modelled on a continuous scale), leisure-time physical activity (MET hours/week; modelled on a continuous scale), marital status, educational attainment, smoking status, family history of cancer, and personal history of chronic disease. In models for women only, menopausal status and hormone replacement therapy usage were included.
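For reference, the cause-specific Cox model described above can be written in its standard textbook form. The notation is introduced here for illustration and is not taken from the original analysis: k indexes the event of interest (for example, All-Cancers), x collects the dietary pattern indicators and adjustment covariates, and participants who experience a competing event are censored at that time.

```latex
h_{k}(t \mid \mathbf{x}) = h_{0k}(t)\,\exp\!\left(\boldsymbol{\beta}_{k}^{\top}\mathbf{x}\right),
\qquad
\mathrm{AHR}_{j} = \exp\!\left(\beta_{kj}\right)
```

Here the adjusted hazard ratio for dietary pattern j compares its hazard with that of the reference pattern, holding the other covariates fixed.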
Means and SD are presented for continuous variables, and counts and percentages for categorical variables. For interpretation purposes, comparisons examined whether associations would be considered significant based on the often used p-value threshold of <0.05, though the consistency of estimates across methods of accounting for EI misestimation is also considered more holistically given that this p-value threshold is arbitrary [42]. All analyses were performed using SAS statistical software (version 9.2-Linux, SAS Institute, INC., Cary, North Carolina, USA).

Participant Baseline Sociodemographic Characteristics
Three dietary patterns, or clusters, were identified for both men and women: Healthy, Meats/Pizza, and Sweets/Dairy. Baseline sociodemographic characteristics stratified by dietary pattern and EI reporting status are presented in Table 2. Higher proportions of men and women assigned to the Meats/Pizza pattern were affected by obesity (BMI ≥ 30), while lower proportions had BMI < 25, compared to participants in both the Healthy and Sweets/Dairy patterns. Men and women in the Healthy pattern had higher reported leisure-time physical activity values compared to their counterparts in the Sweets/Dairy and Meats/Pizza patterns. The highest proportions of current smokers for both men and women were in the Meats/Pizza pattern. In men, the proportion who reported a personal history of chronic disease was highest in the Healthy pattern while in women, the proportion who reported a personal history of chronic disease was very similar across dietary patterns. For both men and women and across all dietary patterns, higher proportions of misreporters were affected by obesity, while lower proportions had BMI < 25, compared to plausible reporters. The proportions of EI misreporters were very similar between men (47.9%) and women (46.8%) and across all cancer cases and non-cases in men (49.0% vs. 47.8%) and in women (48.6% vs. 46.7%).

Dietary Patterns in Relation to Methods for Accounting for Misestimation of Energy Intake
The greatest contributors to total rEI in each dietary pattern across different methods of accounting for EI misreporters are summarized in Table 3. With few exceptions, the majority of the food groups in all three dietary patterns were common across the different methods in both men and women. However, Other Breads was not included in the Meats/Pizza pattern within the ExBefore and InclusionNN methods among men and the ExBefore method among women. The percentage contributions of food groups in all three dietary patterns were very similar across different methods of accounting for EI misreporting. For the Inclusion method, fruits, high-fiber breakfast cereal, fruit juices, rice and nuts contributed the greatest proportions of energy for men within the Healthy pattern. For women in the Healthy pattern under the Inclusion method, fruit, regular-fat dairy products, lean fat poultry, nuts and rice were the largest contributors to total EI. Men assigned to the Meats/Pizza pattern with Inclusion had the highest total rEI contribution from meats, pasta/pizza, beer, regular soda and chips, while women in the Meats/Pizza pattern had similar intakes except for beer. Men and women assigned to the Sweets/Dairy pattern with Inclusion had high rEI contributions from low-fat dairy products and wholemeal (whole-grain) bread, and from several sweets such as cakes, jams and ice cream. Mean intakes of plausible reporters in the ExBefore and ExAfter methods were similar in both men and women. Women in the Sweets/Dairy pattern under ExBefore had only 3 food groups with the highest percentage contribution of total rEI compared to 7 food groups under ExAfter. The largest contributors of rEI were similar between ExBefore and InclusionNN in both men and women. The mean intake of some food groups varied across different methods of accounting for potential misreporting of EI; this changed the ranking of food groups, but the overall dietary patterns remained the same.
Table footnotes: a Inclusion reports on all participants; misreporters were included in the k-means cluster analysis. b ExBefore reports on plausible reporters; exclusion of misreporters identified using the revised-Goldberg method was completed before k-means cluster analysis. c ExAfter reports on plausible reporters; exclusion of misreporters identified using the revised-Goldberg method was completed after k-means cluster analysis. d InclusionNN reports on all participants; misreporters identified using the revised-Goldberg method were excluded before the cluster analysis but added to the ExBefore cluster solution using the nearest neighbour method. e Mean percentage contribution by each food group.

Association between Dietary Patterns and Cancer Risk
For All-Cancers, no significant associations were observed between dietary patterns and cancer risk in men, regardless of the method used to account for misestimation of EI. However, the point estimates for the Sweets/Dairy and Meats/Pizza patterns and All-Cancers were higher in the ExAfter and Inclusion methods, respectively, compared to the other methods of accounting for EI misreporting. In women, a significantly increased cancer risk was associated with the Meats/Pizza pattern in the InclusionNN (AHR (95%CI): 1.14 (1.00-1.30)) method and with the Sweets/Dairy pattern in the ExBefore (AHR (95%CI): 1.28 (1.04-1.58)) method (Table 4). Among women, the point estimate for the Meats/Pizza pattern under the Inclusion method was very similar to the estimate for InclusionNN, but the former would not be considered a statistically significant association if applying a p-value threshold of <0.05. (Table 5). Also among men, the Sweets/Dairy pattern was associated with increased cancer risk under the InclusionNN (AHR (95%CI): 1.45 (1.07-1.97)) and ExBefore (AHR (95%CI): 1.74 (1.12-2.72)) methods (Table 5). In women, no significant associations were observed for this subset of cancers (Table 5). For Digestive-Cancers, no significant associations were observed with dietary patterns among men. Among women, a significantly increased risk of digestive cancers was observed for the Meats/Pizza pattern under the InclusionNN method (AHR (95%CI): 1.43 (1.02-2.01)) and the Sweets/Dairy pattern under the ExBefore method (AHR (95%CI): 1.73 (1.03-2.89)) (Table 6). Competing risk analysis to account for deaths before ACR linkage date in participants who were cancer-free during follow-up did not significantly change the observed hazard ratios (Supplementary Materials: Tables S1-S3).

Discussion
The findings of this study suggest that misestimation of EI, ascertained using a prediction equation and self-reported physical activity and body weight and height, was prevalent among adults whose dietary intake was characterized using an FFQ within the context of a cohort study. Further, differing methods to account for this misestimation appear to impact observed associations between dietary patterns and cancer risk. Among men, there were no significant associations between dietary patterns and risk of all cancers regardless of the method of handling EI misestimation. However, the point estimates for All-Cancers risk associated with the Sweets/Dairy and Meats/Pizza patterns were higher in the ExAfter and Inclusion methods, respectively, compared to the other methods of accounting for EI misreporting. Among women, the Meats/Pizza pattern was associated with a 14% increased risk of all cancers in the method that included all participants regardless of EI misestimation (similar to that observed in the InclusionNN method). The Sweets/Dairy pattern was associated with a 28% increased risk of all cancers in the method that excluded women whose EI estimates were deemed to be affected by misestimation before the cluster analyses (ExBefore). Similarly, associations between dietary patterns and risk differed based on how EI misestimation was addressed for the subgroup of primary cancers for which there is evidence of the influence of dietary risk factors (men and women) and for digestive cancers (women). However, given that there is no marker of true dietary patterns, it is not possible to ascertain which method for accounting for EI misestimation results in observed associations that are the closest to truth.
Other studies have similarly suggested that analytical approaches used to account for potential EI misestimation can impact observed associations between dietary intake and disease outcomes among adults. A cross-sectional study of Norwegian women aged 50-69 years [44], which used an FFQ, found that self-reported CVD was significantly positively associated with "Western" dietary pattern scores among plausible reporters but not among all reporters. A prospective cohort study of Swedish adults [45], which used an interview-based diet history method, reported an increased risk of breast cancer with high alcohol intakes, with stronger risk estimates among plausible reporters compared with all reporters. A prospective cohort study of US adults [46] investigated how calibrating reported EI to doubly labeled water (DLW) data affected associations with risk of breast, colon, endometrial and kidney cancer. Calibrated energy consumption was positively associated with risk of breast, colon, endometrial and kidney cancer, while uncalibrated energy was not. However, these studies reported lower proportions of misreporters (e.g., Norwegian 18%, Swedish 18% in men and 12% in women) compared to the current study (approximately 50% in both men and women). This could be due to the different equations used for calculating BMR. In the current study, BMR was calculated using the Mifflin equation, while the Schofield and the Oxford equations were used in the Norwegian and Swedish studies, respectively. In a study conducted with Korean adults [47], energy under-estimation was estimated to affect 14% of men and 23% of women, lower than the proportions observed in this study. This may be attributed to the use of a 24-hour recall in the Korean study as opposed to an FFQ in the current study.
Despite slight differences in methodology and design, the findings of this study are in line with previously published results indicating that estimated diet-disease associations can be influenced by measurement error [44]. Associations between dietary patterns and cancer risk varied depending on the methods used to account for misestimation of EI. Importantly, comparisons of findings based on different methods within and between studies are affected by considerations of what constitutes significant differences. For example, for women, the hazards ratios for cancer associated with the Meats/Pizza pattern were almost identical under two methods of accounting for energy misestimation, but under the conventional practice of applying a threshold of p < 0.05, only one of the two would be interpreted as significant. Thus, the findings highlight the need to consider not only how EI misestimation is accounted for across studies, but also to improve the reporting and interpretation of findings within nutritional epidemiology [42].
Prior analyses have highlighted the importance of considering measurement error and identified the need for caution in terms of the interpretation of diet-disease associations that have not been, at least partially, corrected for this error [45,46,48]. For example, regression calibration approaches are well developed and can make use of reference data, such as those collected using biomarkers or a less-biased tool such as 24-hour recalls in a subsample, to somewhat mitigate the impact of measurement error on diet-disease associations in large cohort studies in which an FFQ is the main tool [49]. Given that data from recalls have been shown to be affected by systematic measurement error to a lesser extent than data from FFQ [13], cohort studies administering recalls as the main assessment tool may be helpful for advancing our understanding of dietary intake and health. This is particularly true in the context of patterns since recalls provide comprehensive data including details on eating occasions and foods and beverages consumed in combination [7]. The use of recalls in cohort studies has become increasingly feasible with technological advances, such as online and mobile device-based tools [50]. Using such tools, cohort studies of the future can potentially take advantage of multiple modes of dietary assessment to dampen measurement error and its implications for observed diet-disease associations [51].
However, many current sources of data on diet and disease outcomes, with sufficient time elapsed from baseline data collection for cases of cancer and other conditions to accrue, may not provide opportunities for regression calibration. In this study, no reference data are available and we opted to use the revised Goldberg method to attempt to account for measurement error exhibited as EI misestimation. However, this method has challenges. The use of EI/BMR for evaluating EI depends on knowledge of energy requirements or EE [15]. For the purposes of the calculations, self-reported physical activity and anthropometric data were used-these data also undoubtedly contain measurement error, potentially resulting in misclassification of individuals based on their energy reporting status. Furthermore, the Goldberg method pertains to misestimation of energy only. It is known that misreporting is differential among different types of foods, beverages, and dietary components. For example, based on recovery biomarker-based studies, protein and potassium are less affected by misestimation than is energy [13]. This may be because errors in EI accumulate over many foods and beverages [7] but it may also be because energy-dense items are less accurately reported than other foods due to social desirability biases [52]. Studies based on observation and weighing have shown that different types of foods and beverages may be reported with differing levels of accuracy [53]. This is particularly relevant to studies of dietary patterns given that interest is inherently in combinations of foods and beverages consumed and the implications for health and disease risk.
Several statistical techniques are available for identifying dietary patterns, and the choice of method depends largely on the research question at hand [54]. For example, cluster analysis may be useful for identifying mutually exclusive groups which differ according to their reported diet [54-56] and, as such, may help identify those at greater risk for developing specific cancers [57] or other chronic diseases. Alternatively, cluster analysis may group together those who tend to misreport their food and beverage consumption in similar ways, for example, due to social desirability biases. In this study, 55 food groups were created from the original 284 items in the FFQ while other studies have used smaller [54-56,58-61] or larger [57,62-64] numbers of food groups, potentially influencing the findings. The k-means method has limitations, including the need to pre-specify the number of clusters to retain, sensitivity to initial cluster seeds [65], and challenges posed by the existence of clusters of different sizes or shapes or those that may be nonspherical or occur across several subspaces [66]. Other studies have used principal component analysis [44,67-69], which aggregates food groups in linear combinations called principal components according to the extent to which they are correlated with each other. Studies using both k-means clustering and principal component analysis have observed similar patterns to those observed here. For example, Maree et al. [70] reported three dietary patterns in Australian men and women using k-means cluster analysis, with two of the clusters similar to the Healthy and Meats/Pizza patterns observed in the current study. Also using k-means cluster analysis, Freitas-Vilela et al. [71] reported three dietary patterns, labelled Fruits and Vegetables, Meat and Potatoes and White Bread and Coffee, among pregnant women. Despite differences in naming, the three patterns are similar to those observed here. Further, using principal component analysis, Markussen et al. [44] identified similar patterns, named Prudent, Western and Continental, among both plausible and all reporters in a sample of women aged 50-69 years. Repeatability of dietary pattern analysis is often critiqued, since each cohort study can produce different patterns due to large variation between studies and their participants. However, the use of principal component analysis or cluster analysis appears to result in somewhat similar named dietary patterns.
This study made use of an existing cohort with a large sample size and careful validation of data and few missing values [24]. One exception was household income, which was characterized by a high degree of missingness and was not included in the Cox regression analysis despite evidence that socioeconomic status is associated with several types of cancers [72,73]. Cancer outcomes were ascertained via linkage with an accredited cancer registry (ACR), providing a more accurate diagnosis of the disease compared to self-report [44]. However, for privacy reasons, ATP does not release exact date of events and age at cancer diagnosis, given up to two decimal places, was used as an approximation for date of event. Thus, for the Cox regression analysis, precise follow-up times could not be calculated and therefore, hazard ratios might not have been precisely estimated. Due to the arbitrary nature of cluster analysis used in this study, the assignment of dietary patterns for an individual participant could have been different across methods of accounting for EI misreporting. This could also explain why differing methods of accounting for misestimation appear to impact observed associations between dietary patterns and cancer risk. For the k-means cluster analysis, total rEI was chosen as the input variable because EI is the foundation of the diet. All other nutrients must be provided within the quantity of food consumed to fulfill energy requirements. Therefore, if total EI is misreported, other dietary components may also be mis-estimated, albeit to differing degrees [74]. Other studies have used different measures such as the daily intake frequencies [70] and the average weight of food consumed per day [75]. These different measures may impact the results of the cluster analysis and hence the estimated diet-disease association. 
Finally, in addition to measurement error affecting the FFQ data, other variables, including physical activity, heights, and weights, are also subject to reporting error, potentially impacting the characterization of energy misestimation and the observed associations [76].

Conclusions
The results of this study suggest that observed associations between dietary patterns and health outcomes vary in relation to strategies for addressing EI misestimation. Cohort studies that include biomarker measurements such as DLW in a subset of participants may shed light on misreporting of different dietary components and on optimal strategies for accounting for it. Advances are also needed to enable improved characterization of dietary patterns, which inherently involve intake of many different foods and beverages that may be reported with different levels of accuracy.
In the meantime, researchers should carefully consider how misestimation and other sources and symptoms of measurement error are characterized and accounted for and carefully report these details to enable appropriate interpretation of their findings.