Relative Validity of a Short 15-Item Food Frequency Questionnaire Measuring Dietary Quality, by the Diet History Method

Food frequency questionnaires (FFQ) are commonly used dietary assessment tools. The aim was to assess the relative validity of a 15-item FFQ, designed for the screening of poor dietary patterns, against a validated diet history (DH). The study population was derived from the Gothenburg H70 Birth Cohort Studies. The DH registrations were harmonized in accordance with the FFQ frequencies. The agreement was assessed by Cohen’s kappa with corresponding confidence intervals (CI) for the frequency and categorical variables. Bland–Altman plots were used for the numeric variables. The study comprised data from 848 individuals (55.2% women). Overall, there was high agreement between the methods, with the exact and adjacent level of agreement over 80% for eight variables. The proportion attributed to the opposite frequency was fairly low for most of the frequency variables. Most of the kappa values indicated fair or moderate agreement. The highest kappa values were calculated for the type of cooking fat (k = 0.68, CI = 0.63–0.72) and sandwich spread (k = 0.55, CI = 0.49–0.53), and the lowest for type of bread (k = 0.13, CI = 0.07–0.20) and sweets (k = 0.22, CI = 0.18–0.27). In conclusion, the FFQ showed overall good agreement compared with the DH. We, therefore, think that, with some improvements, it could serve as a simple screening tool for poor dietary patterns.


Introduction
Screening is considered a quick and simple method for identifying individuals at risk of an unhealthy condition [1]. Those identified as at risk should then be assessed by a more extensive method to confirm or exclude the actual condition.
Non-communicable diseases (NCDs), such as cardiovascular diseases, cancers, and diabetes, pose a great global health care challenge today. The risk of being affected by NCDs varies in the population, with low socioeconomic position as a strong predictor for poor health [2]. Most of the premature deaths are associated with such lifestyle factors as poor dietary patterns and low physical activity, risk factors that could be identified by screening. A large body of evidence shows that a healthy diet is a key factor for maintaining good health and preventing NCDs [3][4][5]. Such a diet is characterized by lower intakes of foods containing high levels of saturated fat and/or sugar combined with higher intakes of high-fiber foods [6].
However, according to the Nordic Monitoring System, the number of individuals with a healthy dietary pattern decreased by 20% between the years 2011 and 2014 [7]. In 2012, the Swedish National Board of Health and Welfare published the National Guidelines for Methods of Preventing Disease report [8], in which health-care personnel were urged to support patients in their efforts to improve their dietary pattern. Quick and easy screening methods are, therefore, needed to identify individuals at risk of poor dietary patterns. This is the rationale behind our previously developed short and simple food frequency questionnaire (FFQ) [9] aimed at indicating poor dietary patterns in relation to the recommended dietary pattern by the Nordic Nutrition Recommendations (NNR) [6].
An FFQ has been defined as "A questionnaire in which the respondent is presented with a list of foods, and is required to say how often each is eaten in broad terms such as x times per day/per week/per month, etc. Foods included are usually chosen for the specific purposes and may not assess total diet" [10]. In the present case, the FFQ was aimed at screening the risk of poor dietary patterns [9]. The rationale was that health care personnel without specific knowledge of nutrition should be able to screen patients, for example, in a waiting room within primary care. At the time this FFQ was developed, there were no similar tools available.
Our short 15-item FFQ has been tested in a feasibility study for its ability to predict dietary cardiovascular risk factors in a random sample from a healthy middle-aged population [9]. We concluded that the FFQ is able to predict cardiovascular risk factors. However, the FFQ has not been tested for validity. Dietary assessment tools are affected by measurement errors, and their accuracy needs to be assessed [11][12][13]. A validity control implies that a comparison is made with another method judged to be superior [14] or to involve other errors. However, there is no "gold" standard; only the relative validity of a new method can be assessed [15]. At the same time, dietary assessment tools with high accuracy are key to identifying associations between diet and disease.
The aim of this study was to assess the relative validity of a short 15-item semiquantitative FFQ, designed as a tool for screening the risk of a poor dietary pattern, against a validated diet history (DH) interview in a sample of 70-year-olds. In this study, a poor dietary pattern was defined as a dietary pattern not in accordance with the dietary pattern recommended by the NNR.

Participants
We used a population-based sample of 70-year-olds (born in 1944) from the Gothenburg H70 Birth Cohort Studies (H70), Sweden, conducted in 2014-2016. Participants were systematically selected based on specific birth dates, and 1203 participated (response rate 72.2%). The study design has previously been described in detail [16].
All participants without dementia were invited to participate in the dietary examination; 342 individuals did not participate for various reasons (e.g., declined to participate, poor health, lack of time, or no response). In total, N = 861 individuals participated in the dietary examination [17], of whom 848 also completed the FFQ. The study population thus comprised 848 individuals who both completed the FFQ and participated in the DH interview. In the H70 study, results from the DH show that dietary patterns have changed during the past five decades when comparing data from five different birth cohorts of 70-year-olds, with an increase in healthy foods, such as fruits, vegetables, and fiber-rich foods, in later-born birth cohorts [17].
Ethical approval was obtained from the Ethics Committee for Medical Research in Gothenburg, reference number 869-13. The tenets of the Declaration of Helsinki were followed, and informed consent was obtained from all participants.

Food Frequency Questionnaire
The FFQ, found in the Supplementary Material, was originally designed as a quick screening tool to give an overall view of an individual's dietary pattern and to identify poor dietary patterns with a focus on risk factors for cardiovascular diseases [9]. It comprised questions regarding frequency of consumption of different food groups based on indicators from the NNR of a healthy dietary pattern [6]. Information on portion sizes was not provided. The FFQ was either completed on the day of the general examination, or the participants could complete it later at home and return it by mail.

Diet History
A dietitian conducted a semi-structured face-to-face interview of 60-90 min, estimating habitual dietary intake (food and beverages) during the preceding three months. The interview took place either in the participant's own home or at an outpatient clinic. It included both structured and open-ended questions about usual food patterns in order to capture total habitual intake as closely as possible. The same DH method has been used in all H70 cohorts where dietary habits have been investigated [17] and has been described in detail previously [16,17]. Data were processed using the Swedish National Food Agency's nutrient database (The Swedish Food Composition Database) to estimate energy and nutrient intake. The DH method has been validated and found to give energy values comparable to those predicted by the heart rate method, activity diary, and doubly labelled water, as well as by calculating the ratio between energy intake and basal metabolic rate (BMR) [18,19].
Dietary intake was registered as grams of food items usually consumed per day/week/month for calculation of individual intake. The participants reported intake of, in total, 1810 different food items, which were divided into 35 food groups in accordance with a study from the Swedish Food Agency on dietary patterns [20] (the procedures have been described in detail previously [17]). Those of the 35 food groups applicable to the items in the FFQ were selected for further analyses and categorized in accordance with the options on the FFQ. The selected food groups and registered food intake in the DH are summarized in the Supplementary Material (Supplementary Table S1). Individual intake during the DH was summarized in frequencies and intervals (day/week/month), where each reported food item was considered as one intake of the applicable item on the FFQ. All intakes of each item were summarized as week total or daily total. As such, a reported consumption of, e.g., tuna once a week and salmon once a week in the DH was summarized as eating fish two times a week.
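The summation rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' procedure (the analyses were performed in R): the item-to-FFQ mapping, the function name, and the conversion of monthly reports to weekly frequencies (12/52) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical mapping from DH food items to FFQ items (illustrative only;
# the real mapping is given in the paper's Supplementary Table S1).
DH_TO_FFQ = {"tuna": "fish", "salmon": "fish", "apple": "fruit/berries"}

# Assumed conversion of the reporting interval to occasions per week.
PER_WEEK = {"day": 7.0, "week": 1.0, "month": 12.0 / 52.0}


def weekly_totals(records):
    """Sum DH records (item, times, interval) into a weekly total per FFQ item."""
    totals = defaultdict(float)
    for item, times, interval in records:
        ffq_item = DH_TO_FFQ.get(item)
        if ffq_item is None:
            continue  # food item has no counterpart in the FFQ
        totals[ffq_item] += times * PER_WEEK[interval]
    return dict(totals)


# Tuna once a week plus salmon once a week counts as fish twice a week.
records = [("tuna", 1, "week"), ("salmon", 1, "week"), ("apple", 1, "day")]
totals = weekly_totals(records)
print(totals["fish"])  # 2.0
```

The weekly totals would then be coded into the FFQ answer options; the cut-offs for that step are given in the Supplementary Material and are not reproduced here.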

Harmonization of DH Data vs. the FFQ
All items measuring frequencies in the FFQ were condensed from four to three levels of frequencies (see Supplementary Material) because the removed level was not considered to add any further information. This applies to question (Q)1 vegetables, Q2 fruit/berries, Q3 nuts, Q4 fish, Q5 red meat, Q6 white meat, Q7 sweets, and Q10 dairy products. Corresponding frequencies and intervals (day/week/month) reported during the DH were summarized as week total or daily total and coded as applicable to the answer options for the corresponding item in the FFQ.
In the FFQ, Q8 "How often do you eat breakfast" was dichotomized into every day (every day/almost every day) and not every day (a few times a week/once a week). In the DH "Having breakfast" was identified by an open-ended question on usual meal pattern. Respondents reported usual meal pattern in terms of type of meal and time for 24 h on a regular day and on weekends. As the interviewer did not explicitly ask for breakfast (since there could be a risk of directing the answer of the respondent), breakfast was defined as regular intake of a lighter meal between the hours 5.30 and 9.30 a.m. For example, if the participant only reported intake of one glass of orange juice during these hours, it was not defined as a breakfast.
Q9 (bread) was divided into two separate questions in the data analysis: one for number of slices and one for type of bread. Q9a, number of slices/pieces of bread usually eaten per day in total, was summarized as daily total intake both in the FFQ and DH. Q9b, type of bread, was coded as white bread, whole wheat/crispbread, or combinations of the above. Individuals who stated "other" in the FFQ (n = 60) could not be classified into these categories and were not included in the agreement analysis. All types of bread registered in the DH were summarized and coded in the same way. Those who had a summarized intake of less than once a week in the DH were classified as "does not eat bread" and were not included in the agreement analysis (n = 31).
Type of milk/sour milk/yoghurt usually consumed (Q11) was coded as full fat (3%), semi-skimmed/reduced fat (1.5%), and skimmed/low-fat/non-fat (≤0.5%) milk both in the FFQ and DH. Type of sandwich spread (Q12) was coded as butter (>75% fat), margarine with plant sterols, and margarine (30-70% fat). Individuals who stated "does not use spread" or "does not know" were not included in the agreement analysis (n = 138 and n = 7 in the FFQ and DH, respectively). Frequencies of types of dairy products and sandwich spreads registered during the DH were summarized, and the most frequent category was selected. For individuals who had consumed two categories equally often (n = 19), it was interpreted as agreement between the methods if one of these categories matched the option chosen in the FFQ.
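The "most frequent category, with ties counted as agreement" rule can be illustrated as follows. This is a hedged sketch under stated assumptions, not the authors' code: the function names are hypothetical, and each DH registration is assumed to have already been coded into one of the FFQ categories.

```python
from collections import Counter


def modal_categories(dh_categories):
    """Return the set of most frequent categories among the DH registrations.

    Assumes a non-empty list; the set has more than one member if there is a tie.
    """
    counts = Counter(dh_categories)
    top = max(counts.values())
    return {category for category, n in counts.items() if n == top}


def agrees(ffq_choice, dh_categories):
    """Agreement if the FFQ choice is the modal DH category or one of the tied ones."""
    return ffq_choice in modal_categories(dh_categories)


# Butter is registered most often in the DH, so an FFQ answer of "butter" agrees.
print(agrees("butter", ["butter", "margarine", "butter"]))  # True
# Two categories used equally often: either FFQ answer counts as agreement.
print(agrees("margarine", ["butter", "margarine"]))  # True
```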
Type of cooking fat (Q13) was coded as butter/margarine (60-80%), margarine with seed and plant oils/liquid margarine, and vegetable oil. Information on type of cooking fat in the DH was obtained with open-ended questions. Hence, the participants could state several options. For individuals who had stated two different types of cooking fat in the DH (n = 442), it was interpreted as agreement between the methods when one of these matched the option chosen in the FFQ. The options "does not use fat in cooking" and "does not know" were not included in the analysis (FFQ, n = 7; DH, n = 76).
Q14, on adding salt to food in the FFQ, was dichotomized into no (no/yes, sometimes) and yes (yes often/yes, I always add salt before I taste the food). Information on salt consumption during the DH was obtained using the open-ended question "do you usually add salt to your food" (during a meal). Q15 regarding avoidance of salt in the FFQ was not included in the present study as there were no equivalent questions in the diet history. Since the bread question was divided into two questions, the analysis ended up with 15 items.

Participant Characteristics
Mean body mass index (BMI) with corresponding standard deviations (SD) was calculated based on measured height and weight values (kg/m²). Other characteristics of participants were dichotomized. Smoking was divided into current or non-smoker (never smoked or past smoker). Educational level was divided into compulsory primary school (≤9 years) or higher. Marital status was divided into married (currently married and/or cohabiting) or not married (never been married/not cohabiting, divorced, widowed). Country of birth was divided into Sweden or other.

Statistical Methods
Participant characteristics are presented as numbers and percent. The proportion of answers classified into the same, adjacent, or opposite category by both methods was calculated for the items: Q1-Q7 and Q10. In the results, this was expressed as percent of exact agreement (same frequency measured in both methods), adjacent agreement (frequency in FFQ was one step away from frequency measured in DH), or opposite agreement (frequency in FFQ was opposite of the frequency measured in DH). The proportion of answers classified into the same category for both methods was calculated for the items: Q8, Q9b, and Q11-Q14. In the results, this was expressed as percent of exact agreement between methods.
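As an illustration of this classification, the sketch below computes the three proportions for an item with three ordered frequency levels. It assumes that the levels are coded 0, 1, 2 in both methods; the function name and the example data are invented for the illustration.

```python
def agreement_proportions(ffq, dh):
    """Percent exact, adjacent, and opposite agreement for levels coded 0, 1, 2."""
    if len(ffq) != len(dh) or not ffq:
        raise ValueError("both methods need the same, non-zero number of answers")
    labels = {0: "exact", 1: "adjacent", 2: "opposite"}
    counts = {"exact": 0, "adjacent": 0, "opposite": 0}
    for a, b in zip(ffq, dh):
        counts[labels[abs(a - b)]] += 1  # distance between the two codings
    n = len(ffq)
    return {name: 100.0 * c / n for name, c in counts.items()}


# Invented answers from ten respondents (not study data).
ffq = [0, 1, 2, 2, 0, 1, 1, 2, 0, 0]
dh = [0, 1, 2, 1, 2, 1, 0, 2, 0, 1]
result = agreement_proportions(ffq, dh)
print(result)  # {'exact': 60.0, 'adjacent': 30.0, 'opposite': 10.0}
```

With only three levels, "opposite" means that one method placed the respondent in the lowest category and the other in the highest.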
The agreement of individual classification between both methods was evaluated, and weighted or unweighted Cohen's kappa values were calculated [21]. The Cohen's kappa statistics indicate a poor level of agreement for values under 0.20, a fair level of agreement for values between 0.21-0.40, a moderate level of agreement for values between 0.41-0.60, a good level of agreement for values between 0.61-0.80, and an excellent level of agreement for values above 0.80 [22]. In addition, the sensitivity and specificity of the two dichotomous items were calculated (Q8, Q14).
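For readers who want to see the statistics spelled out, the following Python sketch computes unweighted and weighted kappa, maps a value to the interpretation bands above, and derives sensitivity and specificity. Several points are assumptions rather than details from the study: linear weights are used for the weighted kappa (the weighting scheme is not stated), the DH is treated as the reference for sensitivity and specificity, and all names and data are invented.

```python
def cohen_kappa(a, b, n_levels, weighted=False):
    """Cohen's kappa for two ratings coded 0..n_levels-1.

    weighted=True uses linear disagreement weights |i - j| / (n_levels - 1);
    this choice is an assumption, not taken from the study.
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    obs = [[0.0] * n_levels for _ in range(n_levels)]  # joint proportions
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    row = [sum(r) for r in obs]
    col = [sum(obs[i][j] for i in range(n_levels)) for j in range(n_levels)]

    def weight(i, j):
        if weighted:
            return abs(i - j) / (n_levels - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(n_levels) for j in range(n_levels)]
    d_obs = sum(weight(i, j) * obs[i][j] for i, j in cells)
    d_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return 1.0 - d_obs / d_exp  # raises ZeroDivisionError if d_exp == 0


def interpret(kappa):
    """Interpretation bands as listed in the text [22]."""
    for upper, label in [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")]:
        if kappa <= upper:
            return label
    return "excellent"


def sensitivity_specificity(test, reference):
    """Sensitivity and specificity of a dichotomous item (1 = yes, 0 = no)."""
    pairs = list(zip(test, reference))
    tp = sum(1 for t, r in pairs if t and r)
    fn = sum(1 for t, r in pairs if not t and r)
    tn = sum(1 for t, r in pairs if not t and not r)
    fp = sum(1 for t, r in pairs if t and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Invented three-level answers from ten respondents (not study data).
ffq = [0, 1, 2, 2, 0, 1, 1, 2, 0, 0]
dh = [0, 1, 2, 1, 2, 1, 0, 2, 0, 1]
k_unweighted = cohen_kappa(ffq, dh, 3)
k_weighted = cohen_kappa(ffq, dh, 3, weighted=True)
print(round(k_unweighted, 2), round(k_weighted, 2), interpret(k_weighted))
```

The unweighted statistic equals (p_o - p_e) / (1 - p_e), where p_o is the observed and p_e the chance-expected proportion of exact agreement; the weighted version additionally gives partial credit to adjacent categories.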
Intake of number of slices of bread (Q9a) is presented as mean, standard deviation, percentiles, and as minimum and maximum. The agreement of this numeric variable was analyzed using a Bland-Altman plot, showing the mean differences of bread intake between the two methods along with 95% limits of agreement (LOA) [23]. This was conducted to graphically assess the presence of bias or disagreement. Data management and statistical analyses were performed in R version 3.6.2 (The R Project for Statistical Computing, Vienna, Austria).
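The quantities shown in a Bland-Altman plot can be computed as in the sketch below. It is an illustration only: the differences are taken as DH minus FFQ (an assumption about the sign convention, chosen so that a positive bias means lower reporting in the FFQ), the limits use the conventional mean ± 1.96 SD, and the slice counts are invented.

```python
from statistics import mean, stdev


def bland_altman(dh, ffq):
    """Return (bias, lower LOA, upper LOA) for paired measurements.

    Differences are DH minus FFQ (assumed sign convention); the limits of
    agreement are bias +/- 1.96 * SD of the differences.
    """
    diffs = [d - f for d, f in zip(dh, ffq)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented slices of bread per day for five respondents (not study data).
dh_slices = [5, 6, 4, 7, 3]
ffq_slices = [2, 3, 2, 3, 1]
bias, lower, upper = bland_altman(dh_slices, ffq_slices)
print(round(bias, 2), round(lower, 2), round(upper, 2))  # 2.8 1.16 4.44
```

In the plot itself, each respondent's difference is drawn against the mean of the two measurements, with horizontal lines at the bias and at the two limits.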

Participants
The present study included parallel data from the DH interview and FFQ in N = 848 participants (women n = 474, 55.2%). The background characteristics of the participants are presented in Table 1.
Notes to Table 1: BMI = body mass index; SD = standard deviation. 1 Missing on BMI, n = 11 (1.3%). 2 Missing on smoking, n = 2 (0.2%). 3 Missing on education, n = 8 (0.9%). 4 Missing on marital status, n = 3 (0.4%).
Table 2 shows a comparison of the distributions on each food item for the FFQ (columns) and DH (rows). Figure 1 shows the proportion of agreement between the methods for items measuring frequencies (Q1-Q7 and Q10). The exact and adjacent level of agreement was over 85% for all the items except for sweets, where the agreement was 78.7%. Hence, the proportion of answers attributed to the opposite frequency was fairly low for all the questions except for nuts, sweets, and dairy products. All the kappa values were of fair or moderate agreement, and the highest kappa values were calculated for fruit/berries (0.48, CI = 0.40-0.55), dairy products (0.48, CI = 0.41-0.54), and fish (0.44, CI = 0.38-0.50) (Figure 1).
Figure 2 shows the proportion of individuals with exact agreement between the methods for all the categorical items: Q8, Q9b, and Q11-Q14. The agreement was over 70% for breakfast, type of sandwich spread, type of cooking fat, and salt, but somewhat lower for type of bread and type of dairy products, with an exact level of agreement of 53.1% and 66.5%, respectively. Both the breakfast and salt items had high sensitivity (0.95 and 0.83, respectively) but lower specificity (0.25 and 0.45, respectively) (Figure 2).

Level of Agreement
There was a statistically significant difference between the two means of slices of bread (Q9a). The Bland-Altman plot shows that the participants tended to underestimate intake through the FFQ compared with the DH. This means that there was a systematic underestimation bias on the reported slices of bread (Q9a) in the FFQ. The mean absolute difference in the bread intake between the methods was 2.93 (CI = 2.74 to 3.12) slices. The lower LOA was −2.65 (CI = −2.98 to −2.32) and the upper LOA was 8.51 (CI = 8.18 to 8.84) (Figure 3).

Discussion
The present validation study has investigated the relative validity of a semi-quantitative 15-item FFQ aimed at screening the risk of poor dietary patterns. Data from the FFQ were compared with corresponding data from a DH, both used in a population-based sample of 70-year-olds. High agreement between the methods, with the exact and adjacent level of agreement over 80%, was found for a majority of the variables. The proportion attributed to the opposite frequency was fairly low for most of the frequency variables.
The agreement between the methods varied for different food groups. Good agreement between the methods, with high exact or adjacent agreement, was found for a majority of the frequency items, such as vegetables, fruit, fish, white meat, and dairy products. Good agreement was also found in most of the categorical items: breakfast, type of dairy products, type of sandwich spread, type of cooking fat, and salt use. Less good agreement was found for nuts, red meat, sweets, and type of bread. In addition, there was a systematic underestimation of the number of slices of bread, where participants on average underreported an intake of almost three slices.
Even though both methods rely on the participants' memory, the context in which dietary intake is reported differs considerably between the FFQ and the DH, and the respondents may think differently when they report their intake by the two methods. The period for reporting the actual intake also differs somewhat between the methods. The FFQ was aimed at a habitual dietary pattern without any specified time frame, while the DH covered the habitual dietary pattern within the last three months. There is a risk in dietary validations that the first method used could influence the participants' answers in the second one, since reporting dietary habits might make people aware of and reflect on their dietary intake, which, in turn, might affect the response to the second method. In the present case, the participants' responses to the two dietary methods were from about 2 weeks to some months apart. Therefore, we judge the risk that the FFQ answers affected the responses in the DH to be minor. Furthermore, the FFQ is a short self-administered questionnaire, in contrast to the DH, which is based on an extensive interview by a dietitian, in the respondent's own home or at an outpatient clinic.
The DH interviewer needs, of course, to be as neutral and objective as possible when asking questions so as not to influence the answers. However, it might be easier to remember and reflect on intake when a professional interviewer puts the questions in a structured way for about one hour and encourages the respondent not to forget anything consumed. If the participant was unsure, e.g., regarding the type of bread consumed, the interviewer could ask to see the packaging. The discrepancy between the methods was especially large for red meat. This may be because the respondents had difficulty categorizing dishes correctly in the FFQ. In the interview, the dietitian was able to ask follow-up questions to find out what type of meat the respondent ate.
Regarding sweets, there was a considerable difference in the reported intake between the methods, which may be explained by the fact that, in the DH, the interviewer helped the participant to remember by asking control questions. Furthermore, the question regarding sweets in the FFQ was broad, including many different food items. Regarding dairy products, the difference between the methods might be explained by the fact that some people use milk products with different fat contents and reported the most common alternative differently in the FFQ and DH. With regard to nuts, the difference between the methods could be explained by the fact that nuts were not eaten routinely in this cohort of older adults.
There is no consensus in the literature regarding which statistical method is the most suitable for assessing the validity of dietary tools. Analysis by Bland-Altman plots and Kappa analysis are both frequently reported [10,24,25]. Using more than one approach demonstrates the robustness of the validation process [10]. The classification capacities of the tools can be analyzed by comparing, through Kappa analyses and contingency tables, the concordance or agreement within the distribution by tertiles, quartiles, or quintiles. The results can then be presented as an exact agreement (classified in the same category by both methods), plus or minus one category, and gross misclassification [10]. The main advantage is that, with cross-classification, the percentages misclassified clearly illustrate the likely impact of measurement error. It has been established that 50% of the subjects correctly classified and <10% of subjects grossly misclassified into thirds, and weighted kappa values above 0.4 are desirable [25]. However, this seems to be difficult to achieve in the validation studies of the FFQ [24].

Future Adaption and Improvements of the FFQ
A systematic review of validation studies of semi-quantitative FFQs concludes that the validity results are not always favorable for all the nutrients or food groups evaluated, and improvements of the tool are recommended [24]. The FFQ method has several limitations; taken together, this means that it is only able to give an indication of the quality of the dietary pattern in relation to the recommendations [6]. However, its simplicity is also its greatest strength as a quick and easy screening tool for finding individuals at risk of poor dietary patterns, e.g., within primary care and preventive medicine. Within health care, screening for the risk of malnutrition is well established [26]. The common methods for malnutrition screening are the SGA (Subjective Global Assessment) [27] and MNA (Mini Nutritional Assessment) [28,29]. However, screening for a poor dietary pattern as a risk factor for NCD is not well established, at least not in Sweden. Primary care and preventive medicine could benefit from a quick screening as a basis for a conversation with the patient about further steps for diet change, for example, dietetic counselling.
In this validation study, we have obtained valuable experience on how to improve the FFQ. First, the FFQ items measuring frequencies were condensed from four to three levels. This change was made before any analysis because the merged levels did not add any further information as separate frequencies. This merger also enabled better and more robust analysis through kappa statistics. We believe that three levels of frequencies are sufficient to determine the dietary pattern and recommend that the levels are merged in further use. Second, there was a systematic underestimation of the reported slices of bread (Q9a) that should be taken into consideration when interpreting an individual's dietary pattern. We also observed some missing information on Q9b, type of bread, which might be explained by the duality of the question. Some respondents might have misunderstood the question because there is no option "does not eat bread" in the second step of the question, which focuses on the type of bread. We suggest dividing this question into two separate questions, one for the number of slices and one for the type of bread, to prevent misunderstandings. Finally, regarding Q12 (spread) and Q13 (cooking fat), we suggest simplifying them by offering fewer alternatives. With these alterations, we think the conditions for this FFQ to serve as a simple screening tool will be improved.

Strengths and Limitations
A strength of the present study is the large population-based sample, homogeneous in terms of city of residence and age, which is an advantage when the aim is to compare the outcome from two different methods. Furthermore, the diet history is a well-established validated method, here considered as the reference method. However, there are some limitations related to the comparison between the methods. The data in the DH were not originally intended to be summarized in accordance with the frequencies in the FFQ. However, since the FFQ should be regarded as a screening instrument, not a detailed dietary examination, the calculated frequencies are rough estimations of the individual intake of food groups. Being found "at risk" is not equal to having poor dietary habits. Therefore, individuals found at risk within primary care and preventive medicine might be referred to a dietitian for a more in-depth assessment and possibly consultation.
Information on the intake of polyunsaturated fat from cooking is missing, and the breakfast variable was derived by setting a time interval for the first meal during the day, which can be questioned as a method for classifying breakfast. A strength is that we have used different statistical methods, both Cohen's kappa and Bland-Altman, aiming to present the robustness of the validation process [10]. However, the analysis through Cohen's kappa has its limitations. Although the level of the exact and adjacent agreement between the methods was high for most items, some of the calculated kappa values indicated only a fair or moderate level of agreement. Cohen's kappa is sensitive to uneven answer distributions and is, therefore, mostly used on quintile data. However, it was not possible to recalculate the frequencies into quintiles in the present study. In addition, the greater the expected chance agreement, the lower the resulting value of the kappa. Therefore, dichotomous variables tend to be assigned lower estimated kappa values because the probability of ending up in the same category by chance is high (50% when answers are evenly distributed). This considered, we think that the agreement between the methods is higher than indicated by the fair or moderate level of agreement (see Figure 1).
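The dependence of kappa on the answer distribution can be made concrete with a small numerical sketch. The two 2 x 2 tables below are invented (they are not study data) and have the same observed agreement of 91%, but very different chance-expected agreement; the function name is hypothetical.

```python
def kappa_from_table(table):
    """Observed agreement and unweighted Cohen's kappa from a square count table."""
    n = sum(sum(r) for r in table)
    k = len(table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(row[i] * col[i] for i in range(k))  # agreement expected by chance
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Skewed answers: almost everyone falls in the first category in both methods.
skewed = [[90, 5], [4, 1]]
# Balanced answers: the two categories are about equally common.
balanced = [[45, 5], [4, 46]]

p_skewed, kappa_skewed = kappa_from_table(skewed)
p_balanced, kappa_balanced = kappa_from_table(balanced)
print(p_skewed, round(kappa_skewed, 2))      # 0.91 0.13
print(p_balanced, round(kappa_balanced, 2))  # 0.91 0.82
```

In the skewed table the chance-expected agreement is about 0.90, so an observed agreement of 0.91 leaves little room above chance and kappa falls in the "poor" band, whereas the same observed agreement with balanced answers gives a kappa in the "excellent" band.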

Conclusions
We have validated a short 15-item FFQ against a DH in a population-based sample of 70-year-olds. The validation study has some limitations related to the FFQ and some related to the DH. There was good agreement between the methods, with the exact and adjacent level of agreement over 80% for a majority of the variables. The proportion attributed to the opposite frequency was fairly low for all the frequency variables, except nuts, sweets, and dairy products. Based on the present results, we think that the FFQ, with some improvements, could serve as a simple screening tool for poor dietary patterns.