The Reproducibility and Relative Validity of a Food Frequency Questionnaire for Identifying Iron-Related Dietary Patterns in Pregnant Women

Analyzing pregnant women’s iron intake using dietary patterns would provide information that considers dietary relationships with other nutrients and their sources. The objective of this study was to evaluate the reproducibility and relative validity of a Qualitative Food Frequency Questionnaire to identify iron-related dietary patterns (FeP-FFQ) among Mexican pregnant women. A convenience sample of pregnant women (n = 110) completed two FeP-FFQ (FeP-FFQ1 and FeP-FFQ2) and a 3-day diet record (3DDR). Foods appearing in the 3DDR were classified into the same food groupings as the FeP-FFQ, and most consumed foods were identified. Exploratory factor analysis was used to determine dietary patterns. Scores were compared (FeP-FFQ for reproducibility and FeP-FFQ1 vs. 3DDR for validity) through intraclass correlation coefficients (ICC), cross-classification, Bland–Altman analysis, and weighed Cohen kappa (κw), using dietary patterns scores tertiles. Two dietary patterns were identified: “healthy” and “processed foods and dairy”. ICCs (p < 0.01) for “healthy” pattern and “processed foods and dairy” pattern were 0.76 for and 0.71 for reproducibility, and 0.36 and 0.37 for validity, respectively. Cross-classification and Bland–Altman analysis showed good agreement for reproducibility and validity; κw values showed moderate agreement for reproducibility and low agreement for validity. In conclusion, the FeP-FFQ showed good indicators of reproducibility and validity to identify dietary patterns related to iron intake among pregnant women.


Introduction
Anemia is a health problem that affects over 1.5 billion people worldwide, and its prevalence is higher among >5-year-old children and pregnant women [1]. Thus, it is considered one of the leading global public health challenges. In Mexico, the health of women of reproductive age is affected by conditions such as high blood pressure (6.8% in women aged 20-39 years), hypercholesterolemia (18.6% in women aged 20-39 years), overweight (26.9% in adolescents; 36.3% in women aged 20-49 years) and obesity (14.1% in

Study Participants
The research was conducted in Guadalajara, Mexico, from March to September 2018, including pregnant women between 20 and 40 years of age, in either their second or third trimester of pregnancy and attending outpatient service at the Maternal and Child Hospital "Esperanza López Mateos", a public hospital providing attention to pregnant adolescents and low-income women without social security. Exclusion criteria were women with multiple pregnancies, chronic disease diagnostic, or pregnancy complications (specifically, preeclampsia and gestational diabetes; gastrointestinal disorders were nor considered as exclusion criteria).
A convenience sampling with consecutive cases was used. The recommended sample size for the validity of semiquantitative food frequency questionnaires is 110 participants, approximately, considering a power of 80% and α = 0.5 [19]. Based on this, 140 participants were included at the beginning of the study, considering possible follow-up losses.

Study Design
The participants completed the qualitative Food Frequency Questionnaire through an interview to identify iron-related dietary patterns (FeP-FFQ) at baseline (FeP-FFQ1), which could be at either their 2nd or 3rd gestational trimester. During this first interview, sociodemographic and anthropometric data were collected, and participants were trained on how to complete a three-day diet record (3DDR), with two weekdays and one weekend day. The training was provided during the first interview and consisted of explaining how to fill out the diaries with five steps: the time and place of the meal, name of the dish, dish ingredients, type of ingredient (e.g., whole milk, light milk), amount consumed. These steps were also written on the diet record form along with an entry example. Participants were also instructed on standard cooking measures (tablespoon, teaspoon, cups). At the end of the training, participants were asked to write down one entry example to verify the instructions were clear and clarify questions. Participants were also instructed to take pictures of their reported meals and drinks whenever possible. They were asked to submit their 3DDR at their next interview and were contacted by phone in advance to verify they had no questions. One month later, the participants submitted their 3DDR and it was verified that they had been completed correctly; at this point in time they also completed the second FeP-FFQ (FeP-FFQ2).

FeP-FFQ Development and Application
The FeP-FFQ is an adaptation of Beck et al.'s [18] qualitative Fe-FFQ. Foods rich in iron, calcium, vitamin C, vitamin A, and fiber were added, considering those of frequent consumption in Mexico. Following Zhou's [20] suggestion, foods with at least 3% of the daily iron recommended intake for Mexican pregnant women were included (SAME [21]). The qualitative FeP-FFQ includes a total of 76 items divided into ten categories: vegetables; fruits; grains and starchy foods; legumes; meat, poultry, cured meats and eggs; fish and seafood; dairy; snacks, sweets and desserts; alcoholic beverages; and other beverages. Participants were asked how often they ate each food during pregnancy, with nine answer options: never or rarely, 1-3 times/month, once a week, 2-4 times/week, 5-6 times/week, once a day, 2-3 times/day, 4-6 times/day, and ≥7 times/day.

Measurements
Procedures for data collection are presented below.

Sociodemographic, Anthropometric Data and Iron Status
Sociodemographic (e.g., age, sex, civil status, education, socioeconomic status) and anthropometric (e.g., pregestational weight, current weight, height) data were collected at the moment participants were recruited, when they could be at either the 2nd or 3rd trimester of pregnancy. Data were collected through a questionnaire designed for this study and applied by a trained nutritionist. Socioeconomic status was evaluated and classified using the AMAI (Mexican Association of Market Intelligence and Opinion Agencies) [22] scores. Weight was measured using a Tanita scale, model TBF-300A with a 0.1 kg precision, and height was measured with a Seca portable stadiometer, model 213. Age was calculated from the participants' birthdates. Current and pregestational body mass indexes (BMI) were calculated by dividing weight in kilograms by height in meters squared. Current BMI was classified using the Atalah curves for the Latin-American population [23]. Hemoglobin status data were obtained from the medical records at the 3 rd trimester of pregnancy.

Diet Records
During the first interview, women received the 3DDR form, were instructed on how to fill it correctly, and were asked to submit it four weeks later, during their second appointment. Each participant filled two diet records of weekdays and one of a weekend day, specifying mealtime, place, and time of consumption. They were asked to write down the name of the dish and each ingredient and the type of food (e.g., whole milk, skim milk, lactose-free milk); and when possible, to take photos of all meals and drinks consumed during the three reported days. During the second appointment, 3DDRs were collected and screened for missing information. Subsequently, all foods in the records were counted, added up, and categorized manually according to the 76 items in the FeP-FFQ. For analysis purposes, when participants reported consuming two different food items of the same group at the same mealtime (e.g., strawberries and orange), it was registered as a two-time consumption of the food group, except for water, which was registered as one intake for every 250 mL consumed during a day. For dishes where the participants provided recipes, ingredients were assigned individually to the corresponding FeP-FFQ item. Some of the foods reported in the 3DDR did not match any of the FeP-FFQ items, which were excluded from analyses.

Statistical Analysis
Descriptive analytics were performed for quantitative variables and are presented as means and standard deviations (SD), while qualitative variables are presented as frequencies and percentages.
To evaluate the reproducibility and relative validity of each FeP-FFQ item, the intake frequency was adjusted for each participant and each food consumed by using the midpoint as the level of consumption. Then values were converted to weekly intake frequency (e.g., 1-3 times/month = 2 times/month = 0.5 times/week). These weekly values were later converted to a 3-day consumption frequency to align with the 3DDR (e.g., 0.5 times/week divided by seven and multiplied by 3 = 0.21 times in 3 days). Additionally, all food items reported in the 3DDRs were categorized into the 76 items of the FeP-FFQ.

Identification of the Most Frequently Consumed Food Items
To identify dietary patterns, first, the most frequently consumed food items were identified by obtaining the means and standard deviations of the consumption frequency of each food item from the 3DDR, FeP-FFQ1, and FeP-FFQ2. Food items not identified as frequent consumption in the 3DDR and both FeP-FFQ were excluded from the factor analyses. Thirty-two out of the 76 items emerged as of higher frequency of consumption.

Identification of Dietary Patterns from the FeP-FFQs and the Diet Records
Spearman correlation was performed to measure the reproducibility of the most frequently consumed food items from the questionnaire (FeP-FFQ1 and FeP-FFQ2), while for relative validity, Pearson's linear correlation was used for the consumption frequency of each item in the FeP-FFQ1 and 3DDR. Paired t-tests were used to compare food items' intake frequency mean differences: FeP-FFQ1 vs. FeP-FFQ2 for reproducibility; and FeP-FFQ1 vs. 3DDR for relative validity, with a significant p-value of <0.05. The effect size was obtained for the significant differences between the questionnaires using the formula: effect size r = √ t2/(t2 + df ) (where t = t statistic produced by paired t-test and df = degrees of freedom). An effect size of 0.1 indicates a small effect, 0.3 a medium effect, and ≥0.5 a large effect [22].
Subsequently, a factor analysis with varimax rotation was used for FeP-FFQ1, FeP-FFQ2, and 3DDR to identify dietary patterns. The Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett's test p values were used to determine the presence of relationships between variables in the factor analysis, with values of >0.5 and <0.05, respectively.
The relative validity and reproducibility of dietary patterns derived from FeP-FFQ1 were examined by calculating Pearson linear correlation coefficients, ICC, and crossclassification between diet pattern scores obtained from FeP-FFQ1 and the 3DDR (relative validity), and FeP-FFQ1 and FeP-FFQ2 (reproducibility).
Finally, diet pattern scores were divided into tertiles and reproducibility (FeP-FFQ1 vs. FeP-FFQ2) and validity (FeP-FFQ vs. 3DDR) analyses were complemented with the Bland-Altman method and weighted Kappa Cohen coefficient to assess the level of agreement between methods based on Landis and Koch [24] classification, where κw values between 0.21 and 0.40 indicate fair agreement, between 0.41 and 0.60 moderate agreement, between 0.61 and 0.80 good agreement, and >0.81 very good agreement.
All statistical analyses were performed using IBM-SPSS software version 21.

Participant Characteristics
A total of 140 women were initially invited to participate and completed the first FeP-FFQ; 30 were excluded because they did not complete all the required questionnaires; therefore, 110 participants completed the study. Participant characteristics are detailed in Table 1. Mean age was 26.89 years (SD 5.58). The mean monthly family income was 409.82 USD (SD 315.52), and from this, 199.42 USD (SD 77.84) is used as a monthly food stipend. Most of the participants live in free union (54.5%), have an education level of secondary school (46.4%), are housewives (80.9%), have a family composed of a mother, father, and children (73.6%), and are low-middle class (31.8%). The mean pregestational BMI was 26.0 kg/m 2 (SD 5.7), and the majority of participants were classified as "normal weight" (43.6%). In contrast, the mean current BMI was 28.8 kg/m 2 (SD 5.6) and "overweight" was the most common classification among participants (35.5%).

Food Intake Frequency
From the three questionnaires (FeP-FFQ1, FeP-FFQ2 y 3DDR), 32 food items with the most frequent consumption were obtained ( Table 2). Mean consumption fluctuated between 15.54 (for water) to 0.90 (for sausages and ham) times/3 days. Slight differences in frequency of consumption were found for the 32 items between FeP-FFQ2 and 3DDR compared to FeP-FFQ1. On the other hand, the mean intake frequency reported was higher in both FeP-FFQs than in the 3DDR for the 32 food items except water and beef. For most items, consumption frequency was significantly different between FeP-FFQ1 and 3DDR. This contrasts with the comparison between both FeP-FFQs, where significant differences were only found for 10 out of 32 items. Through paired t-tests, it was found that the consumption frequency was overestimated for FeP-FFQ1 in 19 out of 32 items (59.4% of the foods) compared to the 3DDR (p < 0.01): corn tortilla, tomato, citrus fruits, lime, stone fruits, homemade beans, banana and fried plantain, apple, lettuce, carrot, melon, watermelon, sour cream, rice, yogurt (drinking yogurt included), pasta soup, papaya, ice-cream/sorbet/popsicles, and beef.

Reproducibility and Validity of Dietary Patterns
Pearson correlation coefficients for the reproducibility of the dietary pattern scores from the factorial analysis between FeP-FFQ1 and FeP-FFQ2 suggested good reproducibility: 0.60 for the "healthy" pattern and 0.55 for the "industrialized food and dairy" pattern (p < 0.01); while the ICC were 0.76 and 0.71 (<0.01), respectively. Validity correlations of the dietary pattern scores between FeP-FFQ1 and 3DDR were: 0.11 for the "healthy" pattern (p > 0.05) and 0.20 for the "industrialized food and dairy" pattern (p < 0.05). ICC for the validity of the dietary pattern scores were 0.36 for the "healthy" pattern and 0.37 for the "in-dustrialized food and dairy" pattern (p < 0.01). Cross-classification of the reproducibility of dietary pattern scores found 58.2% of participants classified in the same third for "healthy" and 51.9% for "industrialized food and dairy" pattern. A low percentage was classified in the opposite third for the "healthy" pattern (5.5%) and 11.9% for the "industrialized food and dairy" pattern.
Moreover, a fair and significant agreement for the validity of both patterns was found (κw = 0.12 for "healthy" and κw = 0.14 for "industrialized food and dairy"); and a moderate and significant agreement for reproducibility of the "healthy" pattern (κw = 0.47) and "industrialized food and dairy" pattern (κw = 0.32).
Bland-Altman plots compare reproducibility between FeP-FFQ1 and FeP-FFQ2; and validity between FeP-FFQ1 and 3DDR of dietary patterns scores (Figures 1 and 2, respectively). The x-axis shows dietary pattern mean score, and the y-axis, the difference between scores, FeP-FFQ1 vs. FeP-FFQ2 for reproducibility and FeP-FFQ1 vs. 3DDR for validity.  It is noteworthy that the means for reproducibility and validity of both patterns was equal to zero. This allows identifying the agreement magnitude visually between both methods, in this case, of FeP-FFQ and 3DDR. Dotted lines show the agreement limits between both methods, and they are situated at ±1.96 SD from the mean agreement. The plot shows that most dietary pattern scores fall within the agreement limits.

Discussion
This study shows the relative reproducibility and validity of a qualitative food frequency questionnaire to identify iron-related dietary patterns (FeP-FFQ) among Mexican pregnant women. This is relevant considering the high prevalence of anemia among Mexican pregnant women (34.9%) [2], which is similar to the global prevalence (36.5%) [1] and because this prevalence doubled from 2012 to 2018, going from 17.9% to 34.9% [2]. Moreover, the anemia prevalence among Mexican pregnant women is significantly higher than in non-pregnant women, highlighting the importance of having a tool that helps to evaluate the iron-related dietary patterns among this population.

Reproducibility and Validity of FeP-FFQ
The FeP-FFQ showed sufficient reproducibility and validity indicators in the food items to identify iron-related dietary patterns in pregnant women. Moreover, this affirmation is supported by the obtained reproducibility coefficients (r > 0.3) for the items most frequently consumed. Other studies have reported correlation coefficients similar to those in this study [15,18,26,27]. Regarding the time interval in the administration of the questionnaires, it should be noted that in the present study, an interval of one month was used between the administration of FeP-FFQ1 and FeP-FFQ2, obtaining reproducibility coefficients greater than those expected. In contrast, the studies by Rodríguez [27] y Glabska [15] used wider intervals (one year and six weeks, respectively) to evaluate questionnaire reproducibility. The obtained coefficients were similar to those in this study (r > 0.3). On the other hand, the variability of the obtained coefficients could be explained by diverse factors: (a) the statistical analyses used: in this study, only the Spearman method with qualitative variables of frequency intake was used, while Baer [28] used the Pearson method quantitatively and adjusted to the total energy intake. (b) Participants characteristics: the present study included women between 20 and 40 years of age, although studies in other countries have included adolescent pregnant women; thus, the differences observed between these populations could be explained by the differences in the diet of adolescents [26,29].
The FeP-FFQ obtained validity correlation coefficients of >0.3 for the most frequently consumed items, similar to the values reported by Robinson [30], Galante [31], and Brunst [32]. However, other studies have reported slightly higher validity coefficients (r > 0.6) [29,33,34]. These differences could be attributed to diverse factors: the design of the questionnaire, including the number of items, seasonality, and time-frame can contribute to the coefficients' variability between studies; the variability in the methodology used, for example, the reference standard used as a method for validation may also contribute [35][36][37]. The reference standards most commonly used are diet records (DR), 24h-R, and the biochemical markers [17,27,35,36,38]. Generally, 24h-R and DR cannot represent the habitual consumption of the period of interest since they often contain errors and underestimate the intake of nutrients by almost 20% [36]. Therefore, correlation coefficients between both methods for most foods and nutrients are between 0.4 and 0.7 [36]. Nevertheless, DRs are deemed the best comparison method available [19]. The present study used one DR as the reference standard, a method used by different studies that obtained similar validity coefficients [15,18,26,29,30,[39][40][41][42].

Reproducibility and Validity of Dietary Patterns
Dietary patterns offer information about the characteristic combinations of the foods consumed by individuals or groups. The methodology to define them is relatively new and has become increasingly popular in nutrition epidemiology to evaluate a diet as an alternative to nutrient-based analyses [43][44][45][46][47][48]. The factorial analysis uses reported information from different instruments to identify common underlying dimensions (factors or patterns), where food intake or several foods are integrated through scores [43,46,47]. Subsequently, these are labeled, but this method can be subjective; sometimes it tends to confuse the population because it is done based on the foods patterns [46].
In the present study, the first dietary pattern was labeled "healthy", similar to the patterns reported by some other studies [18,45,47,49]. "Industrialized food and dairy" was the label for the second pattern, which was not found in similar studies screened by the authors [18,[43][44][45][46][47][48][49]. Few studies have examined the validity and reproducibility of dietary patterns obtained through factorial analysis [48].
In this study, Pearson correlation coefficients for reproducibility of the dietary pattern scores were 0.60 for the "healthy" pattern and 0.55 for the "industrialized food and dairy" pattern. These results are very similar to the ones reported in another three studies analyzing reproducibility of dietary pattern scores [43,45,48]; however, these studies did not analyze iron-related dietary patterns.
Validity correlation coefficients obtained in this study were "good", with 0.11 for the "healthy" dietary pattern and 0.20 for the "industrialized food and dairy" pattern. Other studies have reported higher dietary pattern scores correlation values for validity [18,[43][44][45][46][47][48]. For example, Beck et al. [18] reported reproducibility and validity coefficients higher than those in the present study. These discrepancies could be attributed to the population characteristics. Our study included pregnant women from 20 to 40 years of age, mainly with secondary school studies and a medium-low socioeconomic status, while Beck [18] included women between 18 and 44 years of age, not pregnant, with high education studies and high socioeconomic status. In this regard, Teixeira y col. [50] reported that age, education level, and socioeconomic status were directly associated with Brazilian pregnant women's dietary patterns; furthermore, they suggested there is a need for additional resources to allow vulnerable mothers to have an adequate diet during pregnancy.
The present research is the first study of its type that complemented its results with ICC for the reproducibility and validity of both patterns. However, the values for validity ("healthy" = 0.36 and "industrialized food and dairy" = 0.37) were not as good as the ones for reproducibility ("healthy" = 0.76 and "industrialized food and dairy" = 0.71). Because correlation coefficients can underestimate the level of agreement with actual intake between the reference method and the new method [35,51], this study additionally used the Bland-Altman analysis to evaluate the level of agreement between methods. The mean difference of the scores for both dietary patterns was equal to zero, similar to the result reported by Beck et al. [18] for their validated dietary patterns.
The "healthy" and "industrialized food and dairy" dietary patterns scores for FeP-FFQs and 3DDR were classified into tertiles. Both patterns were classified for the same third: 58.2% and 51.9%, and opposite third: 5.5% and 11.9%, respectively. Similar results were reported by Beck [18], finding that "healthy" and "sandwich and drinks" patterns classified in the same third: 52.17% and 54.78%; and opposite third: 52.17% and 54.78%, respectively.
In our study, weighted Cohen's kappa coefficient was used to quantify the relative importance between agreements and different levels of disagreement [52]. The agreement for relative validity was fair and significant for both dietary patterns, in contrast to the moderate agreement for validity found by Beck [15]. On the other hand, both dietary patterns showed a moderate reproducibility, similar to the one reported by Beck [18]. It is noteworthy that the use of weighted Cohen's kappa is controversial, given that its magnitude depends on the number of categories used and the weights applied, since the maximum weight is given to perfect agreement and proportionally lower weights according to the level of agreement [51,53].

Limitations and Strengths
This study presents some limitations, which are listed next. The use of 3DDR as a reference standard for validity could have been biased if the participants made mistakes in their reported intake; regardless, participants were trained to minimize these potential mistakes. Additionally, it has been reported that people tend to change their habitual intake when they know they are under evaluation [19], which may cause the wrong estimation of the actual intake. Nonetheless, it was decided to use this method because it minimizes potential memory biases of the 24h-R and reduces the time of administration, though they must be screened in detail through interviews. A card with the nine consumption frequencies suggested by Walter Willet [19] was used to minimize errors and reduce time during the FeP-FFQ application. This card was provided to every participant, and they indicated the frequency of intake of each food item.
Another limitation could be that all of the 32 food items most frequently consumed were included in the factorial analysis, which is different from the one used in other studies, which tend to group all consumed foods into categories [45,47,49,54,55]. However, it has been described that consuming 15 to 20 food items per week is possible for most people, even though optimal health can be achieved when a variety of 30 or more food items are consumed per week [56]. Including 32 food items in the analysis allows identifying poor dietary patterns through to the most healthy patterns. In the present study, biochemical markers were not included due to lack of feasibility to obtain serum ferritin and C-reactive protein as primary indicators for iron deficiency. A further study is planned to investigate if the identified dietary patterns are related to iron deficiency biochemical markers.
Lastly, in this study specific nutrients known to affect the absorption of iron, such as phytic acid/phytates, polyphenols/tannins, proteins from soya beans, and calcium [57], were not considered for analysis.
The strengths of this study are as follows: the FeP-FFQ included food items that have been identified as facilitators or inhibitors of iron absorption and the sources of this nutrient, which provides a complete picture of its consumption. In addition, to the best of our knowledge, the FeP-FFQ is the first tool designed specifically to determine iron-related dietary patterns, also reporting the reproducibility and validity of these dietary patterns for its use among pregnant women. Although a previous similar study was identified 15 , this was conducted among women aged 18 to 44 years from New Zealand, and it did not focus on identifying iron-related dietary patterns for pregnant women.

Conclusions
The FeP-FFQ showed good indicators of reproducibility and validity; therefore, this instrument could contribute to identifying iron-related dietary patterns among pregnant women since it provides an efficient way to evaluate iron intake qualitatively and the influence of other diet components on iron absorption (facilitators and inhibitors). In addition, this instrument could be helpful to determine how the combination of foods and drinks might affect iron status and thus, be able to provide practical clinical and public health information on the prevention of iron deficiency in pregnant Mexican women.
Because the application of the FeP-FFQ is easy, it will allow further studies to analyze the relationships between dietary patterns, iron status, and its risk factors among pregnant women and the baby. Likewise, because this tool is quick to be applied, simple, and relatively inexpensive, it might be helpful in epidemiological studies to determine the iron intake of pregnant women qualitatively, contributing to the early detection of risk factors related to iron deficiency and anemia prevention among Mexican pregnant women, and in consequence, reduction in health expenses.
Moreover, the implementation of this tool could also be helpful to develop and/or evaluate programs and public policies aiming at the optimal iron status of populations and, therefore, healthier pregnancies and babies. Further studies might explore the associations between dietary patterns and iron-deficiency anemia, through biomarkers, for a complete picture of the validity of this tool.