Exploring Risk Factors of Recall-Associated Foodborne Disease Outbreaks in the United States, 2009–2019

Earlier identification and removal of contaminated food products is crucial in reducing economic burdens of foodborne outbreaks. Recalls are a safety measure that is deployed to prevent foodborne illnesses. However, few studies have examined temporal trends in recalls or compared risk factors between non-recall and recall outbreaks in the United States, due to disparate and often incomplete surveillance records in publicly reported data. We demonstrated the usability of the electronic Foodborne Outbreak Reporting System (eFORS) and National Outbreak Reporting System (NORS) for describing temporal trends and outbreak risk factors of food recalls in 1998–2019. We examined monthly trends between surveillance systems by using segmented time-series analyses. We compared the risk factors (e.g., multistate outbreak, contamination supply chain stage, pathogen etiology, and food products) of recalls and non-recalls by using logistic regression models. Out of 22,792 outbreaks, 305 (1.3%) resulted in recalls and 9378 (41%) had missing recall information. However, outbreaks with missing recall information decreased at an accelerating rate of ~25%/month in 2004–2009 and at a decelerating rate of ~13%/month after the transition from eFORS to NORS in 2009–2019. Irrespective of the contaminant etiology, multistate outbreaks according to the residence of ill persons had odds 11.00–13.50 times (7.00, 21.60) that of single-state outbreaks of resulting in a recall (p < 0.001) when controlling for all risk factors. Electronic reporting has improved the availability of food recall data, yet retrospective investigations of historical records are needed. The investigation of recalls enhances public health professionals’ understanding of their annual financial burden and improves outbreak prediction analytics to reduce the likelihood and severity of recalls.


Introduction
Every year, ~48 million Americans get sick, 128,000 are hospitalized, and 3000 die from foodborne diseases [1]. From 2013 to 2018, the US Department of Agriculture's (USDA) Economic Research Service estimated that the value of preventing foodborne illnesses, a measure of demand for reduction in mortality risk, increased by 12%, from $12.8 to $14.4 billion (USD), respectively [1]. These estimates account for inpatient and outpatient hospital costs and costs of prescription drugs and medical supplies used to treat infected persons [1].
One safety measure deployed to prevent foodborne illnesses is the food recall, in which a manufacturer or distributor voluntarily removes food products from commerce due to their expected risk to human health [2]. In 2011, the Food Marketing Institute and Grocery Manufacturers Association reported that food recalls cost ~$10 M/recall in direct costs to food companies [3]. The study further noted that ~23% of annual recalls exceed ~$30 M/recall in direct costs, which accounts for product retrieval, storage, destruction, and regulatory
In fact, the most recent 2020/2021 GAO report emphasized that gaps in the USDA/FDA foodborne illness and recall surveillance stress the need for improved surveillance capacity on these topics by the Centers for Disease Control and Prevention (CDC) [30]. Though not responsible for investigating recalls, the CDC has conducted thorough event-based surveillance of waterborne and foodborne outbreaks since 1971 and 1973, respectively [31]. In 1998, CDC surveillance transitioned from the paper-based Foodborne Outbreak Reporting System (pFORS) to electronic reporting (eFORS) [32]. In contrast to the FSIS and FDA, the CDC's records include data on where, when, how many persons, what food sources, and which pathogens are associated with outbreaks [32]. In November 2004, the CDC began identifying whether outbreaks resulted in recalls and reporting recall-related traceback information [33]. In January 2009, the CDC integrated all outbreak surveillance data streams into the National Outbreak Reporting System (NORS), which monitors, tracks, and reports on 45 person-to-person, zoonotic, environmental, and unknown/indeterminate sources of outbreaks [32].
Despite the NORS's comprehensiveness, public health professionals have only recently begun exploring the usability of NORS for describing recall records [34]. These studies have largely explored the likelihood of increased morbidity and mortality of recalls compared to non-recalls [34]. To the best of our knowledge, no studies have utilized recall record data to examine temporal trends, particularly before and after November 2004 or January 2009. Additionally, no studies have compared risk factors between recalls and non-recalls, perhaps due to the limited completeness of surveillance records [35]. Thus, further investigation of recall records reported by the CDC is warranted.
In this study, we demonstrated the use of recall record data from the electronic Foodborne Outbreak Reporting System (eFORS) and National Outbreak Reporting System (NORS) for investigating temporal trends and risk factors of food recalls in 1998–2019. First, we described monthly trends and seasonality of recalls, non-recalls, and outbreaks with missing recall information, using segmented negative binomial regression models with respect to three delineated periods: before November 2004 (surveillance reporting under eFORS), November 2004 to December 2008 (foodborne outbreak reporting revised to include whether any product was recalled following an outbreak), and January 2009 onward (surveillance reporting under NORS). Next, we examined risk factors (e.g., multistate exposure outbreak, contamination supply chain stage, pathogen etiology, and food products) associated with recalls, using logistic regression models. Our findings highlight improvements in CDC surveillance reporting over time; however, they also reveal extensive incomplete records on food recalls.

Data Source
On 4 March 2021, we requested CDC surveillance records for foodborne and waterborne outbreaks from 1 January 1998 to 31 December 2019. The NORS Foodborne and Animal Contact Team investigated the accuracy and quality of data prior to distribution. Due to the volume of surveillance records requested, data were separated into 65 data tables, each corresponding to characteristics of the time, location, pathogen, preparation/consumption location, food ingredients, and food inspection methods of an outbreak. NORS used unique outbreak identifiers to harmonize records across tables and provided a comprehensive dictionary to describe variables and their units of measurement per table [36].
Reported recall record data included the type of food product, a description of the recalled product, and the product's brand or lot number, a distinct combination of letters, numbers, or symbols that corresponds to the complete history of the manufacture, processing, packing, holding, and distribution of a product [38]. We examined the recall status according to three categories: outbreaks resulting in recalls (recalls), outbreaks not resulting in recalls (non-recalls), and outbreaks with missing recall information (missing).
Under NORS, recall-related traceback information was incorporated into the recall record. NORS classified 3 supply chain contamination points, namely before preparation (i.e., production, harvesting, packaging, and transporting), preparation (i.e., cooking, retail, consumption), and unknown. The before preparation category was further disaggregated into 3 subcategories: pre-harvest (e.g., traceback to producer farms and fields), preprocessing (e.g., traceback to leaking produce cleaning and storage facility), and unknown preparation.
NORS reports food products by using the Interagency Food Safety Analytics Collaboration (IFSAC) classification scheme. Developed in 2011 by the CDC, FSIS, and FDA, IFSAC is a 5-level food-categorization hierarchy that specifies 234 food categories [39]. Level 1 refers to the coarsest categorization of food groups (e.g., aquatic animals, land animals, plants, and other foods), while Level 5 disaggregates groups by processing, preparation, and consumption type (e.g., fermented, cured, salt-cured, etc.).
NORS reported 23 locations where ill persons prepared and consumed contaminated foods implicated in causing an outbreak, including restaurants (e.g., fast food, sit-down, and other), function halls (e.g., private home, banquet facility, and caterer), community gathering areas (e.g., daycare, school, prison/jail, religious location, camp, picnic, and fair), and other locations (e.g., grocery store, workplace, nursing home, assisted living facility, hospital, and home) [40]. These categories were not mutually exclusive; we created dichotomous variables for each location, as well as multi-location food preparation or consumption.

Investigating Temporal Trends
We conducted a segmented time-series regression analysis to investigate monthly count trends and seasonality in outbreaks and recalls over our 22-year study period. We divided our study period into 3 critical periods according to 2 critical points: November 2004, when the CDC began food recall reporting; and January 2009, when surveillance transitioned from eFORS to NORS (Table 1). To estimate the mean monthly counts of outbreaks, recalls, non-recalls, and outbreaks missing recall information for the entire study period, we applied a generalized linear model with a negative binomial distribution and logarithmic link function: where Y_i is the estimated mean of i-outcome (e.g., outbreaks, recalls, non-recalls, and missing); β_0 is the estimated mean for i-outcome for the study period of 264 months; t is the consecutive time, in months, ranging from 1 to 264 sequentially; and z is a binary indicator variable for i-critical period ranging from 1 to 3, where the outcome of interest occurred (z = 1). By exponentiating the model's intercept, we calculated the estimated mean, exp{β_0}, and its 95% confidence interval estimates, exp{β_0 ± 1.96·se}.
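As a minimal sketch of the back-transformation described above, the snippet below exponentiates a log-link intercept and its Wald bounds. The function name and the numeric inputs are hypothetical and are not estimates from this study.

```python
import math


def mean_with_ci(beta0, se, z=1.96):
    """Back-transform a log-link intercept to a mean count with Wald bounds."""
    mean = math.exp(beta0)
    lower = math.exp(beta0 - z * se)
    upper = math.exp(beta0 + z * se)
    return mean, lower, upper


# Hypothetical intercept and standard error (illustrative values only).
mean, lower, upper = mean_with_ci(beta0=4.0, se=0.05)
print(f"{mean:.1f} ({lower:.1f}, {upper:.1f}) outbreaks/month")
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the estimated mean on the count scale.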
The study period of 264 months divided into three critical periods was marked with knots, or critical points where a represents the start of critical period 2 at study month 82 and b represents the start of critical period 3 at study month 132. Using the selected periods, we developed a segmented negative binomial regression model to examine the temporal trends across the three critical periods for all outcomes: where Y_{t,i} represents the monthly counts of i-outcome (e.g., outbreaks, recalls, non-recalls, and missing) in t-month; t is the consecutive time in months, ranging from 1 to 264, sequentially; and a and b are the locations of the critical points at 82 and 132 months, respectively. Moreover, t, (t − a), and (t − b); and t^2, (t − a)^2, and (t − b)^2 are the linear and quadratic trends of continuous time-series variables in months, respectively. In addition, sin(2πωt), sin(2πω(t − a)), and sin(2πω(t − b)); and cos(2πωt), cos(2πω(t − a)), and cos(2πω(t − b)) are the sinusoidal and co-sinusoidal harmonic terms, respectively, with a frequency of ω = 1/M, where M = 12 represents the length of the annual cycle in months.
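The regressors named above can be assembled for a given study month as in the following sketch. The knot locations and ω come from the text; treating each segment term as zero before its knot (a hinge term), and switching the harmonic terms off before the knot, are assumptions of this illustration rather than details confirmed by the model specification. The function and key names are hypothetical.

```python
import math

A, B = 82, 132   # knot locations (study months) given in the text
OMEGA = 1 / 12   # one annual cycle per 12 months


def segment_time(t, knot):
    """Months elapsed since a knot; zero before the knot (an assumption)."""
    return max(0, t - knot)


def design_row(t):
    """Linear, quadratic, and harmonic regressors for study month t (1 to 264)."""
    row = {}
    for name, knot in (("p1", 0), ("p2", A), ("p3", B)):
        s = segment_time(t, knot)
        active = 1 if s > 0 else 0  # assumed: harmonics contribute only after the knot
        row[f"lin_{name}"] = s
        row[f"quad_{name}"] = s ** 2
        row[f"sin_{name}"] = active * math.sin(2 * math.pi * OMEGA * s)
        row[f"cos_{name}"] = active * math.cos(2 * math.pi * OMEGA * s)
    return row


# Example: study month 85 falls three months after the first knot.
print(design_row(85))
```

Rows built this way for t = 1, ..., 264 could then be passed to any negative binomial regression routine with a log link.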
We assessed the contribution of linear and quadratic trend terms in Model 4. The linear term indicated overall increases (β_1 t > 0, β_2 (t − a) > 0, β_3 (t − b) > 0) or decreases (β_1 t < 0, β_2 (t − a) < 0, β_3 (t − b) < 0), while the quadratic term indicated acceleration (β_1 t^2 > 0, β_2 (t − a)^2 > 0, β_3 (t − b)^2 > 0) or deceleration (β_1 t^2 < 0, β_2 (t − a)^2 < 0, β_3 (t − b)^2 < 0) within each critical period. We calculated the trend contribution by multiplying each coefficient by the trend-associated time unit to recover the corresponding predicted rates: where TC_{i,j,m,k} is the contribution of the i-outcome for j-trend (j = 1 for linear term, j = 2 for quadratic term) in the β_m coefficient, with m ranging from 1 to 6 for k-continuous time series variable (e.g., a and b; for summary of model coefficients and diagnostics, see Supplementary Table S1). The trend terms across all critical periods were summed to 1.00 per outcome regression model. We determined seasonality by the significance of either harmonic term in Model 5.
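Beyond testing significance, a pair of harmonic coefficients can be summarized by an amplitude and a peak time through the standard identity β_s sin(θ) + β_c cos(θ) = A cos(θ − φ), with A = sqrt(β_s^2 + β_c^2) and φ = atan2(β_s, β_c). The sketch below applies that identity; it is an illustration, not a step reported in this study, and the coefficient values are hypothetical.

```python
import math


def seasonal_peak(beta_sin, beta_cos, period=12):
    """Amplitude and peak time of beta_sin*sin(2*pi*t/period) + beta_cos*cos(2*pi*t/period)."""
    amplitude = math.hypot(beta_sin, beta_cos)
    phase = math.atan2(beta_sin, beta_cos)  # radians, in (-pi, pi]
    peak_t = (phase * period / (2 * math.pi)) % period
    return amplitude, peak_t


# Hypothetical coefficients on the log scale (illustrative values only).
amplitude, peak_t = seasonal_peak(beta_sin=0.20, beta_cos=-0.10)
print(f"amplitude={amplitude:.3f}, peak at t={peak_t:.2f} (mod 12)")
```

Mapping the peak time to a calendar month depends on how t is anchored; for terms in (t − a) or (t − b), the knot offset must be added back before reading off the month.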

Assessing Risk Factors Associated with Recalls
Based on the trend analyses, we found that very few non-recalls occurred as monthly counts of outbreaks missing recall information increased in Period 1. However, while monthly counts of non-recalls began to rise, outbreaks missing recall information decreased in Period 2. To better understand the risk factors associated with an outbreak resulting in a recall, we chose to conduct risk-factor analyses comparing recalls against non-recalls aggregated with outbreaks missing recall information (hereafter collectively referred to as non-recalls).
In these analyses, we considered the following risk factors: multistate exposure outbreaks, supply chain contamination stage, pathogen etiology, and IFSAC Level 1 category food products. We analyzed multistate exposure outbreaks by using a binary variable where single-state exposure outbreaks were the reference. We analyzed supply chain contamination stage by using a 3-level categorical variable (i.e., before preparation, preparation, or unknown) where before preparation was the reference. We analyzed IFSAC Level 1 category food products by using a 4-level categorical variable (i.e., land animals, aquatic animals, plants, or other) where land animals were the reference. We restricted our analyses to 5 etiologies, each evaluated independently (i.e., Salmonella, E. coli, Listeria, norovirus, and scombroid toxin), as they accounted for 46.8% of all outbreaks and 78.7% of all recalls. We analyzed pathogen etiology by using a binary variable indicating whether the specific pathogen was associated with the outbreak or not.
First, we continued to explore patterns of missingness among risk factors, using frequency tables. Second, we compared differences in frequencies of recalls and non-recalls. Third, we examined the likelihood of a recall with each factor, using univariate logistic regression models. Lastly, we performed multivariate models in a stepwise order, where parameters were specified in accordance with univariate findings: where R_i is a recall for i-outbreak (reference: non-recall and missing combined); S_i is a binary variable indicating multistate exposure of illness for i-outbreak; C_i is a categorical variable indicating the supply chain contamination stage of i-outbreak; F_i is the IFSAC Level 1 category for i-outbreak; and D_i is a binary variable indicating specific etiology associated with i-outbreak.
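One plausible way to write the fully adjusted model described above, using only the symbols defined in the text, is sketched here. The coefficient labels β_0 through β_4 and the single-coefficient shorthand for the categorical terms C_i and F_i are assumptions of this sketch; in practice, each categorical variable expands into one indicator per non-reference level, each with its own coefficient.

```latex
\operatorname{logit}\bigl[\Pr(R_i = 1)\bigr]
  = \ln\!\left[\frac{\Pr(R_i = 1)}{1 - \Pr(R_i = 1)}\right]
  = \beta_0 + \beta_1 S_i + \beta_2 C_i + \beta_3 F_i + \beta_4 D_i
```

Under this form, exp(β_1) would be read as the odds ratio of a recall for multistate versus single-state exposure outbreaks, holding the other factors fixed.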
In a sub-analysis, we examined the likelihood of identifying recalls (n = 305) during the before-preparation supply chain stage, using logistic regression models and the following risk factors: IFSAC Level 1 category, pathogen etiology, and preparation and consumption locations. We created a dichotomous variable for contamination stage (i.e., before preparation or preparation) by setting outbreaks of unknown preparation stage to missing and using preparation stage as the reference. We restricted our analyses to the 3 most common locations for preparation and consumption (i.e., home, diner, restaurant and other), which accounted for 38.7% and 54.1% of all outbreaks and all recalls, respectively. We analyzed preparation and consumption locations by using a binary variable indicating whether the specific location was associated with the outbreak resulting in a recall or not.
We explored associations between supply chain stage and outbreak etiology, food product, and location of preparation or consumption: where C_r is the r-recall identified in before-preparation supply chain stage; F_r is the IFSAC Level 1 category for r-recall; D_r is a binary variable indicating specific etiology associated with r-recall; and L_r is a binary variable indicating specific locations where persons prepared or consumed contaminated foods associated with r-recall.
We defined statistical significance at α = 0.05. We evaluated model goodness-of-fit for all models by using Akaike's Information Criterion (AIC). We performed data extraction, alignment, management, and cleaning by using Excel 2016 Version 16.59 and Stata SE/16.1 software. We conducted statistical analyses and created data visualizations by using Stata SE/16.1 and RStudio Version 1.2.5042 software.
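For readers less familiar with the fit criterion, the sketch below computes AIC from a log-likelihood and parameter count (AIC = 2k − 2 ln L) and the percentage reduction between two models. The helper names and all numeric inputs are hypothetical and are not taken from the fitted models.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pct_reduction(aic_reference, aic_candidate):
    """Percentage reduction in AIC of a candidate model relative to a reference model."""
    return 100 * (aic_reference - aic_candidate) / aic_reference


# Hypothetical log-likelihoods and parameter counts (illustrative values only).
aic_linear = aic(log_likelihood=-1050.0, n_params=5)
aic_quadratic = aic(log_likelihood=-1020.0, n_params=8)
print(f"{pct_reduction(aic_linear, aic_quadratic):.2f}% reduction in AIC")
```

A lower AIC indicates a better trade-off between fit and model complexity, so a positive reduction favors the candidate model.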

Investigating Temporal Trends
Across the entire study period, outbreaks, non-recalls, and outbreaks missing recall information increased by 0.06%/month, 0.24%/month, and 1.90%/month, respectively, whereas recalls decreased by 0.01%/month (Supplementary Table S1; Figure 1). We found no significant linear or quadratic trends in monthly outbreaks in Period 1, though outbreaks with missing recall information steadily decreased by 1.03%/month. Though recall information was not formally collected until November 2004, NORS does report non-recalls consistently from January 1998 to November 2004. Non-recalls in Period 1 decreased at an accelerating rate of 1.78%/month (−3.20, −0.34; p = 0.014). Unexpectedly, before the official collection of recall status data, in Period 1, NORS reported one recall each in April 1998, June 2002, and April 2004, and two recalls in June 2004.
Across critical periods, we found that outbreaks decreased by 0.77%/month from Periods 1 to 2 and increased by 0.16%/month and 0.72%/month from Periods 2 to 3 and Periods 1 to 3, respectively. Similarly, outbreaks missing recall information decreased by 6.87%/month from Periods 1 to 2 but increased by 2.97%/month from Periods 2 to 3 and 7.77%/month from Periods 1 to 3. In contrast, both non-recalls and recalls increased by 2.14%/month and 3.02%/month, respectively, between Periods 1 and 2, followed by decreases of similar magnitudes from Periods 2 to 3 (2.14%/month and 2.88%/month, respectively). Both non-recalls and recalls increased slightly between Periods 1 and 3 (0.27%/month and 1.15%/month, respectively).
When examining trend contributions and modeling diagnostics, we found that linear trends contributed to 96.6–98.5% of the overall trend for all models compared to just 1.5–3.4% for quadratic terms (Supplementary Table S2). The model fit improved in Model 4, as indicated by a ~0.52–12.4% reduction in AIC for all outcomes. These findings suggested the need for inclusion of quadratic terms when examining seasonal patterns of outcomes.
Outbreaks, non-recalls, and outbreaks with missing recall information demonstrated significant seasonality in at least one critical period (Supplementary Table S1; Figure 2). While seasonal patterns of outbreaks appeared visually in all periods, harmonic terms were only significant in Period 1. Non-recalls had significant seasonal patterns in both Periods 1 and 2, whereas outbreaks with missing recall information had significant seasonal patterns in Period 2 only. Though insignificant, outbreaks with missing recalls appeared to have a seasonal pattern in Period 1, also with maximum counts reported in both May and December. All outcomes shared similar patterns, such that maximum counts occurred in April/May, while minimum counts occurred in September/October.

Comparing Risk Factors: Food Recalls
The temporal analyses showed that the reporting of recalls and non-recalls began in Period 2 and continued through Period 3. In comparison, outbreaks missing recall information largely occurred in Period 1, with minimal reporting during Periods 2 and 3. Due to the opposite trends seen in non-recalls and outbreaks missing recall information over the study period, we explored missingness among risk factors by comparing outbreaks resulting in a recall with non-recalls combined with outbreaks missing recall information.
We found extensive missing data among outbreak risk factors (Table 3). Only 7.6% of outbreaks (51.5% of recalls and 7.0% of non-recalls) had non-missing records for all factors (Figure 3). The location of outbreak exposure had no missing data in our study period. In contrast, 75.9% of outbreaks (n = 17,292) had missing supply chain contamination-stage data, including 41.3% of recalls (n = 126) and 76.3% of non-recalls (n = 17,166). While only 3.28% of recalls had missing etiology information (n = 10), nearly one-third of non-recalls failed to report this risk factor (n = 7383; 32.4%). Similarly, 12.8% of recalls (n = 39) failed to report IFSAC Level 1 information compared to 68.6% of non-recalls (n = 15,459). In a subanalysis of IFSAC, reporting proved even scarcer in further disaggregated subcategories, with 3.08% of outbreaks missing IFSAC Level 2 (n = 125 of 4058 outbreaks) and 20.92% of outbreaks missing IFSAC Level 3 (n = 823 of 3933 outbreaks) (Figure 4). Both recalls and non-recalls had limited missing records for the preparation (9.51% and 5.01%, respectively) and consumption location (10.16% and 5.23%, respectively) of contaminated foods.
Table 3. Frequency and percentage of outbreaks with missing information by recall status (total, recall, and non-recall) and risk factor. We extracted data from the electronic Foodborne Outbreak Reporting System (eFORS) and National Outbreak Reporting System (NORS) in 1998–2019. Risk factors include location of residence for ill persons, etiology of contaminant, location of preparation and consumption of contaminated foods, Interagency Food Safety Analytics Collaboration (IFSAC) Level 1 food categorization, and supply chain contamination stage. We list risk factors in ascending order by percentage of outbreaks with missing information.
Figure 3. A Sankey Diagram of the distribution of recalls and non-recalls with and without missing information across all risk factors examined by using Model 6. We report frequencies and percentages of foodborne and waterborne outbreaks associated with each risk factor for all 22,792 outbreaks reported by the electronic Foodborne Outbreak Reporting System (eFORS) and National Outbreak Reporting System (NORS) in 1998–2019. Blue and orange colors define recalls and non-recalls, respectively. Outbreaks with missing recall status or risk-factor information are defined with orange terminal nodes. We calculated percentages according to the frequency of observations available for each risk factor, which include recall status, single- or multistate exposure outbreak, supply chain contamination stage, Interagency Food Safety Analytics Collaboration (IFSAC) Level 1 food categorization, and etiology of contaminant. Other IFSAC 1 includes outbreaks associated with Other (n = 36), Unclassifiable (n = 33), Undetermined (n = 229), and Invalid (n = 3) food products. For contaminant etiology, we list the 5 etiologies of interest in our study (Salmonella, E. coli, norovirus, Listeria, and scombroid poisoning), as well as Other Etiology to account for contaminants not considered in our analyses.
Figure 4. Blue and orange colors define outbreaks with non-missing and missing recall status information, respectively. We calculated percentages according to the frequency of observations in each level. Level 1 describes overarching food groups, including aquatic animals, land animals, plants, and other foodstuffs. Level 2 further categorizes groups into fish/shellfish, other aquatic animals, dairy, game, meat/poultry, eggs, oils/sugars, produce, grains/beans, and seeds/nuts. Level 3 provides more refined categories by specific food subtypes.
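The missingness percentages for supply chain contamination stage can be reproduced from the reported counts, as in the minimal sketch below. The counts come from the text; deriving the pooled non-recall denominator as total outbreaks minus recalls is an assumption of this sketch, and the helper name is hypothetical.

```python
def pct_missing(n_missing, n_total):
    """Percentage of records missing a field, rounded to one decimal place."""
    return round(100 * n_missing / n_total, 1)


TOTAL_OUTBREAKS = 22_792
RECALLS = 305
# Assumed denominator: non-recalls pooled with outbreaks missing recall status.
NON_RECALLS = TOTAL_OUTBREAKS - RECALLS

print(pct_missing(17_292, TOTAL_OUTBREAKS))  # all outbreaks
print(pct_missing(126, RECALLS))             # recalls
print(pct_missing(17_166, NON_RECALLS))      # pooled non-recalls
```

Stating the denominator explicitly matters here because percentages for recalls and non-recalls are computed within each group rather than against all outbreaks.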
We found that 58.3% of recalls (n = 164) were the result of multistate exposure outbreaks compared to only 1.70% of non-recalls (n = 382; Table 4). The univariate analyses demonstrated that the odds of multistate exposure outbreaks resulting in a recall were 24.75 times (18.87, 32.55; p < 0.001) that of single-state exposure outbreaks, making this the single most influential risk factor found. Similarly, 30.16% of recalls (n = 92) were associated with plant food products, compared to 5.36% of non-recalls (n = 1205); the odds of plant foods resulting in a recall were 74% higher (OR = 1.74; 95% CI: 1.31, 2.31; p < 0.001) than outbreaks associated with land animals or their byproducts. In contrast, only 2.62% of recalls (n = 8) occurred within the preparation supply chain stage, whereas 47.21% of recalls (n = 144) occurred within the before-preparation stage. We found that the odds of recall following a preparation-stage outbreak were 95% lower (OR = 0.05; 95% CI: 0.01, 0.11; p < 0.001) than a recall following a before-preparation-stage outbreak.
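The "74% higher" and "95% lower" statements follow directly from the odds ratios, since the percentage change in odds is (OR − 1) × 100. A minimal sketch of that conversion, using the two odds ratios quoted above and a hypothetical helper name:

```python
def pct_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in odds versus the reference group."""
    return (odds_ratio - 1) * 100


for label, odds_ratio in (
    ("plant vs. land animal products", 1.74),
    ("preparation vs. before-preparation stage", 0.05),
):
    print(f"{label}: {pct_change_in_odds(odds_ratio):+.0f}% change in odds")
```

Note that this is a change in odds, not in probability; the two diverge when the outcome is common in either group.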

Locations where contaminated foods were prepared had nearly identical patterns with respect to recall status as consumption locations. We found that 17.70% (n = 54) and 18.69% (n = 57) of recalls had contaminated foods prepared or consumed, respectively, in multiple locations compared to only 7.16% (n = 1611) and 4.41% (n = 991) of non-recalls. Outbreaks with multiple locations for preparation and consumption had odds of 3.59 times (2.57, 4.93) and 4.75 times (3.39, 6.56), respectively, that of single preparation or consumption location outbreaks to result in a recall. Similarly, we found that 22.30% (n = 68) and 40.33% (n = 123) of recalls were either prepared or consumed at the home, respectively. Outbreaks with at-home preparation and consumption had odds of 1.36 times (1.00, 1.83) and 2.11 times (1.62, 2.73), respectively, that of outbreaks with away-from-home preparation or consumption to result in a recall. In contrast, outbreaks with food preparation or consumption at restaurants had 80–82% lower odds (0.11, 0.32) of resulting in a recall compared to non-restaurant outbreaks.
We found similar patterns when examining the combined effect of all risk factors, with fully adjusted multivariate models having the lowest reported AIC values (Table 5; Supplementary Table S3). Irrespective of contaminant etiology, multistate exposure outbreaks had odds that were 11.00–13.50 times (7.00, 21.60) that of single-state exposure outbreaks to result in recall (p < 0.001). In contrast, outbreaks where supply chain contamination occurred in the preparation and unknown stages had 93–97% and 53–62% lower odds, respectively, of resulting in a recall (p < 0.05) compared to the before-preparation stage. Though outbreaks associated with other foods had significantly greater odds to result in recall compared to land animals, we assumed that the results were spurious due to small sample size within this category.
Across contaminant etiologies, we found that Listeria- and norovirus-associated outbreaks had odds of 5.81 times (2.20, 16.40) and 4.93 times (2.39, 9.82) that of non-Listeria and non-norovirus outbreaks of resulting in recall, respectively (p < 0.001). Though of a lesser magnitude, E. coli-associated outbreaks had similarly higher odds, 1.86 times (1.08, 3.18) that of non-E. coli outbreaks, of resulting in a recall. We found no significant findings for either Salmonella- or scombroid-poisoning-associated outbreaks.

Comparing Risk Factors: Supply Chain Contamination Stage
After comparing risk factors by recall status, we aimed to examine the likelihood of supply chain contamination in the preparation stage compared to the before-preparation stage among recalls. This analysis would have provided critical information on where within the supply chain recalls commonly occur to inform guidelines for improving outbreak analytics to enhance food traceability in accordance with the 2020 New Era of Smarter Food Safety Blueprint [21,22]. However, due to an insufficient sample size, we were unable to perform these logistic regression analyses.
Of the 305 recalls identified in our study period, 144 recalls (47.21%) were identified in the before-preparation supply chain stage, while only 8 and 27 recalls (2.62% and 8.85%, respectively) were identified in the preparation or unknown stages. Among the before-preparation stage recalls, we found that 20.83% and 28.47% (n = 30 and n = 41, respectively) occurred within the pre-harvest and pre-processing stages, respectively, compared to 36.73% (n = 396) and 9.46% (n = 102) of the 1078 before-preparation stage non-recalls.
Table 5. Logistic regression results examining the likelihood of foodborne and waterborne outbreaks resulting in food recalls, as reported by the electronic Foodborne Outbreak Reporting System (eFORS) and National Outbreak Reporting System (NORS) in 1998–2019. We selected risk factors according to univariate logistic regression results and added factors in a stepwise order. Risk factors include multistate exposure outbreaks (reference: single-state exposure outbreaks), supply chain contamination stage (reference: before-preparation stage), IFSAC Level 1 food categories (reference: land animals), and presence of a contaminant etiology (reference: absence of or unknown etiology). We report fully specified models for 5 contaminant etiologies, namely Salmonella, E. coli, Listeria, norovirus, and scombroid poisoning associated outbreaks. We report the odds ratio estimates (and 95% confidence intervals), Akaike's Information Criterion (AIC), and the number of observations per model. Superscripts indicate statistical significance at p < 0.001 (a) and p < 0.05 (b).

Discussion
Our study demonstrated the usability of CDC foodborne national surveillance records for investigating food recalls. In doing so, we described temporal trends of recalls for the past two decades and identified risk factors most likely to drive recall occurrence. We found that, while improving since the transition from eFORS to NORS, recall records and information on recall-related risk factors were largely incomplete. Approximately 41.1% (n = 9378) of the 22,792 outbreaks reported from 1 January 1998 to 31 December 2019 had a missing recall status. However, our findings suggest that outbreaks missing recall information occurred most frequently before November 2004, with substantial improvements after November 2004 and January 2009, following changes in data-collection methods and reporting standards. Furthermore, only 7.6% of outbreaks (51.5% of recalls and 7.0% of non-recalls) had non-missing records for all factors. These findings alone suggest that current publicly available surveillance records may be insufficient to adequately investigate the financial and human-health burdens of food recalls and foodborne/waterborne outbreaks more broadly.
The New Era of Smarter Food Safety Blueprint aims to enhance interagency communications, design interoperable tools, and improve the timeliness of foodborne outbreak responses [21,22]. While acknowledging the importance of data quality, the Blueprint fails to promote interagency harmonization of existing recall surveillance systems between the FDA, USDA, and CDC. Both the FDA and USDA report traceback information on expenses, manufacturers, and volume of recalled foods not currently traced by the CDC [2,12,13], whereas the CDC traceback investigations identify supply chain contamination locations and where ill persons prepared and consumed contaminated foods. Harmonizing recall record data across these agencies could lead to more comprehensive estimates of healthcare and economic burdens, a better understanding of the impact food recalls have on food waste, and predictive analytics of foodborne outbreaks.
Furthermore, reporting standards impede the ease of temporally or spatially aligning data across agencies or other environmental datasets. In contrast, eFORS and NORS provide comprehensive information on all foodborne outbreaks, thus enabling both descriptions of temporal trends and comparisons of recall-associated risk factors. However, 41.3% and 12.8% of recall-associated records lack information on supply chain contamination stage and IFSAC Level 1 grouping, respectively. Other food- and waterborne disease research has explored the supplementation of food-safety surveillance systems with hospitalization records for more precise and complete reporting of notifiable diseases [41,42]. By refocusing collaborative efforts toward interdepartmental data harmonization and considering triangulation of additional public-health-system data, these agencies can create more comprehensive and complete outbreak and recall surveillance vital to understanding food traceability at refined spatiotemporal scales.
Though the volume and velocity of newly reported data increase annually, the CDC must continue to allocate fiscal and personnel resources to check the quality and accuracy of reported data. From January 1998 to November 2004, we found a consistent decrease in the reporting of outbreaks with missing recall information. However, we also found consistent reporting of non-recalls and five reported recalls, despite these records preceding the formal mandate to conduct recall traceback investigations. These findings may reflect the CDC's attempt to modify historic records, as this will greatly improve the precision and accuracy of temporal trend and risk factor analyses on recalls in future studies. However, these findings may also reflect reporting anomalies requiring further investigation by CDC data-quality and accuracy personnel. Overall, our findings further illustrate the need for interagency collaboration on and greater attention to improving the quality of existing data amidst plans for strengthening surveillance capacity [21,22].
In fact, temporal trends from November 2004 to December 2019 already demonstrated the advantages of regulatory oversight and enforcement of improved data-reporting protocols. After the standardization of reporting of food recalls, we found that outbreaks with missing recall information decreased at an accelerating rate, by ~25%/month, while non-recalls decreased at a decelerating rate, by ~10%/month. The expansion of surveillance capacity from eFORS to NORS brought further reductions in outbreaks with missing recall information at a decelerating rate, by ~13%/month. Such extensive reductions in missing recall information over time illustrate the importance of standardized outbreak surveillance reporting and improved usability of CDC surveillance data for investigating food recalls over time.
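As a minimal sketch of how monthly percent changes such as those above can be read off a log-link count model (for example, the segmented negative binomial regressions summarized in Table S1), the snippet below assigns each study month to one of three segments separated by the November 2004 and January 2009 breakpoints and converts a slope on the log scale into a percent change per month. It assumes the reported rates are multiplicative monthly changes; the function names and the example coefficient are hypothetical illustrations, not values from the fitted models.

```python
import math

START_YEAR = 1998  # the study period begins in January 1998


def month_index(year, month):
    """Zero-based count of months elapsed since January 1998."""
    return (year - START_YEAR) * 12 + (month - 1)


# Breakpoints named in the text: November 2004 and January 2009.
BREAKS = (month_index(2004, 11), month_index(2009, 1))


def segment(year, month):
    """Return 1, 2, or 3 for the segment a given month falls in."""
    t = month_index(year, month)
    if t < BREAKS[0]:
        return 1  # before standardized recall reporting
    if t < BREAKS[1]:
        return 2  # standardized reporting under eFORS
    return 3      # after the transition to NORS


def percent_change_per_month(beta):
    """Convert a log-scale slope into a percent change per month."""
    return (math.exp(beta) - 1.0) * 100.0


# Hypothetical slope chosen so that counts shrink by one quarter each month.
example_beta = math.log(0.75)
example_change = percent_change_per_month(example_beta)
```

Under this reading, a negative slope corresponds to a monthly decrease, and "accelerating" or "decelerating" describes the sign of an additional quadratic term within the segment.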
In our study period, recalls increased by ~3%/month after the beginning of standard traceback investigations and by ~3%/month, again, after the transition to NORS. However, our trend analysis demonstrated that recalls steadily decreased by ~3%/month from January 2009 to December 2019. These trends reflect the improvements in food traceability in the supply chain and, thereby, the mitigation of food recalls after the enactment of the Food Safety Modernization Act in 2009 [43]. Signed into law in 2011, this legislation enabled the FDA to impose mandatory produce safety standards, controls, and inspections for potential hazards in food production, distribution, transport, and retail facilities [44]. Subsequent appendices to the law have mandated increased frequency of food safety inspections, distribution of supply chain records, and testing of food company products to improve early detection and warning of potential outbreaks [45]. Continued support for regulatory oversight and technological advancement on food traceability throughout the supply chain is critical for the continued reduction and prevention of recall events.
Investigations of seasonality can also improve emergency and incident response coordination and enhance early warnings of foodborne outbreaks. In our prior work, we demonstrated stable seasonal patterns of foodborne illnesses in the United States and how these patterns can be examined and understood visually [8,27,46,47]. Across all studies, outbreak peak timing ranged from early July to late August for most enteric infections. More recently, we found that foodborne outbreak severity, measured using an 11-metric index score, similarly peaked in June-September (Simpson et al. (Personal Communication)). In this study, we found that foodborne outbreaks slightly preceded illness and outbreak severity peaks, as the maximum count of outbreaks occurred in April/May, with minimal counts in September/October. These findings suggest that outbreaks and illness may have synchronized seasonal patterns requiring further investigation to determine the exact lags between seasonal peaks of illnesses, outbreaks, outbreak severity, and recalls. The early onset of outbreaks further emphasizes the need for increased product testing, safety inspections, and toxicological hazard screenings in April-June annually. However, we cannot discount that these temporal patterns may also reflect changes in annual resources and make the efficient identification of contaminated products through the harmonization of traceback data even more crucial to food safety [48]. As 47.21% of recalls were associated with the before-preparation supply chain stage, our results suggest that food traceability operations and data reporting must more closely target pre-harvest and preprocessing techniques among producers [21,22]. This will improve data completeness and allow for a closer examination of food traceability, using CDC surveillance data, which we could not perform, due to sample size limitations. Improved monitoring of food safety earlier in the supply chain may reduce both the volume and severity of seasonal outbreaks and illnesses.
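To illustrate how a seasonal peak can be located once harmonic terms have been estimated, the sketch below converts a pair of sine and cosine coefficients into a peak time. It assumes a single annual harmonic of the form b_sin * sin(2πt/12) + b_cos * cos(2πt/12), with t measured in months from the start of January; the function name and the example coefficients are hypothetical, and this is not necessarily the exact procedure behind the peak timings discussed above.

```python
import math


def peak_month(beta_sin, beta_cos, period=12.0):
    """Time (in months from the start of the cycle) at which
    beta_sin*sin(2*pi*t/period) + beta_cos*cos(2*pi*t/period) is largest.

    The two terms combine into A*cos(2*pi*t/period - phase), where
    phase = atan2(beta_sin, beta_cos), so the maximum falls where the
    cosine argument equals zero.
    """
    phase = math.atan2(beta_sin, beta_cos)
    return (phase * period / (2.0 * math.pi)) % period


# Hypothetical coefficients, used only to show the call.
example_peak = peak_month(0.4, 0.3)
```

Applying the same conversion to separately fitted series (for example, illnesses, outbreaks, and recalls) would give peak times whose differences estimate the lags between them.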
Among risk factors, we found that multistate exposure outbreaks consistently had odds of ~10-15 times that of single-state exposure outbreaks to result in a recall. This underscores one of the Blueprint's main directives of enhanced outbreak responsiveness, rapid traceback deployment, and strengthened root-cause analyses to identify the location of outbreaks and recalls [21,22]. In doing so, multistate exposure outbreaks can be more thoroughly contained to minimize the volume and severity of ill persons per recall. Furthermore, improved traceback investigations will better identify food distribution and retail pathways to mitigate the expansiveness of outbreaks within the supply chain. These efforts must more readily target outbreaks associated with E. coli, Listeria, and norovirus, as these three etiologies had odds of ~1.5-6 times that of non-E. coli, non-Listeria, and non-norovirus outbreaks to result in food recalls.
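For readers less familiar with odds ratios, the following sketch shows how an unadjusted odds ratio and its Wald 95% confidence interval are computed from a 2x2 table. The odds ratios reported in this study come from logistic regression models that control for other risk factors, so this crude calculation is only an illustration of the quantity being compared; the function name and the counts are hypothetical and are not taken from the study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      (e.g., multistate outbreaks with a recall)
    b: exposed without outcome   (e.g., multistate outbreaks without a recall)
    c: unexposed with outcome    (e.g., single-state outbreaks with a recall)
    d: unexposed without outcome (e.g., single-state outbreaks without a recall)
    All four counts must be positive.
    """
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts chosen only to demonstrate the call.
example_or, example_lower, example_upper = odds_ratio_ci(30, 70, 20, 480)
```

An interval that excludes 1 indicates that the odds of a recall differ between the two groups at the chosen confidence level.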
Our study was subject to several limitations. First, recall records were vulnerable to reporting bias of high-priority pathogens and food products that are most burdensome because they cause pathogen-related deaths. In 2015, just five pathogens (i.e., Salmonella, Toxoplasma gondii, Listeria monocytogenes, Campylobacter, and norovirus) caused 90% of the economic burden imposed by foodborne outbreaks [49]. Similarly, in this study, we found that Salmonella, Listeria and norovirus outbreaks predominantly resulted in a food recall. Second, further risk factor analysis by IFSAC Levels 2-5 and the sub-analysis of recalls in the preparation stage compared to the before-preparation stage were not possible, due to insufficient sample size. These analyses would have determined major food types or subtypes and supply chain contamination locations with higher probabilities of contamination resulting in a recall, thus informing the prioritization of traceback food products and locations by regulatory agencies. Next, the pathogen etiology and the location of preparation and consumption of contaminated foods were not originally mutually exclusive variables. By creating multi-pathogen and multi-location food preparation or consumption variables, we might have introduced potential multiplicity when comparing specific pathogen etiology, or location of preparation or consumption to their respective reference groups. Lastly, we paid sufficient attention to missing data and the structure of the missing data [50]. On the surface, we could handle missing data by using imputation; however, due to structural missingness, this could create bias.
To our knowledge, this study is the first to utilize NORS recall record data to examine temporal trends, particularly before and after November 2004 and January 2009. Additionally, our comparison of risk factors between recalls and non-recalls highlighted existing biases in reporting influenced by available resources or outbreak healthcare and economic burden. Future directions should include more granular analyses of contaminant etiology and preparation and consumption locations and explore the relationship of these risk factors on foodborne outbreak severity. With the FDA taking a new approach to food safety via the New Era of Smarter Food Safety Blueprint, we urge food-safety and public-health agencies to collaborate more closely and standardize data-reporting protocols, thereby improving the spatiotemporal alignment and harmonization of publicly reported national surveillance databases on food recalls.

Conclusions
Food recalls impose an extensive fiscal burden on the food economy in the United States, in addition to recall- and outbreak-associated foodborne illnesses. However, current national surveillance systems lack sufficient data quality and completeness for establishing precise and accurate early outbreak and recall detection and warnings. While data quality has improved over time, as the result of federal food-safety policies, further regulatory oversight is still needed. Future policy regulations must standardize timely and thorough data reporting of food recall and outbreak events to improve the traceability of food throughout the supply chain and responsiveness to multistate exposure outbreak events.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijerph19094947/s1. Table S1: Summary of model coefficients and diagnostics estimated by using segmented negative binomial regression analyses for linear, quadratic, and harmonic trends; Table S2: Estimated trend contribution of linear and quadratic trend, using Model 6 for outbreaks, recalls, non-recalls, and outbreaks missing recall information; and Table S3: Logistic regression results examining the likelihood of foodborne and waterborne outbreaks resulting in food recalls.
Author Contributions: E.S., R.B.S. and L.E.S. contributed to the data extraction and statistical analyses performed to complete this manuscript; E.S., R.B.S. and Y.Z. contributed to the construction of visual aids and validation of both extracted data and statistical methods; E.S. contributed to the original drafting of the manuscript, while E.S. and R.B.S. contributed to the editing of manuscript text; E.N.N. contributed to the review and editing of the manuscript, as well as project supervision, administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: The Centers for Disease Control and Prevention (CDC) publicly report records for the electronic Foodborne Outbreak Reporting System (eFORS) and National Outbreak Reporting System (NORS) on their data dashboard [36]. We received more detailed records than provided publicly through a formal data request with the NORS Foodborne and Animal Contact Team.