1. Introduction
Vibrio parahaemolyticus is the leading cause of seafood-borne gastroenteritis in the United States (US) and worldwide [1,2,3]. Most strains are believed to be non-pathogenic, and the strains that do cause gastroenteritis and septicemia in humans have historically been associated with warm water environments [4,5,6]. Over the past decade, however, illnesses caused by V. parahaemolyticus have become more frequent in some cold and temperate water environments where illnesses were previously rare [7,8,9,10,11,12,13,14]. This new pattern of V. parahaemolyticus disease likely stems from a combination of observed trends, such as the introduction and ecosystem establishment of pathogenic strains, increased summertime production and consumption of raw shellfish, and climate-related changes causing warmer sea surface temperatures and more variable salinities [7,8,13,14,15,16,17,18,19]. In the Northeast US, where pathogenic V. parahaemolyticus is now established, foodborne illness is most frequently acquired from the consumption of raw or undercooked shellfish [3]. Post-harvest management has effectively reduced the incidence of V. parahaemolyticus disease outbreaks in this region. However, illness still occurs, and achieving effective post-harvest control is both resource and time intensive. Effective pre-harvest V. parahaemolyticus forecasting tools would help shellfish growers and managers alike make informed decisions about V. parahaemolyticus risk conditions at the time of harvest and potentially reduce the risk and cost of V. parahaemolyticus management.
V. parahaemolyticus is a naturally occurring bacterial species that persists in a wide range of conditions in most marine and estuarine environments [5,20,21,22,23,24,25,26,27,28,29,30]. In multiple studies, temperature and salinity correlate most strongly with V. parahaemolyticus concentrations, but the strength of this relationship varies by region and season [31]. Similarly, the reported relationships with nutrients, chlorophyll a, pH and turbidity were inconsistent and depended on the region and the variability of these factors. Therefore, region-specific and even harvest area-specific studies are necessary to provide an accurate description of the influence of environmental conditions on V. parahaemolyticus concentration [32].
Long-term monitoring has been established in the Great Bay Estuary (GBE) by the Northeast Center for Vibrio Disease and Ecology at the University of New Hampshire (UNH) since 2007 [33,34,35,36]. The GBE is located on the border of New Hampshire and Maine (Figure 1) and has a long history of studies on pathogenic Vibrio spp. [37,38,39]. It is a regionally significant estuary that experiences wide-ranging environmental, climatic, and biological conditions [10], and thus serves as a useful model representative of regional estuaries. It is unique in that V. parahaemolyticus illnesses are still rare [40], although the V. parahaemolyticus population in the Northeast is evolving [13,14] and commercial shellfish harvests are rapidly increasing. The ongoing surveillance enables the development of pre-harvest risk-forecasting models.
The goal of this study was to develop an integrated modeling approach to predict V. parahaemolyticus concentrations in shellfish at the pre-harvest stage as a tool for managing this significant public health issue. We used data from 2007 to 2016 to capture long-term trends and seasonal fluctuations in a broad range of environmental and climatic predictors of V. parahaemolyticus dynamics, aiming to create a model development approach that could be transferable to other estuaries.
4. Discussion
The intrinsic link that V. parahaemolyticus has with coastal ecosystems has been well studied and characterized. Previous studies have provided many useful site- and time-specific descriptive models of V. parahaemolyticus concentration dynamics. However, few of them have been evaluated for their ability to forecast V. parahaemolyticus dynamics, or to be generalizable and transferable to other geographic areas or time periods. A wide range of environmental conditions and ecological interactions have been reported to influence, or at least correlate with, V. parahaemolyticus concentrations, including water temperature, salinity, inorganic and organic nutrients, suspended solids-turbidity, chlorophyll-a and plankton levels, light availability, and meteorological conditions [4,5,16,17,27,29,36,38,52,53,54,55,56,57,58,59,60,61,62]. The temporal and spatial data analysis methods vary greatly in these studies, from simple correlation to more complex models [31]. These have often included the application of multiple regression analysis to characterize and model the interactions between multiple environmental parameters and V. parahaemolyticus levels [5,18,28,36,63,64], even though they have not been useful for forecasting V. parahaemolyticus dynamics and risk conditions. Based on clearly observable aspects of the V. parahaemolyticus concentration data for this study and some initial analyses, the combination of models applied here incorporates seasonality, trend and dispersion concepts to characterize V. parahaemolyticus dynamics and accurately predict V. parahaemolyticus concentrations. Model accuracy is in part a function of using known and consistent variables, such as photoperiod or day of the year, that are ecologically interpretable yet stable for effective V. parahaemolyticus forecasting. This approach of seasonality and trend analysis has the potential to be transferable for developing similar forecasting models of V. parahaemolyticus dynamics in other locations.
V. parahaemolyticus concentrations in the GBE followed the same pattern in each year of this study: concentrations increased rapidly each spring as water temperatures rose, peaked during the warmest summer conditions, and decreased as water temperatures fell in the fall. Such seasonality, where regular and predictable changes in environmental and climatic conditions recur every calendar year, tends to become more pronounced with increasing distance from the equator and is largely due to extreme temperature variation driven by variable photoperiod [65]. Water temperature accounted for approximately 48.1% of the variation observed in V. parahaemolyticus concentrations in this study, similar to what has been observed globally and especially in highly seasonal, temperate water regions [27,28,60]. Thus, seasonality is a significant aspect of V. parahaemolyticus concentration dynamics in temperate coastal areas like New Hampshire and the Northeast US.
Photoperiod and harmonic regression models along with correlation analysis showed that V. parahaemolyticus concentration, water temperature, dissolved oxygen, pH, salinity and chlorophyll-a are significantly related to variables that mirror seasonal patterns in the GBE. Likewise, these variables accurately estimate V. parahaemolyticus concentrations in oysters. The synchronized seasonal periodic oscillation is one probable explanation for why regression modeling favors water temperature as the most significant model parameter. A complex combination of biological and physical environmental variables certainly drives V. parahaemolyticus population dynamics. However, many of these environmental variables are, in turn, driven mainly by seasonal temperature. Therefore, the variability they contribute to V. parahaemolyticus concentrations is not significantly different from what is provided by water temperature. For example, dissolved oxygen was negatively correlated with V. parahaemolyticus concentrations, similar to what has been previously reported [20], and was the second strongest variable, accounting for over 32% of the variability in V. parahaemolyticus over the course of the study in the GBE. Since V. parahaemolyticus is a facultative anaerobe, this finding has the potential to elucidate important biological dimensions of the ecology of V. parahaemolyticus. Water temperature is a dominant driver of dissolved oxygen concentrations, so collinearity between these variables is likely. In addition, because of the constraints of mathematical modeling, well-fit models are not necessarily mechanistically or ecologically descriptive [66] and, in this case, dissolved oxygen was omitted from model development to avoid multicollinearity in favor of water temperature as a stronger model variable.
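The collinearity concern described above can be illustrated with a minimal Python sketch. It computes the Pearson correlation between two predictors and the corresponding variance inflation factor (VIF), which for a pair of predictors reduces to 1 / (1 - r^2). The function names and the water temperature and dissolved oxygen values are hypothetical illustrations, not data or code from this study, and the text does not state that VIF was the diagnostic used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF when a model contains exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical values, invented for illustration only.
water_temp_c = [4.0, 9.0, 15.0, 21.0, 24.0, 18.0, 10.0]
dissolved_oxygen_mg_l = [12.5, 11.0, 9.4, 8.1, 7.6, 8.8, 10.9]

print("r   =", round(pearson_r(water_temp_c, dissolved_oxygen_mg_l), 3))
print("VIF =", round(vif_two_predictors(water_temp_c, dissolved_oxygen_mg_l), 1))
```

A VIF far above 1 (values of 5 to 10 are common rules of thumb) indicates that the two predictors carry largely redundant information, which is the situation that motivates keeping only one of them in a regression model.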
Salinity and water temperature are both seasonally variable parameters that together are the most commonly cited environmental drivers of V. parahaemolyticus concentration variation [16,31,67]. Salinity was a significant predictive parameter for V. parahaemolyticus concentration in this study, though the significance of salinity was dependent on the time interval (2007–2013 versus 2014–2016) of the data and the trend adjustment in the model (Table S2). This type of variability has also been observed in risk assessment [68,69] and in previous studies, where salinity sometimes shows a strong positive correlation with V. parahaemolyticus [5,6,25,63], whereas in others [28,53,60,70], salinity and V. parahaemolyticus dynamics do not correlate. Thus, the finding that salinity and other variables reported to be significant in other V. parahaemolyticus concentration models were not included in this study’s final model may be, at least in part, a function of both the specific conditions at this study site and time period and the in-depth statistical approach used.
Though most studies find little to no correlation between pH and V. parahaemolyticus concentration [20,21,28], non-linear regression and correlation analysis identified pH as an important parameter for the predictive models in the GBE. Loess smoothing highlighted the marked non-linearity of the relationship between pH and V. parahaemolyticus concentrations and suggested a biological optimum or optimal range for pH, with V. parahaemolyticus concentrations decreasing as pH increased or decreased relative to pH 7.8. For the purposes of optimal model development, a new pH variable was constructed by reparametrizing the measurements to create a linear response in V. parahaemolyticus as pH measurements moved away from the optimum of 7.8. An optimal pH of 7.8 is near the pH (8.5) of the alkaline peptone water medium used to optimally enrich for Vibrio species [42] and has also been suggested as an optimal pH by laboratory-based observations [71]. Wong et al. (2015) [72] found that exposure to more acidic environments tended to reduce cell density and cause stress responses in V. parahaemolyticus. In this study, we observed that pH measurements in the GBE appeared to become less variable and more basic in recent years, which was also reported by Lopez-Hernandez et al. (2015) [5]. Thus, going beyond simple linear regression and including the use of non-linear analysis reveals pH as an important and ecologically linked variable to explain V. parahaemolyticus population dynamics.
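One simple way to build such a reparametrized pH variable, sketched below in Python, is to take the absolute distance of each pH measurement from the optimum, so that values equally far above and below 7.8 map to the same predictor value. This is an assumption made for illustration: the text does not give the exact transformation used, and the function name is hypothetical.

```python
PH_OPTIMUM = 7.8  # optimum suggested by the loess analysis described in the text


def ph_distance_from_optimum(ph, optimum=PH_OPTIMUM):
    """Absolute distance of a pH measurement from the assumed optimum.

    Concentrations that fall off on both sides of the optimum become a
    monotonic (ideally roughly linear) function of this single variable.
    """
    return abs(ph - optimum)


# Hypothetical measurements, invented for illustration only.
for ph in (7.4, 7.8, 8.2):
    print(ph, "->", round(ph_distance_from_optimum(ph), 2))
```

With this transformation a regression needs only one coefficient for pH, and a negative coefficient corresponds to concentrations declining as pH moves away from the optimum in either direction.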
In other studies [36,60,63], variables other than salinity and pH were significant for estimating V. parahaemolyticus concentrations in univariate regression. However, in this study, they provided an insignificant amount of improvement to a multiple regression model that included water temperature. For example, chlorophyll-a, considered a proxy measurement for phytoplankton abundance [21,31], was significantly related to V. parahaemolyticus concentrations in correlation and univariate regression, but it was not significant in a multiple regression model that included water temperature. Chlorophyll-a was thus omitted from further model development because it did not contribute additional information in describing V. parahaemolyticus variation. Many studies have suggested an important ecological interaction between V. parahaemolyticus and plankton [16,27,57,63,69,73,74], and though chlorophyll-a was not included in the multiple regression models, we have also conducted a parallel study to explore the relationship between V. parahaemolyticus and plankton species across several years in the GBE [75,76] to determine covarying plankton species. These have included phytoplankton that have been reported to be significantly associated with V. parahaemolyticus elsewhere [77,78] and that could provide more in-depth insight into the importance of phytoplankton and the proxy chlorophyll-a to the V. parahaemolyticus concentration dynamics observed in the GBE.
Approximately half of the variability of V. parahaemolyticus in the GBE could be predicted using photoperiod (in hours), the sine and cosine of the day of the study in harmonic regression, and the day of the study. Even though the model consisting solely of environmental variables was potentially more ecologically informative, the trend and seasonality variables (calendar day of the study, photoperiod, sine and cosine) were more stable than salinity and, to a lesser degree, pH for estimating and predicting the seasonality of V. parahaemolyticus and the trend of increasingly high concentrations over time, and they do not require in situ measurements. Additionally, evaluation of the environmental model for its forecasting ability showed that some evaluation measures were discordant, while the goodness-of-fit and forecasting error of the harmonic regression and photoperiod models were in agreement. Although multiple evaluation measures can complicate model selection, in this study conflicting evaluation measures may indicate underlying issues with a model, while the models whose evaluation measures agreed provided stronger prediction accuracy. Harmonic regression analyses also led to identification of the day of year of peak V. parahaemolyticus concentration, which occurs in mid-August (day 222 ± 5 days) and follows the peak timing of water temperature (day 213 ± 2), while the longest day of the year is 21 June (day 170). This highlights a loading, or hysteresis, in the system and provides the basis for understanding the ‘fall shoulder’ of elevated concentrations of V. parahaemolyticus that extends into late September.
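The link between the sine and cosine terms of a harmonic regression and a peak day can be sketched in a few lines of Python. For a fitted annual term b_sin·sin(ωt) + b_cos·cos(ωt) with ω = 2π/365, the maximum falls at t = atan2(b_sin, b_cos)/ω. The sketch below assumes one value per day over a full year, in which case the least-squares fit reduces to simple sums; the study's monthly or biweekly samples would instead need a general least-squares or GLM fit. The function names, the 365-day period, and the synthetic series are illustrative assumptions, not the study's code or data.

```python
import math

PERIOD_DAYS = 365.0  # assumed annual period
OMEGA = 2.0 * math.pi / PERIOD_DAYS


def fit_annual_harmonic(daily_values):
    """Least-squares fit of y = b0 + b_sin*sin(wt) + b_cos*cos(wt).

    Valid only for exactly one value per day on days 0..364, where the
    intercept, sine and cosine regressors are mutually orthogonal.
    """
    n = len(daily_values)
    if n != 365:
        raise ValueError("this shortcut needs exactly 365 daily values")
    b0 = sum(daily_values) / n
    b_sin = 2.0 / n * sum(y * math.sin(OMEGA * t) for t, y in enumerate(daily_values))
    b_cos = 2.0 / n * sum(y * math.cos(OMEGA * t) for t, y in enumerate(daily_values))
    return b0, b_sin, b_cos


def peak_day(b_sin, b_cos):
    """Day of year at which b_sin*sin(wt) + b_cos*cos(wt) is largest."""
    return (math.atan2(b_sin, b_cos) / OMEGA) % PERIOD_DAYS


# Synthetic series built to peak on days 222 and 213 (hypothetical amplitudes).
log_conc = [2.0 + 1.5 * math.cos(OMEGA * (t - 222)) for t in range(365)]
water_temp = [12.0 + 10.0 * math.cos(OMEGA * (t - 213)) for t in range(365)]

_, s_conc, c_conc = fit_annual_harmonic(log_conc)
_, s_temp, c_temp = fit_annual_harmonic(water_temp)
lag_days = peak_day(s_conc, c_conc) - peak_day(s_temp, c_temp)
print("concentration peak lags temperature peak by", round(lag_days, 1), "days")
```

Because the synthetic series are constructed with known peaks, the fit recovers those days, and the difference between them is the kind of lag that the text interprets as loading or hysteresis.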
Peak timing was used to assess each environmental variable individually to detect how environmental variables may contribute to the development of ideal conditions for V. parahaemolyticus. Data in this study were collected either monthly or biweekly, while V. parahaemolyticus replicates every eight minutes under ideal conditions. In this instance, the accurate detection of lagged effects on V. parahaemolyticus would require more frequent sampling and fine-scale temporal resolution. Due to this level of biological complexity and the irregular temporal intervals of the data in our study, the mean from the 12 h preceding collection was used for regression with environmental variables, and peak timing was used to assess temporally how each environmental variable may contribute to the development of ideal conditions for V. parahaemolyticus. Using this approach, we determined that significant predictive variables peak in advance of V. parahaemolyticus, potentially contributing to a hysteresis or loading of the system and setting up conditions that are optimal for V. parahaemolyticus. Davis et al. (2019) [79] recently reported that environmental variables approximately one month preceding collection were significant for predicting V. parahaemolyticus concentrations in the Chesapeake Bay, suggesting they might also be observing this type of lagged effect from a loading of the system. The application of harmonic regression and peak timing here demonstrates how biological complexities and limitations of sampling frequency can be overcome while also providing the resolution to detect temporal patterns between dependent and independent variables. The determination of peak timing is also a potentially important tool for forecasting the commonly observed mid-summer peaks in illnesses in the Northeast US [80].
A major characteristic of the V. parahaemolyticus concentration data is their wide dispersion. The comparison between Gaussian and negative binomial generalized linear models (GLMs) determined that the dispersion of V. parahaemolyticus concentrations, especially the extreme high concentrations, was better fit by the negative binomial model, as it can better account for the wide range of V. parahaemolyticus concentrations (0.3 to 4600 MPN/g) observed annually in the GBE. Effective risk models, with negative binomial regression as an essential model attribute, developed to predict the increasing and more dispersed V. parahaemolyticus concentrations will become more important as global warming and other climate and ecosystem changes will probably cause increased concentrations and persistence of V. parahaemolyticus in temperate coastal areas [8,81,82,83], with a likely increase in public health risks.
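The role of dispersion can be made concrete with a short Python sketch. Under the common NB2 parameterization of the negative binomial, the variance is μ + αμ², so it grows much faster than the mean whenever the dispersion parameter α is positive. The code below states that variance function and a rough method-of-moments estimate of α. The parameterization, the function names, and the sample values (chosen only to span the 0.3 to 4600 MPN/g range quoted above) are assumptions for illustration, not the study's data or fitting procedure.

```python
from statistics import mean, variance


def nb2_variance(mu, alpha):
    """Variance implied by an NB2 negative binomial with mean mu."""
    return mu + alpha * mu ** 2


def moment_dispersion(values):
    """Method-of-moments estimate of alpha: (s^2 - mean) / mean^2, floored at 0."""
    m = mean(values)
    if m <= 0:
        raise ValueError("mean must be positive")
    s2 = variance(values)  # sample variance (n - 1 denominator)
    return max(0.0, (s2 - m) / m ** 2)


# Hypothetical concentrations (MPN/g), invented for illustration only.
example_mpn = [0.3, 2.3, 9.3, 43.0, 240.0, 1100.0, 4600.0]

alpha_hat = moment_dispersion(example_mpn)
print(f"variance/mean ratio: {variance(example_mpn) / mean(example_mpn):.1f}")
print(f"moment estimate of alpha: {alpha_hat:.2f}")
```

A variance-to-mean ratio far above 1 is the overdispersion signature that motivates a negative binomial error model for data with occasional extreme high values.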
Model evaluation, estimations, and predictions illustrate how well each model fits and predicts the variability in V. parahaemolyticus concentration observed over the course of the study. Though a forecasting model consisting of environmental variables could be more appealing because of its ecological interpretability, there are potential limitations to models that rely solely on environmental predictors. For example, it is unlikely that a well-fit model can contain all the environmental variables that affect V. parahaemolyticus given its ecological complexity and the collinearity between seasonally driven variables that relate to V. parahaemolyticus dynamics. Further, the strength of environmental variables to predict V. parahaemolyticus can change over time, as was observed in the interaction between pH and salinity between time intervals. Additionally, salinity became insignificant when the model was adjusted for a linear trend. The negative binomial harmonic regression and hybrid models fit the seasonality and trend features, and account well for the dispersion of V. parahaemolyticus. All models demonstrated good forecasting ability. Importantly, these models also enabled the determination of key characteristics of V. parahaemolyticus in the GBE, including peak timing and a seasonal loading contributing to prolonged elevated concentrations that last into fall months. The hybrid model provides the optimal level of ecological interpretability, a reasonable ability to capture the dynamics of V. parahaemolyticus concentrations in oysters in the GBE, and a stable platform for forecasting V. parahaemolyticus concentrations in coming seasons. Thus, the use of both significant environmental variables and stable parameters in the hybrid negative binomial regression model led to successful forecasting model development that captures seasonality, temporal trends, and the high degree of data variability and dispersion.
The increased incidence of illnesses caused by V. parahaemolyticus infections in the Northeast US has co-occurred with increases in regional surface water temperatures and other environmental parameters, as shown in this study, suggesting an increase in the presence of pathogenic V. parahaemolyticus strains and/or population evolution [13,14]. The model approach developed in this study illustrates how characteristics of V. parahaemolyticus dynamics can be captured as environmental conditions continue to become more favorable for the pathogen, enabling accurate prediction of public health risk to shellfish consumers and recreational users of coastal waters. This information, coupled with recent advances [13,14,19] that improve detection methods for endemic and invasive pathogenic V. parahaemolyticus sequence types (ST) in the Northeast, could be useful for shellfish harvest management in the Northeast US based on this new, improved and integrated capacity to forecast concentration dynamics of both total and pathogenic V. parahaemolyticus populations and potential disease outbreak risks. The developed modeling approach also has the potential to inform more in-depth mechanistic studies in order to gain a better understanding of the ecology of V. parahaemolyticus and other water-borne pathogens.