# Forecasting Seasonal Vibrio parahaemolyticus Concentrations in New England Shellfish

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Sites, Environmental Sampling and Bacterial Analysis

#### 2.2. Oyster Sample Collection and Processing

#### 2.3. Statistical Analysis

#### 2.3.1. Model Development Strategy

… ($Y_{t}$) and applied a Gaussian family distribution with an identity link function relating the expected value of the response variable $Y_{t}$ to the selected predictors [47,48]. We also explored log and log(x + 1) transformations of water temperature, salinity, pH, DO, turbidity, CHL, TDN and rainfall, which served as response variables in the seasonality analysis and as predictors of V. parahaemolyticus in the regression analysis. We assessed the shape (linear or non-linear) of the relationships between V. parahaemolyticus concentrations in oysters and the environmental predictors. Variables that were significant in univariate regression were used to develop multiple regression models. We also assessed seasonality and trends over time and explored alternative variables representing seasonality with respect to their ability to improve the stability of forecasting. Inter-correlation among predictors was evaluated using Spearman correlation analysis. Model building is described in detail below.
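A minimal plain-Python sketch of the Spearman screen for inter-correlated predictors is shown below. It is an illustration, not the authors' code: the function names and the example readings are hypothetical, and tied values receive the average of their ranks, which is the usual convention for Spearman's rho. Pairs with |rho| near 1 indicate strongly inter-correlated predictors.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_table(predictors):
    """Spearman rho for every pair in a dict of equally long predictor series."""
    return {
        (a, b): spearman(predictors[a], predictors[b])
        for a, b in combinations(sorted(predictors), 2)
    }


# Hypothetical daily readings, for illustration only.
readings = {
    "temperature": [12.1, 15.3, 18.7, 21.0, 23.4],
    "salinity": [24.0, 25.5, 27.1, 26.8, 28.3],
    "do": [9.8, 9.1, 8.0, 7.4, 6.9],
}
for pair, rho in correlation_table(readings).items():
    print(pair, round(rho, 2))
```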

#### 2.3.2. Seasonality and Trend Analysis

… $\beta_{0}$ is the intercept, $t$ is the daily time series, $\beta_{1}$ indicates a general trend in the outcome of interest, $\beta_{s}$ and $\beta_{c}$ are the coefficients of the harmonic terms, and $\omega$ represents the annual cycle (365.25 days; $\omega = 1/365.25$). The harmonic terms in Model 2 are expected to depict the periodic oscillation that can also be captured by $\beta_{p}$ in Model 1. The phase shift of the periodic oscillations identified by Model 2 was determined as follows:

… ($r^{2}$) value. The trend term was determined to be non-linear based on visual assessment, a positive ΔAIC, and positive $\Delta r^{2}$ and ΔDeviance > 0.1.
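The seasonality fit and the phase-shift calculation can be sketched with NumPy. This sketch assumes Model 2 has the form $Y_{t} = \beta_{0} + \beta_{1} t + \beta_{s}\sin(2\pi\omega t) + \beta_{c}\cos(2\pi\omega t)$ with $t$ counted in days from 1 January, so that the phase maps onto calendar day; the equation itself is not reproduced in this section, and the function names are hypothetical. Under this convention the peak falls at $t^{*} = \frac{365.25}{2\pi}\operatorname{atan2}(\beta_{s}, \beta_{c})$ (mod 365.25), which is consistent with the peak timings in Table 1 (for example, about day 221 for V. parahaemolyticus, reported there as 222 ± 5).

```python
import numpy as np

OMEGA = 1 / 365.25  # annual cycle, omega = 1/365.25 as in Model 2


def harmonic_design(t):
    """Design matrix for Model 2: intercept, trend, sin and cos of the annual cycle."""
    t = np.asarray(t, dtype=float)
    angle = 2 * np.pi * OMEGA * t
    return np.column_stack([np.ones_like(t), t, np.sin(angle), np.cos(angle)])


def fit_harmonic(t, y):
    """Least-squares fit, i.e. a Gaussian GLM with identity link.

    Returns the array [beta_0, beta_1, beta_s, beta_c].
    """
    X = harmonic_design(t)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


def peak_timing(beta_s, beta_c, period=365.25):
    """Day of the cycle at which beta_s*sin(x) + beta_c*cos(x) is largest.

    beta_s*sin(x) + beta_c*cos(x) = A*cos(x - phi) with phi = atan2(beta_s, beta_c),
    so the maximum falls at x = phi, i.e. t = phi * period / (2*pi).
    """
    phi = np.arctan2(beta_s, beta_c)
    return float((phi * period / (2 * np.pi)) % period)


# Model 2 coefficients for V. parahaemolyticus from Table 1 (sine, cosine).
print(round(peak_timing(-2.87, -3.66)))  # 221, vs. 222 ± 5 in Table 1
```

For DO and TDN, whose Table 1 timings are nadirs, half a period (182.6 days) would be added to the returned value. The standard errors of the peak timing would additionally require the coefficient covariance matrix (for example, via the delta method), which this sketch omits.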

#### 2.3.3. Extreme Value Trend Analysis

#### 2.3.4. Variable Selection and Non-Linearity Assessment

… ($Y_{t}$):

… ($r^{2}$) value. Positive values indicate that the measure improved in Model 6 compared to Model 5, and negative values indicate a decrease in the model evaluation measure. Variables were determined to be non-linear based on visual assessment, a positive ΔAIC, and positive $\Delta r^{2}$ and ΔDeviance > 0.1. When strong non-linear, non-monotonic relationships were detected, we re-parametrized the predictor by centering it on the value associated with the maximum V. parahaemolyticus concentration, creating a new variable that gives the model biological interpretability [49]. For example, the new pH variable was created by squaring the difference between the observed pH values and 7.8, the value selected for centering. Re-parametrized variables are denoted by the prefix C- (e.g., C-pH).
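The C-pH re-parametrization and a ΔAIC comparison can be sketched as follows. This is a hypothetical illustration: a centered quadratic stands in for Model 6, whose exact non-linear form is not given in this section; AIC is computed for least-squares (Gaussian, identity-link) fits; and ΔAIC is taken as AIC(linear) − AIC(centered), so that, as in Table 3, a positive value means the non-linear model improved. The pH readings and the response are synthetic.

```python
import numpy as np


def centered_square(x, center):
    """Re-parametrize a predictor as its squared distance from `center`.

    For pH centered at 7.8 this gives C-pH = (pH - 7.8)**2.
    """
    return (np.asarray(x, dtype=float) - center) ** 2


def gaussian_aic(X, y):
    """AIC of a least-squares (Gaussian, identity-link) fit.

    Uses n*log(RSS/n) + 2*(k + 1); this differs from the full Gaussian AIC
    only by n*(log(2*pi) + 1), which cancels when models share the same y.
    """
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * (k + 1)  # +1 for the residual variance


def delta_aic_linear_vs_centered(x, y, center):
    """AIC(linear) - AIC(centered); positive when the centered model is better."""
    ones = np.ones(len(x))
    linear = np.column_stack([ones, x])
    centered = np.column_stack([ones, centered_square(x, center)])
    return gaussian_aic(linear, y) - gaussian_aic(centered, y)


rng = np.random.default_rng(0)
ph = rng.uniform(7.2, 8.4, 200)  # hypothetical pH readings
vp = 3.0 - 4.0 * centered_square(ph, 7.8) + rng.normal(0.0, 0.1, 200)
print(delta_aic_linear_vs_centered(ph, vp, 7.8) > 0)  # True for this synthetic optimum
```

A straight line cannot follow a response that declines on both sides of an optimum, which is why the centered variable helps when the relationship is non-monotonic.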

#### 2.3.5. Model Building

… $\beta_{0}$ is the intercept and $t$ is the daily time series; $X_{1,t}\dots X_{k,t}$ are the daily time series for the environmental predictors, including the re-parametrized centered variables and interaction terms; $\beta_{1}\dots\beta_{k}$ are the corresponding coefficients.
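Model 7.4 combines temperature, salinity, C-pH and a C-pH × salinity interaction. The sketch below shows one way such a design matrix could be built and fitted by least squares, which covers the Gaussian, identity-link (GLM-G) case only; the GLM-NB variant in Table 4 would need an iterative GLM fitter instead. The function names are hypothetical, and the coefficients used to generate the synthetic response are borrowed from Table 4 purely to make the example concrete.

```python
import numpy as np


def model_7_4_design(temperature, salinity, c_ph):
    """Columns: intercept, temperature, salinity, C-pH and the C-pH x salinity interaction."""
    temperature = np.asarray(temperature, dtype=float)
    salinity = np.asarray(salinity, dtype=float)
    c_ph = np.asarray(c_ph, dtype=float)
    return np.column_stack([
        np.ones_like(temperature),
        temperature,
        salinity,
        c_ph,
        c_ph * salinity,  # interaction term: element-wise product
    ])


def fit_gaussian_identity(X, y):
    """Ordinary least squares, equivalent to a Gaussian GLM with identity link."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


def predict(X, beta):
    return X @ beta


# Synthetic inputs; intercept -2.0 is hypothetical, slopes follow Table 4 (GLM-G, 7.4).
rng = np.random.default_rng(1)
temp = rng.uniform(5, 25, 300)
sal = rng.uniform(15, 30, 300)
c_ph = (rng.uniform(7.2, 8.4, 300) - 7.8) ** 2
X = model_7_4_design(temp, sal, c_ph)
true_beta = np.array([-2.0, 0.35, 0.11, 4.61, -0.41])
print(np.round(fit_gaussian_identity(X, X @ true_beta), 2))
```

Photoperiod, trend or harmonic columns for the hybrid and harmonic models would be appended to the design matrix in the same way.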

… $\beta_{0}$ is the intercept and $t$ is the daily time series; $X_{1,t}\dots X_{l,t}$ are the daily time series for the selected environmental predictors, including the re-parametrized centered variables and interaction terms; $\beta_{1}\dots\beta_{l}$ are the corresponding coefficients. In Models 8 and 9, $\beta_{p}$ is the coefficient of the photoperiod variable. In Model 10, $\beta_{s}$ and $\beta_{c}$ are the coefficients of the harmonic terms and $\omega$ represents the annual cycle (365.25 days), as in Model 2.

… ($r^{2}$) value. Model selection was based on the AIC value and on improvements in $r^{2}$ and deviance explained greater than 0.1.
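The AIC part of the selection rule can be illustrated with the AIC values reported in Table 4. Picking the lowest-AIC variant within each model family returns Models 7.4, 8.1 and 9.1, the three models carried forward to Table 5, for both the GLM-G and GLM-NB versions. The sketch omits the $r^{2}$ and deviance-explained thresholds, and the grouping-by-label helper is hypothetical.

```python
from collections import defaultdict

# AIC values from Table 4 (model label -> AIC).
GLM_G_AIC = {
    "7.1": 586.9, "7.2": 583.1, "7.3": 572.5, "7.4": 567.8,
    "8.1": 564.4, "8.2": 565.9,
    "9.1": 566.3, "9.2": 567.7,
}
GLM_NB_AIC = {
    "7.1": 1533.4, "7.2": 1521.6, "7.3": 1518.3, "7.4": 1507.1,
    "8.1": 1501.9, "8.2": 1503.9,
    "9.1": 1504.2, "9.2": 1506.2,
}


def best_per_family(aic_by_model):
    """Return {family: label} for the lowest-AIC variant in each model family.

    The family is the part of the label before the dot (e.g. "7" for "7.4").
    """
    families = defaultdict(list)
    for label, aic in aic_by_model.items():
        families[label.split(".")[0]].append((aic, label))
    return {family: min(candidates)[1] for family, candidates in families.items()}


print(best_per_family(GLM_G_AIC))
print(best_per_family(GLM_NB_AIC))
```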

#### 2.4. Assessment of Model Forecasting Ability

… ($r^{2}$), and overall residual deviance. Forecasting error was evaluated using the root mean square error (RMSE).
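One way to run the forecasting assessment (fit on the training years, then score each period by RMSE) is sketched below in plain Python. The split at 2013 follows the training (2007–2013) and test (2014–2016) periods used in Figure 8; `fit_line` is a hypothetical single-predictor stand-in for Models 7.4, 8.1 and 9.1, and any fitting function that returns a prediction callable can be passed instead.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long, non-empty sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def fit_line(xs, ys):
    """Hypothetical one-predictor least-squares fit; returns a predict(x) callable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def train_test_rmse(records, fit, last_training_year=2013):
    """Fit on years <= last_training_year and score each period separately.

    records: iterable of (year, x, y); fit(xs, ys) returns a predict(x) callable.
    Returns (training RMSE, test RMSE).
    """
    records = list(records)
    train = [r for r in records if r[0] <= last_training_year]
    test = [r for r in records if r[0] > last_training_year]
    predict = fit([r[1] for r in train], [r[2] for r in train])

    def score(rows):
        return rmse([r[2] for r in rows], [predict(r[1]) for r in rows])

    return score(train), score(test)
```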

## 3. Results

#### 3.1. V. parahaemolyticus Concentrations in the GBE, 2007–2016

#### 3.1.1. Trends and Seasonality

#### 3.1.2. Univariate Regression

… $r^{2}$ and p values from 4.8%, 0.04 and 0.008 (for the unmodified pH data) to 8.6%, 0.1 and 0.0003, respectively.

#### 3.2. Sequential Model Building

#### 3.3. Model Performance Prediction

… $r^{2}$ values compared to the harmonic regression model (Model 9.1) and the environmental model (Model 7.4) (Table 5). The fits of all three models were relatively consistent, even though the significance of some variables changed between time intervals. Although the precision estimates for the harmonic regression model across the training and test datasets were slightly lower than those for the other models, the harmonic model is advantageous because it identifies important attributes of the data. For example, V. parahaemolyticus concentrations peaked on day 222 ± 5 of the 365.25-day cycle in all three intervals. Similarly, the peak timing of water temperature and salinity was stable across the overall, training and test datasets (day 212 ± 2 and day 251 ± 18, respectively).

## 4. Discussion

## 5. Conclusions

**Figure 1.** Study area and sites for oyster and water sampling in the Great Bay Estuary, New Hampshire, USA. OR = Oyster River; NI = Nannie Island.

**Figure 2.** Vibrio parahaemolyticus concentrations in oysters from NI and OR at low tide in the Great Bay Estuary (GBE), 2007–2016.

**Figure 3.** Patterns in (**a**) V. parahaemolyticus concentration, (**b**) water temperature, (**c**) dissolved oxygen, (**d**) salinity, (**e**) pH, (**f**) turbidity, (**g**) CHL, (**h**) TDN and (**i**) rainfall versus calendar day of the year, superimposed for 2007 to 2016.

**Figure 4.** The number of observations per year above the 75th percentile for (**a**) V. parahaemolyticus concentrations, (**b**) salinity and (**c**) TDN, and between the 25th and 75th percentiles for (**d**) pH.

**Figure 5.** Loess smoothing applied to V. parahaemolyticus concentrations and (**a**) water temperature, (**b**) salinity, (**c**) pH, (**d**) DO (dissolved oxygen), (**e**) CHL (chlorophyll-a) and (**h**) rainfall.

**Figure 6.** Model estimates (filled circles) and observed V. parahaemolyticus concentrations (x) superimposed by calendar day of the year from 2007 to 2016: GLM-G for (**a**) Model 7.4, (**b**) Model 8.1 and (**c**) Model 9.1, and GLM-NB for (**d**) Model 7.4, (**e**) Model 8.1 and (**f**) Model 9.1. The dashed vertical line at day 170 for the hybrid model (**b**,**e**) marks the longest day of the year, and the dashed lines at day 222 ± 5 and day 221 ± 7 indicate the calculated peak timing of V. parahaemolyticus concentrations for Model 9.1 in the (**c**) GLM-G and (**f**) GLM-NB versions.

**Figure 7.** Spearman correlation analysis of V. parahaemolyticus concentrations and environmental variables for three intervals: (**a**) 2007–2016, (**b**) 2007–2013 and (**c**) 2014–2016. Red indicates positive and blue negative correlations; the degree of significance is indicated by color intensity.

**Figure 8.** Estimated (closed circles) and observed V. parahaemolyticus concentrations for the (**a**) environmental model, (**b**) hybrid model and (**c**) harmonic regression model over the training (2007–2013) and test (2014–2016) periods. The 95% prediction interval is shown as gray shading. Model fit values are shown in the upper left corner of each panel.

**Table 1.** Trend and seasonality estimates detected by Model 1 and Model 2 for V. parahaemolyticus concentrations and environmental variables (Model 1, top row; Model 2, bottom row for each variable).

| Variable ^{a} | Coefficient: trend ^{b} | Coefficient: seasonality ^{b} | Standard error: trend | Standard error: seasonality | r^{2} | Deviance | AIC | Peak timing ^{c} |
|---|---|---|---|---|---|---|---|---|
| Vp (MPN/g) | 0.0005 *** | 0.57 *** | 0.0001 | 0.11 | 0.19 | 0.21 | 673.4 | |
| | 0.0006 *** | −2.87 *** / −3.66 *** | 0.0001 | 0.34 / 0.33 | 0.50 | 0.51 | 597.4 | 222 ± 5 |
| Water Temperature (°C) | <0.001 | 2.01 *** | <0.001 | 0.15 | 0.53 | 0.54 | 774.1 | |
| | 0.002 * | −5.81 *** / −10.22 *** | <0.001 | 0.24 / 0.23 | 0.93 | 0.93 | 497.9 | 213 ± 2 |
| Dissolved Oxygen (mg/L) | <0.001 | −0.31 *** | <0.001 | 0.05 | 0.22 | 0.23 | 441.5 | |
| | <0.001 | 1.45 *** / 1.91 *** | <0.001 | 0.15 / 0.14 | 0.58 | 0.59 | 352.0 | 220 ± 6 |
| Salinity (ppt) | 0.001 *** | −0.19 | 0.0003 | 0.20 | 0.12 | 0.13 | 849.4 | |
| | 0.002 *** | −4.06 *** / −1.77 ** | 0.0003 | 0.76 / 0.72 | 0.26 | 0.28 | 825.5 | 251 ± 18 |
| pH | <0.001 *** | −0.02 * | <0.001 | 0.01 | 0.08 | 0.10 | 19.9 | |
| | <0.001 *** | −0.06 / 0.03 | 0.006 | 0.05 / 0.05 | 0.09 | 0.11 | 20.9 | 298 ± 98 |
| Turbidity (NTU) | −0.02 *** | 3.93 | 0.007 | 4.10 | 0.06 | 0.09 | 1723.6 | |
| | −0.02 *** | −6.34 / −9.83 | 0.007 | 16.77 / 15.87 | 0.06 | 0.08 | 1716.5 | 135 ± 111 |
| Chlorophyll-a (µg/L) | −0.0002 | 0.62 *** | 0.005 | 0.0002 | 0.09 | 0.10 | 775.3 | |
| | <0.001 | 0.11 / −2.02 *** | <0.001 | 0.65 / 0.61 | 0.09 | 0.10 | 778.2 | 180 ± 37 |
| Total Dissolved Nitrogen (mg/L) | <0.001 *** | −0.008 * | <0.001 | 0.005 | 0.15 | 0.16 | −229.0 | |
| | <0.001 *** | 0.02 / 0.04 * | <0.001 | 0.02 / 0.02 | 0.15 | 0.17 | −228.2 | 206 ± 45 |
| Rainfall (mm) | <0.001 | 0.01 * | <0.001 | <0.001 | 0.01 | 0.02 | −76.7 | |
| | <0.001 | −0.03 / −0.07 ** | <0.001 | 0.001 / <0.001 | 0.01 | 0.04 | −74.6 | 209 ± 38 |

^{a} For each variable, the top row shows Model 1 and the bottom row shows Model 2; for Model 2, the seasonality coefficients and standard errors are given for the sine / cosine terms;

^{b} the significance of coefficients is indicated as *** 0.001, ** 0.01, and * 0.1;

^{c} peak timing estimates are represented by the mean and standard error values; for two parameters, dissolved oxygen (DO) and total dissolved nitrogen (TDN), the estimates reflect the seasonal nadir. AIC, Akaike's Information Criterion.

**Table 2.** Trends in the frequency of days when V. parahaemolyticus concentrations, salinity and TDN exceeded the 75th percentile of the data and pH values were within the 25th to 75th percentile range in the GBE, 2007–2016.

| Year | V. parahaemolyticus > 220 MPN/g (75th percentile), n | % | Salinity > 27 ppt (75th percentile), n | % | TDN > 0.27 mg/L (75th percentile), n | % | pH 7.56–7.88 (25th–75th percentile), n | % |
|---|---|---|---|---|---|---|---|---|
| 2007 | 2/17 | 11.8% | 196/488 | 40.2% | 6/17 | 35.3% | 215/488 | 44.1% |
| 2008 | 2/18 | 11.1% | 10/465 | 2.2% | 0/18 | 0.0% | 148/465 | 31.8% |
| 2009 | 1/11 | 9.1% | 18/463 | 3.9% | 1/11 | 9.1% | 173/449 | 38.5% |
| 2010 | 3/14 | 21.4% | 58/451 | 12.9% | 0/14 | 0.0% | 157/451 | 34.8% |
| 2011 | 0/9 | 0.0% | 46/377 | 12.2% | 0/9 | 0.0% | 102/430 | 23.7% |
| 2012 | 3/7 | 42.9% | 135/475 | 28.4% | 0/7 | 0.0% | 217/447 | 48.5% |
| 2013 | 1/6 | 16.7% | 65/438 | 14.8% | 3/6 | 50.0% | 231/438 | 52.7% |
| 2014 | 7/22 | 31.8% | 135/432 | 31.3% | 13/22 | 59.1% | 277/432 | 64.1% |
| 2015 | 8/24 | 33.3% | 205/443 | 46.3% | 10/22 | 45.5% | 230/408 | 56.3% |
| 2016 | 8/21 | 38.1% | 266/479 | 55.5% | 4/18 | 22.2% | 289/465 | 62.1% |
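The exceedance counts in Table 2 can be reproduced in outline as shown below. The sketch assumes that the quartile thresholds are pooled over the whole 2007–2016 record (Table 2 lists a single threshold per variable), that quantiles use linear interpolation (`method="inclusive"`, which matches R's default), that the upper tail is strict (> Q3) and that the interquartile band is inclusive. The function name and record layout are hypothetical. Each percentage is simply the number selected divided by the number of observations in that year, e.g., 2/17 = 11.8% for V. parahaemolyticus in 2007.

```python
from collections import Counter
from statistics import quantiles


def yearly_frequency(records, mode="above_q3"):
    """Per-year counts above Q3 ("above_q3") or within [Q1, Q3] ("within_iqr").

    records: list of (year, value). Quartiles are pooled over all years.
    Returns {year: (n_selected, n_total, percent)}.
    """
    if mode not in ("above_q3", "within_iqr"):
        raise ValueError(f"unknown mode: {mode}")
    values = [v for _, v in records]
    # method="inclusive" interpolates linearly between order statistics.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")

    def selected(v):
        return v > q3 if mode == "above_q3" else q1 <= v <= q3

    totals = Counter(year for year, _ in records)
    hits = Counter(year for year, v in records if selected(v))
    return {
        year: (hits[year], totals[year], round(100 * hits[year] / totals[year], 1))
        for year in sorted(totals)
    }


# Hypothetical measurements for two years.
example = [(2007, v) for v in (1, 2, 3, 4)] + [(2008, v) for v in (5, 6, 7, 8)]
print(yearly_frequency(example))
print(yearly_frequency(example, mode="within_iqr"))
```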

**Table 3.** The relationship between V. parahaemolyticus concentrations and environmental variables, and the fit improvement of the non-linear (Model 6) over the linear (Model 5) regression models in the GBE, 2007–2016. Differences compare Model 6 with Model 5: positive values indicate that the measure improved in Model 6 compared to Model 5, and negative values indicate a decrease in the model evaluation measure.

| Variable | Model 5 p-value | Model 6 p-value | $\Delta r^{2}$ | ΔDeviance | ΔAIC |
|---|---|---|---|---|---|
| Water Temperature (°C) | <0.001 | <0.001 | 0.03 | 0.03 | 8.27 |
| Dissolved Oxygen (mg/L) | <0.001 | <0.001 | 0.04 | 0.05 | 7.28 |
| Salinity (ppt) | <0.001 | <0.001 | −0.01 | 0.0 | 0.0 |
| pH | 0.009 | 0.002 | 0.14 | 0.08 | 8.48 |
| Chlorophyll a (µg/L) | 0.05 | 0.09 | 0.01 | 0.29 | 0.11 |
| Rainfall (mm) | 0.03 | 0.02 | 0.04 | 0.04 | −6.31 |
| Turbidity (NTU) | 0.27 | 0.48 | 0.01 | 0.25 | 0.43 |
| Total Dissolved Nitrogen (mg/L) | 0.38 | 0.31 | 0.02 | 0.03 | 3.20 |

**Table 4.** The sequential building of multiple regression models for V. parahaemolyticus concentrations in oysters using Gaussian (GLM-G) and negative binomial (GLM-NB) models (Models 7, 8, 9).

| Model composition ^{a} | GLM-G coefficients | GLM-G St. error | GLM-G deviance | GLM-G AIC | GLM-NB coefficients | GLM-NB St. error | GLM-NB deviance | GLM-NB AIC |
|---|---|---|---|---|---|---|---|---|
| **Model 7** | | | | | | | | |
| 1. Temperature / Salinity | 0.34 *** / 0.12 ** | 0.03 / 0.03 | 0.54 | 586.9 | 0.34 *** / 0.13 *** | 0.03 / 0.03 | 0.48 | 1533.4 |
| 2. Temperature / C-pH ^{b} | 0.37 *** / −4.73 *** | 0.03 / 0.94 | 0.57 | 583.1 | 0.41 *** / −5.52 *** | 0.03 / 0.91 | 0.51 | 1521.6 |
| 3. Temperature / C-pH / Salinity | 0.35 *** / −3.93 *** / 0.07 ** | 0.03 / 0.99 / 0.03 | 0.59 | 572.5 | 0.34 *** / −4.38 *** / 0.07 ** | 0.02 / 0.93 / 0.02 | 0.53 | 1518.3 |
| 4. Temperature / C-pH / Salinity / C-pH*Salinity | 0.35 *** / 4.61 / 0.11 *** / −0.41 *** | 0.02 / 3.35 / 0.04 / 0.16 | 0.61 | 567.8 | 0.34 *** / 5.52 * / 0.10 ** / −0.53 ** | 0.02 / 0.03 / 2.97 / 0.14 | 0.57 | 1507.1 |
| **Model 8** | | | | | | | | |
| 1. Trend / Photoperiod / Temperature / C-pH | 0.0003 ** / −0.35 ** / 0.46 *** / −3.77 *** | 0.0001 / 0.11 / 0.04 / 0.95 | 0.62 | 564.4 | 0.0003 *** / −0.32 *** / 0.43 *** / −4.52 *** | 0.0001 / 0.09 / 0.03 / 0.86 | 0.58 | 1501.9 |
| 2. Trend / Photoperiod / Temperature / C-pH / Salinity | 0.0002 ** / −0.32 ** / 0.44 *** / −3.77 *** / 0.02 | 0.001 / 0.11 / 0.04 / 0.99 / 0.06 | 0.62 | 565.9 | 0.0003 *** / −0.32 *** / 0.43 *** / −4.48 *** / −0.004 | 0.0001 / 0.13 / 0.04 / 0.89 / 0.03 | 0.58 | 1503.9 |
| **Model 9** | | | | | | | | |
| 1. Trend / Sin(.) / Cos(.) / Temperature / C-pH | 0.0003 ** / 0.07 / 1.47 / 0.50 *** / −3.78 *** | 0.0001 / 0.69 / 1.12 / 0.11 / 0.96 | 0.62 | 566.3 | 0.0003 *** / −0.28 / 0.79 / 0.41 *** / −4.49 *** | 0.0001 / 0.56 / 0.91 / 0.09 / 0.87 | 0.58 | 1504.2 |
| 2. Trend / Sin(.) / Cos(.) / Temperature / C-pH / Salinity | 0.0003 ** / 0.15 / 1.47 / 0.49 *** / −3.60 *** / 0.03 | 0.0001 / 0.69 / 1.12 / 0.11 / 0.99 / 0.04 | 0.62 | 567.7 | 0.0003 *** / −0.31 / 0.77 / 0.41 *** / −4.61 *** / −0.006 | 0.0001 / 0.57 / 0.91 / 0.09 / 0.89 / 0.03 | 0.58 | 1506.2 |

^{a} The significance of coefficients is indicated as *** 0.001, ** 0.01, and * 0.1;

^{b} C-pH denotes the re-parametrized (centered) pH variable.

**Table 5.** The performance of three selected models: environmental model (Model 7.4), hybrid model (Model 8.1), and harmonic regression model (Model 9.1) for three time periods: full (P1), training (P2), and testing (P3) intervals.

| Model | Variable ^{a} | P1 (full, 2007–2016) | P2 (training, 2007–2013) | P3 (test, 2014–2016) |
|---|---|---|---|---|
| Model 7.4 | Coefficient: Temperature | 0.34 *** | 0.37 *** | 0.31 *** |
| | Coefficient: Salinity | 0.10 *** | 0.08 ** | 0.24 ** |
| | Coefficient: C-pH | 5.51 * | 5.12 | 266.01 *** |
| | Coefficient: Salinity*C-pH | −0.53 *** | −0.53 *** | −11.01 *** |
| | r^{2} | 0.54 | 0.58 | 0.57 |
| | Deviance | 0.57 | 0.58 | 0.54 |
| | RMSE | 1.91 | 1.79 | 1.96 |
| Model 8.1 | Coefficient: Trend | 0.0003 *** | 0.0003 | 0.0007 |
| | Coefficient: Photoperiod | −0.31 *** | −0.28 ** | −0.48 ** |
| | Coefficient: Temperature | 0.43 *** | 0.45 *** | 0.44 *** |
| | Coefficient: C-pH | −4.51 *** | −4.32 *** | −5.10 |
| | r^{2} | 0.61 | 0.57 | 0.61 |
| | Deviance | 0.58 | 0.59 | 0.53 |
| | RMSE | 1.85 | 1.81 | 1.92 |
| Model 9.1 | Coefficient: Trend | 0.0004 *** | 0.0004 * | 0.0008 |
| | Coefficient: Sin(.) | −0.41 | −1.88 * | 1.72 * |
| | Coefficient: Cos(.) | 0.63 | −1.54 | 4.66 ** |
| | Coefficient: Temperature | 0.40 *** | 0.29 ** | 0.74 *** |
| | Coefficient: C-pH | −4.30 *** | −4.20 *** | 1.60 |
| | r^{2} | 0.61 | 0.55 | 0.63 |
| | Deviance | 0.58 | 0.60 | 0.54 |
| | RMSE | 1.81 | 1.82 | 1.83 |

^{a}The significance of coefficients is indicated as *** 0.001, ** 0.01, and * 0.1.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

